I’ve recently seen a bunch of discussions of the wisdom of publicly releasing the weights1 of advanced AI models. A common argument form that pops up in these discussions is this:
“Releasing the weights would let lots of people do thing X, and X could cause bad effect Y. But Z is already true, and Z lets some people do X, and we don’t think Z is bad. So releasing the weights must be fine.”
One example of this argument form is about the potential to cause devastating pandemics, and goes as follows:
“If we release model weights, people could use the models to learn a bunch of facts about biology, and some of them could use that knowledge to create devastating pandemics. But teachers and textbooks already teach people a bunch of facts about biology, and we don’t think teachers and textbooks are bad. So releasing model weights must be fine.”
In this example, thing X is teaching people a bunch of facts, bad effect Y is creating devastating pandemics, and Z is the existence of teachers and textbooks.
Another example is one that I’m not sure has been publicly written up, but occurred to me:
“If we release model weights, people will run the model, and that could cause all sorts of bad things. But the model already exists, and it already gets run, and that’s fine. So releasing the weights must be fine.”
In this example, thing X is running the model, bad effect Y is generic bad things that people worry about, and Z is the model existing in the first place.
However, I think these arguments don’t actually work, because they implicitly assume that the costs and benefits scale proportionally with how much X happens. Suppose instead that the benefits of thing X grow proportionally to how much it happens2: for example, maybe every person who learns about biology makes roughly the same amount of incremental progress in learning how to cure disease and make humans healthier. Also suppose that every person who does thing X has a small probability of causing bad effect Y for everyone, which negates all the benefits of X: for example, perhaps 0.01% of people would cause a global pandemic killing everyone if they learned enough about biology. Then the expected value of X happening can be high when it happens a little (because you probably get the good effects and not bad effect Y), but low when it happens a lot (because you almost certainly get bad effect Y, and the tiny probability of the good effects isn’t worth it). In this case, it can simultaneously be fine that Z is true (e.g. that some people can learn various sub-topics of biology from great tutors), but bad to publicly release model weights and thereby make X happen a ton.
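To make the scaling argument concrete, here is a toy expected-value calculation. All parameter values (per-user benefit, catastrophe probability, catastrophe harm) are illustrative assumptions I’ve picked for the sketch, not claims about real numbers: each of n users contributes a fixed benefit, each independently has a small chance of causing catastrophe Y, and any catastrophe wipes out all the benefits and inflicts an additional harm.

```python
# Toy model of how expected value can flip sign as the user base grows.
# Benefits scale linearly in the number of users n, but each user
# independently has a small probability p of causing a catastrophe that
# negates all benefits and inflicts harm H. Parameters are illustrative.

def expected_value(n, b=1.0, p=1e-4, H=1000.0):
    p_safe = (1 - p) ** n  # probability that no user causes the catastrophe
    # With probability p_safe we get the full benefits n*b;
    # otherwise the benefits are negated and we suffer harm H.
    return p_safe * n * b - (1 - p_safe) * H

# A little X: catastrophe is unlikely, so the expected value is positive.
print(expected_value(100))
# A lot of X: catastrophe is near-certain, so the expected value is negative.
print(expected_value(1_000_000))
```

The same per-user probability p produces opposite verdicts at the two scales, which is exactly why pointing at the small-scale case (Z) doesn’t settle the large-scale one.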
So what’s the upshot? To know whether it’s a good idea to publicly release model weights, you need to know the costs and benefits of the various things that could happen, and how those scale with the size of the user base. It’s not enough to point to a small fraction of the relevant effects of releasing the weights and note that those are fine. I didn’t go through this here, but you can also reverse the sign: there could be some activity people can do with model weights that’s bad if a small number of people do it, but good if a large number of people do it. So you can’t necessarily point to a small number of people doing nefarious things with some knowledge and conclude that it would be bad if that knowledge were widely publicized.