Why keep a diary, and why wish for large language models

Fri, 14 Jun 2024 00:00:00 +0000

Inspired by a dream I just woke up from, where I did not keep a diary

One of the people with whom I have the most intimate of connections is my past self - in particular, my child self. We share a large number of commonalities: much of our basic outlook, our personality, many of our drives. But, of course, my child self is different from me in many ways. He had thought less about things, encountered fewer things, developed and drifted less.

It seems valuable to become more acquainted with my child self. I’d like to know the things he would want from me today, but also just what he was like and how he thought differently than I do. I don’t have a strict utilitarian case for this, to be clear: but imagine you had a child in your care. Wouldn’t you want to know those things about the child, just out of curiosity, and to help build a mutually agreeable local world? And shouldn’t I feel even more strongly about the child who was me, who entrusted their future to me, with whom I have in some ways an even stronger relationship and to whom I have in some ways an even greater duty of care?

Right now, perhaps because of the dream I just woke up from, I feel this most acutely for my child self. But there are other selves (as if ‘childhood Daniel’ was merely one self) I feel similarly about. Myself during the first and second halves of my undergraduate years, beginning to live away from family. Myself after just having moved to Berkeley, becoming one of the ‘rationalists’. Myself during the more difficult parts of my PhD. Right now, I have a pretty strong connection with most of these, but in the future I won’t. And even now I can feel undergraduate Daniel slipping out of my hands.

So I wish I had kept a diary, or blogged (in an unusually personal manner), or somehow or other done a better job of recording my thoughts and desires and frames and fears and hopes. I currently keep a weekly journal, which I hope is sufficient, but I must admit it’s a bit businesslike. Another way to preserve these would be interviews - perhaps this could be a new year tradition, recording a few hours of audio/video about how the past year was, what you hope for the next year, and anything from idle chit-chat to deep conversation with the hope of capturing something of what it’s like to be you on this first of January. The sleep deprivation would probably help.

But diaries are a difficult medium to extract value from. I suppose some people become famous and then publish their diaries, or they become famous for the wrong reasons and their diaries are published and censored for them, and I suppose people choose to read those. But to be honest I can’t imagine that reading my journal entries is a particularly enjoyable pursuit. And at the very least it takes quite a long time to get a sufficient sample.

This is a nice service that large language models could provide - reading your diaries for you, and being able to simulate your past self. Yes, I’m an AI doomer, and I instinctually dislike these sorts of things. And yes, wouldn’t it be awful if some alien machine overwrote your memories of yourself. But it’s not inconceivable that it could work, right? And if it worked, wouldn’t that be good? To bridge the chasm of time and connect to a child who is now half-gone? For someone to efficiently read those records and act as an empathetic historian?

I suppose people usually make this proposal in the third person - an LLM that could simulate Ruth Bader Ginsburg or George Washington or your deceased spouse (or your parents as they were when you were 5? 15?). Perhaps it’s somewhat narcissistic to pine for this version. But I guess I can be excused, since I didn’t in fact dream of those things.

But when I was 10 I don’t think I would have been sufficiently compelled by this reasoning anyway.

Bayesian inference without priors

Wed, 24 Apr 2024 00:00:00 +0000

Epistemic status: party trick

Why remove the prior

One famed feature of Bayesian inference is that it involves prior probability distributions. Given an exhaustive collection of mutually exclusive ways the world could be (hereafter called ‘hypotheses’), one starts with a sense of how likely the world is to be described by each hypothesis, in the absence of any contingent relevant evidence. One then combines this prior with a likelihood distribution, which for each hypothesis gives the probability that one would see any particular set of evidence, to get a posterior distribution of how likely each hypothesis is to be true given observed evidence. The prior and the likelihood seem pretty different: the prior is looking at the probability of the hypotheses in question, whereas the likelihood is looking at the probability of the evidence (assuming the hypothesis is true).¹

Critics of Bayesian inference sometimes denounce the reliance on priors for being subjective or unscientific. Indeed, they are by design meant to describe what one would think without any relevant (contingent) data. One might therefore be tempted to describe a form of Bayesian inference where no special role is played by the prior distribution, as distinct from the likelihood.

Another motivation comes from doing Bayesian calculations by hand. In real-world cases, such as trying to infer whether the first COVID-19 outbreak spread from a laboratory or human contact with infected animals, the kind of thinking one does to determine a prior probability distribution is very similar to the kind of thinking one does to determine likelihoods: in both cases, one has some sort of generative model in mind—that is, some sort of probabilistic process of generating worlds—and one is trying to figure out how often worlds produced by this generative model have various properties. This might make one wonder if one could unify the prior and the likelihood.

How to remove the prior (by turning it into a likelihood)

So, how are we going to do this?

First, a prerequisite. I’m going to be talking about the “odds ratio” form of Bayes’ theorem. This involves comparing the ratio of the probabilities of two hypotheses—that is, asking questions like “how many times more likely is the COVID outbreak to be a lab leak (LL) rather than a zoonotic spillover (Zoo), given the evidence E we’ve seen?”. Symbolically, we’re asking about P(LL | E) / P(Zoo | E). Bayes’ theorem tells us that this is equal to P(LL) / P(Zoo) times P(E | LL) / P(E | Zoo) - that is, the ratio of the hypotheses’ prior probabilities, multiplied by the ratio of the likelihoods of the given evidence under the hypotheses. If we then observed subsequent evidence E’, we would want to know P(LL | E, E’) / P(Zoo | E, E’), and Bayes’ theorem says that that’s equal to P(LL) / P(Zoo) times P(E | LL) / P(E | Zoo) times P(E’ | LL, E) / P(E’ | Zoo, E)—basically, for each additional piece of evidence, we get a new likelihood ratio for the new evidence given the hypotheses and the old evidence.
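For reference, here are the two odds-form identities from the paragraph above written as display equations. This is only a typeset restatement using the same symbols (LL, Zoo, E, E'), with nothing new assumed.

```latex
\[
\frac{P(\mathrm{LL} \mid E)}{P(\mathrm{Zoo} \mid E)}
  = \frac{P(\mathrm{LL})}{P(\mathrm{Zoo})}
    \times \frac{P(E \mid \mathrm{LL})}{P(E \mid \mathrm{Zoo})}
\]
\[
\frac{P(\mathrm{LL} \mid E, E')}{P(\mathrm{Zoo} \mid E, E')}
  = \frac{P(\mathrm{LL})}{P(\mathrm{Zoo})}
    \times \frac{P(E \mid \mathrm{LL})}{P(E \mid \mathrm{Zoo})}
    \times \frac{P(E' \mid \mathrm{LL}, E)}{P(E' \mid \mathrm{Zoo}, E)}
\]
```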

With that set-up established, I’d like you to imagine a certain way you could come to be doing this calculation. Suppose someone first asks you: “How many times more likely is the first COVID-19 outbreak to have been a lab leak rather than a zoonotic spillover?”. However, you’re kind of tired and not paying that close attention, so what you hear is “How many times more likely is mumble to have been mumble rather than mumble”. You know that the speaker made two utterances, that represent some sort of mutually exclusive hypotheses, but you have no idea what’s going on beyond that. You are now in the position of wondering how much more likely the referent of utterance 1 (U1) is to be true compared to the referent of utterance 2 (U2).

In this case, I’m going to assume you have a probability distribution over what hypotheses various utterances might mean. I’m also going to make further assumptions about these hypotheses:

1. The hypotheses are all mutually exclusive.²
2. Both utterances “come from the same distribution”, meaning that there’s no difference between how likely utterances 1 and 2 are to mean various things. That is, P(U1 means H) = P(U2 means H) for all H.
3. The probability that some utterance U is true, conditional on it meaning hypothesis H, is just the probability that H is true. That is, P(U | U means H) = P(H | U means H).
4. The probability of any “mundane” event E1 not involving utterances conditional on utterance U being true, U meaning H, and various other utterances meaning various other things, and possibly also on mundane event E2, is equal to the probability of that event given H being true, U meaning H, and various other utterances meaning various other things, and on E2. That is, P(E1 | U, U means H, U’ means H’, E2) = P(E1 | H, U means H, U’ means H’, E2).
5. Which utterances mean which things is probabilistically independent of anything else in the world (except for which utterances are true), including which hypotheses are true and which evidence we’d see under which hypotheses.
6. Furthermore, conditioned on the meaning of utterance U, whether or not U is true is probabilistically independent of the meaning of other utterances.

Assumption 1 lets us treat the hypotheses as usual, assumption 2 encodes that there’s no difference between the first and second utterances, assumptions 3 and 4 say that if utterance U means hypothesis H then we can treat “U is true” the same as “H is true”, and assumptions 5 and 6 say that learning what various utterances mean doesn’t tell you anything about substantive questions about the world. Note: I wouldn’t be surprised if there were a more compact way of writing these assumptions, but I don’t know what it is.

Now that we have these assumptions, we can do some calculations. First of all: what’s our prior ratio over whether U1 or U2 is true? Intuitively, it should be exactly 1, meaning that they’re just as likely as each other to be true, because there’s no difference between them. Here’s a proof of that: P(U1) can be calculated by summing the probability that U1 means H and U1 is true over every hypothesis H. That is, P(U1) = sum over H of P(U1, U1 means H) = sum over H of P(U1 means H) P(U1 | U1 means H) = sum over H of P(U1 means H) P(H | U1 means H) = sum over H of P(U1 means H) P(H), where first we used the chain rule of probability, second we used assumption 3, and third we used assumption 5. Likewise, P(U2) = sum over H of P(U2 means H) P(H). Next, we should notice that assumption 2 says that P(U1 means H) is equal to P(U2 means H) for every H. Therefore, P(U1) = sum over H of P(U1 means H) P(H) = sum over H of P(U2 means H) P(H) = P(U2), so P(U1) / P(U2) = 1.
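As a sanity check, here is a minimal Python sketch of that calculation on a toy model. Everything specific in it is an assumption made for illustration: the three hypotheses, the prior numbers, and the choice to make the pair of meanings uniform over ordered pairs of distinct hypotheses (which is one way of satisfying assumption 2).

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical prior over three exhaustive, mutually exclusive hypotheses.
PRIOR = {"LL": Fraction(1, 5), "Zoo": Fraction(7, 10), "Other": Fraction(1, 10)}

# Hypothetical meaning distribution: (meaning of U1, meaning of U2) is uniform
# over ordered pairs of distinct hypotheses, so both utterances have the same
# marginal distribution over meanings (assumption 2).
PAIRS = list(permutations(PRIOR, 2))
MEANING = {pair: Fraction(1, len(PAIRS)) for pair in PAIRS}


def prob_utterance_true(index):
    """Sum over H of P(U means H) P(H), for U1 (index 0) or U2 (index 1)."""
    return sum(p_meaning * PRIOR[pair[index]] for pair, p_meaning in MEANING.items())


p_u1 = prob_utterance_true(0)
p_u2 = prob_utterance_true(1)
prior_ratio = p_u1 / p_u2
print(p_u1, p_u2, prior_ratio)
```

Exact fractions are used so that the two probabilities can be compared for equality without floating-point noise.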

Alright, so our prior ratio is exactly 1. This is great news, because it means that the prior is doing no work in our computation, because multiplying numbers by 1 doesn’t change them! We have therefore banished the feared prior from Bayesian statistics.

Next up, revisit the scenario where someone is asking you to compare the probabilities of two hypotheses, but you didn’t really pay attention to understand what they mean. Suppose you then think about it more, and you discover that the first utterance meant “The first COVID-19 outbreak was a lab leak” and the second utterance meant “The first COVID-19 outbreak was a zoonotic spillover”. How should you update on this evidence? Intuitively, all we’ve learned is the meanings of the utterances, without learning anything about how COVID-19 actually started, so our posterior ratio should just be P(LL) / P(Zoo), which means our likelihood ratio would have to be the same (given that our prior ratio is 1).

Here’s the proof: for utterance 1, the relevant likelihood term is P(U1 means LL and U2 means Zoo | U1). Using the definition of conditional probability, this is P(U1, U1 means LL, U2 means Zoo) / P(U1). Using the chain rule, we can manipulate this into P(U1 | U1 means LL, U2 means Zoo) P(U1 means LL, U2 means Zoo) / P(U1). By assumption 6, P(U1 | U1 means LL, U2 means Zoo) = P(U1 | U1 means LL), which by assumption 3 is equal to P(LL | U1 means LL), and by assumption 5 that is just P(LL). Putting that all together, P(U1 means LL and U2 means Zoo | U1) = P(LL) P(U1 means LL, U2 means Zoo) / P(U1). Similarly, for utterance 2, the relevant likelihood term is P(U1 means LL and U2 means Zoo | U2), which is equal to P(Zoo) P(U1 means LL, U2 means Zoo) / P(U2). Since P(U1) = P(U2), the likelihood ratio is therefore P(U1 means LL and U2 means Zoo | U1) / P(U1 means LL and U2 means Zoo | U2) = P(LL) / P(Zoo).
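This likelihood ratio can also be checked by brute-force enumeration. The sketch below is a toy model under stated assumptions, not a general implementation: the hypotheses, prior numbers, and uniform distribution over meaning pairs are all hypothetical, and "utterance i is true" is modelled as "the hypothesis it means is the true one".

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical prior over three exhaustive, mutually exclusive hypotheses.
PRIOR = {"LL": Fraction(1, 5), "Zoo": Fraction(7, 10), "Other": Fraction(1, 10)}
PAIRS = list(permutations(PRIOR, 2))

# A "world" is (true hypothesis, (meaning of U1, meaning of U2)). Meanings are
# drawn uniformly and independently of which hypothesis is true (assumption 5).
WORLDS = {
    (truth, pair): PRIOR[truth] * Fraction(1, len(PAIRS))
    for truth in PRIOR
    for pair in PAIRS
}


def prob(event):
    """Total probability of the worlds where event(truth, pair) holds."""
    return sum(p for (truth, pair), p in WORLDS.items() if event(truth, pair))


OBSERVED = ("LL", "Zoo")  # U1 means LL, U2 means Zoo


def likelihood_of_meanings(index):
    """P(U1 means LL and U2 means Zoo | utterance `index` is true)."""
    joint = prob(lambda truth, pair: pair == OBSERVED and pair[index] == truth)
    return joint / prob(lambda truth, pair: pair[index] == truth)


likelihood_ratio = likelihood_of_meanings(0) / likelihood_of_meanings(1)
print(likelihood_ratio, PRIOR["LL"] / PRIOR["Zoo"])
```

The two printed numbers should agree, which is the claim that learning the meanings contributes exactly the factor P(LL) / P(Zoo).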

What’s the significance of this? It means that we can recast the P(LL) / P(Zoo) term as a likelihood ratio, rather than a prior ratio.

Finally, we should check that this different formalism doesn’t change how we update on evidence. That is, suppose we further observe evidence E. We should multiply our old posterior ratio by P(E | U1, U1 means LL, U2 means Zoo) / P(E | U2, U1 means LL, U2 means Zoo). Intuitively, this should just be the likelihood ratio P(E | LL) / P(E | Zoo) because we’re just doing normal Bayesian inference, and understanding it in terms of updating on the meanings of utterances shouldn’t change anything. Formally, we can look at the numerator, P(E | U1, U1 means LL, U2 means Zoo), and by assumption 4, write it as P(E | LL, U1 means LL, U2 means Zoo). By assumption 5, this is just P(E | LL). Similarly, P(E | U2, U1 means LL, U2 means Zoo) = P(E | Zoo). Therefore, our new likelihood ratio P(E | U1, U1 means LL, U2 means Zoo) / P(E | U2, U1 means LL, U2 means Zoo) = P(E | LL) / P(E | Zoo). Therefore, we’re updating the same as we used to be. You can also check that this remains true if we get further “mundane” evidence.
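Here is a small enumeration sketch of that last check. As before, it is a toy model: the hypotheses, the prior, the per-hypothesis evidence probabilities in `P_EVIDENCE`, and the uniform distribution over meaning pairs are all made-up assumptions, chosen only so that the assumptions listed earlier hold.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical prior and hypothetical P(E | hypothesis) values.
PRIOR = {"LL": Fraction(1, 5), "Zoo": Fraction(7, 10), "Other": Fraction(1, 10)}
P_EVIDENCE = {"LL": Fraction(3, 10), "Zoo": Fraction(1, 10), "Other": Fraction(1, 2)}
PAIRS = list(permutations(PRIOR, 2))

# A "world" is (true hypothesis, meanings of (U1, U2), whether E is observed).
# Evidence depends only on the true hypothesis, never on the meanings.
WORLDS = {}
for truth in PRIOR:
    for pair in PAIRS:
        base = PRIOR[truth] * Fraction(1, len(PAIRS))
        WORLDS[(truth, pair, True)] = base * P_EVIDENCE[truth]
        WORLDS[(truth, pair, False)] = base * (1 - P_EVIDENCE[truth])


def prob(event):
    """Total probability of the worlds where event(truth, pair, saw_e) holds."""
    return sum(p for world, p in WORLDS.items() if event(*world))


OBSERVED = ("LL", "Zoo")  # U1 means LL, U2 means Zoo


def evidence_likelihood(index):
    """P(E | utterance `index` is true, U1 means LL, U2 means Zoo)."""
    def condition(truth, pair, saw_e):
        return pair == OBSERVED and pair[index] == truth

    joint = prob(lambda truth, pair, saw_e: saw_e and condition(truth, pair, saw_e))
    return joint / prob(condition)


utterance_ratio = evidence_likelihood(0) / evidence_likelihood(1)
standard_ratio = P_EVIDENCE["LL"] / P_EVIDENCE["Zoo"]
print(utterance_ratio, standard_ratio)
```

If the construction is right, the ratio computed through utterances matches the ordinary likelihood ratio computed directly from `P_EVIDENCE`.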

What does this mean?

Basically, this shows that every term in a standard Bayesian inference, including the prior ratio, can be re-cast as a likelihood term in a setting where you start off unsure about what words mean, and have a flat prior over which set of words is true. How should we interpret that fact?

Firstly, I think that there’s some kind of interesting mapping to the intuitive experience of doing Bayesian inference in real-world settings. A lot of the initial task of determining what the prior should be involves understanding what the hypotheses actually mean in a probabilistic sense: what kinds of things would have to happen for COVID-19 to have started via a lab leak, and what would that say about the world? That said, it’s possible to over-emphasize these similarities. In the toy setting I sketch, you should be asking yourself “If ‘COVID-19 was a lab leak’ was true, what’s the chance that it would have these implications?”, which doesn’t quite match the kinds of thinking I’d tend to do.

Secondly, it points to how strange likelihood ratios can be, by turning priors into likelihood ratios. There are other reasons to think that likelihoods are funny things: if the hypothesis in question is false, the likelihood is asking about how likely we would be to see some evidence in a world that doesn’t exist, which is a question that may be hard to get data on. There are therefore serious challenges with thinking of likelihood ratios as more “objective” or “scientific” than priors. As Gelman and Robert say, “It is perhaps merely an accident of history that skeptics and subjectivists alike strain on the gnat of the prior distribution while swallowing the camel that is the likelihood”.

Finally, it points to an interesting extension. In some cases, the meaning of various utterances might tell you something relevant about the world in question. For instance, suppose some utterance is a computer program, and its “meaning” is what it evaluates to. Learning this might serve as evidence about what other computer programs evaluate to (e.g. those computer programs that use your ‘utterance’ as a subroutine), meaning that one could not apply Bayesian statistics quite so simply in this setting.

A challenge

This construction was inspired by noting the similarity between the calculation of the prior term and the likelihood term in Bayes’ formula. The way it highlighted that similarity was by turning the prior term into a likelihood. But is there some way of re-casting the problem so that the likelihood term becomes a prior, and the prior term becomes a likelihood?

¹ Compare priors and posteriors, which are both about the probability of the hypotheses in question, and are therefore more similar: you can use a posterior as a new prior when facing further evidence.
² This can actually be relaxed without changing our results: we can instead suppose that you’re not sure which way the speaker is carving up “hypotheses”, but that once they pick such a way, the two hypotheses they state will be mutually exclusive.

n of m ring signatures

Mon, 04 Dec 2023 00:00:00 +0000

A normal cryptographic signature associated with a message and a public key lets you prove to the world that it was made by someone with access to the private key associated with the known public key, without revealing that private key. You can read about it on Wikipedia here.

A ring signature associated with a message and a set of public keys lets you prove to the world that it was made by someone with access to the message and one private key associated to one of the public keys in the set, but nobody will be able to tell which public key it was. This lets you say something semi-anonymously, which is neat. It’s also used in the private cryptocurrency Monero. You can read about them on Wikipedia here.

Here’s a thing that would be better than a ring signature: a signature that proved that it was made by a subset of public keys of a certain size. In my head, I was calling this an n of m ring signature for a while. But when I googled “n of m ring signature”, nothing came up. It turns out this is because in the literature, it’s called a “threshold ring signature”, a “k of n ring signature”, or a “t of n ring signature” instead. I think perhaps the first paper about it is this one, but I haven’t checked very hard.

Anyway: I would like to make it so that when you search for n-of-m ring signatures online, you find a thing telling you that you should instead search for “threshold ring signature”. Hence this post.

How to type Aleksander Mądry's last name in LaTeX

Mon, 20 Nov 2023 00:00:00 +0000

1. Type “Madry”.
2. Realize that the a has a little tail that you need to include.
3. That’s a feature of the Polish alphabet called an ogonek.
4. You type it in LaTeX like so: M\k{a}dry.
5. You get the error “Command \k unavailable in encoding OT1”.
6. That’s because you need LaTeX to use a slightly different font package.
7. In your preamble, add \usepackage[T1]{fontenc}.
8. You’re done.
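Put together, the steps above amount to something like the following minimal document. The `article` class and the surrounding text are just placeholders; the two lines that matter are the `fontenc` line and the `\k{a}`.

```latex
\documentclass{article}
% T1 font encoding makes the ogonek accent command \k available.
\usepackage[T1]{fontenc}

\begin{document}
Aleksander M\k{a}dry
\end{document}
```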

My thanks to the Stack Exchange articles about how to use that symbol and how to deal with the LaTeX error I got implementing that fix.

If a little is good, is more better?

Sat, 04 Nov 2023 00:00:00 +0000

I’ve recently seen a bunch of discussions of the wisdom of publicly releasing the weights¹ of advanced AI models. A common argument form that pops up in these discussions is this:

The problem with releasing weights is that it means that thing X can happen on a large scale, which causes bad effect Y.
But bad effect Y can already happen on a smaller scale because of Z.
Therefore, either it’s OK to release weights, or it’s not OK that Z is true.

One example of this argument form is about the potential to cause devastating pandemics, and goes as follows:

The putative problem with releasing the weights of Large Language Models (LLMs) is that it can help teach people a bunch of facts about virology, bacteriology, and biology more generally, that can teach people how to produce pathogens that cause devastating pandemics.
But we already have people paid to teach students about those topics.
Therefore, if that putative problem is enough to say that we shouldn’t release the weights of large language models, we should also not have textbooks and teachers on the topics of virology, bacteriology, and other relevant sub-topics of biology. But that’s absurd!

In this example, thing X is teaching people a bunch of facts, bad effect Y is creating devastating pandemics, and Z is the existence of teachers and textbooks.

Another example is one that I’m not sure has been publicly written up, but occurred to me:

Releasing the weights of LLMs is supposed to be bad because if people run the LLMs without supervision, they can do bad things.
But if you make LLMs in the first place, you can run them without supervision.
So if it’s bad to publicly release their weights, isn’t it also bad to make them in the first place?

In this example, thing X is running the model, bad effect Y is generic bad things that people worry about, and Z is the model existing in the first place.

However, I think these arguments don’t actually work, because they implicitly assume that the costs and benefits scale proportionally to how much X happens. Suppose instead that the benefits of thing X grow proportionally to how much it happens²: for example, maybe every person who learns about biology makes roughly the same amount of incremental progress in learning how to cure disease and make humans healthier. Also suppose that every person who does thing X has a small probability of causing bad effect Y for everyone that negates all the benefits of X: for example, perhaps 0.01% of people would cause a global pandemic killing everyone if they learned enough about biology. Then, the expected value of X happening can be high when it happens a little (because you probably get the good effects and not the bad effects Y), but low when it happens a lot (because you almost certainly get bad effect Y, and the tiny probability of the good effects isn’t worth it). In this case, it makes sense that it might be fine that Z is true (e.g. that some people can learn various sub-topics of biology with great tutors), but bad to publicly release model weights to make X happen a ton.
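To make the shape of that argument concrete, here is a minimal Python sketch of the toy model. It assumes (these are my simplifications, not claims about the real world) that each person independently triggers the catastrophe with probability 0.01%, that each person otherwise contributes one unit of benefit, and that a single catastrophe reduces the total benefit to zero.

```python
def expected_value(num_people, benefit_per_person=1.0, p_catastrophe=1e-4):
    """Expected total benefit when any one catastrophe wipes out all of it.

    Assumes each person independently causes the catastrophe with
    probability p_catastrophe (a hypothetical modelling choice).
    """
    p_no_catastrophe = (1.0 - p_catastrophe) ** num_people
    return num_people * benefit_per_person * p_no_catastrophe


for n in (100, 10_000, 1_000_000):
    print(n, expected_value(n))
```

In this toy model the expected value rises while the number of people is small relative to 1 / `p_catastrophe`, and collapses towards zero once the catastrophe becomes near-certain, so "a little is fine" does not imply "a lot is fine".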

So what’s the up-shot? To know whether it’s a good idea to publicly release model weights, you need to know the costs and benefits of various things that can happen, and how those scale with the user-base. It’s not enough to just point to a small amount of the relevant effects of releasing the weights and note that those are fine. I didn’t go thru this here, but you can also reverse the sign: it’s possible that there’s some activity that people can do with model weights that’s bad if a small number of people do it, but good if a large number of people do it: so you can’t necessarily just point to a small number of people doing nefarious things with some knowledge and conclude that it would be bad if that knowledge were widely publicized.

¹ Basically, the parameters of these models. Once you know the parameters and how to put them together, you can run the model and do what you want with it.
² Or more generally, polynomially (e.g. maybe quadratically because of Metcalfe’s law).

Watermarking considered overrated?

Mon, 31 Jul 2023 00:00:00 +0000

Status: a slightly-edited copy-paste of a ~~Twitter~~ X thread I quickly dashed off a week or so ago.

Here’s a thought I’m playing with that I’d like feedback on: I think watermarking large language models is probably overrated. Most of the time, I think what you want to know is “is this text endorsed by the person who purportedly authored it”, which can be checked with digital signatures. Another big concern is that people are able to cheat on essays. This is sad. But what do we give up by having watermarking?

Well, as far as I can tell, if you give people access to model internals - certainly weights, certainly logprobs, but maybe even last-layer activations if they have enough - they can bypass the watermarking scheme. This is even sadder - it means you have to strictly limit the set of people who are able to do certain kinds of research that could be pretty useful for safety. In my mind, that makes it not worth the benefit.

What could I be missing here?

Maybe we can make watermarking compatible with releasing model info, e.g. by baking it into the weights?
Maybe the info I want to be available is inherently dangerous, by e.g. allowing people to fine-tune scary models?
Maybe I’m missing some important reasons we care about watermarking, that make the cost-benefit analysis look better? E.g. avoiding a situation where AIs become really good at manipulation, so good that you don’t want to inadvertently read AI-generated text, but we don’t notice until too late?

Anyway there’s a good shot I don’t know what I’m missing, so let me know if you know what it is.

Postscript: Someone has pointed me to this paper that purports to bake a watermark into the weights. I can’t figure out how it works (at least not at twitter-compatible speeds), but if it does, I think that would alleviate my concerns.

Difficulties in making powerful aligned AI

Sun, 14 May 2023 00:00:00 +0000

Here’s my breakdown of the difficulties involved in ensuring powerful AI makes our lives radically better, rather than taking over the world, as well as some reasons why I think they’re hard. Here are things it’s not:

It’s not primarily a justification of why very powerful AI is possible or scary (altho it briefly discusses why very powerful AI would be scary).
It’s not primarily a list of underlying factors that cause these difficulties (altho it does include and gesture to some of those).
It’s not at all original - basically everything here has been said many times before, plausibly more eloquently.

That said, it is my attempt to group the problems in my own words, in a configuration that I haven’t seen before, with enough high-level motivation that one can hopefully tell the extent to which advances in the state of the art address them.

1. What sort of thinking do we want?

The first difficulty: we don’t have a sense of what sort of thinking we would want AI systems to use, in sufficient detail that one could (for instance) write python code to execute it. Of course, some of the difficulty here is that we don’t know how smart machines think, but we can give ourselves access to subroutines like “do perfect Bayesian inference on a specified prior and likelihood” or “take a function from vectors to real numbers and find the vector that minimizes the function” and still not solve the problem. To illustrate:

1. Take a hard-coded goal predicate, consider a bunch of plans you could take, and execute the plan that best achieves the goal? Unfortunately, the vast majority of goals you could think of writing down in an executable way will incentivize behaviour like gaining control over sources of usable energy (so that you definitely have enough to achieve your goal, and to double- and triple-check that you’ve really achieved it) and stopping other agents from being able to meddle with your plans (because if they could, maybe they’d stop you from achieving your goal).
2. Do things that maximize the number of thumbs up you get from humans?¹ Best plan: take control of the humans, force them to give you a thumbs up, or trick them into doing so. Presumably this is possible if you’re much smarter than humans, and it’s more reliable than doing good things - some people might not see why your good thing is actually good if left to their own devices.
3. Look at humans, figure out what they want based on what they’re doing, and do whatever that is? Main problem: people don’t do the literally optimal thing for what they want. For instance, when people play chess, they usually don’t play perfect moves - even if they’re experts! You need some rule that tells you what people would do if they wanted some goal or another, but it’s not clear what this rule would be, it’s not clear how you make this rule more in line with reality if you never observe “wanting”, and so this ends up having essentially the same problems as plans 1 and 2.
4. Read some text written by humans about what they’d like you to do, and do that?² This is passing the buck to the text written by humans to specify how we want the AI to think, but that’s precisely the problem we’re trying to solve. Concretely, one way you could imagine doing this is to write something relatively informal like “Please be helpful and harmless to your human operators”, and have your AI correctly understand what we mean by that. That (a) presumes that there is a coherent thing that we mean by that (which doesn’t seem obvious to me, given our difficulty in explicitly formalizing this request), and (b) passes the specification buck to the problem of specifying how you should understand this request.

It’s not a priori definitely impossible to build a thinking machine that does what we want without knowing how we want it to think, but it’s not at all obvious how one would. A core difficulty here is that the sorts of signs of positive outcomes we know how to specify (like “GDP has gone up a lot” or “a human says that they’re happy with the AI’s performance”) are compatible with extremely bad outcomes - and in general, as mentioned in point 1, things that are trying to achieve their own objectives in the physical world will be incentivized to cause those bad outcomes.

2. How do we recognize advanced AIs that we like?

Given that we don’t know how to specify advanced AI cognition that will do good stuff and not take control of Earth, how could we hope to build it? One obvious path is a sort of trial and error: we build some AIs, and before putting them in situations where they could conceivably take over (e.g. by having them become able to influence enough of the physical world to build fancy new technology), we figure out if they would do good stuff. Then, we can only deploy things that actually do good stuff, or even better, tweak things such that they’re more likely to do better stuff, and less likely to take over. The question is: how would we determine if our AIs will do good stuff once they’re able to take over?

One possibility you could imagine is trying to write a proof - after all, AIs are algorithms written in computer code, and one can often prove things about algorithms. The problem is that it’s entirely unclear what property we’d want to prove that our AI has, to the level of formal specificity that one could write a proof about it.³ This is closely related to the difficulty in section 1: if we had such a “goodness” property, we could build an AI that thought of plans that scored highly on “goodness”.

A second possibility is that you could look at your AI’s behaviour in a range of circumstances, and see how good it is. If your ‘goodness’ ratings come as numbers, and there are a bunch of free variables in your AI design, you can even automatically run gradient descent to set those variables to values that get your AI to do things rated as highly good. The basic problem here is that just because your AI does good stuff when it can’t take over the world, doesn’t mean that it will do good stuff once it can. The basic reason is that by and large, there are a lot of motivations that can cause AIs to do stuff that looks good:

1. It could be motivated by goodness. In this case, things are OK!
2. It could be motivated by trying to get you to approve of it. In this case, once it can, it will probably try to control your brain to get you to approve of it (of course, without letting you know beforehand, so that you don’t disapprove in the mean time).
3. It could be motivated by random weird abstractions that come apart from what you were looking for once they’re optimized hard enough. For instance, consider how humans were optimized by evolution to reproduce a lot - this seems to have been implemented by enjoying genital contact when in the presence of attractive other humans, so once humans were capable of inventing contraception, they used that instead⁴. Similarly, you could imagine AIs taking over and pursuing some strange goals, vaguely reminiscent of the goals we attempted to select them for.
4. It could be motivated by near-arbitrary long-term goals, that all incentivize the AI convincing you to release it. As long as your AI has goals that are better satisfied when it’s out of your testing box - and there are tons of goals like this, like “amass a bountiful fortune” or “solve a ton of math” - and as long as it can tell that it’s being tested, it can choose to ‘play nice’ in the short run and pass your tests, until it’s free to take over and pursue its own desires.

So, it seems like there are more AIs that pass your behavioural tests without being aligned with your interests than AIs that pass your behavioural tests by being aligned with your interests. Note that this issue is, again, related to difficulties discussed in section 1: just like how many goals we could initially imagine writing down (like “get humans to approve of you” or “run a profitable business without being caught breaking any laws”) produce bad behaviour when optimized by an advanced AI, similarly there are many motivations that produce good behaviour before an advanced AI can take over the world, but not after.

Also note that we are talking as if our AIs have “motivations”, thus allowing us to re-use some of the reasoning from section 1: thinking of strategies that help achieve some goal, and concluding that the AI will take those strategies. This should be understood as saying that they coherently steer the world into some narrow set of states⁵ (aka the states they are ‘motivated’ to reach), not as a strong claim about their exact internal functioning. And in order for AIs to be useful, they need to be steering the world into observably different, hard-to-reach states, compared to if they weren’t made.

Finally, a worrying aspect of this second possibility is that many of its failure modes can only be exhibited once AI is advanced enough to be dangerous. By analogy, external observers may not have been able to tell that humans would end up using contraception until they were technologically advanced enough to make reliable contraceptives. Similarly, possibility 4 will only show up once AIs can come up with and competently execute such deceptive plans.⁶

3. How do neural networks actually work?

A third issue is that our current best ways of making AI involve taking gigantic tensors of numbers glued together by matrix multiplication and some non-linear functions (aka ‘neural networks’), and tweaking them until they do something impressive when run. This design doesn’t place strong constraints on specific parts of those tensors having any particular known function - it’s just a collection of numbers that happens to exhibit the right behaviours.
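To make the “collection of numbers” point concrete, here is a minimal sketch of such a network in plain Python. Everything in it is an illustrative assumption rather than a description of any real system: the layer sizes, the random seed, the choice of ReLU as the non-linearity, and the helper names (`matvec`, `relu`, `make_layer`, `forward`). Real networks are vastly larger and are tuned by training rather than left random, but the ingredients are the same.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu(vector):
    """Element-wise non-linearity: negative entries become zero."""
    return [max(0.0, v) for v in vector]

def make_layer(rng, n_out, n_in):
    """A layer is nothing but an n_out-by-n_in grid of numbers."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

def forward(layers, inputs):
    """Run the network: alternate matrix multiplication and the non-linearity."""
    activation = inputs
    for i, weights in enumerate(layers):
        activation = matvec(weights, activation)
        if i < len(layers) - 1:  # no non-linearity after the final layer
            activation = relu(activation)
    return activation

rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is repeatable
layers = [make_layer(rng, 4, 3), make_layer(rng, 2, 4)]  # hypothetical sizes
output = forward(layers, [1.0, -2.0, 0.5])
print(output)
```

Nothing in `layers` says what any individual number is for: the code that runs the network is a few lines long, and all of the behaviour lives in the grids of numbers.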

There are two closely-related key problems with this type of AI design:

1. Because the gigantic tensors have no particular pre-determined semantic meaning, it’s hard to instill any particular cognitive algorithm into them.
2. Because the tensors are so large and devoid of meaningful structure that we are currently able to easily comprehend, it’s difficult for human engineers to understand the algorithms being implemented by the AI, or to make grounded predictions about how they will behave in new situations.

Problem 1 means that we aren’t able to precisely steer the cognition of smart AIs into styles that we like, even if we knew the sort of cognition we wanted to distill; and problem 2 means that we can’t easily perform meaningful safety analysis for large capable AIs, even if we knew what this would look like.

4. Can safely limited AI solve the problem for us?

Given that we face these difficult problems, you might hope that we are able to use AI to solve them - just like we’ve used it to solve other problems that are insurmountable by humans, like “beat the best human chess player at chess”. This strategy only works if the AI we use isn’t the sort that we might be scared of. However, there are a few aspects of the alignment problem that make it seem very difficult for AIs that aren’t advanced enough to be scary:

It’s hard enough that humans don’t have a convincing solution yet, despite many people trying for many years.
It involves thinking carefully about the design of smart, capable agents. Presumably, if you’re able to do really good reasoning about the design of such agents, you’re in a position to make some for yourself, potentially engendering the problems that such agents bring about.
It involves achieving big successes in technical research. To solve these problems, you likely need to be able to prove novel theorems, think of untested strategies, come up with new sorts of algorithms, etc. These are broadly similar to the abilities necessary to do other kinds of technical research - of course, the detailed types of thinking and knowledge required for different fields are different, but the same sorts of humans can learn to be proficient in multiple different fields of research, and likewise the sort of AI that can learn to successfully do alignment research could also learn to successfully do other sorts of technical research. If we have an AI on our hands that can outcompete humans at a wide array of fields of technical research, that sounds like the sort of AI that may be able to take over the world via technological superiority.

To be sure, limited AIs can help in the meantime by e.g. making Google search better, or facilitating other kinds of human cognitive labour. But it’s not obvious how we can successfully outsource the AI alignment problem to other AIs, while being confident that the AIs we outsource to don’t need to be aligned themselves.

Discussion

As mentioned in the introduction, these problems are by no means unknown in the literature. Section 1 is related to work on value learning, corrigibility, and multi-multi alignment. Section 2 is related to work on inner alignment, robustness and interpretability in machine learning, as well as informed and scalable oversight. Section 3 is related to work on interpretability in machine learning, as well as deep learning theory. Finally, section 4 is related to OpenAI’s approach to AI alignment.

Furthermore, not all these problems need to be solved in order to build powerful aligned AI. I would break it down this way:

Do you want humans to build powerful aligned AI themselves, or build a machine to solve the problem for them?
- If we are trying to build powerful aligned AI ourselves, we need to either know what sort of AI cognition we want, or know how to recognize the sort of AI that we want (or perhaps both).
  - Learning what sort of AI cognition we want involves facing difficulty 1. After solving that difficulty, we would still face the problem of building it, which involves facing difficulty 3, either by understanding current machine learning better, or using something else.
  - Recognizing the sort of AI that we want requires facing difficulty 2. Does this involve looking at the internals of the AI, or merely its behaviour?
    - If this involves looking at the internals of the AI, we face difficulty 3.
    - If it instead involves building the sorts of models that have the right sort of behaviour while unable to take over the world only if they would also have the right sort of behaviour when able to take over the world, that sounds like it involves facing difficulties 1 and 3.
- If we are trying to make a machine build powerful aligned AI, we face difficulty 4.

My thanks to Erik Jenner for commenting on a draft of this post.

It’s actually slightly unfair to conflate this with RLHF, because reinforcement learning uses reward to shape agents’ thoughts, rather than building agents that optimize for reward, but I think this critique is relevant to understanding problems with RLHF, for reasons gestured to in section 2. ↩
I don’t think that this is actually what the people behind ‘constitutional AI’ were thinking, but it’s nice and linkable, and this is a proposal that some people talk about. ↩
Also, such a proof would plausibly require modelling the range of situations your AI would find itself in, which is a challenge to formalize. h/t Erik Jenner for making this point. ↩
Presumably evolution would, given enough time, eventually shape our psychology so that we abstain from contraception enough to have lots of children. But for the present point, what’s important is that it didn’t manage to instill the right desires on the first try, before we were powerful enough to invent technology to suit our interests. ↩
Note that there are some subtleties in this definition, as described here, but it will do for now. ↩
It’s been proposed that AIs will first be bad at deception before they’re good at it, just like they were bad at chess before they were good at it, and this will give us advance warning to solve the problem. Besides my worry that existing AIs can already exhibit primitive deceptive behaviour, and that this doesn’t seem to be spurring effective research to deal with this failure mode, I also think that AIs will be able to evaluate whether they’re able to effectively deceive (in service of another goal) before they can actually effectively deceive (in service of another goal), and given that ineffective deception is worse than useless, I’d expect some regime where AIs refrain from behaving deceitfully until they’re able to do so effectively. ↩

On Blogging and Podcasting

Sun, 08 Jan 2023 00:00:00 +0000

A cover of and response to ‘Why and how to write things on the internet’, by Ben Kuhn.

Blogging

One thing you could be doing with your limited number of hours on this earth is writing a blog. This basically involves writing essays and then publishing them online on a site you control for people to read. The idea is that people can share links to essays you have written, and if people like them you can build some sort of following¹. Unlike being a newspaper columnist or a television presenter, it is actually quite easy to start a blog: if you are reading this, you are likely able to start your own blog for almost no money and not all that much time. And, of course, if you already have a blog, you could choose to publish on your blog more frequently. This presents the following question: should you be blogging more than you currently are?

I see blogging as having a few benefits. Firstly, writing about ideas changes your relationship to them. For example, the practice of writing essays on your blog might cause you to have more essay-shaped ideas by increasing the salience of mental motions like “perhaps I should have an idea so that I could then write about it” or “perhaps I should develop this idea into the sort of thing I could write an essay about on my blog”. Anecdotally, I have heard that frequent users of twitter often have an instinct to have the sorts of ideas that are pithy and make good tweets, so it seems plausible that frequent essayists might have something similar going on.

Relatedly, I think that once you have an idea, actually writing it out and publishing it changes your relationship to that particular idea. Writing an essay offers you an affordance to think of evidence for your idea, counter-examples, etc. It gives the idea a certain structure. Also, once the idea is published, you can refer to it, ask people to read it, build on it, et cetera.

A second benefit of blogging is that you can advance an argument, and perhaps persuade people of the point you are making. For example, if you think that more people should be blogging, you could write an essay arguing that, people could read it, and perhaps they would respond by blogging when they previously were not.

A third benefit of blogging is that you can share lovely things with the world. For example, if you think of a neat puzzle or a nice mathematical fact, together with a proof of that fact, you could share those on your blog, people could read them, and then they would share in the pleasure you take in beholding these things.

Fourthly, you could practise the skill of writing. In your life it will likely be useful for you to be able to communicate things via the written word - perhaps explaining things that are difficult to grasp, perhaps inchoate things that you must first mold into intelligible form - and blogging offers you the opportunity to practise this. This benefit is especially large if your blog has comments, where people can let you know if you have failed to communicate your idea or if you have missed an important consideration, or if you inhabit a corner of the blogosphere where it is common to write blog posts replying to other people’s blog posts.

The final benefit of blogging that I can think of is that it may improve your relationships with other people. For one, the practice of having ideas and writing them up may give you more things to talk about with your existing friends, or with people with whom you could become friends if only you had something to talk about. But also, by keeping your writings in one place where you are identified as the author, you could gain a following and a reputation, and people could be more willing to approach you and become friends with you, because of the traits you have advertised by having a blog - this could look like having a blog about tea and making friends who like people who like tea, or like people wanting to be friends with you because they were impressed by your blogging ability and have heard of you from your blog.

So: why would one not have a blog, or not write frequently on it? I see two big drawbacks. The first is that you have to write. For me, writing is time-consuming, difficult and painful. This means that blog posts are costly for me to produce, both in terms of amount of time (this one took around 4 hours, and I needed a writing buddy to pick up the momentum to actually write it) and in terms of how much you’d rather do something else with that time. I do not think I am alone in having this relationship to writing. The second is that you might not be very good at it. Your ideas might not be that interesting, or you might write in a way that people find difficult or unpleasant to read. Unlike the time cost, which merely balances out the benefits, this puts into doubt the very existence of the purported benefits of blogging.

When people recommend blogging, I think it is because these disadvantages are less important for them than for others - of course, you’d expect the people for whom blogging has the best cost/benefit tradeoff to feel the most good about blogging and recommend it the most. “You’ll get better”, they say, but from what level, and how much better will you get? You can improve your running speed by practising, but that doesn’t mean you’ll be running a four-minute mile any time soon. “The bar is low”, they say, but what they consider “low” you might not. (But there is some truth to this: there are benefits to writing good posts, but no serious costs to writing bad posts above and beyond the difficulty of writing.)

My own experience of having a blog where I publish infrequently (5 posts in 2022, 7 posts in 2021) is that it’s been fine, but not that great. The occasional post has gained traction, but most have not really. I can’t say that I’ve noticed benefits to my relationships from blogging, or improvement in my writing skill. That said, it’s plausible that the major benefits of writing a blog only accrue if you write more frequently, so perhaps I am poorly placed to assess how good it would be if I wrote more. Overall, I’d say it’s been worth it, but it’s not the sort of thing I’d be incredibly excited to recommend to all. That said, I do have some blogging recommendations.

Blogging recommendations

Firstly: it seems pretty plausible that it’s worth starting a blog, writing one post a week for three months (or one post a day for two weeks, or something), and seeing how that goes. Is the experience terrible? Are any of your posts good? This might be a way of determining whether regularly blogging will be worth it for you.

Secondly: it would be sort of good to know what fraction of people who were on the fence about starting a blog would endorse actually starting one if they tried it. So, you could gather a bunch of your blogging-curious friends, all try out blogging for a while, gather data on how many people like it enough to continue, and publish that data. This could help inform the wider blogging-curious world whether they should or should not take the leap.

Thirdly: I think it is worth trying writing blog posts that are covers of other things. In the domain of music, sometimes musician A will take a song of musician B, and perform that song in the style of musician A (or perhaps the style musician A thinks musician B should have performed it in). But one could also do this in the world of writing: for example, you could take an essay that you liked and write it in your own style, or make slightly different points that you agree with more. You could also cross genres (e.g. take someone’s essay and write it as a poem, or take someone’s sermon and write it as an essay). This strikes me as an easier and more approachable way to write, and I may write other blog post covers this year.

Podcasting

As you may know, a podcast is when you record audio (usually primarily audio of people speaking) and publish it on the internet, in a manner such that dedicated ‘podcast apps’ can let people subscribe to your podcast and listen to ‘episodes’ right after you publish them. Because podcasts are sort of like radio shows, and the ones you have probably heard of are high-quality and well-polished, I think people think of podcasting as somewhat inaccessible, or a thing for fancy famous people - but it is not! The barriers and costs are somewhat higher than those of blogging, but, I suspect, still surmountable by the average reader of this post without a heroic amount of effort.

Podcasting has many of the benefits of blogging, in that they are both ways of publishing ideas. That said, the profile of benefits and drawbacks is somewhat different, making it a better fit for different people.

The most distinct advantage of podcasting is that it involves speaking rather than writing. I find speaking much easier, it being a thing I do all the time (and even as a form of recreation with friends!). This makes the basic production of ‘material’ easier. Also, by virtue of speech communicating more information per word than writing (one can use pitch, intonation, and tone of voice), as well as being a primary way people interact with their friends, listening to someone’s podcast can be something of a simulacrum of having a friendly relationship with them (see the concept of parasocial relationships). This is sometimes thought of as a downside, but it means that podcasting offers your friends and acquaintances an easy way of ‘interacting’ with ‘you’ when you cannot actually be there.

Podcasting does have some downsides relative to blogging. For one, the process of speaking takes less time per word than the process of writing, so the ideas you express might have less preparation put into them. Note that this can be ameliorated by speaking on topics that you have already thought a lot about, and therefore have the ability to say thoughtful things when speaking extemporaneously about them.

It is also more difficult to set up: audio must be edited, which is likely a more foreign process to you than that of editing text, and you have to figure out how to get someone to host your audio files, what you have to do to convince Apple Podcasts that they should let their users listen to your podcast, et cetera. Also, there is some expectation that podcasts should be more ‘polished’: people might anticipate that your podcast has a consistent theme or format (I guess this is because podcasts are more often listened to by dedicated subscribers rather than people who have been linked to a single episode). But of course one is allowed to buck these expectations!

A final difference between podcasts and blogs is that it is more frequent for podcast episodes to be a collaboration between multiple people - perhaps the people are having a conversation, or perhaps one is interviewing another². This gives them a certain kind of freshness - certain ideas and dynamics can more easily thrive in the interaction between two people who don’t know the same things or think the same way. It also has the cost that in order to make them, you need two people to do something rather than just one, and perhaps they both have to be available at the same time, which they may rarely be: in general, my experience is that the more people are required to make a thing happen, the less it will happen.

I have two podcasts, and as you might expect of someone who has two podcasts, I have quite enjoyed the experience of being a podcaster. One of my podcasts interviews researchers who are working in the field of AI existential risk reduction, and gets them to explain what they’re on about. This has both been valuable for people who want to know what those researchers are on about (which seems to be the sort of thing it is often easier to talk about than to write down), and has given me a pleasant sliver of fame among a certain sub-sub-culture. It also means that I have the excuse to have some kinds of conversations I enjoy having. Overall, my suspicion is that more people I know could get something out of starting podcasts, but this could well be the same mistake that the people who are good at blogging and think more people should blog are making.

In the remainder of this post, I will talk a bit about how one might start a podcast, so that you can more easily recognize if that might be for you.

How one might start a podcast

Podcasts usually have some sort of format or theme. Here are some kinds of podcasts you could make:

Interview podcasts. Examples include Conversations with Tyler and The Lunar Society. On these shows, you can get someone to explain a thing they know and you’re curious about. This is kind of nice because if you know something, you might not realize that other people don’t know it and would like to, but if other people know something that you don’t understand so well, it’s easier to tell that more people would like to hear about it.
Debate podcasts. Like interview podcasts but more adversarial and revealing more viewpoints. I don’t listen to any, and when I google “debate podcasts” the results don’t look enticing, but in principle it seems like there should be good versions of these. (In my experience, debates that are shown on youtube are often good, so perhaps there is something about the podcast format that is unfriendly to debates?)
Two pals talking. Examples include Cortex and Minds Almost Meeting. Conversations can be structured around what people have been going thru or thinking about recently, set topics they would like to talk about, books they have just read, or a host of other things. This can be much more informal than some other kinds of podcasts. In principle this could involve more than two pals, but adding more than two people seems like it would seriously increase the difficulty of coordinating times to record, topics to talk about, etc.
Expository podcasts. Examples include More or Less and Darknet Diaries. In these podcasts, there’s a specific thing you’d like to explain, and the podcast is structured around conveying that. These could have just one speaker, or could involve interviewing one or multiple guests. This genre strikes me as requiring more preparation than other genres of podcast.
Diary/experiential podcasts. This could look like vlogging (without the v) or like uploading audio every day during NaNoWriMo. To be honest, I listen to fewer of these, but they strike me as either having the potential to be a version of social media, where people who already know you could be interested to hear what your life is like, or if you recount your experience doing something interesting and unusual, a wider audience could be interested.

If you would like to get lots of listeners, you might want to spend time thinking of a target audience, what sort of thing they might want to get out of a podcast, and try to make a podcast that does a good job at producing that. This could also be a way of giving you an idea for what to talk about.

Next: how do you actually make a podcast? Here is a list of steps that I previously wrote on this topic, slightly edited, that should get you started.

1. Buy a decent microphone, e.g. the Blue Yeti (costs $130). This will help you not sound bad.
2. If you’re going to be talking to people who aren’t physically near you, use some service that will record both of you talking. I recommend Zencastr (free for how I use it).
3. Record some talking (this is the hard part). My strong advice is that if you’re doing this with someone else who isn’t physically present, you should both be wearing wired headphones and ideally using a wired internet connection, to reduce lag and make the conversation smoother. Please do this in a non-echoey, non-noisy space if you can. Kitchen is bad, sound-isolated place with blankets is good.
4. Do some minimal editing. Don’t try to delete every um and ah, that will take way too long. You can use the computer program Audacity for this if you want to be able to get into the weeds (free), or ask me who I pay to do my editing. There is also a program called Descript that I’ve heard is easy to use and costs $12/mo, but I have not used it myself.
5. Optionally, make transcripts by uploading your edited audio files to rev.com ($1.50 per minute of audio). You’ll then have to re-listen to the audio and fix mistakes in the transcript. If you do this, you will probably want to make a website to put transcripts on, which will maybe involve using Github Pages or Squarespace (or maybe you just put transcripts on a pre-existing Medium/Substack/blog?). This is quite time-consuming and not obviously worth it. You could also try using OpenAI’s tool called Whisper for this.
6. Think of a name and logo for your podcast. Your logo needs to be exactly square and high-res.
7. Use a podcast hosting service. I like Libsyn (~$10/month for basic plan). Upload your audio files there, write descriptions and episode titles. You should now have an RSS feed.
8. Submit your RSS feed to Google Podcasts, Apple Podcasts, and Spotify. This will involve googling how to do this, you might make some errors, and then it will take ages for Apple to list your podcast.
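To get a rough feel for what the steps above add up to, here is a back-of-the-envelope sketch of first-year costs. The prices are the ones quoted in the list (so they may well be out of date), and the release schedule is a made-up assumption: two one-hour episodes a month, edited with the free Audacity. The function name `first_year_cost` and its parameters are hypothetical, chosen only for this illustration.

```python
# Prices as quoted in the steps above; treat them as approximate.
MICROPHONE_USD = 130.00            # Blue Yeti, one-off purchase
HOSTING_USD_PER_MONTH = 10.00      # Libsyn basic plan (roughly)
TRANSCRIPT_USD_PER_MINUTE = 1.50   # rev.com

def first_year_cost(episodes_per_month, minutes_per_episode, transcripts):
    """Estimate first-year spend in USD, assuming free editing software."""
    cost = MICROPHONE_USD + 12 * HOSTING_USD_PER_MONTH
    if transcripts:
        minutes_per_year = 12 * episodes_per_month * minutes_per_episode
        cost += TRANSCRIPT_USD_PER_MINUTE * minutes_per_year
    return cost

# Hypothetical schedule: two 60-minute episodes per month.
without_transcripts = first_year_cost(2, 60, transcripts=False)
with_transcripts = first_year_cost(2, 60, transcripts=True)
print(without_transcripts, with_transcripts)
```

Under these assumptions the microphone and hosting come to $250 for the year, while paid transcripts add another $2,160, which is one way of seeing why they are “not obviously worth it”.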

In conclusion: do what you want, but it’s kind of nice to put your thoughts out there into the world, and mathematics says it’s often worth trying things that you haven’t tried before.

I am focussing on essays because that is the predominant type of blog ‘content’ (and when people recommend you start a blog I think they are usually imagining an essay-dominated blog), but other formats are possible, such as short stories, serialized long works of fiction, poetry, collections of links to other internet content, beautiful works of art you have found, non-essay diary entries, reporting, product reviews, descriptions of places you are travelling, mathematical proofs, and I’m sure much more. Of course, all these categories blend into each other around the edges. For examples of blogs containing some things other than standard essays, I recommend Marginal Revolution, Jeff Kaufman’s blog, and World Spirit Sock Puppet. ↩
Of course, one could just as well do the same sort of collaboration on a blog post. I know that this is much less common, but I don’t really know why. ↩

Things I carry almost every day, as of late December 2022

Thu, 29 Dec 2022 00:00:00 +0000

Here we see things I carry in my pockets regularly.

To the left is my phone case, with two bandaids in it. This is a stand-in for the phone (a Pixel 6) that the phone case usually encases (I was busy taking photos with it). I keep two bandaids between the case and the phone in case I need one - which does sometimes happen. I find my phone useful for the normal things one uses a phone for. As of this year, I’ve started using it in lieu of a credit card, which feels very cool and 21st century.

At the top right is a pocket constitution made by Legal Impact for Chickens. I received this at an Effective Altruism Global conference, during the career fair. What actually happened was that someone came up to the booth I was at holding the pocket constitution, I noted that it looked cool, and they were kind enough to offer it to me. Unfortunately, I have never knowingly met anybody from Legal Impact for Chickens. I have not actually used this pocket constitution, but I carry it anyway in my winter jacket’s inner breast pocket since (a) it fits very unobtrusively and (b) it seems cool to carry around a pocket constitution.

At the bottom right is my wallet, a Bellroy Slim Sleeve. I very much like the material it is made of: called “baida nylon”, it is pleasingly canvas-like. Unfortunately I do not know how to faithfully convey this to you via the internet, but you will have to take my word that this is one of my two favourite purchases in 2022, ranked by my tactile pleasure in interacting with it.

Here we see the wallet open. On the left is my student ID card (with my COVID vaccine card tucked behind it out of sight), with a collection of folded bills tucked behind it. On the right is my Alcor membership card. Behind the right card is a tab that can be pulled to reveal more cards.

As you can see, there are several cards stowed away: on top are emergency medical instructions if I am found dead, to prepare my head for cryogenic storage, and below that are:

my WeWork card
my health insurance card
my public transit pass
my state ID card, and
my credit card.

All in all, I like the way this wallet lets me store many cards with very little space. But this is not all that is in my wallet.

Inside the bills, I have hidden away two items. The first is a Purell hand sanitizing wipe, useful when I touch something gross and want to disinfect it (or when something gross gets on my clothes). The second is an Eisenhower dollar coin, a Christmas gift to me this year. I like its size, heft, and image of an eagle landing on the moon with the Earth in the background, and plan to use it for flipping, scratching, and other coin-related needs.

I also have a backpack. It is a 21L GoRuck GR1 in ‘Coyote brown’. I chose that colour rather than black, which I would normally choose, because for this line of backpacks the inside is the same colour as the outside, and I wanted black items of mine to be visible against the fabric of the backpack. I chose it because:

it is only just large enough to hold all items I might use in a day (or take with me on a weekend getaway) while not being too large to stow underneath an aeroplane seat
it appears to be made of sturdy materials with reliable stitching
it does not have large amounts of pockets etc. that I do not use
it opens top-to-bottom in a clamshell fashion
it is waterproof enough for my purposes

All in all, while it is a new purchase, I am tentatively satisfied with it.

Here is a picture of it from the front, with high-vis features to avoid me being hit by a car at night.

Here is a picture of its back. You can see a zippered back compartment: designed to hold laptops, unfortunately my laptop is too big for it. I instead keep paper and masks there.

Here it is from the side, where you can see its top handle.

And here it is fully open, packed as it usually would be.

Next, I will show pictures of things stored in the backpack.

Here is the water bottle I use - a 32 oz Takeya Actives Insulated Water Bottle with Spout Lid. I quite like it: the lid screws on and off in a way that makes leakage not an issue, and the material is pleasant to interact with. It is the other of my two favourite purchases in 2022, ranked by my tactile pleasure in interacting with it.

Here is the first of two bags you see: made by Origo, I use it to hold a large number of Expo whiteboard markers. You never know when you will find yourself at a whiteboard without workable markers in sufficient number, or without enough colours.

Here is the second bag, which contains things I bring with me to lunch:

Purell hand sanitizer
Supplements
- Green-lipped mussel powder
- Vitamin D (to be discontinued once this bottle runs out)
- Omega-3
- Creatine capsules (much more pleasant for me to take than creatine powder)

The presence of the bag makes it much easier for me to take these with me to lunch at work, where I can remember to take the supplements and sanitize hands before eating.

My backpack also contains my laptop and laptop charger. The laptop has a number of stickers on it: going clockwise from the top-right, it advertises:

CHAI, the research group where I work
EAGxBerkeley 2022 (actually held in Oakland)
Shure, a company that makes microphones, some of which I use
Monero, a private cryptocurrency, and
CS 188, a UC Berkeley course I TA’ed in 2021.

I also carry around Sony MDR7506 wired headphones in the pouch they came with. The pouch also contains the warranty for the headphones, as well as an adapter that fits them into other types of audio recording equipment.

This concludes the things I keep in the main area of my backpack.

In the lower mesh pocket of my backpack, I keep wiry things, most of which are good for charging. Going clockwise from the top left, we have:

An Anker power brick
USB-A to lightning cable
Wall to USB-C converter
USB-A to USB-C cable
Wall to USB-A converter
USB-A to micro-USB cable
USB-C to USB-C cable
Headphone splitter (inspired by one time I wished I had one, have used it hardly at all since)
USB-C to lightning cable

These satisfy all of my charging needs.

Finally, the things I keep in the top pocket of the GR1. Left-to-right, I have Aquaphor lip repair ointment, which I use on my lips when chapped (I am told that it does not cause later chapping, unlike ChapStick), another Eisenhower dollar (in case I need two), 10 Bicycle dice (in case I need to come up with a password), and pens:

Uniball vision, in green, purple, and black. My favourite pen. I originally just had black, but branched out to green and purple in case I needed to mark students’ work with pen while TA’ing. I did not, but I find them to be nice highlight colours.
a CHAI pen.
a pen that doesn’t explode when it’s in an airplane in the sky (the Uniball vision does, which is one of its two major failings, the other being that it doesn’t write on some glossy materials).
two Sharpies, in red and black.

This concludes my presentation of my “everyday carry”. I hope it gives you inspiration for ways of carrying things and things to carry that make your life more convenient.

Announcing The Filan Cabinet

Thu, 29 Dec 2022 00:00:00 +0000

Happy holidays! Some months ago, I launched a new podcast called The Filan Cabinet, but forgot to announce the podcast on this blog. Today, I rectify that mistake.

In some ways, the podcast is similar to AXRP - the AI X-risk Research Podcast. On that show, I interview AI x-risk researchers about their work, and try to bring their underlying views about AI x-risk research into the open: why do they think what they’re doing matters, and which research avenues do they find more or less promising?

The main difference is that in The Filan Cabinet, I talk about whatever I want to talk about, while still maintaining the goal of helping my audience understand my guests’ perspectives. To give you some sense of the show’s range, the first four episodes are about:

A secondary goal of the podcast is to give me practice at interviewing well, with the hope that this practice improves AXRP. With this in mind, I’ve optimized the production process for speed, meaning that I do less research before each interview, and that I do not release transcripts for episodes. With luck, this will enable me to release more frequently without sacrificing too much quality.

If you would like to listen to the show, you can search “The Filan Cabinet” on your podcast app of choice, or just click here to see it on Google Podcasts. You can also see announcements of new episodes on this Twitter account. You should see some new episodes being released in 2023.