Accidentally cross-posted to the AI Alignment Forum, where the math renders better.
The Kelly criterion for betting tells you how much to wager when someone offers you a bet. First introduced in this paper, it deals with the situation where someone is offering you a contract that pays you €1 if the event \(E\) occurs (for concreteness, you can imagine \(E\) as the event that the Republican candidate wins the 2020 US Presidential election) and €0 otherwise. They are selling it for €\(q\), and your probability for \(E\) is \(p > q\) (this is equivalent to the more common formulation with odds, but it’s easier for me to think about). As a result, you think that it’s worth buying this contract. In fact, they will sell you a scaled-up contract of your choice: for any real number \(r \geq 0\), you can buy a contract that pays you €\(r/q\) if \(E\) occurs for €\(r\), just as if you could buy \(r/q\) copies of the original contract. The question you face is this: how much of your money should you spend on this scaled-up contract? The Kelly criterion gives you an answer: you should spend \((p-q)/(1-q)\) of your money.
Why would you spend this exact amount? One reason would be if you were an expected utility maximiser, and your utility was the logarithm of your wealth. Note that the logarithm is important here to make you risk averse: if you simply wanted to maximise your expected wealth after the bet, you would bet all your money. To show that expected log-wealth maximisers use the Kelly criterion, note that if your initial wealth is \(W\), you spend \(fW\) on the scaled contract, and \(E\) occurs, you then have \((1-f)W + fW/q\), while if you bet that much and \(E\) does not occur, your wealth is only \((1-f)W\). The expected log-wealth maximiser therefore wants to maximise
\[\begin{align} U &= p\log \left( (1-f)W + \frac{fW}{q} \right) + (1-p) \log ((1-f)W) \\ &= p \log \left(1 - f + \frac{f}{q} \right) + (1-p) \log (1-f) + \log (W)\text{.} \end{align}\]The derivative of this with respect to \(f\) is \[\frac{\partial U}{\partial f} = \left( \frac{p}{1-f + f/q} \right) \left( \frac{1}{q} - 1 \right) - \frac{1-p}{1-f}\text{.}\] Setting this derivative to 0 and rearranging produces the stated formula.
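As a quick numerical sanity check of this derivation (with illustrative values \(p = 0.6\), \(q = 0.5\), which are not from the text), a grid search over betting fractions lands on the Kelly fraction \((p-q)/(1-q) = 0.2\):

```python
import math

def expected_log_wealth(f, p, q, W=1.0):
    # U = p log((1-f)W + fW/q) + (1-p) log((1-f)W)
    return p * math.log((1 - f) * W + f * W / q) \
        + (1 - p) * math.log((1 - f) * W)

p, q = 0.6, 0.5              # your probability vs. the contract price
f_kelly = (p - q) / (1 - q)  # Kelly fraction: 0.2

# Coarse grid search over fractions of wealth to bet
grid = [i / 1000 for i in range(1000)]
f_best = max(grid, key=lambda f: expected_log_wealth(f, p, q))
```

Since the objective is strictly concave in \(f\), the grid maximiser sits at the grid point nearest the Kelly fraction.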
The Wikipedia page as of 25 October 2016 gives another appealing fact about Kelly betting. Suppose that this contract-buying opportunity recurs again and again: that is, there are many events \(E_t\) in a row that you think each independently have probability \(p\), and after your contract about \(E_{t-1}\) resolves, you can always spend €\(r\) on a contract that will pay €\(r/q\) if \(E_t\) happens. Suppose that you always spend \(f\) of your wealth on these contracts, you make \(N\) of these bets, and \(K\) pay off. Then, your final wealth after the \(N\) bets will be \[\text{Wealth} = \left(1-f+\frac{f}{q} \right)^K (1-f)^{N-K} W \text{.}\] The derivative of this with respect to \(f\) is
\[\begin{align} \frac{\partial \text{Wealth}}{\partial f} &= W\left( K \left(1-f+ \frac{f}{q} \right)^{K-1} \left(\frac{1}{q} - 1 \right)(1-f)^{N-K} \right. \\ &\quad \left. {} - \left(1-f+ \frac{f}{q} \right)^K (N-K)(1-f)^{N-K-1}\right) \text{.} \end{align}\]Setting this to 0 gives \(f = K/N - ((N-K)/N)(q/(1-q))\), and if \(K/N = p\) (which it should be in the long run), this simplifies to \(f = (p-q)/(1-q)\), the Kelly criterion. This makes it look like Kelly betting maximises your total wealth after the \(N\) runs, so why wouldn’t an expected wealth maximiser use the Kelly criterion? Well, the rule of betting all your money every chance you get leaves you with nothing if \(K < N\), but in the unlikely case that \(K = N\), the rule pays off so spectacularly that expected wealth maximisers think it’s worth the risk.
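Plugging numbers into the wealth formula makes the comparison vivid. In this sketch (again with illustrative \(p = 0.6\), \(q = 0.5\), and the long-run win rate \(K/N = p\)), the all-in bettor is wiped out by a single loss, while the Kelly bettor ends up ahead of nearby fractions:

```python
def final_wealth(f, K, N, q, W=1.0):
    # Wealth = (1 - f + f/q)^K * (1 - f)^(N - K) * W
    return (1 - f + f / q) ** K * (1 - f) ** (N - K) * W

p, q, N = 0.6, 0.5, 100
K = int(p * N)               # winning bets, at the long-run rate K/N = p
f_kelly = (p - q) / (1 - q)

w_kelly = final_wealth(f_kelly, K, N, q)
w_all_in = final_wealth(1.0, K, N, q)  # betting everything: one loss zeroes you out
```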
Before I move on, I’d like to share one interesting fact about the Kelly criterion that gives a flavour of the later results. You might wonder what the expected utility of using the Kelly criterion is. Well, by simple substitution it’s just \(p \log (p/q) + (1-p) \log ((1-p)/(1-q)) + \log (W)\). Ignoring the \(\log (W)\) utility that you already have, this is just \(D_{KL}(p||q)\). Bam! Information theory!
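The identity is easy to verify numerically; here is a sketch checking the substitution for a few illustrative \((p, q)\) pairs:

```python
import math

def kelly_log_utility(p, q):
    # Expected log-wealth under Kelly betting, with W = 1 so log(W) = 0
    f = (p - q) / (1 - q)
    return p * math.log(1 - f + f / q) + (1 - p) * math.log(1 - f)

def kl_bernoulli(p, q):
    # D_KL(p || q) for Bernoulli distributions
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

for p, q in [(0.6, 0.5), (0.9, 0.4), (0.3, 0.1)]:
    assert abs(kelly_log_utility(p, q) - kl_bernoulli(p, q)) < 1e-12
```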
Previously, we talked about the case where somebody was offering to sell you a contract at a fixed price, and all you could do was keep it. Instead, we can consider a market full of these contracts, where all of the participants are log wealth maximisers, and think about what the equilibrium price is. Our proof will be similar to the one found here.
Before diving into the math, let’s clarify exactly what sort of situation we’re imagining. First of all, there are going to be lots of contracts available that correspond to different outcomes in the same event, at least one of which will occur. For instance, the event could be “Who will win the Democratic nomination for president in 2020?”, and the outcomes could be “Cory Booker”, “Elizabeth Warren”, “Martin O’Malley”, and all the other candidates (once they are known – before then, the outcomes could be each member of a list of prominent Democrats and one other outcome corresponding to “someone else”). Alternatively, the event could be “What will the map of winners of each state in the 2020 presidential election be?”, and the outcomes would be lists of the form “Republicans win Alabama, Democrats win Alaska, Democrats win Arizona, …”. This latter type of market actually forms a combinatorial prediction market – by buying and short-selling bundles of contracts, you can make bets of the form “Republicans will win Georgia”, “If Democrats win Ohio, then Republicans will win Florida”, or “Republicans will win either North Dakota or South Dakota, but not both”. Such markets are interesting for their own reasons, but we will not elaborate on them here.
We should also clarify our assumptions about the traders. The participants are log wealth maximisers who have different priors and don’t think that the other participants know anything that they don’t – otherwise, the no-trade theorem could apply. We also assume that they are price takers, who decide to buy or sell contracts at whatever the equilibrium price is, not considering how their trades affect the equilibrium price.
Now that we know the market setup, we can derive the purchasing behaviour of the participants for a given market price. We will index market participants by \(i\) and outcomes by \(j\). We write \(q_j\) for the market price of the contract that pays €1 if outcome \(j\) occurs, \(p^i_j\) for the probability that participant \(i\) assigns to outcome \(j\), and \(W^i\) for the initial wealth of participant \(i\).
First of all, without loss of generality, we can assume that participant \(i\) spends all of their wealth on contracts. This is because saving money is equivalent to spending it on every contract in proportion to its price, which guarantees the same payoff no matter which outcome occurs. We can therefore write the amount that participant \(i\) spends on contracts for outcome \(j\) as \(W^i \tilde{p}^i_j\), under the condition that \(\sum_j \tilde{p}^i_j = 1\). Then, if outcome \(j\) occurs, their posterior wealth will be \(W^i \tilde{p}^i_j / q_j\). We can use the method of Lagrange multipliers to determine how much participant \(i\) will bet on each outcome, by maximising \[L(\tilde{p}^i, \lambda) = \sum_j p^i_j \log \left(\frac{W^i \tilde{p}^i_j}{q_j}\right) - \lambda \left( \sum_j \tilde{p}^i_j - 1\right)\text{.}\] Taking partial derivatives,
\[\begin{align}\frac{\partial L}{\partial \tilde{p}^i_j} &= \frac{p^i_j}{\tilde{p}^i_j} - \lambda \\ &= 0\text{,}\end{align}\]so \(\tilde{p}^i_j = p^i_j / \lambda\), and
\[\begin{align}\frac{\partial L}{\partial \lambda} &= \sum_j \tilde{p}^i_j - 1 \\ & = 0\text{,}\end{align}\]so \(\lambda^{-1} \sum_j p^i_j = 1\), so \(\lambda = 1\). Therefore, regardless of the market prices, participant \(i\) will spend \(W^i p^i_j\) on contracts for outcome \(j\). You might notice that this looks different to the previous section – this is because previously our bettor could only bet on one outcome, as opposed to betting on both.
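In other words, the log-wealth maximiser simply bets their beliefs, whatever the prices are. A small numerical sketch (with made-up beliefs and prices) illustrates this by comparing the allocation \(\tilde{p}^i = p^i\) against random alternative allocations:

```python
import math, random

def expected_log(alloc, p, q, W=1.0):
    # sum_j p_j * log(W * alloc_j / q_j)
    return sum(pj * math.log(W * aj / qj)
               for pj, aj, qj in zip(p, alloc, q))

p = [0.5, 0.3, 0.2]   # the participant's probabilities
q = [0.4, 0.4, 0.2]   # market prices, summing to 1

u_star = expected_log(p, p, q)  # allocate W * p_j to outcome j

random.seed(0)
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in p]
    alt = [r / sum(raw) for r in raw]  # random point on the simplex
    assert expected_log(alt, p, q) <= u_star + 1e-12
```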
Next, we can generalise to the case where the market participants save some amount of money, buy some contracts, and sell some others. This will be important for deriving the equilibrium market behaviour, since you can’t have a market where everyone wants to buy contracts and nobody wants to sell them.
Suppose trader \(i\) saves \(W^i s^i\) and spends \(W^i \tilde{p}^i_j\) on contracts for each outcome \(j\). Here, we allow \(\tilde{p}^i_j\) to be negative – this means that \(i\) sells other traders \(-W^i \tilde{p}^i_j\) worth of contracts, and must pay out \(-W^i \tilde{p}^i_j/q_j\) if outcome \(j\) occurs. We now demand that \(s^i + \sum_j \tilde{p}^i_j = 1\) for \(s^i\) to make sense. Now, if outcome \(j\) occurs, trader \(i\)’s wealth will be \(W^i(s^i + \tilde{p}^i_j/q_j)\) – if \(\tilde{p}^i_j > 0\) then the trader makes money off their contracts on outcome \(j\), and if \(\tilde{p}^i_j < 0\) then the trader pays out on the contracts they sold. We’d like this to be equal to \(W^i p^i_j/q_j\), so that the trader’s wealth is the same as if they had spent all their money on contracts. This happens if \(s^i + \tilde{p}^i_j/q_j = p^i_j/q_j\), i.e. \(\tilde{p}^i_j = p^i_j - s^i q_j\).
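A quick sketch checks this bookkeeping with made-up numbers: saving a fraction \(s^i\) and betting \(\tilde{p}^i_j = p^i_j - s^i q_j\) (shorting where this is negative) leaves the trader with exactly \(W^i p^i_j / q_j\) in every outcome:

```python
p = [0.5, 0.3, 0.2]   # trader's probabilities
q = [0.4, 0.4, 0.2]   # market prices
W, s = 100.0, 0.9     # wealth and saved fraction

tilde = [pj - s * qj for pj, qj in zip(p, q)]  # note tilde[1] < 0: a short position
assert abs(s + sum(tilde) - 1.0) < 1e-9        # the budget constraint holds

# Wealth in each outcome equals the spend-everything wealth W * p_j / q_j
wealth = [W * (s + tj / qj) for tj, qj in zip(tilde, q)]
```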
Now that we have the behaviour of each trader for a fixed market price, we can derive the equilibrium prices of the market. At equilibrium, supply should be equal to demand, meaning that there are as many contracts being bought as being sold: for all \(j\), \(\sum_i W^i \tilde{p}^i_j = 0\). This implies that \(\sum_i W^i (p^i_j - s^i q_j) = 0\), or \(q_j = \left( \sum_i W^i p^i_j \right)/\left( \sum_i W^i s^i \right)\). It must also be the case that \(\sum_j q_j = 1\), since otherwise the agents could arbitrage, putting pressure on the prices to satisfy \(\sum_j q_j = 1\). This means that \(\sum_j \left(\sum_i W^i p^i_j\right)/\left(\sum_i W^i s^i\right) = 1\), implying that \(\sum_i W^i = \sum_i W^i s^i\) and \(q_j = \sum_i W^i p^i_j / \left(\sum_i W^i\right)\).
Note the significance of this price: it’s as if we have a Bayesian mixture where each trader corresponds to a hypothesis, our prior in hypothesis \(i\) is \(h^i = W^i / \left(\sum_i W^i\right)\), and the market price is the Bayesian mixture probability \(\sum_i h^i p^i_j\). How much wealth does the participant/hypothesis have after we know the outcome? Exactly \(W^i p^i_j \left(\sum_i W^i\right) / \left(\sum_i W^i p^i_j\right) = \left(\sum_i W^i\right) h^i p^i_j / \left(\sum_i h^i p^i_j\right)\), proportional to the posterior probability of that hypothesis. Our market has done an excellent job of replicating a Bayesian mixture!
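Here is a small numerical sketch of this correspondence, with two made-up traders and two outcomes: the equilibrium price is the wealth-weighted mixture of beliefs, and each trader's post-resolution wealth is proportional to the posterior on the corresponding hypothesis:

```python
W = [10.0, 30.0]              # trader wealths; priors h_i = W_i / sum(W)
P = [[0.8, 0.2], [0.2, 0.8]]  # each trader's probabilities over two outcomes

total = sum(W)
# Equilibrium prices: q_j = sum_i W_i p_ij / sum_i W_i
q = [sum(Wi * Pi[j] for Wi, Pi in zip(W, P)) / total for j in range(2)]

j = 0  # suppose outcome 0 occurs
post_wealth = [Wi * Pi[j] / q[j] for Wi, Pi in zip(W, P)]

# Bayesian posterior over traders-as-hypotheses, given outcome j
h = [Wi / total for Wi in W]
unnorm = [h[i] * P[i][j] for i in range(2)]
posterior = [u / sum(unnorm) for u in unnorm]
```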
You might have thought that the above discussion was sufficiently general, but you’d be wrong. It only applies to markets with a countable number of possible outcomes. Suppose instead that we’re watching someone throw a dart at a dartboard, and will be able to see the exact point where the dart will land. In general, we imagine that there’s a set \(\Omega\) (the dartboard) of outcomes \(\omega\) (points on the dartboard), and you have a probability distribution \(P\) that assigns probability to any event \(E \subseteq \Omega\) (region of the dartboard). (More technically, \(\Omega\) will be a measurable set with sigma-algebra \(\mathcal{F}\) which all our subsets will belong to, \(P\) will be a probability measure, and all functions mentioned will be measurable.)
First, let’s imagine that there’s just one agent with wealth \(W\) and probability distribution \(P\), betting against the house which has probability distribution \(Q\). This agent can buy some number \(b(\omega)\) of contracts from the house that each pay €1 if \(\omega\) occurs and €0 otherwise, for every \(\omega \in \Omega\) (similarly to the previous section, if \(b(\omega) < 0\) the agent is selling these contracts to the house). The house charges the agent the expected value of their bets: \(\mathbb{E}_{Q} [b(\omega)]\). The question: what function \(b\) should the agent choose to bet with?
Our agent is an expected log wealth maximiser, so they want to choose \(b\) to maximise \(\mathbb{E}_{P} [\log b(\omega)]\). However, they are constrained by only betting as much money as they have (and without loss of generality, exactly as much money as they have). Therefore, the problem is to optimise the Lagrangian
\[\begin{align} L(b, \lambda) &= \mathbb{E}_{P} [ \log b(\omega) ] - \lambda \left( W - \mathbb{E}_Q [ b(\omega) ] \right) \\ &= \mathbb{E}_{P} [ \log b(\omega) ] - \mathbb{E}_Q[\lambda(W - b(\omega))]\text{.} \end{align}\]
To make this easier to manipulate, we’re going to want to make all of this an expectation with respect to \(Q\), using an object \(dP/dQ (\omega)\) called the Radon-Nikodym derivative. Essentially, if we were thinking about \(\omega\) as being a point on a dartboard, we could think of the probability density functions \(p(\omega)\) and \(q(\omega)\), and it would be the case that \(\mathbb{E}_{P}[f(\omega)] = \mathbb{E}_{Q} [f(\omega) p(\omega) / q(\omega)]\). The Radon-Nikodym derivative acts just like the factor \(p(\omega) / q(\omega)\), and is always defined as long as whenever \(Q\) assigns some set probability 0, \(P\) does as well (otherwise, you should imagine that \(q(\omega) = 0\) so \(p(\omega) / q(\omega)\) isn’t defined). This lets us rewrite the Lagrangian as
\[ L(b, \lambda) = \mathbb{E}_{Q} \left[ \left( \log b(\omega) \right) \frac{dP}{dQ}(\omega) - \lambda(W - b(\omega)) \right]\text{.} \]
We have one more trick up our sleeves to maximise this with respect to \(b\). At a maximum, changing \(b\) to \(b + \delta b\) should only change \(L\) up to second order, for any small \(\delta b\). So,
\[\begin{align}L(b + \delta b, \lambda) &= \mathbb{E}_{Q} \left[ \left( \log (b(\omega) + \delta b(\omega)) \right) \frac{dP}{dQ}(\omega) - \lambda(W - b(\omega) - \delta b(\omega)) \right] \\ &= \mathbb{E}_{Q} \left[ \left( \log b(\omega) + \frac{\delta b(\omega)}{b(\omega)} \right) \frac{dP}{dQ}(\omega) - \lambda (W - b(\omega) - \delta b(\omega))\right] \\ &\quad {} + O(\delta b(\omega)^2)\\ &= L(b, \lambda) + \mathbb{E}_Q \left[ \frac{\delta b(\omega)}{b(\omega)} \frac{dP}{dQ}(\omega) + \lambda \delta b(\omega)\right] + O(\delta b(\omega)^2)\text{.}\end{align}\]
We therefore require that \(\mathbb{E}_Q [(\delta b(\omega) / b(\omega)) (dP/dQ(\omega)) + \lambda \delta b(\omega)] = 0\) for all \(\delta b(\omega)\). This can only happen when \(b(\omega) = - \lambda^{-1} dP/dQ(\omega)\), and it’s easy to check that we need \(\lambda = -W^{-1}\). Therefore, the agent buys \(W \times dP/dQ(\omega)\) shares in outcome \(\omega\), which you should be able to check is the same as in the case of countably many contracts.
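To make this concrete, here is a discretised sketch of the dartboard with assumed densities: \(P\) has density \(2x\) on \([0,1]\) and \(Q\) is uniform, so \(dP/dQ(x) = 2x\). Betting the payout profile \(b(x) = W \times dP/dQ(x)\) costs exactly \(\mathbb{E}_Q[b] = W\), as the budget constraint requires:

```python
# Discretised dartboard: [0,1] split into n cells, with cell masses for P and Q.
n = 1000
xs = [(k + 0.5) / n for k in range(n)]  # cell midpoints
p_mass = [2 * x / n for x in xs]        # P has density 2x (masses sum to 1)
q_mass = [1.0 / n] * n                  # Q is uniform

W = 1.0
# Kelly payout profile b = W * dP/dQ, approximated cell-by-cell
b = [W * pm / qm for pm, qm in zip(p_mass, q_mass)]

cost = sum(qm * bk for qm, bk in zip(q_mass, b))  # E_Q[b]: total spend
```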
Suppose we want to express the bet equivalently as our agent saving \(S\). For this to be equivalent to the agent spending all their money, we need \(b(\omega) + S = W \times dP/dQ(\omega)\) for all \(\omega\), which is easily solved.
Now, suppose we’re in a market with many agents, indexed by \(i\). Each agent has wealth \(W^i\), probabilities \(P^i\), and saves \(S^i\). In response to a market probability \(Q\), they buy \(W^i dP^i/dQ(\omega) - S^i\) contracts for outcome \(\omega\). What is the equilibrium market probability \(Q\)?
We would like to think of markets for each outcome \(\omega\) and solve for equilibrium, but it could be that each agent assigns probability 0 to every outcome. For instance, if I’m throwing a dart at a dartboard, and your probability that I hit some region is proportional to the area of the region, then for any particular point your probability that I will hit that point is 0. If this is the case, then the equilibrium price for contracts in every outcome will be 0, which tells us nothing about how traders buy and sell these contracts. Instead, we’ll imagine that there’s a set of events \(\{ E_j \}\) that are mutually exclusive with the property that one of them is sure to happen – in the case of the dartboard, this would be a collection of regions that don’t overlap and cover the whole dartboard. The agents will bundle all of their contracts for outcomes of the same event, and buy and sell those together. In this case, letting \([[ \omega \in E]]\) be the function that is 1 if \(\omega \in E\) and 0 otherwise, the condition for equilibrium is
\[\begin{align} 0 &= \sum_i \mathbb{E}_Q \left[ [[\omega \in E_j]] \left( W^i \frac{dP^i}{dQ}(\omega) - S^i \right)\right] \\ 0 &= \sum_i \left( W^i P^i (E_j) - S^i Q(E_j) \right) \\ \sum_i W^i P^i (E_j) &= Q(E_j) \left( \sum_i S^i \right)\text{.} \end{align}\]
To avoid arbitrage, it must be the case that \(\sum_i S^i = \sum_i W^i\), therefore we require that for all \(j\), \(Q(E_j) = \sum_i W^i P^i (E_j) / \left( \sum_i W^i \right)\). Now, in the limit of there being infinitely many infinitely small sets \(E_j\), all sets are just a union of some of the sets \(E_j\), and in general we will have \(Q(E) = \sum_i W^i P^i(E) / \left( \sum_i W^i \right)\). This is just like the discrete case: our market prices are exactly Bayes mixture probabilities, and as a result the wealth of each agent after the bets are paid will be proportional to their posterior credence in the mixture.
Finally, it’s worth noting something interesting that’s perhaps more obvious in this formalism than in others. Suppose we again have a single agent betting on which outcome would occur with the house, but instead of learning the outcome \(\omega\), the house and agent only learn that the outcome was in some event \(E\). In this case, the agent would have spent \(\mathbb{E}_Q [W \times dP/dQ(\omega) [[\omega \in E]] ] = W P(E)\) on contracts for outcomes in \(E\), and should presumably be paid the house’s posterior expected value of those contracts: \[ \frac{\mathbb{E}_Q [W \times dP/dQ(\omega) [[\omega \in E]] ]}{Q(E)} = W \frac{P(E)}{Q(E)}\text{.} \] Now, this is exactly what would have happened if the agent had been asked which of events \(E_1\) through \(E_n\) would occur: the agent would bet \(W P(E_i)\) on each event \(E_i\) and in the case that event \(E_j\) occurred, would be paid \(W P(E_j)/ Q(E_j)\). In the dartboard example, instead of learning which point the dart would land on, you only learned how many points the throw was worth and your bets only paid out accordingly, but it turned out that you bet optimally anyway, despite the payouts being different to what you thought they were. Therefore, Kelly betting has this nice property that you don’t need to know exactly what you’re betting on: as long as you know the space of possible outcomes, you’ll bet optimally no matter what the question about the outcomes is.
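Here is a discrete sketch of this coarse-graining property, with made-up numbers: Kelly bets are placed on four fine-grained outcomes and then bundled over a two-event partition; the spend on each event is \(W P(E)\) and the payout is \(W P(E)/Q(E)\), exactly as if the agent had Kelly-bet on the coarse question directly:

```python
P = [0.1, 0.3, 0.4, 0.2]      # agent's probabilities over four outcomes
Q = [0.25, 0.25, 0.25, 0.25]  # house's probabilities
W = 1.0

fine_spend = [W * pk for pk in P]  # Kelly: spend W * P(omega) on each outcome

events = [[0, 1], [2, 3]]          # a partition into two events
coarse_spend = [sum(fine_spend[k] for k in E) for E in events]
coarse_payout = [cs / sum(Q[k] for k in E)
                 for cs, E in zip(coarse_spend, events)]

# Same as Kelly betting on the two-event question directly:
direct_spend = [W * sum(P[k] for k in E) for E in events]
```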