There is a population of 1 million people, of which some are infected by a disease. Each two weeks the following happens:
- Every infectious person has 50% chance to be cured.
- Each infectious person infects 2 other healty people (if healty people still exist).
What is the change the disease will eventually be eradicated?
A more interesting (and at the time of the writing of this blog, actual) way to look at this problem other than the abstract notion of coins in a jar, we can model as a pandemic spreading through a population.
Suppose a jar that can hold a million coins, with a certain number of coins in it. You can do the following procedure:
- Take out all coins
- Flip all of them
- Count number of heads
- Put 3x the number of heads back into the jar.
If the coins to not fit in the jar, we only leave a million, and toss the rest awayIf we start with a jar with a single coin in it, what is the chance that we eventually the jar will be empty.
I would like to note that this thought experiment, even tough -as we will see- will have some properties in common with how a real pandemic spreads, is a vast oversimplification and can in no way to be used to model a real pandemic. The interesting quirk of probability that is displayed here most certainly does not work in real life.
Infinite case
First, let us take a look at the infinite case, so where there is no limit on the size of the jar.
Expected value
An observation that can be made is that the expected value of the number of coins gets multiplied with 1.5 after each step. This might seem obvious, as for example if we have exactly 10 coins, it is easy to see that we will have an expected value of 15 coins next step, but the truth is even stronger, as for example, if we would have 50% change to have 6 coins, and 50% to have 14 coins, the expected value is which is still 15. It works in general for any given distribution, that after each step, the expected value will increase by 50%.
This means that the expected value will grow to infinity, and specifially, the expected value after steps is
Jar duplication
Imagine instead of a single jar, we have multiple jars that we apply the process to. So for each jar, we take out all the coins, flip them, count the heads, multiply by three, and put them back in the original jar. It is easy to see that this process is equivalent to combining all the different coins in a single jar, as the total number of coins added back in is equal to the total number of flipped heads.
This means that a jar with coins can be considered as jars, with each a single coin in it. The total jar will be empty only if every jar ends up empty, and all jars are independant.
So if the chance a jar with a single coin ends up empty is called , then the chance that a jar with coins ends up empty is , as it is the same as independant jars ending up empty.
Chance
As is a global independant property, it is conserved with our porcedure. After one step of our porcedure, we have 50% chance to become empty (so a chance of 1) and 50% to end up with three coins. This means that follows the following equation:
The equation has three solutions, , and , with the golden ratio. The last solution is smaller then zero, so that is impossible. It might be natural to scrap the solution of , and keep , and while it is correct, the mathematical reasons why this is actually are quite complex.
An important note to make is that most of the chance to empty the jar (or eradicate the disease) happens when the number of coins is low. For example, if there is one coin, there is already a 50% chance that the jar will end up empty after the first step, while the chances of ending up empty after any other number of steps is only about 23% chance of successfull eridication if it fails the first step.
Even worse, if we ever reach a million, the chance to ever empty the jar drops to , which equals about , so basically impossible.
Finite case
One might reasonably assume that, as most of the chances to empty the jar are contained in the low numbers, that adding a constraint to the maximum number of coins in the jar will only change the chance slightly. One would be wrong. To show this, let us look at a slight variant of the problem:
You have a jar that can hold a million coins. Each step you do the following procedure:
- Take out all coins
- Flip all coins
- If there is at least one head, refill the jar again up to a million coins.
It is easy to see that the jar will always contains more coins then in the original problem, so it is a stronger version, with a lower chance to eradicate.
Markov chain
The above problem can be represented by a Markov chain with two possible states: a million coins, or no coins. As the chance of going from a million coins to no coins is nonzero (But very small, about ) and constant, the chance to eventually have an empty jar follows from the following equation:
And even tough that the coefficient of is very small, it is nonzero, so has to be one. This means that even this extreme system, were a million coins are added, there is still a 100% chance (almost surely) To end up with zero coins. On the other hand, the expected number of steps has to be lower than .
Infimum
There is a chance of that the ininite system grows beyond one million (because it needs to not surely finish, and it had a chance of to not finish.) This means that this is also the chance that our finite system reaches a million. If it has reached a million, it has a chance of to get out of this state of a million. We can calculate p as follows:
is the chance that a million coins throw less then 333333 heads. This folows the binonial distribution, but is estimated really well in this region by the normal distribution with avarage 500000 and standard deviation of 500. The zscore of 333333 in this normal distribution is (333333-500000)/500= 333. The p value of the left tale is: which means that in 30% of the cases it takes at on avarage more than steps, not including the steps to get to one million, and the steps to get back to zero afterwards, and the large chance that when you go below a million, you will probably immedaitly get stuck again. This means that the expected value has to be bigger than steps.
Conclusion
The combination of this means that the expected number of steps falls between and . We could narrow this further down, but both numbers are so unphantomably huge, that there will be no point, just rest assured that even if there is certainty that the system will end up without any coins, it will take a very long time.