
 

 

Word Gems 

exploring self-realization, sacred personhood, and full humanity


 

Monkeys Typing Shakespeare

 


 


 

 

  • Editor's note: The mathematics of probability informs us that randomness cannot account for the complexity we find in nature. Even a "simple" production of "monkeys typing the works of Shakespeare" would require far, far more time than the age of the universe. The forces of evolution are still in play but do not work according to the rules taught in high-school biology textbooks. (See the following, original article online at https://plus.maths.org/content/infinite-monkey-business)

 

 

Understanding uncertainty: Infinite monkey business

David Spiegelhalter and Owen Smith

Submitted by plusadmin on March 1, 2010

 

The idea that an infinite number of monkeys typing at random on an infinite number of typewriters will eventually produce the complete works of Shakespeare apparently dates from 1913, and has appeared repeatedly in popular culture ever since. When the BBC Horizon team decided to make a programme about infinity, they contacted us about simulating the monkeys. We said we needed a program to churn out random letters and match them to Shakespeare, and so they commissioned a Monkey Simulator program from Aaron Russell, which is available from this website.

The Monkey Simulator program generates random symbols from a list of 31 options: 26 lower-case letters, a space, a comma, a full stop, a semicolon and a hyphen. After a sequence of four symbols has been generated, the program searches for a match in a stored plain-text version of the Complete Works of Shakespeare, ignoring whether a letter is capital or lower-case. If the procedure finds one or more matches, it generates a further character and again checks if there is a match for five symbols, and so on, until no matches are found. Then it starts again with a new sequence of four characters. Characters are generated at a rate of 50 per second. (We note that this procedure cannot match sequences that stretch over the 36,357 line returns in the text file, and also cannot match all the other punctuation in the text, such as the 10,475 question marks and 8,827 exclamation marks.)
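The matching procedure just described can be sketched in a few lines of Python. This is a hypothetical reconstruction of the loop, not Aaron Russell's actual Monkey Simulator; the `text` argument stands in for the plain-text Complete Works.

```python
import random

# The 31 symbols described above: 26 lower-case letters plus
# space, comma, full stop, semicolon and hyphen.
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,.;-"

def one_try(text, rng=random):
    """One try of the simulator: generate four random characters,
    then keep extending the sequence while it still occurs somewhere
    in `text` (ignoring case). Returns the longest matched sequence,
    or "" if the first four characters match nothing."""
    text = text.lower()
    seq = "".join(rng.choice(ALPHABET) for _ in range(4))
    if seq not in text:
        return ""
    while True:
        extended = seq + rng.choice(ALPHABET)
        if extended not in text:
            return seq  # the extended sequence no longer matches
        seq = extended
```

A driver would call `one_try` repeatedly, generating characters at 50 a second in the original program, and keep track of the longest sequence seen so far.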

A screenshot from the Monkey Simulator program

The screenshot shows the situation after the program has generated the sequence "y sak" and matched it with the phrase "and for my sake even so doth she abuse me", which comes from Sonnet 42. In fact, this was only one of 37 matches of this particular sequence in the whole of the Complete Works.

It also shows that after 113 million monkey seconds (or around 26 days for 50 monkeys typing one character a second) the longest match was "we lover", which matched with the speech by Boyet, "With that which we lovers entitle affected", in Love's Labours Lost, Act 2 scene 1.

The claim is that, given enough time, such a program would generate the entire Complete Works. What is the chance of this happening for any particular generated sequence? The sequence would have to start with the first character (the first "T" in "The complete works...") and continue through around 5,000,000 characters (including spaces) until it reaches the end.

Let's be generous to the poor monkey and ignore the fact that there are lower and upper case characters and various extra bits of punctuation, so that there is one of only 31 characters in each position. Each character typed therefore has a 1 in 31 chance of being the right one, so the chance that the first 2 are correct is 1 in 31 × 31, which equates to 1 in 961. The chance that they are all right, from beginning to end, is 1 in 31^5,000,000. This is approximately equal to 1 in (10^1.5)^5,000,000 = 10^7,500,000. So each time the monkey starts typing, there is a chance of p = 1/10^7,500,000 of completing Shakespeare. This probability is rather small, but it is finite. Since 10^7,500,000 ≈ 2^25,000,000, it is about the same chance as flipping a fair coin 25,000,000 times and it coming up heads every time.
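These sizes are easy to verify with logarithms; the sketch below redoes the arithmetic in Python, using the 31 symbols and roughly 5,000,000 characters quoted above.

```python
import math

N_CHARS = 5_000_000   # approximate length of the Complete Works
ALPHABET_SIZE = 31    # the simulator's symbol set

# log10 of 1/p, where p = (1/31)^5,000,000
log10_one_over_p = N_CHARS * math.log10(ALPHABET_SIZE)
print(f"p is about 1 in 10^{log10_one_over_p:,.0f}")   # roughly 10^7,500,000

# Express the same chance as a run of heads on a fair coin
coin_flips = log10_one_over_p / math.log10(2)
print(f"like {coin_flips:,.0f} heads in a row")        # roughly 25,000,000
```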

Alternatively we can think of the National Lottery, where the chance of winning the jackpot is around 1 in 14,000,000 ≈ 10^7.1. So the tiny chance p is equivalent to winning around 1,000,000 lotteries in a row, since 10^7,500,000 is around (10^7.1)^1,000,000. If you bought one lottery ticket a week, this is like winning every week for 20,000 years. Of course, in real life people might start getting suspicious.

It's just a matter of time...

Even though the event of one monkey producing Shakespeare when randomly typing 5,000,000 characters has such a microscopic probability, we are "certain" of it eventually occurring in exactly the same sense that we are "certain" that if we repeatedly flip a fair coin, it will eventually come up heads. This will take on average two flips, just as the time to get the first double-six when throwing two dice is on average 36 throws. So we expect to see Shakespeare after an average of 1/p tries by the monkey, which of course is a very long time. Even with our program, which generates 50 characters a second, corresponding to 50 monkeys typing slowly or one typing incredibly fast, we would not expect to get Shakespeare until well after the Universe has come to an end.

This is an average, but how long might we actually have to wait? Waiting times have what is known as a geometric distribution. If p is the probability of the event of interest, and there are an unlimited number of independent opportunities for that event to occur, then the chance that the event happens at exactly the nth attempt is the chance that it does not happen for (n-1) attempts (that is, (1-p)^(n-1)) times the chance that it finally does happen at the nth try (which is p), to give the total probability as:

Prob (First event happens at attempt n) = (1-p)^(n-1) p.

The chance that the event does not happen at attempt n or before is the probability of n failures in a row, or (1-p)^n. This means that the chance that the event happens by attempt n, that is, at n or beforehand, is

Prob (First event happens at or before attempt n) = 1 - (1-p)^n.

As long as p is not zero, the chance (1-p)^n of the event not having happened can be made as small as you like by increasing the number n of attempts. This is essentially a special case of the law of large numbers, which (very) roughly says that things tend to average out in the end. After the Horizon programme was aired I had an email discussion with a philosopher who argued that it was not logically certain that the monkeys would eventually type Shakespeare, which is true, but it is probabilistically certain, which is good enough for me.
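Both formulas are easy to check numerically. A minimal sketch, using the double-six example with p = 1/36: the cumulative formula should equal the sum of the exact-attempt probabilities.

```python
p = 1 / 36  # chance of a double six on one throw

def prob_first_at(n, p):
    """P(first success at exactly attempt n) = (1-p)^(n-1) * p."""
    return (1 - p) ** (n - 1) * p

def prob_by(n, p):
    """P(first success at or before attempt n) = 1 - (1-p)^n."""
    return 1 - (1 - p) ** n

# The cumulative formula is the sum over the exact-attempt probabilities:
n = 50
assert abs(prob_by(n, p) - sum(prob_first_at(k, p) for k in range(1, n + 1))) < 1e-12
```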

We can easily work out how long we expect to wait to have, say, a 99% chance of generating Shakespeare. Suppose we want a chance F of being finished by attempt n. Then F = 1 - (1-p)^n, and so rearranging terms we get

n = log(1 - F) / log(1 - p),

where the logarithms can be to any base, but we shall assume natural logarithms to base e. For small p we can get a convenient approximation, since log(1 - p) ≈ -p, and so

n ≈ -log(1 - F) / p.

We can express this in an even simpler way. Suppose p = 1/T, that is, there is a 1 in T chance of the event occurring. And suppose we want to be really sure of observing the event, so that F = 1 - 1/K where K is large. That is, we want there to be only a 1 in K chance of still being waiting after time n. Then

n ≈ -log(1 - F) / p = T log(K).

So if you have an event with a 1 in 1000 chance of occurring, and you want to have only a 1 in 100 chance of still waiting by time n, then set n to be 1,000 × log(100) ≈ 4,600. This means that there is a 99% chance that the event will have occurred before 4,600 attempts.
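A short sketch reproduces this worked example, comparing the exact formula n = log(1-F)/log(1-p) with the approximation n ≈ T log(K), using natural logarithms as in the text.

```python
import math

def attempts_exact(F, p):
    """Smallest n with 1 - (1-p)^n >= F, from n = log(1-F)/log(1-p)."""
    return math.ceil(math.log(1 - F) / math.log(1 - p))

def attempts_approx(F, p):
    """For small p and F = 1 - 1/K, n is roughly T*log(K) with T = 1/p."""
    K = 1 / (1 - F)
    return (1 / p) * math.log(K)

# The 1-in-1000 event with a 99% target from the text:
print(attempts_exact(0.99, 1 / 1000))    # about 4,600 attempts
print(attempts_approx(0.99, 1 / 1000))   # 1000 * log(100), about 4,605
```

The same function reproduces the table entries below, for example 7 attempts to be 99% sure of a head.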

We can use the exact and approximate formulas to generate the following table for different events:

Desired probability    K = 1/(1-F)    Coin coming up heads       Two 6s from 2 dice
F of success                          p = 0.5, T = 1/p = 2       p = 0.028, T = 1/p = 36
                                      Number of attempts n       Number of attempts n

50%                    2              1                          25
90%                    10             3                          82
99%                    100            7                          163
99.9%                  1000           10                         245
99.99%                 10000          13                         372

Any 17-character sequence from Shakespeare being produced by the monkeys
p = 2.2×10^-19, T = 1/p = 4.5×10^18

Desired probability    K = 1/(1-F)    Number of attempts n    Billion years

50%                    2              3.1×10^18               2.0
90%                    10             1.0×10^19               6.6
99%                    100            2.1×10^19               13.2
99.9%                  1000           3.1×10^19               19.8
99.99%                 10000          4.2×10^19               26.3

The number of attempts required to have a specified probability of success when repeatedly trying to observe an event with probability p = 1/T. For small p and F near 100%, n ≈ T log(K).

 

So to be 99% sure of observing at least one double-six when throwing two dice, you will have to plan to make 163 throws. And to get a particular sequence of 17 characters right, for example "to be or not to b", would require an event with chance 1 in 31^17 ≈ 2.3 × 10^25. This is for a particular sequence: there are around five million 17-letter sequences in Shakespeare. So to get any 17-letter sequence right is an event with probability 1 in 4.5 × 10^18. Our program types characters at 50 a second, so assuming the program generates a continuous sequence of characters, we would expect to wait around 2.9 billion years, or to be 99% sure, 13.2 billion years — as long as the time since the Big Bang signalled the start of our Universe.
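The waiting times quoted here follow from the same arithmetic; a sketch, taking the 50 characters per second and the roughly five million 17-letter sequences from the text:

```python
import math

p_specific = 1 / 31**17           # one particular 17-character sequence
n_positions = 5_000_000           # roughly that many 17-letter sequences in Shakespeare
p_any = p_specific * n_positions  # chance a try matches some sequence
T = 1 / p_any                     # expected number of attempts, about 4.5e18

SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25
rate = 50  # characters generated per second

expected_wait = T / rate / SECONDS_PER_YEAR / 1e9
print(f"expected wait: {expected_wait:.1f} billion years")           # about 2.9

# To be 99% sure, multiply by log(K) with K = 100:
print(f"99% sure after: {expected_wait * math.log(100):.1f} billion years")  # about 13.2
```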

Theory does not always meet practice. A wonderful arts project in Paignton Zoo put a computer in a monkeys' enclosure to see how they got on. The monkeys typed 5 pages, mainly consisting of the letter "s", and then used the keyboard as a toilet. So rather a limited output of classic English literature, which just shows the problems that turn up when maths meets the real world.

The image of monkeys and typewriters is powerful and will keep on being used whenever we need an image of an apparently impossible event made feasible by thinking on a large enough scale. I think it provides a fine incentive for analysing waiting times for rare events, but does not really give us any insight into what forms of infinity play out in the physical universe. But who cares about that anyway?

 

 

 

 
