In the last edition of this letter, I presented a bonus puzzle.
Last week our village had a fête. One of the competitions on offer was to guess the number of balls in a bag. There were N balls in the bag, and they were numbered 1, 2, 3, …, N. To help competitors make a sensible guess, they were allowed to take out four balls and note the numbers on them.
When I took part, I pulled out balls numbered 24, 87, 14 and 35.
There is a prize for the person who guesses the correct number exactly. How many balls should I estimate are in the bag?
I don’t know what Significance magazine’s solution is, but here’s an approach. It’s called the German tank problem.
During World War II, the Allies needed a method to know how many tanks the Germans had. The spies weren’t reliable; some said thousands, some said hundreds. The problem came to statisticians. All the “data” they had was the numbers painted on tanks. Like 24, 87, 14 and 35. Statisticians assumed these were serial numbers and designed a simple formula to estimate the maximum.
N ≈ m + (m - k)/k = m + m/k - 1
Our first guess would be 87, the sample maximum m, which is also the maximum likelihood estimate. But the sample maximum can never exceed N and usually falls short of it, so statisticians adjusted it upward. If we have seen k numbers and we assume they are roughly equally spaced, we can estimate where the next number would fall, right? Of the numbers from 1 to m, m - k went unseen; shared among the k numbers we did see, that is an average gap of (m - k)/k.
That’s what the second term in the formula does: it adds the average gap to the sample maximum. And the statisticians’ estimates turned out to be surprisingly accurate. My estimate for the village fête? 107. It was wrong though.
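For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes the standard form of the estimate, the sample maximum plus the average gap (m - k)/k, which is the same as m + m/k - 1. The function names are my own, purely for illustration. The second function averages the estimate over every possible draw from a small bag, as one way to see why the adjustment is reasonable.

```python
from fractions import Fraction
from itertools import combinations


def estimate_population_size(serials):
    """Estimate N from numbers drawn without replacement from 1..N.

    Uses the sample maximum m plus the average gap (m - k) / k,
    which equals m + m/k - 1. Returns an exact Fraction.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed number")
    m = max(serials)
    return m + Fraction(m - k, k)


def mean_estimate(n, k):
    """Average the estimate over every possible k-ball draw from 1..n."""
    estimates = [
        estimate_population_size(draw)
        for draw in combinations(range(1, n + 1), k)
    ]
    return sum(estimates) / len(estimates)


# The four balls from the fête: 87 + (87 - 4) / 4
print(float(estimate_population_size([24, 87, 14, 35])))  # 107.75

# For a bag of 20 balls and draws of 4, the estimates average out to 20.
print(mean_estimate(20, 4))  # 20
```

The first line prints 107.75, in line with the 107 above. The second prints 20: averaged over all possible draws, the estimate neither overshoots nor undershoots, even though any single draw can still miss, as mine did.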