« Home alone | Main | Review: Torchwood, season 1 »

Math puzzle

| 3 Comments

When Randall Munroe spoke at Google a couple weeks ago, he revealed the answer to a question I've been wondering about for a while: what does "xkcd" stand for?

Turns out that he chose it years ago as a unique identifier for himself, then later applied it to the webcomic.

This morning, that connected to a thought about my own quasi-unique online identifier: I often use "elysdir" (my middle name) as a username. Only when I Googled it the other day, I discovered that someone else (who claims to be, and may well be, an 84-year-old woman) uses it as an ID on MySpace. (And someone used it as an AIM ID long ago, though it's possible that was me and I just forgot the password.)

Anyway, so then I wondered: how long a string of non-accented modern English lowercase letters do you need if you want to assign a unique alphabetic ID of uniform length to every person in the US? That is, every person in the US gets a unique n-letter string; what's the minimum n that provides enough unique strings for everyone (and will continue to provide enough strings for at least another ten years or so)?

It would be easy to figure this out with a calculator, but I didn't have a calculator handy (I was in the shower). So, the puzzle (which is fairly easy) is to come up with an easy way to estimate the answer without needing a calculator and without having to do a lot of complicated multiplication in your head.

Ignore practical, social, privacy, and other constraints; this is just a math puzzle.

Hover your mouse pointer over this sentence for a hint.

(That pop-up hint may not work in all browsers; sorry.)

P.S. added later: It turns out that I was misremembering the current number of people in the US, and the difference between what I thought it was and what it really is does change the actual answer, and makes it much harder for the estimate to match the answer. (Which is to say, it turns out we're currently just under the boundary between n and n+1 letters being needed, in such a way that my estimation method gives the answer n+1.) But in my original formulation of the question (in my head), I wanted to allow for future population growth, so I've now added a clause to the puzzle statement above to allow for ten years of projected US population growth, which resolves the boundary issue.

3 Comments

I was jere7my before the internets, but it turned out to have been forethought, in hindsight.


Here's the answer to the puzzle, in case anyone was wondering:

There are 26 letters in the English alphabet. That's not much more than 25.

100 = 25 x 4.

Therefore, a power of 25 is a power of 100 divided by a power of 4; which is to say, an even power of 10 divided by an even power of 2.

So find the power of 100 that's greater than the number of people in the US--which I mis-estimated as 350 million, but is actually more like 300 million at the moment.

100^4 = 100,000,000, which obviously isn't enough. Try 100^5 = 10,000,000,000.

And 4^5 = 2^10, which just happens to be a well-known number in computers: it's 1024, or about 1000.

So 25^5 = (100^5)/(4^5), or about 10 billion divided by 1000, or about 10 million.

Which isn't quite enough. Multiply again by 25 to get 25 ^ 6, about 250 million, which still isn't enough. But it's almost enough, so it's clear that 25 ^ 7 would be much bigger than 350 million, so the estimated answer by this method is 7, which means I get to keep "elysdir" as my unique identifier. :)

As it turns out, the estimates (especially the "25 is almost like 26" one) throw things off just enough to be a problem: in fact, a calculator reveals that 26^6 = 308,915,776, which is a little more than the current US population, so in fact right now we'd only need 6 letters. But in ten years, the population will be roughly 320 or 330 million, according to various projections, for which we'd need 7 letters.


If you look at it as a practical matter, rather than as a math puzzle, you have to knock out a lot of seven-letter combinations. Nobody is going to want to have the unique identifier BASTARD, or if somebody does, that person will not be allowed to have it. Or BIGTITS. Or NOBALLS. And not only do you have to take out AFUCKER but SHITHED and ASHITHD and SHTHEAD. In fact, I think you have to take out SQFUCKP and VFUCKRL, which brings it a little closer to the math puzzle: if you have to remove all the combinations that have as subsets any of (lessee, f, s, t, c, maybe h, p, another c, oh, hell with it) ten or so four-letter words, does that bring the total too low? And, again in real life, somebody is going to have to actually look at every one of the three hundred million combinations, and it would be good if it were a polyglot with a dirty mind.

Thanks,
-V.


Post a comment