Monday, February 9, 2009

30% of all numbers start with 1

Well, not exactly. However, in many common kinds of data, numbers beginning with 1 appear the most frequently, making up about 30% of the values. Numbers beginning with 2 come next, at about 18%. Each successive digit appears less often, until 9 shows up as the leading digit in less than 5% of the values. This property is called Benford's Law, after physicist Frank Benford.
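For anyone who wants the exact figures, the usual statement of Benford's Law is that a leading digit d turns up with probability log10(1 + 1/d). Here is a minimal sketch that prints the expected share for each digit; the function name is just my own choice for illustration.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of values whose leading digit is `digit` (1 through 9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the predicted share for every possible leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine shares add up to 100%, since the terms telescope to log10(10).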

This is quite counter-intuitive. Why should there be more bank balances starting with a 1 than with a 9? More than six times as many, in fact. Why should this be true for the lengths of all the rivers in the world? The technical explanation is that these kinds of real-world values span several orders of magnitude and are spread roughly evenly on a logarithmic scale. For a more intuitive explanation, an example is probably more helpful.

Let's say you invest $100 in an account that pays 10% annually. This means that your investment will double about every 7.3 years. The investment will reach $200 after 7.3 years, so for that entire first 7.3 years the investment's value began with a 1. Now, it will take another 7.3 years to double again. However, this time it's doubling to $400, not $300. The investment was valued in the $100s for the same amount of time it will be valued in the $200s and $300s combined. The point is that the investment is growing at a rate proportional to its own size. It's compounding. The investment only spends 4 years or so in the $200s, 3 years in the $300s, and so on, finally breezing through the $900s in just over a year. Then this repeats for all the 4-digit values: 7.3 years in the $1000s, 4 in the $2000s, etc.
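The arithmetic behind those durations can be sketched in a few lines. This assumes the balance grows smoothly as 100 × 1.10^t rather than jumping once a year, and the names `RATE` and `years_with_leading_digit` are mine, made up for the illustration.

```python
import math

RATE = 0.10  # assumed annual growth rate, as in the example above


def years_with_leading_digit(digit: int, rate: float = RATE) -> float:
    """Years a compounding balance spends between digit*10^k and (digit+1)*10^k."""
    return math.log((digit + 1) / digit) / math.log(1 + rate)


years = {d: years_with_leading_digit(d) for d in range(1, 10)}
decade = sum(years.values())  # total time to grow tenfold

for d, y in years.items():
    print(f"${d}00s: {y:4.1f} years ({y / decade:.1%} of the time)")
```

The fraction of time spent on each leading digit works out to log10(1 + 1/d) regardless of the interest rate, which is exactly the Benford share for that digit.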

The reason I've been thinking about this topic recently is that it's been mentioned in relation to Bernie Madoff's Ponzi scheme. Benford's Law is a good tool for detecting fraudulent data, because people faking such data often don't take it into account. Apparently, Madoff was sophisticated enough to generate numbers that met Benford's Law reasonably well.
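To give a feel for how such a check might work, here is a rough sketch that scores a list of positive whole numbers against Benford's Law with a chi-square statistic. That statistic is one common choice that I'm assuming for illustration, not necessarily what any investigator used, and the function names are hypothetical.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """First digit of a nonzero integer."""
    return int(str(abs(n))[0])


def benford_chi_square(values) -> float:
    """Chi-square distance between observed leading digits and Benford's Law.

    Larger values mean a worse fit. Zeros are skipped because they have
    no leading digit.
    """
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


# Powers of 2 are a classic Benford-like sequence; the plain counting
# numbers 1..999 have evenly spread leading digits and fit badly.
print(benford_chi_square([2 ** k for k in range(1000)]))
print(benford_chi_square(range(1, 1000)))
```

With nine digit categories there are 8 degrees of freedom, so a statistic above roughly 15.5 would be flagged at the usual 5% level. A low score doesn't prove the numbers are honest, of course; it only means they don't fail this particular test.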

As a quick real-world test of this, I thought I'd check the sizes of all the files on my computer's hard drive. The results seem to fit the Law's prediction quite well:

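| Digit | # Files | % Files | Benford |
|-------|---------|---------|---------|
| 1     | 48,295  | 28.6%   | 30.1%   |
| 2     | 32,893  | 19.5%   | 17.6%   |
| 3     | 23,297  | 13.8%   | 12.5%   |
| 4     | 15,925  | 9.4%    | 9.7%    |
| 5     | 12,612  | 7.5%    | 7.9%    |
| 6     | 10,665  | 6.3%    | 6.7%    |
| 7     | 8,438   | 5.0%    | 5.8%    |
| 8     | 9,972   | 5.9%    | 5.1%    |
| 9     | 6,498   | 3.9%    | 4.6%    |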
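If you'd like to try the same experiment, something along these lines should work. It's a sketch of one way to do the tally, not the exact script behind the table above; the function names are my own, and it skips empty files and anything it can't read.

```python
import os
from collections import Counter


def leading_digit_counts(root: str) -> Counter:
    """Count files under `root` by the leading digit of their size in bytes."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports on a symlink itself rather than its target.
                size = os.lstat(path).st_size
            except OSError:
                continue  # unreadable or vanished file: skip it
            if size > 0:  # empty files have no leading digit
                counts[int(str(size)[0])] += 1
    return counts


def digit_shares(counts: Counter) -> dict:
    """Convert raw counts into the fraction of files for each digit 1-9."""
    total = sum(counts.values())
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}
```

Calling `digit_shares(leading_digit_counts(os.path.expanduser("~")))` would tally your home directory; a whole drive can take a while.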


Wikipedia Link: Benford's Law

3 comments:

  1. It disturbs me to admit this, but I am completely nerded-out by this post. Completely.

  2. Not only nerded-out, but out-nerded as well :).

  3. Rob...

    I LOVED this post. I was smiling the whole time I read it. You can be assured that I will be passing it along to someone (who I think would appreciate it) tonight, giving you full credit of course.

    Bestest,
    ...Dave Aring
