Friday, May 29, 2009

13 is worth more than 14

In the NFL it is, anyway. Maybe.

I read a fair bit about sports, and in particular I have an interest in the statistics that people use to try to analyze them. One thing I've heard a few times as an example of a counter-intuitive stat is this: NFL teams scoring 13 points in a game win more often than teams scoring 14 points. I've recently come into possession of a database of game data, so I thought I'd have a look at this for myself.

My data contains all the regular-season and playoff games back to the 1978 season, so it's a pretty good sample size of about 7500 games. The first item to look at is the 13 vs. 14 thing, and sure enough (records are wins-losses-ties, and the percentages count a tie as half a win):

13: 225-562-2 28.6%
14: 144-670-2 17.8%

There you have it - teams scoring 13 win significantly more often than teams scoring 14. Of course, the real question is what (if anything) this means. The most likely cause for this effect is the unusual way points are scored in football. Almost all scoring is through 3-point field goals and 7-point touchdowns (counting the extra point). This means that 13 can really be thought of as 2 field goals and 1 touchdown, and 14 as 2 touchdowns. Maybe field-goal-heavy scores outperform touchdown-heavy scores in general. Let's see:

6: 18-307-0 5.5%
7: 16-621-2 2.7%

16: 249-256-0 49.3%
19: 160-129-0 55.4%
20: 543-432-2 55.7%
21: 308-405-0 43.2%

27: 568-176-0 76.3%
28: 328-152-2 68.3%

That certainly seems to support the FG vs. TD explanation, and in fact it's quite striking how poorly the multiples of 7 perform. 7 points wins 3% of games; it performs worse than 5, 6, 8, and 9 points. 14 wins 18%; it performs worse than 11, 12, and 13. 21 points is the highest score that loses more than half its games, and it performs worse than 16 (!). 28 does worse than 23, and so on.
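For anyone who wants to check the percentages in these tables, here is a minimal sketch in C#. It assumes a tie counts as half a win. That convention is inferred from the numbers (it reproduces the figures above), not something the data set states. The class and method names are made up for illustration.

```csharp
using System;

public static class WinPct
{
    // Winning percentage for a wins-losses-ties record.
    // Assumption: each tie is counted as half a win.
    public static double Percent(int wins, int losses, int ties)
    {
        int games = wins + losses + ties;
        return 100.0 * (wins + 0.5 * ties) / games;
    }

    public static void Main()
    {
        // Records for teams scoring 13 and 14 points, taken from the table above.
        Console.WriteLine("13: {0:F1}%", Percent(225, 562, 2));
        Console.WriteLine("14: {0:F1}%", Percent(144, 670, 2));
    }
}
```

Counting ties as plain non-wins instead would give 28.5% for 13 points, which is why the half-win convention looks like the right reading.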

The FG vs. TD explanation makes sense for a few reasons. First, time that one team spends driving for a score is time the other team can't use to score. 3 successful possessions, for a TD and 2 FGs, will generally take more time than 2 successful TD possessions, leaving the opponent with less time to score their own points. Second, teams that are trailing by a large amount won't try for field goals. That is, a team losing 20-7 will have to go for a touchdown, while a team losing 10-7 is more likely to take a field goal. Finally, there may be game conditions making certain games conducive to more field goals. For instance, a game with heavy snow or fog might reduce offense, causing both teams to score few TDs.

I must admit, though, that this effect doesn't last forever. Teams scoring 49 or 56 points have won 100% of their games since 1978. I guess the lesson is that if you're going to score touchdowns, you should try to score 7 or 8 of them, not just 2 :).

Monday, May 25, 2009

FileSystemWatcher

I recently came across the interesting FileSystemWatcher class in the .NET Framework. It's pretty cool; the class will watch a folder, and raise events when there are changes to the files in that folder. You can specify a filter (like "*.txt") to only watch for certain files, and react when files are created, deleted, and modified.

It's pretty easy to come up with possible uses for this class. Maybe you have an old process somewhere that produces data files at irregular intervals; you could watch the output folder, and immediately act whenever one of those files is added. You could create a kind of auto-publishing system; let your users know that any file they save in a certain folder will automatically be posted for them. You could set up a mechanism for communicating between processes.

This last one is something that I've seen done before in Visual FoxPro (VFP). A timer is set up to repeatedly check a certain location for a "message" file from another process, and then the app can react accordingly. A FileSystemWatcher makes this kind of setup simple - just set properties specifying the files to look for, and the file system events to watch.

Implementing this is straightforward as well. Create an instance of the class:

FileSystemWatcher fsw = new FileSystemWatcher(); // requires "using System.IO;"

Set its properties, and hook up an event handler:

fsw.Path = "C:\\SomeFolder\\WatchFolder\\";
fsw.Filter = "*.txt";
fsw.NotifyFilter = NotifyFilters.LastWrite;
fsw.Changed += new FileSystemEventHandler(this.OnFileChange);
fsw.EnableRaisingEvents = true; // no events are raised until this is set


Write the handler to do the work:

private void OnFileChange(object source, FileSystemEventArgs e)
{
// ...
}
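Putting those pieces together, a small console program might look something like the sketch below. The class name, the CreateWatcher helper, and the default folder are all made up for illustration. The folder has to exist already, because the FileSystemWatcher constructor throws an ArgumentException for a path that doesn't.

```csharp
using System;
using System.IO;

public static class WatchFolderDemo
{
    // Builds a watcher for *.txt files in the given folder and starts it.
    public static FileSystemWatcher CreateWatcher(string folder, FileSystemEventHandler handler)
    {
        FileSystemWatcher fsw = new FileSystemWatcher(folder, "*.txt");
        // FileName covers create and delete; LastWrite covers modifications.
        fsw.NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite;
        fsw.Created += handler;
        fsw.Changed += handler;
        fsw.Deleted += handler;
        fsw.EnableRaisingEvents = true; // no events are raised until this is set
        return fsw;
    }

    public static void Main(string[] args)
    {
        // Hypothetical default; pass a real, existing folder as the first argument.
        string folder = args.Length > 0 ? args[0] : @"C:\SomeFolder\WatchFolder";

        using (FileSystemWatcher fsw = CreateWatcher(folder, OnFileChange))
        {
            Console.WriteLine("Watching " + folder + ". Press Enter to quit.");
            Console.ReadLine();
        }
    }

    // Handlers run on a thread-pool thread, so keep the work here short.
    private static void OnFileChange(object source, FileSystemEventArgs e)
    {
        Console.WriteLine(e.ChangeType + ": " + e.FullPath);
    }
}
```

One thing to expect when trying this: saving a file once can raise Changed more than once, so a real handler usually needs to tolerate duplicate notifications.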

Thursday, May 14, 2009

Interesting Numbers

There's a well-known (to mathematicians) story about a famous mathematician, Srinivasa Ramanujan. The story goes that another mathematician, Godfrey Hardy (G. H. Hardy), took a taxi to visit Ramanujan while he was ill. The taxi was number 1729, and Hardy commented that this was rather an uninteresting number. Ramanujan replied that it was in fact quite interesting, as it is the smallest whole number expressible as the sum of two positive cubes in two different ways.

Sure enough, it's true: 1729 = 10^3 + 9^3 = 12^3 + 1^3. This is the kind of thing mathematicians love, and Ramanujan is very well-respected, so this story is popular, and 1729 has even come to be known as the Hardy-Ramanujan number. Mathematicians also enjoy generalizing, so there's now a whole set of taxicab numbers, having to do with summing up powers like the cubes in this example.
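Ramanujan's claim is easy to check by brute force. The sketch below (class and method names are my own) adds up every pair of positive cubes with both bases at or below a limit, and reports the smallest total that turns up at least twice. The answer can be trusted as long as it is smaller than (limit + 1) cubed, since any number below that can only be built from bases within the limit.

```csharp
using System;
using System.Collections.Generic;

public static class Taxicab
{
    // Smallest number that is a sum of two positive cubes a^3 + b^3
    // (1 <= a <= b <= limit) in at least two different ways, or -1 if there is none.
    public static long SmallestTwoWaySum(int limit)
    {
        Dictionary<long, int> ways = new Dictionary<long, int>();
        for (long a = 1; a <= limit; a++)
        {
            for (long b = a; b <= limit; b++)
            {
                long sum = a * a * a + b * b * b;
                int seen;
                ways.TryGetValue(sum, out seen); // seen stays 0 if the sum is new
                ways[sum] = seen + 1;
            }
        }

        long best = -1;
        foreach (KeyValuePair<long, int> pair in ways)
        {
            if (pair.Value >= 2 && (best == -1 || pair.Key < best))
            {
                best = pair.Key;
            }
        }
        return best;
    }

    public static void Main()
    {
        Console.WriteLine(Taxicab.SmallestTwoWaySum(20)); // prints 1729
    }
}
```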

There's another thing about this story that I like to think about - the notion of an interesting number. It seems an obvious and expected thing that some numbers are interesting and others aren't; there's even a Penguin Dictionary of Curious and Interesting Numbers. This fact about 1729 makes it interesting (at least to me), and I can imagine some other number that has no similar interesting properties.

However, something odd happens if you try to actually find an uninteresting number. Let's just look at whole numbers, starting with zero. 0 is interesting because it is the additive identity, among other reasons. 1 is interesting because it's the multiplicative identity. 2 is the first prime number. 3 is the first odd prime number. 4 is 2^2. We can continue this way until we find our first uninteresting number. But wait! The first uninteresting number seems like an interesting property for our number to have, so it turns out to be interesting after all.

This seems like a bit of a trick, and maybe it is. There's some kind of paradox or weird self-reference at work here; our decision that a number is uninteresting is what makes it interesting. You can come up with more tricks like this without too much trouble. Here's another quick one, just to make the point: what is the smallest whole number that is not describable in twenty or fewer words?

With 20 words, you can describe a lot of numbers. One hundred; fifty million and three; three googol squared; Steve Wozniak's bank balance. However, there are only finitely many English words, so there are only finitely many combinations of 20 or fewer English words. That means there are more whole numbers than such combinations, so some numbers are not describable in 20 or fewer words, and there must be a smallest one of these. But wait! "The smallest whole number that is not describable in twenty or fewer words" is only 13 words long, so this number is describable in fewer than 20 words after all.

OK, it's another nice little trick. These are fun little diversions, and they've been known for a long time. They just seem like curiosities, though, without much real meaning or importance in more concrete matters. You wouldn't think that this idea of self-reference could undermine the foundation of all mathematical thought. It did, but that's a topic for a future post.

Friday, May 1, 2009

100% of all numbers contain a 3

Or: Infinity Weirdness, Part 1.

What percentage of all whole numbers contain at least one digit 3? It seems like a simple enough question. The simplest way to start trying to answer it is to have a look at some numbers, and do some counting.

Let's look at 1-digit numbers first. There are 10 of them: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Only one of these 10 contains a 3, so that's 10%.

Now, let's extend our list to the first 100 whole numbers: 0 through 99. We know that 10% of the numbers from 0 to 9 contain a 3. It's the same 10% with numbers from 10 to 19, because they are just the numbers from 0 to 9 with a 1 attached. Adding a 1 doesn't affect our count of numbers with 3s. Similarly, our count is 10% for 20-29, 40-49, 50-59, and so on. The interesting case is 30-39; obviously, all 100% of these numbers contain a 3. Taken together, our count looks like this:

0 - 9: 10%
10 - 19: 10%
20 - 29: 10%
30 - 39: 100%
40 - 49: 10%
...
90 - 99: 10%

We have 9 sets at 10%, and 1 set at 100%. Average this out, and we find that 19% of the numbers from 0 to 99 contain at least one 3.

The way this step from 1-digit numbers to 2-digit numbers worked gives us an insight into how this works generally. When we add a digit, we're adding each of the digits 0 - 9 to all of our existing numbers. 9 of these 10 digits (0-2, 4-9) have no effect on the count of 3s, and the last digit (3) creates a 100% count.

This means that if X is the fraction of numbers with up to n digits (0 through 10^n - 1) that contain a 3, then the fraction for numbers with up to n+1 digits is 0.9X + 0.1. In more formal notation, this is a recurrence where:

F_{n+1} = 0.9 F_n + 0.1
F_1 = 0.1

This can be expressed as a closed form in the following way (a small induction proof shows it: if F_n = 1 - 0.9^n, then F_{n+1} = 0.9(1 - 0.9^n) + 0.1 = 1 - 0.9^{n+1}):

F_n = 1 - 0.9^n

That's fine; the first few values of this are 10%, 19%, 27.1%. But we're interested in all the whole numbers, and the limit as n goes to infinity here is 100%. So we end up with an odd conclusion: 100% of all whole numbers contain a 3, even though not all whole numbers contain a 3. It's an important difference when dealing with infinite sets - "100%" and "all" don't mean the same thing.
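If you'd rather not take the formula on faith, it can be checked by simply counting. The sketch below (names are my own) counts the numbers from 0 up to 10^n - 1 that contain a 3, and compares that with what the closed form predicts. Multiplying 1 - 0.9^n by 10^n gives 10^n - 9^n, so the comparison can be done in whole numbers with no rounding.

```csharp
using System;

public static class ThreeCounter
{
    // True if the decimal representation of value contains the digit 3.
    public static bool ContainsThree(long value)
    {
        while (value > 0)
        {
            if (value % 10 == 3) return true;
            value /= 10;
        }
        return false;
    }

    // Integer power by repeated multiplication (fine for the small exponents used here).
    private static long Power(long b, int exponent)
    {
        long result = 1;
        for (int i = 0; i < exponent; i++) result *= b;
        return result;
    }

    // Counts numbers in 0 .. 10^digits - 1 containing a 3 by checking each one.
    public static long CountByBruteForce(int digits)
    {
        long limit = Power(10, digits);
        long count = 0;
        for (long v = 0; v < limit; v++)
        {
            if (ContainsThree(v)) count++;
        }
        return count;
    }

    // Same count from the closed form: 10^n * (1 - 0.9^n) = 10^n - 9^n.
    public static long CountByClosedForm(int digits)
    {
        return Power(10, digits) - Power(9, digits);
    }

    public static void Main()
    {
        for (int n = 1; n <= 6; n++)
        {
            long brute = CountByBruteForce(n);
            long closed = CountByClosedForm(n);
            double percent = 100.0 * brute / Power(10, n);
            Console.WriteLine("n={0}: {1} counted, {2} predicted, {3:F4}%", n, brute, closed, percent);
        }
    }
}
```

For n = 2 this counts 19 of the 100 numbers, matching the 19% worked out by hand above, and the percentage keeps creeping toward 100 as n grows.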