Saturday, May 21, 2011

Introducing Clark Robert Eisler

Clark Robert Eisler, born 1:21 am May 17, weighing 6lb 11oz.

Thursday, February 17, 2011

Using a SQL Table of Numbers

An application we work with has fields where the user can select multiple values for a single record. For example, for a contact's interests, the user might select Golf, Poker, and Tennis, and for a contact's groups, the user might select Customer and Friend. These values are stored as a comma-separated list in the appropriate contact field, as in:
CustID  CustName       GroupList
1       Joe Smith      Customer,Reseller
2       John Doe       Competitor,Prospect,Reseller
3       Jane Doe       Customer,Friend
4       Guy Incognito  Competitor
This works well from a data-entry perspective, but it causes some trouble for reporting. First, filtering on these values is awkward. To find everyone marked as a Customer, we need to use a contains operation rather than equals, and we need to account for the case where one option is a substring of another: if one of our group options is Former Customer, we don't want it showing up in a search for Customer. This is easy enough to work around, but it is an extra step to deal with.
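One simple workaround (just a sketch, assuming the lists are comma-separated with no spaces around the commas, as in the example above) is to wrap both the list and the search value in delimiters before doing the contains check:

* Workaround sketch: wrapping both sides in commas keeps "Customer"
* from matching inside "Former Customer"; alltrim() drops any trailing
* blanks from a fixed-width character field.
select cust.custid, cust.custname
from cust
where ( "," + "Customer" + "," ) $ ( "," + alltrim( cust.grouplist ) + "," )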

The larger issue is that users will often want to use these fields to group or summarize a report. In this case, the only real solution is to split the list of values for each contact, and treat them as a child table.

There are a number of approaches that will accomplish this. We can scan through the contact table, creating child records for each contact one at a time, or we can write a handful of select statements to retrieve each contact's first group, second group, and so on, and union them together.
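A rough sketch of that union approach, assuming for illustration that no contact ever has more than three groups (every additional possible group means another select):

* Union sketch - only handles up to three groups per contact.
select custid, padr( GetWordNum( grouplist, 1, "," ), 20 ) as OneGroup
from cust
where GetWordCount( grouplist, "," ) >= 1
union all
select custid, padr( GetWordNum( grouplist, 2, "," ), 20 )
from cust
where GetWordCount( grouplist, "," ) >= 2
union all
select custid, padr( GetWordNum( grouplist, 3, "," ), 20 )
from cust
where GetWordCount( grouplist, "," ) >= 3

That works, but it has to grow every time the list of possible groups does. A simpler approach is to use a table of numbers.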

A table of numbers is just that - a table containing a list of numbers. For our example, we only need a few:
num
1
2
3
4
...
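If you don't already have one handy, it only takes a moment to build. A minimal sketch (the table name ton and field name num are just the convention these examples use):

* Build a small table of numbers - 100 rows is plenty here.
create table ton ( num i )
for i = 1 to 100
    insert into ton ( num ) values ( m.i )
endfor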
That's it - probably the simplest table you'll ever use. With this table, splitting our value lists to create a child table is easy.
select cust.custid,
padr( GetWordNum( cust.grouplist, ton.num, "," ), 20 ) as OneGroup
from cust inner join ton on GetWordCount( cust.grouplist, "," ) >= ton.num
custid  onegroup
1       Customer
1       Reseller
2       Competitor
2       Prospect
2       Reseller
3       Customer
3       Friend
4       Competitor
This is made particularly easy by VFP's GetWordNum and GetWordCount functions, but a similar idea can be used without these functions.
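For example, here is a rough T-SQL sketch of the same split (the table and column names simply mirror the VFP example): join on each position in the list that immediately follows a comma, then take the text from that position up to the next comma. Note that with this approach the table of numbers has to cover the length of the longest list in characters, not just the largest number of items.

-- Rough T-SQL sketch of the split without GetWordNum/GetWordCount.
-- ',' + GroupList marks item starts; GroupList + ',' marks item ends.
SELECT c.CustID,
       SUBSTRING(c.GroupList, n.num,
                 CHARINDEX(',', c.GroupList + ',', n.num) - n.num) AS OneGroup
FROM Cust AS c
JOIN Ton AS n
  ON n.num <= LEN(c.GroupList)
 AND SUBSTRING(',' + c.GroupList, n.num, 1) = ','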

Another trick with a table of numbers is to "fill in" a range of discrete values, such as a range of dates. Consider the following table of sales totals.
dDate        amount
02/15/2011  1500.00
02/16/2011  4000.00
02/18/2011   750.00
This table might be the result of aggregating the sales records for the week of Feb. 13 - 19. Dates where there were no sales do not show up, but we might want to include them with a zero in the amount field. A table of numbers makes it simple to generate the list of dates:
* ton.num starts at 1, so dStart is set to the day before the range we want
dStart = {^2011-02-12}
dEnd = {^2011-02-19}
nDays = dEnd - dStart

select dStart + ton.num as dDate from ton where ton.num <= nDays
We simply outer join this to our sales table to complete our date range.
select AllDates.dDate, nvl( sales.amount, cast(0 as Y) ) as amount
from ( select dStart + ton.num as dDate from ton where ton.num <= nDays ) as AllDates
left outer join sales on AllDates.dDate = sales.dDate

dDate        amount
02/13/2011     0.00
02/14/2011     0.00
02/15/2011  1500.00
02/16/2011  4000.00
02/17/2011     0.00
02/18/2011   750.00
02/19/2011     0.00

Wednesday, February 16, 2011

Super Bowl Notes

A couple of items came up during the Super Bowl that touch on posts I've made here in the past. First, it was mentioned repeatedly during the game broadcast that although Green Bay ended up with a somewhat mediocre record of 10-6, all of their losses were extremely competitive. Green Bay never trailed by more than 7 points at any point in any game all season, and their 6 losses were by 3, 3, 3, 3, 4, and 4 points, for an average of 3.33. This average is one of the all-time lows, and compares well to the teams mentioned in my Close But No Cigar post from a year ago.

The second item has to do with the endgame strategy surrounding Green Bay's field goal attempt at the end of the game. Holding a three-point lead, Green Bay faced fourth down from the Steelers' 5-yard line with just over 2 minutes left. This is a classic situation where statisticians claim teams are routinely too conservative: kicking the field goal hands the ball back to Pittsburgh, while going for it either puts the game away with a touchdown or, if it fails, pins Pittsburgh deep in their own end of the field.

That kind of analysis is not too unusual; similar arguments about fourth downs come up many times each season. However, the Advanced NFL Stats site added an interesting twist. That site has a "win probability" engine: for any given game situation, it gives the probability of each team winning, based on historical outcomes from similar situations. For the situation after the Green Bay field goal (about 2 minutes left, Pittsburgh with the ball deep in their own end, down 6), the trailing team is expected to win 25% of the time. For the same situation with the team trailing by 3 instead of 6, the trailing team is expected to win only 21% of the time.

This looks like something along the lines of my 13 is worth more than 14 post from 2009. How can teams trailing by 3 in this situation be less successful than teams trailing by more? It may be that teams trailing by 3 put too much value on reaching overtime, so they play to tie the game rather than to win it outright; teams that are further behind are forced to avoid that bad strategy. On the other side of the ball, defenses may also play differently when they feel they have a safer lead, which may be a poor strategy as well.

In any case, it's another interesting example of a counterintuitive statistic.

Monday, February 14, 2011

Owen

Here are a few recent pictures of Owen.

Watching the Super Bowl.

Showing mom his iPad skills.

Playing ball on Christmas.

Saturday, January 22, 2011

Reading List Again

Owen decided to get up at 3:30 this morning, so I thought I might as well write a post while I watch him play. It's a little early to try to write anything complicated, so I'll just do a reading list again.

Currently Reading:

The Drunkard's Walk, Leonard Mlodinow
This is a nice non-mathematical discussion of randomness and probability. It has a good description of a few of the common counterintuitive probability questions, like the Monty Hall Problem and the Two Child Problem.

Anton, Dale Eisler
See the previous entry.

The Princeton Companion to Mathematics
Ditto.

Recently Read:

American Gods, Neil Gaiman
Great. I really enjoyed this one.

Gödel's Proof, Ernest Nagel and James Newman
Kurt Gödel's Incompleteness Theorem is a great landmark in mathematical thought. Gödel's actual paper is quite technical, but the main ideas can be understood without going too deep into the details. This book is a nice explanation of Gödel's proof and its implications.

In the "To Read" Pile:

A Spark at the End of Summer, David Glen Kerr
This is the first book in the Creation Myth series. I went to school with the author, and I'm looking forward to getting into this one.

Neverwhere, Neil Gaiman
Room, Emma Donoghue
Full Dark No Stars, Stephen King
Lisey's Story, Stephen King
Shalimar the Clown, Salman Rushdie
A Mathematical Nature Walk, John Adam

Also, recently I've been reading a lot of Curious George for some reason :).