Saturday, May 21, 2011

Introducing Clark Robert Eisler

Clark Robert Eisler, born at 1:21 am on May 17, weighing 6 lb 11 oz.

Thursday, February 17, 2011

Using a SQL Table of Numbers

An application that we work with has certain fields where the user is able to select multiple values for a single record. For example, for a contact's interests, the user might select Golf, Poker, and Tennis, and for a contact's groups, the user might select Customer and Friend. These values are stored as a list in the appropriate contact field, as in:
CustID  CustName       GroupList
1       Joe Smith      Customer,Reseller
2       John Doe       Competitor,Prospect,Reseller
3       Jane Doe       Customer,Friend
4       Guy Incognito  Competitor
This works well from a data-entry perspective, but from our point of view in reporting, it causes some trouble. First, filtering on these values is a bit of an issue. If we want to find everyone who is marked as a Customer, we need to use a contains operation, rather than equals. With this, we need to be sure to account for the case where one of the options is a substring of another. If one of our group options is Former Customer, we don't want it showing up in a search for Customer. This is easy enough to work around, but it is an extra step to deal with.
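A quick sketch in Python (the report examples below use VFP, but the pitfall is language-independent) shows why the substring case matters; the data and the helper function are hypothetical:

```python
group_list = "Former Customer,Reseller"

# A naive substring test finds "Customer" inside "Former Customer":
print("Customer" in group_list)  # True -- a false positive

# Padding both sides with the delimiter avoids the false match:
def has_group(group_list, group):
    return "," + group + "," in "," + group_list + ","

print(has_group(group_list, "Former Customer"))  # True
print(has_group(group_list, "Customer"))         # False
```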

The larger issue is that users will often want to use these fields to group or summarize a report. In this case, the only real solution is to split the list of values for each contact, and treat them as a child table.

There are a number of approaches that will accomplish this. We can scan through the contact table, creating child records for each contact one by one. We can write a series of select statements to retrieve each contact's first group, second group, and so on, and then union them together. However, a simpler approach is to use a table of numbers.

A table of numbers is just that - a table containing a list of numbers. For our example, we only need a few:
num
1
2
3
4
...
That's it - probably the simplest table you'll ever use. With this table, splitting our value lists to create a child table is easy.
select cust.custid,
       padr( GetWordNum( cust.grouplist, ton.num, "," ), 20 ) as OneGroup
  from cust
       inner join ton on GetWordCount( cust.grouplist, "," ) >= ton.num

custid  onegroup
1       Customer
1       Reseller
2       Competitor
2       Prospect
2       Reseller
3       Customer
3       Friend
4       Competitor
This is made particularly easy by VFP's GetWordNum and GetWordCount functions, but a similar idea can be used without these functions.
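As a sketch of the same idea outside VFP, here is a rough Python equivalent of the query, with the sample contacts hard-coded; the list comprehension-free loop mirrors the join condition directly:

```python
# Hypothetical rows mirroring the cust table above
cust = [
    (1, "Joe Smith",     "Customer,Reseller"),
    (2, "John Doe",      "Competitor,Prospect,Reseller"),
    (3, "Jane Doe",      "Customer,Friend"),
    (4, "Guy Incognito", "Competitor"),
]
ton = range(1, 5)  # the table of numbers: 1, 2, 3, 4

# inner join cust to ton on "word count >= num", then pick word number num
child = []
for custid, name, grouplist in cust:
    groups = grouplist.split(",")
    for num in ton:
        if len(groups) >= num:                        # GetWordCount(...) >= ton.num
            child.append((custid, groups[num - 1]))   # GetWordNum(..., num, ",")

for row in child:
    print(row)
```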

Another trick with a table of numbers is to "fill in" a range of discrete values, such as a range of dates. Consider the following table of sales totals.
dDate       amount
02/15/2011  1500.00
02/16/2011  4000.00
02/18/2011   750.00
This table might be the result of aggregating the sales records for the week of Feb. 13 - 19. Dates where there were no sales do not show up, but we might want to include them with a zero in the amount field. A table of numbers makes it simple to generate the list of dates; note that dStart is set to the day before the range begins, since ton.num starts at 1:
dStart = {^2011-02-12}
dEnd = {^2011-02-19}
nDays = dEnd - dStart

select dStart + ton.num as dDate from ton where ton.num <= nDays
We simply outer join this to our sales table to complete our date range.
select AllDates.dDate, nvl( sales.amount, cast(0 as Y) ) as amount
from ( select dStart + ton.num as dDate from ton where ton.num <= nDays ) as AllDates
left outer join sales on AllDates.dDate = sales.dDate

dDate       amount
02/13/2011     0.00
02/14/2011     0.00
02/15/2011  1500.00
02/16/2011  4000.00
02/17/2011     0.00
02/18/2011   750.00
02/19/2011     0.00
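For comparison, the same fill-in can be sketched in Python; the sales figures below are the sample data from above, and the `range()` call plays the role of the ton table:

```python
from datetime import date, timedelta

d_start = date(2011, 2, 12)   # day before the range, since num starts at 1
d_end   = date(2011, 2, 19)
n_days  = (d_end - d_start).days

sales = {date(2011, 2, 15): 1500.00,
         date(2011, 2, 16): 4000.00,
         date(2011, 2, 18):  750.00}

# generate the full list of dates from the "table of numbers"
all_dates = [d_start + timedelta(days=num) for num in range(1, n_days + 1)]

# left outer join with a zero default, like nvl(sales.amount, 0)
filled = [(d, sales.get(d, 0.00)) for d in all_dates]
for d, amount in filled:
    print(d, amount)
```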

Wednesday, February 16, 2011

Super Bowl Notes

A couple items came up during the Super Bowl that touch on posts I've made here in the past. First, it was repeatedly mentioned during the game broadcast that although Green Bay ended up with a somewhat mediocre record of 10-6, all their losses were extremely competitive. Green Bay never trailed by more than 7 points during any game at any point in the season, and their 6 losses were by 3, 3, 3, 3, 4, and 4 points, for an average of 3.33. This average is one of the all-time lows, and compares well to the teams mentioned in my Close But No Cigar post from a year ago.

The second item has to do with the endgame strategy surrounding Green Bay's field goal attempt at the end of the game. Holding a three point lead, Green Bay had fourth down from the Steelers' 5 yard line with just over 2 minutes left in the game. This is a classic situation where statisticians claim teams are routinely too conservative. Kicking the field goal hands the ball back to Pittsburgh, while going for it either wins the game with a touchdown, or pins Pittsburgh deep in their own end of the field.

That kind of analysis is not too unusual; similar arguments about 4th downs come up many times each season. However, on the Advanced NFL Stats site, an interesting addition to this argument appeared. That site has a "win probability" engine - for any given game situation, it provides the probability of each team winning, based on historical game outcomes from similar situations. For the game situation after the Green Bay field goal (~2 minutes left, in their own end of the field, down 6), the trailing team is expected to win 25% of the time. For the same situation, but with the team trailing by 3 instead of by 6, they are expected to win 21% of the time.

This looks like something along the lines of my 13 is worth more than 14 post from 2009. How can it be that teams trailing by 3 in this situation are less successful than teams trailing by more? It may be that teams trailing by 3 put too much value on reaching overtime, so they play to tie the game rather than to win it outright. When teams are further behind, they are forced to avoid this bad strategy. Also, on the other side of the ball, defenses may play differently as well, when they feel they have a safer lead. This may also be a poor strategy.

In any case, it's another interesting example of a counterintuitive statistic.

Monday, February 14, 2011

Owen

Here are a few recent pictures of Owen.

Watching the Super Bowl.

Showing mom his iPad skills.

Playing ball on Christmas.

Saturday, January 22, 2011

Reading List Again

Owen decided to get up at 3:30 this morning, so I thought I might as well write a post while I watch him play. It's a little early to try to write anything complicated, so I'll just do a reading list again.

Currently Reading:

The Drunkard's Walk, Leonard Mlodinow
This is a nice non-mathematical discussion of randomness and probability. It has a good description of a few of the common counterintuitive probability questions, like the Monty Hall Problem and the Two Child Problem.

Anton, Dale Eisler
See the previous entry.

The Princeton Companion to Mathematics
Ditto.

Recently Read:

American Gods, Neil Gaiman
Great. I really enjoyed this one.

Gödel's Proof, Ernest Nagel and James Newman
Kurt Gödel's Incompleteness Theorem is a great landmark in mathematical thought. While Gödel's actual paper is quite technical, the ideas are not too difficult to understand without getting too deep into the technical details. This book is a nice explanation of Gödel's proof and its implications.

In the "To Read" Pile:

A Spark at the End of Summer, David Glen Kerr
This is the first book in the Creation Myth series. I went to school with the author, and I'm looking forward to getting into this one.

Neverwhere, Neil Gaiman
Room, Emma Donoghue
Full Dark No Stars, Stephen King
Lisey's Story, Stephen King
Shalimar the Clown, Salman Rushdie
A Mathematical Nature Walk, John Adam

Also, recently I've been reading a lot of Curious George for some reason :).

Thursday, August 19, 2010

Reading List Update

I posted a reading list a few months ago, and I thought I'd post an update.

Currently Reading:

Anton, Dale Eisler
This is my uncle's third book and first novel. It's a fictionalized account of the events surrounding my grandmother's family during the Bolshevik Revolution after the First World War - Anton, the main character, is based on my great-uncle Tony. The full synopsis and other information is given on the book's website.

I'm a couple chapters in, and enjoying the writing so far. It's a bit tough for me to read, since it's very personal for me, and there are some really tragic events that happen in the story. It's going to be an interesting experience to read this.

The Princeton Companion to Mathematics
I've read a few sections here and there, which have been uniformly excellent. It's quite slow going, though, as the topics require full attention and can't really be read casually.

The Dark Tower: The Gunslinger Born
I've only read the first chapter of this, but I'm enjoying it so far.

Recently Read:

Logicomix, Apostolos Doxiadis & Christos Papadimitriou
This is a book that I almost can't believe exists - it's a graphic novel about the life of Bertrand Russell and his involvement in the development of the modern philosophy of logic and mathematics. It's difficult to describe, so I'll just link to the Amazon page.

Under the Dome, Stephen King
I thought this was one of the better recent King books, and was a quick read for a thousand-page book. It was nothing earth-shattering, but I quite enjoyed it. 4/5.

The Book of Basketball, Bill Simmons
I generally like Bill's writing style and share his sense of humour, and this book was no exception. On the other hand, I'm only a casual basketball fan, so the topic itself wasn't great for me. I'd say 5/5 for writing and 3/5 for subject matter, so I guess 4/5 overall.

In the "To Read" Pile:

Shalimar the Clown, Salman Rushdie
Gödel's Proof, Ernest Nagel & James Newman
A Mathematical Nature Walk, John Adam

Monday, March 15, 2010

Chess Query Language

It's amazing how many tools are available on the web for seemingly obscure tasks. Recently, a friend of mine was writing a short story, and he needed an answer to this question: In high-level chess games, how often do the different pieces survive through the game without being captured (ignoring kings)? In the context of this question, each of the 30 starting pieces is treated as distinct; we want to know how often the pawn that starts on the a2 square survives, how often the b2-pawn survives, etc., rather than how often general pawns survive.

I think this qualifies as an obscure question. It seems simple enough to answer in principle - just get a database of games, and write something to play through each game, tracking which pieces survive. Simple enough, but a fair bit of work. Luckily there's a tool that will do this type of thing: Chess Query Language.

CQL is quite powerful, and it's pretty straightforward to set up a CQL query. For example, to answer the above question about survival rates, I started by creating a query to see how often the white queen's rook survives:

:forany Rook R
(:position :initial $Rook[a1])
(:position :terminal $Rook[a-h1-8])

That's it. The first line creates a Rook piece designator; the second and third lines specify positions that have to exist in a game for the game to match the query. Thus the query will match any game where a rook is on the a1 square in the initial game position, and that same rook is somewhere on the board in the terminal game position.

This query took about 45 minutes to run through a database of about 2.5 million games, and found that this rook survived in about 1.4 million of them. I then repeated the query for each piece and pawn to generate the final answer.
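Without CQL, the bookkeeping can be sketched directly. The toy Python tracker below identifies each piece by its starting square and replays a list of (from, to) moves; castling, en passant, and promotion are ignored for brevity, so it's an illustration of the idea rather than a full chess engine:

```python
def surviving_origins(start_squares, moves):
    # Map current square -> starting square of the piece sitting there.
    pieces = {sq: sq for sq in start_squares}
    for frm, to in moves:
        pieces.pop(to, None)              # any piece on the target square is captured
        if frm in pieces:
            pieces[to] = pieces.pop(frm)  # relocate the moving piece
    return set(pieces.values())           # starting squares of the survivors

# The 30 non-king starting squares:
start = [f + r for r in "1278" for f in "abcdefgh"
         if not (f == "e" and r in "18")]

# Scholar's mate: only Black's f7 pawn is captured.
moves = [("e2", "e4"), ("e7", "e5"), ("f1", "c4"), ("b8", "c6"),
         ("d1", "h5"), ("g8", "f6"), ("h5", "f7")]
print(len(surviving_origins(start, moves)))  # 29
```

Run over a large game database (with the special moves handled properly), tallying the returned origin squares would give the per-piece survival rates directly.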

So, I can advise that if you're ever involved in a Harry Potter-style human chess game, you should volunteer to be one of the wing pawns. Don't allow yourself to play as a knight, whatever you do.