Thursday, February 17, 2011

Using a SQL Table of Numbers

An application that we work with has certain fields where the user is able to select multiple values for a single record. For example, for a contact's interests, the user might select Golf, Poker, and Tennis, and for a contact's groups, the user might select Customer and Friend. These values are stored as a list in the appropriate contact field, as in:
CustID  CustName       GroupList
1 Joe Smith Customer,Reseller
2 John Doe Competitor,Prospect,Reseller
3 Jane Doe Customer,Friend
4 Guy Incognito Competitor
This works well from a data-entry perspective, but from our point of view in reporting, it causes some trouble. First, filtering on these values is a bit of an issue. If we want to find everyone who is marked as a Customer, we need to use a contains operation, rather than equals. With this, we need to be sure to account for the case where one of the options is a substring of another. If one of our group options is Former Customer, we don't want it showing up in a search for Customer. This is easy enough to work around, but it is an extra step to deal with.

The larger issue is that users will often want to use these fields to group or summarize a report. In this case, the only real solution is to split the list of values for each contact, and treat them as a child table.

There are a number of approaches that will accomplish this. We can scan through the contact table, creating child records for each contact one by one. We can create a few select statements to retrieve each contact's first group, second group, etc, and then union them together. However, a simpler approach is to use a table of numbers.

A table of numbers is just that - a table containing a list of numbers. For our example, we only need a few:
num
1
2
3
4
...
That's it - probably the simplest table you'll ever use. With this table, splitting our value lists to create a child table is easy.
select cust.custid,
padr( GetWordNum( cust.grouplist, ton.num, "," ), 20 ) as OneGroup
from cust inner join ton on GetWordCount( cust.grouplist, "," ) >= ton.num
custid  onegroup
1 Customer
1 Reseller
2 Competitor
2 Prospect
2 Reseller
3 Customer
3 Friend
4 Competitor
This is made particularly easy by VFP's GetWordNum and GetWordCount functions, but a similar idea can be used without these functions.

Another trick with a table of numbers is to "fill in" a range of discrete values, such as a range of dates. Consider the following table of sales totals.
dDate        amount
02/15/2011 1500.00
02/16/2011 4000.00
02/18/2011 750.00
This table might be the result of aggregating the sales records for the week of Feb. 13 - 19. Dates where there were no sales do not show up, but we might want to include them with a zero in the amount field. A table of numbers makes it simple to generate the list of dates:
dStart = {^2011-02-12}
dEnd = {^2011-02-19}
nDays = dEnd - dStart

select dStart + ton.num as dDate from ton where ton.num <= nDays
We simply outer join this to our sales table to complete our date range.
select AllDates.dDate, nvl( sales.amount, cast(0 as Y) ) as amount
from ( select dStart + ton.num as dDate from ton where ton.num <= nDays ) as AllDates
left outer join sales on AllDates.dDate = sales.dDate

dDate amount
02/13/2011 0.00
02/14/2011 0.00
02/15/2011 1500.00
02/16/2011 4000.00
02/17/2011 0.00
02/18/2011 750.00
02/19/2011 0.00

2 comments:

  1. Another trick you might consider for this type of problem is temporary tables, where you split out the comma-separated fields into normalized tables. Then you or your users can write simpler queries.

    Personally, I *hate* comma separated data in my SQL. Thanks for illustrating how to deal with it, I never saw the PADR function before (hopefully it is in MySQL)

    ReplyDelete
  2. I know this is an old thread, but can I make a couple of points in reply to Dave Kerr's question about PADR().

    If a given DBMS doesn't support PADR(), you can always achieve the same effect by concatenating a string of spaces to the original value, then using LEFT() to truncate to the required length. (Always assuming the DBMS supports LEFT()). Similarly, add the spaces to the left of the original string, and use RIGHT() to truncate; this is equivalent to PADL().

    Second point: In most systems, you wouldn't need PADR() in these circumstances. Visual FoxPro needs it because, when creating a result set, it sets the column width to the width of the value in the first row. So you need to always ensure that the first value is wide enough to hold any possible result.

    Other DBMSs will set the column width as wide as necessary to hold ALL the values. So no special action is needed for the first row.

    Hope this is of interest.

    Mike

    ReplyDelete