Here are some new photos of Owen. He's six months old now!
Saturday, October 24, 2009
Wednesday, October 21, 2009
SQL Subqueries with Null Values
Null values in data can often cause unexpected results. Recently I came across a case where the field in a NOT IN subquery contained null values, and I didn't get the behaviour I was expecting. After spending some time with this, I have it worked out, and the behaviour does make sense. It's worth having a look at.
We commonly use a NOT IN subquery to retrieve records that do not have a related record in another table. The common example of this type of query is to retrieve all customers that do not have any orders. Here's a simple example of a Customers table and an Orders table:
select * from Customers where CustID in (select CustID from Orders)
2 Jane Doe
3 Jack Smith
select * from Customers where CustID not in (select CustID from Orders)
1 John Doe
4 Jane Smith
This is as we would expect. However, let's add a fourth record to the Orders table, with a null value for the CustID:
select * from Customers where CustID not in (select CustID from Orders)
No records returned
Why are there no records returned? Shouldn't we still be getting customers 1 and 4, since these CustIDs do not appear in the Orders table? Well, let's look at how this gets handled. The subquery generates a list of CustIDs, like this:
select * from Customers where CustID not in ( 2, 3, null )
Logically, the NOT IN is treated as a series of not equals expressions, like this:
select * from Customers where ( CustID <> 2 and CustID <> 3 and CustID <> null )
Now, we can consider how this evaluates for our customer records. For customer 1, the where clause becomes:
1 <> 2 and 1 <> 3 and 1 <> null
= true and true and null
= null
This is why record 1 doesn't appear in the result set; for records to appear, the where clause must evaluate to true, not to null.
Now that I've gone through the logic on this, it's not really correct to say that customers 1 and 4 don't have any orders. The null CustID value in the Orders table means that the customer for that order is unknown, so we can't guarantee that this order doesn't belong to customer 1 or 4.
We commonly use a NOT IN subquery to retrieve records that do not have a related record in another table. The common example of this type of query is to retrieve all customers that do not have any orders. Here's a simple example of a Customers table and an Orders table:
Customers OrdersThere are four customers, and two of these customers have placed orders. We can use IN and NOT IN subqueries like these:
CustID FullName OrderID CustID OrderDate
1 John Doe 1 2 1/1/2009
2 Jane Doe 2 3 1/2/2009
3 Jack Smith 3 3 1/3/2009
4 Jane Smith
select * from Customers where CustID in (select CustID from Orders)
2 Jane Doe
3 Jack Smith
select * from Customers where CustID not in (select CustID from Orders)
1 John Doe
4 Jane Smith
This is as we would expect. However, let's add a fourth record to the Orders table, with a null value for the CustID:
OrderID CustID OrderDateNow, when we run the IN query, the results are unchanged; we still get customers 2 and 3. However, when we run the NOT IN query:
4 null 1/4/2009
select * from Customers where CustID not in (select CustID from Orders)
No records returned
Why are there no records returned? Shouldn't we still be getting customers 1 and 4, since these CustIDs do not appear in the Orders table? Well, let's look at how this gets handled. The subquery generates a list of CustIDs, like this:
select * from Customers where CustID not in ( 2, 3, null )
Logically, the NOT IN is treated as a series of not equals expressions, like this:
select * from Customers where ( CustID <> 2 and CustID <> 3 and CustID <> null )
Now, we can consider how this evaluates for our customer records. For customer 1, the where clause becomes:
1 <> 2 and 1 <> 3 and 1 <> null
= true and true and null
= null
This is why record 1 doesn't appear in the result set; for records to appear, the where clause must evaluate to true, not to null.
Now that I've gone through the logic on this, it's not really correct to say that customers 1 and 4 don't have any orders. The null CustID value in the Orders table means that the customer for that order is unknown, so we can't guarantee that this order doesn't belong to customer 1 or 4.
Subscribe to:
Posts (Atom)