Search The Web

Today's Headlines

Monday, March 15, 2010

Microsoft Access Tips & Tricks: Find Unmatched Rows

Finding unmatched rows is in some senses the opposite of doing a join between two tables. Instead of finding rows that match (which is what you do when you perform a join), you are trying to find the rows in one table that do not have a match in the other table. It is a very useful concept since we need to find unmatched rows in several practical instances. For instance, if you have a table of customers and another table of customer orders, you can use an unmatched query to find a list of customers with no orders.

If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, calculating distinct counts, full outer joins, parameter queries, crosstab queries, working with system objects, and listing table fields.

Before we look at how to find unmatched rows, let us look at a few diagrams (fashioned after Venn diagrams) to understand exactly what we are attempting to do.

Consider two tables in our database that have some elements in common based on some key values (the entire rows in the two tables will probably not be identical. If they are, there is probably something fundamentally wrong with the structure of the database, but that is another topic for another day). This is represented below by the two circles that intersect and overlap in the middle. The circle to the left represents the contents of table A, while the circle to the right represents the contents of table B.
tableA and tableB
Now, when we perform an inner join on the two tables, we are picking out the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.
Inner Join
When we perform a left outer join on the two tables, we are picking all of table A along with the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.
Left Outer Join
Similarly, when we perform a right outer join on the two tables, we are picking all of table B along with the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.
Right Outer Join
In a full outer join, as we saw in this earlier post, we pick all rows of both tables, but make sure the rows whose key values match combine the information from both tables. In some senses, a full outer join selects all the information in both tables, and can therefore be represented by the diagram below, but make sure you understand how this differs from two other concepts:
  • a cartesian join between the two tables
  • a select from both tables using a UNION between them
In both the above cases, the information in the rows that share the same key information in both tables are not combined into one row in the output (actually in the case of a cartesian join, they are, but the cartesian join also combines information into output rows where the keys don't match). This combining of the information into the same row in the output is represented by the slightly darker shade of blue in the area where the circles representing the two tables intersect.
Full Outer Join
Carrying the Venn diagram analogies further, we can now easily define the output of an unmatched query. Similar to joins, you can have a left unmatched query, a right unmatched query or a full unmatched query.

A left unmatched query includes information from rows in table A that don't have matching key information in table B. This is represented by the blue area in the figure below.
Left Unmatched Query
A right unmatched query includes information from rows in table B that don't have matching key information in table A. This is represented by the blue area in the figure below.
Right Unmatched Query
A full unmatched query includes information from rows in table A that don't have matching key information in table B, and also information from rows in table B that don't have matching key information in table B. This is represented by the blue area in the figure below.
Full Unmatched Query
So, how exactly do we write an unmatched query? Actually there is a query wizard in Access that does just that. When you create a new query, you can choose the "Find Unmatched Query Wizard", and provide the information about your tables and key fields to have Access create an unmatched query automatically for you.

If Access already allows you to write unmatched queries using a wizard, why should we learn how to do it manually? The reason is that Access always writes unmatched queries in only one way. If we understand how an unmatched query is fashioned, we can write it in other ways that may suit our needs better. The query produced by the Access wizard uses a left join. It will look something like the query below:

select tableA.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL

Let us analyze the query briefly to understand why it works as an unmatched query. As explained in the post on full outer joins, a left join selects all rows from the left table (tableA in this case). It then looks for matches of the keyfield in the right table (tableB in this case). Where it finds a match, the information from tableB is appended to the information from tableA to form a joined row. Where such a match is not found (the unmatched rows we want), the information from tableB is NULL. The query above takes advantage of that fact to find unmatched rows. The WHERE clause in the query above filters out rows where matching information exists in tableB, giving us just the unmatched rows.

This query structure, using a join, is very confusing to many beginners. The basic problem is the juxtaposition of the join condition with the WHERE clause which seem to be contradictory to each other. It would seem to imply that somehow we are choosing rows in tableA where the keyfield is NULL because we have the join condition that equates tableA's keyfield to tableB's keyfield and then we have the WHERE condition which selects only rows where tableB's keyfield is NULL. The important thing to remember that in a left join, the join condition is only used to form combined rows of information. It is not used to filter out rows from the left table.

So, this creates a left unmatched query. How do you create a right unmatched query using the wizard? The answer is, you don't! When you use the wizard, you always get a left unmatched query. The order of the tables is changed depending on your inputs, but the join is retained as a left join. Thus, if you actually wanted unmatched query for rows in B that have no match in A, you would get the following query out of the wizard:

select tableB.* from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL

But if you wanted to create a right unmatched query by hand, you can do it easily as below:

select tableB.* from tableA right join tableB on tableA.keyfield = tableB.keyfield
where tableA.keyfield is NULL

There is a actually another way of writing a left or right unmatched query that does not use joins. I find it much easier to use, and a lot more intuitive to boot. This query is written as below:

select tableA.* from tableA
where tableA.keyfield not in (select distinct tableB.keyfield from tableB)

As you can see, the query is easier is much easier to read and understand. We are simply choosing rows of tableA where the keyfield is not one of the keyfields in tableB. We use a "select distinct" in the subquery rather than a simple select simply to reduce the number of elements against which the WHERE clause has to filter.

Why does the Access query wizard choose to perform this using a join when a simpler syntax exists? The reason could simply be that the graphical query display works with joins, but not with subqueries. If you wrote the above query into the SQL view of a query, you would not be able to view it in Design view in Access. This is one more reason I prefer to learn and use SQL directly rather than using the Design view of queries in Access. The Design view has very many limitations, this being just one of them.

As you can see, with this new syntax, there is really no concept of left or right unmatched query. We only have a table from which we want unmatched rows and another table to figure out what the matching rows are. You could consider the query above to be a left unmatched query just to be consistent with the figures we have drawn before. So, if we wanted to write a right unmatched query using the same figures a guidance, we would write it as below:

select tableB.* from tableB
where tableB.keyfield not in (select distinct tableA.keyfield from tableA)

Now that we have dealt with left and right unmatched queries, how do we perform a full unmatched query? Once again, Access does not have a facility to directly do a full unmatched query, just like it does not have a facility to directly do a full outer join. A full unmatched query's results are produced by combining the results of a left unmatched query with a right unmatched query using a UNION statement.

Keep in mind that when using a UNION between two queries, the two queries must pull out the same number of fields each. Otherwise, the UNION is not valid. Therefore, it is best to pull out all the fields you want to see from both tables into each of the queries, then join them with a UNION. The resulting result set will have a set of rows in which the columns to the right (from tableB) will be NULL, and another set of rows in which the columns to the left (from tableA) will be NULL. A query written like that is shown below:

select tableA.*, tableB.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableA.*, tableB.* from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL

The results of the query above can also be recreated by rewriting the query as below:

select tableA.*, tableB.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableA.*, tableB.* from tableA right join tableB on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL

Note that it is difficult to write these queries using a subquery construct because the resultset consists of information from both tables joined together into corresponding rows, and doing that without a join would result in a cartesian product between the tables with a lot of unwanted rows of information.

One could pull out a selected number of columns from tableA as part of the left unmatched query, then pull out an equal number of columns from tableB as part of the right unmatched query, and then combined them with a UNION, but this will make it very difficult to figure out which rows were unmatched in tableA and which were unmatched in tableB. Such a query is shown below:

select tableA.field1, tableA.field2, tableA.field3
from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableB.field3, tableB.field4, tableB.field5
from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL

Note that the number of fields pulled out of the tables in each query is three in this case. That number can be anything, but the numbers have to match across the queries. The actual datatypes of the fields pulled out do not have to match (many people have the mistaken notion that the datatypes have to match across queries in a UNION query. They do not have to match. Try it out for yourself!). I will leave it up to the reader to write the second query as a right join if that is their preference! Note that the query above can be written using subqueries without any problem. That syntax is shown below:

select tableA.field1, tableA.field2, tableA.field3 from tableA
where tableA.keyfield not in (select distinct tableB.keyfield from tableB)
UNION
select tableB.field3, tableB.field4, tableB.field5 from tableB
where tableB.keyfield not in (select distinct tableA.keyfield from tableA)

Hope this post has been helpful in solving any problems you might have had with unmatched queries in Access. If you have any problems or concerns with the SQL in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!

5 comments:

Anonymous said...

This is a really great explanation. So many of us get confused about left and right outer joins. I always have to stop and think about it, but this article is the first I've seen to use such clear language and useful graphics. Thanks!
One thing that I would like to see added is a discussion about finding unmatched rows when there the key between tableA and tableB is multiple columns. In that case an outer join works, but a subquery is less intuitive. I think you can use "NOT exists" in that case:
select tableA.*
from tableA as A
where NOT exists
(select tableB.*
from tableB
where A.col1 = col1
and A.col2 = col2)

Blogannath said...

Thank you for your comment. I am glad you like the graphics, which took a while to make!

I agree with you about joins that use multiple key fields. In that case an outer join leads to much simpler SQL than the subquery approach.

But that is just one more reason in my book never to have a multi-key table! Every table should have a single primary key and all related tables should have that single primary key as the foreign key. I know that even in the best-designed systems, some junction tables could end up having multiple keys, but it is best to keep such tables to a minimum and not involve them in too many joins if at all possible. And access makes it easy to create single keys for tables with its autonumber feature.

Thank you for posting your SQL that uses the NOT exists clause in a subquery. Another option that might be feasible is combining the keys together into a single key by just concatenating them. Obviously, you have to pad numbers with zeroes in front and strings with spaces in front to make them all the same length, otherwise we might get false matches.

徐正泰 said...

So amazingly helpful. Totally solved my problem. Thanks a lot.

Ben Lee said...

Thank you!!!! You solved my problem with a left join query.

Zoran Djordjevic said...

This is the clearest explanation of joins and unmatched queries I read in years. Thank you for taking the time to write and post this.

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web