If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, calculating distinct counts, full outer joins, parameter queries, crosstab queries, working with system objects, and listing table fields.
Before we look at how to find unmatched rows, let us look at a few diagrams (fashioned after Venn diagrams) to understand exactly what we are attempting to do.
Consider two tables in our database that have some elements in common based on some key values (the entire rows in the two tables will probably not be identical. If they are, there is probably something fundamentally wrong with the structure of the database, but that is another topic for another day). This is represented below by the two circles that intersect and overlap in the middle. The circle to the left represents the contents of table A, while the circle to the right represents the contents of table B.

Now, when we perform an inner join on the two tables, we are picking out the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.

When we perform a left outer join on the two tables, we are picking all of table A along with the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.

Similarly, when we perform a right outer join on the two tables, we are picking all of table B along with the rows in the middle whose key values are common between both tables A and B. In the diagram below, the area chosen is colored blue.

In a full outer join, as we saw in this earlier post, we pick all rows of both tables, but make sure the rows whose key values match combine the information from both tables. In some senses, a full outer join selects all the information in both tables, and can therefore be represented by the diagram below, but make sure you understand how this differs from two other concepts:
- a cartesian join between the two tables
- a select from both tables using a UNION between them

Carrying the Venn diagram analogies further, we can now easily define the output of an unmatched query. Similar to joins, you can have a left unmatched query, a right unmatched query or a full unmatched query.
A left unmatched query includes information from rows in table A that don't have matching key information in table B. This is represented by the blue area in the figure below.

A right unmatched query includes information from rows in table B that don't have matching key information in table A. This is represented by the blue area in the figure below.

A full unmatched query includes information from rows in table A that don't have matching key information in table B, and also information from rows in table B that don't have matching key information in table B. This is represented by the blue area in the figure below.

So, how exactly do we write an unmatched query? Actually there is a query wizard in Access that does just that. When you create a new query, you can choose the "Find Unmatched Query Wizard", and provide the information about your tables and key fields to have Access create an unmatched query automatically for you.
If Access already allows you to write unmatched queries using a wizard, why should we learn how to do it manually? The reason is that Access always writes unmatched queries in only one way. If we understand how an unmatched query is fashioned, we can write it in other ways that may suit our needs better. The query produced by the Access wizard uses a left join. It will look something like the query below:
select tableA.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
Let us analyze the query briefly to understand why it works as an unmatched query. As explained in the post on full outer joins, a left join selects all rows from the left table (tableA in this case). It then looks for matches of the keyfield in the right table (tableB in this case). Where it finds a match, the information from tableB is appended to the information from tableA to form a joined row. Where such a match is not found (the unmatched rows we want), the information from tableB is NULL. The query above takes advantage of that fact to find unmatched rows. The WHERE clause in the query above filters out rows where matching information exists in tableB, giving us just the unmatched rows.
This query structure, using a join, is very confusing to many beginners. The basic problem is the juxtaposition of the join condition with the WHERE clause which seem to be contradictory to each other. It would seem to imply that somehow we are choosing rows in tableA where the keyfield is NULL because we have the join condition that equates tableA's keyfield to tableB's keyfield and then we have the WHERE condition which selects only rows where tableB's keyfield is NULL. The important thing to remember that in a left join, the join condition is only used to form combined rows of information. It is not used to filter out rows from the left table.
So, this creates a left unmatched query. How do you create a right unmatched query using the wizard? The answer is, you don't! When you use the wizard, you always get a left unmatched query. The order of the tables is changed depending on your inputs, but the join is retained as a left join. Thus, if you actually wanted unmatched query for rows in B that have no match in A, you would get the following query out of the wizard:
select tableB.* from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL
But if you wanted to create a right unmatched query by hand, you can do it easily as below:
select tableB.* from tableA right join tableB on tableA.keyfield = tableB.keyfield
where tableA.keyfield is NULL
There is a actually another way of writing a left or right unmatched query that does not use joins. I find it much easier to use, and a lot more intuitive to boot. This query is written as below:
select tableA.* from tableA
where tableA.keyfield not in (select distinct tableB.keyfield from tableB)
As you can see, the query is easier is much easier to read and understand. We are simply choosing rows of tableA where the keyfield is not one of the keyfields in tableB. We use a "select distinct" in the subquery rather than a simple select simply to reduce the number of elements against which the WHERE clause has to filter.
Why does the Access query wizard choose to perform this using a join when a simpler syntax exists? The reason could simply be that the graphical query display works with joins, but not with subqueries. If you wrote the above query into the SQL view of a query, you would not be able to view it in Design view in Access. This is one more reason I prefer to learn and use SQL directly rather than using the Design view of queries in Access. The Design view has very many limitations, this being just one of them.
As you can see, with this new syntax, there is really no concept of left or right unmatched query. We only have a table from which we want unmatched rows and another table to figure out what the matching rows are. You could consider the query above to be a left unmatched query just to be consistent with the figures we have drawn before. So, if we wanted to write a right unmatched query using the same figures a guidance, we would write it as below:
select tableB.* from tableB
where tableB.keyfield not in (select distinct tableA.keyfield from tableA)
Now that we have dealt with left and right unmatched queries, how do we perform a full unmatched query? Once again, Access does not have a facility to directly do a full unmatched query, just like it does not have a facility to directly do a full outer join. A full unmatched query's results are produced by combining the results of a left unmatched query with a right unmatched query using a UNION statement.
Keep in mind that when using a UNION between two queries, the two queries must pull out the same number of fields each. Otherwise, the UNION is not valid. Therefore, it is best to pull out all the fields you want to see from both tables into each of the queries, then join them with a UNION. The resulting result set will have a set of rows in which the columns to the right (from tableB) will be NULL, and another set of rows in which the columns to the left (from tableA) will be NULL. A query written like that is shown below:
select tableA.*, tableB.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableA.*, tableB.* from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL
The results of the query above can also be recreated by rewriting the query as below:
select tableA.*, tableB.* from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableA.*, tableB.* from tableA right join tableB on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL
Note that it is difficult to write these queries using a subquery construct because the resultset consists of information from both tables joined together into corresponding rows, and doing that without a join would result in a cartesian product between the tables with a lot of unwanted rows of information.
One could pull out a selected number of columns from tableA as part of the left unmatched query, then pull out an equal number of columns from tableB as part of the right unmatched query, and then combined them with a UNION, but this will make it very difficult to figure out which rows were unmatched in tableA and which were unmatched in tableB. Such a query is shown below:
select tableA.field1, tableA.field2, tableA.field3
from tableA left join tableB on tableA.keyfield = tableB.keyfield
where tableB.keyfield is NULL
UNION
select tableB.field3, tableB.field4, tableB.field5
from tableB left join tableA on tableB.keyfield = tableA.keyfield
where tableA.keyfield is NULL
Note that the number of fields pulled out of the tables in each query is three in this case. That number can be anything, but the numbers have to match across the queries. The actual datatypes of the fields pulled out do not have to match (many people have the mistaken notion that the datatypes have to match across queries in a UNION query. They do not have to match. Try it out for yourself!). I will leave it up to the reader to write the second query as a right join if that is their preference! Note that the query above can be written using subqueries without any problem. That syntax is shown below:
select tableA.field1, tableA.field2, tableA.field3 from tableA
where tableA.keyfield not in (select distinct tableB.keyfield from tableB)
UNION
select tableB.field3, tableB.field4, tableB.field5 from tableB
where tableB.keyfield not in (select distinct tableA.keyfield from tableA)
Hope this post has been helpful in solving any problems you might have had with unmatched queries in Access. If you have any problems or concerns with the SQL in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!
















2 comments:
This is a really great explanation. So many of us get confused about left and right outer joins. I always have to stop and think about it, but this article is the first I've seen to use such clear language and useful graphics. Thanks!
One thing that I would like to see added is a discussion about finding unmatched rows when there the key between tableA and tableB is multiple columns. In that case an outer join works, but a subquery is less intuitive. I think you can use "NOT exists" in that case:
select tableA.*
from tableA as A
where NOT exists
(select tableB.*
from tableB
where A.col1 = col1
and A.col2 = col2)
Thank you for your comment. I am glad you like the graphics, which took a while to make!
I agree with you about joins that use multiple key fields. In that case an outer join leads to much simpler SQL than the subquery approach.
But that is just one more reason in my book never to have a multi-key table! Every table should have a single primary key and all related tables should have that single primary key as the foreign key. I know that even in the best-designed systems, some junction tables could end up having multiple keys, but it is best to keep such tables to a minimum and not involve them in too many joins if at all possible. And access makes it easy to create single keys for tables with its autonumber feature.
Thank you for posting your SQL that uses the NOT exists clause in a subquery. Another option that might be feasible is combining the keys together into a single key by just concatenating them. Obviously, you have to pad numbers with zeroes in front and strings with spaces in front to make them all the same length, otherwise we might get false matches.
Post a Comment