Search The Web

Today's Headlines

Monday, February 8, 2010

Microsoft Access Tips & Tricks: Full Outer Joins

Joins are one of the essential features of relational databases. In fact, it would not be an exaggeration to say that relational databases owe most of their power to joins. A properly normalized relational database would be close to impossible to use without joining multiple tables. Only an extremely simple or simplistic database would be capable of producing any usable results without the use of joins between multiple tables.

If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, and calculating distinct counts.

There are five main kinds of joins that are commonly recognized. They are inner joins, left outer joins, right outer joins, full outer joins and cartesian joins. A join usually consists of two tables, but in general, joins can involve any number of tables. In this post, we will deal with all kinds of joins in general before we actually come to the solution of how to code up a full outer join in Microsoft Access.

There are two main ways to code up a join in SQL. One of these ways involves the actual use of the keyword JOIN in the query.

The common syntax for this type of join is as below:
select * from table1 INNER JOIN table2 on table1.keyfield = table2.keyfield

The SQL above obviously codes an inner join. By changing INNER to LEFT or RIGHT, you would be able to code a left or right outer join respectively.
select * from table1 LEFT JOIN table2 on table1.keyfield = table2.keyfield

You can also code left or right outer joins by explicitly mentioning the word OUTER in the join as below:
select * from table1 LEFT OUTER JOIN table2 on table1.keyfield = table2.keyfield

There is a second way to code up an inner join is by using a where clause instead of the actual term JOIN in the SQL. An example of a join that works this way is shown below:
select * from table1, table2 where table1.keyfield = table2.keyfield

Note that in each of these statements above, the two tables could be the same rather than two separate tables in the database. A join that involves joining a table with itself is referred to as a self-join.

So, how does a join actually work? This is clearer when analyzing the statement that uses a WHERE clause, so I will use it to explain how a join works.

What a relational database does when it encounters the SQL statement "select * from table1, table2" is as follows. The database pulls up the first row of table1, and pairs it up with each row of table2. It then goes back, pulls up the second row of table1, and again pairs it up with each row of table2. This process goes on until the database has paired every row of table1 with every row of table2. This kind of join is commonly referred to as a cartesian join or cross-product. If table1 contains row1 rows and table2 contains row2 rows, the result of this cartesian join or cross-product will contain row1 x row2 total rows.

In general, cartesian joins are not very useful. In fact, under most circumstances, cartesian joins can bring a database to its knees because of the resource-intensiveness of a cartesian join involving two large tables. Consider the fact that most tables that contain 10,000 rows would be considered modestly sized in a relational database. However, a cartesian join between two such tables would result in a result set that contains 100 million rows!

The WHERE clause in the second syntax is what converts a cartesian join into a different type of join. The where clause is applied to each row that would result from a cartesian join, and then the rows that return a true for the condition in the where clause, become part of the final result set. The other rows are thrown out. Thus, looking at the actual SQL above, we can see that the WHERE clause would cause the database would compare the keyfields of the two tables and only produce rows in the output where they match exactly. In most cases, this is exactly what we want from a join.

Obviously, all this is done without actually producing the result of the cartesian join first, so if you are sure that your where clause actually codes up an join that will produce a manageable number of rows in the output, it is safe to do it using the second syntax even when the tables involved are large.

This exact mechanism of how a join works is somewhat obscured by the syntax that uses the JOIN keyword in the SQL rather than the WHERE clause. This is one of the reasons why I sometimes prefer the second syntax over the first when I code up inner joins (the syntax is also easier to follow, at least for me). Unfortunately, it is not possible to produce outer joins using the syntax that uses a WHERE clause.

However, it is much easier to code up joins between multiple joins using the WHERE clause syntax. All you have to do is add the additional tables to the list of tables in the FROM clause of the SQL. In the absence of a WHERE clause, such an SQL statement would produce every row from the first table paired with every row from the second table, and paired again with every row from the third table and so on. The number of rows in the result set would be the product of the number of rows in each of the tables involved in the join. But, with the appropriate WHERE clause, you can make sure that only the correct combination of rows is produced in the output.

Now, you may think that when you use the WHERE clause, you can fine-tune the join to produce any combination of rows out of a cartesian join that you want. You are not just restricted to inner joins. And you may be thinking that this level of fine-tuning is not possible with SQL syntax that uses the JOIN keyword. In reality, this is not the case. What follows after the keyword ON in the syntax that uses the JOIN keyword is a boolean condition that is completely under your control (you can obviously do this only in SQL view in Access, not from the design view, which is very limiting). You can change that condition to anything that produces a true/false value, and the join would work (it will obviously include every row in the cartesian join for which the ON condition returns a true value). In fact, you can even call functions, do any number of mathematical calculations, etc. in the ON clause of a join just as you can in the WHERE clause of a regular query. This is very important to remember and take advantage of when appropriate. Most people do not know about this feature of an ON condition, that makes it as powerful as a WHERE clause.

To understand the differences between the different types of joins, let us consider two tables, TableA and TableB, with the data below:
TableA

ID    DataA
1     DataA1
2     DataA2
3     DataA3
4     DataA4
5     DataA5
6     DataA6
7     DataA7
8     DataA8
9     DataA9
10    DataA10

TableB

ID    DataB
6     DataB6
7     DataB7
8     DataB8
9     DataB9
10    DataB10
11    DataB11
12    DataB12
13    DataB13
14    DataB14
15    DataB15

An inner join between the tables above is coded in SQL as below:
select TableA.ID, DataA, DataB from TableA inner join TableB on TableA.ID = TableB.ID

or
select TableA.ID, DataA, DataB from TableA, TableB where TableA.ID = TableB.ID

The result would look as below:
TableA.ID    DataA    DataB
6            DataA6   DataB6
7            DataA7   DataB7
8            DataA8   DataB8
9            DataA9   DataB9
10           DataA10  DataB10

As you can see, the inner join only produces rows in the output where the keyfields (TableA.ID and TableB.ID) match exactly in both tables. As you can see, this is quite obvious in the WHERE clause syntax (the WHERE clause clearly tells one what the criterion is), not so clear in the JOIN keyword syntax.

A left outer join between the tables above is coded in SQL as below:
select TableA.ID, DataA, DataB from TableA left join TableB on TableA.ID = TableB.ID

or
select TableA.ID, DataA, DataB from TableA left outer join TableB on TableA.ID = TableB.ID

It is important to remember that in a left join, the table named in the query first is always the left table. The result of the left outer join above is as below:
TableA.ID    DataA    DataB
1            DataA1
2            DataA2
3            DataA3
4            DataA4
5            DataA5
6            DataA6   DataB6
7            DataA7   DataB7
8            DataA8   DataB8
9            DataA9   DataB9
10           DataA10  DataB10

As you can see, a left outer join, includes every single row from the left table (in this case, TableA, because it is named first in the queries above), and the data from the right table (TableB in this case, because it is named second in the queries above) when the condition in the ON clause is satisfied (where the two ID's match). Notice that a left outer join includes the results of an inner join when the ON clause is the same as that in an inner join.

Note that if you modify the condition inside the ON clause of a left outer join, the result would still contain all the rows from the left table. But data from the right table would be included only when the condition inside the ON clause returns a true value.

A right outer join between the tables above is coded in SQL as below:
select TableB.ID, DataA, DataB from TableA right join TableB on TableA.ID = TableB.ID

or
select TableB.ID, DataA, DataB from TableA right outer join TableB on TableA.ID = TableB.ID

It is important to remember that in a right join, the table named second in the query is always the right table. The result of the right outer join above is as below:
TableB.ID    DataA    DataB
6            DataA6   DataB6
7            DataA7   DataB7
8            DataA8   DataB8
9            DataA9   DataB9
10           DataA10  DataB10
11                    DataB11
12                    DataB12
13                    DataB13
14                    DataB14
15                    DataB15

As you can see, a right outer join, includes every single row from the right table (in this case, TableB, because it is named second in the queries above), and the data from the left table (TableA in this case, because it is named first in the queries above) when the condition in the ON clause is satisfied (where the two ID's match). Notice that a right outer join includes the results of an inner join when the ON clause is the same as that in an inner join.

Note that if you modify the condition inside the ON clause of a right outer join, the result would still contain all the rows from the right table. But data from the left table would be included only when the condition inside the ON clause returns a true value.

Note also that left and right outer joins change the results they produce depending on the order in which the tables are present in the query itself. So, it is important to keep the tables in the right order when one is coding up outer joins.

Before we see how a full outer join is created in Microsoft Access, let us see what the result should be for a full outer join first. We want not only data from both TableA and TableB when the key fields match, but also all the data from TableA and from TableB where they don't match. The result should look as below:
 
ID           DataA    DataB
1            DataA1
2            DataA2
3            DataA3
4            DataA4
5            DataA5
6            DataA6   DataB6
7            DataA7   DataB7
8            DataA8   DataB8
9            DataA9   DataB9
10           DataA10  DataB10
11                    DataB11
12                    DataB12
13                    DataB13
14                    DataB14
15                    DataB15

Unfortunately, SQL like below is not supported by Microsoft Access:
select TableA.ID, DataA, DataB from TableA outer join TableB on TableA.ID = TableB.ID

or
select TableA.ID, DataA, DataB from TableA full outer join TableB on TableA.ID = TableB.ID

So, it is time for some trickery! We notice that the rows in a full outer join contains all the rows from a left outer join and a right outer join, with the duplicate rows from both included just once instead of being included twice. So, all we need is a way to combine the outputs from the two queries and removing all but one copy of the duplicate rows. The UNION keyword provides us with exactly such a mechanism, so our solution to the problem of a full outer join in Microsoft Access is as below:
select TableA.ID, DataA, DataB from TableA left join TableB on TableA.ID = TableB.ID
UNION
select TableB.ID, DataA, DataB from TableA right join TableB on TableA.ID = TableB.ID

or
select TableA.ID, DataA, DataB from TableA left outer join TableB on TableA.ID = TableB.ID
UNION
select TableB.ID, DataA, DataB from TableA right outer join TableB on TableA.ID = TableB.ID

There are a few things to keep in mind about the solutions above. Notice that both solutions combine a left outer join with a right outer join. In both the joins, the tables are named in the exact same order. This is very important because left and right outer joins depend on the order in which tables are named to produce the results they produce. In fact, one can take advantage of this to create a full outer join without combining a left outer join with a right outer join. One can simply use two left or two right outer joins with the table order reversed in one of the queries to produce the same results:
select TableA.ID, DataA, DataB from TableA left join TableB on TableA.ID = TableB.ID
UNION
select TableB.ID, DataA, DataB from TableB left join TableA on TableA.ID = TableB.ID
Or
select TableB.ID, DataA, DataB from TableA right join TableB on TableA.ID = TableB.ID
UNION
select TableA.ID, DataA, DataB from TableB right join TableA on TableA.ID = TableB.ID

Also, notice that we are taking advantage of the property of the UNION keyword that eliminates duplicate rows in the result set. This makes sure that the rows that would be produced by an inner join between the tables are not repeated in the outer join. In fact, if one used UNION ALL instead of UNION to combine the two queries, then the rows that would be produced in an inner join would be repeated twice in the result of the outer join.

Another thing to keep in mind is that the use of the UNION keyword between two queries requires that both queries produce the same number of columns in their output (actually, contrary to the understanding of most people, UNION does not require the same data types for the columns in the two queries). Moreover, the column names used in the output are the same as the names or aliases used in the first query. So, please make sure the columns with the same meaning are lined up in both queries so that the results make sense.

Hope this post has been helpful in solving any problems you might have had with full outer joins in Access. If you have any problems or concerns with the queries in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!

20 comments:

Anonymous said...

Did you ever try to execute this query?..

Blogannath said...

Why? Are you having problems with it? I did try it out on a sample database which I don't have with me anymore, but it did work and produced a full outer join like it is supposed to.

Totty Totty said...

but how about when you need to make a filter using the "where" in sql?

Blogannath said...

What is the problem with a WHERE clause? A UNION can be used between any valid SQL queries. The UNION does not care whether the queries have WHERE clauses or not. So, if you want to filter out certain rows, do so in the individual queries using a WHERE clause and then put them together using the UNION.

Unknown said...

..............
In fact, in several places in the text, instead TablA.ID, TableB.ID be taken, wherever TableB "first".
For example, instead of

select TableA.ID, DataA, DataB from TableA right join TableB on TableA.ID = TableB.ID

should be

select TableB.ID, DataA, DataB from TableA right join TableB on TableA.ID = TableB.ID

I think ....

I usually achieves full outer join via a third table (register) where keyfield is unique ... without creating link between TableA and TableB

Blogannath said...

You are correct. The ID field should be from the right table whenever a right outer join is being used. I have corrected the post. Thanks for pointing out the error.

slaya said...

it retrieves all data but it does something weird.
If a row matches from TableA with Table B it shows Both
but if a from Table B is not in Table A it wil show it in Table A.
How can i solve this???

Blogannath said...

I have no idea what your query is doing. You have not posted your query or any example of the results it produces. It is impossible for me to diagnose the problem and give you any useful suggestions based on your description.

Please post your query (just the minimum fields required to illustrate the problem), some sample results, and what you expected (why the sample results are wrong), and I will probably be better equipped to help you out.

Bill said...

Is it possible to join two tables based on TWO SEPARATE BUT MUTUALLY REQUIRED sets of KeyFields? In other words, to join two tables based on
table1.keyfield1 = table2.keyfield1
AND
table1.keyfield2 = table2.keyfield2 so that the records are displayed ONLY WHEN BOTH key fields are equal?

Blogannath said...

Yes. Join conditions are just like WHERE clauses. You can have any number of conditions joined by AND's and OR's. As long as it returns a TRUE or FALSE for each combination of rows, it will work as a join condition. And the comparisons don't even have to be between table fields or between fields in table A and table B or anything of that sort. And they don't have to be equal to comparisons either. Any valid WHERE clause can be a join condition.

Bill said...

Thank you.
would the WHERE clause be in the SQL code or in the Design View Code? In the Design View coding, I am only familiar with WHERE clauses in cases where I am using the TOTALS button. If in the SQL coding, I would think I'd use parentheses in the FROM...AND.

sample code:
SELECT [Qry_DelTicket-Combined].DelTicketNo, [Qry_CustReturn-Combined].DelTicketNo, [Qry_DelTicket-Combined].JobNo, [Qry_DelTicket-Combined].CustCode, [Qry_DelTicket-Combined].PartNo, [Qry_DelTicket-Combined].PartDesc, [Qry_DelTicket-Combined].Qty2Ship, [Qry_DelTicket-Combined].ShipDate, [Qry_CustReturn-Combined].OrigJobNo, [Qry_CustReturn-Combined].OrigQtyShipped, [Qry_CustReturn-Combined].QtyReturned, [Qry_CustReturn-Combined].IssueDate
FROM [Qry_DelTicket-Combined] LEFT JOIN [Qry_CustReturn-Combined] ON ([Qry_DelTicket-Combined].JobNo = [Qry_CustReturn-Combined].OrigJobNo) AND ([Qry_DelTicket-Combined].DelTicketNo = [Qry_CustReturn-Combined].DelTicketNo)
WHERE (((Left([Qry_DelTicket-Combined]![CustCode],2))="RK"));

Blogannath said...

I was talking about the WHERE clause of regular SQL code. And yes, you would use parentheses to make sure that the logical conditions and comparisons are done in the correct order between the correct set of operands.

Bill said...

Blogannath, thank you very much for your time and insights. I greatly appreciate it.

Regarding the parentheses, I seem to be receiving errors when I try to apply them. Would you be able to let me know if the placement of the parentheses in the code below SHOULD work (see parentheses right after the FROM and right before the WHERE)?

sample code:
SELECT [Qry_DelTicket-Combined].DelTicketNo, [Qry_CustReturn-Combined].DelTicketNo, [Qry_DelTicket-Combined].JobNo, [Qry_DelTicket-Combined].CustCode, [Qry_DelTicket-Combined].PartNo, [Qry_DelTicket-Combined].PartDesc, [Qry_DelTicket-Combined].Qty2Ship, [Qry_DelTicket-Combined].ShipDate, [Qry_CustReturn-Combined].OrigJobNo, [Qry_CustReturn-Combined].OrigQtyShipped, [Qry_CustReturn-Combined].QtyReturned, [Qry_CustReturn-Combined].IssueDate
FROM ([Qry_DelTicket-Combined] LEFT JOIN [Qry_CustReturn-Combined] ON ([Qry_DelTicket-Combined].JobNo = [Qry_CustReturn-Combined].OrigJobNo) AND ([Qry_DelTicket-Combined].DelTicketNo = [Qry_CustReturn-Combined].DelTicketNo))
WHERE (((Left([Qry_DelTicket-Combined]![CustCode],2))="RK"));

Blogannath said...

Access has a bad habit of adding excessive amounts of parentheses at random in SQL for no discernible reason. Try the following:

SELECT
[Qry_DelTicket-Combined].DelTicketNo
, [Qry_CustReturn-Combined].DelTicketNo
, [Qry_DelTicket-Combined].JobNo
, [Qry_DelTicket-Combined].CustCode
, [Qry_DelTicket-Combined].PartNo
, [Qry_DelTicket-Combined].PartDesc
, [Qry_DelTicket-Combined].Qty2Ship
, [Qry_DelTicket-Combined].ShipDate
, [Qry_CustReturn-Combined].OrigJobNo
, [Qry_CustReturn-Combined].OrigQtyShipped
, [Qry_CustReturn-Combined].QtyReturned
, [Qry_CustReturn-Combined].IssueDate
FROM [Qry_DelTicket-Combined]
LEFT JOIN [Qry_CustReturn-Combined]
ON [Qry_DelTicket-Combined].JobNo = [Qry_CustReturn-Combined].OrigJobNo
AND [Qry_DelTicket-Combined].DelTicketNo = [Qry_CustReturn-Combined].DelTicketNo
WHERE Left([Qry_DelTicket-Combined]![CustCode],2)="RK";

Bill said...

It worked. Thank you!!

kakinz said...

This has solved the problem I have been working on ALL DAY! Thank you very much!

Niloufar said...

Hi, Thank you for explanation.

I have two question
1- if I add two conditions like:
ON [A].EMPNo = [B].EMPID
AND [A].EMP_Name = [B].EMP_FULLNAME
what happens here?

2- Base on the final result you have , how we can combine two column DataA and DataB as One column and eliminate duplicates?

Thank you!

Blogannath said...

The answer to question 1 is that joins in Access (and pretty much any relational database) can be based on any number of conditions. The join condition is based on business knowledge as well as the structure of the database tables, and they can be based on any number of conditions as necessary to achieve the correct join.

I am not sure I understand what the second question is. Maybe you can give me an example of data in the tables and the final output you want. But if you simply want DataA and DataB in one column of output without duplicates, you can do it as below:

select DataA from tableA
Union
Select DataB from tableB

Niloufar said...

Thank you for your quick reply :)

you did this: select TableA.ID, DataA, DataB from TableA left outer join TableB on TableA.ID = TableB.ID UNION select TableB.ID, DataA, DataB from TableA right outer join TableB on TableA.ID = TableB.ID

And your result is this:
ID DataA DataB
1 DataA1
2 DataA2
3 DataA3
4 DataA4
5 DataA5
6 DataA6 DataB6
7 DataA7 DataB7
8 DataA8 DataB8
9 DataA9 DataB9
10 DataA10 DataB10
11 DataB11
12 DataB12
13 DataB13
14 DataB14
15 DataB15


However, if I would like to get a result like this, what do I have to do?

ID DataC
1 DataA1
2 DataA2
3 DataA3
4 DataA4
5 DataA5
6 DataA6
7 DataA7
8 DataA8
9 DataA9
10 DataA10
11 DataB11
12 DataB12
13 DataB13
14 DataB14
15 DataB15

Blogannath said...

Did you try the Union query I posted above?

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web