Search The Web

Today's Headlines

Monday, February 22, 2010

Microsoft Access Tips & Tricks: Crosstab Queries

Crosstab queries are used to create matrices of aggregated data out of simple row data. In some sense, you can consider crosstab queries to be very similar to pivot tables in spreadsheets. In Microsoft Access, a non-standard extension to SQL-92 in the form of the TRANSFORM keyword is used to create crosstab queries.

If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, percentile values, calculating distinct counts, full outer joins and parameter queries.

In general, you are probably already familiar with the use of GROUP BY to create aggregations of data in Access. Let us suppose, for instance, that we have a table, Orders, with the following fields: OrderDate, EmployeeID, and OrderAmount. It is easy to write a query as below to get a total of orders by year:

select year(OrderDate), sum(OrderAmount) from Orders group by year(OrderDate)

Similarly, we can get the total of orders by employee ID:

select EmployeeID(OrderDate), sum(OrderAmount) from Orders group by year(OrderDate)

How do you obtain the order totals by year as well as Employee ID? You could use both employee ID and the year of the order in the group by clause as below, but the resulting results are quite hard to read.

select EmployeeID, year(OrderDate), sum(OrderAmount) from Orders
group by EmployeeID, year(OrderDate)

The above query lists each employee-year combination in a separate row and lists an order total for that combination. It becomes very difficult to compare yearly totals for the same year across employees. By the same token, comparing yearly totals across years for the same employee is difficult too.

Ideally, what we want is a matrix that has employee ID's along the left hand side, years across the top, and the totals of orders corresponding to each employee for each year as the appropriate matrix element. This would make it extremely easy to compare yearly totals across years for the same employee (just scan each row), or compare yearly totals across employees for the same year (just scan each column). This can be hacked together if we know in advance what years we need the matrix to contain. For instance, if we want order totals only for 2008, 2009 and 2010, we can write SQL as below to accomplish this:

select EmployeeID,
(select sum(OrderAmount) from Orders where year(OrderDate) = 2008) as 2008,
(select sum(OrderAmount) from Orders where year(OrderDate) = 2009) as 2009,
(select sum(OrderAmount) from Orders where year(OrderDate) = 2010) as 2010,
from Orders
group by EmployeeID

But what happens if we don't know in advance what years we want the results for? Also, what happens when a new year rolls into the table? We will have to go in and modify the query to include the query (otherwise, that year's results will never be part of the query results). What we want is a dynamic query that does not require us to specify the years for which we need results. That is where a crosstab query comes in handy. However, the syntax of a crosstab query can be quite confusing and intimidating. In this post, we will try to disentangle the syntax so that we can work with it more comfortably.

Creating a crosstab query using the graphical query designer in Access is not very difficult. Access provides you with a crosstab query wizard that can walk you through the steps in creating a crosstab query with the appropriate row headings, column headings and the aggregate value you want to see as elements of the matrix. However, the crosstab query wizard, like all other wizards and graphical design elements of Access, has severe limitations that can only be overcome by working in the SQL view of the query. That is why it is important to understand the syntax of a crosstab query so that we can create one without relying on the wizard to do it for us!

Some of the limitations of the crosstab query wizard, that we can overcome by working in native SQL are as below:

  • The crosstab query wizard can only work with one table or query at a time. If you want data from multiple tables or queries, you first have to create an intermediate query (or table) with all the fields you need before you can use the crosstab query wizard on that intermediate query (or table)
  • Column headings and order can not be specified in the wizard. Only in native SQL can you specify the columns to be included in the result and their order

The basic syntax of a crosstab query in Microsoft Access SQL is as below:

TRANSFORM AggregateFunction(matrixElement)
select rowheadings from table group by rowheadings
pivot columnheadings

Thus, in the example we have been working on a crosstab query like the one below would produce the matrix we are looking for:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot year(OrderDate)

The select statement in the above SQL statement generates the row headings. You can have multiple row headings in a crosstab query simply by adding them to the select statement of the crosstab query. For instance, suppose you want to see employee names in addition to the ID's. If the employee names are in the Order table, you can modify the query as below to get employee names also in the row headings:

TRANSFORM sum(OrderAmount)
select EmployeeID, EmployeeName from Orders group by EmployeeID, EmployeeName
pivot year(OrderDate)

If the employee names are in a separate table that is linked to the Orders table by employee ID's you can use a join in the select statement to get the employee names as below:

TRANSFORM sum(OrderAmount)
select EmployeeID, EmployeeName from Orders inner join EmployeeDetails
on Orders.EmployeeID = EmployeeDetails.EmployeeID
group by EmployeeID, EmployeeName
pivot year(OrderDate)

The above crosstab query is one which you can not create using the crosstab query wizard directly. You would have to create an intermediate query which performs the join and creates a single result set that contains both employee ID's and employee names. You would then have to use this intermediate query as the recordsource for the crosstab query wizard.

You can also use this feature of crosstab queries to add row totals to the query result. The query below shows how:

TRANSFORM sum(OrderAmount)
select EmployeeID, sum(OrderAmount) as EmployeeTotal from Orders
group by EmployeeID
pivot year(OrderDate)

The sum of all orders of each employee ID (across all years) is now part of the select query. This adds a column titled "EmployeeTotal" in the crosstab query results.

The select query that is part of the crosstab query can also contain WHERE clauses and HAVING clauses. It is just a regular aggregate query as far as Access is concerned. It can also have an ORDER BY clause, giving you control over the order in which rows are presented in the crosstab query results. The query below illustrates this:

TRANSFORM sum(OrderAmount)
select EmployeeID, sum(OrderAmount) as EmployeeTotal from Orders
where EmployeeID < 1000
group by EmployeeID
having sum(OrderAmount) > 1000
pivot year(OrderDate)

Now, what happens if you want both the count of the number of orders as well as the sum of the order amounts as part of a single crosstab query? If you try to write a crosstab query as below, you will get a syntax error:

TRANSFORM sum(OrderAmount), count(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot year(OrderDate)

That is because the TRANSFORM clause can only contain one aggregate. This makes sense since each matrix row and column has space for only one value. That one value is supplied by the one aggregate in the TRANSFORM clause. The only way to get two aggregates is by expanding the matrix to twice its size. You can not do this directly using a single crosstab query. However, let us assume that you have the following two crosstab queries already in your database:

Query for order totals (let us say this is named QryXtabOrderTotals):

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot year(OrderDate)

And query for order counts (let us say this is named QryXtabOrderCounts):

TRANSFORM count(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot year(OrderDate)

You can then join these two queries on the common field of EmployeeID to create a new query which will expand the crosstab to twice the width and include both counts and totals. The syntax of such a query would be identical to a regular join between any two queries. All this query would do is pick up each row of the first crosstab query, join it with the other crosstab query, resulting in the addition of extra columns and produce a new set of results. There is actually no crosstabulation going on in this third query:

select * from QryXtabOrderTotals inner join QryXtabOrderCounts
on QryXtabOrderTotals.EmployeeID = QryXtabOrderCounts.EmployeeID

Sometimes, you may want the crosstab query to have the same number of columns whenever you run it. For instance, if you create a crosstab query that tabulates order amounts by month, you may want to make sure that the query always returns columns for all 12 months of the year.

Unfortunately, by default, if a column contains no data in a crosstab query, Access automatically gets rid of that column in the results rather than displaying an empty column. Thus, if you had no orders in June for whatever reason, your results would contain only 11 columns with the column for June missing entirely. This can not only be confusing and misleading, it may also cause problems if you base a report on the crosstab query and the report expects a fixed number of columns in the results of the query.

You can easily fix this problem by specifying the columns to include in the query in the PIVOT clause using the keyword IN. Take a look at the query below for an example:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot month(OrderDate) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

With the query above, even if there were no orders in the month of June, the column 6 (with no order totals in it) would still exist in the results of the query. If you actually want to see zeroes instead of blanks in the matrix elements that contain no data, use the NZ() function as below:

TRANSFORM nz(sum(OrderAmount),0)
select EmployeeID from Orders group by EmployeeID
pivot month(OrderDate) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)

The IN keyword also provides another important functionality to crosstab queries. Suppose you want to crosstabulate order totals by employeeID's and month names (Jan, Feb, etc.). You might think that you could accomplish this by writing the query as below:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot format(month(OrderDate), "mmm")

The problem, as you will see immediately, is that the ordering of the columns in a crosstab query is in numerical order or alphabetic order, by default. Thus, you will have the columns in the results of the above query ordered with "Apr" first, followed by "Aug", "Dec", "Feb", and so on. This is obviously not going to serve well for any kind of trend analysis! What you want is to specify the order in which the months are to be presented.

The IN keyword comes in handy for that purpose also. When you specify a list of columns to include in the crosstab results, the order of the columns is also the same as how it is specified in the IN clause. Therefore, if you want your monthly totals presented in order from January through December, you would modify the query above as below:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot format(month(OrderDate), "mmm")
in ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

In the query above, you will see that not only does the crosstab query always present its results in 12 columns (1 for each month) even if some months have no data, but the order in which the months are presented is always as specified in the IN clause of the query.

Crosstab queries can also take parameters, just like other types of queries. But, the parameters always have to be declared explicitly as illustrated in the post on parameter queries. Thus, the following query will generate a syntax error:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders
where EmployeeID < [Enter Maximum Employee ID:]
group by EmployeeID
pivot format(month(OrderDate), "mmm")
in ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

But the query below will work fine because we declare the parameter explicitly:

Parameters [Enter Maximum Employee ID:] short;
TRANSFORM sum(OrderAmount)
select EmployeeID from Orders
where EmployeeID < [Enter Maximum Employee ID:]
group by EmployeeID
pivot format(month(OrderDate), "mmm")
in ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

Parameters can be used anywhere in the query except inside the IN clause of the query.

I will end this post with one last tip: We already saw how row-totals are easy to include in a crosstab query simply by adding the total to the select query inside the crosstab query. How do you include column totals though? It turns out there is no easy way to do this directly using a single query. The simplest solution is to create a query that generates a single row of totals and then UNION it with the crosstab query to provide the column totals. This is illustrated in the query example below:

TRANSFORM sum(OrderAmount)
select EmployeeID from Orders group by EmployeeID
pivot month(OrderDate) in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
UNION
select "TOTAL",
(select sum(OrderAmount) from Orders where month(OrderDate) = 1 group by month(OrderDate)),
(select sum(OrderAmount) from Orders where month(OrderDate = 2 group by month(OrderDate)),
...
from Orders

Obviously, not the most elegant thing in the world, but hopefully, functional. In Access 2007, Microsoft has created a facility to create column totals directly when looking at the results from a crosstab query, so this approach may not be necessary anymore. But the column totals are not an integral part of the query results, so if you are using the query results directly for further processing, and you need column totals, you might have to write up a UNION query as above to get them.

Hope this post has been helpful in solving any problems you might have had with crosstab queries in Access. If you have any problems or concerns with the queries in this post, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future posts, please feel free to let me know through your comments too. Good luck!

6 comments:

Anonymous said...

Very interesting article; potentially useful. BUT I followed your advice for generating Crosstab column totals and came unstuck. I successfully created a normal Crosstab query. I also successfully created a SELECT query that generates the desired additional row of column totals. But when I tried to stitch the two queries together by using a UNION statement I get the message "Syntax error in TRANSFORM statement".

Here is my code:
PARAMETERS [spendyear?] Short, [Account?] Text ( 255 );
TRANSFORM Sum(SpendPeriods.Spend) AS SumOfSpend
SELECT SpendPeriods.Category, Sum(SpendPeriods.Spend) AS [Total Of Spend]
FROM SpendPeriods INNER JOIN MonthIDs ON SpendPeriods.SpendMonth = MonthIDs.MonthNo
WHERE (((SpendPeriods.SpendYear)=[SpendYear?]) AND ((SpendPeriods.Account)=[Account?]))
GROUP BY SpendPeriods.Category, SpendPeriods.Account, SpendPeriods.SpendYear
ORDER BY MonthIDs.MonthName
PIVOT MonthIDs.MonthName In ("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov",

"Dec")
UNION
SELECT "All Categories" AS Category,
(SELECT Sum(Spend) FROM SpendPeriods) AS YearTotal,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=1) AS Jan,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=2) AS Feb,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=3) AS Mar,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=4) AS Apr,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=5) AS May,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=6) AS Jun,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=7) AS Jul,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=8) AS Aug,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=9) AS Sep,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=10) AS Oct,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=11) AS Nov,
(SELECT Sum(Spend) FROM SpendPeriods WHERE SpendPeriods.SpendMonth=12) AS Dec
FROM SpendPeriods
WHERE (((SpendPeriods.SpendYear)=[SpendYear?]) AND ((SpendPeriods.Account)=[Account?]))
GROUP BY "All Categories";

Blogannath said...

I just tried it, and you are right that a crosstab query does not like being combined with another query using a UNION statement directly. But all you have to do is save each of your queries separately (say as query1 and query2), and then write a third query as below:

select * from query1
UNION
select * from query2

This works and produces the the column totals in addition to the original crosstab.

Anonymous said...

Thanks for your response to my problem, and your solution. You're right; it works!

THANK YOU.

Anonymous said...

hi master..

when i wrote sql query in VBA

example, "SELECT Format([dateprod],"mmmm-yyyy") AS Month1 from table1"

vba give me warning like that

Expected: end of statement..

and cursor is block on "mmmm-yyyy" area.

please show to me how to write that.
thank you

Blogannath said...

Why are you writing a SQL query in VBA? If you are creating a SQL statement as a string with the intention of executing it in VBA, you have to make sure to double the quotes within the string or use chr(34) to denote quotes. If you have never created a SQL statement within VBA, I suggest you look at some examples that include enclosed strings.

The select statement is syntactically correct, but I have no idea how (or why) you are writing your VBA, and since I can't see the full code, I can't do much else to help.

Anonymous said...

Thank you so much for this article! I had a report based on a crosstab query, and had trouble (like you mentioned) since the columns sometimes didn't exist. I could not find a solution for a long time, and this article was a life saver! My reports now look perfect. Thank you!

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web