Search The Web

Today's Headlines

Monday, February 1, 2010

Microsoft Access Tips & Tricks: Distinct Counts

Sometimes, one has a need to count the number of unique or distinct values in a field in a database table. In many database products, this is quite simple using an SQL construct that is not supported by Microsoft Access. This is a case where one needs to find a way around a limitation in Microsoft Access' implementation of SQL.

If you are interested, you can find my earlier posts on finding the median, the mode, the geometric and harmonic means, ranking every row in a query, selecting random rows out of a table, calculating running sums and averages, calculating running differences, creating histograms, calculating probability masses out of given data, calculating cumulative distributions out of given data, finding percentile scores, and percentile values.

Assume that you have a table, myTable, with a field myField that has the following values in it:

1
1
2
3
4
4
4
4
5
5

You see that there are 10 rows in the table, but only 5 different values of myField. When you write a query as below to get counts of this field from myTable, you only get 10 as the result:

select count(myField) from myTable

You can, of course, get a count of distinct values by writing a query as below, and then counting the number of rows in the output of the query:

select distinct myField from myTable

The above approach though has some drawbacks. The first and most obvious drawback is that the output of the query is a set of rows, not a single number like a count() normally provides. So, it is not possible to use this query where a system is expecting a single number rather than a table of values (for instance, as a subquery that provides a count).

In some other database products, the simple way to get around this problem is to write the SQL below:

select count(distinct myField) from myTable

Unfortunately, the SQL above will not work in Microsoft Access. The flavor of SQL implemented by Access does not recognize count(distinct myField) as a syntactically correct SQL construct.

There are two solutions to the above problem. Both of them involve subqueries, unfortunately. The first solution looks cleaner and is easier to understand. The SQL below illustrates this solution:

select count(*) from (select distinct myField from myTable)

The SQL above will produce 5 as the result for the example we set up at the beginning of the post. If we only want to count non-NULL values of myField, we can modify the SQL as below:

select count(myField) from (select distinct myField from myTable)

One can also use the construct below to exclude NULL values from the count:

select count(*) from (select distinct myField from myTable where myField is not NULL)

The second solution uses the SQL below:

select count(*) from (select myField from myTable group by myField)

As you can see, there is not much difference between the first solution and this second one. The subquery still produces a temporary table (in memory, not in the database) that lists only distinct values of myField, and the outer query uses this temporary table to count the number of distinct values. But this second solution is more wordy, and that is why I considered the first solution cleaner and easier to understand.

It is easy to modify the second solution also to exclude NULL values from the count, using either of the methods we employed with the first solution.

Now, let us look at an example of a tricky case of counting distinct values. Let us assume we have a database table called myCars which contains records of cars entering and leaving a parking lot. The table has two fields that are of interest to us. They are Make and Registration. The Make field contains the make of the car entering or leaving the parking lot. It might contain values such as BMW, Toyota, Honda, etc.

The Registration field contains the licence plate numbers or some other unique identifier of the cars entering and leaving the parking lot. So, examples of values in this field might be 123, ABC, FSTCAR, etc. Suppose, the data in our table looks like this:

Make Registration
BMW ABC123
Toyota JPNCAR
Honda MYHNDA
BMW ABC123
BMW AABMW
Toyota JPNCAR
BMW ABC123
Honda 123ABC

Now, suppose we want to find out how many different cars of various types used the parking lot that day based on this data. We want a table that tells us that 2 different BMW's used the parking lot as did 1 Toyota and 2 different Honda's. The SQL below will only tell us that 5 different cars used the parking lot:

select count(*) from (select distinct Registration from myCars)

How do we get a breakdown by the make of the car? The trick is to create a temporary table using a subquery that contains only unique cars along with their Make and Registration. We can then group this temporary table by Make, and count the number of Registrations. The resulting count will be a distinct count because the temporary table only contains distinct combinations of Make and Registration.

Using the insight above, the SQL below shows one way of counting cars by make from our table:

select Make, count(Registration) from
(select distinct Make, Registration from myCars)
group by Make

Let us analyze the above SQL in more detail. The subquery produces a temporary table that contains only unique combinations of Make and Registration. The outer query then groups this temporary table by Make and produces output that consists of the Make and the count of Registrations corresponding to each Make. That is obviously exactly what we want. The output from this query for the example table above would look as below:

Make Count(Registration)
BMW 2
Toyota 1
Honda 2

We can also modify the second solution to get the same results. The SQL below illustrates this:

select Make, count(Registration) from
(select Make, Registration from myCars group by Make, Registration)
group by Make

It is obvious that it is easy to extend each of the solutions above to more fields if needed. Suppose the table contained the model information of each car in addition to just the make, we could count the number of unique cars in the table by make and model, rather than just by make.

In general, suppose we have a table, myTable, with n fields, myField1 through myFieldn. Suppose we want to find the number of unique values of myField1 for each combination of myField2 through myFieldn. Then, you can use the SQL constructs below to achieve this:

select myField2, myField3, ..., myFieldn, count(myField1) from
(select distinct myField1, myField2, ..., myFieldn from myTable)
group by myField2, myField3, ..., myFieldn

Or:

select myField2, myField3, ..., myFieldn, count(myField1) from
(select myField1, myField2, ..., myFieldn from myTable group by myField1, myField2, ..., myFieldn)
group by myField2, myField3, ..., myFieldn

The interesting thing to note is the equivalency between the queries below:

select distinct myField from myTable

and

select myField from myTable group by myField

Hopefully, this equivalency will come in handy when one version of the query works better than another for whatever reason.

Hope this post has been helpful in solving any problems you might have had with calculating counts of distinct values out of your data in Access. If you have any problems or concerns with the queries in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!

29 comments:

SaryAnn said...

This was EXACTLY what I was looking for! Thanks for posting this!!!

Blogannath said...

You are welcome. I am glad you found the post helpful.

Prenesh KP said...

this was one of the solutions that i was looking for. Also is it possible to add another field based on this distinct count. I need to add a field based on whether or not another field is distinct or not.

Blogannath said...

I am not sure I understand what your question is. Can you show me a small example of what the data looks like and what your query should pull up as the result? That way, I can get a better understanding of the problem and try to come up with a solution to help you out.

Prenesh KP said...

i have a database for quality reports from which i need to extract scores. i have claimnumbers, fileds audited and errors. Now one claim can have multiple errors. I need to calculate the claim quality which is 1- number of errors/number of claims with errors.

so now if there is one claim which has multiple errors i need to treat that as just one. each error would be in separate row so need to get unique numbers and calculate the errors based on that.

Prenesh KP said...

also in another table i have dates and volumes. how to calculate the weekly totals from this table. assuming week is 1 to 7th and so on..

Blogannath said...

The solution to your second problem is trivial: use datepart() with "ww" to get week numbers and group by that.

I will think about the first problem and post back when I have a solution.

Blogannath said...

OK, the number of errors is easy to calculate: you simply have to say "select count(*) from MyTable where ErrorField is not NULL".

The number of claims with errors does not seem to be too difficult either. I think the following query will work for that: "select count(*) from (select distinct claim from (select claim from MyTable where ErrorField is not NULL))

Note that I have used a nested query inside another nested query. I think it is legal, but I have not tested it in Access. Try it out and see what happens. If it gives you an error, you may have to play around with it a little bit to get it to work.

At worst, you might have to create each subquery as a separate query and then use those queries instead of the original table in other queries to finally get what you need.

Prenesh KP said...

it gives me a syntax error in join Operations.

Prenesh KP said...

ok got the solution.. there was no space between select and claim.. my mistake. got a number now will need to verify if thats correct or not.
thanks again.

Prenesh KP said...

iam getting the claim count correctly now.. how do i include sites and tasks as well in this query. i need to get the claim count sitewise and taskwise. I tried to include the sites after the SELECT and gave a group by site at the end. but its asking for parameter input.

Blogannath said...

The only reason Access will ask for a parameter input is if it thinks a query field is not a field in your table. So, make sure that you have spelled the field name correctly (including underscores, etc.). Also, if your field name includes spaces, you have to enclose the name in square brackets.

Prenesh KP said...

is there a way i can change the week numbers from 1 to 5 instead of it showing 1 to 52.. i run monthly reports so need only week 1 to week 5 at the max..

Blogannath said...

I know of no direct way to do what you want. You can use a table to look up the numbers 1 through 52 against a "week within month" and then use that, but you need one such table for every year. Or you can use a series of nested iif() statements to figure out the week number within a month from the date.

Prenesh KP said...

how do i do lookup from another table in that case.. i have created the table..

Blogannath said...

I am not sure what your level of experience with RDBMS software is. The way you look up data from another table is by using a join. I think it would be best to learn basics like joining tables before trying advanced SQL. Without a solid foundation you are not going to be able to maintain whatever it is you are putting together if the requirements change even a little bit. I hope you don't take this advice the wrong way.

Prenesh KP said...

when i do a lookup it doesnt show up on the table instead i have to click on the field and there it shows as a dropdown for me to select the week numbers for each row. how do i make it come there automatically like how we do in excell with vllookup.iam learning access and sql right now.

Blogannath said...

I have no idea how you are doing the lookup. Maybe you are talking about a lookup field in a table or something like that. What I mean is a join in the SQL. That is why I advised that you should learn the basics of SQL before you copy SQL into your project that you cannot maintain. The syntax of a join is something like select t1.myfield, t2.myfield from t1 inner join t2 on t1.somefield = t2.someotherfield . . .

Lookup fields in tables are an extremely bad idea, and just because Access provides people the opportunity to shoot themselves in the foot does not mean they should go ahead and take it.

Prenesh KP said...

thanks. i solved that problem by pulling the data from access to excel and then doing the lookup there and using that for my reports. Now i have another problem. I have username, application,task, volume and time in one table. iam trying to summarise this data by excluding few applications. I need a summary taskwise for each username. When i try i get application wise and not taskwise. however i try. i used "NOT" operator for excluding the unwanted tasks.how do i solve this.

Blogannath said...

If the summary is not by the field you want, then your group by clause is not using the the correct field. Access will produce a summary by whatever is in the group by clause.

Dorv said...

Can you take this type of SQL call and display it on a report in an unbound field?

My SQL code looks like this:

SELECT Count(*) FROM (select county from [county populations] where state =[Forms]![Launch].[StateFilter] );

which, when run in a query, returns the appropriate single number I'm looking for. How does that translate to what would go into an unbound field on a report (The table I'm pulling this data from is not the Record Source for the report).

Thanks!

Blogannath said...

Unfortunately, for report and form fields, queries do not work. You have to use domain aggregate functions like DCount() to populate such unbound fields.

Wardfire said...

Fantastic! Helped me solve my problem better than my teacher today at school, haha. Thank you! :)

oldalec said...

Using your example, how do I get a count of each distinct value in myField? I mean:
1 2
2 1
3 1
4 4
5 2

10 years ago I knew how to do this, but I'm now too old to spend days reinventing this particular wheel. thanks in advance.

Blogannath said...

Your problem has nothing to do with distinct counts though. It is simply a matter of counting and grouping:

select myField, count(*) from
myTable
group by
myField

should do what you are looking for (unless I am missing something).

oldalec said...

It really is like a language, use it or loose it. Thank you so very much.

Blogannath said...

You are welcome!

محمدعطاری said...

I want rank in a table based on a field in order 1,2,2,3,4,4,5 instead of that I am getting 1,2,2,4,5,5,7. Can anybody help me please?

Blogannath said...

Have you looked at the post on ranking every row in a query (http://blogannath.blogspot.com/2009/11/microsoft-access-tips-tricks-rank-every.html)?

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web