Search The Web

Today's Headlines

Thursday, November 12, 2009

Microsoft Access Tips & Tricks: Finding The Mode

Just as there is no built-in SQL aggregate function for the median, there is no built-in SQL aggregate function for the mode of a field. However, it is not difficult to cook up SQL that can produce the mode of a field quickly.

You can find the previous post in this series, that talks about finding the median of a data field using SQL here.

As you know, the mode of a field is the value that occurs most frequently in that field. If the dataset has only one value that occurs most frequently in that field, that field is said to be unimodal. If there are several values occuring with the same frequency in that field, the field is said to be multi-modal.

Most of the time, we are interested in an SQL query that gives us either the unique mode of a unimodal field or one of the modes of a multi-modal field. However the method we use can easily be used to find all the modes of a multi-modal field also.

Now, we know that a count(*) in conjunction with a GROUP BY clause produces counts for each group produced by the GROUP BY clause. We take advantage of this to write the following SQL that calculates the mode of a field as below:
SELECT top 1 myField
FROM myTable
GROUP BY myField
ORDER BY count(*) DESC, myField DESC

By having the second criterion in the ORDER BY clause be the value of myField itself, we have forced the query above to produce only one mode even if there are several modes. In this case, this will give the highest of the modes. To find the lowest of the modes, modify the query as below:
SELECT top 1 myField
FROM myTable
GROUP BY myField
ORDER BY count(*) DESC, myField

If you want to see all the modes of a multi-modal field, leave out ordering by myField entirely, to get SQL as below:
SELECT top 1 myField
FROM myTable
GROUP BY myField
ORDER BY count(*) DESC

As you can see, the structure of this SQL is quite flexible and allows you to modify it to do several things similar to finding the mode. For instance, you can find the value that occurs least frequently in a field by simply modifying the SQL as below:
SELECT top 1 myField
FROM myTable
GROUP BY myField
ORDER BY count(*), myField desc

This produces the maximum value of the least frequently occuring value in myField if there are several values in myField that occur with the same minimum frequency. Modifying this SQL to produce the minimum such value or even producing all values that occur with the minimum frequency should be trivial at this point.

If you also require the actual frequency in addition to the mode, you can include the count(*) in the SELECT clause as below:
SELECT top 1 myField, count(*)
FROM myTable
GROUP BY myField
ORDER BY count(*) DESC, myField desc

The above query produces the mode and the associated frequency. If you want the frequencies of all values in myField, remove the top 1 restriction on the query as below:
SELECT myField, count(*)
FROM myTable
GROUP BY myField
ORDER BY count(*) DESC, myField desc

You can easily change the ORDER BY clause to order by the value of myField instead of frequency, for instance. Hope you find this trick useful the next time you have to find the mode or frequencies of some field in Access.

If you prefer to do it in VBA as a user-defined function instead of using SQL, you can use the function code below. There is really no reason to do this since it is easy enough to calculate the median using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the median of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

Note that this function is designed, for the sake of consistency, to be called in the same way as you would call any built-in domain aggregate function such as DSum(), or DAvg(). The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE), and minOrMax (an optional integer value that tells the function whether you want the minimum mode or the maximum mode if the expression is multi-modal, 0 being the default and giving you the maximum and any other value giving you the minimum mode). If you want to specify a value for minOrMax without specifying criteria, leave the criteria field blank or use "True" as the value for criteria.

If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in almost exactly the same way (except for the minOrMax optional argument, which can be safely omitted most of the time).
Function DMode(expr As String, domain As String, Optional criteria As String, Optional minOrMax As Integer) As Double
Dim dbs As Database
Dim rst As Recordset
Dim orderBy As String
Dim currentValue As Double
Dim currentFreq As Integer
Dim maxFreq As Integer
Dim numberOfRecords as Integer
maxFreq = 0

orderBy = " order by " & expr
If minOrMax <> 0 Then
orderBy = orderBy & " desc"
End If

Set dbs = CurrentDb
If Len(criteria) <> 0 Then
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria & orderBy)
Else
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & orderBy)
'Make sure the spaces inside the quotes are preserved, otherwise your SQL will
'not be syntactically correct and Access will complain!
End If

If rst.BOF Then
numberOfRecords = 0
Else
rst.MoveLast
numberOfRecords = rst.RecordCount
'You need the Movelast to get the correct record count out of a recordset
End If
If numberOfRecords = 0 Then
DMode = 0
'We assume the mode is 0 when the number of records is zero
ElseIf numberOfRecords = 1 Then
DMode = rst(expr)
'If there is only one record, the value of expression in that record is the mode
Else
rst.MoveFirst
currentValue = rst(expr)
currentFreq = 1
If currentFreq >= maxFreq Then
DMode = currentValue
maxFreq = currentFreq
End If
rst.MoveNext
'Set up the initial conditions based on the first record
Do While Not rst.EOF
'Now loop through the entire recordset
If rst(expr) = currentValue Then
'If the field's value does not change, update the frequency of occurrence of that value
currentFreq = currentFreq + 1
Else
'Otherwise, reset the current frequency back to 1
currentValue = rst(expr)
currentFreq = 1
End If
'Now check to see if the frequency is the maximum encountered so far.
'If so, update the maximum frequency, and make this value of the field the mode
If currentFreq >= maxFreq Then
DMode = currentValue
maxFreq = currentFreq
End If
rst.MoveNext
Loop
End If
rst.Close
Set rst = Nothing
dbs.Close
Set dbs = Nothing
'Cleanup everything before leaving the function
End Function

This code has been checked in Access 2003. It should work in any version of Access from Access 97 on up, but please do let me know through the comments if you have problems using the code. Good luck!

No comments:

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web