You can find the previous post in this series, that talks about finding the median of a data field using SQL here.

As you know, the mode of a field is the value that occurs most frequently in that field. If the dataset has only one value that occurs most frequently in that field, that field is said to be unimodal. If there are several values occuring with the same frequency in that field, the field is said to be multi-modal.

Most of the time, we are interested in an SQL query that gives us either the unique mode of a unimodal field or one of the modes of a multi-modal field. However the method we use can easily be used to find all the modes of a multi-modal field also.

Now, we know that a count(*) in conjunction with a GROUP BY clause produces counts for each group produced by the GROUP BY clause. We take advantage of this to write the following SQL that calculates the mode of a field as below:

SELECT top 1 myField

FROM myTable

GROUP BY myField

ORDER BY count(*) DESC, myField DESC

By having the second criterion in the ORDER BY clause be the value of myField itself, we have forced the query above to produce only one mode even if there are several modes. In this case, this will give the highest of the modes. To find the lowest of the modes, modify the query as below:

SELECT top 1 myField

FROM myTable

GROUP BY myField

ORDER BY count(*) DESC, myField

If you want to see all the modes of a multi-modal field, leave out ordering by myField entirely, to get SQL as below:

SELECT top 1 myField

FROM myTable

GROUP BY myField

ORDER BY count(*) DESC

As you can see, the structure of this SQL is quite flexible and allows you to modify it to do several things similar to finding the mode. For instance, you can find the value that occurs least frequently in a field by simply modifying the SQL as below:

SELECT top 1 myField

FROM myTable

GROUP BY myField

ORDER BY count(*), myField desc

This produces the maximum value of the least frequently occuring value in myField if there are several values in myField that occur with the same minimum frequency. Modifying this SQL to produce the minimum such value or even producing all values that occur with the minimum frequency should be trivial at this point.

If you also require the actual frequency in addition to the mode, you can include the count(*) in the SELECT clause as below:

SELECT top 1 myField, count(*)

FROM myTable

GROUP BY myField

ORDER BY count(*) DESC, myField desc

The above query produces the mode and the associated frequency. If you want the frequencies of all values in myField, remove the top 1 restriction on the query as below:

SELECT myField, count(*)

FROM myTable

GROUP BY myField

ORDER BY count(*) DESC, myField desc

You can easily change the ORDER BY clause to order by the value of myField instead of frequency, for instance. Hope you find this trick useful the next time you have to find the mode or frequencies of some field in Access.

If you prefer to do it in VBA as a user-defined function instead of using SQL, you can use the function code below. There is really no reason to do this since it is easy enough to calculate the median using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the median of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

Note that this function is designed, for the sake of consistency, to be called in the same way as you would call any built-in domain aggregate function such as DSum(), or DAvg(). The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE), and minOrMax (an optional integer value that tells the function whether you want the minimum mode or the maximum mode if the expression is multi-modal, 0 being the default and giving you the maximum and any other value giving you the minimum mode). If you want to specify a value for minOrMax without specifying criteria, leave the criteria field blank or use "True" as the value for criteria.

If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in almost exactly the same way (except for the minOrMax optional argument, which can be safely omitted most of the time).

Function DMode(expr As String, domain As String, Optional criteria As String, Optional minOrMax As Integer) As Double

Dim dbs As Database

Dim rst As Recordset

Dim orderBy As String

Dim currentValue As Double

Dim currentFreq As Integer

Dim maxFreq As Integer

Dim numberOfRecords as Integer

maxFreq = 0

orderBy = " order by " & expr

If minOrMax <> 0 Then

orderBy = orderBy & " desc"

End If

Set dbs = CurrentDb

If Len(criteria) <> 0 Then

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria & orderBy)

Else

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & orderBy)

'Make sure the spaces inside the quotes are preserved, otherwise your SQL will

'not be syntactically correct and Access will complain!

End If

If rst.BOF Then

numberOfRecords = 0

Else

rst.MoveLast

numberOfRecords = rst.RecordCount

'You need the Movelast to get the correct record count out of a recordset

End If

If numberOfRecords = 0 Then

DMode = 0

'We assume the mode is 0 when the number of records is zero

ElseIf numberOfRecords = 1 Then

DMode = rst(expr)

'If there is only one record, the value of expression in that record is the mode

Else

rst.MoveFirst

currentValue = rst(expr)

currentFreq = 1

If currentFreq >= maxFreq Then

DMode = currentValue

maxFreq = currentFreq

End If

rst.MoveNext

'Set up the initial conditions based on the first record

Do While Not rst.EOF

'Now loop through the entire recordset

If rst(expr) = currentValue Then

'If the field's value does not change, update the frequency of occurrence of that value

currentFreq = currentFreq + 1

Else

'Otherwise, reset the current frequency back to 1

currentValue = rst(expr)

currentFreq = 1

End If

'Now check to see if the frequency is the maximum encountered so far.

'If so, update the maximum frequency, and make this value of the field the mode

If currentFreq >= maxFreq Then

DMode = currentValue

maxFreq = currentFreq

End If

rst.MoveNext

Loop

End If

rst.Close

Set rst = Nothing

dbs.Close

Set dbs = Nothing

'Cleanup everything before leaving the function

End Function

This code has been checked in Access 2003. It should work in any version of Access from Access 97 on up, but please do let me know through the comments if you have problems using the code. Good luck!

## No comments:

Post a Comment