Search The Web

Today's Headlines

Thursday, November 19, 2009

Microsoft Access Tips & Tricks: Geometric And Harmonic Means

The average, also known as the arithmetic mean is the most well-known statistical average. In fact most people use the terms arithmetic mean and average interchangeably even though, technically, an average is any measure of central tendency, not just the arithmetic mean.

SQL provides the built-in aggregate function, AVG(), to calculate the arithmetic mean of a set of data in a field in a table. There is also a domain aggregate function in VBA called DAvg() that can be used to calculate the arithmetic mean in VBA code.

However, statisticians and data analysts sometimes need to calculate other averages for a given data set. Previously, we have seen how to calculate medians and modes (which are also considered averages in the most general use of the term average) in Microsoft Access using SQL or VBA. Two other important measures of central tendency are called the geometric mean and the harmonic mean.

The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. The geometric mean is defined only for datasets which contain all positive numbers. The geometric mean is useful in calculating averages that are compounded over and over again. For instance, when we have rates of return for multiple years, the average rate of return would be calculated using a geometric mean because of the compounding of the returns from period to period.


The harmonic mean for a set of numbers a1, a2, ..., an is defined as the reciprocal of the arithmetic mean of the reciprocals of ai's. The harmonic mean is defined only for datasets which contain only non-zero values. The harmonic mean is very useful when calculating weighted averages of values that are ratios, weighted with respect to the value in the denominator. A simple example of such an average would be the average speed of a set of trips at different individual speeds. Speed is the ratio of distance to time and the average speed is the weighted average of the different speeds weighted with respect to the time.


The arithmetic, geometric and harmonic means are considered the three Pythagorean means. For a given set of numbers (that satisfy the conditions that enable calculation of all three means, and which contains at least one value different from the others in the set), the arithmetic mean is always the highest of the three means. The harmonic mean is the lowest of the three means. The geometric mean is always between the arithmetic and harmonic means.

Even though SQL does not provide us with a built-in function to calculate geometric and harmonic means, it is easy to come up with our own SQL to do it.

First we will deal with the geometric mean. The geometric mean can be thought of as the antilogarithm of the arithmetic mean of the logarithms of the numbers. We can take advantage of this in Microsoft Access by writing SQL as below to calculate the geometric mean of field in a table:

SELECT exp(avg(log(myField))) FROM myTable

In the SQL above, we take the arithmetic mean of the logarithms of the values in myField using avg(log(myField)). Note that we are using natural logarithms in this case. We then take the antilogarithm of this arithmetic mean by using a function call to exp, which raises e to the value passed as its argument. Raising e to a power is the same as taking a natural antilogarithm of that power.

Obviously, you can also calculate the geometric mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DGMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the geometric mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the geometric mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.

Function DGMean(expr As String, domain As String, Optional criteria As String) As Double
Dim dbs As Database
Dim rst As Recordset
Dim product As Double
Dim numberOfRecords As Integer

Set dbs = CurrentDb
If Len(criteria) <> 0 Then
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)
Else
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain)
'Make sure the spaces inside the quotes are preserved, otherwise your SQL will
'not be syntactically correct and Access will complain!
End If

If rst.BOF Then
numberOfRecords = 0
Else
rst.MoveLast
numberOfRecords = rst.RecordCount
'You need the Movelast to get the correct record count out of a recordset
End If
If numberOfRecords = 0 Then
DGMean = 0
'We assume the geometric mean is 0 when the number of records is zero
ElseIf numberOfRecords = 1 Then
DGMean = rst(expr)
'If there is only one record, the value of expression in that record is the geometric mean
Else
rst.MoveFirst
product = rst(expr)
rst.MoveNext
'Set up the initial conditions based on the first record
Do While Not rst.EOF
'Now loop through the entire recordset
product = product * rst(expr)
rst.MoveNext
Loop
DGMean = product ^ (1# / numberOfRecords)
End If
rst.Close
Set rst = Nothing
dbs.Close
Set dbs = Nothing
'Cleanup everything before leaving the function
End Function

Now, we tackle the harmonic mean. Since the harmonic mean is simply the reciprocal of the average of reciprocals, the SQL to calculate it can be written as below:

SELECT count(*)/sum(1.0/myField) from myTable

We first calculate the denominator as the sum of the reciprocals of the values in the field. We then take the reciprocal of the average of this number by putting the number of values in the field as the numerator.

Obviously, you can also calculate the harmonic mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DHMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the harmonic mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the harmonic mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.

Function DHMean(expr As String, domain As String, Optional criteria As String) As Double
Dim dbs As Database
Dim rst As Recordset
Dim reciprocal As Double
Dim sumOfReciprocals As Double
Dim numberOfRecords As Integer

Set dbs = CurrentDb
If Len(criteria) <> 0 Then
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)
Else
Set rst = dbs.OpenRecordset("select " & expr & " from " & domain)
'Make sure the spaces inside the quotes are preserved, otherwise your SQL will
'not be syntactically correct and Access will complain!
End If

If rst.BOF Then
numberOfRecords = 0
Else
rst.MoveLast
numberOfRecords = rst.RecordCount
'You need the Movelast to get the correct record count out of a recordset
End If
If numberOfRecords = 0 Then
DHMean = 0
'We assume the harmonic mean is 0 when the number of records is zero
ElseIf numberOfRecords = 1 Then
DHMean = rst(expr)
'If there is only one record, the value of expression in that record is the harmonic mean
Else
rst.MoveFirst
reciprocal = 1# / rst(expr)
sumOfReciprocals = reciprocal
rst.MoveNext
'Set up the initial conditions based on the first record
Do While Not rst.EOF
'Now loop through the entire recordset
reciprocal = 1# / rst(expr)
sumOfReciprocals = sumOfReciprocals + reciprocal
rst.MoveNext
Loop
DHMean = numberOfRecords / sumOfReciprocals
End If
rst.Close
Set rst = Nothing
dbs.Close
Set dbs = Nothing
'Cleanup everything before leaving the function
End Function

Both the functions above have been checked in Access 2003. They should work in any version of Access from Access 97 on up, but please do let me know through the comments if you have problems using the code. Good luck!

No comments:

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web