Thursday, November 19, 2009

Microsoft Access Tips & Tricks: Geometric And Harmonic Means

The average, also known as the arithmetic mean is the most well-known statistical average. In fact most people use the terms arithmetic mean and average interchangeably even though, technically, an average is any measure of central tendency, not just the arithmetic mean.

SQL provides the built-in aggregate function, AVG(), to calculate the arithmetic mean of a set of data in a field in a table. There is also a domain aggregate function in VBA called DAvg() that can be used to calculate the arithmetic mean in VBA code.

However, statisticians and data analysts sometimes need to calculate other averages for a given data set. Previously, we have seen how to calculate medians and modes (which are also considered averages in the most general use of the term average) in Microsoft Access using SQL or VBA. Two other important measures of central tendency are called the geometric mean and the harmonic mean.

The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. The geometric mean is defined only for datasets which contain all positive numbers. The geometric mean is useful in calculating averages that are compounded over and over again. For instance, when we have rates of return for multiple years, the average rate of return would be calculated using a geometric mean because of the compounding of the returns from period to period.

The harmonic mean for a set of numbers a1, a2, ..., an is defined as the reciprocal of the arithmetic mean of the reciprocals of ai's. The harmonic mean is defined only for datasets which contain only non-zero values. The harmonic mean is very useful when calculating weighted averages of values that are ratios, weighted with respect to the value in the denominator. A simple example of such an average would be the average speed of a set of trips at different individual speeds. Speed is the ratio of distance to time and the average speed is the weighted average of the different speeds weighted with respect to the time.

The arithmetic, geometric and harmonic means are considered the three Pythagorean means. For a given set of numbers (that satisfy the conditions that enable calculation of all three means, and which contains at least one value different from the others in the set), the arithmetic mean is always the highest of the three means. The harmonic mean is the lowest of the three means. The geometric mean is always between the arithmetic and harmonic means.

Even though SQL does not provide us with a built-in function to calculate geometric and harmonic means, it is easy to come up with our own SQL to do it.

First we will deal with the geometric mean. The geometric mean can be thought of as the antilogarithm of the arithmetic mean of the logarithms of the numbers. We can take advantage of this in Microsoft Access by writing SQL as below to calculate the geometric mean of field in a table:
`SELECT exp(avg(log(myField))) FROM myTable`

In the SQL above, we take the arithmetic mean of the logarithms of the values in myField using avg(log(myField)). Note that we are using natural logarithms in this case. We then take the antilogarithm of this arithmetic mean by using a function call to exp, which raises e to the value passed as its argument. Raising e to a power is the same as taking a natural antilogarithm of that power.

Obviously, you can also calculate the geometric mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DGMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the geometric mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the geometric mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.
`Function DGMean(expr As String, domain As String, Optional criteria As String) As DoubleDim dbs As DatabaseDim rst As RecordsetDim product As DoubleDim numberOfRecords As IntegerSet dbs = CurrentDbIf Len(criteria) <> 0 ThenSet rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)ElseSet rst = dbs.OpenRecordset("select " & expr & " from " & domain)'Make sure the spaces inside the quotes are preserved, otherwise your SQL will'not be syntactically correct and Access will complain!End IfIf rst.BOF ThennumberOfRecords = 0Elserst.MoveLastnumberOfRecords = rst.RecordCount'You need the Movelast to get the correct record count out of a recordsetEnd IfIf numberOfRecords = 0 ThenDGMean = 0'We assume the geometric mean is 0 when the number of records is zeroElseIf numberOfRecords = 1 ThenDGMean = rst(expr)'If there is only one record, the value of expression in that record is the geometric meanElserst.MoveFirstproduct = rst(expr)rst.MoveNext'Set up the initial conditions based on the first recordDo While Not rst.EOF'Now loop through the entire recordsetproduct = product * rst(expr)rst.MoveNextLoopDGMean = product ^ (1# / numberOfRecords)End Ifrst.CloseSet rst = Nothingdbs.CloseSet dbs = Nothing'Cleanup everything before leaving the functionEnd Function`

Now, we tackle the harmonic mean. Since the harmonic mean is simply the reciprocal of the average of reciprocals, the SQL to calculate it can be written as below:
`SELECT count(*)/sum(1.0/myField) from myTable`

We first calculate the denominator as the sum of the reciprocals of the values in the field. We then take the reciprocal of the average of this number by putting the number of values in the field as the numerator.

Obviously, you can also calculate the harmonic mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DHMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the harmonic mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the harmonic mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.
`Function DHMean(expr As String, domain As String, Optional criteria As String) As DoubleDim dbs As DatabaseDim rst As RecordsetDim reciprocal As DoubleDim sumOfReciprocals As DoubleDim numberOfRecords As IntegerSet dbs = CurrentDbIf Len(criteria) <> 0 ThenSet rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)ElseSet rst = dbs.OpenRecordset("select " & expr & " from " & domain)'Make sure the spaces inside the quotes are preserved, otherwise your SQL will'not be syntactically correct and Access will complain!End IfIf rst.BOF ThennumberOfRecords = 0Elserst.MoveLastnumberOfRecords = rst.RecordCount'You need the Movelast to get the correct record count out of a recordsetEnd IfIf numberOfRecords = 0 ThenDHMean = 0'We assume the harmonic mean is 0 when the number of records is zeroElseIf numberOfRecords = 1 ThenDHMean = rst(expr)'If there is only one record, the value of expression in that record is the harmonic meanElserst.MoveFirstreciprocal = 1# / rst(expr)sumOfReciprocals = reciprocalrst.MoveNext'Set up the initial conditions based on the first recordDo While Not rst.EOF'Now loop through the entire recordsetreciprocal = 1# / rst(expr)sumOfReciprocals = sumOfReciprocals + reciprocalrst.MoveNextLoopDHMean = numberOfRecords / sumOfReciprocalsEnd Ifrst.CloseSet rst = Nothingdbs.CloseSet dbs = Nothing'Cleanup everything before leaving the functionEnd Function`

Both the functions above have been checked in Access 2003. They should work in any version of Access from Access 97 on up, but please do let me know through the comments if you have problems using the code. Good luck!

Anonymous said...

This post is a long time ago, but since I stumbled via google upon it, I feel that s.o. might read my comment.

For the harmonic mean you should not use

SELECT count(*)/sum(1.0/myField) from myTable

but

SELECT count(*)/sum(convert(float, 1)/myField) from myTable

because otherwise you run into rounding problems.
Try
select 2/(1.0/101 + 1.0/101)
And you will see that the result is 101.010101010 and not the expected 101.

http://stackoverflow.com/questions/1072806/sql-server-calculation-with-numeric-literals

Have fun,
Frank

Blogannath said...

Even though MS Access and SQL Server are both Microsoft products, there are lots of differences between them. This seems to be one of the instances when their behaviors are different. I put that query into Access and got back the expected 101 without any problem.

But, thank you for bringing up the issue and suggesting the use of a forced cast to get around the issue if it arises.

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
 Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

Hangman

Spelling Bee
difficulty level:
score: -

spell the word: