SQL provides the built-in aggregate function, AVG(), to calculate the arithmetic mean of a set of data in a field in a table. There is also a domain aggregate function in VBA called DAvg() that can be used to calculate the arithmetic mean in VBA code.

However, statisticians and data analysts sometimes need to calculate other averages for a given data set. Previously, we have seen how to calculate medians and modes (which are also considered averages in the most general use of the term average) in Microsoft Access using SQL or VBA. Two other important measures of central tendency are called the geometric mean and the harmonic mean.

The geometric mean of n numbers is obtained by multiplying them all together and then taking the nth root. The geometric mean is defined only for datasets which contain all positive numbers. The geometric mean is useful in calculating averages that are compounded over and over again. For instance, when we have rates of return for multiple years, the average rate of return would be calculated using a geometric mean because of the compounding of the returns from period to period.

The harmonic mean for a set of numbers a1, a2, ..., an is defined as the reciprocal of the arithmetic mean of the reciprocals of ai's. The harmonic mean is defined only for datasets which contain only non-zero values. The harmonic mean is very useful when calculating weighted averages of values that are ratios, weighted with respect to the value in the denominator. A simple example of such an average would be the average speed of a set of trips at different individual speeds. Speed is the ratio of distance to time and the average speed is the weighted average of the different speeds weighted with respect to the time.

The arithmetic, geometric and harmonic means are considered the three Pythagorean means. For a given set of numbers (that satisfy the conditions that enable calculation of all three means, and which contains at least one value different from the others in the set), the arithmetic mean is always the highest of the three means. The harmonic mean is the lowest of the three means. The geometric mean is always between the arithmetic and harmonic means.

Even though SQL does not provide us with a built-in function to calculate geometric and harmonic means, it is easy to come up with our own SQL to do it.

First we will deal with the geometric mean. The geometric mean can be thought of as the antilogarithm of the arithmetic mean of the logarithms of the numbers. We can take advantage of this in Microsoft Access by writing SQL as below to calculate the geometric mean of field in a table:

SELECT exp(avg(log(myField))) FROM myTable

In the SQL above, we take the arithmetic mean of the logarithms of the values in myField using avg(log(myField)). Note that we are using natural logarithms in this case. We then take the antilogarithm of this arithmetic mean by using a function call to exp, which raises e to the value passed as its argument. Raising e to a power is the same as taking a natural antilogarithm of that power.

Obviously, you can also calculate the geometric mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DGMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the geometric mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the geometric mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.

Function DGMean(expr As String, domain As String, Optional criteria As String) As Double

Dim dbs As Database

Dim rst As Recordset

Dim product As Double

Dim numberOfRecords As Integer

Set dbs = CurrentDb

If Len(criteria) <> 0 Then

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)

Else

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain)

'Make sure the spaces inside the quotes are preserved, otherwise your SQL will

'not be syntactically correct and Access will complain!

End If

If rst.BOF Then

numberOfRecords = 0

Else

rst.MoveLast

numberOfRecords = rst.RecordCount

'You need the Movelast to get the correct record count out of a recordset

End If

If numberOfRecords = 0 Then

DGMean = 0

'We assume the geometric mean is 0 when the number of records is zero

ElseIf numberOfRecords = 1 Then

DGMean = rst(expr)

'If there is only one record, the value of expression in that record is the geometric mean

Else

rst.MoveFirst

product = rst(expr)

rst.MoveNext

'Set up the initial conditions based on the first record

Do While Not rst.EOF

'Now loop through the entire recordset

product = product * rst(expr)

rst.MoveNext

Loop

DGMean = product ^ (1# / numberOfRecords)

End If

rst.Close

Set rst = Nothing

dbs.Close

Set dbs = Nothing

'Cleanup everything before leaving the function

End Function

Now, we tackle the harmonic mean. Since the harmonic mean is simply the reciprocal of the average of reciprocals, the SQL to calculate it can be written as below:

SELECT count(*)/sum(1.0/myField) from myTable

We first calculate the denominator as the sum of the reciprocals of the values in the field. We then take the reciprocal of the average of this number by putting the number of values in the field as the numerator.

Obviously, you can also calculate the harmonic mean in a VBA function similar to the Domain Aggregate functions like DSum(), DAvg(), etc. There is no built-in DHMean() function in Access, but it would be easy to write one as below, and call it where appropriate. There is really no reason to do this since it is easy enough to calculate the harmonic mean using just SQL, but if you are more comfortable in VBA than in SQL (especially if you are calculating the harmonic mean of a set of records derived after heavy filtering, for instance), the function is there as a backup to the SQL.

This function would take the same arguments as the other domain aggregate functions for the sake of being consistent. The arguments are strings representing expression (the table field or calculated field for which you need the median), domain (the table or query name that constitutes the domain), and criteria (an optional string expression used to restrict the range of data on which the function is performed, usually written like the WHERE clause of a query without the word WHERE). If you are not sure how to use the function below, search Access help on using a function such as DSum(), look at the examples provided, and use the function below in exactly the same way.

Function DHMean(expr As String, domain As String, Optional criteria As String) As Double

Dim dbs As Database

Dim rst As Recordset

Dim reciprocal As Double

Dim sumOfReciprocals As Double

Dim numberOfRecords As Integer

Set dbs = CurrentDb

If Len(criteria) <> 0 Then

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain & " where " & criteria)

Else

Set rst = dbs.OpenRecordset("select " & expr & " from " & domain)

'Make sure the spaces inside the quotes are preserved, otherwise your SQL will

'not be syntactically correct and Access will complain!

End If

If rst.BOF Then

numberOfRecords = 0

Else

rst.MoveLast

numberOfRecords = rst.RecordCount

'You need the Movelast to get the correct record count out of a recordset

End If

If numberOfRecords = 0 Then

DHMean = 0

'We assume the harmonic mean is 0 when the number of records is zero

ElseIf numberOfRecords = 1 Then

DHMean = rst(expr)

'If there is only one record, the value of expression in that record is the harmonic mean

Else

rst.MoveFirst

reciprocal = 1# / rst(expr)

sumOfReciprocals = reciprocal

rst.MoveNext

'Set up the initial conditions based on the first record

Do While Not rst.EOF

'Now loop through the entire recordset

reciprocal = 1# / rst(expr)

sumOfReciprocals = sumOfReciprocals + reciprocal

rst.MoveNext

Loop

DHMean = numberOfRecords / sumOfReciprocals

End If

rst.Close

Set rst = Nothing

dbs.Close

Set dbs = Nothing

'Cleanup everything before leaving the function

End Function

Both the functions above have been checked in Access 2003. They should work in any version of Access from Access 97 on up, but please do let me know through the comments if you have problems using the code. Good luck!

## No comments:

Post a Comment