Search The Web

Today's Headlines

Monday, December 7, 2009

Microsoft Access Tips & Tricks: Running Sums & Averages

In previous lessons in this series, we have talked about the calculation of the median, the mode, and the geometric and harmonic means of a field. We have also learnt how to rank every row in a query, and how to pull out random rows from a table using SQL. In this lesson, we will deal with the calculation of a running sum or running average of a given field in a query.

Why would you want a running sum and/or running average of a field? Basically, it is just another statistic that may be interesting to you. For instance, if you have a travel diary in Access, and you want to calculate the progression of the total distance or travel time as more and more trips are added to your diary, you would want a running sum. If you want the progression of how the average distance or travel time has varied over time, you might want to associate a running average with each trip that tells you what the average was at the end of that trip.

We will assume that the data in the table makes every row unique in some fashion. Either there is a timestamp associated with each row that tells you how the rows are to be ordered or you have used an autonumber field as the primary key. We will first deal with the easier case of calculating running sums, then we will deal separately with calculating running averages.

You can easily calculate running sums with SQL that is very similar to how we calculated ranks for each row in an earlier lesson. There are actually two ways of calculating a running sum as shown below (equivalent to the two constructs we explored for ranking every row), and both the methods produce identical results. Let us first deal with the case where our query produces output with just one field from the table. Let that field be myField. We are assuming that each value of myField is unique and the ordering of myField in ascending order makes some logical sense. The two SQL constructs below will produce myField as well as a running sum of myField for each row in the query results. Both the constructs produce identical results, so you can choose to use whichever one catches your fancy!

SELECT (SELECT sum(myField) from myTable where myField <= T.myField)
as RunningSum,
myField from myTable as T ORDER BY myField

SELECT (SELECT sum(myField) + T.myField from myTable where myField < T.myField)
as RunningSum,
myField from myTable as T ORDER BY myField

In the subquery inside each of these queries, we are adding the value of myField from the rows which have a value of myField less than (or less than or equal to) the value of myField we are dealing with in the outer query. We are then using that sum + the current value of myField, or the sum itself (if the comparison is less than or equal to since less than or equal to would include the current value of myField in the sum) as the running sum for the row.

Actually though, under most conditions, you would probably want to order by one field while displaying some other field and its running sum (for instance, in the case of travel diary, you might want to order by a date/time field that denotes when the trip took place, or an autonumber field that tells you in which order the trips took place, and actually display the distance and its running sum).

In the next example, we see two equivalent SQL queries that order rows by an autonumber field called myAutoNumberField (we assume that myAutoNumberField is not a random autonumber field since sorting by a random autonumber field will not make logical sense in most cases) while displaying myField and a running sum for myField:

SELECT (SELECT sum(myField) from myTable where myAutoNumberField <= T.myAutoNumberField)
as RunningSum,
myField from myTable as T ORDER BY myAutoNumberField

SELECT (SELECT sum(myField) + T.myField from myTable where myAutoNumberField < T.myAutoNumberField)
as RunningSum,
myField from myTable as T ORDER BY myAutoNumberField
Notice that the autonumber field itself is not displayed in the results of the query at all. We can still use it in the subquery to calculate the running sum without any problems. If we do want to display the autonumber field also, we can add it to the outer query and it will not make a difference to the running sums.

What happens if we are pulling out a subset of rows in the table based on some filtering criteria? Suppose we use some condition C (which is a boolean construct that determines whether a given row is part of the result set or not) to select rows for inclusion in the results of the query. The SQL below shows how to calculate a running sum in such a query:

SELECT (SELECT sum(myField) from myTable where (myField <= T.myField) AND (C))
as RunningSum,
myField from myTable as T where C ORDER BY myField

SELECT (SELECT sum(myField) + T.myField from myTable where (myField < T.myField) AND (C))
as RunningSum,
myField from myTable as T where C ORDER BY myField
Once again, we have assumed that ordering the query by myField makes logical sense. More often than not, you probably want to order by timestamp or autonumber field while calculating cumulative statistics for a different field. The SQL for doing that is shown below:

SELECT (SELECT sum(myField) from myTable
where (myAutoNumberField <= T.myAutoNumberField) AND (C))
as RunningSum,
myField from myTable as T where C ORDER BY myAutoNumberField

SELECT (SELECT sum(myField) + T.myField from myTable
where (myAutoNumberField < T.myAutoNumberField) AND (C))
as RunningSum,
myField from myTable as T where C ORDER BY myAutoNumberField
Now, let us see how we tackle the case of calculating running sums for more than one field at the same time. For instance, we may need the running sums for both distance and time (perhaps, so that we can use them to calculate a running average speed) from a travel diary. We will assume that the rows in the query are going to be ordered by an autonumber field (ordering by distance or time in the case of a travel diary does not make much logical sense).

The main problem you encounter is that the way the original query is structured, the subquery can return only one field for each row pulled out by the query. So, if you modify the query as below, it will be syntactically incorrect, and Access will complain:

SELECT (SELECT sum(myField1), sum(myField2) from myTable
where myAutoNumberField <= T.myAutoNumberField),
myField1, myField2 from myTable as T ORDER BY myAutoNumberField

So, how exactly do we get around this limitation? It is actually quite simple, but it does make the SQL quite long, and makes it look complicated. But remember, even though it looks complicated, it actually is not. Essentially, what we are going to do is calculate the two running sums in two separate subqueries, as shown below:

SELECT (SELECT sum(myField1) from myTable
where myAutoNumberField <= T.myAutoNumberField)
as RunningSumField1,
(SELECT sum(myField2) from myTable
where myAutoNumberField <= T.myAutoNumberField)
as RunningSumField2,
myField1, myField2 from myTable as T ORDER BY myAutoNumberField

Now that we have tackled running sums, let us tackle the problem of running averages. This is where the previous lesson on ranking rows in a query comes in handy. Let us recall that the running average corresponding to a row in the query results is nothing but the running sum of that row divided by the rank of that row. Thus, calculating the running average turns out to be quite simple once we combine the SQL from both lessons.

First we will cover the case of a running average for one field. The SQL below should be self-explanatory given the definition in the previous paragraph. If you are not sure, you should review the lesson on ranking every row in a query also:
SELECT
((SELECT sum(myField) from myTable where myAutoNumberField <= T.myAutoNumberField)/
(SELECT count(*) from myTable where myAutoNumberField <= T.myAutoNumberField))
as RunningAverage,
myField from myTable as T ORDER BY myAutoNumberField

To extend this SQL for multiple fields, we follow the same basic technique we used for extending running sums to multiple fields: we create each running average as a separate subquery. The SQL below illustrates the technique:
SELECT
((SELECT sum(myField1) from myTable where myAutoNumberField <= T.myAutoNumberField)/
(SELECT count(*) from myTable where myAutoNumberField <= T.myAutoNumberField))
as RunningAverage1,
((SELECT sum(myField2) from myTable where myAutoNumberField <= T.myAutoNumberField)/
(SELECT count(*) from myTable where myAutoNumberField <= T.myAutoNumberField))
as RunningAverage2,
myField1, myField2 from myTable as T ORDER BY myAutoNumberField
If you want both running sums and running averages, you have to expand the SQL above to include the subqueries that calculate each of them explicitly, as shown below:
SELECT
((SELECT sum(myField) from myTable where myAutoNumberField <= T.myAutoNumberField)/
(SELECT count(*) from myTable where myAutoNumberField <= T.myAutoNumberField))
as RunningAverage,
(SELECT sum(myField) from myTable where myAutoNumberField <= T.myAutoNumberField)
as RunningSum,
myField from myTable as T ORDER BY myAutoNumberField
It should be easy enough to extend this type of construct to include multiple fields if necessary. You can also explicitly write out the rank in addition to the running statistics by adding the subquery for it to the above query.

Hope this post has been helpful in solving any problems you might have had with calculating running statistics such as running sums and averages. If you have any problems or concerns with the queries in this lesson, please feel free to let me know by posting a comment. Good luck!

No comments:

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web