## Monday, December 14, 2009

### Microsoft Access Tips & Tricks: Running Differences

In the previous post on Microsoft Access, we saw how to generate running sums and running averages. In this lesson, we will use SQL to calculate running differences between successive values of a field pulled out in a query.

In previous lessons in this series, we have talked about the calculation of the median, the mode, and the geometric and harmonic means of a field. We have also learnt how to rank every row in a query, and how to pull out random rows from a table using SQL.

Why would you require running differences of a given field in a query? There are several applications to this, but one of the most obvious might be something like this: suppose you maintain a record of when you bought fuel for your car. Suppose you note down the mileage from the odometer and the amount of fuel purchased (among other things) each time you fill up. You might then want to figure out how many miles the car went between fill-ups. You can easily find this by calculating the difference in mileage between this fill-up and the previous one, and then the difference between the previous one and the one before that, and so on.

As such, running differences mostly make sense only for quantities that increase continuously in the results of the query. Otherwise, the running differences can be positive sometimes and negative sometimes. This may make sense for something like a daily bank balance where we are interested in finding the difference from the previous day's or week's or month's balance and don't really care whether the difference is positive or negative.

First we will deal with the case where a particular field's value increases continuously and we are interested in finding running differences between successive values of this field. We will assume that the values in this field are all unique. Then our first attempt might be to come up with SQL like below:
`select myField,(myField - (select max(myField) from myTable where myField < T.myField))as RunningDifffrom myTable as T order by myField`
In the subquery, we select the maximum value of myField that is less than the current value of myField. If the values in myField are all unique, and the outer query is ordered by myField, then the subquery should return the value of myField from the previous row in the query. Subtracting that value from the current value of myField should produce the running difference we are looking for.

Unfortunately, we encounter a problem: Think about what happens for the very first value of myField in the query. The subquery does not return any value for this value of myField. Thus, the result of the subquery is NULL. The RunningDiff thus also becomes NULL because RunningDiff is the result of a mathematical operation, one of whose operands is NULL.

This may be perfectly acceptable. Perhaps the previous value of myField needs to be considered as NULL, and the running difference should also be NULL. But sometimes, we may want to assume that the previous value of myField is zero for the lowest value of myField in the table. For instance, if this table were used to track the mileage at each fuel fill-up, then we may have bought the vehicle with zero miles on the odometer and a full tank. The next fill-up may then be the first entry in the table. We want the first running difference to be the mileage at the first fill-up rather than NULL.

It is easy to arrange for that by making sure that the subquery returns a zero rather than NULL. Once again, the Access built-in function nz() comes to our rescue. We introduced this function in the earlier lesson on picking random rows out of a table. Using that function here, we modify our SQL as below to get non-NULL running differences in every row:
`select myField,(myField - nz(select max(myField) from myTable where myField < T.myField))as RunningDifffrom myTable as T order by myField`
The other way to handle this is to eliminate the NULL value by making the running difference itself for that row a zero, rather than making the result of the subquery a zero. This is illustrated below:
`select myField,nz(myField - (select max(myField) from myTable where myField < T.myField))as RunningDifffrom myTable as T order by myField`
Notice that even though it makes more logical sense to order the outer query by myField, we don't have to. We can order by any field in the table, or even by the calculated field, RunningDiff. The query would still work fine, but the results may be a little more difficult to interpret. The RunningDiff field would not represent the difference between the current value of myField and the value of myField in the previous row in the query results. It would just represent the difference between the current value of myField and the value of myField that is just below the current value of myField.

Now we move on to the case of needing running differences for a field that either does not have unique values or we don't want the rows ordered by this field or both. This is the case of daily temperatures or some other meteorological observation, for instance, and we are interested in finding differences in this value across successive days. Or we are dealing with bank balances on a periodic basis and want to find the change in balance during each period.

In this case, we assume that there is some other field in the table that tells us the order of rows in the final query output. This field could be a date/time field which stores the timestamp of each meteorological observation or the date of each bank balance. Or it could be an autonumber field that increases steadily as we add more observations to the table. We assume that this field has all unique values (which is a reasonable assumption since timestamps of meteorological observations, bank balances, etc. tend to be unique, and autonumber fields are guaranteed to have all unique values).

Under such conditions, what we need to do is write a subquery that selects the value of myField in the row of the table that has the previous value of the field we are ordering by. Let us assume that we are ordering the query results by myOrderField (which could be date/time field, an autonumber field or any other field which has unique values in it). Then the following subquery would give us the value of myOrderField that is just below the current value of myOrderField in the outer query (obviously this subquery has a structure very similar to the subquery that we used in the previous SQL to calculate running differences):
`select (select max(myOrderField) from myTablewhere myOrderField < T.myOrderField)as PrevOrderValuefrom myTable as T`
The value of myField in the row of the table that has the above value of myOrderField would then be:
`select(select myField from myTable where myOrderField =(select max(myOrderField) from myTable where myOrderField < T.myOrderField))from myTable as T`
Now, all we have to do is pull out myField and calculate the difference between myField and this previous value of myField produced by the subquery above, and that would give us the running differences we are looking for. That is easy enough to do as below:
`select myField, myField -(select myField from myTable where myOrderField =(select max(myOrderField) from myTable where myOrderField < T.myOrderField))as RunningDifffrom myTable as T`
To be perfectly sensible, we need to order the query results by myOrderField, so that the running differences can be correlated instantly with the value of myField that are being used in the running difference calculation (otherwise, the query results could be quite confusing to look at). We do this easily as below:
`select myField, myField -(select myField from myTable where myOrderField =(select max(myOrderField) from myTable where myOrderField < T.myOrderField))as RunningDifffrom myTable as T order by myOrderField`
Once again, it needs to be emphasized that ordering the query by myOrderField is purely to make sure that the values between which the running differences are being calculated are adjacent to each other in the results of the query so that the calculation of the difference is somewhat obvious. There is absolutely no need to order the outer query by myOrderField or by any other field for that matter. In some analyses, it may in fact be necessary to order the query by the running differences themselves (to identify the periods that had the highest and lowest differences, for instances).

When we execute the above query, we notice that the first row does not have any running difference value associated with it. This is once again because of the NULL value that is pulled out by the subquery for the very first row of the outer query. This easy to rectify by using nz() once again if we want to consider the "zeroth" row of the table to have a value of myField equal to zero.
`select myField, myField -nz((select myField from myTable where myOrderField =(select max(myOrderField) from myTable where myOrderField < T.myOrderField)))as RunningDifffrom myTable as T order by myOrderField`
You can also make the running difference itself zero for the first row rather than making the result of the subquery zero. That would be accomplished as below:
`select myField, nz(myField -(select myField from myTable where myOrderField =(select max(myOrderField) from myTable where myOrderField < T.myOrderField)))as RunningDifffrom myTable as T order by myOrderField`
However, sometimes it may be better to leave the running difference NULL for the first row in the query. Consider a table which contains, for instance, daytime high temperatures for a given location. Assume that this table has a date/time field with observation date in it as myOrderField. If we are now interested in differences in temperature between successive days, we could use the query above to get the running differences. But, we would find that the running difference for the very first day for which data was entered into the table would be equal to the temperature on that day (or zero). If we now want to do analysis of the running differences (such as finding average daily difference or maximum, minimum, etc.), this artificial running difference could create difficulties. So, it may be better under some circumstances to leave the running difference of the first row to be NULL.

Hope this post has been helpful in solving any problems you might have had with calculating running differences. I hope you also found it helpful in coming up with complex queries that involve multiple subqueries by constructing them step by step using a procedure similar to what I have illustrated in this lesson. If you have any problems or concerns with the queries in this lesson, please feel free to let me know by posting a comment. If you have other questions on Access that you would like me to address in future lessons, please feel free to let me know through your comments too. Good luck!

Anonymous said...

I found this very interesting and was hoping to use it to create a Running Difference between daily odometer readings. However, as an amateur I am now lost because after entering the SELECT expression in my query field [and changing the field names so as to be relevant for my table] I keep getting error messages about syntax being incorrect.

Blogannath said...

What is the query you have entered and what is the exact error message you are getting? Have you tried building the query step by step to isolate the step at which you start getting the syntax error? That will guide you towards what you need to focus on to fix the problem.

Anonymous said...

Running Difference Problem:
Well I copied your example and changed the field and table names accordingly in my SQL view of my main query. Your advice is the closest solution to my specific problem in that I am asking for the previous vehicle reading of a particular vehicle on a particular date. [Table data contains multiple vehicles and multiple dates however I do have a Vehicle ID and therefore need to reference the previous reading accurately]. I did try the step by step approach and the error message comes up immediately but I will try again! New day!

Michael Beh Kea Chang said...

good ligic. This is what I dad been looking for years.
I try to apply the same logic by inserting the formula in a text box. It works, but sometimes the result become blank for a week or so and then it is fine again inthe database sheet view.
This seems to be caused by the criteria in dmax funtion give blank in some records. Actual cause unvlear.

## Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
 Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

Hangman

Spelling Bee
difficulty level:
score: -