Search The Web

Today's Headlines

Thursday, November 26, 2009

Microsoft Access Tips & Tricks: Rank Every Row

In previous lessons in this series, we have talked about the calculation of statistical measurements pertaining to a field in a table. We have covered the median, the mode, and the geometric and harmonic means of a field. In this lesson, we will deal with a different kind of problem that does not involve producing a single number as the answer, but rather involves modifying every row of output from a query.

Sometimes, in a query, you want every row to have a row number (or rank) associated with it. You may need this simply to figure out what the rank of a particular row is based on certain sorting conditions. Some reports also have a need for such ranking because they use the rank to color alternate rows in the report dark and light so that the report becomes easier to read.

Notice that even if your table has an autonumber field (and this autonumber field is not set to random), you can not be guaranteed that this autonumber field can serve as a good substitute for a rank. First of all, the autonumber field denotes the order in which the rows were entered into the table. This may not be the same order in which the query pulls the data out, depending on the ORDER BY clause used in the query. Thus, the values in the autonumber field could be all over the place instead of increasing steadily in the query results.

Moreover, autonumber fields can have gaps (when rows are deleted from the middle of a table, Access never fills those gaps with new data). So, your data will also have gaps in the rank, which may be undesirable. More importantly, depending on the subset of rows pulled out of the table, there is no guarantee that the ranks will start with 1 and go to the number of rows in the query results. Thus, figuring out how to generate ranks on the fly inside SQL can come in very handy even if the table has a field that could be used as a rank under some limited conditions.

If the query pulls out unique rows (i.e. no two rows have identical information on them, and you can always tell one row apart from another in the query's results just by looking at the results of the query), then attaching a rank to each row in the output is very easy. You can also guarantee that these ranks are unique without having to jump through any hoops!

If the query does not pull out unique rows (i.e. all the data is identical between two or more rows), then you can still attach a row number (or rank) to each row, but these row numbers or ranks will not be unique if the ranking is based purely on the information being displayed. There is a way to overcome this if you do need unique ranks for the rows, as long as the actual data in the table is not duplicated across rows. We will talk first about the easy case where the rows are unique, then we will talk about the options for queries that pull out non-unique rows.

If the query does pull out unique rows, then you can easily attach a row number to each row after you sort the query using the ORDER BY clause of SQL. There are actually two ways of assigning a rank to each row as shown below, and both the methods produce identical results. Let us first deal with the case where our query produces output with just one field from the table. Let that field be myField. The two SQL constructs below will produce ranks for each row in the query results. Both the constructs produce identical results, so you can choose to use whichever one catches your fancy!

SELECT (SELECT count(*) from myTable where myField <= T.myField) as Rank,
myField from myTable as T ORDER BY myField

SELECT (SELECT count(*) + 1 from myTable where myField < T.myField) as Rank,
myField from myTable as T ORDER BY myField

In the subquery inside each of these queries, we are counting the number of rows which have a value of myField less than (or less than or equal to) the value of myField we are dealing with in the outer query. We are then using that count + 1 (to account for the current value of myField), or the count itself (if the comparison is less than or equal to since less than or equal to would include the current value of myField) as the rank for the row.

If the ordering in the main query is in descending order, it is easy to modify these queries to take that into account simply by changing the comparison sign inside the subquery as below:

SELECT (SELECT count(*) from myTable where myField >= T.myField) as Rank,
myField from myTable as T ORDER BY myField desc

SELECT (SELECT count(*) + 1 from myTable where myField > T.myField) as Rank,
myField from myTable as T ORDER BY myField desc

What happens if we are pulling out a subset of rows in the table based on some filtering criteria? Suppose we use some condition C (which is a boolean construct that determines whether a given row is part of the result set or not) to select rows for inclusion in the results of the query. The SQL below shows how to rank rows in such a query:

SELECT (SELECT count(*) from myTable where (myField <= T.myField) AND (C)) as Rank,
myField from myTable as T where C ORDER BY myField

SELECT (SELECT count(*) + 1 from myTable where (myField < T.myField) AND (C)) as Rank,
myField from myTable as T where C ORDER BY myField

If you want to rank based on the entire table, but want to attach the ranks to the subset you pull out based on some filtering criteria (for instance, you want to find out the ranking of the first row in your subset when compared with all the data in the table), you can omit the condition C from the subquery. But note that if you do that, the ranks attached to the rows in the subset pulled out by the query may not be contiguous and may not run from 1 to the number of rows pulled out by the query.

The SQL construct we have been discussing can be extended to cover cases where you need to pull out multiple fields from the table. Once again, we will deal with the case where the query produces unique rows as its output. The method relies on the ORDER BY clause once again. The outer query is similar to the outer query used in the previous case, but notice that the subquery used to generate the rank is a little different.

As the number of fields in the query goes up, the WHERE clause of the subquery has more and more parts added to it, so it can get pretty long if the query pulls out lots of columns! With just two columns pulled out, the query looks as below (I have only provided the version of the query that uses <= in the comparison. It is easy enough to modify it to use < and increment the count by 1 if needed):

SELECT
(SELECT count(*) from myTable WHERE
((myField1 < T.myField1) OR
((myField1 = T.myField1) AND (myField2 <= T.myField2)))) as Rank,
myField1, myField2 from myTable ORDER BY myField1, myField2

With three fields being pulled out, the query looks as below:

SELECT (select count(*) from myTable where ((myField1 < T.myField1) OR
((myField1 = T.myField1) AND (myField2 < T.myField2)) OR
((myField1 = T.myField1) AND (myField2 = T.myField2) AND (myField3 <= T.myField3)))) AS Rank,
myField1, myField2, myField3
FROM myTable AS T
ORDER BY myField1, myField2, myField3;

Notice how any row that was ordered before the current row under consideration in the main query will fall into one and only one of the conditions linked by OR's in the subquery. This is important in ensuring unique ranks for all the rows in the main query.

The other important thing to note is that the ORDER BY clause of the main query should include every field that is included in the query so that the ranking is done at the row level rather than a subset of the fields in the row. If the ranking is done on a subset of the fields output by the query, this query will produce unique ranks only if the rows are unique with respect to just the fields that are included in the ORDER BY clause of the query.

It should be relatively easy to extend the SQL above to a case where a condition is used to filter rows for inclusion in the results of the query. Just remember to include the same condition in the subquery as in the outer query, and link it with the condition already in the subquery (for generating the rank numbers) with an AND. Use parentheses generously since you don't want the condition to be linked to just a part of the condition already in the subquery, but with the entire condition already in the subquery.

What if the ordering of the main query uses ascending order for some fields and descending order for others? Let us rewrite the last query above (which pulls out myField1, myField2 and myField3) such that myField1 and myField3 are sorted in ascending order, and myField3 is sorted in descending order. Then the query would look as below:

SELECT (select count(*) from myTable where ((myField1 < T.myField1) OR
((myField1 = T.myField1) AND (myField2 > T.myField2)) OR
((myField1 = T.myField1) AND (myField2 = T.myField2) AND (myField3 <= T.myField3)))) AS Rank,
myField1, myField2, myField3
FROM myTable AS T
ORDER BY myField1, myField2 desc, myField3;

Notice how the comparison involving myField2 has the direction reversed, to correspond with the fact that its ordering is reversed. The rest of the query remains pretty much the same. That wasn't too hard, was it?! I didn't think so!!

Now, let us turn our attention to queries that pull out non-unique rows. For instance suppose the myField field in myTable had the data elements 1, 2, 3, 3, 4, 4, 4, 5 in 8 rows. Now you can use the SQL below to generate ranks for the rows, but the ranks will not be unique:

SELECT (SELECT count(*) from myTable where myField <= T.myField) as Rank,
myField from myTable as T ORDER BY myField

This is obviously the same SQL we used to generate unique ranks in queries that pull out unique rows. But since the values are not unique in this case, this will actually produce the query result below:

Rank myField
1 1
2 2
4 3
4 3
7 4
7 4
7 4
8 5

As you can tell, the ranks are not unique and they have gaps. You can be guaranteed that the rank numbers will go up to the number of rows in the result set. But you can not be guaranteed that the ranks would start at 1. In fact, if there were 8 rows in the results and all of them had the exact same value in myField, this query would assign a rank of 8 to each row.

Because of the <= comparison in the subquery, non-unique values in myField get the highest rank number (I am not sure how to express this since I don't want to say highest rank, which usually implies lower rank numbers!) that any of them would get if that value of myField had been the highest value of myField in the results. This may not be desirable. The alternative to this is the other SQL construct we used for ranking unique rows:

SELECT (SELECT count(*) + 1 from myTable where myField < T.myField) as Rank,
myField from myTable as T ORDER BY myField

In this case, with the same data as in the previous example, you would get the output below:

Rank myField
1 1
2 2
3 3
3 3
5 4
5 4
5 4
8 5

As you can tell, the ranks are not unique and they have gaps. You can be guaranteed that the rank numbers will start from 1. But you can not be guaranteed that the rank numbers will go up to the number of rows in the result set. In fact, if there were 8 rows in the results and all of them had the exact same value in myField, this query would assign a rank of 1 to each row.

Because of the < comparison in the subquery, non-unique values in myField get the lowest rank number (I am not sure how to express this since I don't want to say lowest rank, which usually implies higher rank numbers!) that any of them would get if that value of myField had been the highest value of myField in the results. This may not be desirable either.

So, is there a way to generate unique ranks that run from 1 to the number of rows when the rows are not unique? Well, it turns out this is possible under most circumstances. Consider what the database table is there for: it is there for storing non-redundant data. There really is no reason to store exactly identical data twice or thrice in a table. There must be some field in a table that distinguishes between rows that look identical otherwise. This field could be a timestamp, an autonumber field or anything else.

As long as the table does not actually contain rows that are identical (and this will not be possible in a table with a primary key, whether it is a single-field simple primary key or multi-field compound primary key), there is way to rank rows uniquely in a query based on that table. If you suspect that there may be a case where multiple rows in your table might contain the same data (for instance, the table stores weather observations or observations from some kind of experiment), you should either add a timestamp to each row that will make them unique, or take Access's suggestion during table design and add an autonumber primary key to the table.

Assuming that your table does actually contain a field that never repeats across rows (an autonumber field or a timestamp that is guaranteed to be different for each row in the table), you can rank rows in a query based on that table. The main thing to remember is that ordering can be done on fields in a query that are not displayed to the users. So, suppose myField1 in your table contains non-unique data. Also suppose that myField2 contains unique data (or the combination of myField1 and myField2 is unique across rows). But you want to show only myField1 to the user in your query. The query below will enable you to do so without any problems:

SELECT
(SELECT count(*) from myTable WHERE
((myField1 < T.myField1) OR
((myField1 = T.myField1) AND (myField2 <= T.myField2)))) as Rank,
myField1 from myTable ORDER BY myField1, myField2

Notice that myField2 is not included in the list of fields in the SELECT clause of the main query, but is used in the ORDER BY clause, and also inside the subquery. You can refer to fields in the table in the subquery even if those fields are not actually in the SELECT clause of the main query. And, if it takes more fields to make each row unique, then the subquery can get messy as we saw before.

The output of the query does not make it clear to the user how the ranking was generated because all the fields that were used to order the rows in the query are not visible to the user. This can cause confusion to the user. Keep this in mind when using constructs like the above.

Hope this lesson has been helpful in solving some problem you might have had with ranking rows in a query. If you have any problems or concerns with the queries in this lesson, please feel free to let me know by posting a comment. Good luck!

3 comments:

JasonR said...

I modified your query a little to this:

SELECT
(SELECT Count (*) +1 FROM Production WHERE Units_Produced>Q.Units_Produced) as Rank,
Units_Produced

FROM Production as Q

*********************
This yields a result of Units_Produced, ordered descending, and ranked Descending.

The Production table simply has a column of integers named "Units_Produced" and a column of ID numbers named "Emp_ID". Emp_ID has a 1-1 relationship with the Emp_ID column in my Payroll table, which is where employee names are saved.

Any idea how I can add employee names to the query above? I've tried Inner Joining payroll to the query above, specifically:

INNER JOIN Payroll ON Production.Emp_ID = Payroll.Emp_ID

I've tried adding this in all combinations of:
In the 1st FROM statement
In the 2nd FROM statement before the AS Q alias
In the 2nd FROM statement after the AS Q alian

Blogannath said...

I don't have time to test this right now, but I think your problem is the use of Production in your join condition. Since you have already aliased it to Q in the query, you have to use Q.Emp_ID rather than Production.Emp_ID. So, if you write the query as below, I have a feeling it will work:

SELECT Payroll.Employee_Name,
(SELECT Count (*) +1 FROM Production WHERE Units_Produced>Q.Units_Produced) as Rank,
Units_Produced

FROM Production as Q
inner join Payroll ON Q.Emp_ID = Payroll.Emp_ID

JasonR said...

Yep. That was it. I didn't know that using an alias forces me to use an alias in all future references. Thanks a bunch!

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web