Search The Web

Today's Headlines

Monday, November 30, 2009

Microsoft Access Tips & Tricks: Select Random Rows

Sometimes, one has a need to select random rows from a table. One may want to run a sweepstakes or lottery based on selecting winners at random. Or one might require a random sample of data for model development, with the rest of the data being used for testing the model so developed.

You can find other Microsoft Access tips and tricks in these earlier posts: Medians, Modes, Geometric and Harmonic Means, and Ranking of each row in a query.

Microsoft Access has no built-in way to generate random rows. In general, a query that does not have an ORDER BY clause is supposed to pull out random rows, but in my testing, that never works in Access to pull out random rows. But, there is a way to explicitly get Access to pull out random rows using a query.

The trick to doing this is a function called rnd(). This function generates random numbers. Read the Microsoft Access help files for a good description and technical details of how the function works, paying particular attention to the return value for different values of its argument.

Now that we know what the function does, how do we use it to pull out random rows from a table? The trick is essentially to associate a random number with each row in the table, then pull out the top few rows (as many random rows as are required) by ordering the rows by the random number associated with each row.

In fact, if your table was designed with a random autonumber field as the primary key, that would be precisely what we would do, and we could actually be done. Your SQL might look something like below:

SELECT TOP 5 * from myTable ORDER BY myRandomAutoNumberField

The above query pulls out the rows with smallest values in the field myRandomAutoNumberField (remember that you are sorting in ascending order, and TOP 5 pulls out the rows that would appear at the top of the query, so they would be the rows with the smallest values of the ORDER BY field). The * tells Access to pull out whole rows (including the autonumber field, in this case) rather than a subset of fields. You can replace * with the subset of fields you want pulled out if you only want a subset of the fields.

If we wanted a 10% random sample of the data rather than 5 rows, you could use SQL like below:

SELECT TOP 10 PERCENT * from myTable ORDER BY myRandomAutoNumberField

There is a big disadvantage to this approach, though: Every time you run the query, you will get only the same exact rows pulled out. Since the random autonumbers are permanently associated with each row, there is no variation from run to run that would result in any change in the actual rows pulled out by the query. The results are entirely predictable and may not work well for something like a raffle in which multiple random samples need to be pulled out for multiple prizes.

Using a dynamic random number generator to generate the ranks would take care of this disadvantage. It would enable you to pull out random rows that change each time the query is run so that the results would not be so predictable.

Your first thought might be to write a query as below:

SELECT TOP 5 * from myTable ORDER BY rnd(10)

The SQL looks fine, and in fact, is syntactically correct. As you can tell from reading the help file on rnd(), a positive number like 10 is a good choice for generating streams of random numbers. But if you run this query, you will actually see that the set of rows pulled out is the same every time your query is run. Try it and you will see that it is true.

Why is that? To answer that question, we need to see exactly what our query is doing. The query is ordering rows by the random number returned by rnd(10), and then returning the first 5 rows based on that ordering. Unfortunately, the query actually associates only one random number with each row of the table, not a different random number with each row of the table. Surprising but true. To see this, run the following query:

SELECT *, rnd(10) as myRandomNumber from myTable ORDER BY rnd(10)

You will see that the numbers in the calculated field, myRandomNumber, are the same for each and every row of the result set produced by the query. The reason is that Access is too clever for its own good! You see, since rnd(10) is a function of a constant number that has nothing to do with any field in a table that changes from row to row, the SQL optimizer of Access decides that it will execute rnd(10) just once, and append the result to every row of the query rather than running the function once for each row.

Now, you and I know that the function actually produces a different result for each row if it is run once for each row. But Access thinks that a function can only produce a constant result if the parameters passed to the function do not involve any change from row to row. As I said, the SQL optimizer in Access is too clever for its own good.

But, it is easy enough to force Access to run the random number generator once for each row pulled out by the query. All we have to do is pass an argument to rnd() that changes with each row so that the SQL optimizer in Access can't try to take any shortcuts when trying to optimize the query.

The easiest way to accomplish this is to pass one of the fields in the query as an argument to rnd(). You just have to make sure that the values in the field are all positive numbers. Remember that if you pass an argument to rnd() that is zero or a negative number, the function either returns the same random number or the most recently generated random number. The only two options for generating the next random number in the sequence is to pass rnd() a positive number as its argument or pass it no argument at all. Passing it no argument is not an option since the silly SQL optimizer would once again get in the way!

So, if you have a field in the table that is guaranteed to be a positive number in every row of the table, you are done. You can write SQL like below and you will get your random records without any problem:

SELECT TOP 5 * from myTable ORDER BY rnd(myPositiveField)

or

SELECT TOP 5 PERCENT * from myTable ORDER BY rnd(myPositiveField)

We are almost there. What if your table is missing a suitable field for the query above? In other words, you have no field that has positive values in every row of the table. Never fear! There is a way around that too! Remember that rnd() returns a positive number between 0 and 1 every time it is called. So, you can actually use rnd() to generate a number that is guaranteed to be positive, and pass that as an argument to another call to rnd() to generate the random numbers that will be used for ranking!!

So, suppose you have a numeric field myNonNullField in the table which contains positive, negative or zero-valued numbers in each row of the table. As long as myNonNullField is guaranteed to not contain any NULL values, The SQL statements below can be used to select random rows from the table:

SELECT TOP 5 * from myTable ORDER BY rnd(rnd(myNonNullField))

or

SELECT TOP 5 PERCENT * from myTable ORDER BY rnd(rnd(myNonNullField))

The latter would obviously be used if you wanted a random sample that contained a fixed percentage of the total number of rows in the table rather than just a fixed number of random rows.

The field myNonNullField in the above queries is some numerical field that can contain, positive numbers, zeroes or negative numbers. In fact, as mentioned above, the only condition is that myField not contain any NULL values. Rnd() does not play well with NULL arguments, so it will create a run-time error if myField contains NULL values in any rows.

To get around this new problem we will use Access's nz() function. Nz() returns a zero by default when fed a NULL argument. When it is fed a numerical argument that is non-null, it returns the same value as the return value. By default, it returns a zero when fed an argument that is null (hence the name nz(), which stands for null-to-zero), but you can get it to return any value you want by feeding the function an optional second argument. You can read full details about the nz() function also in the Access help files.

Now, we are ready to solve our problem even if there is no field in your table that contains non-null values in it. The SQL below uses a daisy chain of 3 function calls to generate random numbers to order the rows pulled out by the query:

SELECT TOP 5 * from myTable ORDER BY rnd(rnd(nz(myField)))

or

SELECT TOP 5 PERCENT * from myTable ORDER BY rnd(rnd(nz(myField)))

The only condition now is that myField be a numerical field in the table myTable. Now, there are conversion functions in Access to convert between most types of fields. So, even if your table does not have any numerical fields at all (which would be highly unusual, but not impossible), you just have to use an appropriate conversion function as the innermost link in the daisychain of function calls above to get an appropriate random number to order your rows by. You can read about conversion functions also in the Access help files.

Hope you found the technique explained in this post useful for your needs! If you have any questions, concerns or comments, please do feel free to use the comments field to let me know. If there are other aspects of Microsoft Access or SQL you would like me to address in future lessons, let me know too. Good luck!!

No comments:

Visitors Country Map

Free counters!

Content From TheFreeDictionary.com

In the News

Article of the Day

This Day in History

Today's Birthday

Quote of the Day

Word of the Day

Match Up
Match each word in the left column with its synonym on the right. When finished, click Answer to see the results. Good luck!

 

Hangman

Spelling Bee
difficulty level:
score: -
please wait...
 
spell the word:

Search The Web