Generating test data

To test the performance of a database, first get some data. There are two alternatives to keying in a large volume of test data. First, existing files can be used as a source of test data and imported into appropriate tables. Second, test data can be generated for each table. Here, we discuss the use of Excel to generate test data. This exercise assumes considerable Excel experience.

Generating test data with Excel

The RAND function is very useful for generating random test data because it computes an evenly distributed random number r in the range 0 <= r < 1.

Generating a range of integers

When combined with the ROUND function, RAND can be used to generate random integers. The syntax for round is ROUND(value, digits), where value is the number to round and digits specifies the number of decimal places of the rounding (i.e., ROUND(5.4,0) equals 5). To generate an integer random number between a and b, use: ROUND(RAND()*(b-a) + a, 0)

Thus, to generate quantities sold in the range 1 to 5 use:
ROUND(RAND()*4 + 1,0)

Generating a range of dates

Excel stores a date as an integer, and days since some starting point are numbered sequentially. Thus to generate random dates in a year, you can add a random integer between 1 and 365 to the last day of the previous year. For example to generate a random date in 1997, enter the following in the designated cells:

1 31-DEC-1996
2 =ROUND(RAND()*364+1,0) + A1

If A2 is not a date, you need to set the format of the cell (Format>Cells ...)

Randomly selecting text from a list

Imagine that a company has locations in three cities (Atlanta, New York, and Chicago) and sales are distributed, respectively, in the following proportions: .4, .3,.3. To randomly select city names to match this distribution do the following.

1 1 Atlanta
2 2 Atlanta
3 3 Atlanta
4 4 Atlanta
5 5 New York
6 6 New York
7 7 New York
8 8 Chicago
9 9 Chicago
10 10 Chicago

VLOOKUP(value, table, column) where value is the item to be found in the first column of the array, table is the array, and column is the number of the column to be retrieved the table.

Randomly selecting a pair of values from a list

Sometimes, you may want to select two values from a list (e.g., the description of an item and its price). VLOOKUP can be used as before. Consider the case where a company sells 10 items and the following price list is established. Here we would name two arrays: ITEM A1:B10 and PRICE B1:C10

1  1
Boots - snakeproof
2  2
Camel saddle
3  3
4  4
Elephant polo stick
5  5 Exploring in 10 Easy Lessons  20
6  6
Geo positioning system
7 7
8 8
Hat - polar explorer
9 9
How to Win Foreign Friends
10 10
Map case

To randomly select an item from ITEM, use:


Now, another lookup is required to match the selected item to its price. Assume the selected item (e.g., Hammock) is stored in cell X100, then the lookup command would be:


This would look for the value of X100 (i.e., Hammock) in the first column of PRICE and return its price, which is stored in the second column.

Your turn

Generate data for the following table:

Column Values
Date In the range 1-Jan-98 to 31-Dec-98
Sales quantity In the range 1 to 8
Location Either Paris, London, or Singapore in the proportions .4,.4, .2 respectively
Item One of five possible items (all equally likely)
Walking boots
Price Respective prices of the above items are:
Type of sale Catalog or store in a 3:1 ratio


It is a good idea to create two worksheets.

  1. The data used to generate the test data
  2. The test data


  1. Generate 100 items (use Edit >Fill) and import the file into your database system and answer the following queries:
  2. Generate 1000 items, import the file and repeat the queries. Is there a discernible difference in the time to execute the queries?


This page is part of the promotional and support material for Data Management (sixth edition) by Richard T. Watson
For questions and comments please contact the author

Date revised: 02-Jan-2020