19 - Data Structure and Storage

Answers to exercises

Why is a disk drive considered a bottleneck?

The time to access data on a disk drive is relatively long compared to main memory. It takes approximately 105 times longer to access a disk than to access main memory. Because of the length of time it takes to access a disk drive the computer is often waiting for a disk to retrieve data before it can continue to process a request for information.

Describe the two types of delay that can occur prior to reading a record from a disk. What can be done to reduce these delays?

The two types of delay that can occur prior to reading a record from a disk are rotational delay and access arm delay. Access arm delay is reduced by minimizing the movement of the read/write head. To minimize read/write head movement, data that are likely to be accessed at the same time should be stored on the same track or location, either on a single surface or on a cylinder. Rotational delay is reduced by rotating the disk faster. The database designer can only work to minimize the read/write head movement, the disk manufacturer sets the rotational speed of the disk.

Describe the differences between a file manager and a disk manager.

The disk manager is part of the operating system and is concerned with the storage of pages. The file manager is one level above the disk manager and is concerned with the storage of files.

What are the advantages and disadvantages of indexing?

Indexing can improve the speed of data retrieval especially in situations where the records requested are a low percentage of the overall records contained in the database. It speeds up retrieval by reducing disk accesses. On the other hand, when the majority of records in the database meet the criteria of the query, an index may decrease the speed at which they are retrieved. Furthermore, when a new record is added to a file, two or more disk writes are necessary, since an entry must be added to both the file and its index. The tradeoff is between faster retrieval and slower updates.

A Paris insurance firm keeps paper records of all policies and claims made on it. The firm now has a vault containing 100 filing cabinets full of forms. Because Paris rental costs are so high, the CEO has asked you to recommend a more compact medium for long-term storage of these documents. Because some insurance claims are contested, she is very concerned with ensuring that documents, once stored, cannot be altered. What would you recommend and why?

CD-ROM storage would meet the needs of the company best. Once data are written to this media, it is not possible to alter them and thus has the highest legal standing of the different media forms. It is also suitable for storing large files and is a low cost alternative.

A German consumer research company collects scanning data from supermarkets throughout central Europe. The scanned data include product code identifier, price, quantity purchased, time, date, supermarket location, and supermarket name, and in some cases where the supermarket has a frequent buyer plan, it collects a consumer identification code. It has also created a table containing details of the manufacturer of each product. The database is very large and contains nearly one Tbyte of data. The data are used by market researchers in consumer product companies. A researcher will typically request access to a slice of the database (e.g., sales of all detergents) and analyze these data for trends and patterns. The consumer research company promises rapid access to its data. Its goal is to give clients access to requested data within one to two minutes. Once clients have access to the data, they expect very rapid response to queries. What data storage and retrieval strategy would you recommend?

This is a situation where the company has a high volume data base but does not require extremely fast retrieval (1-2 minutes). Mass storage is ideally suited for such a case, where data are copied from mass storage to magnetic disk as required for analysis.

A firm is planning to offer a satellite-based digital radio service to the continental U.S. market. It will broadcast music, sports, and talk-radio programs from a library of 1.5 million digital audio files, which will be sent to a satellite uplink and then beamed to car radios equipped to accept the service. Consumers will pay $9.95 per month to access 100 channels.
Assuming the average size of a digital audio file is 5Mbytes (~ 4 minutes of music), how much storage space is required?

(1.5*10^6)*(5*10^6) = 7.5*10^12 (i.e., 7.5 Tbytes). There is also the need to consider storage for backup and recovery and thus you should probably triple this answer (i.e., 22.5 TBytes)

What is the data deluge? What are the implications for data management?

The data deluge refers to high rate of growtn of data that organizations need to store. For some businesses, growth rates exceed 100% per year, and many others face rates in the 50% or so range. As a result, data managers have to carefully estimate their organizations' needs and have sufficient storage space available to meet future demand.

This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson
For questions and comments please contact the author

Date revised: 02-Dec-2022