Banner

 

20 - Data Structure and Storage

Answers to exercises

1.
Why is a disk drive considered a bottleneck?

The time to access data on a disk drive is relatively long compared to main memory. It takes approximately 105 times longer to access a disk than to access main memory. Because of the length of time it takes to access a disk drive the computer is often waiting for a disk to retrieve data before it can continue to process a request for information.

2.
What is the difference between a record and a page?

A page is the minimum amount of disk storage accessed at a time. It is usually around 1-4 kbytes. A record is the unit of data that a DBMS retrieves. Records can vary in size. A page may hold several small records, or a large record may be distributed over several pages.

3.
Describe the two types of delay that can occur prior to reading a record from a disk. What can be done to reduce these delays?

The two types of delay that can occur prior to reading a record from a disk are rotational delay and access arm delay. Access arm delay is reduced by minimizing the movement of the read/write head. To minimize read/write head movement, data that are likely to be accessed at the same time should be stored on the same track or location, either on a single surface or on a cylinder. Rotational delay is reduced by rotating the disk faster. The database designer can only work to minimize the read/write head movement, the disk manufacturer sets the rotational speed of the disk.

4.
What is clustering? What is the difference between intrafile and interfile clustering?

Clustering records means to physically store those records that are frequently used together close together on a disk. Intrafile clustering applies to the clustering of records that are in a single file. Interfile clustering applies to clustering records together that are in different files.

5.
Describe the differences between a file manager and a disk manager.

The disk manager is part of the operating system and is concerned with the storage of pages. The file manager is one level above the disk manager and is concerned with the storage of files.

6.
What is an index?

An index is a file that contains the values of a field in a file (the index field) and the address of the field's corresponding record in that file. Thus, an index file typically contains two fields, one that stores the value of the indexed field in the original file and one that contains a pointer to the matching record in the original file.

7.
What are the advantages and disadvantages of indexing?

Indexing can improve the speed of data retrieval especially in situations where the records requested are a low percentage of the overall records contained in the database. It speeds up retrieval by reducing disk accesses. On the other hand, when the majority of records in the database meet the criteria of the query, an index may decrease the speed at which they are retrieved. Furthermore, when a new record is added to a file, two or more disk writes are necessary, since an entry must be added to both the file and its index. The tradeoff is between faster retrieval and slower updates.

8.
Write the SQL to create an index on the column natcode in the nation table.

CREATE INDEX natcodeindx ON nation (natcode);

9.
A Paris insurance firm keeps paper records of all policies and claims made on it. The firm now has a vault containing 100 filing cabinets full of forms. Because Paris rental costs are so high, the CEO has asked you to recommend a more compact medium for long-term storage of these documents. Because some insurance claims are contested, she is very concerned with ensuring that documents, once stored, cannot be altered. What would you recommend and why?

CD-ROM storage would meet the needs of the company best. Once data are written to this media, it is not possible to alter them and thus has the highest legal standing of the different media forms. It is also suitable for storing large files and is a low cost alternative.

Teaching tip: Ask students how they would estimate the storage space required for storing the scanned items.
10.
A video producer has asked for your advice on a data storage device. She has specified that she must be able to record video at 5 to 7 Mbytes per second. What would you recommend and why?

RAID is the best solution for this situation. RAID level 3 is probably the best choice.

11.
A German consumer research company collects scanning data from supermarkets throughout central Europe. The scanned data include product code identifier, price, quantity purchased, time, date, supermarket location, and supermarket name, and in some cases where the supermarket has a frequent buyer plan, it collects a consumer identification code. It has also created a table containing details of the manufacturer of each product. The database is very large and contains nearly one Tbyte of data. The data are used by market researchers in consumer product companies. A researcher will typically request access to a slice of the database (e.g., sales of all detergents) and analyze these data for trends and patterns. The consumer research company promises rapid access to its data. Its goal is to give clients access to requested data within one to two minutes. Once clients have access to the data, they expect very rapid response to queries. What data storage and retrieval strategy would you recommend?

This is a situation where the company has a high volume data base but does not require extremely fast retrieval (1-2 minutes). Mass storage is ideally suited for such a case, where data are copied from mass storage to magnetic disk as required for analysis.

12.
A magazine subscription service has a toll-free number for customers who may dial the company to place orders, inquire about existing orders, or check subscription rates. All customers are uniquely identified by an 11-digit numeric code. All magazines are identified by a 2- to 4-character code. The company has approximately 10 million customers who subscribe to an average of four magazines. Subscriptions are available to 126 magazines. Draw a data model for this situation. Decide what data structures you would recommend for the storage of the data for each entity. The management of the company prides itself on its customer service and strives to answer customer queries as rapidly as possible.

Speed and reliability are important characteristics. Level 1 RAID would be a good implementation choice since the database is relatively small (10 million customers and 100 bytes for each customer is 1 Gbytes), thus storage requirements and costs will be low. You may want to index the customer table by customer code as well as interfile cluster customer and subscription tables so that customer and subscription data are stored together. Because the subscription table is so small (126 magazines), this table could be held in memory to speed up access.

13.
A firm is planning to offer a satellite-based digital radio service to the continental U.S. market. It will broadcast music, sports, and talk-radio programs from a library of 1.5 million digital audio files, which will be sent to a satellite uplink and then beamed to car radios equipped to accept the service. Consumers will pay $9.95 per month to access 100 channels.
13a.
Assuming the average size of a digital audio file is 5Mbytes (~ 4 minutes of music), how much storage space is required?

(1.5*10^6)*(5*10^6) = 7.5*10^12 (i.e., 7.5 Tbytes). There is also the need to consider storage for backup and recovery and thus you should probably triple this answer (i.e., 22.5 TBytes)

13b.
What storage technology would you recommend?

RAID 5 because of the high I/O rates and low volatility.

14.
Is MP3 a lossless or lossy compression standard?

Lossy

15.
What is the data deluge? What are the implications for data management?

The data deluge refers to high rate of growtn of data that organizations need to store. For some businesses, growth rates exceed 100% per year, and many others face rates in the 50% or so range. As a result, data managers have to carefully estimate their organizations' needs and have sufficient storage space available to meet future demand.

16.

According to Wikipedia, Apple has more than 28 million songs in the iTunes store. What storage technology might be a good choice for this library?

The first step is to calculate the size of the library. Assuming 4Mb for a the average length of a song. The required space is 4*106*28*106 = 1.12*1012 bytes, or 100 Tb. You also have to consider backup and recovery capability. There will need to be redundancy to cover disk errors and so forth. A safe estimate is 300Tb. Apple probably has a proprietary solution, but if it does not then you might select RAID 5 because of its high I/O rates and the low volatility of music files. A file is loaded once and read many times. Rumor has it that Apple uses Hadoop as its files structure, which fits the need for high fault tolerance, rapid retrieval, and low cost.

 

This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson
For questions and comments please contact the author
Date revised: 10-Dec-2021