Banner

 

20 - Data Structure and Storage

1.

Why do you want to reduce disk accesses? How can you reduce disk accesses?

Disk access time is often the bottleneck in computing. It takes much longer to access information on a disk than to access information stored in main memory. Some of the techniques used to reduce disk access are clustering, indexing, direct addressing, hashing, and linked lists.

2a.

Can a file have more than one index? If so, can they be used together?

All fields of a file can be indexed and queries can use multiple indexes.

2b.

Can you create an index based on more the one field?

Yes, you would do this when two fields are commonly queried at the same time.

3.

Why not index all fields of a file?

Indexes increase the time for maintenance operations. For instance, when a record is removed from a database, the file needs to be updated as does the index file.

4.

What is a sparse index and what are its pros and cons?

A sparse index takes advantage of the physical sequencing of a file and does not contain an entry for every value of the indexed field. It takes less storage space and requires fewer disk accesses. On the other hand it can not be used for existence tests because it does not contain a value for every record in the file. A file can have only one sparse index.

5.

What is the least number of accesses needed to access a file when using an index? Explain.

At least two access are necessary when accessing a file using an index. The index must be loaded and searched and then the record must be retrieved.

6.

When using the hashing function :

hash address = remainder after dividing EMPNO by 1,000

a. What disk address would you access for EMPNO 543123?

The disk address is 123 (543123/1,000)

b. What happens if you had records with the EMPNO’s 001123 and 999123?

A collision has occurred; these records are synonyms. The colliding records must be stored in an overflow area and pointed to from the hash address. In this case, you have a synonym chain and the first record in the overflow area points to the second record in the overflow area.

c. What is the maximum number of retrievals necessary to obtain one of the EMPNO’s in part b of this question?

The maximum number of retrievals required is 2: one to the hash address, and the second to the first record in the overflow area.

7.

If you frequently need to process your records sequentially would you use a hashing function to store them? Why or why not?

No, you would not store these records using a hashing function. When a field is hashed it can no longer be processed sequentially because its physical sequence loses any logical meaning.

8.

You want to have a removable storage device to back-up your system. Your system currently has a floppy drive and a CD-ROM, but you would be willing to purchase an external drive for back-up. You have a fairly large volume of data stored on your system and you want to be able to read and write to your back-ups. Money is an issue as you are a college student without much spare money for computer equipment. What are your options? Which one do you think is most appropriate and why?

You can use a removable disk, a floppy disk, a cartridge or a magneto-optical disk. Floppy disks will become expensive and time consuming since you will need a large number of them to back up your system, and they are not very reliable. CD-ROM is not considered since it can only be written to once and would not serve your purposes. Thus, you may want to investigate the cost of a cartridge drive, magneto-optical, a removable disk, or CD-RW. You also need to consider the cost of the media for each storage device.

9.

What takes longer to access, a disk or main memory in a computer? By what order of magnitude?

A disk takes about 105 more time to access then main memory.

10.

When it is said that data is accessed a page at a time, what does that mean? About how big is a page in kbytes?

A page is the minimal amount of storage retrieved from a disk at a given time. It is typically 1-4 kbytes.

 

This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson
For questions and comments please contact the author
Date revised: 10-Dec-2021