Banner

 

7 - Data Modeling

Answers to exercises

Short Answers

1.
What is data modeling?

Data modeling is both a technique for modeling data and its relationships and a graphical representation of a database. It is a method for determining what data and relationships should be stored in the database. The data model is not concerned with how the data should be stored.

2.
What is a useful technique for identifying entities in a written description of a data modeling problem?

One useful approach to identifying entities is to underline all nouns in the problem description. Most nouns are possible entities. Select an entity that seems central to the problem and describe it. Then move to the next one.

3.
When do you label relationships?
Typically, relationship arcs should not be labeled. Only when there are two relationships between two entities or if the relationship is not easily inferred should you label the arc.
4.
When is a data model well formed and when is it high fidelity?

A data model is well formed when all attributes and entities have meaningful and unique names. Entities, identifiers, and relationships should be meaningful to the client.

A high-fidelity data model faithfully describes the real world. All relationships are recorded and are of the correct degree. There are no compromises and all exceptions have been taken into account.

A well-formed data model is syntactically correct and a high-fidelity data model is complete, understandable, and accurate.

5.
How do you handle exceptions when data modeling?

Exceptions should be handled by the data model. Actually, it is recommended to go "hunting for exceptions" to avoid a low-fidelity data model. One should ask the client questions such as:

6.
Describe the different types of entities.

There are five types of entities: independent, dependent, intersection, aggregate, and subordinate. Independent entities are often the starting points of the database. They are the entities most prominent in the client's mind and are typically easy to name.

Dependent entities are entities which rely on a different entity for their existence and identification. An indication of a dependent entity is a bar on the dependent entities end of the arc.

Intersection entities are a byproduct of m:m relationships. Intersection entities are sometimes hard to name, in which case the names of the two related independent entities are concatenated.

An aggregate entity is created when several different entities have similar attributes that are distinguished only by a prefix or suffix to keep the names unique. Aggregate entities typically become independent entities.

A subordinate entity is used to store data about an entity that can vary among instances. A subordinate entity is useful when an entity consists of mutually exclusive classes that have different descriptions. For example, the entity "animal" would have to store different information depending on the type of animal described.

7.
Why might a data model grow?

A data model might grow as you expand the boundaries of the application. Some 1:m relationships will become m:m relationships and you will add entities, attributes, and relationships to faithfully describe the real world.

8.
Why might a data model contract?

A data model might shrink when you generalize structures. Some entities might actually all be part of a more general entity and thus could be combined into one entity.

9.
How do you indicate ordering of instances in a data model?

To handle ordering of the instances in a database, create a new attribute which contains the order of the entity. This way one does not have to store the rows in order and it is also possible to create two or more orders for each entity. The SQL clause ORDER BY can be used to print out the rows in order.

10.
What is the difference between a synonym and a homonym?

Synonyms: Different words are used to describe the same concept.

Homonyms: The same word is used for different concepts.

Data Modeling

Create a data model from the following narratives, which are sometimes intentionally incomplete. You will have to make some assumptions. Make certain you state these alongside your data model. Define the identifier(s) and attributes of each entity.

1.
The president of a book wholesaler has told you that she wants information about publishers, authors, and books.
2.
A university has many subject areas (e.g., MIS, Romance languages). Professors teach in only one subject area, but the same subject area can have many professors. Professors can teach many different courses in their subject area. An offering of a course (e.g., Data Management 457, French 101) is taught by only one professor at a particular time.
3.
Kids'n'Vans retails minivans for a number of manufacturers. Each manufacturer offers several models of its minivan (e.g., SE, LE, GT). Each model comes with a standard set of equipment (e.g., the Acme SE comes with wheels, seats, and an engine). Minivans can have a variety of additional equipment or accessories (radio, air conditioning, automatic transmission, airbag, etc.), but not all accessories are available for all minivans (e.g., not all manufacturers offer a driver's side airbag). Some sets of accessories are sold as packages (e.g., the luxury package might include stereo, six speakers, cocktail bar, and twin overhead foxtails).

4.
Steve operates a cinema chain and has given you the following information: “I have many cinemas. Each cinema can have multiple theaters. Movies are shown throughout the day starting at 11 a.m. and finishing at 1 a.m. Each movie is given a two-hour time slot. We never show a movie in more than one theater at a time, but we do shift movies among theaters because seating capacity varies. I am interested in knowing how many people, classified by adults and children, attended each showing of a movie. I vary ticket prices by movie and time slot. For instance, Lassie Get Lost at 11 a.m. is 50 cents for everyone but is 75 cents at 11 p.m.”
5.
A university gymnastics team can have as many as 10 gymnasts. The team competes many times during the season. A meet can have one or more opponents and consists of four events: vault, uneven bars, beam, and floor routine. A gymnast can participate in all or some of these events though the team is limited to five participants in any event.
6.
A famous Greek shipping magnate, Stell, owns many container ships. Containers are collected at one port and delivered to another port. Customers pay a negotiated fee for the delivery of each container. Each ship has a sailing schedule that lists the ports the ship will visit over the next six months. The schedule shows the expected arrival and departure dates. The daily charge for use of each port is also recorded.

For details of container identification, see the Wikipedia reference.

7.
A medical center employs several physicians. A physician can see many patients, and a patient can be seen by many physicians though not always on the one visit. On any particular visit, a patient may be diagnosed to have one or more illnesses.
8.
A telephone company offers a 10 percent discount to any customer who phones another person who is also a customer of the company. To be eligible for the discount, the pairing of the two phone numbers must be registered with the telephone company. Furthermore, for billing purposes, the company records both phone numbers, start time, end time, and date of call.
9.
Global Trading (GT), Inc. is a conglomerate. It buys and sells businesses frequently and has difficulty keeping track of what strategic business units (SBUs) it owns, in what nations it operates, and what markets it serves. For example, the CEO was recently surprised to find that GT owns 25 percent of Dundee's Wild Adventures, headquartered in Zaire, that has subsidiaries operating tours of Australia, Zaire, and New York. You have been commissioned to design a database to keep track of GT's businesses. The CEO has provided you with the following information: SBUs are headquartered in one country, not necessarily the United States. Each SBU has subsidiaries or foreign agents, depending on local legal requirements, in a number of countries. Each subsidiary or foreign agent operates in only one country but can operate in more than one market. GT uses the standard industrial code (SIC) to identify a market (e.g., newspaper publishing). The SIC is a unique four-digit code. While foreign agents operate as separate legal entities, GT needs to know in what countries and markets they operate. On the other hand, subsidiaries are fully or partly-owned by GT, and it is important for GT to know who are the other owners of any subsidiary and what percentage of the subsidiary they own. It is not unusual for a corporation to have shares in several of GT's subsidiary companies and for several corporations to own a portion of a subsidiary. Multiple ownership can also occur at the SBU level.
10.
A real estate investment company owns many shopping malls. Each mall contains many shops. To encourage rental of its shops, the company gives a negotiated discount to retailers who have shops in more than one mall. Each shop generates an income stream that can vary from month to month because rental is based on a flat rental charge and a negotiated percentage of sales revenue. Also, each shop has monthly expenses for scheduled and unscheduled maintenance. The company uses the data to compute its monthly net income per square meter for each shop and for ad hoc querying.
11.
Draw a data model for the following table taken from a magazine that evaluates consumer goods. The reports follow a standard fashion of listing a brand and model, price, overall score, and then an evaluation of a series of attributes, which can vary with the product. For example, the sample table evaluates stereo systems. A table for evaluating microwave ovens would have a similar layout, but different features would be reported (e.g., cooking quality).
Brand and model Price Overall
score
Sound
quality
Taping
quality
FM
tuning
CD
handling
Ease
of use
Phillips SC-AK103 140 620 Very good Good Very good Excellent Fair
Panasonic MC-50 215 55 Good Good Very good Very good Good
Rio G300 165 38 Good Good Fair Very good Poor

12.
Draw a data model for the following freight table taken from a mail order catalog.
Merchandise
subtotals
Regular delivery
7–10 days
Rush delivery
4–5 business days
Express delivery
1–2 business days
Up to $30.00 $4.95 $9.95 $12.45
$30.01–$65.00 $6.95 $11.95 $15.45
$65.01–$125.00 $8.95 $13.95 $20.45
$125.01+ $9.95 $15.95 $25.45

 

This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson
For questions and comments please contact the author
Date revised: 10-Dec-2021