- 2.
What data does your university need to support decision making? Does the data come
from internal or external sources?
The university may need a variety of data for decision making, coming from within the
university and from external sources.
For instance, to plan for incoming students, demographic information would be
important, as would drop out rate (internal data), dorm capacity (internal rate),
and the faculty/staff availability.
To hire new faculty, the university may need information about average salaries for a
particular field, current department size, the teaching load, number of students
enrolled in that college, and average salary of current faculty members in that
college.
There are many more decision making areas for the university and each will have its
own needs for data (both internal and external).
- 4.
What database architecture might be appropriate for a regional telephone company?
Assuming that the regional phone company is primarily in one location and the
database will be used for data analysis and informational uses, the architecture
that might best fit the needs of the phone company would be a tiered database (with
summary information stored locally). The server could be a clustered SMP for the
central location and either SP or SMP for local uses. The DBMS could be an RDBMS for
the central location and an MDDB for local uses.
- 6.
What special feature must you look for in software if you select an MPP?
When you select an MPP, you must look for software that is able to work in parallel
in order to utilize the power of the MPP.
- 8.
If your university were to implement a data warehouse, what examples of dirty data
might you expect to find?
Dirty data that may be found in a university's database system include:
- Duplicate student records (one in the billing department, one in the
grading system, and perhaps one in payroll);
- Duplicate faculty records (one in the payroll and one in the grading
system);
- Multiple entries for the same student (for instance, if they changed
their name or they took some time off and re-enrolled in the university);
- Misuse of data fields (a database may have been designed prior to
the widespread use of e-mail and used a non-populated field such as fax number
for the e-mail addresses).
- 10.
Write five data verification questions for a university data warehouse.
Examples of verification questions:
- What is the average GPA for MIS students?
- What is the average age of a university student?
- How many students have a 4.0 GPA?
- How many students have a home address from the same state in which
the university resides?
- What is the average number of class hours a student registers for in
a term?
- 12.
Imagine you work as an analyst for a major global auto manufacturer. What
techniques would you use for the following questions?
- 12a.
How do sports car buyers differ from other customers?
A discovery question using data mining.
- 12b.
How should the market for trucks be segmented?
Clustering.
- 12c.
Where does our major competitor have its dealers? d. How much money is a
dealer likely to make from servicing a customer who buys a luxury car?
Visualization. An SQL query will give a list of the locations, but a
map (visualization) is likely to give you a better feel for the competitor's impact.
- 12d.
How much money is a dealer likely to make from servicing a customer who buys a
luxury car?
Calculate the predicted value of a customer.
- 12e.
What do people who buy midsize sedans have in common?
Find classification characteristics of people who buy midsize
sedans.
- 12f.
What products do customers buy within six months of buying a new car?
Sequential pattern.
- 12g.
Who are the most likely prospects to buy a luxury car?
Classify customers into luxury and non-luxury car buyers, and then
examine those attributes that discriminate between the two groups.
- 12h.
What were last year's sales of compacts in Europe by country and quarter?
Use a multidimensional database with a verification question.
- 12i.
We know a great deal about the sort of car a customer will buy based on
demographic data (e.g., age, number of children, and type of job). What is a simple
visual aid we can provide to sales personnel to help them show customers the car
they are most likely to buy?
Decision tree. By asking a series of questions based on the decision
tree, a salesperson could identify the car a person is most likely to buy.
- 14.
Discuss the conceptual views of a database presented by the relational, object,
and multidimensional models. How do you reconcile these different views? Are they likely
to cause confusion for data analysts?
A relational database is conceptualized as a table, an object database centers on
specific subjects or objects, and a multidimensional database can be centered on an
n dimensional hypercube. The multidimensional database is the hardest to
conceptualize once the database moves beyond three dimensions. Because all three
database environments operate under different paradigms, data analysts can become
very confused as they switch between the different models.
- 16.
A regional telephone company needs your advice on the data it should include in
its multidimensional database. It has an extensive relational database that captures
details (e.g., calling and called phone numbers, time of day, cost, length of call) of
every call. As well, it has access to extensive demographic data so it can allocate
customers to one of 50 lifestyle categories. What data would you load into the
multidimensional database? What aggregations would you use? It might help to identify
initially the identifier and variable dimensions (use Table 15-6 on page 448).
You may load lifestyle category, cost of phone call, length of call, and time of day
into the multidimensional database. This will allow you to run queries to examine
the profitability of each lifestyle category as well as find calling plans that best
fit the habits of each lifestyle.
- 18.
Download the file exped.xls from the book's Web site and open it with MS Excel.
This file is a sample of 1,000 sales transactions for The Expeditioner. For each sale,
there is a row recording when it was sold, where it was sold, what was sold, how it was
sold, the quantity sold, and the sales revenue. Use the PivotTable Wizard
(Data>PivotTable Report) to produce the following report:
| Sum of REVENUE |
HOW
|
|
|
|
| WHERE |
Catalog |
Store |
Web |
Grand Total |
|
London
|
50,310 |
151,015 |
13,009 |
214,334 |
| New York |
8,712 |
28,060 |
2,351 |
39,123 |
| Paris |
32,166 |
104,083 |
7,054 |
143,303 |
| Sydney |
5,471 |
21,769 |
2,749 |
29,989 |
| Tokyo |
12,103 |
42,610 |
2,003 |
56,716 |
| Grand Total |
108,762 |
347,537 |
27,166 |
483,465 |
Continue to use the PivotTable Wizard to answer the following questions:
- 18a.
What was the value of catalog sales for London in the first quarter?
7,810 (assuming first quarter is Jan.- March)
- 18c.
What percent of Sydney's annual sales were catalog sales?
- 18e.
What was the value of Camel saddle sales for Paris in 2002 by quarter?
| Qtr1 |
13,156
|
| Qtr2 |
14,950
|
| Qtr3 |
24,518
|
| Qtr4 |
8,372
|