Banner

 

Normalization

This lab exercise illustrates the problem of handling a multivalued data item. One approach is to fool the system by defining a fixed number of separate fields with different names, when they are really repetitions of the same field. The result is a violation of first normal form. Normalization can improve the quality of a database design. The following table, UnPerson, is not normalized because it has multiple values for language for each person.

UnPerson
Personcode FirstName Lang1 Lang2 Lang3
1 Rita German English Dutch
2 Wei Mandarin English
3 Alice English French
4 Ned English Japanese
5 Sophie English

To appreciate the problems of using an unnormalize database, write the following queries

  1. List all people who speak French.
  2. List those who speak English and no other language.
  3. Who speaks the most languages?
  4. Add the fact that Rita speaks French.
  5. Suppose that Lang1 is considered the primary or native language or each person. After many years in New Zealand, Wei wants English listed as his primary language.
  6. George, whose record is not shown, just joined the firm. He speaks English and French.

A normalized design

The correct data model, which recognizes a m:m relationship between person and language, generally requires three tables. In this particular case, the Lang table is not necessary since there are no additional attributes of language to be stored. A separate Lang table would be necessary if you wanted to keep a master list of all languages, to control the set of valid values for language, or to store any languages which did not appear in the UnPerson table. Whether you decide to include the the Lang table is not a major issue since it takes little effort to create and maintain, and takes little storage space.

Person
Personcode FirstName
1 Rita
2 Wei
3 Alice
4 Ned
5 Sophie
PersonLang
Personcode Language
1 German
1 English
1 Dutch
2 Mandarin
2 English
3 English
3 French
4 English
4 Japanese
5 English
Lang
Language
German
English
Dutch
Mandarin
French
Japanese

Transferring the data to the new tables

Define the tables Person, PersonLang, and Lang in the usual fashion

Populate the Person table

Copy the first two columns of UnPerson into Person.

insert into person
   (personcode, firstname)
      select personcode, firstname from unperson

Populate the PersonLang table

Create a row for each pair of personcode and language in the unnormalized table (UnPerson). This requires three queries because there are three columns for language in UnPerson.

insert into personlang
   (personcode, language)
   select personcode, lang1 from unperson
      where lang1 is not null

insert into personlang
   (personcode, language)
   select personcode, lang2 from unperson
      where lang2 is not null 

insert into personlang
   (personcode, language)
   select personcode, lang3 from unperson
      where lang3 is not null

Populate the Lang table

Create a table containing the distinct values for language, assuming you want to maintain a master list of languages.

insert into language
   (language)
   select distinct language from personlang

Answers

Unnormalized database

1.

select firstname from unperson
   where lang1 = 'French' or lang2 = 'French' or lang3 = 'French'
2.
select firstname from unperson
   where (lang1 = 'English' and lang2  is null and lang3 is null)
   or (lang1 is null and lang2 = 'English' and lang3 is null)
   or (lang1 is null and lang2 is null and lang3 = 'English')

3a.

The first approach is based on the observation that there are at most three languages for a person

select firstname from unperson
   where (lang1 is not null and lang2 is not null and lang3 is not null)

to produce a list of persons speaking three languages. If nothing is retrieved, then you would have to find those persons speaking two languages, and so on.

3b.

Here is a multistep solution that first computes the number of languages spoken by each person.

Add a column to the person table with the columname langcount

Alter table unperson
   add column langcount …

Determine who speaks three languages

Update unperson
   set langcount = 3
      where (lang1 is not null and lang2 is not null and lang3 is not null)

Determine those who speaks two languages

Update unperson
   set langcount = 2
      where (lang1 is not null and lang2 is not null and lang3 is null)
      or (lang1 is not null and lang2 is null and lang3 is not null)
      or (lang1 is null and lang2 is not null and lang3 is not null)

Determine who speaks one language

Update unperson
   set langcount = 1
      where (lang1 is not null and lang2 is null and lang3 is null)
      or (lang1 is null and lang2 is not null and lang3 is null)
      or (lang1 is null and lang2 is null and lang3 is not null)

Query the updated table

select firstname from unperson
   where langcount in (select max(langcount) from person)

4.

Alter table unperson
   add column lang4 …

Update unperson
   set lang4 = 'French'
   where firstname = 'Rita'

5.

First query the database to discover the languages spoken by Wei.

select firstname, lang1, lang2, lang3 from unperson
   where firstname = 'Wei'

Now, update the table. Note that the user has to remember that lang1 is the primary language.

update unperson
   set lang1 = 'English', lang2 = Mandarin
      where firstname = 'Wei'
6.
insert into unperson (personcode, firstname, lang1, lang2)
   values (6,'George', 'English', 'French')

Normalized database

1.

select firstname from person, personlang
   where person.personcode = personlang.personcode and language = 'French'

2.

select firstname from person, personlang
   where person.personcode = personlang.personcode and language = 'English'
   and firstname in
      (select firstname from person, personlang
         where person.personcode = personlang.personcode
            group by firstname having count(*) = 1)

3.

Here is a multistep solution

a. Create a view called langscount

create view langscount (personcode, langcount)
   (select person.personcode, count(language) from person, personlang
      where person.personcode = personlang.personcode
         group by person.personcode)

b. Query the view

select firstname, langcount from person, langscount
   where person.personcode = langscount.personcode
   and langcount = (select max(langcount) from langscount)

4.

Assuming there is only one person with the name Rita

insert into personlang (personcode, language) 
   select personcode, language from person, lang
      where firstname = 'Rita'
      and language = 'French'

5.

A 1:m relationship between lang and person can record the fact that a person has a single primary language and that a language can be a primary language for many people. The revised tables, showing the foreign key for the new relationship, are:


			
Person
Personcode FirstName PrimeLang
1 Rita German
2 Wei Mandarin
3 Alice English
4 Ned English
5 Sophie English
PersonLang
Personcode Language
1 German
1 English
1 Dutch
2 Mandarin
2 English
3 English
3 French
4 English
4 Japanese
5 English
Lang
Language
German
English
Dutch
Mandarin
French
Japanese

To change Wei's primary language to English

update person   
   set primelang = 'English'
      where firstname = 'Wei'

6.

insert into person (personcode, firstname)
   values (6,'George')

insert into personlang (personcode, language)
   values (6, 'French')

insert into personlang (personcode, language)
   values (6, 'English')

Note

 

This page is part of the promotional and support material for Data Management (open edition) by Richard T. Watson
For questions and comments please contact the author

Date revised: 02-Dec-2022