Chapter 16 Data visualization

The commonality between science and art is in trying to see profoundly - to develop strategies of seeing and showing.

Edward Tufte, The Visual Display of Quantitative Information

Learning objectives

Students completing this chapter will:

Understand the principles of the grammar of graphics;
Be able to use ggvis to create common business graphics;
Be able to select data from a database for graphic creation;
Be able to depict locations on a Google map.

Visual processing

Humans are particularly skilled at processing visual information because it is an innate capability, compared to reading which is a learned skill. When we evolved on the Savannah of Africa, we had to be adept at processing visual information (e.g., recognizing danger) and deciding on the right course of action (fight or flight). Our ancestors were those who were efficient visual processors and quickly detected threats and used this information to make effective decisions. They selected actions that led to survival. Those who were inefficient visual information processors did not survive to reproduce. Even those with good visual skills failed to propagate if they made poor decisions relative to detected dangers. Consequently, we should seek opportunities to present information visually and play to our evolved strength. As people vary in their preference for visual and textual information, it often makes sense to support both types of reporting.

In order to learn how to visualize data, you need to become familiar with the grammar of graphics and ggvis (an R extension for graphics). In line with the learning of data modeling and SQL, we will take an intertwined spiral approach. First we will tackle the grammar of graphics (the abstract) and then move to ggvis (the concrete). You will also learn how to take the output of an SQL query and feed it directly into ggvis. The end result will be a comprehensive set of practical skills for data visualization.

The grammar of graphics

A grammar is a system of rules for generating valid statements in a language. A grammar makes it possible for people to communicate accurately and concisely. English has a rather complex grammar, as do most languages. In contrast, computer-related languages, such as SQL, have a relatively simple grammar because they are carefully designed for expressive power, consistency, and ease of use. Each of the dialects of data modeling also has a grammar, and each of these grammars is quite similar in that they all have the same foundational elements: entities, attributes, identifiers, and relationships.

A grammar has been designed by Wilkinson⁴⁹ for creating graphics to enhance their expressiveness and comprehensiveness. From a mathematical perspective, a graph is a set of points. A graphic is a physical representation of a graph. Thus, a graph can have many physical representations, and one of the skills you need to gain is to be able to judge what is a meaningful graphic for your clients. A grammar for graphics provides you with many ways of creating a graphic, just as the grammar of English gives you many ways of writing a sentence. Of course, we differ in our ability to convey information in English. Similarly, we also differ in our skills in representing a graph in a visual format. The aesthetic attributes of a graph determine its ability to convey information. For a graphic, aesthetics are specified by elements such as size and color. One of the most applauded graphics is Minard’s drawing in 1861 of Napoleon’s advance on and retreat from Russia during 1812. The dominating aspect of the graphic is the dwindling size of the French army as it battled winter, disease, and the Russian army. The graph shows the size of the army, its location, and the direction of its movement. The temperature during the retreat is drawn at the bottom of graphic.

Charles Joseph Minard’s graphic of Napoleon’s Russian expedition in 1812

Wilkinson’s grammar is based on six elements:

Data: a set of data operations that creates variables from datasets;
Trans: variable transformations;
Scale: scale transformations;
Coord: a coordinate system;
Element: a graph and its aesthetic attributes;
Guide: one or more guides.

Interest in the grammar of data visualization has increased in recent years because of the growth in data. There is ongoing search to find ways to reveal clearly the information in large data sets. Vega⁵⁰ is a recent formulation building on Wilkinson’s work. In Vega, a visualization consists of basic properties of a graph (such as the width and height) and definitions of the data to visualize, scales for mapping to data to a visual form, axes for representing scales, and graphic marks (such as rectangles, circles, and arcs) for displaying the data.

ggvis

ggvis is an implementation of Vega in R Because it is based on a grammar, ggvis is a very powerful graphics tool for creating both simple and complex graphs. It is well-suited for generating multi-layered graphs because of its grammatical foundation. As a result, using ggvis you specify a series of steps (think of them as sentences) to create a graphic (think of it as a paragraph) to visually convey your message. ggvis is a descendant of ggplot2 and adds new features to support interactive visualization using shiny, another R package, which we will cover later.

Data

Most structured data, which is what we require for graphing, are typically stored in spreadsheets or databases. In the prior chapter introducing R, you learned how to read a spreadsheet exported as a CSV file and access a database. These are also the skills you need for reading a file containing data to be visualized.

Transformation

A transformation converts data into a format suitable for the intended visualization. In this case, we want to depict the relative change in carbon levels since pre-industrial periods, when the value was 280 ppm. Here are sample R commands.

# compute a new column in carbon containing the relative change in CO2
carbon$relCO2 = (carbon$CO2-280)/280

There are many ways that you might want to transform data. The preceding example just illustrates the general nature of a transformation. You can also think of SQL as a transformation process as it can compute new values from existing columns.

Coord

A coordinate system describes where things are located. A geopositioning system (GPS ) reading of latitude and longitude describes where you are on the surface of the globe. It is typically layered onto a map so you can see where you are relative to your current location. Most graphs are plotted on a two-dimensional (2D) grid with x (horizontal) and y (vertical) coordinates. ggvis currently supports six 2D coordinate systems, as shown in the following table. The default coordinate system for most packages is Cartesian.

Coordinate systems

Name	Description
cartesian	Cartesian coordinates
equal	Equal scale Cartesian coordinates
flip	Flipped Cartesian coordinates
trans	Transformed Cartesian coordinates
map	Map projections
polar	Polar coordinates

Element

An element is a graph and its attributes. Let’s start with a simple scatterplot of year against CO₂ emissions. We do this in two steps applying the ggvis approach of building a graphic by adding layers. ggvis uses the pipe function to specify a linear sequence of processing.

The foundation is the ggvis function, which identifies the source of the data and what is to be plotted. In the following example, the file carbon is fed into the ggvis function, which selects year and CO2 as the x and y, respectively, dimensions of the graph
A graph consists of a number of layers, and in the following example the points layer receives the x and y values from ggvis and plots each pair of points with a red dot.

library(ggvis)
library(readr)
library(dplyr)
url <- 'http://www.richardtwatson.com/data/carbon.txt'
carbon <- read_delim(url, delim=',')

## Rows: 58 Columns: 2

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): year, CO2

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Select year(x) and CO2(y) to create a x-y point plot
# Specify red points, as you find that aesthetically pleasing
carbon %>% 
  ggvis(~year,~CO2) %>% 
  layer_points(fill:='red')

# Notice how ‘%>%’ is used for creating a pipeline of commands

Graphics appear under the Plots tab of the bottom right window. You can select the Export tab to save a graphic to a file or the clipboard. Also, notice that you can use left and right arrows of the Plots tab to cycle through graphics created in a session.

Scale

Scales control the visualization of data. It is usually a good idea to have a zero point for the y axis so you don’t distort the slope of the line. In the following example, the scale_numeric functionspecifies a zero point for the y axis with the command zero=T.

carbon %>% 
  ggvis(~year,~CO2) %>% 
  layer_points(fill:='red') %>% 
  scale_numeric('y',zero=T)

Axes and legends

Axes and legends are both forms of guides, which help the reader to understand a graphic. Readability enhancers such as axes, title, and tick marks are generated automatically based on the parameters specified in the ggvis command. You can override the defaults.

Let’s redo the graphic with the relative change since the beginning of industrialization in the mid 18th century, when the level of CO2 was around 280 ppm. This time, we will create a line plot. We also add some titles for the axes and specify a format of four consecutive digits for displaying a year on the x-axis. We also move or offset the title for the y-axis a bit to the left to improve readability.

# compute a new column containing the relative change in CO2
carbon %>% 
  mutate(relCO2 = (CO2-280)/280) %>% # transformation
  ggvis(~year,~relCO2) %>% 
  layer_lines(stroke:="blue") %>% 
  scale_numeric('y',zero=T) %>% 
  add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>% 
  add_axis('x', title ='Year', format='####')

As the prior graphic shows, the present level of CO₂ in the atmosphere is now above 40 percent higher than the pre-industrial period and is continuing to grow.

Assignment function

You will have seen three types of assignment symbols in the preceding examples. The notion of scaling helps in understanding the difference between a value and a property. A property is fixed. So no matter how you resize the preceding graph, the stroke will always be blue. Whereas, a value, such as the title, is scalable. The difference between a value and a property can seem confusing, so don’t agonize over it.

Symbol		Example
~	Data assignment	y ~ CO2	y is CO2 column
=	Set a value	title = ‘year’	Title is scaled
:=	Set a property	stroke := ‘blue’	Stroke is unscaled

Guides

Axes and legends are both forms of guides, which help the reader to understand a graphic. In ggvis, legends and axes are generated automatically based on the parameters specified in the ggvis command. You have the capability to override the defaults for axes, but the legend is quite fixed in its format.

In the following graphic, for each axis there is a label and there are tick marks with values so that you eyeball a point on the graphic for its x and y values. The legend enables you to determine which color, in this case, matches each year. A legend could also use shape (e.g., a diamond) and shape size to aid matching.

❓Skill builder

Create a line plot using the data in the following table.

year 1804 1927 1960 1974 1987 1999 2012 2027 2046

population (billions) 1 2 3 4 5 6 7 8 9

year	1804	1927	1960	1974	1987	1999	2012	2027	2046
population (billions)	1	2	3	4	5	6	7	8	9

Some recipes

Learning the full power of ggvis is quite an undertaking, so here are a few recipes for visualizing data.

Histogram

Histograms are useful for showing the distribution of values in a single column. A histogram has a vertical orientation. The number of occurrences of a particular value in a column are automatically counted by the ggvis function. In the following sample code, we show the distribution of temperatures in Celsius for the Central Park temperature data. Notice that the Celsius conversion is rounded to an integer for plotting. Width specifies the size of the bin, which is two in this case. This means that the bin above the tick mark 10 contains all values in the range 9 to 11. There is online a list of names of colors you can use in ggvis.⁵¹

library(ggvis)
library(readr)
library(measurements)
url <-  'http://www.richardtwatson.com/data/centralparktemps.txt'
t <- read_delim(url, delim=',')

## Rows: 1794 Columns: 3

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): year, month, temperature

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

t %>% 
  mutate(Celsius = conv_unit(t$temperature,'F','C')) %>%
  ggvis(~Celsius) %>% 
  layer_histograms(width = 2, fill:='cornflowerblue') %>%
add_axis('x',title='Celsius') %>% 
  add_axis('y',title='Frequency')

Bar graph

In the following example, we query a database to get data for plotting. Because the column selected for graphing, productLine is categorical, we need to use a bar graph.

library(ggvis)
library(DBI)
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
# Query the database and create file for use with R
d <- dbGetQuery(conn,"SELECT * from Products;") 
# Plot the number of product lines by specifying the appropriate column name
d %>% 
  ggvis(~productLine) %>% 
  layer_bars(fill:='chocolate') %>%
  add_axis('x',title='Product line') %>% 
  add_axis('y',title='Count')

❓Skill builder

Create a bar graph using the data in the following table. Set population as the weight for each observation.

year 1804 1927 1960 1974 1987 1999 2012 2027 2046

population (billions) 1 2 3 4 5 6 7 8 9

year	1804	1927	1960	1974	1987	1999	2012	2027	2046
population (billions)	1	2	3	4	5	6	7	8	9

Scatterplot

A scatterplot shows points on an x-y grid, which means you need to have an x and y with numeric values.

library(ggvis)
library(DBI)
library(dplyr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
o <- dbGetQuery(conn,"SELECT * FROM Orders") 
od <- dbGetQuery(conn,"SELECT * FROM OrderDetails")
d <- inner_join(o,od)

## Joining, by = "orderNumber"

# Get the monthly value of orders
d2 <- d %>% 
  mutate(month = month(orderDate)) %>% 
  group_by(month) %>% summarize(orderValue = sum(quantityOrdered*priceEach))
# Plot data orders by month
# Show the points and the line
d2 %>% 
  ggvis(~month, ~orderValue/1000000) %>%  
  layer_lines(stroke:='blue') %>%
  layer_points(fill:='red') %>%
  add_axis('x', title = 'Month') %>%
  add_axis('y',title='Order value (millions)', title_offset=30)

It is sometimes helpful to show multiple scatterplots on the one grid. In ggvis, you can create groups of points for plotting. Sometimes, you might find it convenient to recode data so you can use different colors or lines to distinguish values in set categories. Let’s examine grouping by year.

library(ggvis)
library(DBI)
library(dplyr)
library(lubridate)
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
o <- dbGetQuery(conn,"SELECT * FROM Orders") 
od <- dbGetQuery(conn,"SELECT * FROM OrderDetails")
d <- inner_join(o,od)

## Joining, by = "orderNumber"

d2 <- d %>% 
  mutate(month = month(orderDate)) %>% 
  mutate(year = year(orderDate)) %>% 
  group_by(year,month) %>% summarize(orderValue = sum(quantityOrdered*priceEach))

## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

# Plot data orders by month and display by year
# ggvis expects grouping variables to be a factor
d2 %>% 
  ggvis(~month,~orderValue/1000, stroke = ~as.factor(year)) %>%
  layer_lines() %>% 
  add_axis('x', title = 'Month') %>%
  add_axis('y',title='Order value (thousands)', title_offset=50)

Because ggvis is based on a grammar of graphics, with a few changes, you can create a bar graph of the same data.

d2 %>% 
  ggvis( ~month, ~orderValue/100000, fill = ~as.factor(year)) %>% 
  layer_bars() %>% 
  add_axis('x', title = 'Month') %>%
  add_axis('y',title='Order value (thousands)', title_offset=50)

Sometimes the data that you want to graph will be in multiple files. Here is a case where we want to show ClassicCars orders and payments by month on the same plot.

library(ggvis)
library(DBI)
library(dplyr)
library(lubridate)
# Load the driver
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
o <- dbGetQuery(conn,"SELECT * FROM Orders") 
od <- dbGetQuery(conn,"SELECT * FROM OrderDetails")
d <- inner_join(o,od)

## Joining, by = "orderNumber"

d2 <- 
  d %>% 
  mutate(month = month(orderDate)) %>% 
  mutate(year = year(orderDate)) %>% 
  filter(year == 2004) %>% 
  group_by(month) %>% 
  summarize(value = sum(quantityOrdered*priceEach))
d2$category <- 'Orders'
p <-  dbGetQuery(conn,"SELECT * from Payments;")
p2 <- p %>% 
  mutate(month = month(paymentDate)) %>% 
  mutate(year = year(paymentDate)) %>% 
  filter(year==2004) %>% 
  group_by(month) %>% 
  summarize(value = sum(amount))
p2$category <- 'Payments'
m <- rbind(d2,p2) # bind by rows
m %>% 
  group_by(category) %>% 
  ggvis(~month, ~value, stroke = ~ category) %>% 
  layer_lines() %>% 
  add_axis('x',title='Month') %>% 
  add_axis('y',title='Value',title_offset=70)

Smoothing

Smoothing helps to detect a trend in a line plot. The following example shows the average mean temperatures for August for Central Park.

library(ggvis)
library(readr)
library(dplyr)
url <-  "http://www.richardtwatson.com/data/centralparktemps.txt"
t <- read_delim(url, delim=',')

## Rows: 1794 Columns: 3

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): year, month, temperature

## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

t %>% 
  filter(month == 8) %>% 
  ggvis(~year,~temperature) %>% 
  layer_lines(stroke:='red') %>% 
  layer_smooths(se=T, stroke:='blue') %>%
  add_axis('x',title='Year',format = '####') %>% 
  add_axis('y',title='Temperature (F)', title_offset=30)

❓Skill builder Using the Central Park data, plot the temperatures for February and fit a straight line to the points. Use layer_model_predictions(). What do you observe?

Box Plot

A box plot is an effective means of displaying information about one or more variables. It shows minimum and maximum, range, lower quartile, median, and upper quartile for each variable. The following code creates a box plot for a single variable. Notice that we use factor(0) to indicate there is no grouping variable.

library(ggvis)
library(DBI)
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
d <- dbGetQuery(conn,"SELECT * from Payments;")
# Boxplot of amounts paid
d %>% 
  ggvis(~factor(0),~amount) %>% 
  layer_boxplots() %>%
  add_axis('x',title='Checks') %>% 
  add_axis('y',title='')

The following example shows a box plot for each month. Notice how to indicate the values for each of the ticks for the x-axis. Try running the code without the values specification.

library(ggvis)
library(DBI)
library(lubridate)
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
d <- dbGetQuery(conn,"SELECT * from Payments;")
# Boxplot of amounts paid
d %>% 
  mutate(month = month(paymentDate)) %>% 
  ggvis(~month,~amount) %>% 
  layer_boxplots() %>%
  add_axis('x',title='Month', values=c(1:12)) %>%
  add_axis('y',title='Amount', title_offset=70)

Heat map

A heat map visualizes tabular data. It is useful when you have two categorical variables cross tabulated. Consider the case for the ClassicModels database where we want to get an idea of the different number of model scales in each product line. We can get a quick picture with the following code.

library(ggvis)
library(DBI)
library(dplyr)
# Load the driver
conn <- dbConnect(RMySQL::MySQL(), "www.richardtwatson.com", dbname="ClassicModels", user="student", password="student")
d <- dbGetQuery(conn,'SELECT * FROM Products;')
d2 <- 
  d %>% 
  group_by(productLine, productScale) %>% 
  summarize(frequency = n())

## `summarise()` has grouped output by 'productLine'. You can override using the `.groups` argument.

d2 %>% 
  ggvis( ~productScale, ~productLine, fill= ~frequency) %>% 
  layer_rects(width = band(), height = band()) %>%
  add_axis('y',title='Product Line', title_offset=70) %>% 
  # add frequency to each cell
  layer_text(text:=~frequency, stroke:='white', align:='left', baseline:='top')

Interactive graphics

The ggvis package incorporates features of shiny, an R package, that enable you to create interactive graphics. For example, you might want to let the viewer select the color and width of a line on a graphic. Many of the interactive controls, shown in the following table, should be familiar as the the various options are often used for web forms.

Interactive controls for ggvis

Function	Purpose
input_checkbox()	Check one or more boxes
input_checkboxgroup()	A group of checkboxes
input_numeric()	A spin box
input_radiobuttons()	Pick one from a set of options
input_select()	Select from a drop-down text box
input_slider()	Select using a slider
input_text()	Input text

Selecting a property using a drop-down list

The following example uses the input_select function to present the viewer with an option of one of three colors (red, green, or blue) for the stroke of the graph. When you execute the code, you will see a selection list on the left bottom. Because the graph is interactive, you will need to click on the stop button (top right of the Viewer window) before running anymore R commands.

library(shiny)
library(ggvis)
library(dplyr)
carbon %>% 
  mutate(relCO2 = (CO2-280)/280) %>% 
  ggvis(~year,~relCO2) %>%
  layer_lines(stroke:=input_select(c("red", "green", "blue"))) %>% 
  scale_numeric('y',zero=T) %>% 
  add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>% 
  add_axis('x', title ='Year', format='####')

## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.

❓Skill builder

Create a point plot using the data in the following table. Give the viewer a choice of three colors and three shapes (square, cross, or diamond) for the points.

year 1804 1927 1960 1974 1987 1999 2012 2027 2046

population (billions) 1 2 3 4 5 6 7 8 9

year	1804	1927	1960	1974	1987	1999	2012	2027	2046
population (billions)	1	2	3	4	5	6	7	8	9

Selecting a numeric value with a slider

A slider enables the viewer to select a numeric value in a range. In the following code, a slider is set up for values between 1 to 5, inclusively. Note in this case, the stroke width slider and color selector are defined outside the ggvis code. This is a useful technique for writing code that makes it possible for some chunks to be easily reused

slider <- input_slider(1, 5, label = "Width")
select_color <- input_select(label='Color',c("red", "green", "blue")) 
carbon %>% 
  mutate(relCO2 = (CO2-280)/280) %>% 
  ggvis(~year,~relCO2) %>% 
  layer_lines(stroke:=select_color, strokeWidth:=slider) %>%
  scale_numeric('y',zero=T) %>% 
  add_axis('y', title = "CO2 ppm of the atmosphere", title_offset=50) %>% 
  add_axis('x', title ='Year', format='####')

## Warning: Can't output dynamic/interactive ggvis plots in a knitr document.
## Generating a static (non-dynamic, non-interactive) version of the plot.

❓Skill builder Using the Central Park data, plot the temperatures for a selected year using a slider.

Geographic data

The ggmap package supports a variety of mapping systems, including Google maps. As you might expect, it offers many features, and we just touch on the basics in this example.

The Offices table in the Classic Models database includes the latitude and longitude of each office in officeLocation, which has a datatype of POINT. The following R code can be used to mark each office on a Google map. After loading the required packages, a database query is executed to return the longitude and latitude for each office. Then, a Google map of the United States is displayed. The marker parameter specifies the name of the table containing the values for longitude and latitude. Offices that are not in the U.S. (e.g., Sydney) are ignored. Adjust zoom, an integer, to get a suitably sized map. Zoom can range from 3 (a continent) to 20 (a building).

# Google now requires an API key
library(ggplot2)
library(ggmap)
library(mapproj)
# Google maps requires lon and lat, in that order, to create markers
d <- dbGetQuery(conn,"SELECT ST_Y(officeLocation) AS lon, ST_X(officeLocation) AS lat FROM Offices;")
# show offices in the United States
# vary zoom to change the size of the map
map <-  get_googlemap('united states',marker=d,zoom=4)
ggmap(map) + labs(x = 'Longitude', y = 'Latitude') + ggtitle('US offices')

❓Skill builder

Create the following maps.

Offices in Europe

Office location in Paris

Customers in the US

Customers in Sydney

R resources

The developer of ggvis, Hadley Wickham, maintains a web site. Another useful web site for data visualization is FlowingData, which has an associated book, titled Visualize This. If you become heavily involved in visualization, we suggest you read some of the works of Edward Tufte. One of his books is listed in the references.

Summary

Because of evolutionary pressure, humans became strong visual processors. Consequently, graphics can be very effective for presenting information. The grammar of graphics provides a logical foundation for the construction of a graphic by adding successive layers of information. ggvis is a package implementing the grammar of graphics in R, the open source statistics platform. Data can be extracted from an MySQL database for processing and graphical presentation in R. Spatial data can be selected from a database and displayed on a Google map.

Key terms and concepts
Bar chart	Graphic
Box plot	Heat map
dplyr	Histogram
ggvis	Scatterplot
Grammar of graphics	Smooth
Graph

References

Kabacoff, R. I. (2009). R in action: data analysis and graphics with R. Greenwich, CT: Manning Publications.

Kahle, D., & Wickham, H. (2013). ggmap : Spatial Visualization with ggplot. The R Journal.

Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Yau, N. (2011). Visualize this: the flowing data guide to design, visualization, and statistics. Indianapolis, IN: Wiley.

Wilkinson, L. (2005). The grammar of graphics (2nd ed.). New York: Springer.

Exercises

Given the following data on the world’s cargo carrying capacity in millions of dead weight tons (dwt)⁵² for the given years, graph the relationship as a point plot. Add a linear prediction model to the graph. What is the approximate increase in the world capacity each year?

dwt <- c(2566, 3704, 4008, 5984, 7700, 8034, 8229, 7858, 8408, 8939)
year <- c(1970, 1980, 1990, 2000, 2006, 2007, 2008, 2009, 2010, 2011)

Create a bar graph showing the relative ocean shipping cost as a percentage of total cost for some common household items using the following data.

unitCost <-  c(700, 200, 150, 50, 15, 3, 1)
shipCost <-  c(10.00, 1.50, 1.00, 0.15, 0.15, 0.05, 0.01)
item <-  c('TV set', 'DVD player', 'Vacuum cleaner', 'Scotch whisky', 'Coffee', 'Biscuits', 'Beer')

SQL input

Visualize in blue the number of items for each product scale.
Prepare a line plot with appropriate labels for total payments for each month in 2004.
Create a histogram with appropriate labels for the value of orders received from the Nordic countries (Denmark, Finland, Norway, Sweden).
Create a heatmap for product lines and Norwegian cities.
Show on a Google map the customers in Japan.
Show on a Google map the European customers who have never placed an order.

File input

Access http://www.richardtwatson.com/data/manheim.txt, which contains details of the sales of three car models: X, Y, and Z.
1. Create a bar chart for sales of each model (X, Y , or Z)
2. Create bar chart for sales by each form of sale (online or auction).
Use the ‘Import Dataset’ feature of RStudio to read http://www.richardtwatson.com/data/electricityprices.csv, which contains details of electricity prices for a major city.⁵³ Do a Box plot of cost. What do you conclude about cost?
Read the table http://www.richardtwatson.com/data/wealth.csv containing details of the wealth of various countries. Create histograms for each of the wealth measures. Consult the ggvis color chart⁵⁴ for a choice of colors.
Merge the data for weather <http://www.richardtwatson.com/data/weather.csv> and electricity prices <http://www.richardtwatson.com/data/electricityprices.csv> for a major city. The merged file should contain air temperature and electricity cost. Also, you need to convert air temperature from a factor to a numeric (hint: first convert to a character). As readr does not currently handle date and time stamps, use the following code to read the files.

wurl <-  'http://www.richardtwatson.com/data/weather.csv'
w <-  read.csv(wurl,sep=',',header=T)
eurl <-  'http://www.richardtwatson.com/data/electricityprices.csv'
e <-  read.csv(eurl,sep=',',header=T)

a. Compute the correlation between temperature and electricity price.
   What do you conclude?

b. Graph the relationship between temperature and electricity price.

c. Graph the relationship between temperature and electricity price
   when the temperature is 95ºF and above.

d. Create a single graph showing the relationship between temperature
   and electricity price differentiating by color when the temperature
   is above or below 90ºF. (Hint: Trying recoding temperature).

Wilkinson, L. (2005). *The grammar of graphics* (2nd ed.). New York: Springer.↩︎
http://trifacta.github.io/vega/↩︎
http://www.w3schools.com/cssref/css_colornames.asp ↩︎
Dead weight tonnage is a measure of how much weight, in tonnes, a ship can safely carry.↩︎
Note these prices have been transformed from the original values, but are still representative of the changes over time.↩︎
http://www.w3schools.com/cssref/css_colornames.asp ↩︎