Last time, we learned about,
Today, we will cover,
Packages are collections of functions and tools that make your life easier! The best part of R is the huge number of user-created packages. The Packages tab in the bottom-right pane of RStudio lists your installed packages.
To install a new package in R, run the line of code:
install.packages("gapminder")
We always install packages in the console, because we only want to do it once.
Installing a package does not mean it's loaded in our R session. To load it, we call the package:
library(gapminder)
NOTE: Use quotes when installing packages, but not when loading packages!
We need to run this code every time we open a new R session: Where should we put this code?
Answer: In R/Rmd files, and not the console!
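As a rough sketch of what the top of an R/Rmd file might look like, the snippet below checks whether a package is installed before loading it. The `pkg` variable and the message wording are illustrative assumptions; `requireNamespace()` is base R.

```r
# Top of an .R or .Rmd file: load what the script needs.
# install.packages("gapminder")  # run once, in the console, with quotes

pkg <- "gapminder"  # illustrative variable, not part of the lecture code

if (requireNamespace(pkg, quietly = TRUE)) {
  # Installed: load it for this session (no quotes needed here).
  library(gapminder)
} else {
  # Not installed: say so instead of failing later with a confusing error.
  message("Package '", pkg, "' is not installed. ",
          "Run install.packages(\"", pkg, "\") in the console first.")
}
```

The check is optional; a plain `library(gapminder)` at the top of the file is the usual pattern once the package is installed.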
R saves files and looks for files to open in your current working directory. You can ask R what this is:
getwd()
## [1] "/Users/pearce790/CSSS508/Lectures/Lecture2"
Similarly, we can set a working directory like so:
setwd("C:/Users/pearce790/CSSS508/HW2")
Don't set a working directory in R Markdown documents! They automatically set the directory they are in as the working directory.
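One way to see how the working directory affects file lookups is to build a path from it and check that the file exists before reading. This is only a sketch: the `data` folder and `my_data.csv` file are hypothetical names.

```r
# Where is R currently looking for files?
wd <- getwd()

# Build a path to a (hypothetical) file inside the working directory.
data_path <- file.path(wd, "data", "my_data.csv")

# Check before reading so a wrong working directory is easy to spot.
if (file.exists(data_path)) {
  my_data <- read.csv(data_path)
} else {
  message("No file at ", data_path,
          "; is the working directory what you expect?")
}
```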
When managing R projects, it is normally best to give each project (such as a homework assignment) its own folder. I use the following system:
Every class or project has its own folder
Each assignment or task has a folder inside that, which is the working directory for that item.
.Rmd and .R files are named clearly and completely
For example, this presentation is located and named this:
GitHub/CSSS508/Lectures/Lecture2/CSSS508_Lecture2_ggplot2.Rmd
Be consistent so your projects are organized! You don't want to lose files!!
In today's lecture and Homework 2, we'll use data from Hans Rosling's Gapminder project. The data can be accessed through the gapminder R package.
If you didn't already, run in the console: install.packages("gapminder"), and then load the package:
library(gapminder)
The data is in a dataframe called gapminder, which is available after loading the package. Let's explore it using functions from last week:
str(gapminder)
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
Factor variables country and continent
Many observations: n=1704 rows
For each observation, a few variables: p=6 columns
A nested/hierarchical structure: year in country in continent
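These observations about the data can be checked directly. The sketch below assumes the gapminder package is installed; `continents_per_country` is an illustrative name.

```r
library(gapminder)

dim(gapminder)                 # number of rows (n) and columns (p)
nlevels(gapminder$country)     # how many distinct countries
nlevels(gapminder$continent)   # how many distinct continents

# Nesting check: each country should sit inside exactly one continent.
continents_per_country <- tapply(
  gapminder$continent, gapminder$country,
  function(x) length(unique(x))
)
all(continents_per_country == 1)
```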
ggplot2
China <- subset(gapminder, gapminder$country == "China")
plot(lifeExp ~ year, data = China,
     xlab = "Year", ylab = "Life expectancy",
     main = "Life expectancy in China",
     col = "red", pch = 16)
Note: Don't worry about the code used to create the object China. We'll explore data manipulation next week!
ggplot
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy in China") +
  theme_bw(base_size = 18)
ggplot is made with many functions and fewer arguments in each.
ggplot2
The ggplot2 package provides an alternative toolbox for plotting.
# install.packages("ggplot2")
library(ggplot2)
The core idea underlying this package is the layered grammar of graphics: we can break up elements of a plot into pieces and combine them.
ggplots are a bit harder to create, but are usually:
ggplot graphics objects consist of two primary components:
Layers, the components of a graph.
Layers are added to a ggplot object using +.
Aesthetics, which determine how the layers appear.
Aesthetics are set (e.g., color = "red") inside layer functions.
Layers are the components of the graph, such as:
ggplot(): initializes basic plotting object, specifies input data
geom_point(): layer of scatterplot points
geom_line(): layer of lines
geom_histogram(): layer of a histogram
ggtitle(), xlab(), ylab(): layers of labels
facet_wrap(): layer creating multiple plot panels
theme_bw(): layer replacing default gray background with black-and-white
Layers are separated by a + sign. For clarity, I usually put each layer on a new line.
Aesthetics control the appearance of the layers:
x, y: x and y coordinate values to use
color: set color of elements based on some data value
group: describe which points are conceptually grouped together for the plot (often used with lines)
size: set size of points/lines based on some data value (greater than 0)
alpha: set transparency based on some data value (between 0 and 1)
We'll now build up two ggplots together that demonstrate common layers and aesthetics.
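An aesthetic can either be mapped to a data column inside aes() or set to one fixed value outside it. The sketch below shows both; the object names `mapped` and `fixed` are illustrative, and it assumes ggplot2 and gapminder are installed.

```r
library(ggplot2)
library(gapminder)

# Mapping: color varies with a data column, so it goes inside aes().
mapped <- ggplot(data = gapminder,
                 aes(x = year, y = lifeExp, color = continent)) +
  geom_point()

# Setting: every point gets the same fixed color, so it goes outside aes().
fixed <- ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point(color = "red")

mapped  # points colored by continent, with a legend
fixed   # all points red, no legend
```

A mapped aesthetic produces a legend because the color carries information; a fixed one does not.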
ggplot(data = China, aes(x = year, y = lifeExp))
Initialize the plot with ggplot() and x and y aesthetics mapped to variables. These aesthetics will be accessible to any future layers since they're in the primary layer.
ggplot(data = China, aes(x = year, y = lifeExp)) + geom_point()
Add a scatterplot layer.
ggplot(data = China, aes(x = year, y = lifeExp)) + geom_point(color = "red", size = 3)
Set aesthetics to make the points large and red.
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year")
Add a layer to capitalize the x-axis label.
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy")
Add a layer to clean up the y-axis label.
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy in China")
Add a title layer.
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy in China") +
  theme_bw()
Pick a nicer theme with a new layer.
ggplot(data = China, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy in China") +
  theme_bw(base_size = 18)
Increase the base text size.
We have a plot we like for China...
... but what if we want all the countries?
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_point(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18)
We can't tell countries apart! Maybe we could follow lines?
ggplot(data = gapminder, aes(x = year, y = lifeExp)) +
  geom_line(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
ggplot2 doesn't know how to connect the lines!
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(color = "red", size = 3) +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18)
That looks more reasonable... but the lines are too thick!
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(color = "red") +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18)
Much better... but maybe we can highlight regional differences?
ggplot(data = gapminder,
       aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18)
Patterns are obvious... but why not separate continents completely?
ggplot(data = gapminder,
       aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw(base_size = 18) +
  facet_wrap(~ continent)
Now the text is too big!
ggplot(data = gapminder,
       aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw() +
  facet_wrap(~ continent)
Better. Do we even need the legend anymore?
ggplot(data = gapminder,
       aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw() +
  facet_wrap(~ continent) +
  theme(legend.position = "none")
Looking good!
(10 minute break!)
Next, we'll discuss:
Storing, modifying, and saving ggplots
Advanced axis changes (scales, text, ticks)
Legend changes (scales, colors, locations)
We can assign a ggplot object to a name:
lifeExp_by_year <- ggplot(data = gapminder,
       aes(x = year, y = lifeExp, group = country, color = continent)) +
  geom_line() +
  xlab("Year") +
  ylab("Life expectancy") +
  ggtitle("Life expectancy over time") +
  theme_bw() +
  facet_wrap(~ continent) +
  theme(legend.position = "none")
Afterwards, you can display or modify ggplots...
lifeExp_by_year
lifeExp_by_year + theme(legend.position = "bottom")
Saving ggplot Plots
If you want to save a ggplot, use ggsave():
ggsave("I_saved_a_file.pdf", plot = lifeExp_by_year, height = 3, width = 5, units = "in")
If you didn't manually set font sizes, these will usually come out at a reasonable size given the dimensions of your output file.
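A quick way to confirm that ggsave() wrote the file is to save into a temporary folder and check that the file exists. This is a sketch only: the plot `p` and the file name `lifeExp_example.pdf` are illustrative, and it assumes ggplot2 and gapminder are installed.

```r
library(ggplot2)
library(gapminder)

# A small plot to save (illustrative; any ggplot object works).
p <- ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line()

# Save into R's temporary folder so nothing is left in the project folder.
out_file <- file.path(tempdir(), "lifeExp_example.pdf")
ggsave(out_file, plot = p, height = 3, width = 5, units = "in")

# Confirm the file was written.
file.exists(out_file)
```

In a real project you would usually give a path relative to the working directory instead of tempdir().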
We can modify the axes in a variety of ways, such as:
Change the x or y range using xlim() or ylim() layers
Change to a logarithmic or square-root scale on either axis: scale_x_log10(), scale_y_sqrt()
Change where the major/minor breaks are: scale_x_continuous(breaks =, minor_breaks = )
ggplot(data = China, aes(x = year, y = gdpPercap)) +
  geom_line() +
  scale_y_log10(breaks = c(1000, 2000, 3000, 4000, 5000)) +
  xlim(1940, 2010) +
  ggtitle("Chinese GDP per capita")
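The same axis tools can be combined on the full dataset. The sketch below puts GDP per capita on a log scale with hand-picked breaks and fixes the y range; the break values, limits, and the name `gdp_vs_life` are illustrative choices, not from the lecture.

```r
library(ggplot2)
library(gapminder)

gdp_vs_life <- ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  # Log scale on x, with breaks chosen by hand.
  scale_x_log10(breaks = c(300, 1000, 3000, 10000, 30000, 100000)) +
  # Fix the y range so every point stays inside the panel.
  ylim(20, 90) +
  xlab("GDP per capita (log scale)") +
  ylab("Life expectancy")

gdp_vs_life
```

Note that xlim() and ylim() drop points outside the range, so choose limits that cover the data you want to show.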
lifeExp_by_year + theme(legend.position = c(0.8, 0.2))
Instead of coordinates, you could also use "top", "bottom", "left", or "right".
Scales are layers that control how the mapped aesthetics appear.
You can modify these with a scale_[aesthetic]_[option]() layer:
[aesthetic] is color, shape, linetype, alpha, size, fill, etc.
[option] is something like manual, continuous, or discrete (depending on the nature of the variable).
Examples:
scale_linetype_manual(): manually specify the linetype for each different value
scale_color_manual(): manually specify colors
lifeExp_by_year +
  theme(legend.position = c(0.8, 0.2)) +
  scale_color_manual(
    name = "Which continent are\nwe looking at?", # \n adds a line break
    values = c("Africa" = "seagreen", "Americas" = "turquoise1",
               "Asia" = "royalblue", "Europe" = "violetred1",
               "Oceania" = "yellow"))
We're going to slowly build up a really detailed plot now!
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country))
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line()
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent))
Note: A loess curve is something like a moving average.
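To make the moving-average comparison concrete, here is a small sketch using base R's loess() on simulated data. The data, the seed, and the span value are assumptions for illustration only; they are not part of the lecture.

```r
# Simulated noisy data (an assumption for illustration only).
set.seed(508)
x <- 1:60
y <- sin(x / 10) + rnorm(60, sd = 0.3)

# Fit a loess curve: each fitted value comes from a weighted local
# regression over nearby points; span controls how wide "nearby" is.
fit <- loess(y ~ x, span = 0.5)
smoothed <- predict(fit)

# Compare the raw points with the smoothed curve.
plot(x, y, pch = 16, col = "grey50", main = "Raw points vs. loess curve")
lines(x, smoothed, col = "blue", lwd = 2)
```

A larger span averages over more neighbors and gives a smoother curve; a smaller span follows the points more closely.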
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line() +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent)) +
  facet_wrap(~ continent, nrow = 2)
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent")) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue"))
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent")) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3))
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent"),
            alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3))
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent"),
            alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3)) +
  theme_minimal(base_size = 14) +
  ylab("Years") +
  xlab("")
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent"),
            alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3)) +
  theme_minimal(base_size = 14) +
  ylab("Years") +
  xlab("") +
  ggtitle("Life Expectancy, 1952-2007", subtitle = "By continent and country")
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent"),
            alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3)) +
  theme_minimal(base_size = 14) +
  ylab("Years") +
  xlab("") +
  ggtitle("Life Expectancy, 1952-2007", subtitle = "By continent and country") +
  theme(axis.text.x = element_text(angle = 45))
Note: Fewer values might be better than angled labels!
ggplot(data = gapminder, aes(x = year, y = lifeExp, group = country)) +
  geom_line(alpha = 0.5, aes(color = "Country", size = "Country")) +
  geom_line(stat = "smooth", method = "loess",
            aes(group = continent, color = "Continent", size = "Continent"),
            alpha = 0.5) +
  facet_wrap(~ continent, nrow = 2) +
  scale_color_manual(name = "Life Exp. for:",
                     values = c("Country" = "black", "Continent" = "blue")) +
  scale_size_manual(name = "Life Exp. for:",
                    values = c("Country" = 0.25, "Continent" = 3)) +
  theme_minimal(base_size = 14) +
  ylab("Years") +
  xlab("") +
  ggtitle("Life Expectancy, 1952-2007", subtitle = "By continent and country") +
  theme(legend.position = c(0.82, 0.15),
        axis.text.x = element_text(angle = 45))
ggplot2 can do a LOT! I won't expect you to remember all these tools.
With time and practice, you'll start to remember the key tools.
When in doubt, Google it! ("R ggplot rename title")
There are lots of great resources out there:
Kieran Healy's book Data Visualization: A Practical Introduction, which is targeted at social scientists without technical backgrounds.
In pairs, you will create a histogram of life expectancy observations in the complete Gapminder dataset.
Set the base layer by specifying the data as gapminder and the x variable as lifeExp
Add a second layer to create a histogram using the function geom_histogram()
Customize your plot with nice axis labels and a title.
ggplot(gapminder,aes(x=lifeExp))
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30)
Setting the bins argument removes a pesky message!
ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30) +
  xlab("Life Expectancy") +
  ylab("Count") +
  ggtitle("Histogram of Life Expectancy in Gapminder Data")
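As an alternative to choosing the number of bins, geom_histogram() also accepts a binwidth in the units of the x variable. The sketch below compares the two; the 5-year width and the names `by_bins` and `by_width` are illustrative choices. layer_data() is a ggplot2 helper that returns the values computed for a layer.

```r
library(ggplot2)
library(gapminder)

# Choose the number of bins...
by_bins <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(bins = 30)

# ...or choose how wide each bin is (here, 5 years of life expectancy).
by_width <- ggplot(gapminder, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5)

# Inspect the computed bins: the counts should add up to every row.
width_bins <- layer_data(by_width)
sum(width_bins$count)
nrow(gapminder)
```

A binwidth is often easier to explain in a write-up, since "5-year bins" has a direct meaning in the data.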
In this homework, you'll pose a question regarding the Gapminder dataset and investigate it graphically.
Histograms (geom_histogram()), scatterplots (geom_point()), or lineplots (geom_line()).
facet_wrap() or facet_grid().
Your document should be pleasant for a peer to look at, with some organization. You must write up your observations in words as well as showing the graphs. Upload both the .Rmd file and the .html file to Canvas.