0. Use the following code to download the King County restaurants data:

load(url("https://pearce790.github.io/CSSS508/Lectures/Lecture8/restaurants.Rdata"))

1. Strings

Question 1.1: Use a function to determine how long the following character string is: paste0(letters,1:5,collapse=":")

nchar(paste0(letters,1:5,collapse=":"))
## [1] 77
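
To see where 77 comes from, it can help to build the string in two steps. The sketch below uses only base R; the names `pieces` and `joined` are introduced here for illustration and are not part of the assignment.

```r
# letters has 26 elements and 1:5 has 5, so 1:5 is recycled to length 26.
pieces <- paste0(letters, 1:5)
length(pieces)    # 26
head(pieces, 7)   # "a1" "b2" "c3" "d4" "e5" "f1" "g2"

# Joining 26 two-character pieces needs 25 ":" separators:
# 26 * 2 + 25 = 77 characters.
joined <- paste0(pieces, collapse = ":")
nchar(joined)     # 77
```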

Question 1.2: Describe, in 1-2 complete sentences, the difference between the arguments “sep” and “collapse” in the paste() function.

ANSWER: “sep” is the string placed between the corresponding elements of the separate arguments passed to paste(). “collapse” is the string placed between the elements of the resulting vector when they are joined into a single string.
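
A small illustration of the difference, using two made-up vectors (`first` and `second` are hypothetical names chosen for this sketch):

```r
first  <- c("a", "b", "c")
second <- c("x", "y", "z")

# sep goes between the arguments, element by element; the result is still a vector.
with_sep <- paste(first, second, sep = "-")
with_sep        # "a-x" "b-y" "c-z"

# collapse joins the elements of the pasted vector into one string
# (sep keeps its default of a single space here).
with_collapse <- paste(first, second, collapse = "+")
with_collapse   # "a x+b y+c z"

# Both can be combined.
with_both <- paste(first, second, sep = "-", collapse = "+")
with_both       # "a-x+b-y+c-z"
```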

Question 1.3: Filter your data to only include rows in which the Name includes the word “coffee” (in any case!)

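library(tidyverse)  # loads stringr, dplyr, and ggplot2, used throughout
coffee <- restaurants
# Lowercase the names so the match below ignores case
coffee$Name <- str_to_lower(coffee$Name)
coffee <- coffee %>% filter(str_detect(Name,"coffee"))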
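
An alternative that matches in any case without overwriting the original capitalization is to wrap the pattern in stringr's regex() with ignore_case = TRUE. The sketch below runs on a made-up vector of names (`shop_names` is hypothetical), not on the restaurants data.

```r
library(stringr)

# Hypothetical business names, for illustration only
shop_names <- c("Blue COFFEE House", "Tea Garden", "coffee cart", "Joe's Diner")

# regex(ignore_case = TRUE) makes the match case-insensitive
has_coffee <- str_detect(shop_names, regex("coffee", ignore_case = TRUE))
shop_names[has_coffee]   # "Blue COFFEE House" "coffee cart"
```

Note that the answer above lowercases Name instead, and the Starbucks filter in Question 2.5 relies on that, so switching approaches would mean adjusting that pattern too.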

Question 1.4: Create a new variable in your data which includes the length of the business name, after removing beginning/trailing whitespace.

coffee$NameLength <- str_length(str_trim(coffee$Name))

Question 1.5: Create a new variable in your data for the inspection year, using a stringr function!

coffee$Year <- str_sub(coffee$Inspection_Date,-4,-1)
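
Negative positions in str_sub() count from the end of the string, so -4 to -1 picks out the last four characters. The sketch below assumes dates written with the year last (for example MM/DD/YYYY); the `dates` vector is made up for illustration and is not taken from the data.

```r
library(stringr)

# Hypothetical dates, assumed to end in a four-digit year
dates <- c("01/13/2017", "11/02/2009")

year_chr <- str_sub(dates, -4, -1)   # last four characters, as text
year_chr                             # "2017" "2009"

# The result is a character vector; convert it if a numeric year is needed.
year_int <- as.integer(year_chr)
year_int                             # 2017 2009
```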

Question 1.6: Create side-by-side boxplots for the length of business name vs. year.

ggplot(coffee,aes(Year,NameLength))+geom_boxplot()

Question 1.7: Calculate the maximum Inspection_Score by business (Name) and Year.

coffee_summary <- coffee %>% group_by(Name,Year) %>% 
  summarize(MaxScore=max(Inspection_Score))
## `summarise()` has grouped output by 'Name'. You can override using the
## `.groups` argument.
coffee_summary %>% head(6)
## # A tibble: 6 × 3
## # Groups:   Name [2]
##   Name                Year  MaxScore
##   <chr>               <chr>    <int>
## 1 701 coffee          2015        10
## 2 701 coffee          2016        48
## 3 701 coffee          2017         0
## 4 909 coffee and wine 2007         2
## 5 909 coffee and wine 2008         5
## 6 909 coffee and wine 2009         2
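
The message printed above means the result is still grouped by Name. As a minimal sketch on made-up data (`scores` and its values are hypothetical), passing .groups = "drop" to summarize() returns an ungrouped result and silences the message:

```r
library(dplyr)

# Hypothetical inspection scores, for illustration only
scores <- data.frame(
  Name = c("a", "a", "a", "b"),
  Year = c("2015", "2015", "2016", "2015"),
  Inspection_Score = c(10L, 4L, 0L, 7L)
)

max_scores <- scores %>%
  group_by(Name, Year) %>%
  summarize(MaxScore = max(Inspection_Score), .groups = "drop")

max_scores$MaxScore          # 10 0 7
is_grouped_df(max_scores)    # FALSE
```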

Question 1.8: Create a line plot of maximum score (“MaxScore”) over time (“Year”), by business (“Name”). That is, you should have a single line for each business. (Don’t try to label them, as there are far too many!)

ggplot(coffee_summary,aes(Year,MaxScore,group=Name))+
  geom_line(alpha=.2)+theme_bw()

2. Mapping

Question 2.1: Using your data from part 1, create a ggplot displaying each coffee shop in King County by their latitude/longitude. For this question, no need to display any actual map data!

ggplot(coffee,aes(x=Longitude,y=Latitude))+
  geom_point()+
  theme_bw()

Question 2.2: Modify the City variable so that it is in title case. Then, modify your plot from 2.1 such that each city has a different color.

coffee$City <- str_to_title(coffee$City)
ggplot(coffee,aes(x=Longitude,y=Latitude,color=City))+
  geom_point()+
  theme_bw()

Question 2.3: Recreate the plot from 2.2 using the qmplot function in the ggmap package.

library(ggmap)  # provides qmplot
qmplot(data=coffee,x=Longitude,y=Latitude,
       color=City)

Question 2.4: Create a density plot of coffee shops in Bellevue

Hint: Filter to coffee shops in Bellevue first!

bellevue_coffee <- coffee %>% filter(City == "Bellevue")
qmplot(data = bellevue_coffee,
       geom = "blank",
       x = Longitude, 
       y = Latitude,
       darken = 0.5)+
  stat_density_2d(
    aes(fill = after_stat(level)),
    geom = "polygon", 
    alpha = .2, color = NA
  )+
  scale_fill_gradient2(
    "Coffee Shops", 
    low = "white", 
    mid = "yellow", 
    high = "red")

Question 2.5: Create a new dataset that includes the name, latitude, and longitude of each Starbucks coffee store in Bellevue. Remove any duplicates by year.

Hint: Use the select, filter, and distinct functions (in that order). Within filter, you’ll use str_detect.

unique_bellevue_coffee <- bellevue_coffee %>% 
  select(Name,Latitude,Longitude) %>% 
  filter(str_detect(string = Name,pattern = "starbucks")) %>%
  distinct()

Question 2.6: Plot all Bellevue coffee shops, then add labels for the Starbucks stores using geom_label_repel from the ggrepel package.

library(ggrepel)  # provides geom_label_repel
qmplot(data=bellevue_coffee,
       x=Longitude,y=Latitude,
       alpha=I(.1))+
  geom_label_repel(
    data = unique_bellevue_coffee,
    aes(label = Name), 
    size=2)