Processing math: 100%
  • Instructions
  • Homework 5
    • Importing the Data
    • Exploratory Data Analysis (EDA)
      • Precinct
      • Race
      • CounterGroup
      • Party
      • CounterType
      • SumOfCount
    • The quantities of interest
    • Filtering down the data
    • Seattle precincts
    • Registered voters and turnout rates
    • Quick Plot!
  • Next Time

Instructions

For Homework 5, you will fill out this RMarkdown template. Throughout the document, create code (in chunks) and write text answers (outside chunks) to answer the provided questions in an analysis of King County election data from 2016. IMPORTANT: Do NOT add any additional code chunks, and do NOT modify any chunk options! This week’s homework will become “part 1” for next week’s homework.

Homework 5

Importing the Data

Download the data from https://raw.githubusercontent.com/breonh/breonh.github.io/main/csss_508/homework/homework_5/king_county_elections_2016.txt. It is a plain text file of data, about 60 MB in size. Values are separated with commas. Read the file into R. Note the cache=TRUE chunk option, which allows R to store the file between “knits” of the RMarkdown document and thus save memory/time.

# [Your Code Here!]

Exploratory Data Analysis (EDA)

Use functions str and/or summary to look at the data. Describe the data in their current state. How many rows are there? What variables are there? What kinds of values do they take? Are the column types sensible?

# [Your Code Here!]

[YOUR ANSWER HERE]

This real-world election data is provided to you in “tidy” format! That is, each row is an observation: The number of votes given to a candidate/ballot measure/answer type in a given political race across voters in a precinct. We will ignore the variables LEG, CC, and CG as they are not of practical use. For the remaining variables, we will explore them graphically and attempt to figure out what they mean. Remember that in real world data work, you often have to get by with intuition or poking around online to figure out the nature of the data.

In each code chunk below, present a summary of each variable on it’s own (such as a histogram, frequency table, barplot, etc.). If there are many categories, it’s fine to print just the first few. If you create a figure, use ggplot. After, write down a one sentence interpretation for each summary.

Precinct

# [Your Code Here!]

[YOUR ANSWER HERE]

Race

# [Your Code Here!]

[YOUR ANSWER HERE]

CounterGroup

# [Your Code Here!]

[YOUR ANSWER HERE]

Party

# [Your Code Here!]

[YOUR ANSWER HERE]

CounterType

# [Your Code Here!]

[YOUR ANSWER HERE]

Notice something odd about CounterType in particular? It tells you what a given row of votes was for… but it also has Registered Voters and Times Counted. What are these values?

SumOfCount

# [Your Code Here!]

[YOUR ANSWER HERE]

The quantities of interest

In this assignment (and the next), we will focus on three major races in Washington in 2016:

  • “US President & Vice President”
  • “Governor”
  • “Lieutenant Governor”

With these races, we are interested in:

  1. Turnout rates for each of these races in each precinct. We will measure turnout as total number of submitted votes (including for a candidate, blank, write-in, or “over vote”) divided by the number of registered voters.
  1. Differences between precincts in Seattle and precincts elsewhere in King County.
  1. Precinct-level support for the Democratic candidates in King County in 2016 for each contest. We will measure support as the percentage of votes in a precinct for the Democratic candidate out of all votes for candidates or write-ins.

You will answer Questions #1 and #2 in this homework (Question #3 will be completed in homework 6). The sections below describe steps you may want to take to answer Questions 1 and 2. I suggest loading dplyr and tidyr (in the very first code chunk of this Rmd) to start!

Filtering down the data

For what we want to do, there are a lot of rows that are not useful. Start by filtering the dataset to only includes rows in which the race is one of: “US President & Vice President”, “Governor”, or “Lieutenant Governor”. Save this subsetted dataset as a new object, called king_county_elections_2016_Exec

# [Your Code Here!]

Seattle precincts

In our subsetted data, we want to add a “boolean” variable (TRUE or FALSE) for a precinct belonging to Seattle. The following code will create a vector of booleans. Using this code, add it to your dataset king_county_elections_2016_Exec as a new variable called “Seattle”

ifelse(substr(king_county_elections_2016_Exec$Precinct, start = 1, stop = 4) == "SEA ","Seattle","Not Seattle")

# [Your Code Here!]

Registered voters and turnout rates

We want to calculate turnout rates for each race. We define Turnout=TotalVotesRegisteredVoters, where total votes are listed in rows where the variable CounterType equals ‘Times Counted’, and registered votes are listed in rows where CounterType equals ‘Registered Voters’. We can calculate turnout rates in three steps: Total votes by race/precinct, Registered votes by race/precinct, and finally turnout by race/precinct. Let’s do it!

First, create a dataset called Votes in which you filter king_county_elections_2016_Exec to contain rows only where CounterType == ‘Times Counted’. You should now have on row per Precint/Race. Add a variable called “TotalVotes” to the Votes dataset, which contains the number of total votes by race/precinct (currently in the “SumOfCount” variable in Votes).

# [Your Code Here!]

Second, create a dataset called Registered which contains rows only where CounterType == ‘Registered Voters’. The “SumOfCount” variable in this dataset includes the number of registered votes by precinct/race. Add a variable called “Registered” to the Votes dataset, which contains the number of registered voters by race/precinct (currently in the “SumOfCount” variable in Registered).

# [Your Code Here!]

Third, create a new variable in Votes called “Turnout”, which includes turnout calculated by dividing total votes by registered voters. Then, subset Votes to contain only the following variables: Precinct, Seattle, Race, TotalVotes, Registered, and Turnout. Display the first 10 rows of Votes, and print the number of rows/columns (there should be 7551 rows and 6 columns!).

# [Your Code Here!]

Quick Plot!

Create ggplot histograms of turnout rates, first by “Race” and second by “Seattle”. Do you notice any changes in turnout based on race or whether or not a precinct was located in Seattle?

# [Your Code Here!]

[YOUR ANSWER HERE]

That’s it for Homework 5!

Next Time

In the next homework, we’ll continue this analysis!