class: center, top, title-slide

.title[
# CSSS 508, Lecture 1
]
.subtitle[
## Syllabus and Introduction to R, RStudio, and RMarkdown
]
.author[
### Michael Pearce
(based on slides from Chuck Lanfear)
]
.date[
### March 29, 2022
]

---

# Welcome to CSSS 508: Introduction to R for Social Scientists

Today, we will do:

* Introductions
* Syllabus
* Lecture 1: Introduction to R, RStudio, and RMarkdown

---

# Introductions

We'll go around the room and each share our:

* Name and preferred pronouns
* Program and year
* Experience with programming (in R or generally)
* Something fun you did over Spring Break

---

# Syllabus

The syllabus (as well as lots of other information) can be found on our course website:

<br>

**https://pearce790.github.io/CSSS508**

<br>

Feel free to follow along online as I run through the syllabus!

---

# Course Goals

This course is intended to give students a foundational understanding of programming in the statistical language R.

General topics include:

* Exploring data with graphics and summaries

--

* Cleaning, preparing, and linking data for analyses

--

* Foundational programming skills such as functions and loops

--

* Organizing projects and creating reproducible research

We will cover almost no statistics here, but I hope you'll leave being able to focus on <i>statistics instead of coding</i> in future CSSS or STAT courses!
---

# Logistics

**Sessions:**

* **Lecture:** Wednesdays, 3:30-5:20 (Savery 117) -- Interactive sessions in which we'll learn key skills, concepts, and principles
* **Lab:** Mondays, 3:30-5:20 (Savery 117) -- Optional and mostly unstructured sessions to work on homework and review
* **Office Hours:** Tuesdays, 9-10am and 3-4pm (on Zoom; link on Canvas)

**Course Website:** https://pearce790.github.io/CSSS508

**Contact:** Feel free to email me at [mpp790 at uw dot edu]

---

# Schedule

* Week 1: Introduction to R, RStudio, and RMarkdown
* Week 2: Visualizing Data
* Week 3: Manipulating and Summarizing Data
* Week 4: Understanding R Data Structures
* Week 5: Importing, Exporting, and Cleaning Data
* Week 6: Using Loops
* Week 7: Writing Functions
* Week 8: Working with Text Data
* Week 9: Working with Geographical Data
* Week 10: Reproducibility and Model Results

This course will have no meeting during final exam week.

---

# Prereqs, Materials, and Texts

**Prerequisites:** None

**Materials:** All course materials are provided on the [course website](https://pearce790.github.io/CSSS508). This includes:

* These slides and the code used to generate them.
* An R script for the slides to follow along in class.
* Homework instructions and/or templates.
* Useful links to other resources.

**Laptops:** It's helpful to bring a laptop to class. If you don't have one, you can use the lab computers or borrow one for free from the UW Student Technology Loan Program.

**Textbooks:** This course has no textbook. However, the website has links to a few texts which I have found useful!

---

# Grading

**Final grade:** C/NC, 60% to get Credit.

* **Homework** (75%; assessed by peers): 8 total homeworks; assessed on a 0-3 point rubric. Assigned after lectures and due before the following lecture.
* **Peer Grading** (25%; assessed by the instructor): One per homework, assessed on a binary "good"/"not good" scale. Due before the following lab.

Assignment/peer grading instructions and deadlines can be found on the [Homework](https://pearce790.github.io/CSSS508/homework.html) page of the course website. All homework will be turned in on Canvas.

---

# Ugh, peer grading?

Yes, because:

* You will write your reports better knowing others will see them
* You learn alternate approaches to the same problem
* You will have more opportunities to practice and have the material sink in

--

How to peer review:

* Leave **constructive comments**: You'll only get the point if you write at least 1 full paragraph that includes
    + Any key issues from the assignment, and
    + Something positive in your peer's work.
* **Email me** if you would like your assignment regraded, or if you received no peer review and would like feedback.

---

# Academic Integrity

Academic integrity is essential to this course and to your learning. Violations of the academic integrity policy include but are not limited to:

* Copying from a peer
* Copying from an online resource
* Using resources from a previous iteration of the course

--

I hope you will collaborate with peers on assignments and use Internet resources when questions arise to help solve issues. The key is that you **ultimately submit your own work**.

--

Anything found in violation of this policy will be automatically given a score of 0 with no exceptions. If the situation merits, it will also be reported to the UW Student Conduct Office, at which point it is out of my hands.

If you have any questions about this policy, please do not hesitate to reach out and ask.

---

# Classroom Environment

**I am absolutely committed to fostering a friendly and inclusive classroom environment in which all students have an equal opportunity to learn and succeed.**

--

* Names & Pronouns: Everyone should be addressed respectfully and correctly. Feel free to send me your preferred name/pronouns anytime.
--

* Covid: Covid creates unique circumstances for each of us, which may limit your ability to fully participate in this course. You never need to apologize to me for anything pandemic-related. Let me know how I can help!

--

* Accessibility & Accommodations: See the course website for information on health, disability, and religious accommodations.

--

* Feedback: I encourage feedback at any point in the quarter. I will also send out a mid-quarter evaluation around Week 5.

--

* Getting Help: If you ever find yourself struggling, know I'm here to help! Try chatting after class, emailing me, or coming to office hours.

---

# Asking Questions

Don't ask like this:

> tried lm(y~x) but it iddn't work wat do

--

Instead, ask like this:

<blockquote>
<pre>
y <- seq(1:10) + rnorm(10)
x <- seq(0:10)
model <- lm(y ~ x)
</pre>

Running the block above gives me the following error, anyone know why?

<pre>
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
  variable lengths differ (found for 'x')
</pre>
</blockquote>

I may send out your question (anonymously) and my answer to the course mailing list!

---

# Questions?

---
class: inverse

# Lecture 1: Introduction to R, RStudio, and RMarkdown

---

# A Note on Slide Formatting

**Bold** and *Italics* indicate important terms!

--

`Code` represents R code or keyboard shortcuts you could use to perform actions. For example: "Press `Ctrl-P` to open the print dialogue."

--

Code chunks that span the page represent *actual R code embedded in the slides*.

```r
*# Sometimes important stuff is highlighted!
7 * 49
```

```
## [1] 343
```

---

# Why R?

R is a programming language built for statistical computing.

If one already knows Stata or similar software, why use R?

--

* R is *free*.

--

* R has a *very* large community.

--

* R can handle virtually any data format.

--

* R makes replication easy.

--

* R is a *language* so it can do *everything*.

--

* R skills transfer to other languages like Python and Julia.
---

# RStudio

RStudio is a "front-end" or integrated development environment (IDE) for R that can make your life *easier*.

--

We'll show RStudio can...

--

* Organize your code, output, and plots

--

* Auto-complete code and highlight syntax

--

* Help view data and objects

--

* Enable easy integration of R code into documents with **R Markdown**

--

It can also...

* Manage `git` repositories
* Run interactive tutorials
* Handle other languages like C++, Python, SQL, HTML, and shell scripting

---

# Selling You on R Markdown

The ability to create R Markdown files is a powerful advantage of R:

--

* Document analyses by combining text, code, and output

--

    + No copying and pasting into Word

--

    + Easy for collaborators to understand

--

    + Show as little or as much code as you want

--

* Produce many different document types as output

--

    + PDF documents
    + HTML webpages and reports
    + Word and PowerPoint documents
    + Presentations (like these slides)

--

* Works with LaTeX and HTML for math and more formatting control

--

We'll get back to this shortly!

---

# Downloading R and RStudio

If you don't already have R and RStudio on your machine, now is the time to install them!

1. Go to the course homepage, https://pearce790.github.io/CSSS508
2. Click the *Download R* link and download R to your machine.
3. Afterwards, click the *Download RStudio* link and download RStudio to your machine.

We'll take a ~10 minute break now to stretch and solve any software issues!

---

# Getting Started

Open up RStudio now and choose *File > New File > R Script*.

Then, let's get oriented with the interface:

* *Top Left*: Code **editor** pane, data viewer (browse with tabs)
* *Bottom Left*: **Console** for running code (`>` prompt)
* *Top Right*: List of objects in **environment**, code **history** tab.
* *Bottom Right*: Tabs for browsing files, viewing plots, managing packages, and viewing help files.
---

# Editing and Running Code

There are several ways to run R code in RStudio:

--

* Highlight lines in the **editor** window and click *Run* at the top or hit `Ctrl+Enter` or `⌘+Enter` to run them all.

--

* With your **caret**<sup>1</sup> on a line you want to run, hit `Ctrl+Enter` or `⌘+Enter`. Note your caret moves to the next line, so you can run code sequentially with repeated presses.

.footnote[
[1] This thing is the caret: .blink_me[`|`]
]

--

* Type individual lines in the **console** and press `Enter`.

--

* In R Markdown documents, click within a code chunk and click the green arrow to run the chunk. The button beside that runs *all prior chunks*.

--

The console will show the lines you ran followed by any printed output.

---

# Incomplete Code

If you mess up (e.g. leave off a parenthesis), R might show a `+` sign prompting you to finish the command:

```r
> (11-2
+
```

Finish the command or hit `Esc` to get out of this.

---

# R as a Calculator

In the **console**, type `123 + 456 + 789` and hit `Enter`.

--

```r
123 + 456 + 789
```

```
## [1] 1368
```

--

The `[1]` in the output indicates the numeric **index** of the first element on that line.

--

Now in your blank R document in the **editor**, try typing the line `sqrt(400)` and either clicking *Run* or hitting `Ctrl+Enter` or `⌘+Enter`.

--

```r
sqrt(400)
```

```
## [1] 20
```

---

# Functions and Help

`sqrt()` is an example of a **function** in R.

If we don't have a good guess as to what `sqrt()` does, we can type `?sqrt` in the console and look at the **Help** panel on the right.

```r
?sqrt
```

**Arguments** are the *inputs* to a function. In this case, the only argument to `sqrt()` is `x`, which can be a number or a vector of numbers.

Help files provide documentation on how to use functions and what functions produce.

---

# Objects

R stores everything as an **object**, including data, functions, models, and output.
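To make that claim concrete, here is a minimal sketch that is not part of the original slides. It assumes only base R's `class()` function and the built-in `cars` data, and asks three very different things what kind of object they are:

```r
class(144)   # a single number
class(sqrt)  # a function
class(cars)  # a built-in data set
```

Each call returns a class name (`"numeric"`, `"function"`, and `"data.frame"` respectively), which is one way to see that all three are objects.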
--

Creating an object can be done using the **assignment operator**: `<-`

--

```r
new.object <- 144
```

--

**Operators** like `<-` are functions that look like symbols but typically sit between their arguments (e.g. numbers or objects) instead of having them inside `()` like in `sqrt(x)`.

--

We do math with operators, e.g., `x + y`. `+` is the addition operator!

---

# Calling Objects

You can display or "call" an object simply by using its name.

```r
new.object
```

```
## [1] 144
```

--

Object names can contain `_` and `.` in them but cannot *begin* with numbers. Try to be consistent in naming objects. RStudio auto-complete means *long names are better than vague ones*!

*Good names save confusion later!*

Object names are **CaSe SeNsItIvE!!**

---

# Using Objects

An object's **name** represents the information stored in that **object**, so you can treat the object's name as if it were the values stored inside.

--

```r
new.object + 10
```

```
## [1] 154
```

```r
new.object + new.object
```

```
## [1] 288
```

```r
sqrt(new.object)
```

```
## [1] 12
```

---

# Vectors

A **vector** is a series of **elements**, such as numbers.

--

You can create a vector using the function `c()` which stands for "combine" or "concatenate".

--

```r
new.object <- c(4, 9, 16, 25, 36)
new.object
```

```
## [1]  4  9 16 25 36
```

--

If you name an object the same name as an existing object, *it will overwrite it*.

--

You can provide a vector as an argument for many functions.

--

```r
sqrt(new.object)
```

```
## [1] 2 3 4 5 6
```

---

# More Complex Objects

There are other, more complex data types in R which we will discuss later in the quarter! These include **matrices**, **arrays**, **lists**, and **dataframes**.

Most data sets you will work with will be read into R and stored as a **dataframe**, so this course will mainly focus on manipulating and visualizing these objects.

---
class: inverse

# R Markdown

---

# R Markdown Documents

Let's try making an R Markdown file:

1. Choose *File > New File > R Markdown...*
2. Make sure *HTML Output* is selected and click OK
3. Save the file somewhere, call it `my_first_rmd.Rmd`
4. Click the *Knit HTML* button
5. Watch the progress in the R Markdown pane, then gaze upon your result!

--

If you ever have trouble knitting your file (especially if creating a PDF), try running the following code in the console:

```r
install.packages('rmarkdown')
install.packages('tinytex')
tinytex::install_tinytex()
```

---

# R Markdown Headers

The header of an .Rmd file is a YAML code block, and everything else is part of the main document.

--

```{}
---
title: "Untitled"
author: "Michael Pearce"
date: "March 29, 2023"
output: html_document
---
```

--

To mess with global formatting, [you can modify the header](http://rmarkdown.rstudio.com/html_document_format.html).

```{}
output:
  html_document:
    theme: readable
```

---

# R Markdown Syntax

.pull-left[

## Output

**bold/strong emphasis**

*italic/normal emphasis*

# Header

## Subheader

### Subsubheader

> Block quote from
> famous person

]

.pull-right[

## Syntax

<pre>
**bold/strong emphasis**

*italic/normal emphasis*

# Header

## Subheader

### Subsubheader

> Block quote from
> famous person
</pre>

]

---

# More R Markdown Syntax

.pull-left[

## Output

1. Ordered lists
1. Are real easy
    1. Even with sublists
    1. Or when lazy with numbering

* Unordered lists
* Are also real easy
    + Also even with sublists

]

.pull-right[

## Syntax

<pre>
1. Ordered lists
1. Are real easy
    1. Even with sublists
    1. Or when lazy with numbering

* Unordered lists
* Are also real easy
    + Also even with sublists
</pre>

]

---

# Formulae and Syntax

.pull-left[

## Output

Include math `\(y= \left( \frac{2}{3} \right)^2\)`.

`$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$`

Or write `code-looking font`.

Or a block of code:

<pre>
y <- 1:5
z <- y^2
</pre>

]

.pull-right[

## Syntax

<pre>
Include math $y= \left( \frac{2}{3} \right)^2$.

`$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$`

Or write `code-looking font`.

Or a block of code:

```
y <- 1:5
z <- y^2
```
</pre>

]

---

# R Markdown Tinkering

R Markdown docs can be modified in many ways. Visit these links for more information.

* [Ways to modify the overall document appearance](http://rmarkdown.rstudio.com/html_document_format.html)
* [Ways to format parts of your document](http://rmarkdown.rstudio.com/authoring_basics.html)
* [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/)

---

# R Code in R Markdown

Inside RMarkdown, lines of R code are called **chunks**. Code is sandwiched between two sets of three backticks, with `{r}` right after the opening set. This chunk of code...

```{r}
summary(cars)
```

Produces this output in your document:

```r
summary(cars)
```

```
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
```

---

# Chunk Options

Chunks have options that control what happens with their code, such as:

* `echo=FALSE`: Keeps R code from being shown in the document
* `eval=FALSE`: Shows R code in the document without running it
* `include=FALSE`: Hides all output but still runs code (good for `setup` chunks where you load packages!)
* `results='hide'`: Hides R's (non-plot) output from the document
* `cache=TRUE`: Saves results of running that chunk so if it takes a while, you won't have to re-run it each time you re-knit the document
* `fig.height=5, fig.width=5`: Modifies the dimensions of any plots that are generated in the chunk (units are in inches)
* `fig.cap="Text"`: Adds a caption to your figure in the chunk

---

# Playing with Chunk Options

Try adding or changing the chunk options (separated by commas) for the two chunks in `my_first_rmd.Rmd` and re-knitting to check what happens.

```{r echo=FALSE}
summary(cars)
```

---

# In-Line R code

Sometimes we want to insert a value directly into our text. We do that using code in single backticks starting off with `r`.
--

Four score and seven years ago is the same as `r 4*20 + 7` years.

--

Four score and seven years ago is the same as 87 years.

--

Maybe we've saved a variable in a chunk we want to reference in the text:

```r
x <- sqrt(77) # <- is how we assign objects
```

--

The value of `x` rounded to two decimal places is `r round(x, 2)`.

--

The value of `x` rounded to two decimal places is 8.77.

---

# This is Amazing!

Having R dump values directly into your document protects you from silly mistakes:

--

* Never wonder "how did I come up with this quantity?" ever again: Just look at your formula in your .Rmd file!

--

* Consistency! No "find/replace" mishaps; reference a variable in-line throughout your document without manually updating if the calculation changes (e.g. reporting sample sizes).

--

* You are more likely to make a typo in a "hard-coded" number than you are to write R code that somehow runs but gives you the wrong thing.

---

# Example: Keeping Dates

In your YAML header, make the date come from R's `Sys.time()` function by changing:

    date: "March 29, 2023"

to:

    date: "`r Sys.time()`"

---
class: inverse

# Data Frames

---

# What's Up with `cars`?

In the sample R Markdown document you are working on, we can load the built-in data `cars`, which loads as a dataframe, a type of object mentioned earlier. Then, we can look at it in a couple different ways.

--

`data(cars)` loads this dataframe into the **Global Environment**.

--

`View(cars)` pops up a **Viewer** pane ("interactive" use only, don't put in R Markdown document!) or...

--

```r
head(cars, 5) # prints first 5 rows, see tail() too
```

```
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
```

---

# Tell Me More About `cars`

`str()` displays the structure of an object:

```r
str(cars) # str[ucture]
```

```
## 'data.frame':	50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
```

--

`summary()` displays summary information:

```r
summary(cars)
```

```
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
```

---

# Ugly Pictures of `cars`

`hist()` generates a histogram of a vector. Note you can access a vector that is a column of a dataframe using `$`, the **extract operator**.

.pull-left[

```r
hist(cars$speed) # Histogram
```

<img src="CSSS508_Lecture1_RStudio_and_RMarkdown_files/figure-html/unnamed-chunk-8-1.png" height="330px" />
]

.pull-right[

```r
hist(cars$dist)
```

<img src="CSSS508_Lecture1_RStudio_and_RMarkdown_files/figure-html/unnamed-chunk-9-1.png" height="330px" />
]

---

## Drawing Slightly Less Ugly Pictures

```r
hist(cars$dist,
     xlab = "Distance (ft)", # X axis label
     main = "Observed stopping distances of cars") # Title
```

<img src="CSSS508_Lecture1_RStudio_and_RMarkdown_files/figure-html/unnamed-chunk-10-1.png" height="400px" />

---

# Math with `cars`

```r
dist_mean <- mean(cars$dist)
print(dist_mean)
```

```
## [1] 42.98
```

```r
speed_mean <- mean(cars$speed)
print(speed_mean)
```

```
## [1] 15.4
```

---

# Drawing Still Ugly Pictures

.small[

```r
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Speeds and stopping distances of cars",
     pch = 16) # Point shape (16 = filled circle)
```

<img src="CSSS508_Lecture1_RStudio_and_RMarkdown_files/figure-html/unnamed-chunk-12-1.png" height="330px" />
]

---

# `swiss` Time

Let's switch gears to the `swiss` data frame built into R.

--

First, use `?swiss` to see what things mean.

--

Then, load it using `data(swiss)`

--

Add chunks to your R Markdown document inspecting `swiss`, defining variables, and doing some exploratory plots using `hist()` or `plot()`. You might experiment with [colors](http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf) and [shapes](http://www.cookbook-r.com/Graphs/Shapes_and_line_types/).
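
One possible way to approach this exercise is sketched below. It is only an illustration, not the expected answer: the object names `fertility_mean` and `education_median`, the axis labels, and the choice of colors and shapes are assumptions made for this example, while the column names come from the `swiss` help page.

```r
data(swiss)   # load the built-in data frame
str(swiss)    # inspect its structure

# Define a couple of variables from its columns (hypothetical names)
fertility_mean <- mean(swiss$Fertility)
education_median <- median(swiss$Education)

# Exploratory plots with a custom color and point shape
hist(swiss$Fertility,
     xlab = "Fertility measure",
     main = "Fertility in Swiss provinces",
     col = "steelblue")

plot(Fertility ~ Education, data = swiss,
     xlab = "Education (% beyond primary school)",
     ylab = "Fertility measure",
     main = "Fertility and education in Swiss provinces",
     pch = 17, col = "darkorange")
```

In an R Markdown document, each of these pieces could sit in its own chunk, with text in between describing what you see.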
---

## Looking at `swiss`

```r
*pairs(swiss, pch = 8, col = "violet",
      main = "Pairwise comparisons of Swiss variables")
```

<img src="CSSS508_Lecture1_RStudio_and_RMarkdown_files/figure-html/unnamed-chunk-13-1.png" height="340px" />

.pull-right30[
.footnote[`pairs()` is a pairwise scatterplot function. Good for a quick look at small datasets, but mostly useless for larger data.]
]

---
class: inverse

# Homework

Visit the [Homework Page](https://pearce790.github.io/CSSS508/homework.html) for details.

Advice: Start with the provided template, then modify!

---

# See you on Monday for lab!

(I'll stay for a bit after class to chat!)