
CSSS508, Lecture 7

Functions

Michael Pearce
(based on slides from Chuck Lanfear)

May 10, 2022

Topics

Last time, we learned about:

  1. Why we use loops
  2. for() loops
  3. while() loops

Today, we will cover:

  1. Aside: Visualizing the Goal
  2. Building blocks of functions
  3. Simple functions
  4. Using functions with apply()

1. Visualizing the Goal

Visualizing the Goal

Before you can write effective code, you need to know exactly what you want:

  • Goal: Do I want a single value? vector? one observation per person? per year?

  • Current State: What do I currently have? matrix, vector? long or wide format?

  • Translate: How can I take what I have and turn it into my goal?

    • Sketch out the steps!
    • Break it down into little operations

As we become more advanced coders, this concept is key!!

Remember: When you're stuck, try searching your problem on Google!!
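
As a small illustration of the Goal / Current State / Translate habit, here is a sketch using the built-in swiss data. The goal itself and the names education, high_edu, and n_high_edu are made up for this example.

```r
# Goal: a single number, the count of provinces with Education above 20.
# Current state: a data frame (swiss) with one row per province.

# Step 1: pull out the column we care about (a numeric vector)
education <- swiss$Education

# Step 2: turn it into a logical vector (TRUE where Education > 20)
high_edu <- education > 20

# Step 3: count the TRUEs to get a single value
n_high_edu <- sum(high_edu)
n_high_edu
```

Each step is one little operation, and you can check the result of each before moving on.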

2. Building blocks of functions

Why Functions?

R (as well as mathematics in general) is full of functions!

We use functions to:

  • Compute summary statistics (mean(), sd(), min())
  • Fit models to data (lm(Fertility~Agriculture,data=swiss))
  • Load data (read_csv())
  • Create ggplots (ggplot())
  • And so much more!!

Examples of Existing Functions

  • mean():
    • Input: a vector
    • Output: a single number
  • dplyr::filter():
    • Input: a data frame, logical conditions
    • Output: a data frame keeping only the rows that satisfy those conditions
  • readr::read_csv():
    • Input: a file path, optionally variable names or types
    • Output: a data frame containing info read in from file

Each function takes inputs and returns outputs.
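
To see the input/output idea in action, here is a quick sketch using only base R and the built-in swiss data. The object names fert_mean and fit are made up for this example.

```r
# Input: a numeric vector. Output: a single number.
fert_mean <- mean(swiss$Fertility)
length(fert_mean)   # 1

# Input: a formula and a data frame. Output: a fitted model object.
fit <- lm(Fertility ~ Agriculture, data = swiss)
class(fit)          # "lm"
```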

Why Write Your Own Functions?

Functions encapsulate actions you might perform often, such as:

  • Given a vector, compute some special summary stats
  • Given a vector and definition of "invalid" values, replace with NA
  • Defining a new logical operator

Advanced function applications (not covered in this class):

  • Parallel processing
  • Generating other functions
  • Making custom packages containing your functions
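
Two of the everyday uses listed above could look something like this. This is only a sketch: replace_invalid, its invalid argument, and the %nin% operator are hypothetical names chosen for this example.

```r
# Replace any values listed in `invalid` with NA.
# (replace_invalid and its arguments are hypothetical names.)
replace_invalid <- function(x, invalid = c(-99, 999)) {
  x[x %in% invalid] <- NA
  return(x)
}

replace_invalid(c(3, -99, 7, 999))
## [1]  3 NA  7 NA

# A new logical operator: "not in", the opposite of %in%.
`%nin%` <- function(a, b) {
  !(a %in% b)
}

c(1, 5) %nin% c(1, 2, 3)
## [1] FALSE  TRUE
```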

Anatomy of a Function

NAME <- function(ARGUMENT1, ARGUMENT2 = DEFAULT) {
  BODY
  return(OUTPUT)
}
  • Name: What you call the function so you can use it later

  • Arguments (aka inputs, parameters): things the user passes to the function that affect how it works

    • e.g. ARGUMENT1, ARGUMENT2
    • ARGUMENT2=DEFAULT is example of setting a default value
    • In this example, ARGUMENT1, ARGUMENT2 values won't exist outside of the function
  • Body: The actual operations inside the function.

  • Output: The object inside return(). Could be anything (or nothing!)
    • If return() is omitted, the output is the value of the last expression evaluated
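
Here is a minimal sketch of a default argument and of output without return(). The function raise_x and its exponent argument are hypothetical names used only for illustration.

```r
# `exponent = 2` sets a default value for the second argument.
raise_x <- function(x, exponent = 2) {
  x^exponent   # no return(): the last evaluated expression is the output
}

raise_x(3)                # uses the default, exponent = 2
## [1] 9
raise_x(3, exponent = 3)  # overrides the default
## [1] 27
```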

3. Simple functions

Example 1: Doubling A Number

double_x <- function(x) {
  double_x <- x * 2
  return(double_x)
}

Let's run it!

double_x(5)
## [1] 10
double_x(NA)
## [1] NA
double_x(1:2)
## [1] 2 4

Example 2: Extract First/Last

first_and_last <- function(x) {
  first <- x[1]
  last <- x[length(x)]
  return(c("first" = first, "last" = last))
}

Test it out:

first_and_last(c(4, 3, 1, 8))
## first  last
##     4     8

Example 2: Testing first_and_last

What if I give first_and_last() a vector of length 1?

first_and_last(7)
## first  last
##     7     7

Of length 0?

first_and_last(numeric(0))
## first
##    NA

Maybe we want it to be a little smarter.

Example 3: Checking Inputs

Let's make sure we get an error message when the vector is too small:

smarter_first_and_last <- function(x) {
  if (length(x) < 2) {
    stop("Input is not long enough!")
  } else {
    first <- x[1]
    last <- x[length(x)]
    return(c("first" = first, "last" = last))
  }
}

stop() ceases running the function and prints the text inside as an error message.
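
To see that nothing after stop() runs, here is a small sketch. The function check_positive is a hypothetical example, and tryCatch() (a base R function not otherwise covered in these slides) is used only so the error message can be captured instead of halting the script.

```r
# check_positive is a hypothetical function for illustration.
check_positive <- function(x) {
  if (x <= 0) {
    stop("x must be positive!")
  }
  sqrt(x)  # only reached when no error was raised
}

check_positive(16)
## [1] 4

# tryCatch() captures the error message instead of halting.
result <- tryCatch(
  check_positive(-1),
  error = function(e) conditionMessage(e)
)
result
## [1] "x must be positive!"
```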

Example 3: Testing Smarter Function

smarter_first_and_last(NA)
## Error in smarter_first_and_last(NA): Input is not long enough!
smarter_first_and_last(c(4, 3, 1, 8))
## first  last
##     4     8

Cracking Open Functions

If you type a function name without any parentheses or arguments, you can see its contents:

smarter_first_and_last
## function(x) {
##   if (length(x) < 2) {
##     stop("Input is not long enough!")
##   } else {
##     first <- x[1]
##     last <- x[length(x)]
##     return(c("first" = first, "last" = last))
##   }
## }
## <bytecode: 0x7fb785f379b0>
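
If you only want one piece of a function, base R's formals() and body() pull out the arguments and the code separately. A quick sketch, where add_two is a hypothetical function made up for this example:

```r
# add_two is a hypothetical function for illustration.
add_two <- function(x, y = 2) {
  x + y
}

names(formals(add_two))  # the argument names
## [1] "x" "y"
formals(add_two)$y       # the default value of y
## [1] 2
body(add_two)            # the code between the braces
```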

4. Using functions with apply()

Applying Functions Multiple Times?

Last week, we saw an example where we wanted to take the mean of each column in the swiss data:

for (col_index in 1:ncol(swiss)) {
  mean_swiss_col <- mean(swiss[, col_index])
  names_swiss_col <- names(swiss)[col_index]
  print(c(names_swiss_col, round(mean_swiss_col, 3)))
}
## [1] "Fertility" "70.143"
## [1] "Agriculture" "50.66"
## [1] "Examination" "16.489"
## [1] "Education" "10.979"
## [1] "Catholic" "41.144"
## [1] "Infant.Mortality" "19.943"

Isn't this kind of complex?!

apply(), don't loop!

Writing loops can be challenging and prone to bugs!!

The apply() function can solve this issue:

  • Applies a function to the values in each row or column of a matrix
  • Doesn't require preallocation
  • Can take built-in functions or user-created functions.
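
To make the comparison concrete, here is a sketch that computes the column means of swiss both ways. The object names col_means_loop and col_means_apply are made up for this example.

```r
# Loop version: we must preallocate a vector and fill it in ourselves.
col_means_loop <- rep(NA, ncol(swiss))
for (col_index in 1:ncol(swiss)) {
  col_means_loop[col_index] <- mean(swiss[, col_index])
}
names(col_means_loop) <- names(swiss)

# apply() version: one line, and the column names come along for free.
col_means_apply <- apply(swiss, 2, mean)

# Both approaches give the same answer.
all.equal(col_means_loop, col_means_apply)
## [1] TRUE
```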

Structure of apply()

apply() takes 3 arguments:

  1. Data (a matrix or data frame)
  2. Margin (1 applies function to each row, 2 applies to each column)
  3. Function
apply(DATA, MARGIN, FUNCTION)

For example,

apply(swiss, 2, mean)
##        Fertility      Agriculture      Examination        Education
##         70.14255         50.65957         16.48936         10.97872
##         Catholic Infant.Mortality
##         41.14383         19.94255
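
The margin argument is easiest to see on a tiny matrix. In this sketch, m is a made-up 2-by-3 matrix:

```r
# m is a small made-up matrix: 2 rows, 3 columns.
m <- matrix(1:6, nrow = 2)
m
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

apply(m, 1, sum)  # MARGIN = 1: one result per row
## [1]  9 12
apply(m, 2, sum)  # MARGIN = 2: one result per column
## [1]  3  7 11
```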

Example 1

row_max <- apply(swiss, 1, max)  # maximum in each row
head(row_max, 20)
##   Courtelary     Delemont Franches-Mnt      Moutier   Neuveville
##        80.20        84.84        93.40        85.80        76.90
##   Porrentruy        Broye        Glane      Gruyere       Sarine
##        90.57        92.85        97.16        97.67        91.38
##      Veveyse        Aigle      Aubonne     Avenches     Cossonay
##        98.61        64.10        67.50        68.90        69.30
##    Echallens     Grandson     Lausanne    La Vallee       Lavaux
##        72.60        71.70        55.70        54.30        73.00

Example 2

apply(swiss, 2, summary)  # summary of each column
##         Fertility Agriculture Examination Education  Catholic
## Min.     35.00000     1.20000     3.00000   1.00000   2.15000
## 1st Qu.  64.70000    35.90000    12.00000   6.00000   5.19500
## Median   70.40000    54.10000    16.00000   8.00000  15.14000
## Mean     70.14255    50.65957    16.48936  10.97872  41.14383
## 3rd Qu.  78.45000    67.65000    22.00000  12.00000  93.12500
## Max.     92.50000    89.70000    37.00000  53.00000 100.00000
##         Infant.Mortality
## Min.            10.80000
## 1st Qu.         18.15000
## Median          20.00000
## Mean            19.94255
## 3rd Qu.         21.70000
## Max.            26.60000

Note: Matrix output!

Example 3: User-Created Function

scores <- matrix(1:21, nrow = 3)
print(scores)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    4    7   10   13   16   19
## [2,]    2    5    8   11   14   17   20
## [3,]    3    6    9   12   15   18   21
my_function <- function(x) { mean(x + 10, na.rm = TRUE) }
apply(scores, 1, my_function)
## [1] 20 21 22
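
Two handy variations, sketched on made-up data (scores_na is a hypothetical matrix with one missing value): extra arguments placed after the function are passed along to it, and short functions can be written directly inside apply().

```r
# Made-up data with a missing value: 2 rows, 3 columns.
scores_na <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)

# Extra arguments after the function are passed along to it.
apply(scores_na, 1, mean, na.rm = TRUE)
## [1] 3 5

# An anonymous function, written directly inside apply():
# count the missing values in each row.
apply(scores_na, 1, function(x) sum(is.na(x)))
## [1] 0 1
```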

Activity: Writing A Function

In Olympic diving, a panel of 7 judges provides scores. After removing the worst and best scores, the mean of the remaining scores is given to the diver. We'll write code to calculate this score!

  1. Suppose I give you a vector, x, of length 7. Write code that will sort the vector from least to greatest, then keep the 2nd through 6th elements. (HINT: Use the sort() function and square brackets [ ] for subsetting.)

  2. Write a function to calculate a diver's score:

    • Input: Vector of length 7
    • Checks: Check that the vector has length 7 (if not, stop!)
    • Output: Mean score after removing the lowest and highest scores.
  3. Calculate the diver's score given x <- c(2,1:5,3)

Activity: My Solution

  1. Sort and extract elements 2 through 6:

    • Answer: Given vector x, use sort(x)[2:6]
  2. Function

divers_score <- function(x) {
  if (length(x) != 7) {
    stop("x is not of length 7!")
  } else {
    x_nofirst_nolast <- sort(x)[2:6]
    return(mean(x_nofirst_nolast))
  }
}
  3. Calculate the diver's score given x <- c(2,1:5,3)
divers_score(x = c(2, 1:5, 3))
## [1] 2.8
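
Putting today's two topics together: with several divers, apply() can run the function once per row. This is only a sketch. The dives matrix and its scores are made up, and the function is redefined in a slightly shorter form so the block runs on its own.

```r
# Redefine the function from the solution so this block is self-contained.
divers_score <- function(x) {
  if (length(x) != 7) {
    stop("x is not of length 7!")
  }
  mean(sort(x)[2:6])
}

# Made-up scores: one row per diver, one column per judge.
dives <- rbind(
  diver_a = c(2, 1, 2, 3, 4, 5, 3),
  diver_b = c(7, 7, 8, 6, 7, 9, 5)
)

# MARGIN = 1: apply divers_score to each row (each diver).
apply(dives, 1, divers_score)
## diver_a diver_b
##     2.8     7.0
```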

Activity

These are homework questions!!

  1. Preallocate a matrix of NAs with 3 rows and 8 columns, called double_matrix. Manually specify the first column equal to the values 1, 2, and 3. Using a nested loop, fill in the matrix, row by row, such that each value is double that to its left.

  2. Use apply() to take the median value of each column in the cars dataset.

  3. Using ggplot, make a scatterplot of the speed and dist variables in cars. Then, add appropriate vertical and horizontal lines marking the median value of each variable.

Hint: Use the layers geom_vline(xintercept = ) and geom_hline(yintercept = )

My Answers

  1. Preallocate a matrix of NAs with 3 rows and 8 columns, called double_matrix. Manually specify the first column equal to the values 1, 2, and 3. Using a nested loop, fill in the matrix, row by row, such that each value is double that to its left.
double_matrix <- matrix(NA, nrow = 3, ncol = 8)
double_matrix[, 1] <- 1:3
for (row in 1:3) {
  for (col in 2:8) {
    double_matrix[row, col] <- double_matrix[row, col - 1] * 2
  }
}
double_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    4    8   16   32   64  128
## [2,]    2    4    8   16   32   64  128  256
## [3,]    3    6   12   24   48   96  192  384

My Answers

2. Write an apply() function to take the median value of each column in the cars dataset

median_cars <- apply(cars,2,median)
median_cars
## speed  dist
##    15    36

My Answers

3. Make a ggplot

library(ggplot2)
ggplot(cars, aes(speed, dist)) +
  geom_point() +
  geom_vline(xintercept = median_cars[1]) +
  geom_hline(yintercept = median_cars[2])

Homework

Time to work on Homework 7!
