Last time, we learned about:
for() loops
while() loops
Today, we will cover:
functions
apply()
Before you can write effective code, you need to know exactly what you want:
Goal: Do I want a single value? vector? one observation per person? per year?
Current State: What do I currently have? matrix, vector? long or wide format?
Translate: How can I take what I have and turn it into my goal?
As we become more advanced coders, this concept is key!!
Remember: When you're stuck, try searching your problem on Google!!
R (as well as mathematics in general) is full of functions!
We use functions to:
Compute summary statistics (mean(), sd(), min())
Fit models (lm(Fertility~Agriculture,data=swiss))
Read in data (read_csv())
Make plots (ggplot())
Functions come from base R (mean()) or from packages (dplyr::filter(), readr::read_csv()).
Each function requires inputs, and returns outputs.
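As a small illustration of inputs, outputs, and the package::function() notation, here is a minimal sketch that uses only functions shipped with R. The vector x is made up for the example.

```r
# A made-up input vector
x <- c(2, 4, 6)

mean(x)        # input: a numeric vector; output: a single number
base::mean(x)  # the same function, with its package written out
stats::sd(x)   # sd() lives in the stats package
```

Writing the package name before :: is optional for packages that are already loaded, but it makes clear where a function comes from.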
Functions encapsulate actions you might perform often.
More advanced applications of functions are not covered in this class.
NAME <- function(ARGUMENT1, ARGUMENT2 = DEFAULT) {
  BODY
  return(OUTPUT)
}
Name: What you call the function so you can use it later
Arguments (aka inputs, parameters): things the user passes to the function that affect how it works
e.g. ARGUMENT1, ARGUMENT2
ARGUMENT2 = DEFAULT is an example of setting a default value
ARGUMENT1, ARGUMENT2 values won't exist outside of the function
Body: The actual operations inside the function.
Return value: The output inside return(). Could be anything (or nothing!)
double_x <- function(x) {
  double_x <- x * 2
  return(double_x)
}
Let's run it!
double_x(5)
## [1] 10
double_x(NA)
## [1] NA
double_x(1:2)
## [1] 2 4
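Two points from the anatomy above, default values and the fact that values created inside a function don't exist outside it, can be sketched as follows. The function scale_x() and its times argument are hypothetical names invented for this example, not part of the lecture code.

```r
# Hypothetical helper: multiply x by 'times', which defaults to 2
scale_x <- function(x, times = 2) {
  scaled_inside <- x * times  # this variable only lives inside the function
  return(scaled_inside)
}

scale_x(5)              # uses the default, times = 2
scale_x(5, times = 3)   # overrides the default
exists("scaled_inside") # FALSE unless you created an object with this name yourself
```

Calling scale_x(5) behaves like double_x(5); the default only matters when the caller leaves the argument out.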
first_and_last <- function(x) {
  first <- x[1]
  last <- x[length(x)]
  return(c("first" = first, "last" = last))
}
Test it out:
first_and_last(c(4, 3, 1, 8))
## first  last
##     4     8
What if I give first_and_last() a vector of length 1?
first_and_last(7)
## first  last
##     7     7
Of length 0?
first_and_last(numeric(0))
## first
##    NA
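Why does the length-0 result have a "first" but no "last"? A minimal sketch of the two indexing operations inside first_and_last(), run on an empty vector:

```r
x <- numeric(0)

length(x)     # 0
x[1]          # NA: asking for a position past the end gives NA
x[length(x)]  # numeric(0): index 0 selects nothing at all

# c() drops the empty piece, so only "first" remains
c("first" = x[1], "last" = x[length(x)])
```

So the function does not fail on empty input; it silently returns something misleading, which motivates adding a check.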
Maybe we want it to be a little smarter.
Let's make sure we get an error message when the vector is too small:
smarter_first_and_last <- function(x) {
  if(length(x) < 2){
    stop("Input is not long enough!")
  } else{
    first <- x[1]
    last <- x[length(x)]
    return(c("first" = first, "last" = last))
  }
}
stop() ceases running the function and prints the text inside as an error message.
smarter_first_and_last(NA)
## Error in smarter_first_and_last(NA): Input is not long enough!
smarter_first_and_last(c(4, 3, 1, 8))
## first  last
##     4     8
If you type a function name without any parentheses or arguments, you can see its contents:
smarter_first_and_last
## function(x) {
##   if(length(x) < 2){
##     stop("Input is not long enough!")
##   } else{
##     first <- x[1]
##     last <- x[length(x)]
##     return(c("first" = first, "last" = last))
##   }
## }
## <bytecode: 0x7fb785f379b0>
apply()
Last week, we saw an example where we wanted to take the mean of each column in the swiss data:
for(col_index in 1:ncol(swiss)){
  mean_swiss_col <- mean(swiss[,col_index])
  names_swiss_col <- names(swiss)[col_index]
  print(c(names_swiss_col,round(mean_swiss_col,3)))
}
## [1] "Fertility" "70.143"
## [1] "Agriculture" "50.66"
## [1] "Examination" "16.489"
## [1] "Education" "10.979"
## [1] "Catholic" "41.144"
## [1] "Infant.Mortality" "19.943"
Isn't this kind of complex?!
apply(), don't loop!
Writing loops can be challenging and prone to bugs!
The apply() function can solve this issue.
apply() takes 3 arguments:
apply(DATA, MARGIN, FUNCTION)
DATA is a matrix or data frame, MARGIN is 1 to work over rows or 2 to work over columns, and FUNCTION is the function applied to each row or column.
For example,
apply(swiss, 2, mean)
##        Fertility      Agriculture      Examination        Education
##         70.14255         50.65957         16.48936         10.97872
##         Catholic Infant.Mortality
##         41.14383         19.94255
row_max <- apply(swiss,1,max) # maximum in each row
head(row_max,20)
##   Courtelary     Delemont Franches-Mnt      Moutier   Neuveville
##        80.20        84.84        93.40        85.80        76.90
##   Porrentruy        Broye        Glane      Gruyere       Sarine
##        90.57        92.85        97.16        97.67        91.38
##      Veveyse        Aigle      Aubonne     Avenches     Cossonay
##        98.61        64.10        67.50        68.90        69.30
##    Echallens     Grandson     Lausanne    La Vallee       Lavaux
##        72.60        71.70        55.70        54.30        73.00
apply(swiss,2,summary) # summary of each column
##         Fertility Agriculture Examination Education  Catholic
## Min.     35.00000     1.20000     3.00000   1.00000   2.15000
## 1st Qu.  64.70000    35.90000    12.00000   6.00000   5.19500
## Median   70.40000    54.10000    16.00000   8.00000  15.14000
## Mean     70.14255    50.65957    16.48936  10.97872  41.14383
## 3rd Qu.  78.45000    67.65000    22.00000  12.00000  93.12500
## Max.     92.50000    89.70000    37.00000  53.00000 100.00000
##         Infant.Mortality
## Min.            10.80000
## 1st Qu.         18.15000
## Median          20.00000
## Mean            19.94255
## 3rd Qu.         21.70000
## Max.            26.60000
*Note: Matrix output!
scores <- matrix(1:21,nrow=3)
print(scores)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]    1    4    7   10   13   16   19
## [2,]    2    5    8   11   14   17   20
## [3,]    3    6    9   12   15   18   21
my_function <- function(x){
  mean(x+10,na.rm=TRUE)
}
apply(scores,1,my_function)
## [1] 20 21 22
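To make the MARGIN argument concrete, here is a minimal sketch on a small made-up matrix m (not from the lecture data), along with the equivalent for() loop for comparison.

```r
# Made-up 2 x 3 matrix: row 1 is (1, 3, 5), row 2 is (2, 4, 6)
m <- matrix(1:6, nrow = 2)

apply(m, 1, sum)  # MARGIN = 1: one result per row    -> 9 12
apply(m, 2, sum)  # MARGIN = 2: one result per column -> 3 7 11

# The same column sums written as an explicit loop
col_sums <- numeric(ncol(m))
for (j in seq_len(ncol(m))) {
  col_sums[j] <- sum(m[, j])
}
all(col_sums == apply(m, 2, sum))  # TRUE
```

The loop needs preallocation and an index variable; the apply() call states the same idea in one line.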
In Olympic diving, a panel of 7 judges provides scores. After removing the worst and best scores, the mean of the remaining scores is given to the diver. We'll write code to calculate this score!
1. Suppose I give you a vector, x, of length 7. Write code that will sort the vector from least to greatest, then keep the 2nd-6th elements. (HINT: Use the sort() function and square brackets [ ] for subsetting.)
2. Write a function to calculate a diver's score.
3. Calculate the diver's score given x <- c(2,1:5,3).
Sort and extract elements 2 through 6:
For a vector x, use sort(x)[2:6]
Function
divers_score <- function(x){
  if(length(x) != 7){
    stop("x is not of length 7!")
  } else{
    x_nofirst_nolast <- sort(x)[2:6]
    return(mean(x_nofirst_nolast))
  }
}
Score given x <- c(2,1:5,3)
divers_score(x = c(2,1:5,3))
## [1] 2.8
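The two topics of this lecture combine naturally: once a function works on one vector, apply() can run it on every row of a matrix. The sketch below is an illustration rather than part of the original exercise; judge_scores is a hypothetical matrix with one row per diver and one column per judge, and divers_score() is restated in a slightly shorter form so the block runs on its own.

```r
# Restated from the exercise so this block is self-contained
divers_score <- function(x) {
  if (length(x) != 7) {
    stop("x is not of length 7!")
  }
  mean(sort(x)[2:6])  # drop the lowest and highest score, average the rest
}

# Hypothetical data: 3 divers (rows), each scored by 7 judges (columns)
judge_scores <- matrix(1:21, nrow = 3)

# MARGIN = 1 passes each row (one diver's 7 scores) to divers_score()
apply(judge_scores, 1, divers_score)
```

Because each row has exactly 7 entries, the length check passes for every diver and the result is one score per row.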
These are homework questions!
1. Preallocate a matrix of NAs with 3 rows and 8 columns, called double_matrix. Manually specify the first column equal to the values 1, 2, and 3. Using a nested loop, fill in the matrix, row by row, such that each value is double the value to its left.
2. Write an apply() function to take the median value of each column in the cars dataset.
3. Using ggplot, make a scatterplot of the speed and dist variables in cars. Then, add appropriate horizontal and vertical lines marking the median value of each variable.
Hint: Use the layers geom_vline(xintercept = ) and geom_hline(yintercept = ).
1. Preallocate a matrix of NAs with 3 rows and 8 columns, called double_matrix. Manually specify the first column equal to the values 1, 2, and 3. Using a nested loop, fill in the matrix, row by row, such that each value is double that to its left.
double_matrix <- matrix(NA,nrow=3,ncol=8)
double_matrix[,1] <- 1:3
for(row in 1:3){
  for(col in 2:8){
    double_matrix[row,col] <- double_matrix[row,col-1]*2
  }
}
double_matrix
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    4    8   16   32   64  128
## [2,]    2    4    8   16   32   64  128  256
## [3,]    3    6   12   24   48   96  192  384
2. Write an apply() function to take the median value of each column in the cars dataset
median_cars <- apply(cars,2,median)
median_cars
## speed  dist
##    15    36
3. Make a ggplot
library(ggplot2)
ggplot(cars,aes(speed,dist))+
  geom_point()+
  geom_vline(xintercept = median_cars[1])+
  geom_hline(yintercept = median_cars[2])
Time to work on Homework 7!