@annakrystalli | annakrystalli@googlemail.com
Functions used to:
usually:
INPUT
OUTPUT
R provides many tools for the creation and manipulation of functions.
“To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.”
John Chambers
Even + and indexing through [ ] are function calls!
`+`(3,4)
## [1] 7
`[`(1:10, 1)
## [1] 1
You can do anything with functions that you can do with vectors:
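As a rough illustration of that claim, the sketch below treats functions like any other object: assigning them, storing them in a list, passing them as arguments and returning them from other functions. The names my_mean, summaries, make_adder and add_ten are made up for this example.

```r
# Assign a function to a new name, just as you would a vector
my_mean <- mean
my_mean(1:10)                      # 5.5

# Store functions in a list and call them by element name
summaries <- list(centre = mean, spread = sd)
summaries$centre(1:10)             # 5.5

# Pass a function as an argument to another function
sapply(list(1:5, 5:7), mean)       # 3 6

# Return a function from a function
make_adder <- function(n) {
  function(x) x + n
}
add_ten <- make_adder(10)
add_ten(5)                         # 15
```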
Basic structure
function(arglist){body}
TIP: in RStudio, typing the fun snippet inserts an R function definition:
name <- function(variables) {
}
Just try typing fun.
Let’s write a function that will calculate the standard deviation of the values in a vector x.
std.dev <- function(x){
  n <- length(x)
  xbar <- sum(x)/n
  diff <- x - xbar
  sum.sq <- sum(diff^2)
  var <- sum.sq / (n-1)
  sqrt(var)
}
s.d <- std.dev(1:10)
s.d
## [1] 3.02765
xbar
## Error in eval(expr, envir, enclos): object 'xbar' not found
The function name can be any valid variable name, but you should avoid using names that are used elsewhere in R, such as dir, function, plot, etc.
?function.name (you can access a function from a specific package using package.name::function.name)
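For instance, a minimal sketch using sd() from the stats package (chosen here only because it ships with R):

```r
# Open the help page for sd() (two equivalent forms)
?sd
help("sd")

# Call sd() explicitly from the stats package
stats::sd(1:10)
```

The package::function form is handy when two attached packages export a function with the same name.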
Functions can have any number of arguments. These can be any R object: numbers, strings, arrays, data frames, or even other functions; anything that is needed for the function.name function to run.
formals(std.dev)
## $x
The ..., or ellipsis, element in the definition of a function allows for other arguments to be passed into the function, and passed on to another function.
ellipsis_example <- function(x, ...) {
  input_list <- list(...)
  output_list <- lapply(X = input_list, summary)
  return(output_list)
}
ellipsis_example(x = 1, a = 1:10, b = 11:20, c = 21:30)
## $a
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.25 5.50 5.50 7.75 10.00
##
## $b
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 13.25 15.50 15.50 17.75 20.00
##
## $c
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 23.25 25.50 25.50 27.75 30.00
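The example above collects the extra arguments in a list. Passing them on to another function can be sketched as follows; mean_of_squares is a hypothetical wrapper written for this illustration, and whatever arrives through ... is handed unchanged to mean().

```r
# Hypothetical wrapper: anything supplied through ... is forwarded to mean()
mean_of_squares <- function(x, ...) {
  mean(x^2, ...)
}

mean_of_squares(c(1, 2, NA))                # NA, because mean() keeps NAs by default
mean_of_squares(c(1, 2, NA), na.rm = TRUE)  # 2.5, na.rm is passed on to mean()
```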
The function code between the {} brackets is run every time the function is called. Ideally functions are short and do just one thing.
body(std.dev)
## {
## n <- length(x)
## xbar <- sum(x)/n
## diff <- x - xbar
## sum.sq <- sum(diff^2)
## var <- sum.sq/(n - 1)
## sqrt(var)
## }
The value of the last expression evaluated is the value returned by the function. A function does not need to return anything useful: a function that makes a plot might be called only for its side effect, whereas a function that does a mathematical operation might return a number or a list.
All arguments required for computation must be supplied. Missing arguments that are not used in the computation are fine, because R only evaluates an argument when it is needed:
f1 <- function(a, b, c){a + b}
f1(a = 10 , b = 20)
## [1] 30
Objects required by a function are sought first in its local environment. If an argument specified in the function definition is missing and has no default, calling the function returns an error, even if an object of that name exists in the global environment.
b <- 10
f1 <- function(a, b){a + b}
f1(a = 10)
## Error in f1(a = 10): argument "b" is missing, with no default
f1(a = 10 , b = 20)
## [1] 30
Objects required by the computation but not specified as function arguments are sought in the enclosing environments, one after another, up to the global environment (and then the attached packages).
b <- 10
f2 <- function(a){a + b}
f2(a = 10)
## [1] 20
rm(b) # remove object b
f2(a = 10)
## Error in f2(a = 10): object 'b' not found
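The "one environment after another" lookup is easiest to see with a function defined inside another function. In this sketch (outer_fun and inner_fun are illustrative names), b is found in the enclosing function before the global environment is ever consulted:

```r
outer_fun <- function() {
  b <- 100                      # lives in outer_fun's environment
  inner_fun <- function(a) {
    a + b                       # b is not an argument, so R looks outwards
  }
  inner_fun(a = 10)
}

b <- 10        # a global b also exists, but outer_fun's b is found first
outer_fun()    # 110
rm(b)          # tidy up the global b again
```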
Default values for arguments can also be supplied when writing the function:
f3 <- function(a, b = 20){a + b}
f3(a = 10)
## [1] 30
f3(a = 10, b = 10)
## [1] 20
...
A function by default returns the last ‘thing’ evaluated
f4 <- function(x){x + 10}
f4(10) # returns a value
## [1] 20
f5 <- function(x){
  y <- x + 10
}
f5(10) # the value is returned invisibly, so nothing is printed; assign it to use it
z <- f5(10)
z
## [1] 20
f6 <- function(x){
y <- x + 10
return(y)}
f6(10) # returns a value
## [1] 20
v <- f6(10)
v
## [1] 20
I generally advise you to always use return() to specify the outputs of your functions.
You can also use it conditionally to return different values, or to halt evaluation and return to the calling environment:
f7 <- function(x) {
  y <- x - 10
  if (y < 0) {
    return(NA)
  } else {
    y <- 2 * y
    return(y)
  }
}
f7(20)
## [1] 20
f7(5)
## [1] NA
If you want to return multiple values/objects, you can collect objects created within a function in a list. Let’s say instead of just the std.dev, we also wanted the mean (xbar) and n returned. No problem. Just collect them in a list.
std.dev <- function(x){
  n <- length(x)
  xbar <- sum(x)/n
  diff <- x - xbar
  sum.sq <- sum(diff^2)
  var <- sum.sq / (n-1)
  s.d <- sqrt(var)
  return(list(s.d, xbar, n))
}
std.dev(1:10)
## [[1]]
## [1] 3.02765
##
## [[2]]
## [1] 5.5
##
## [[3]]
## [1] 10
The outputs of many R functions, such as lm, are lists.
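Naming the list elements makes the result easier to work with. The variant below is only a sketch; std.dev.named and the element names s.d, mean and n are choices made for this example.

```r
std.dev.named <- function(x) {
  n <- length(x)
  xbar <- sum(x) / n
  s.d <- sqrt(sum((x - xbar)^2) / (n - 1))
  # Naming the elements lets callers extract them with $
  return(list(s.d = s.d, mean = xbar, n = n))
}

res <- std.dev.named(1:10)
res$mean   # 5.5
res$n      # 10
```

Elements can then be pulled out by name rather than by position, which keeps calling code readable if the function later returns more values.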
Suppose you had a list of function arguments:
args <- list(c(1:10, NA), na.rm = TRUE)
You could then send that list to mean() by using do.call():
do.call(mean, args)
## [1] 5.5
If you have a long workflow of computations, you can use dplyr and the pipe shorthand %>% to set up function pipelines:
install.packages("dplyr")
Say I want to prepare a vector of values by removing NAs, scaling and centering the values. First I create a list containing the vector of values as well as an element to track the status of the process.
require(dplyr)
l <- list(x = c(1:10, NA), status = NULL)
Then I write three functions that will receive and return the list.
rmNAs <- function(X){
  X$x <- na.omit(X$x)
  X$status <- c(X$status, "NAs_removed")
  return(X)
}
# note: scale() both centers and scales by default, so each of the two
# calls below standardises the vector fully
scaleVector <- function(X){
  X$x <- scale(X$x, scale = TRUE)
  X$status <- c(X$status, "scaled")
  return(X)
}
centerVector <- function(X){
  X$x <- scale(X$x, center = TRUE)
  X$status <- c(X$status, "centered")
  return(X)
}
Then I set up a pipeline with the functions and pass the list through:
l %>% rmNAs() %>% scaleVector() %>% centerVector()
## $x
## [,1]
## [1,] -1.4863011
## [2,] -1.1560120
## [3,] -0.8257228
## [4,] -0.4954337
## [5,] -0.1651446
## [6,] 0.1651446
## [7,] 0.4954337
## [8,] 0.8257228
## [9,] 1.1560120
## [10,] 1.4863011
## attr(,"scaled:center")
## [1] 0
## attr(,"scaled:scale")
## [1] 1
##
## $status
## [1] "NAs_removed" "scaled" "centered"
more on pipelines: https://rpubs.com/tjmahr/pipelines_2015
If you just want to use a function once, you don’t have to name it:
(function(x) x * 10)(10)
## [1] 100
Anonymous functions can be particularly useful in conjunction with vectorising functions like lapply(). Here’s an unnamed function for calculating the mean of a vector x. In the following example, the input x to the function is each element of the list l.
l <- list(1:5, 5:7)
lapply(l, FUN = function(x){sum(x)/length(x)})
## [[1]]
## [1] 3
##
## [[2]]
## [1] 6
I often write functions associated with a particular project. I will save all the functions in a separate "project.name_functions.R" script. I’ll then call that script to make all the functions available to my workflow.
source("project.name_functions.R")
Remember to document details of your functions!
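One common convention, sketched here as an assumption rather than a requirement of this workshop, is to put a structured comment block directly above each function (the #' style is the one used by the roxygen2 package, but the lines are ordinary comments to R). The function std.err is a made-up example.

```r
#' Standard error of the mean
#'
#' @param x A numeric vector.
#' @param na.rm Logical; should missing values be removed before computing?
#' @return A single numeric value: sd(x) / sqrt(length(x)).
std.err <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

std.err(c(1:10, NA), na.rm = TRUE)
```

Stating what each argument should be and what is returned is usually enough for a project-level functions script.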
Let’s start with another example of a function:
getY <- function(X.matrix, b.vec, a.scalar) {
  # multiply the matrix by the vector using %*% operator
  Xb.prod <- X.matrix %*% b.vec
  # multiply the two resulting objects together to get a final object
  y <- Xb.prod * a.scalar
  # return the result
  return(y)
}
Clearly this function requires that the length of the vector and the number of columns in the matrix match.
Let’s make some objects to pass to the function:
mat <- cbind(c(1, 3, 4), c(5, 4, 3))
vec <- c(4, 3)
getY(mat, vec, 3)
## [,1]
## [1,] 57
## [2,] 72
## [3,] 75
In this case the function works because the number of columns of matrix mat (2) and the length of vec (2) match.
We can adapt the function and use the print() function to print off values of interest, for example the dimensions of the arguments of concern:
getY <- function(X.matrix, b.vec, a.scalar) {
  # print diagnostics
  print(dim(X.matrix))
  print(length(b.vec))
  # multiply the matrix by the vector using %*% operator
  Xb.prod <- X.matrix %*% b.vec
  # multiply the two resulting objects together to get final y
  y <- Xb.prod * a.scalar
  return(y)
}
getY(mat, vec, 3)
## [1] 3 2
## [1] 2
## [,1]
## [1,] 57
## [2,] 72
## [3,] 75
When you have an error, one thing you can do is use R’s built-in debugger debug() to find at what point the error occurs.
debug(getY)
getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
## debugging in: getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
## debug at <text>#1: {
## print(dim(X.matrix))
## print(length(b.vec))
## Xb.prod <- X.matrix %*% b.vec
## y <- Xb.prod * a.scalar
## return(y)
## }
## debug at <text>#4: print(dim(X.matrix))
## [1] 3 2
## debug at <text>#5: print(length(b.vec))
## [1] 5
## debug at <text>#8: Xb.prod <- X.matrix %*% b.vec
## Error in X.matrix %*% b.vec: non-conformable arguments
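A function flagged with debug() stays in debugging mode until the flag is removed. The sketch below uses a throwaway function, f.demo (a hypothetical name), to show the flag being set and cleared without actually entering the debugger:

```r
f.demo <- function(x) x + 1   # throwaway function for illustration

debug(f.demo)
isdebugged(f.demo)    # TRUE: every call would now open the debugger

undebug(f.demo)
isdebugged(f.demo)    # FALSE: calls run normally again

# debugonce(f.demo) would instead debug only the next call
```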
To ensure functions run smoothly you can use the functions stop() or stopifnot() to halt the execution of a function should specific conditions not be met, and flag this with an appropriate and informative error.
stop() stops execution and returns an error message. It is usually used with conditional statements.
f1 <- function(x) {
  if (!is.numeric(x)) {
    stop("x is not numeric")
  } else {
    return(2*x)
  }
}
f1(10)
## [1] 20
f1("a")
## Error in f1("a"): x is not numeric
stopifnot() tests the conditional statements supplied as arguments and stops execution if any return FALSE.
f1 <- function(x) {
  stopifnot(is.numeric(x))
  return(2*x)
}
f1(10)
## [1] 20
f1("a")
## Error: is.numeric(x) is not TRUE
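The same idea could be applied to the getY example from the debugging section, so that a length mismatch is reported in plain words rather than as "non-conformable arguments". This is only a sketch; getY.checked and its error message are invented for illustration.

```r
getY.checked <- function(X.matrix, b.vec, a.scalar) {
  # basic type checks
  stopifnot(is.matrix(X.matrix), is.numeric(b.vec))
  # dimension check with an informative message
  if (ncol(X.matrix) != length(b.vec)) {
    stop("length of b.vec (", length(b.vec),
         ") must equal the number of columns of X.matrix (",
         ncol(X.matrix), ")")
  }
  return((X.matrix %*% b.vec) * a.scalar)
}

mat <- cbind(c(1, 3, 4), c(5, 4, 3))
getY.checked(mat, c(4, 3), 3)                 # 57, 72, 75 as before
try(getY.checked(mat, c(2, 3, 6, 4, 1), 9))   # stops with the custom message
```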
Each function should be easy to test, then you can “freeze” it. Write test cases, which can be automatically checked. - Unit Testing in R: The Bare Minimum
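A bare-minimum set of automatic checks can be written with base R alone. The sketch below tests a simple standard deviation function against R's own sd(); std.dev.simple is a stand-in name used here, and dedicated packages such as testthat offer richer tooling for the same job.

```r
# Function under test: returns a single number
std.dev.simple <- function(x) {
  n <- length(x)
  xbar <- sum(x) / n
  sqrt(sum((x - xbar)^2) / (n - 1))
}

# Test cases: the script stops with an error if any check fails
stopifnot(
  isTRUE(all.equal(std.dev.simple(1:10), sd(1:10))),   # agrees with sd()
  isTRUE(all.equal(std.dev.simple(c(2, 2, 2)), 0)),    # constant vector
  is.na(std.dev.simple(c(1, NA)))                      # NAs propagate
)
```

Re-running such a script after every change tells you immediately whether a "frozen" function still behaves as expected.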
res <- some.computation(par1, par2, par3)
plot(res)
( )
std.dev
## function(x){
## n <- length(x)
## xbar <- sum(x)/n
## diff <- x - xbar
## sum.sq <- sum(diff^2)
## var <- sum.sq / (n-1)
## s.d <- sqrt(var)
##
## return(list(s.d, xbar, n))
## }
source: https://www.r-bloggers.com/functions-exercises/
Exercise 1 Create a function that will return the sum of 2 integers.
Exercise 2 Create a function that will return TRUE if a given integer is inside a vector.
Exercise 3 Create a function that, given a data frame, will print to the screen the name of each column and the class of data it contains (e.g. Variable1 is Numeric).
Exercise 4 Create the function unique, which given a vector will return a new vector with the elements of the first vector with duplicated elements removed.
Exercise 5 Create a function that given a vector and an integer will return how many times the integer appears inside the vector.
Exercise 6 Create a function that, given a vector, will print to the screen the mean and the standard deviation; it will optionally also print the median.
Exercise 7 Create a function that, given an integer, will calculate how many divisors it has (other than 1 and itself). Print the divisors to the screen.
Exercise 8 Create a function that given a data frame, and a number or character will return the data frame with the character or number changed to NA.