**@annakrystalli** | **annakrystalli@googlemail.com**

Functions are used to:

- incorporate sets of instructions that we want to use repeatedly
- contain complex code in a neat sub-program
- reduce opportunity for errors
- make code more readable

A function usually:

- accepts parameters (arguments) <- `INPUT`
- returns value(s) <- `OUTPUT`

**R provides many tools for the creation and manipulation of functions.**

“To understand computations in R, two slogans are helpful:

- Everything that exists is an object.
- Everything that happens is a function call.”

John Chambers

Even `+` and indexing through `[ ]` are function calls!

`` `+`(3,4) ``

`## [1] 7`

`` `[`(1:10, 1) ``

`## [1] 1`

**You can do anything with functions that you can do with vectors:**

- assign them to variables
- store them in lists
- pass them as arguments to other functions
- create them inside functions
- return them as the result of a function
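
The five points above can be tried out directly. This is a minimal sketch; the names `square`, `toolbox`, `apply_twice` and `make_multiplier` are made up for illustration:

```r
# assign a function to a variable
square <- function(x) x^2

# store functions in a list
toolbox <- list(sq = square, root = sqrt)
toolbox$sq(4)

# pass a function as an argument to another function
apply_twice <- function(f, x) f(f(x))
apply_twice(square, 3)

# create a function inside a function and return it as the result
make_multiplier <- function(k) {
  function(x) k * x
}
times3 <- make_multiplier(3)
times3(5)
```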

**Basic structure**

`function(arglist){body}`

*TIP: in RStudio, typing the fun snippet inserts an R function definition:*

```
name <- function(variables) {
}
```

Just try typing `fun`

Let’s write a function that will **calculate the standard deviation** of the **values in a vector x**.

```
std.dev <- function(x){
n <- length(x)
xbar <- sum(x)/n
diff <- x - xbar
sum.sq <- sum(diff^2)
var <- sum.sq / (n-1)
sqrt(var)
}
```

- Every time a function is called, a **new environment** is created to host execution.
- Each invocation is **completely independent of previous ones**.
- Variables used within are *local*, i.e. their scope lies within, and is limited to, the function itself. They are therefore **invisible outside the function body**.

```
s.d <- std.dev(1:10)
s.d
```

`## [1] 3.02765`

`xbar`

`## Error in eval(expr, envir, enclos): object 'xbar' not found`
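
To see that each call starts from a fresh environment, here is a small sketch with a hypothetical function `count_up`. `exists()` is used to confirm that the local variable never leaks out:

```r
count_up <- function() {
  # 'counter' is created afresh in the new environment of each call
  counter <- 0
  counter <- counter + 1
  counter
}
count_up()  # 1
count_up()  # still 1: nothing is remembered from the previous call
exists("counter")  # FALSE, provided no 'counter' exists in your workspace
```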

The function name can be any valid variable name, but you should avoid using names that are used elsewhere in R, such as `dir`, `function`, `plot`, etc.

- choose descriptive names
- use verbs
- check whether they are already in use: `?function.name`

(you can access a function from a specific package using `package.name::function.name`)
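
Besides the help lookup, `exists()` gives a quick programmatic check. A sketch, where `my.new.function` is a hypothetical name assumed not to be defined anywhere:

```r
exists("plot")             # TRUE: already used by R
exists("my.new.function")  # FALSE, assuming nothing by that name is loaded

# masking an existing name does not break R, but it is confusing;
# the original stays reachable through its package
sum <- function(...) "masked!"
sum(1, 2)        # calls our version
base::sum(1, 2)  # calls the original
rm(sum)          # remove our version again
```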

Functions can have **any number of arguments**. These can be **any R object:** numbers, strings, arrays, data frames, or even other functions; anything that is needed for the function to run.

- Again, use descriptive names for arguments

`formals(std.dev)`

`## $x`

The **`...`**, or ellipsis, argument lets a function accept any number of additional arguments:

```
ellipsis_example <- function(x, ...) {
input_list <- list(...)
output_list <- lapply(X=input_list, summary)
return(output_list)
}
ellipsis_example(x = 1, a=1:10,b=11:20,c=21:30)
```

```
## $a
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 3.25 5.50 5.50 7.75 10.00
##
## $b
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11.00 13.25 15.50 15.50 17.75 20.00
##
## $c
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21.00 23.25 25.50 25.50 27.75 30.00
```
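
A common use of `...` is to forward extra arguments to another function untouched. A minimal sketch, with `my_mean` as a hypothetical wrapper around `mean()`:

```r
# anything supplied beyond x is handed on to mean()
my_mean <- function(x, ...) {
  mean(x, ...)
}
my_mean(c(1:10, NA))                # NA: the missing value propagates
my_mean(c(1:10, NA), na.rm = TRUE)  # 5.5: na.rm travels through ...
```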

The function code between the `{}` brackets is run every time the function is called. Ideally functions are short and do just one thing.

`body(std.dev)`

```
## {
## n <- length(x)
## xbar <- sum(x)/n
## diff <- x - xbar
## sum.sq <- sum(diff^2)
## var <- sum.sq/(n - 1)
## sqrt(var)
## }
```

The value of the last expression evaluated is what the function returns. A function does not have to return anything useful: a function that makes a plot might not return anything of interest, whereas a function that does a mathematical operation might return a number, or a list.

All arguments required for computation must be supplied.

Missing arguments that are not required for computation are fine:

```
f1 <- function(a, b, c){a + b}
f1(a = 10 , b = 20)
```

`## [1] 30`
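
This works because R evaluates arguments lazily, only when they are first used. As a sketch, `missing()` lets a function check whether an argument was supplied and react accordingly; `f_opt` is a hypothetical name:

```r
f_opt <- function(a, b) {
  # missing(b) is TRUE when the caller did not supply b
  if (missing(b)) {
    return(a)
  }
  a + b
}
f_opt(10)      # 10
f_opt(10, 20)  # 30
```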

Objects required by a function are sought first in its **local environment**. If an argument specified in the function definition is missing, the call returns an error, even if an object of that name exists in the global environment.

```
b <- 10
f1 <- function(a, b){a + b}
f1(a = 10)
```

`## Error in f1(a = 10): argument "b" is missing, with no default`

`f1(a = 10 , b = 20)`

`## [1] 30`

Objects required by the computation but not specified as function arguments are sought in the enclosing environments, one after another, until the search reaches the **global environment** (and, beyond it, the attached packages).

```
b <- 10
f2 <- function(a){a + b}
f2(a = 10)
```

`## [1] 20`

```
rm(b) # remove object b
f2(a = 10)
```

`## Error in f2(a = 10): object 'b' not found`
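
The search order can be made visible with nested functions. In this sketch (all names hypothetical) the name `v` exists at two levels, and the nearest enclosing definition wins:

```r
v <- "global"
outer_fun <- function() {
  v <- "outer"
  inner_fun <- function() {
    # no local v here, so R looks in the environment where
    # inner_fun was defined: the body of outer_fun
    v
  }
  inner_fun()
}
outer_fun()  # "outer"
v            # "global": the global object is untouched
```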

Default values for arguments can also be supplied when writing the functions

```
f3 <- function(a, b = 20){a + b}
f3(a = 10)
```

`## [1] 30`

`f3(a = 10, b = 10)`

`## [1] 20`

- Depending on objects from outside the function environment can be a bit dangerous.
- Consider using `...`
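
One way to avoid such a hidden dependency is to make the outside object an explicit argument, possibly with a default. A sketch with two hypothetical functions, `add_b_global` and `add_b_explicit`:

```r
b <- 10
add_b_global <- function(a) a + b            # silently depends on the global b
add_b_explicit <- function(a, b = 10) a + b  # dependency is visible and local

b <- 1000            # someone changes b elsewhere in the script
add_b_global(1)      # 1001: the result changed without touching the function
add_b_explicit(1)    # 11: unaffected
```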

A function by default returns the last ‘thing’ evaluated

```
f4 <- function(x){x + 10}
f4(10) # returns a value
```

`## [1] 20`

```
f5 <- function(x){
y <- x + 10}
f5(10) # prints nothing: the value must be assigned or printed explicitly
z <- f5(10)
z
```

`## [1] 20`
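
Nothing is printed by `f5(10)` because the value of an assignment is returned invisibly. A short sketch of two ways to see the value anyway, plus `invisible()` to get the same behaviour on purpose (`f5b` is a hypothetical name):

```r
f5 <- function(x){
  y <- x + 10
}
(f5(10))       # wrapping the call in parentheses forces printing
print(f5(10))  # so does an explicit print()

f5b <- function(x) {
  y <- x + 10
  invisible(y)  # return y, but do not auto-print it
}
f5b(10)        # prints nothing
res <- f5b(10)
res
```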

```
f6 <- function(x){
y <- x + 10
return(y)}
f6(10) # returns a value
```

`## [1] 20`

```
v <- f6(10)
v
```

`## [1] 20`

I generally advise you always use `return()` to specify the outputs of your functions.

You can also use it conditionally to return different values, or to halt evaluation and return to the calling environment:

```
f7 <- function(x) {
y <- x - 10
if(y < 0){return(NA)}else{
y <- 2 * y
return(y)}
}
f7(20)
```

`## [1] 20`

`f7(5)`

`## [1] NA`

If you want to **return multiple values/objects**, you can collect objects created within a function in a **list**. Let’s say instead of just the std.dev, we also wanted the mean (`xbar`) and `n` returned. No problem. Just collect them in a list.

```
std.dev <- function(x){
n <- length(x)
xbar <- sum(x)/n
diff <- x - xbar
sum.sq <- sum(diff^2)
var <- sum.sq / (n-1)
s.d <- sqrt(var)
return(list(s.d, xbar, n))
}
std.dev(1:10)
```

```
## [[1]]
## [1] 3.02765
##
## [[2]]
## [1] 5.5
##
## [[3]]
## [1] 10
```

- Lists can collect diverse and complex outputs.
- This is basically what the outputs of functions like `lm` are: a list.
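
Naming the list elements makes the output easier to use. A sketch of the same computation returning a named list; the function name `std.dev.named` and the element names are my own choice:

```r
std.dev.named <- function(x){
  n <- length(x)
  xbar <- sum(x)/n
  s.d <- sqrt(sum((x - xbar)^2) / (n - 1))
  # named elements can be extracted with $
  return(list(s.d = s.d, mean = xbar, n = n))
}
out <- std.dev.named(1:10)
out$mean
out$n
```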

Suppose you had a list of function arguments:

`args <- list(c(1:10, NA), na.rm = TRUE)`

You could then send that list to `mean()` by using `do.call()`:

`do.call(mean, args)`

`## [1] 5.5`
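
`do.call()` is most useful when the arguments are assembled programmatically. A sketch that binds an arbitrary number of vectors, held in a hypothetical list `rows`, into one matrix:

```r
rows <- list(a = 1:3, b = 4:6, c = 7:9)
# equivalent to rbind(a = 1:3, b = 4:6, c = 7:9)
m <- do.call(rbind, rows)
dim(m)
```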

If you have a long workflow of computations:

- break it up into logical blocks
- write a function for each
- write functions so that output from one is the first argument to the next
- use package `dplyr` and the **pipe** shorthand `%>%` to set up a **function pipeline**

`install.packages("dplyr")`

Say I want to prepare a vector of values by removing NAs, scaling and centering the values. First I create a list containing the vector of values as well as an element to track the status of the process.

```
require(dplyr)
l <- list(x = c(1:10, NA), status = NULL)
```

Then I write three functions that will receive and return the list.

```
rmNAs <- function(X){
X$x <- na.omit(X$x)
X$status <- c(X$status, "NAs_removed")
return(X)
}
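scaleVector <- function(X){
# note: scale() also centers by default (center = TRUE)
X$x <- scale(X$x, scale = TRUE)
X$status <- c(X$status, "scaled")
return(X)
}
centerVector <- function(X){
# note: scale() also rescales by default (scale = TRUE)
X$x <- scale(X$x, center = TRUE)
X$status <- c(X$status, "centered")
return(X)
}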
```

Then I set up a pipeline with the functions and pass the vector through:

`l %>% rmNAs() %>% scaleVector() %>% centerVector()`

```
## $x
## [,1]
## [1,] -1.4863011
## [2,] -1.1560120
## [3,] -0.8257228
## [4,] -0.4954337
## [5,] -0.1651446
## [6,] 0.1651446
## [7,] 0.4954337
## [8,] 0.8257228
## [9,] 1.1560120
## [10,] 1.4863011
## attr(,"scaled:center")
## [1] 0
## attr(,"scaled:scale")
## [1] 1
##
## $status
## [1] "NAs_removed" "scaled" "centered"
```

more on pipelines: https://rpubs.com/tjmahr/pipelines_2015
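
A pipeline is shorthand for nested calls read from left to right. The following base R sketch (no `dplyr` needed; the three small functions are hypothetical) shows the nested form and an equivalent pass over a list of steps with `Reduce()`:

```r
drop_na   <- function(v) v[!is.na(v)]
center    <- function(v) v - mean(v)
round_off <- function(v) round(v, 1)

x <- c(1:5, NA)

# nested calls: read inside out
nested <- round_off(center(drop_na(x)))

# the same steps applied in order, each output feeding the next input
steps <- list(drop_na, center, round_off)
piped <- Reduce(function(value, f) f(value), steps, x)

identical(nested, piped)
```

Writing each step so that it takes the data as its first argument is what makes both forms possible.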

If you just want to use a function once, you don’t have to name it:

`(function(x) x * 10)(10)`

`## [1] 100`

Anonymous functions can be particularly useful in conjunction with vectorising functions like `lapply()`. Here’s an unnamed function for calculating the mean of a vector `x`. In the following example, the input `x` to the function is each element of the list `l`.

```
l <- list(1:5, 5:7)
lapply(l, FUN = function(x){sum(x)/length(x)})
```

```
## [[1]]
## [1] 3
##
## [[2]]
## [1] 6
```

I often write functions associated with a particular project.

I will save all the functions in a separate `"project.name_functions.R"` script. I’ll then call that script to make all the functions available to my workflow.

`source("project.name_functions.R")`

Remember to document details of your functions!
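
One widely used convention is to put a structured comment block directly above the function. The sketch below borrows the tag style of the `roxygen2` package (`@param`, `@return`); as plain comments it needs no package to run. The function name `std_dev` is hypothetical:

```r
#' Standard deviation of a numeric vector
#'
#' @param x numeric vector with at least two values and no NAs
#' @return a single number, the sample standard deviation of x
std_dev <- function(x) {
  n <- length(x)
  sqrt(sum((x - sum(x) / n)^2) / (n - 1))
}
std_dev(1:10)
```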

Let’s start with another example of a function:

```
getY <- function(X.matrix, b.vec, a.scalar) {
# multiply the matrix by the vector using %*% operator
Xb.prod <- X.matrix %*% b.vec
# multiply the two resulting objects together to get a final object
y <- Xb.prod * a.scalar
# return the result
return(y)
}
```

Clearly this function requires that the length of the vector and the number of columns in the matrix match.

Let’s make some objects to pass to the function:

```
mat <- cbind(c(1, 3, 4), c(5, 4, 3))
vec <- c(4, 3)
getY(mat, vec, 3)
```

```
## [,1]
## [1,] 57
## [2,] 72
## [3,] 75
```

In this case the function works because the number of columns of matrix `mat` (2) and the length of `vec` (2) match.

We can adapt the function and use the `print()` function to print values of interest, for example the dimensions of the arguments of concern:

```
getY <- function(X.matrix, b.vec, a.scalar) {
# print diagnostics
print(dim(X.matrix))
print(length(b.vec))
# multiply the matrix by the vector using %*% operator
Xb.prod <- X.matrix %*% b.vec
# multiply the two resulting objects together to get final y
y <- Xb.prod * a.scalar
return(y)
}
getY(mat, vec, 3)
```

```
## [1] 3 2
## [1] 2
```

```
## [,1]
## [1,] 57
## [2,] 72
## [3,] 75
```

When you have an error, one thing you can do is use R’s built-in debugger `debug()` to find at what point the error occurs.

```
debug(getY)
getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
```

```
## debugging in: getY(X.matrix = mat, b.vec = c(2, 3, 6, 4, 1), a.scalar = 9)
## debug at <text>#1: {
## print(dim(X.matrix))
## print(length(b.vec))
## Xb.prod <- X.matrix %*% b.vec
## y <- Xb.prod * a.scalar
## return(y)
## }
## debug at <text>#4: print(dim(X.matrix))
## [1] 3 2
## debug at <text>#5: print(length(b.vec))
## [1] 5
## debug at <text>#8: Xb.prod <- X.matrix %*% b.vec
```

`## Error in X.matrix %*% b.vec: non-conformable arguments`
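
The debug flag stays on until it is removed, so every later call would drop into the debugger again. A sketch of switching it on and off, using a throwaway function `g` (hypothetical) and `isdebugged()` to check the state:

```r
g <- function(x) x + 1
debug(g)
isdebugged(g)  # TRUE: the next call to g would open the debugger
undebug(g)
isdebugged(g)  # FALSE: g runs normally again
g(1)
```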

To ensure functions run smoothly you can use the functions `stop()` or `stopifnot()` to halt execution should specific conditions not be met, and flag the problem with an appropriate and informative error.

`stop()` stops execution and returns an error message. It is usually used with conditional statements:

```
f1 <- function(x) {
if(!is.numeric(x)){stop("x is not numeric")}else{
return(2*x)
}
}
f1(10)
```

`## [1] 20`

`f1("a")`

`## Error in f1("a"): x is not numeric`

`stopifnot()` tests the conditional statements supplied as arguments and stops execution if any return `FALSE`:

```
f1 <- function(x) {
stopifnot(is.numeric(x))
return(2*x)
}
f1(10)
```

`## [1] 20`

`f1("a")`

`## Error: is.numeric(x) is not TRUE`
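
The same idea can protect `getY()` from the non-conformable arguments seen earlier. A sketch, under the assumption that failing early with a clear message is preferred; `getY_checked` is a hypothetical name:

```r
getY_checked <- function(X.matrix, b.vec, a.scalar) {
  # fail early, with an informative message, if dimensions do not match
  if (ncol(X.matrix) != length(b.vec)) {
    stop("length of b.vec (", length(b.vec),
         ") must equal the number of columns of X.matrix (",
         ncol(X.matrix), ")")
  }
  stopifnot(is.numeric(a.scalar), length(a.scalar) == 1)
  (X.matrix %*% b.vec) * a.scalar
}
mat <- cbind(c(1, 3, 4), c(5, 4, 3))
getY_checked(mat, c(4, 3), 3)
# getY_checked(mat, c(2, 3, 6, 4, 1), 9)  # would stop with the message above
```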

Each function should be easy to test; once tested, you can “freeze” it. Write test cases that can be checked automatically.

- Unit Testing in R: The Bare Minimum
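
A bare-bones test can be written with nothing more than `stopifnot()`: compare your function against known answers and rerun the checks whenever the code changes. A sketch using a compact copy of the standard deviation function (renamed `sd_simple` here, a hypothetical name) with base R's `sd()` as the reference:

```r
sd_simple <- function(x){
  n <- length(x)
  xbar <- sum(x)/n
  sqrt(sum((x - xbar)^2) / (n - 1))
}

test_sd_simple <- function() {
  # agrees with the built-in implementation (up to floating point error)
  stopifnot(isTRUE(all.equal(sd_simple(1:10), sd(1:10))))
  # constant vectors have zero spread
  stopifnot(sd_simple(c(2, 2, 2)) == 0)
  # shifting the data does not change the spread
  stopifnot(isTRUE(all.equal(sd_simple(1:10), sd_simple(1:10 + 100))))
  invisible(TRUE)
}
test_sd_simple()  # silent when every check passes
```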

- **Keep your functions short.** Remember you can use them to call other functions!
  - code is cleaner and easily testable
  - code is easy to update
- **Document** what the inputs to the function are, what the function does, and what the output is.
- **Check for errors** along the way.
- Try out your function with **simple examples** to make sure it’s working properly.
- **Use debugging and error messages**, as well as sanity checks, as you build your function.
- Avoid mixing computation and plotting in the same function; keep them as separate steps, e.g.

```
res <- some.computation(par1, par2, par3)
plot(res)
```

- It can be useful to look at the code of a function. Type the function name without the `( )`.
- This will work on any function apart from base functions which are written in C.

`std.dev`

```
## function(x){
## n <- length(x)
## xbar <- sum(x)/n
## diff <- x - xbar
## sum.sq <- sum(diff^2)
## var <- sum.sq / (n-1)
## s.d <- sqrt(var)
##
## return(list(s.d, xbar, n))
## }
```
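
For base functions implemented in C, printing shows only a stub instead of R code. A short sketch using `is.primitive()` to tell the two kinds apart:

```r
sum                # shows .Primitive("sum") rather than R code
is.primitive(sum)  # TRUE
is.primitive(sd)   # FALSE: sd is written in R, so its code can be printed
```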

**source: https://www.r-bloggers.com/functions-exercises/**

**Exercise 1** Create a function that will return the sum of 2 integers.

**Exercise 2** Create a function that will return TRUE if a given integer is inside a vector.

**Exercise 3** Create a function that, given a data frame, will print to the screen the name of each column and the class of data it contains (e.g. Variable1 is Numeric).

**Exercise 4** Create the function unique, which given a vector will return a new vector with the elements of the first vector with duplicated elements removed.

**Exercise 5** Create a function that given a vector and an integer will return how many times the integer appears inside the vector.

**Exercise 6** Create a function that, given a vector, will print to the screen the mean and the standard deviation; it will optionally also print the median.

**Exercise 7** Create a function that, given an integer, will calculate how many divisors it has (other than 1 and itself). Print the divisors to the screen.

**Exercise 8** Create a function that given a data frame, and a number or character will return the data frame with the character or number changed to NA.