# Data types, structures and classes

## Base types

Every object has a base type and only R-core can create new types.

Overall, there are 25 different base object types.

## Base data types

There are 5 base data types: `double`, `integer`, `complex`, `logical` and `character`, as well as `NULL`.

No matter how complicated your analyses become, all data in R is interpreted as one of these basic data types.

You can inspect the type of a value or object with the function `typeof()`.

`typeof(3.14)`

`## [1] "double"`

`typeof(1L) # The L suffix forces the number to be an integer; by default R stores numbers as doubles`

`## [1] "integer"`

`typeof(TRUE)`

`## [1] "logical"`

`typeof('banana')`

`## [1] "character"`

`typeof(NULL)`

`## [1] "NULL"`
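The examples above do not show `complex`, and the difference between `double` and `integer` is easy to miss. The following is a small sketch of a few more calls to `typeof()`; the variable names are only illustrative.

```r
# A complex number has a real and an imaginary part
type_complex <- typeof(2 + 3i)      # "complex"

# A bare number is stored as a double, not an integer
type_plain <- typeof(1)             # "double"

# The colon operator produces integers
type_colon <- typeof(1:3)           # "integer"

# Dividing two integers returns a double
type_division <- typeof(1L / 2L)    # "double"

c(type_complex, type_plain, type_colon, type_division)
```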

## Data Structures

### Arrays and type coercion

The distinguishing feature of arrays is that all values are of the same data type.

Arrays can hold values of any base data type and span any number of dimensions, but within a single array every value must be of the same base data type. This allows for efficient calculation and matrix mathematics. This strictness also has important consequences, which introduce another key concept in R: **type coercion**.
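As a minimal sketch of both points, the example below builds a three-dimensional array with `array()` and then assigns a single character value into it. The object name `arr` is just an illustration.

```r
# A 2 x 3 x 4 array filled with the integers 1 to 24
arr <- array(1:24, dim = c(2, 3, 4))

arr_dims <- dim(arr)           # 2 3 4
type_before <- typeof(arr)     # "integer"
last_value <- arr[2, 3, 4]     # 24, the final element

# Assigning one character value forces every element to become character
arr[1, 1, 1] <- "one"
type_after <- typeof(arr)      # "character"
```

The dimensions are unchanged by the assignment; only the data type of the values changes.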

### Vectors and Type Coercion

#### Vectors

Vectors are one dimensional arrays.

To better understand the importance of data types and coercion, let’s meet a special case of an array, the **vector**.

To create a new vector use function `vector()`. You can specify the length of the vector with argument `length` and the base data type through argument `mode`.

```
my_vector <- vector(length = 3)
my_vector
```

`## [1] FALSE FALSE FALSE`

A vector in R is essentially an ordered list of things, with the special
condition that *everything in the vector must be the same basic data type*.

If you don’t choose the data type, it’ll default to `logical`.

`typeof(my_vector)`

`## [1] "logical"`

Otherwise, you can declare an empty vector of whatever type you like using argument `mode`.

```
another_vector <- vector(mode='character', length=3)
another_vector
```

`## [1] "" "" ""`

You can also create a vector of a series of numbers:

`1:10`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`seq(10)`

`## [1] 1 2 3 4 5 6 7 8 9 10`

`seq(1,10, by=0.1)`

```
## [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4
## [16] 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9
## [31] 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1 5.2 5.3 5.4
## [46] 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9
## [61] 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8.0 8.1 8.2 8.3 8.4
## [76] 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9
## [91] 10.0
```

You can also create vectors by combining individual elements using function `c` (for combine).

```
combine_vector <- c(2,6,3)
combine_vector
```

`## [1] 2 6 3`

### Type coercion

Q: Given what we’ve learned so far, what do you think the following will produce?

`c(2,6,'3')`

`## [1] "2" "6" "3"`

This is something called *type coercion*, and it is the source of many surprises
and the reason why we need to be aware of the basic data types and how R will
interpret them.

When R encounters a mix of types (here numeric and character) to be combined into a single vector, it will force them all to be the same type.

Not every type can be coerced into every other; rather, R follows a coercion hierarchy. All values are converted to the most flexible type present, that is, the one furthest along the hierarchy.

##### R coercion rules:

`logical` -> `integer` -> `numeric` -> `complex` -> `character`

*where -> can be read as “are transformed into”.*

In our case, the numeric values `2` and `6` were converted to character.

Some other examples:

`c('a', TRUE)`

`## [1] "a" "TRUE"`

`c("FALSE", TRUE)`

`## [1] "FALSE" "TRUE"`

`c(0, TRUE)`

`## [1] 0 1`
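To see the whole hierarchy at work, the sketch below combines one pair of neighbouring types at a time and records the type of each result. The `step` names are only illustrative.

```r
# Each call to c() mixes two neighbouring types in the hierarchy
step1 <- c(TRUE, 2L)     # logical + integer
step2 <- c(2L, 2.5)      # integer + double
step3 <- c(2.5, 1i)      # double  + complex
step4 <- c(1i, "a")      # complex + character

# The result always takes the type furthest along the hierarchy
types <- c(typeof(step1), typeof(step2), typeof(step3), typeof(step4))
types
```

Note that `typeof()` reports `numeric` values created this way as `"double"`.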

You can try to force coercion against this flow using the `as.` functions:

```
chars <- c('0','2','4')
as.numeric(chars)
```

`## [1] 0 2 4`

`as.logical(chars)`

`## [1] NA NA NA`

`as.logical(as.numeric(chars))`

`## [1] FALSE TRUE TRUE`

`as.logical(c(0, TRUE))`

`## [1] FALSE TRUE`

`as.logical(c("FALSE", TRUE))`

`## [1] FALSE TRUE`

`as.numeric(c("FALSE", TRUE))`

`## Warning: NAs introduced by coercion`

`## [1] NA NA`

`as.numeric(as.logical(c("FALSE", TRUE)))`

`## [1] 0 1`

As you can see, some surprising things can happen when R forces one basic data type into another!

If your data isn’t the data type you expected, type coercion may well be to blame; make sure everything is the same type in your vectors and your columns of data.frames, or you will get nasty surprises!
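One way to catch such surprises early is to convert explicitly and then look for values that could not be converted. The following is a sketch under the assumption that your numbers arrived as text; `raw_values` is made-up input.

```r
# Hypothetical input: numbers read in as text, with one bad entry
raw_values <- c("1.5", "2", "three", "4")

# Convert explicitly; entries that cannot be parsed become NA
converted <- suppressWarnings(as.numeric(raw_values))

# Flag entries that were present in the input but failed to convert
failed <- is.na(converted) & !is.na(raw_values)
bad_entries <- raw_values[failed]          # "three"

# The converted vector is now numeric and safe to summarise
is_num <- is.numeric(converted)            # TRUE
clean_mean <- mean(converted, na.rm = TRUE)
```

Here the warning is suppressed only because the failures are inspected immediately afterwards through `failed`.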

We can ask a few questions about vectors:

```
sequence_example <- seq(10)
head(sequence_example, n=2)
```

`## [1] 1 2`

`tail(sequence_example, n=4)`

`## [1] 7 8 9 10`

`length(sequence_example)`

`## [1] 10`

`str(sequence_example)`

`## int [1:10] 1 2 3 4 5 6 7 8 9 10`

The somewhat cryptic output from this command indicates the basic data type
found in this vector - in this case `int`, integer; an indication of the
number of things in the vector - actually, the indexes of the vector, in this
case `[1:10]`; and a few examples of what’s actually in the vector - in this case
ascending integers.

Finally, you can give names to elements in your vector:

```
my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")
my_example
```

```
## a b c d
## 5 6 7 8
```

`names(my_example)`

`## [1] "a" "b" "c" "d"`
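Names are useful mainly because elements can then be selected by name rather than by position. A short sketch, using an illustrative vector called `scores`:

```r
# Names can also be supplied directly when the vector is created
scores <- c(a = 5L, b = 6L, c = 7L, d = 8L)

# Single brackets keep the name attached to the value
one_named <- scores["b"]

# Double brackets return the bare value
one_value <- scores[["b"]]            # 6

# Several elements can be selected by name at once
two_named <- scores[c("a", "d")]

# Individual names can be changed in place
names(scores)[2] <- "beta"
new_names <- names(scores)
```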

### Matrices

Matrices are two-dimensional arrays.

The lengths of each dimension are defined by the number of rows and columns.

We can declare a matrix full of zeros:

```
matrix_example <- matrix(0, ncol=6, nrow=3)
matrix_example
```

```
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0
```

We can use `dim()` to get the length of each dimension of a matrix (or of any array); the number of values it returns is the number of dimensions.

`dim(matrix_example)`

`## [1] 3 6`
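A matrix of zeros hides how values are laid out, so here is a small sketch with distinct values. It shows that `matrix()` fills column by column unless `byrow = TRUE` is given; the object names are illustrative.

```r
# Fill a 2 x 3 matrix with 1 to 6 (column by column, the default)
m <- matrix(1:6, nrow = 2, ncol = 3)

n_rows <- nrow(m)          # 2
n_cols <- ncol(m)          # 3
n_values <- length(m)      # 6, the total number of elements

# Index with [row, column]
by_col_value <- m[1, 2]    # 3, because column 1 holds 1 and 2

# Fill row by row instead
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row_value <- m_by_row[1, 2]   # 2
```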

## Lists

Lists can store objects of any data type and class.

Another key data structure is the `list`. Lists are the most flexible data structure because each element can hold any object, of any data type and dimension, including other lists.

Create lists using `list()` or coerce other objects using `as.list()`.

`list(1, "a", TRUE)`

```
## [[1]]
## [1] 1
##
## [[2]]
## [1] "a"
##
## [[3]]
## [1] TRUE
```

`as.list(1:4)`

```
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4
```

We can name list elements:

```
a_list <- list(title = "Numbers", numbers = 1:10, data = TRUE )
a_list
```

```
## $title
## [1] "Numbers"
##
## $numbers
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $data
## [1] TRUE
```

Lists are a base type:

`typeof(a_list)`

`## [1] "list"`
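Once a list has names, its elements can be pulled out with `$` or `[[ ]]`. The sketch below uses an illustrative list called `record`, which also contains a nested list.

```r
# A named list holding different types, including another list
record <- list(title = "Numbers",
               numbers = 1:10,
               nested = list(flag = TRUE))

# $ and [[ ]] both extract the element itself
the_title <- record$title               # "Numbers"
third_number <- record[["numbers"]][3]  # 3

# Nested lists are reached by chaining
the_flag <- record$nested$flag          # TRUE

# Single brackets return a smaller list, not the element
sub_list_type <- typeof(record["title"])    # "list"
element_type <- typeof(record[["title"]])   # "character"
```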

## Data.frames

### S3, S4 and R6 objects

Arrays and lists are all immutable base types. However, there are other types of objects in R.

These are S3, S4 and R6 objects, with S3 being the most common.

Such objects have a class attribute (base types can have a class attribute too), enabling class specific functionality, a characteristic of object oriented programming. New classes can be created by users, allowing greater flexibility in the types of data structures available for analyses.
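As a rough illustration of a user-defined S3 class, the sketch below attaches a class attribute to a list and defines a `format()` method for it. The class name `temperature` and the constructor `new_temperature()` are invented for this example.

```r
# Hypothetical constructor: a list with a class attribute
new_temperature <- function(value, unit = "C") {
  stopifnot(is.numeric(value))
  structure(list(value = value, unit = unit), class = "temperature")
}

# S3 method: format() dispatches here for objects of class "temperature"
format.temperature <- function(x, ...) {
  paste(x$value, x$unit)
}

t1 <- new_temperature(21.5)

t1_class <- class(t1)      # "temperature"
t1_type <- typeof(t1)      # "list": the base type underneath
t1_text <- format(t1)      # "21.5 C"
```

The object is still a list underneath; the class attribute only changes which methods are chosen for it.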

### Data.frames

The most important S3 object class in R is the data.frame.

Data.frames are special types of lists where each element is a vector, each of equal length. So each column of a data.frame contains values of a consistent data type, but the data type can vary between columns (i.e. along rows).

```
df <- data.frame(id = 1:3,
                 treatment = c("a", "b", "b"),
                 complete = c(TRUE, TRUE, FALSE))
df
```

```
##   id treatment complete
## 1  1         a     TRUE
## 2  2         b     TRUE
## 3  3         b    FALSE
```

We can check that our data.frame is a list under the hood:

`typeof(df)`

`## [1] "list"`

As an S3 object, it also has a class attribute:

`class(df)`

`## [1] "data.frame"`

And we can check the type of object that it is:

`sloop::otype(df)`

`## [1] "S3"`

Compared to a vector?

`sloop::otype(1:10)`

`## [1] "base"`

We can check the dimensions of a data.frame:

`dim(df)`

`## [1] 3 3`

Get a certain number of rows from the top or bottom:

`head(df, 1)`

```
##   id treatment complete
## 1  1         a     TRUE
```

`tail(df, 1)`

```
##   id treatment complete
## 3  3         b    FALSE
```

Importantly, we can display the structure of a data.frame.

`str(df)`

```
## 'data.frame': 3 obs. of 3 variables:
## $ id : int 1 2 3
## $ treatment: chr "a" "b" "b"
## $ complete : logi TRUE TRUE FALSE
```

### A note on factors

Note that the default behaviour of `data.frame()` USED TO BE to convert character vectors to factors (this default changed as of R 4.0.0). Factors are another important data structure for handling categorical data, which have particular statistical properties. They can be useful during modelling and plotting but in the interest of time we will not discuss them further here.

In versions of R before 4.0.0, you can suppress this default behaviour using `stringsAsFactors = FALSE`:

```
df <- data.frame(id = 1:3,
                 treatment = c("a", "b", "b"),
                 complete = c(TRUE, TRUE, FALSE),
                 stringsAsFactors = FALSE)
str(df)
```

```
## 'data.frame': 3 obs. of 3 variables:
## $ id : int 1 2 3
## $ treatment: chr "a" "b" "b"
## $ complete : logi TRUE TRUE FALSE
```