Materials on GitHub

@annakrystalli | a.krystalli[at]sheffield.ac.uk




Literate programming

Programming paradigm first introduced by Donald E. Knuth.

Treat a program as a piece of literature understandable to human beings

  • move away from writing programs in the manner and order imposed by the computer

  • focus instead on the logic and flow of human thought and understanding

  • single document to integrate data analysis (executable code) with textual documentation, linking data, code, and text



Why is this important in science:

Computational science has led to exciting new developments

  • Increasing data collection throughput; data are more complex and high-dimensional

  • Existing databases can be merged to become bigger databases

  • Computing power allows more sophisticated analyses, even on “small” data

  • For every field “X” there is a “Computational X”


Increased computational complexity has exposed limitations in our ability to evaluate published findings

  • Even basic analyses are difficult to describe

  • Errors more easily introduced into long analysis pipelines

  • Knowledge transfer is inhibited

  • Results are difficult to replicate or reproduce

  • Complicated analyses cannot be trusted


Calls for reproducibility


Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

  • Fully scripted analysis pipelines
    • from raw data to published tables and figures
  • Publication of code and data



Calls for open science

… highlight problems with users jumping straight into software implementations of methods (e.g. in R) that may lack documentation on biases and assumptions that are mentioned in the original papers.

To help solve these problems, we make a number of suggestions including providing blog posts or videos to explain new methods in less technical terms, encouraging reproducibility and code sharing, making wiki-style pages summarising the literature on popular methods, more careful consideration and testing of whether a method is appropriate for a given question/data set, increased collaboration, and a shift from publishing purely novel methods to publishing improvements to existing methods and ways of detecting biases or testing model fit. Many of these points are applicable across methods in ecology and evolution, not just phylogenetic comparative methods.



Science and the web

the web was made for open science !

tl;dr

  • Modern open source technologies have given us great power
  • With great power comes great responsibility
  • You can share some of that burden by using these tools to open your work up to feedback and contribution by others. The more eyes the better.
  • Use them to provide context around your work. Help more humans understand.

Literate programming in R

rmarkdown (.Rmd) integrates:

– a documentation language (.md)

– a programming language (R)

Combine tools, processes and outputs into interactive evidence streams that are easily shareable, particularly through the web.



Rmarkdown overview

Features

RStudio features fly-through

What is R Markdown? from RStudio, Inc. on Vimeo.



The researcher's perspective

a reproducible workflow in action



elements of R markdown


markdown {.md}

stripped down html. User can focus on communicating & disseminating


  • intended to be as easy-to-read and easy-to-write as possible.

  • most powerful as a format for writing to the web.

  • syntax is very small, corresponding only to a very small subset of HTML tags.

  • clean and legible across platforms (even mobile) and outputs.

  • formatting handled automatically

  • html markup language also handled.


code {r, python, SQL, … }

  • Code chunks defined through special notation. Executed in sequence. Execution of individual chunks controllable

  • Analysis self-contained and reproducible
    • Run in a fresh R session every time document is knit.
  • A number of Language Engines are supported by knitr
    • R (default)
    • Python
    • SQL
    • Bash
    • Rcpp
    • Stan
    • JavaScript
    • CSS
  • Can read appropriately annotated .R scripts in and call them within an .Rmd


outputs

Knit together through package knitr to

Many great packages and applications build on rmarkdown.

All this makes it incredibly versatile. Check out the gallery.


Simple interface to powerful modern web technologies and libraries


RPubs

Publish rendered rmarkdown documents on the web with the click of a button, for free!

Applications in research

Rmd documents

Can be useful for a number of research related materials

  • Vignettes: long form documentation.
    • Analyses
    • Documentation (code & data)
    • Supplementary materials
  • Reports
  • Papers

Useful features:

  • bibliographies and citations


Exercise Part 1

Throughout this workshop, we’ll be working with the gapminder dataset to produce a reproducible Rmarkdown vignette of our work.

open your first .Rmd!!

File > New File > RMarkdown… > Document


save and render it

  • Before knitting, the document needs to be saved. Give it a useful name, e.g. gapminder.Rmd

  • Render the document by clicking on the knit button.

You can also render .Rmd documents to html using the rmarkdown function render()

rmarkdown::render(input = "gapminder.Rmd")

Publish your .Rmd

  • Register an account on RPubs

  • Publish your rendered document (don’t worry, you can delete or overwrite it later)


open the cheatsheet

install the packages we’ll need

install.packages(c("rmarkdown", "tidyverse", "plotly", "DT", "reprex"))



YAML header

define outputs


basic html_document

---
title: "Untitled"
author: "Anna Krystalli"
date: "3/23/2018"
output: html_document
---


define a floating table of contents

---
title: "Untitled"
author: "Anna Krystalli"
date: "3/23/2018"
output:
  html_document:
    toc: true
    toc_float: true
---


choose a theme

Specify bootswatch themes.

---
title: "Untitled"
author: "Anna Krystalli"
date: "3/23/2018"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
---


choose code highlights

---
title: "Untitled"
author: "Anna Krystalli"
date: "3/23/2018"
output:
  html_document:
    toc: true
    toc_float: true
    theme: cosmo
    highlight: zenburn
---
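
The same output settings can also be passed from R rather than through the YAML header. The sketch below is a minimal illustration, assuming the rmarkdown package is installed and that a file called gapminder.Rmd (the name suggested in Exercise Part 1) sits in the working directory. It builds an html_document format object with the options used above and hands it to render().

```r
library(rmarkdown)

# Build an output format object mirroring the YAML options above
fmt <- html_document(
  toc = TRUE,            # table of contents
  toc_float = TRUE,      # float it alongside the content
  theme = "cosmo",       # bootswatch theme
  highlight = "zenburn"  # syntax highlighting style
)

# Render only if the (assumed) file exists in the working directory
if (file.exists("gapminder.Rmd")) {
  render("gapminder.Rmd", output_format = fmt)
}
```

Keeping the settings in the YAML header is usually the more portable choice, since the document then carries its own configuration.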


Exercise Part 2

  • Clear everything BELOW THE YAML header. You should be left with just this:

    ---
    title: "Gapminder"
    author: "Anna Krystalli"
    date: "3/23/2018"
    output: html_document
    ---
  • add a floating table of contents

  • set a theme of your choice (see available themes here and the associated bootstrap styles here)



Markdown basics



text

    normal text

normal text

    *italic text*

italic text

    **bold text**

bold text

    ***bold italic text***

bold italic text

headers

rmarkdown

# Header 1
## Header 2
### Header 3
#### Header 4
##### Header 5
###### Header 6

rendered html


unordered lists

rmarkdown

- first item in the list
- second item in list
- third item in list

rendered html

  • first item in the list
  • second item in list
  • third item in list

ordered lists

rmarkdown

1. first item in the list
1. second item in list
1. third item in list

rendered html

  1. first item in the list
  2. second item in list
  3. third item in list

quotes

rmarkdown

> this text will be quoted

rendered html

this text will be quoted


code

annotate code inline

rmarkdown

`this text will appear as code` inline

rendered html

this text will appear as code inline



evaluate r code inline

a <- 10

rmarkdown

the value of parameter *a* is `r a`

rendered html

the value of parameter a is 10
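
To try inline evaluation outside of RStudio, the sketch below knits a tiny document held in a character vector. It assumes the knitr package is installed; the names src, fence and out are only illustrative.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built programmatically

# A miniature .Rmd: one chunk defining `a`, then a sentence using it inline
src <- c(
  paste0(fence, "{r}"),
  "a <- 10",
  fence,
  "",
  "the value of parameter *a* is `r a`"
)

# knit(text = ) returns the knitted markdown as a character string
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The inline expression is replaced by its value in the knitted markdown, which is then converted to html like any other text.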



images

Provide either a path to a local image file or the URL of an image.

rmarkdown

![](assets/cheat.png)

rendered html


resize images

html in rmarkdown

<img src="assets/cheat.png" width="200px" />

rendered html


basic tables in markdown

rmarkdown


    Table Header  | Second Header
    ------------- | -------------
    Cell 1        | Cell 2
    Cell 3        | Cell 4 

rendered html

Table Header    Second Header
Cell 1          Cell 2
Cell 3          Cell 4

Check out handy online .md table converter



mathematical expressions

Supports mathematical notations through MathJax.

You can write LaTeX math expressions inside a pair of dollar signs, e.g. $\alpha+\beta$ renders \(\alpha+\beta\). You can use the display style with double dollar signs:

$$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$

\[\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i\]

Exercise: Part 3

Get more info on gapminder:

Do some quick online research on Gapminder. A good place to start: https://www.gapminder.org/

Create a "Background" section using headers

Write a short description

Write a short description of the Gapminder project (feel free to copy, paste and edit information).

Make use of markdown annotation to:

  • highlight important information
  • include links to sources or further information.

Add an image

Add an image related to Gapminder.

  • have a look online for an image.
  • include the source URL underneath for attribution.
  • see if you can resize it.

Chunks


R code chunks execute code.

They can also be used as a means to render R output into documents or to simply display code for illustration (e.g. with option eval=FALSE)


chunk notation

chunk notation in .rmd

```{r chunk-name}
print('hello world!')
```

rendered html code and output

print("hello world!")
## [1] "hello world!"

Chunks can be labelled with chunk names; names must be unique.


inserting new chunks

You can quickly insert chunks with:

  • the keyboard shortcut Ctrl + Alt + I (OS X: Cmd + Option + I)
  • the Add Chunk command in the RStudio toolbar
  • by typing the chunk delimiters ```{r} and ```.

chunk options

for more details see http://yihui.name/knitr/


uses

  • controlling whether code is displayed inline (echo setting)
  • controlling whether code is evaluated (eval setting)
  • controlling how figures are displayed (fig.width and fig.height settings)
  • suppressing warnings and messages (warning and message settings)
  • caching computations (cache setting)
  • controlling whether code is extracted when using purl (purl setting)

controlling code display with echo

chunk notation in .rmd

```{r hide-code, echo=FALSE}
print('hello world!')
```

rendered html code and output

## [1] "hello world!"

controlling code evaluation with eval

chunk notation in .rmd

```{r dont-eval, eval=FALSE}
print('hello world!')
```

rendered html code and output

print("hello world!")
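
The effect of the two options can be checked from the console. The following is a minimal sketch, assuming knitr is installed; knit_chunk() is a hypothetical helper written for this illustration, not a knitr function.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, built programmatically

# Hypothetical helper: knit a single chunk with the given options
knit_chunk <- function(options) {
  src <- c(paste0(fence, "{r, ", options, "}"), "print('hello world!')", fence)
  knit(text = src, quiet = TRUE)
}

hidden  <- knit_chunk("echo=FALSE")  # output only, source hidden
not_run <- knit_chunk("eval=FALSE")  # source only, nothing executed

cat(hidden)
cat(not_run)
```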

setting document level default options

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
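
Such defaults are usually set once in a setup chunk near the top of the document. A minimal sketch, assuming knitr is installed, that sets the defaults and reads them back with opts_chunk$get():

```r
library(knitr)

# Set defaults for every chunk that follows
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

# Inspect the current defaults
opts_chunk$get("warning")
opts_chunk$get("echo")
```

Individual chunks can still override these defaults in their own header, e.g. with echo=FALSE.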

Exercise Part 4

For this exercise we’ll be accessing the gapminder data through the gapminder R package.

Create an “Installation” section using headers

Write installation instructions

Write brief instructions (including code) for others to access the dataset in R. Have a look at the package documentation on GitHub for inspiration.

In R we often need to describe a setup procedure that involves specifying the installation of required packages. However, installation of packages is not handled in .Rmd! (For the moment, install packages through the console).

In our case, we’ll want to include the code for installing the gapminder package but not evaluate it in the .Rmd.



Displaying data

printing data.frames

data(airquality)
head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

printing tibbles

library(tibble)
as_tibble(airquality)
## # A tibble: 153 x 6
##    Ozone Solar.R  Wind  Temp Month   Day
##    <int>   <int> <dbl> <int> <int> <int>
##  1    41     190  7.40    67     5     1
##  2    36     118  8.00    72     5     2
##  3    12     149 12.6     74     5     3
##  4    18     313 11.5     62     5     4
##  5    NA      NA 14.3     56     5     5
##  6    28      NA 14.9     66     5     6
##  7    23     299  8.60    65     5     7
##  8    19      99 13.8     59     5     8
##  9     8      19 20.1     61     5     9
## 10    NA     194  8.60    69     5    10
## # ... with 143 more rows

Displaying knitr::kable() tables

library(knitr)
data(airquality)
kable(head(airquality), caption = "New York Air Quality Measurements")
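New York Air Quality Measurements

Ozone | Solar.R | Wind | Temp | Month | Day
----- | ------- | ---- | ---- | ----- | ---
41    | 190     | 7.4  | 67   | 5     | 1
36    | 118     | 8.0  | 72   | 5     | 2
12    | 149     | 12.6 | 74   | 5     | 3
18    | 313     | 11.5 | 62   | 5     | 4
NA    | NA      | 14.3 | 56   | 5     | 5
28    | NA      | 14.9 | 66   | 5     | 6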

Displaying interactive DT::datatable() tables

library(DT)
data(airquality)
datatable(airquality, caption = "New York Air Quality Measurements")

Summarising data with skimr::skim()

Provides a frictionless approach to displaying summary statistics that can be skimmed quickly to understand the data.

skimr::skim(airquality)
## Skim summary statistics
##  n obs: 153 
##  n variables: 6 
## 
## Variable type: integer 
##  variable missing complete   n   mean    sd p0    p25 median    p75 p100
##       Day       0      153 153  15.8   8.86  1   8      16    23      31
##     Month       0      153 153   6.99  1.42  5   6       7     8       9
##     Ozone      37      116 153  42.13 32.99  1  18      31.5  63.25  168
##   Solar.R       7      146 153 185.93 90.06  7 115.75  205   258.75  334
##      Temp       0      153 153  77.88  9.47 56  72      79    85      97
##      hist
##  ▇▇▇▇▆▇▇▇
##  ▇▇▁▇▁▇▁▇
##  ▇▆▃▃▂▁▁▁
##  ▃▃▃▃▅▇▇▃
##  ▂▂▃▆▇▇▃▃
## 
## Variable type: numeric 
##  variable missing complete   n  mean   sd  p0 p25 median  p75 p100
##      Wind       0      153 153  9.96 3.52 1.7 7.4    9.7 11.5 20.7
##      hist
##  ▁▃▇▇▅▅▁▁



Exercise Part 5

install.packages("gapminder")

Start a new section called "Dataset"

Display an example of the dataset

Write a short description of the dataset

  • What size is the data? (How many variables? How many rows of data points. See if you can extract and include such info inline)
  • what type of object is it? (see ?class)
  • Use some of the functions you’ve learnt to extract such information (eg ?dim, ?ncol etc).

Summarise the data

(e.g. ?summary, ?skimr)
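
As a warm-up on a dataset already used above, here is a minimal sketch of the kind of calls that answer these questions for airquality (base R only). The same functions can then be pointed at the gapminder data.

```r
data(airquality)

class(airquality)  # type of object
dim(airquality)    # rows and columns
nrow(airquality)   # number of rows
ncol(airquality)   # number of variables
names(airquality)  # variable names

# Inline in the .Rmd text the values can be embedded, for example:
# The dataset has `r nrow(airquality)` rows and `r ncol(airquality)` variables.
summary(airquality$Temp)
```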



plots

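library(ggplot2)

set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]  # random sample of 1000 diamonds

# `text` is an extra aesthetic picked up by plotly::ggplotly() for hover labels
p <- ggplot(data = d, aes(x = carat, y = price)) +
    geom_point(aes(text = paste("Clarity:", clarity)), size = 1) +
    geom_smooth(aes(colour = cut, fill = cut)) +
    facet_wrap(~cut)

p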



interactive plots with plotly

Wraps nicely around plotting library ggplot2

library(plotly)

ggplotly(p)

Exercise Part 6

  • Replicate some of the plots you produced earlier today with the gapminder data but hide the code that generates them.

  • Add a new plot of your own

  • Add some comments for each plot

Exercise Part 7

Publish your report on RPubs

Advanced .Rmd


reading chunks of code

R -> Rmd

You can read in chunks of code from an annotated .R (or any other language) script using knitr::read_chunk()

Chunks are defined by the following notation. Names must be unique.

# ---- descriptive-chunk-name1 ----
code("you want to run as a chunk")

# ---- descriptive-chunk-name2 ----
code("you want to run as a chunk")

code in .R script hello-world.R

hello-world.R

# ---- demo-read_chunk ----
print("hello world")


read chunks from hello-world.R

knitr::read_chunk("hello-world.R")



call chunk by name

rmarkdown r chunk notation

```{r demo-read_chunk}

```

rendered html code and output

print("hello world")
## [1] "hello world"



Check chunks in the current session

knitr:::knit_code$get()
## $`demo-read_chunk`
## [1] "print(\"hello world\")"
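
Put together, the round trip can be sketched end to end. This is a minimal illustration assuming knitr is installed; it writes the script to a temporary file instead of hello-world.R so nothing in the working directory is touched.

```r
library(knitr)

# Write an annotated script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c("# ---- demo-read_chunk ----", 'print("hello world")'), script)

# Register its chunks in the current session
read_chunk(script)

# The chunk is now available under its label
chunks <- knitr:::knit_code$get()
names(chunks)
chunks[["demo-read_chunk"]]
```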

Extracting code from an .Rmd

Rmd -> R

You can use knitr::purl() to tangle code out of an Rmd into an .R script. purl takes many of the same arguments as knit(). The most important additional argument is:

  • documentation: an integer specifying the level of documentation to go into the tangled script:
    • 0 means pure code (discard all text chunks)
    • 1 (default) means add the chunk headers to code
    • 2 means add all text chunks to code as roxygen comments

knitr::purl("file-to-extract-code-from.Rmd", documentation = 0)

extract using purl

Here I'm running a loop to extract the code in demo-rmd.Rmd for each documentation level

file <- "demo-rmd.Rmd"
for (docu in 0:2) {
    # writes demo-rmd_0.R, demo-rmd_1.R and demo-rmd_2.R
    knitr::purl(file, output = paste0(tools::file_path_sans_ext(file), "_", docu, ".R"), 
        documentation = docu, quiet = TRUE)
}

demo-rmd_0.R

knitr::opts_chunk$set(echo = TRUE)
summary(cars)
plot(pressure)

demo-rmd_1.R

## ----setup, include=FALSE------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

## ----cars----------------------------------------------------------------
summary(cars)

## ----pressure, echo=FALSE------------------------------------------------
plot(pressure)

demo-rmd_2.R

#' ---
#' title: "Untitled"
#' author: "Anna Krystalli"
#' date: "3/23/2018"
#' output:
#'   html_document:
#'     toc: true
#'     toc_float: true
#'     theme: cosmo
#'     highlight: textmate
#' 
#' ---
#' 
## ----setup, include=FALSE------------------------------------------------
knitr::opts_chunk$set(echo = TRUE)

#' 
#' ## R Markdown
#' 
#' 
#' This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
#' 
#' When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
#' 
## ----cars----------------------------------------------------------------
summary(cars)

#' 
#' ## Including Plots
#' 
#' You can also embed plots, for example:
#' 
## ----pressure, echo=FALSE------------------------------------------------
plot(pressure)

#' 
#' Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
#' 
#' 


Exercise: Part 7*

read in a chunk

  • Open an .R script
  • Cut the code from one or more of your chunks and paste it into the .R script
  • Annotate the code up as named chunk(s)
  • Read the chunk(s) in your .R script into your .Rmd (?read_chunk())
  • Include the code in your .Rmd workflow by labelling an empty chunk with your chunk(s) name(s)

purl your document

Once your document is ready, try and extract the contents of your .Rmd into an .R script.

?purl


html in rmarkdown

marking up with html tags

This text marked up in html

<strong>Bold text</strong>

renders to this

Bold text


This text marked up with Bootstrap alert css classes

<div class="alert alert-warning"><small>this is a warning message</small></div>

renders to

this is a warning message


<div class="alert alert-success"><small>this is a success message</small></div>

renders to

this is a success message

embedding tweets

This snippet, copied from Twitter in the embed format

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">How cool does this tweet look embedded in <a href="https://twitter.com/hashtag/rmarkdown?src=hash&amp;ref_src=twsrc%5Etfw">#rmarkdown</a>! 😎</p>&mdash; annakrystalli (@annakrystalli) <a href="https://twitter.com/annakrystalli/status/977209749958791168?ref_src=twsrc%5Etfw">March 23, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

renders to this

Embed gifs, videos, widgets in this way


Parting words


Getting help with markdown

To get help, you need a reproducible example

  • github issues
  • stackoverflow
  • slack channels
  • discussion boards

reprex

Use function reprex::reprex() to produce a reproducible example in a custom markdown format for the venue of your choice

  • "gh" for GitHub (default)
  • "so" for StackOverflow,
  • "r" or "R" for a runnable R script, with commented output interleaved.

using reprex

  1. Copy the code you want to run.
    • all required variables must be defined and libraries loaded
  2. In the console, call the reprex function

    reprex::reprex()
    • the code is executed in a fresh environment and “code + commented output” is returned invisibly on the clipboard.
  3. Paste the result in the venue of your choice.
    • Once published it will be rendered to html.

bookdown

Authoring with R Markdown. Offers:

  • cross-references,
  • citations,
  • HTML widgets and Shiny apps,
  • tables of content and section numbering

The publication can be exported to HTML, PDF, and e-books (e.g. EPUB). It can even be used to write a thesis!


pkgdown

For building package documentation

  • Can use it to document any functional code you produce and demonstrate its use through vignettes


workflowr pkg

Build analysis websites and organise your project

The workflowr R package makes it easier for researchers to organize their projects and share their results with colleagues.


blogdown

For creating and maintaining blogs.

Check out https://awesome-blogdown.com/, a curated list of awesome #rstats blogs in blogdown for inspiration!


Learn about Version Control

Use Git and GitHub to manage, publish and collaborate on your work

See Happy Git with R Tutorial


Share your work

  • Start a blog!
  • Work openly

Keep learning with others