---
class: inverse, center, middle

.box[
## Benefit #2
]

## transparency as a means of supercharging the research cycle

---
class: inverse, center, middle

# Reproducible Research Compendia

---

### The concept of a Research Compendium

> "...We introduce the **concept of a compendium** as both a **container for the different elements** that make up the document and its computations (i.e. text, code, data, ...), and as a **means for distributing, managing and updating the collection**."

[_Gentleman and Temple Lang, 2004_](https://biostats.bepress.com/bioconductor/paper2/)

---

## Principles

![](assets/reproducible-data-analysis_004.png)

.img-attr[slides: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]

---

## Components

![](assets/reproducible-data-analysis_005.png)

.img-attr[slides: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]

---

### Research compendia in R

**Ben Marwick, Carl Boettiger & Lincoln Mullen (2018)** [_Packaging Data Analytical Work Reproducibly Using R (and Friends)_](https://peerj.com/preprints/3192/)

.pull-left[

![](assets/compendium-small.png)

.img-attr[slides: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]
]

.pull-right[

![](assets/compendium-large.png)
]

---

## Example compendium

.pull-left[

**Paper**:

##### Boettiger, C. (2018) *From noise to knowledge: how randomness generates novel phenomena and reveals information*.
<https://doi.org/10.1111/ele.13085>

<img src="assets/Boettiger-2018.png" height="250px" width="400px">
]

.pull-right[

**Compendium**

##### *cboettig/noise-phenomena: Supplement to: "From noise to knowledge: how randomness generates novel phenomena and reveals information"*
http://doi.org/10.5281/zenodo.1219780

<img src="assets/boettiger_compendium.png" height="250px" width="400px">
]

---

# `rrtools`: Creating Compendia in R

### "The goal of rrtools is to provide **instructions, templates, and functions** for making a **basic compendium** suitable for writing **reproducible research with R**."

<br>

### Install [`rrtools`](https://github.com/benmarwick/rrtools) from GitHub

```r
# install.packages("devtools")
devtools::install_github("benmarwick/rrtools")
```

---

# Create compendium

```r
rrtools::create_compendium("~/Documents/workflows/rrcompendium")
```

```
✔ Setting active project to '/Users/Anna/Documents/workflows/rrcompendium'
✔ Creating 'R/'
✔ Creating 'man/'
✔ Writing 'DESCRIPTION'
✔ Writing 'NAMESPACE'
✔ Writing 'rrcompendium.Rproj'
✔ Adding '.Rproj.user' to '.gitignore'
✔ Adding '^rrcompendium\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
✔ Opening new project 'rrcompendium' in RStudio
✔ The package rrcompendium has been created
✔ Opening the new compendium in a new RStudio session...

Next, you need to: ↓ ↓ ↓
● Edit the DESCRIPTION file
● Use other 'rrtools' functions to add components to the compendium
```

---

## `DESCRIPTION` file

#### Package metadata

```yaml
Package: rrcompendiumDTB
Title: Partial Reproduction of Boettiger Ecology Letters 2018;21:1255–1267 with rrtools
Version:
Authors@R:
    person(given = "Anna",
           family = "Krystalli",
           role = c("aut", "cre"),
           email = "annakrystalli@googlemail.com")
Description: This repository contains the research compendium of the partial
    reproduction of Boettiger Ecology Letters 2018;21:1255–1267. The compendium
    contains all data, code, and text associated with this sub-section of the
    analysis.
```

---

# Prepare for sharing

```r
rrtools::use_readme_rmd()
```

.pull-left[

```
✔ Creating 'README.Rmd' from template.
✔ Adding 'README.Rmd' to `.Rbuildignore`.
● Modify 'README.Rmd'
✔ Rendering README.Rmd to README.md for GitHub.
✔ Adding code of conduct.
✔ Creating 'CONDUCT.md' from template.
✔ Adding 'CONDUCT.md' to `.Rbuildignore`.
✔ Adding instructions to contributors.
✔ Creating 'CONTRIBUTING.md' from template.
✔ Adding 'CONTRIBUTING.md' to `.Rbuildignore`.
```
]

.pull-right[

![](assets/README-webshot.png)
]

---

# Create analysis folder

```r
rrtools::use_analysis()
```

```
✔ Adding bookdown to Imports
✔ Creating 'analysis' directory and contents
✔ Creating 'analysis'
✔ Creating 'analysis/paper'
✔ Creating 'analysis/figures'
✔ Creating 'analysis/templates'
✔ Creating 'analysis/data'
✔ Creating 'analysis/data/raw_data'
✔ Creating 'analysis/data/derived_data'
✔ Creating 'references.bib' from template.
✔ Creating 'paper.Rmd' from template.

Next, you need to: ↓ ↓ ↓ ↓
● Write your article/report/thesis, start at the paper.Rmd file
● Add the citation style library file (csl) to replace the default provided here, see https://github.com/citation-style-language/
● Add bibliographic details of cited items to the 'references.bib' file
● For adding captions & cross-referencing in an Rmd, see https://bookdown.org/yihui/bookdown/
● For adding citations & reference lists in an Rmd, see http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
```

---

# `paper.Rmd` to `paper.pdf`

.pull-left[

**Rmd**

<img src="assets/paper_rmd.png" >
]

.pull-right[

**pdf**

<img src="assets/paper_pdf.png" >
]

---

# Capturing dependencies

```r
rrtools::add_dependencies_to_description()
```

```
Imports:
    bookdown,
    ggplot2 (>= 3.0.0),
    ggthemes (>= 3.5.0),
    here (>= 0.1),
    knitr (>= 1.20),
    rticles (>= 0.6)
```

---

# Further Helpers

## 📦 `rticles`

Contains a **suite of custom R Markdown templates for popular journals**, simplifying the creation of documents that conform to research
paper submission standards.

---

# 📦 `citr`

RStudio Add-in to **Insert Markdown Citations**

<img src="assets/citr-insert.png" width="700px">

---
class: inverse, center, middle

# Reproducible Computational Environments

---

## Why isn't sharing code enough?

### Case Study: Sharing a Geospatial Analysis in R

***

#### On a computer without System Library `GDAL`

.pull-left[

```r
package ‘rgdal’ successfully unpacked and MD5 sums checked
configure: gdal-config: gdal-config
checking gdal-config usability... ./configure: line 1353: gdal-config: command not found
no
*Error: gdal-config not found
...
*ERROR: configuration failed for package ‘rgdal’
```
]

.pull-right[

<br>
<br>

![](assets/reproducible-data-analysis-02.png)

.img-attr[slide: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]
]

---

## What are Docker containers?

### standardized units of software

**package up everything needed to run an application:** _code, runtime, system tools, system libraries_ and settings in a lightweight, standalone, executable package

--

- #### **Dockerfile**: Text file containing recipe for setting up computation environment.

- #### **Docker Image**: Executable **built** from the **Dockerfile** with all required dependencies installed. Can have many images from the same `Dockerfile`.

- #### **Docker Container**: **Docker Images** become containers at **runtime**

.center[
<img src="assets/docker_workflow.png" height=180px>
]

---

# Rocker on DockerHub

#### using the `rocker/geospatial` Docker Image

***

.pull-left[

![](assets/rocker_geospatial.png)
]

.pull-right[

<br>
<br>

![](assets/reproducible-data-analysis_042.png)

.img-attr[slide: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]
]

---

# Create Dockerfile w/ `rrtools`

```r
rrtools::use_dockerfile()
```

```r
✔ Creating 'Dockerfile' from template.
✔ Adding 'Dockerfile' to `.Rbuildignore`.
● Modify

Next:
 * Edit the dockerfile with your name & email
 * Edit the dockerfile to include system dependencies, such as linux libraries that are needed by the R packages you're using
 * Check the last line of the dockerfile to specify which Rmd should be rendered in the Docker container, edit if necessary
```

---

# `Dockerfile`

```bash
# get the base image, the rocker/verse has R, RStudio and pandoc
FROM rocker/verse:3.6.0

# required
*MAINTAINER Anna Krystalli <annakrystallil@googlemail.com>

COPY . /rrcompendiumDTB

# go into the repo directory
RUN . /etc/environment \
  # Install linux dependencies here
  # e.g. need this for ggforce::geom_sina
  && sudo apt-get update \
  && sudo apt-get install libudunits2-dev -y \
  # build this compendium package
  && R -e "devtools::install('/rrcompendiumDTB', dep=TRUE)" \
  # render the manuscript into a docx, you'll need to edit this if you've
  # customised the location and name of your main Rmd file
* && R -e "rmarkdown::render('/rrcompendiumDTB/analysis/paper/paper.Rmd')"
```

---

# Docker + Travis

## Create `.travis.yml`

```r
rrtools::use_travis()
```

```r
✔ Creating '.travis.yml' from template.
✔ Adding '.travis.yml' to `.Rbuildignore`.
Next:
 * Add a travis shield to your README.Rmd:
[![Travis-CI Build Status](https://travis-ci.org/annakrystalli/rrcompendiumDTB.svg?branch=master)](https://travis-ci.org/annakrystalli/rrcompendiumDTB)
 * Turn on travis for your repo at https://travis-ci.org/annakrystalli/rrcompendiumDTB
** To connect Docker, go to https://travis-ci.org/, and add your environment
*variables: DOCKER_EMAIL, DOCKER_USER, DOCKER_PASS to enable pushing to the
*Docker Hub
```

---

# `.travis.yml`

```bash
env:
  global:
  - REPO=$DOCKER_USER/rrcompendiumdtb

sudo: required

warnings_are_errors: false

language: generic

services:
  - docker

before_install:
* - docker build -t $REPO .
```

Create & build image using dockerfile

---

# `.travis.yml`

Push our custom docker image to docker hub

```bash
after_success:
* - docker login -u $DOCKER_USER -p $DOCKER_PASS
  - export REPO=$DOCKER_USER/rrcompendiumdtb
  - export TAG=`if [ "$TRAVIS_BRANCH" == "master" ]; then echo "latest"; else echo $TRAVIS_BRANCH ; fi`
  - docker build -f Dockerfile -t $REPO:$COMMIT .
  - docker tag $REPO:$COMMIT $REPO:$TAG
  - docker tag $REPO:$COMMIT $REPO:travis-$TRAVIS_BUILD_NUMBER
* - docker push $REPO
```

---
class: middle, inverse, center

# How to review a research compendium?

---
class: inverse, center, middle

# Review as an auditor

---

## Access

- How easy was it to gain access to the materials?

## Installation

- How easy / automated was installation?
- Did you have any problems?

## Data

- Were data clearly separated from code and other items?
- Were large data files deposited in a trustworthy data repository and referred to using a persistent identifier?
- Were data documented ...somehow...?

---

## Documentation

Was there adequate documentation describing:

- the purpose and target audience of the compendium?
- how to cite the compendium in a form that can be copied and pasted?
- how to install necessary software including non-standard dependencies?
- how to use materials to reproduce the paper?

## Analysis

- Were all dependencies clearly specified?
- Was a full computational environment captured?
- How automated was the reproduction?
- How easy was it to link analysis code to:
    - the plots it generates
    - sections in the manuscript in which it is described

---
class: inverse, center, middle

## Review as a user

---

.pull-left[

![](assets/user-testing-game.jpg)
]

--

.pull-right[

## User Testing

#### What did you find easy / intuitive?

#### What did you find confusing / difficult?

#### What did you enjoy?
]

---
class: inverse, center, middle

## Feedback as a community member

---

### Acknowledge author effort

--

### Give feedback in good faith

--

### Focus on community benefits - and system level solutions

---
class: inverse, center, middle

# tl;dr

## Reproducibility:

## the value in the practice

---
background-image: url('assets/1728_TURI_Book sprint_38 computer readable_040619.jpg')
background-size: contain

.box[
## Following conventions ➡️
<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>
]

.box[
#### This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence.
]

---
background-image: url('assets/reproducible-data-analysis_042.png')
background-size: contain

.box[
## Successful Reproducibility ➡️
<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>
.box[slide: [_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)]
]

---
background-image: url('assets/1728_TURI_Book sprint_36 data research cycle_040619.jpg')
background-size: contain

.box[
## Enhanced Research Cycle
<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>
.box[
#### This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence.
]
]

---
background-image: url('assets/1728_TURI_Book sprint_26 culture shift_040619.jpg')
background-size: contain

.box[
## Reproducibility as standard
<br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br> <br>
.box[
#### This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence.
]
]

---
class: inverse, center, middle

# Further Resources

---

# The Turing Way

.pull-left[

### Book

#### a lightly opinionated guide to reproducible data science

<https://the-turing-way.netlify.com>

<img src="assets/1728_TURI_Book sprint_12 chapter_040619.jpg" height="150px">
]

.pull-right[

### workshops

- **Boost Your Research Reproducibility with Binder** [materials](https://github.com/alan-turing-institute/the-turing-way/tree/master/workshops/boost-research-reproducibility-binder)

- **Build a binderhub** [materials](https://github.com/alan-turing-institute/the-turing-way/tree/master/workshops/build-a-binderhub)
]

### <https://github.com/alan-turing-institute/the-turing-way>

---

## Reviewing resources

### Inspiration from paper reviews:

> #### _How to Read a Research Compendium_ [arXiv:1806.09525v1](https://arxiv.org/pdf/1806.09525.pdf) [cs.GL]

### Inspiration from package reviews:

> #### [rOpenSci Packages: Development, Maintenance, and Peer Review](https://devguide.ropensci.org/)

> ### [DataONE Reproducible Research Compendia Onboarding](https://github.com/benmarwick/onboarding-reproducible-compendia)
> - [`reviewer_template.md`](https://github.com/benmarwick/onboarding-reproducible-compendia/blob/master/reviewer_template.md)

---

# Reprohack Resources

### [reprohack-hq](https://github.com/reprohack/reprohack-hq) repository

#### Check out our [issues](https://github.com/reprohack/reprohack-hq/issues)

#### Chat to us: [![Gitter](https://badges.gitter.im/reprohack/community.svg)](https://gitter.im/reprohack/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge)

#### Sign up to our Newsletter

<form style="border:1px solid #ccc;padding:3px;text-align:center;" action="https://tinyletter.com/reprohack-hq" method="post" target="popupwindow" onsubmit="window.open('https://tinyletter.com/reprohack-hq', 'popupwindow', 'scrollbars=yes,width=800,height=600');return true"><p><label for="tlemail">Enter your email address</label></p><p><input type="text"
style="width:140px" name="email" id="tlemail" /></p><input type="hidden" value="1" name="embed"/><input type="submit" value="Subscribe" /><p><a href="https://tinyletter.com" target="_blank">powered by TinyLetter</a></p></form>

---

## ReproHack outputs: [ReScience C](http://rescience.github.io/)

.pull-left[

### Replication Report

- Repeat a published protocol
- Respect its spirit and intentions
- Vary the technical details, e.g. using different software, initial conditions, etc.

**Change something that everyone believes shouldn’t matter, and see if the scientific conclusions are affected**
]

.pull-right[

![](assets/rescience_article.png)
]

---

## ReproHack outputs: [ReScience C](http://rescience.github.io/)

.pull-left[

### Replication Report

- Repeat a published protocol
- Respect its spirit and intentions
- Vary the technical details, e.g. using different software, initial conditions, etc.

**Change something that everyone believes shouldn’t matter, and see if the scientific conclusions are affected**
]

.pull-right[

### _Coming soon:_ Reproducibility Report

- Accessibility of materials,
- Installation and dependency management,
- Documentation clarity,
- Ease of reuse,
- What else?