@annakrystalli]

---
class: inverse, center, middle

# Background

---
background-image: url(assets/boats.png)
background-size: cover

## .bg-white[Marine Biology]

---
background-image: url(assets/maps.png)
background-size: cover

.box[
## .bg-white[Marine Biology]
]

---
background-image: url(
background-size: cover
class: inverse

## .bg-white[ Quality Assurance ]

.bg-white[
> #### _QA Auditor for a Contract Research Organisation subject to GLP regulation_
]

---
background-image: url(assets/ruchindra-gunasekara-GK8x_XCcDZg-unsplash.jpg)
background-size: cover

## .bg-white[Ultrasport]

.bg-white[
> #### _Brand coordinator for an extreme sports equipment distributor_
]

---
class: inverse, center, middle

# Back to science:

---

.pull-left[
### Ooops, that's embarrassing!

<br>

]

.pull-right[

]

---

# The paper is the advertisement

> “an article about a computational result is advertising, not scholarship. The **actual scholarship is the full software environment, code and data, that produced the result.**”

*John Claerbout paraphrased in [Buckheit and Donoho (1995)](*

--

### Why is our whole system geared towards **reviewing, publishing, distributing, archiving** the advertisement?

---

## Progress: calls for reproducibility as a minimum standard

> #### **Reproducibility** has the potential to serve as a **minimum standard for judging scientific claims** when full independent replication of a study is not possible.

<br>

.center[
<img src="assets/repro-spectrum.jpg" width="90%" />
]

.img-attr[Reproducible Research in Computational Science _ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227_ ]

<br>

---
class: inverse, center, middle

.box[
## Benefit #1
]

## transparency as a means of verification

---
background-image: url("")
background-size: cover

.bg-white[
### There is a hidden superpower...
]

---
background-image: url("assets/repository-fork.png")

--

.pull-left[
<br>
<br>
<br>
<br>
<br>
<br>

## **Whoa, it's evolution...**
]

.pull-right[
<br>
<br>
<br>
<br>
<br>
<br>

<img src="" width="90%" />
]

---

> #### [**_Macroecological and macroevolutionary patterns emerge in the universe of GNU/Linux operating systems_**](

.pull-left[
> doi:10.1111/ecog.03424

<img src="assets/ecography_gnu.png" width="90%" />
]

--

.pull-right[
<img src="assets/ecography_gnu_fig4.jpg" width="90%" />
]

---
background-image: url("assets/hanslibdata.png")
background-size: cover

## Example: ** 2006**

### [liberating stories from data](

---

## Gapminder today

```r
library(ggplot2)

# `frame = year` is not a ggplot2 aesthetic; plotly::ggplotly() uses it
# to animate the plot over years
p <- ggplot(gapminder::gapminder,
            aes(gdpPercap, lifeExp, size = pop,
                color = continent, frame = year)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()
```

```r
plotly::ggplotly(p)
```
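---

## Gapminder: the data behind the plot

A minimal sketch, assuming the `gapminder` package is installed, of a quick look at the data that feeds the plot above. The object names `gap`, `ends` and `life_summary` are illustrative only.

```r
# Assumption: install.packages("gapminder") has been run
gap <- gapminder::gapminder

# Columns used by the plot: gdpPercap (x), lifeExp (y), pop (size),
# continent (colour) and year (animation frame)
str(gap[, c("year", "continent", "gdpPercap", "lifeExp", "pop")])

# Median life expectancy per continent in the first and last years on record
ends <- gap[gap$year %in% range(gap$year), ]
life_summary <- aggregate(lifeExp ~ continent + year, data = ends, FUN = median)
life_summary
```

Because the data ship as an R package, anyone can rerun or extend a summary like this one.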
---
class: inverse, center, middle

## Benefit #2

## transparency as a means of supercharging the research cycle

---
class: center, middle

# So how are we doing?

<img src="assets/repro-spectrum.jpg" width="90%" />

.img-attr[Reproducible Research in Computational Science _ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227_ ]

---
background-image: url(assets/annie-spratt-fallen-tree.jpg)
background-size: cover
class: center, middle

--

.bg-white[
# If a paper claims to be reproducible but nobody checks it, is it really reproducible?
]

---
class: center

<img src="assets/practice-sharon-mccutcheon-unsplash.jpg" width="60%" />

# Practice

***

--

### The less you do, the more you s**k

---

# ReproHack

#### One-day reproducibility hackathons

***

--

- ### How reproducible are papers?

--

- ### How can we practice reproducibility?

---

# ReproHack History

#### OpenCon Satellite: Berlin, 2016

#### OpenCon Satellite: London, 2017

--

<br>

Inspired by Owen Petchey's [Reproducible Research in Ecology, Evolution, Behaviour, and Environmental Studies]( course,

- Reproduce published results from raw data
- Over a few months and a number of sessions

--

### **ReproHack mission: Reproduce paper in a day from code and data**

---

### Software Sustainability Institute Fellowship 2019

.pull-left[

]

--

.pull-right[
<img src="assets/me-reprohack.png" width="90%" />

<img src="" width="90%" />
]

---

### ReproHackNL! - _Leiden_

.pull-left[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">This thing is really happening! <a href=""></a></p>&mdash; ReproHack ♻️ (@ReproHack) <a href="">September 20, 2019</a></blockquote> <script async src="" charset="utf-8"></script>
]

.pull-right[
.middle[
<img src="assets/Hackathon.jpg" width="100%" />
]
]

---

## ReproHack Core Team formation

<img src="assets/reprohack_core_team.png" width="100%" />

---

## N8 CIR ReproHack Series!
.pull-left[
**<>**

<img src="assets/reprohack-webad_N8-01.png" width="70%" />
]

--

.pull-right[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">‼️🚨 Manchester <a href="">@N8CIR</a> <a href="">@ReproHack</a> CANCELLED ‼️🚨 <br><br>Due to the changing situation with <a href="">#COVID19</a> we've decided that despite the low risk, it would be irresponsible to expose participants unnecessarily to it.<br><br>We'll either reschedule or explore a remote option.</p>&mdash; annakrystalli (@annakrystalli) <a href="">March 12, 2020</a></blockquote> <script async src="" charset="utf-8"></script>
]

---

<img src="assets/remote_reprohack_flyer.png" width="75%" />

#### Much of the team made it!

#### People from afar were able to join, including from Japan, Argentina, the Netherlands, Sweden and the USA!

---
class: inverse, middle, center

# How does it work?

---

## Call for papers

.pull-left[
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">✨Do you champion <a href="">#reproducible</a> <a href="">#research</a>? <br>✨Do you have a reproducible paper with open code and data?<br><br>The <a href="">@SoftwareSaved</a> <a href="">#ReproHack</a> series needs you!<br><br>Help others learn &amp; engage with your work by submitting it to our 1-day Reproducibility hackathons!<a href=""></a></p>&mdash; annakrystalli (@annakrystalli) <a href="">June 12, 2019</a></blockquote> <script async src="" charset="utf-8"></script>
]

--

.pull-right.middle[
<img src="assets/rh-paper_list.png" width="100%" />
]

---
background-image: url("assets/on_the_day-bg.jpg")
background-size: cover

# On the day

- ### Select paper and form groups

- ### Work with materials and reproduce

- ### Discuss

- ### Feed back to authors

---
class: inverse, center, middle

# Tips for Reproducing & Reviewing

<img src="assets/Hackathon.jpg" width="70%" />

---

## Selecting Papers

.pull-left[
- **Information submitted by authors:**
  - Languages / tools used
  - Why you should attempt the paper
- **No. attempts:** number of times reproduction has been attempted
- **Mean Repro Score:** mean reproducibility score (out of 10); lower == harder!
]

.pull-right[
<img src="assets/ReprohackPickPapers.jpg" width="90%" />
]

---

# Review as an auditor

.center[
<img src="assets/FAIRPrinciples.jpg" width="80%" />
]

---

.pull-left[
# Access

- How **easy** was it to **gain** access to the materials?
- Did you manage to download all the files you needed?
]

.pull-right[
# Installation

- How **easy / automated** was **installation**?
- Did you have any problems?
- How did you solve them?
]

---

.pull-left[
# Data

- Were **data clearly separated from code and other items**?
- Were **large data files deposited in a trustworthy data repository** and referred to using a **persistent identifier**?
- Were **data documented** ...somehow...
]

.pull-right[
# Documentation

Was there **adequate documentation** describing:

- how to **install** necessary software including non-standard dependencies?
- how to **use** materials to reproduce the paper?
- how to **cite** the materials, ideally in a form that can be copied and pasted?
]

---

.pull-left[
# Analysis

- **Were you able to fully reproduce** the paper?
- **How automated** was the process of reproducing the paper?
- **How easy was it to link** analysis **code** to:
  - the **plots** it generates
  - **sections in the manuscript** in which it is described and results reported
]

--

.pull-right[
<br>

### If the analysis was not fully reproducible 🚫

- Were there **missing dependencies?**
- Was the **computational environment not adequately described** / captured?
- Were there **bugs** in the code?
- Did **code run but results (e.g. model outputs, tables, figures) differ** from those published?
  By **how much?**
]

---
class: center

# Review as a user

.pull-left[
### New User

<img src="" width="80%" />
]

--

.pull-right[
### Invested User

<img src="" width="80%" />
]

---

## Feedback as a community member

.pull-left[
#### Acknowledge author effort

#### Give feedback in good faith

#### Focus on community benefits and system-level solutions

<img src="assets/1728_TURI_Book sprint_11 community_040619.jpg" width="100%" />
]

--

.pull-right[
> #### _Help build convention on what form a Reproducible paper should take and how we should be able to use it_
]

---
class: inverse, center, middle

# What did we learn?

---

## N8 CIR ReproHack Series Stats

- ### 38 papers submitted so far

- ### Total of ~70 participants

- ### 39 completed reviews over 27 papers

---

## Review Scores

<img src="user2020_files/figure-html/unnamed-chunk-23-1.svg" width="90%" />

---

## Positives vs challenges

<img src="user2020_files/figure-html/unnamed-chunk-24-1.svg" width="90%" />

---

## Trade-offs

.pull-left[
<img src="user2020_files/figure-html/unnamed-chunk-25-1.svg" width="90%" />
]

.pull-right[
<img src="user2020_files/figure-html/unnamed-chunk-26-1.svg" width="90%" />
]

---

## ReproHacks are fun

<img src="assets/n8-reprohack_collage.gif" width="90%" />

---

.pull-left[
### Opportunity for peer skill sharing

- CCMcr: Contributing to open source
- Leiden: Syncing GitHub repositories with Zenodo
- Remote ReproHack: Docker school
]

--

.pull-right[
## Fit for purpose

<blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">On the way home, <a href="">@df3n5</a> said quite rightly, if all [code-producing/data-analysing] researchers would take part in at least one <a href="">@ReproHack</a>, the code reproducibility and quality of documentation would generally soar!</p>&mdash; Durham University Advanced Research Computing (@ARC_DU) <a href="">January 22, 2020</a></blockquote> <script async src="" charset="utf-8"></script>
]

---
class: inverse, center, middle

# The Way Forward
---

.pull-left[
## Define

## Create

## Review
]

--

.pull-right[
# Practice
]

---
class: inverse, center, middle

# Define: Research Compendium

---

### The concept of a Research Compendium

> “...We introduce the **concept of a compendium** as both a **container for the different elements** that make up the document and its computations (i.e. text, code, data, ...), and as a **means for distributing, managing and updating the collection**.”

[_Gentleman and Temple Lang, 2004_](

--

#### Principles

- Stick with peer conventions
- Keep data, methods and outputs separate
- Specify the computational environment as clearly as possible

---

### Research compendia in R

**Ben Marwick, Carl Boettiger & Lincoln Mullen (2018)** [_Packaging Data Analytical Work Reproducibly Using R (and Friends)_](

> R package structure is an excellent way for sharing research compendia.

--

### Convention ➡️

### Automation, templates and checklists ✨

---
class: inverse, middle, center

# Create: `rrtools`

---

# `rrtools`: Creating Compendia in R

### "The goal of rrtools is to provide **instructions, templates, and functions** for making a **basic compendium** suitable for writing **reproducible research with R**."

#### Install [`rrtools`]( from GitHub

```r
# install.packages("devtools")
devtools::install_github("benmarwick/rrtools")
```

---

# Create compendium

```r
rrtools::create_compendium("~/Documents/workflows/rrcompendium")
```

```
✔ Setting active project to '/Users/Anna/Documents/workflows/rrcompendium'
✔ Creating 'R/'
✔ Creating 'man/'
✔ Writing 'DESCRIPTION'
✔ Writing 'NAMESPACE'
✔ Writing 'rrcompendium.Rproj'
✔ Adding '.Rproj.user' to '.gitignore'
✔ Adding '^rrcompendium\\.Rproj$', '^\\.Rproj\\.user$' to '.Rbuildignore'
✔ Opening new project 'rrcompendium' in RStudio
✔ The package rrcompendium has been created
✔ Opening the new compendium in a new RStudio session...

Next, you need to: ↓ ↓ ↓
● Edit the DESCRIPTION file
● Use other 'rrtools' functions to add components to the compendium
```

---

## `DESCRIPTION` file

#### Package metadata

```yaml
Package: rrcompendiumDTB
Title: Partial Reproduction of Boettiger Ecology Letters 2018;21:1255–1267 with rrtools
Version:
Authors@R:
    person(given = "Anna",
           family = "Krystalli",
           role = c("aut", "cre"),
           email = "")
Description: This repository contains the research compendium of the
    partial reproduction of Boettiger Ecology Letters 2018;21:1255–1267.
    The compendium contains all data, code, and text associated with
    this sub-section of the analysis.
```

---

# Prepare for sharing

```r
rrtools::use_readme_rmd()
```

.pull-left[
```
✔ Creating 'README.Rmd' from template.
✔ Adding 'README.Rmd' to `.Rbuildignore`.
● Modify 'README.Rmd'
✔ Rendering README.Rmd to for GitHub.
✔ Adding code of conduct.
✔ Creating '' from template.
✔ Adding '' to `.Rbuildignore`.
✔ Adding instructions to contributors.
✔ Creating '' from template.
✔ Adding '' to `.Rbuildignore`.
```
]

.pull-right[

]

---

# Create analysis folder

```r
rrtools::use_analysis()
```

```
✔ Adding bookdown to Imports
✔ Creating 'analysis' directory and contents
✔ Creating 'analysis'
✔ Creating 'analysis/paper'
✔ Creating 'analysis/figures'
✔ Creating 'analysis/templates'
✔ Creating 'analysis/data'
✔ Creating 'analysis/data/raw_data'
✔ Creating 'analysis/data/derived_data'
✔ Creating 'references.bib' from template.
✔ Creating 'paper.Rmd' from template.

Next, you need to: ↓ ↓ ↓ ↓
● Write your article/report/thesis, start at the paper.Rmd file
● Add the citation style library file (csl) to replace the default provided here, see
● Add bibliographic details of cited items to the 'references.bib' file
● For adding captions & cross-referencing in an Rmd, see
● For adding citations & reference lists in an Rmd, see
```

---

# Capturing dependencies

```r
rrtools::add_dependencies_to_description()
```

```
Imports:
    bookdown,
    ggplot2 (>= 3.0.0),
    ggthemes (>= 3.5.0),
    here (>= 0.1),
    knitr (>= 1.20),
    rticles (>= 0.6)
```

--

Paper using `renv` / `packrat` & docker: <>

---

# Further Helpers

## 📦 `rticles`

Contains a **suite of custom R Markdown templates for popular journals**, simplifying the creation of documents that conform to research paper submission standards.

---

# 📦 `citr`

RStudio Add-in to **Insert Markdown Citations**

<img src="assets/citr-insert.png" width="700px">

---
class: inverse, center, middle

## Is sharing this enough?

---

### Case Study: Sharing a Geospatial Analysis in R

***

#### On a computer without System Library `GDAL`

.pull-left[
```r
package ‘rgdal’ successfully unpacked and MD5 sums checked
configure: gdal-config: gdal-config
checking gdal-config usability...
./configure: line 1353: gdal-config: command not found
no
*Error: gdal-config not found
...
*ERROR: configuration failed for package ‘rgdal’
```
]

.pull-right[
<br>
<br>



.img-attr[slide: [_Karthik Ram: rstudio::conf 2019 talk_](]
]

---

# What are Docker containers?

> #### standardized units of software that **package up everything needed to run an application:** _code, runtime, system tools, system libraries_ and settings in a lightweight, standalone, executable package

--

- #### **Dockerfile**: Text file containing the recipe for setting up the computational environment.

- #### **Docker Image**: Executable **built** from the **Dockerfile** with all required dependencies installed. Many images can be built from the same `Dockerfile`.
- #### **Docker Container**: **Docker Images** become containers at **runtime**

.center[
<img src="assets/docker_workflow.png" width="55%" />
]

---

# Rocker on DockerHub

#### using the `rocker/geospatial` Docker Image

***

.pull-left[

]

.pull-right[
<br>
<br>

<img src="assets/reproducible-data-analysis_042.png" width="90%" />

.img-attr[slide: [_Karthik Ram: rstudio::conf 2019 talk_](]
]

---

# Create Dockerfile w/ `rrtools`

```r
rrtools::use_dockerfile()
```

```r
✔ Creating 'Dockerfile' from template.
✔ Adding 'Dockerfile' to `.Rbuildignore`.
● Modify
Next:
 * Edit the dockerfile with your name & email
 * Edit the dockerfile to include system dependencies, such as linux libraries that are needed by the R packages you're using
 * Check the last line of the dockerfile to specify which Rmd should be rendered in the Docker container, edit if necessary
```

---

# `Dockerfile`

```bash
# get the base image, the rocker/verse has R, RStudio and pandoc
FROM rocker/verse:3.6.0

# required
*MAINTAINER Anna Krystalli <>

COPY . /rrcompendiumDTB

# go into the repo directory
RUN . /etc/environment \
  # Install linux dependencies here
  # e.g. need this for ggforce::geom_sina
  && sudo apt-get update \
  && sudo apt-get install libudunits2-dev -y \
  # build this compendium package
  && R -e "devtools::install('/rrcompendiumDTB', dep=TRUE)" \
  # render the manuscript into a docx, you'll need to edit this if you've
  # customised the location and name of your main Rmd file
* && R -e "rmarkdown::render('/rrcompendiumDTB/analysis/paper/paper.Rmd')"
```

---

# Docker + Travis

## Create `.travis.yml`

```r
rrtools::use_travis()
```

```r
✔ Creating '.travis.yml' from template.
✔ Adding '.travis.yml' to `.Rbuildignore`.
Next:
 * Add a travis shield to your README.Rmd:
 [](
 * Turn on travis for your repo at
** To connect Docker, go to, and add your environment
*variables: DOCKER_EMAIL, DOCKER_USER, DOCKER_PASS to enable pushing to the
*Docker Hub
```

---

# `.travis.yml`

```bash
env:
  global:
  - REPO=$DOCKER_USER/rrcompendiumdtb
sudo: required
warnings_are_errors: false
language: generic
services:
  - docker
before_install:
* - docker build -t $REPO .
```

Create & build the image using the Dockerfile, i.e. compile the package and render the Rmd to a Word document

---

# `.travis.yml`

Push our custom Docker image to Docker Hub, env vars stored on

```bash
after_success:
* - docker login -u $DOCKER_USER -p $DOCKER_PASS
  - export REPO=$DOCKER_USER/rrcompendiumdtb
  - export TAG=`if [ "$TRAVIS_BRANCH" == "master" ]; then echo "latest"; else echo $TRAVIS_BRANCH ; fi`
  - docker build -f Dockerfile -t $REPO:$COMMIT .
  - docker tag $REPO:$COMMIT $REPO:$TAG
  - docker tag $REPO:$COMMIT $REPO:travis-$TRAVIS_BUILD_NUMBER
* - docker push $REPO
```

#### Travis repository settings

<img src="assets/travis_docker_settings.png" width="60%" />

---

# Travis build passes!



.center[
[](
]

---

# Image on Dockerhub

<img src="assets/rrcompendiumDTB_dockerhub.png" width="90%" />

.center[
##### Docker Image: <>

##### Compendium Repository: <>
]

---
class: inverse, middle, center

# On reproducible lab culture

---

# Documentation

### The heart of communities of practice

--

## Turing Way

<>

- Great source of general best practice.

--

### Needs to be translated to on-the-ground lab practice guidelines.

---

## Templates, checklists and automation in the lab

### Define and document lab-level procedures & conventions.

- Clear and complete on-boarding.
- Guidance on creating and managing digital research outputs.
- Clear off-boarding procedures including archiving of generated materials.
--

### Basics can be templated and provided in customisable formats

---

# Checklib

<img src="assets/checklib.png" width="90%" />

---
class: inverse, center, middle

# On the future of Reviewing

---
background-image: url(assets/imagine_review.png)
background-size: contain
class: middle, center

--

<img src="assets/ropensci_icon_lettering_color.png" width="65%" />

---

## On the scope of reproducibility

.pull-left[
- #### Reproducibility _ad infinitum_
  + **UNREALISTIC**
]

---

## On the scope of reproducibility

.pull-left[
- #### Reproducibility _ad infinitum_
  + **UNREALISTIC**

- #### Reproducibility for 2-3 years post-publication
  + **MORE REALISTIC**
  + Checked as part of publication process, e.g. CODE CHECK <> [](
]

.pull-right[
<embed src="assets/codecheck.pdf" width="90%" height="500" type="application/pdf" />
]

---

## On the scope of reusability

### Openness can help:

- surface useful parts of code.
- facilitate user feedback and contribution

--

### MAINTENANCE?!

---
class: inverse, middle, center

# In the meantime

### take any opportunity to practice!

---
background-image: url(assets/reprohack_many_ways.png)
background-size: cover

---
background-image: url(assets/reprohack_hub.png)
background-size: cover

--

.center[
<img src="assets/n8cir-logo-v1-cropped-224x63.png" width="30%" />

<img src="assets/reprohack_hub_participate.png" width="90%" />
]

---

## Interested in ReproHacking?

### [reprohack/reprohack-hq]( GH repository

#### Chat to us: [](

### Host your own event!

### Submit your own papers!

---

## tl;dr

--

- ### Challenges remain in moving from theory to practice

--

- ### We need to clearly define our expectations of a research compendium

--

- ### This will allow us to develop tools and templates

--

- ### ReproHacks provide great opportunities to practice

---
class: inverse, center, middle

## Thanks!

---

## Resources

- [**The Turing Way**]( a lightly opinionated guide to reproducible data science.
- [**Statistical Analyses and Reproducible Research**]( Gentleman and Temple Lang's introduction of the concept of Research Compendia.

- [**Packaging data analytical work reproducibly using R (and friends)**]( how researchers can improve the reproducibility of their work using research compendia based on R packages and related tools.

- [How to Read a Research Compendium]( Introduction to existing conventions for research compendia and suggestions on how to utilise their shared properties in a structured reading process.

- [Reproducible Research in R with rrtools]( Workshop: Create a research compendium around materials associated with a published paper (text, data and code) using `rrtools`.

- [**Example Compendium**]( Demo research compendium.

---

# Acknowledgements

Images throughout the slides watermarked with **Scriberia** were created by [Scriberia]( for The Turing Way community and are used under a CC-BY licence

- _The Turing Way Community, & Scriberia.