class: inverse

# Welcome

***

## Resources

### Course Materials

#### [http://bit.ly/ACCE-book](http://bit.ly/ACCE-book)

### Course Collaborative Notepad

#### [http://bit.ly/acce21-notepad](http://bit.ly/acce21-notepad)

_Please **Sign In** on the Notepad_

---
class: top, left, inverse

## ACCE DTP

### _Reproducible Research Data and Project Management in R_

***

.bottom[
# Introduction & Welcome

<br>

**<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M400 64h-48V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H160V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V112c0-26.5-21.5-48-48-48zm-6 400H54c-3.3 0-6-2.7-6-6V160h352v298c0 3.3-2.7 6-6 6z"></path> </svg> April-May 2021**

<br>

**<svg viewBox="0 0 288 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M112 316.94v156.69l22.02 33.02c4.75 7.12 15.22 7.12 19.97 0L176 473.63V316.94c-10.39 1.92-21.06 3.06-32 3.06s-21.61-1.14-32-3.06zM144 0C64.47 0 0 64.47 0 144s64.47 144 144 144 144-64.47 144-144S223.53 0 144 0zm0 76c-37.5 0-68 30.5-68 68 0 6.62-5.38 12-12 12s-12-5.38-12-12c0-50.73 41.28-92 92-92 6.62 0 12 5.38 12 12s-5.38 12-12 12z"></path> </svg> Online**
]

---

# 👋 Hello

### me: Dr Anna Krystalli

- **Research Software Engineer**, _University of Sheffield_
    + twitter **@annakrystalli**
    + github **@annakrystalli**
    + email **a.krystalli[at]sheffield.ac.uk**

- **Editor [rOpenSci](http://onboarding.ropensci.org/)**

--

## Course support

Also here: **David Wilby** & **Bob Turner**, fellow Sheffield RSEs

---
class: inverse

# Ice Breaker

<br>

### Split into breakout rooms

### Introduce yourselves

### Q: Why did you decide to join this course?

---
class: top, right, inverse

# Why are we here?
***

---

### The paper is the advertisement

> “an article about computational result is advertising, not scholarship. The actual scholarship is the **full software environment, code and data, that produced the result.**”

*John Claerbout paraphrased in [Buckheit and Donoho (1995)](https://statweb.stanford.edu/~wavelab/Wavelab_850/wavelab.pdf)*

--

### [The Scientific Paper Is Obsolete](https://www.theatlantic.com/science/archive/2018/04/the-scientific-paper-is-obsolete/556676/)

Here's what's next

*<small>APR 5, 2018, The Atlantic</small>*

<img src="assets/SciencePaperFlames-New.gif" height="100px" width="350px">

---

### Lessons from the Reproducibility/Replicability crisis

- Many issues are statistical and a result of broken academic incentive systems.
- Much can be tackled by transparency and better computational literacy.

<img src="assets/woes.png" width="450px">

---

### [Reproducible Research in Computational Science](http://science.sciencemag.org/content/334/6060/1226)

ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227

> Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.

<img src="assets/repro-spectrum.jpg" width=550px>

---

## Reinventing discovery by open sourcing science

_Nielsen, Michael. Reinventing Discovery: The New Era of Networked Science. Princeton University Press, 2012. JSTOR, www.jstor.org/stable/j.ctt7s4vx._

.pull-left[
- Sharing resources
- Collective intelligence
- Mass collaboration
]

.pull-right[
<img src="assets/reinventing-innovation.png" height="300px">
]

---

## The internet was built for open science

### Key to next generation networked science

<img src="assets/www.jpg" width="70%" />

---
class: top, right, inverse

# **The grand vision**

---

### Hans Rosling on open data (and data science) back in 2006

.center[
<iframe width="470" height="250" src="https://goo.gl/ry6AiG" frameborder="0" allowfullscreen></iframe>
]

> So how far have we come?
---
class: inverse

## gapminder.org: today

#### liberating stories from data

### www.gapminder.org

---

## gapminder at our fingertips

```r
library(ggplot2)

# `frame = year` is not a ggplot2 aesthetic (ggplot2 warns and ignores it);
# plotly::ggplotly() picks it up to animate the plot by year.
p <- ggplot(gapminder::gapminder,
            aes(gdpPercap, lifeExp, size = pop, color = continent, frame = year)) +
  geom_point() +
  scale_x_log10() +
  theme_bw()
```

```r
plotly::ggplotly(p)
```
---
class: top, right, inverse

# How do we get there?

***

---

## **Research meta-responsibilities**

We need better digital curation of the workhorses of modern science: **code** & **data**

> **aim to create secure materials that are [FAIR](https://www.nature.com/articles/sdata201618)**
> *findable, accessible, interoperable, reusable*

<img src="assets/FAIRPrinciples.jpg" width="70%" />

---

## **Research meta-responsibilities**

***

.pull-left[
- #### Think about traceability and provenance.
- #### Follow community conventions.
- #### Prepare it to share it.
]

.pull-right.center[
### We all need to do our bit!

![](assets/CultureShift.jpg)
]

---

## **Drivers of better digital management**

- Funders: value for money, impact, reputation
- Publishers: many now require code and data.
    + Specialist journals have emerged for:
        + **software**: [Journal of Open Source Software](http://joss.theoj.org/), [MEE](https://besjournals.onlinelibrary.wiley.com/journal/2041210x)
        + **data**: [Scientific Data](https://www.nature.com/sdata/)
- PIs, Supervisors and immediate research group
- Your wider scientific community
- The public

---

## **Yourselves!**

**Be your own best friend:**

.center[![](https://media.giphy.com/media/9Q249Qsl5cfLi/giphy.gif)]

---

### **Ultimately it's about getting a handle on our research materials**

> "Agree on a community convention...then follow it"

.centre[
<img src="assets/img/beer_messy_tidy.png" width="70%" />
]

---

## The concept of a Research Compendium

.pull-left[
> “...We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection.”
[_Gentleman and Temple Lang, 2004_](https://biostats.bepress.com/bioconductor/paper2/)
]

.pull-right[
.centre[
<img src="assets/ResearchCompendium.jpg" width="50%" />
]
]

---

<img src="assets/reproducible-data-analysis-02.png" width="90%" />

[_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)

---

<img src="assets/reproducible-data-analysis-04.png" width="90%" />

[_Karthik Ram: rstudio::conf 2019 talk_](https://github.com/karthik/rstudio2019)

---

## R + RStudio

### Next generation data science powerhouse

--

#### Backed by a diverse and active community of learners, users and developers

<img src="https://www.rfordatasci.com/img/carousel/logo-big.png" height="150px"><img src="https://software-carpentry.org/files/2017/12/satrday-logo.png" width="150px" height="150px"><img src="https://rladies.org/wp-content/uploads/2016/12/R-LadiesGlobal.png" width="150px" height="150px"><img src="https://forwards.github.io/images/forwards.svg" width="150px" height="150px"><img src="https://github.com/ropensci/dev_guide/raw/master/images/icon_short_color.png" width="150px" height="150px">

---

## Back to "Why are we here?"

- To show you how to use R + RStudio to perform reproducible data analyses.

--

- To help you make the most of the real workhorses of your work, **YOUR CODE & DATA**!

--

- To help you be empowered by modern tools & technologies rather than be overwhelmed by them

--

- To help you lead the culture change rather than be burdened by increased requirements

--

- Ultimately, to **change how science works for the better for everyone**!

---

- We'll do this by introducing you to **useful data and software tools and best practices**.
---

# Course Outline

.pull-left[
- ### Welcome
- ### Basics
- ### Project Management
- ### Data Munging
]

.pull-right[
- ### Metadata
- ### Analysing & presenting data
- ### Version Control
]

***

<br>

#### **We'll take regular breaks and aim to break for an hour for lunch between 12:00 and 13:00**

---

# Before we dive in

- We'll be exploring best practice in data and workflow management. I've tried to focus on concepts and tools that I wish I had known when I started.

--

- We'll explore individual tools and concepts and show how they work nicely together.

--

- We'll be coding together and working in RStudio Cloud.

--

- Feedback: After each day, let me know on the notepad:
    - 💚: something you liked
    - 🔴: something that could be improved

--

- Please feel free to ask questions if I use jargon you don't understand or if you need some clarification. Questions are helpful for everyone! ✨

---

## Working in Blackboard Collaborate

<img src="assets/bb_comms.png" width="60%" />

- Have your mic on mute by default

--

- Please enter questions relevant to the course in the Collaborative notepad under **Participant Questions**

--

- Please ask technical questions in the chat

--

- Please try to help each other!

--

- Use status reaction emojis to communicate how it's going

--

- If you need to get my attention while I'm speaking, raise your hand!

---

# Let's go!

## Get back [home](https://annakrystalli.me/rrresearchACCE20/)