Reproducible Research Data & Project Management in R
In order to ensure robustness of outputs and maximise the benefits of ACCE research to future researchers and society more generally, it is important to share the underlying code and data. But for sharing to have any impact, such materials need to be created FAIR (findable, accessible, interoperable, reusable), i.e. they must be adequately described, archived, and made discoverable to an appropriate standard.
Additionally, if analyses are to be deemed robust, they must be at the very least reproducible, but ideally well documented and reviewable.
R and RStudio tools and conventions offer a powerful framework for making modern, open, reproducible and collaborative computational workflows more accessible to researchers.
This course focuses on data and project management through R and RStudio. It will introduce students to best practice and equip them with modern tools and techniques for managing data and computational workflows to their full potential. The course is designed to be relevant to students with a wide range of backgrounds, from those working with relatively small datasets collected from field or experimental observations to those taking a more computational approach with bigger datasets.
By the end of the workshop, participants will be able to:
- Understand the basics of good research data management and produce clean datasets with appropriate metadata.
- Manage computational projects for reproducibility, reuse and collaboration.
- Use version control to track the evolution of research projects.
- Use R tools and conventions to document code and analyses and produce reproducible reports.
- Publish and share materials and collaborate through the web.
- Understand why this all matters!
- Intro to R & RStudio
- R basics
- Data types, structures & classes
- Indexing and Subsetting
- The tidyverse way
- Data management basics
- Projects in RStudio
- Good File Naming
- Paths and project structure
- Merging data
- Intro to metadata
- Creating metadata with
Analysing & Presenting data
- Plotting basics
- Literate programming
- Version control with Git
- Collaboration through GitHub
- Writing & documenting functions
- Capturing metadata incl. dependencies
- Checking & Testing functions
Putting it all together: a Research Compendium
- Creating a research compendium
This work is licensed under a Creative Commons Attribution 4.0 International License.
Sources of Materials
The first few chapters of the Basics section were heavily sourced and adapted from “Software Carpentry: R for Reproducible Scientific Analysis.” Thomas Wright and Naupaka Zimmerman (eds): Version 2016.06, June 2016, https://github.com/swcarpentry/r-novice-gapminder.
The Good File Naming chapter was heavily sourced from “File organization for reproducible research.” Data Carpentry Reproducible Research Committee. 2016.
Small sections in the Data Munging section were inspired by text in the online version of “R for Data Science”, Garrett Grolemund & Hadley Wickham.
- Licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.
Images contained throughout the materials and watermarked with Scriberia were sourced from “Illustrations from the Turing Way book dashes”. Images were created by Scriberia for The Turing Way community.
Data for the main practical parts of the course were sourced from the NEON Data Portal, provided by the National Ecological Observatory Network. 2019 Provisional data downloaded from http://data.neonscience.org on 2019-08-06. Battelle, Boulder, CO, USA.
- Data Products: NEON.DOM.SITE.DP1.10098.001
- Name: Woody plant vegetation structure
- Description: Structure measurements, including height, canopy diameter, and stem diameter, as well as mapped position of individual woody plants
- Query information:
- Start Date-Time for Queried Data: 2018-08-15 16:00 (UTC)
- End Date-Time for Queried Data: 2018-08-29 16:00 (UTC)
- Domains: D01:D9
THE NEON DATA PRODUCTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE NEON DATA PRODUCTS BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE NEON DATA PRODUCTS.
Materials for the Research Compendium section were sourced from Carl Boettiger. (2018, April 17). cboettig/noise-phenomena: Supplement to: “From noise to knowledge: how randomness generates novel phenomena and reveals information”, accompanying the publication: Carl Boettiger. From noise to knowledge: how randomness generates novel phenomena and reveals information. Published in Ecology Letters, 22 May 2018, https://doi.org/10.1111/ele.13085.