Course description

To ensure the robustness of outputs and maximise the benefits of ACCE research to future researchers and society more generally, it is important to share the underlying code and data. But for sharing to have any impact, such materials need to be FAIR (findable, accessible, interoperable, reusable), i.e. they must be adequately described, archived, and made discoverable to an appropriate standard.

Additionally, if analyses are to be deemed robust, they must be at the very least reproducible, but ideally well documented and reviewable.

R and RStudio tools and conventions offer a powerful framework for making modern, open, reproducible and collaborative computational workflows more accessible to researchers.

This course focuses on data and project management through R and RStudio. It will introduce students to best practice and equip them with modern tools and techniques for managing data and computational workflows to their full potential. The course is designed to be relevant to students with a wide range of backgrounds, from those working with relatively small sets of data collected from field or experimental observations to those taking a more computational approach with bigger datasets.


Learning Outcomes

By the end of the workshop, participants will be able to:

  • Understand the basics of good research data management and produce clean datasets with appropriate metadata.

  • Manage computational projects for reproducibility, reuse and collaboration.

  • Use version control to track the evolution of research projects.

  • Use R tools and conventions to document code and analyses and produce reproducible reports.

  • Publish and share materials and collaborate through the web.

  • Understand why this all matters!


Course Outline

Welcome

  • Introduction

Basics

  • Intro to R & RStudio
  • R basics
  • Data types, structures & classes
  • Indexing and Subsetting
  • The tidyverse way
  • Data management basics

Project Management

  • Projects in RStudio
  • Good File Naming
  • Paths and project structure

Data Munging

  • Iteration
  • Merging data
  • Functions

Metadata

  • Intro to metadata
  • Creating metadata with dataspice

Analysing & Presenting data

  • Plotting basics
  • Literate programming

Version Control

  • Version control with Git
  • Collaboration through GitHub

Optional

Packaging Code

  • Writing & documenting functions
  • Capturing metadata incl. dependencies
  • Checking & Testing functions

Putting it all together: a Research Compendium

  • Creating a research compendium

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.


Sources of Materials

The first few chapters of the Basics section were heavily sourced and adapted from “Software Carpentry: R for Reproducible Scientific Analysis.” Thomas Wright and Naupaka Zimmerman (eds): Version 2016.06, June 2016 https://github.com/swcarpentry/r-novice-gapminder, DOI.


The Good File Naming chapter was heavily sourced from “File organization for reproducible research.” Data Carpentry Reproducible Research Committee. 2016.


Small sections in the Data Munging section were inspired by text in the online version of “R 4 Data Science”, Garrett Grolemund & Hadley Wickham.


Images contained throughout the materials and watermarked with Scriberia were sourced from “Illustrations from the Turing Way book dashes”, DOI. Images were created by Scriberia for The Turing Way community.


Data for the main practical parts of the course were sourced from the NEON Data Portal, provided by the National Ecological Observatory Network. 2019 Provisional data downloaded from http://data.neonscience.org on 2019-08-06. Battelle, Boulder, CO, USA.

  • Data Products: NEON.DOM.SITE.DP1.10098.001
  • Name: Woody plant vegetation structure
  • Description: Structure measurements, including height, canopy diameter, and stem diameter, as well as mapped position of individual woody plants
  • Query information:
    • Start Date-Time for Queried Data: 2018-08-15 16:00 (UTC)
    • End Date-Time for Queried Data: 2018-08-29 16:00 (UTC)
    • Domains: D01:D9
  • LICENSE
  • Disclaimer
 THE NEON DATA PRODUCTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE NEON DATA PRODUCTS BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE NEON DATA PRODUCTS.


Materials for the Research Compendium section were sourced from Carl Boettiger. (2018, April 17). cboettig/noise-phenomena: Supplement to: “From noise to knowledge: how randomness generates novel phenomena and reveals information” DOI, accompanying the publication: Carl Boettiger. From noise to knowledge: how randomness generates novel phenomena and reveals information. Published in Ecology Letters, 22 May 2018 https://doi.org/10.1111/ele.13085.