Course description

To ensure the robustness of outputs and maximise the benefits of ACCE research to future researchers and to society more generally, it is important to share the underlying code and data. But for sharing to have any impact, such materials need to be made FAIR (Findable, Accessible, Interoperable and Reusable), i.e. they must be adequately described, archived, and made discoverable to an appropriate standard.

Additionally, if analyses are to be deemed robust, they must at the very least be reproducible and, ideally, also well documented and reviewable.

R and RStudio tools and conventions offer a powerful framework for making modern, open, reproducible and collaborative computational workflows more accessible to researchers.

This course focuses on data and project management through R and RStudio. It introduces students to best practice and equips them with modern tools and techniques for managing data and computational workflows to their full potential. The course is designed to be relevant to students from a wide range of backgrounds, from those working with relatively small datasets collected through field or experimental observations to those taking a more computational approach with bigger datasets.

Learning Outcomes

By the end of the workshop, participants will be able to:

Understand the basics of good research data management and produce clean datasets with appropriate metadata.
Manage computational projects for reproducibility, reuse and collaboration.
Use version control to track the evolution of research projects.
Use R tools and conventions to document code and analyses and produce reproducible reports.
Publish and share materials and collaborate through the web.
Understand why this all matters!

Course Outline

Day 1: 30th April

10:00 - 16:30

OPTIONAL

Welcome

Basics

Intro to R & RStudio
R basics
Data types, structures & classes
Indexing and Subsetting
The tidyverse way

Day 2: 4th May

09:00 - 17:00

Project Management

Data management basics
Projects in RStudio
Good File Naming
Paths and project structure

Data Munging

Iteration
Merging data
Functions

Day 3: 5th May

09:00 - 17:00

Metadata

Intro to metadata
Creating metadata with dataspice

Analysing & Presenting data

Plotting basics
Literate programming

Day 4: 7th May

09:00 - 17:00

Version Control

Version control with Git
Collaboration through GitHub

Packaging Code

Writing & documenting functions
Capturing metadata incl. dependencies
Checking & Testing functions

Putting it all together: Research Compendia

Creating a research compendium

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sources of Materials

The first few chapters of the Basics section were heavily sourced and adapted from “Software Carpentry: R for Reproducible Scientific Analysis.” Thomas Wright and Naupaka Zimmerman (eds): Version 2016.06, June 2016. https://github.com/swcarpentry/r-novice-gapminder.

Licensed under CC-BY 4.0 2018–2020 by The Carpentries.

The Good File Naming chapter was heavily sourced from “File organization for reproducible research.” Data Carpentry Reproducible Research Committee. 2016.

Licensed under CC-BY 4.0 2018–2020 by The Carpentries.

Small sections in the Data Munging section were inspired by text in the online version of “R for Data Science”, Garrett Grolemund & Hadley Wickham.

Licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

Images contained throughout the materials and watermarked with Scriberia were sourced from “Illustrations from the Turing Way book dashes”. Images were created by Scriberia for The Turing Way community.

Licensed under CC-BY 4.0 by The Turing Way.

Data for the main practical parts of the course were sourced from the NEON Data Portal, provided by the National Ecological Observatory Network. 2019 Provisional data downloaded from http://data.neonscience.org on 2019-08-06. Battelle, Boulder, CO, USA.

Data Products: NEON.DOM.SITE.DP1.10098.001
Name: Woody plant vegetation structure
Description: Structure measurements, including height, canopy diameter, and stem diameter, as well as mapped position of individual woody plants
Query information:
- Start Date-Time for Queried Data: 2018-08-15 16:00 (UTC)
- End Date-Time for Queried Data: 2018-08-29 16:00 (UTC)
- Domains: D01:D9
LICENSE
Disclaimer

 THE NEON DATA PRODUCTS ARE PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE NEON DATA PRODUCTS BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE NEON DATA PRODUCTS.

Materials for the Research Compendium section were sourced from Carl Boettiger. (2018, April 17). cboettig/noise-phenomena: Supplement to: “From noise to knowledge: how randomness generates novel phenomena and reveals information”, accompanying the publication: Carl Boettiger. From noise to knowledge: how randomness generates novel phenomena and reveals information. Published in Ecology Letters, 22 May 2018. https://doi.org/10.1111/ele.13085.

Reproducible Research Data and Project Management in R