Welcome

Course Outline

Today

  • Introduction
  • Basic Data Hygiene
  • Tidy data & Metadata
  • Project Organisation

Tomorrow

  • Literate Programming with rmarkdown
  • Version Control with Git
  • Collaborating through GitHub
  • Bringing it all together

Why are we here?

The paper is the advertisement

“an article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”

Jon Claerbout, paraphrased in Buckheit and Donoho (1995)

The Scientific Paper Is Obsolete

Here’s what’s next

APR 5, 2018, The Atlantic

Reproducible Research in Computational Science

ROGER D. PENG, SCIENCE 02 DEC 2011 : 1226-1227

Reproducibility has the potential to serve as a minimum standard for judging scientific claims when full independent replication of a study is not possible.


Reinventing Discovery -> Open Sourcing science

  • Sharing resources
  • Collective intelligence
  • Mass collaboration


Open Science == key to next generation science

the internet

built for open science



The grand vision

Hans Rosling on open data (and data science) back in 2006

How do we get there?


21st Century Research meta-responsibilities

Better digital curation of the workhorses of modern science: code & data

  • accessible
  • reusable
  • searchable

    We all need to do our bit


Getting a handle on our research materials




Drivers of better digital management

  • Funders: value for money, impact, reputation
  • Publishers: many now require code and data.
  • Your wider scientific community
  • PIs, Supervisors and immediate research group

Yourselves!

Be your own best friend:

aim to create secure materials that are FAIR: findable, accessible, interoperable, reusable

  • Think about traceability and provenance
  • Follow community conventions
  • Prepare it to share it

Back to “Why are we here?”

  • To help you make the most of the real workhorses of your work, YOUR CODE & DATA!

  • To help you be empowered by modern tools & technologies rather than be overwhelmed by them

  • To help you lead the culture change rather than be burdened by increased requirements

  • Ultimately, to change how science works for the better, for everyone!


Resources


BES guide to data management


This guide for early career researchers explains what data and data management are, and provides advice and examples of best practices in data management, including case studies from researchers currently working in ecology and evolution.


BES guide to reproducible code

A Guide to Reproducible Code covers all the basic tools and information you will need to start making your code more reproducible. We focus on R and Python, but many of the tips apply to any programming language.

Training materials

The Carpentries

  • Domain-specific lessons in Software & Data available free online
    • Ecology materials
    • Genomics materials
    • Geospatial data materials
    • Biology semester-long materials

Before we dive in

  • I’ve tried to focus on concepts and tools that I wish I had known when I started

  • I will try to give a broad overview rather than dig too deeply

  • PLEASE STOP ME if I use jargon you don’t understand or need some clarification.