class: top, left, inverse

## ACCE DTP

### _Reproducible Research Data and Project Management in R_

***

.bottom[
# Metadata

<br>

**<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path d="M400 64h-48V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H160V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V112c0-26.5-21.5-48-48-48zm-6 400H54c-3.3 0-6-2.7-6-6V160h352v298c0 3.3-2.7 6-6 6z"></path> </svg> April-May 2021**

<br>

**<svg viewBox="0 0 288 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path d="M112 316.94v156.69l22.02 33.02c4.75 7.12 15.22 7.12 19.97 0L176 473.63V316.94c-10.39 1.92-21.06 3.06-32 3.06s-21.61-1.14-32-3.06zM144 0C64.47 0 0 64.47 0 144s64.47 144 144 144 144-64.47 144-144S223.53 0 144 0zm0 76c-37.5 0-68 30.5-68 68 0 6.62-5.38 12-12 12s-12-5.38-12-12c0-50.73 41.28-92 92-92 6.62 0 12 5.38 12 12s-5.38 12-12 12z"></path> </svg> Online**
]

---

## You got data. Is it enough?

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="">@tomjwebb</a> I see tons of spreadsheets that i don't understand anything (or the stduent), making it really hard to share.</p>— Erika Berenguer (@Erika_Berenguer) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="">@tomjwebb</a> <a href="">@ScientificData</a> "Document. Everything."
Data without documentation has no value.</p>— Sven Kochmann (@indianalytics) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

---

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="it" dir="ltr"><a href="">@tomjwebb</a> Annotate, annotate, annotate!</p>— CanJFishAquaticSci (@cjfas) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="und" dir="ltr">Document all the metadata (including protocols).<a href="">@tomjwebb</a></p>— Ward Appeltans (@WrdAppltns) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

---

<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">You download a zip file of <a href="">#OpenData</a>. Apart from your data file(s), what else should it contain?</p>— Leigh Dodds (@ldodds) <a href="">February 6, 2017</a></blockquote> <script async src="//" charset="utf-8"></script>

---

## **#otherpeoplesdata dream match!**

#### **Thought experiment: Imagine a dream open data set**

#### **How would you locate it?**

- what details would you need to know to determine relevance?
- what information would you need to know to use it?
<img src="assets/img/missing-unicorn.jpg" height="300px">

---

class: top, right, inverse

# metadata = data about data

***

---

> ### _"Information that **describes, explains, locates**, or in some way makes it easier to **find, access**, and **use** a resource (in this case, data)."_

.pull-left[
<img src="" width="200px">
]

.pull-right[
### **Data Reuse Checklist**

<img src="" width="300px">
]

> ### **Backbone of digital curation**
>
> **Without it, a digital resource may be irretrievable, unidentifiable or unusable.**

---

### **Descriptive**

- enables **identification, location** and **retrieval** of data; often includes the use of **controlled vocabularies** for classification and indexing.

### **Technical**

- describes the **technical processes** used to **produce**, or required to **use**, a digital data object.

### **Administrative**

- used to manage **administrative aspects** of the digital object, e.g. **intellectual property rights and acquisition.**

---

## **Elements of metadata**

- #### **Structured data files:**
    - readable by machines and humans, accessible through the web
- #### **Controlled vocabularies**, e.g. [NERC Vocabulary server](
    - allow for connectivity of data

### **KEY TO SEARCH FUNCTION**

- By structuring metadata & adhering to controlled vocabularies, data can be **combined, accessed** and **searched!**

- **Different communities** develop **different standards**, which define both the structure and content of metadata

---

class: top, right, inverse

# metadata in research

***

---

## Identifying the right metadata standard

- **General:** Dublin Core Metadata Initiative [Specification](

- **[NERC Data Centers:](** Check with individual data centers for their metadata specification.

- **[](** Registry of Research Data Repositories.
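To make "structured" and "controlled vocabulary" concrete, here is a minimal base-R sketch. It stores a dataset description under Dublin Core element names and checks that no field falls outside the 15 elements of the Dublin Core Metadata Element Set (version 1.1). The record values are hypothetical placeholders. Writing the record as a two-column CSV is just one simple choice for illustration, not a serialisation the Dublin Core specification prescribes.

```r
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1
dc_elements <- c("contributor", "coverage", "creator", "date", "description",
                 "format", "identifier", "language", "publisher", "relation",
                 "rights", "source", "subject", "title", "type")

# Hypothetical record for an illustrative dataset: every value is a placeholder
record <- c(
  title       = "Example field survey dataset",
  creator     = "A. Researcher",
  date        = "2021-04-01",
  description = "Counts of species recorded along transects.",
  format      = "text/csv",
  language    = "en",
  subject     = "ecology; biodiversity",
  coverage    = "Example study area; 2019-2020"
)

# Enforce the controlled list of field names: stop if any name is not a DC element
unknown <- setdiff(names(record), dc_elements)
if (length(unknown) > 0) {
  stop("Not Dublin Core elements: ", paste(unknown, collapse = ", "))
}

# Store the record as a key/value table: a human can open it in a
# spreadsheet, a machine can parse it with read.csv()
dc_table <- data.frame(element = names(record), value = unname(record))
out_file <- file.path(tempdir(), "dublin_core.csv")
write.csv(dc_table, out_file, row.names = FALSE)

# Reading the file back recovers the same structured record
read_back <- read.csv(out_file, stringsAsFactors = FALSE)
print(read_back)
```

Because every field name comes from a shared list, two records written this way can be combined or searched field by field. That shared list is what the controlled vocabulary buys you.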
---

### **Seek help from support teams**

Most university libraries have staff dedicated to Research Data Management:

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="">@tomjwebb</a> <a href="">@ScientificData</a> Talk to their librarian for data management strategies <a href="">#datainfolit</a></p>— Yasmeen Shorish (@yasmeen_azadi) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

---

# Key metadata:

## the bare minimum

### document **data coverage** information

- **taxonomic coverage**: a table of **taxonomic information on the species in the data**
    - also record the taxonomic authority / source
- **temporal coverage**: temporal range and resolution details
- **spatial coverage**:
    + a human-readable geographic description of the study area
    + spatial range and resolution details
    + include depth (marine/freshwater) or altitudinal (terrestrial) information

Make sure to record units!

---

### document protocols in a `methods` document

Keep a dynamic document used to **plan**, **record** and **write up** methods.

<blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="">@tomjwebb</a> record every detail about how/where/why it is collected</p>— Sal Keith (@Sal_Keith) <a href="">January 16, 2015</a></blockquote> <script async src="//" charset="utf-8"></script>

**Any additional information other users would need to combine your data with theirs? Record it!**

---

class: inverse, middle, center

# Practical metadata

***

---

# [ACCE DTP RDM]( course

<br>

Teaching this course has always felt challenging in terms of practical exercises:

--

- **Defining** metadata & **explaining its importance**: ✅

--

- Advising on domain-specific **controlled vocabularies** & **structure**: ✅

- How can we practice creating metadata?

---

# [rOpenSci Unconf 18](

##### May 21 - 22, 2018. Seattle

<img src="assets/seattle.svg" height="65%" />

---

# rOpenSci Unconf mission

> bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits to get together for a few days and hack on various projects.

<br>

#### Ideas for projects submitted through GitHub [**issues**]( in the [**runconf18** repo](

---

## issue [#72](

<img src="assets/issue.png" width="100%">

---

# Metadata team!

------------

Luckily, a **whole bunch of other awesome folks** were also thinking about these topics and interested in working on them! 🤩

(in alphabetical order):

- [Carl Boettiger](
- [Scott Chamberlain](
- [Auriel Fournier]( #[41](
- [Kelly Hondula](
- [Anna Krystalli](
- [Bryce Mecum](
- [Maëlle Salmon](
- [Kate Webbink]( #[52](
- [Kara Woo]( #[68](

---

# [rOpenSciLabs]( pkg [**`dataspice`**](

> Package [**`dataspice`**]( makes it easier for researchers to **create basic, lightweight and concise metadata files for their datasets**.

<br>

- Metadata **collected in `csv` files**

--

- Metadata fields are **based on [](**
    + which underlies the Google [Datasets]( metadata specification

--

- Helper functions and Shiny apps to **extract and edit metadata files**.

--

- Ability to produce:
    + a **structured JSON-LD metadata file**.
    + a helpful dataset **README webpage**.

<br>

---

### [Google unveils search engine for open data](

#### _The tool, called Google Dataset Search, should help researchers to find the data they need more easily._

##### Nature NEWS - 05 SEPTEMBER 2018

<img src="assets/google_search.png" width="100%">

<br>

***

---

## `dataspice` tutorial

<br>

The goal of this section is to provide a **practical exercise in creating metadata** for an **example field-collected data product** using package `dataspice`.

- Understand basic metadata and why it is important

--

- Understand where and how to store them

--

- Understand how they can feed into more complex metadata objects.
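`dataspice` keeps metadata in plain CSV templates. To our knowledge, its attributes template has one row per variable, with columns such as `fileName`, `variableName`, `description` and `unitText`. The package's `prep_attributes()` pre-fills the file and variable names, and you then add descriptions and units by hand or in the `edit_attributes()` Shiny app. The base-R sketch below only imitates that step, so the idea is visible without installing the package. The survey data, descriptions and folder layout are hypothetical, and the package documentation remains the reference for the real workflow.

```r
# Hypothetical field-collected data (placeholder values, not a real dataset)
survey <- data.frame(
  site_id = c("A1", "A2", "B1"),
  date    = c("2020-06-01", "2020-06-01", "2020-06-02"),
  depth_m = c(1.5, 3.0, 2.2)
)
data_dir <- tempdir()
write.csv(survey, file.path(data_dir, "survey.csv"), row.names = FALSE)

# One row per variable, mirroring a dataspice-style attributes table.
# fileName and variableName can be filled automatically from the data...
attrs <- data.frame(
  fileName     = "survey.csv",
  variableName = names(survey),
  description  = NA_character_,
  unitText     = NA_character_,
  stringsAsFactors = FALSE
)

# ...while descriptions and units have to be written by a human
# (illustrative text only)
attrs$description <- c("Sampling site identifier",
                       "Sampling date (YYYY-MM-DD)",
                       "Water depth at the sampling point")
attrs$unitText <- c(NA, NA, "metres")

# Store the metadata next to the data, in a metadata/ subfolder
meta_dir <- file.path(data_dir, "metadata")
dir.create(meta_dir, showWarnings = FALSE)
write.csv(attrs, file.path(meta_dir, "attributes.csv"),
          row.names = FALSE, na = "")
```

Keeping metadata as CSV means anyone can fill it in with a spreadsheet program. In the package itself, `write_spice()` then turns the completed tables into a structured JSON-LD file.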
---

## `dataspice` workflow

<img src="" width="100%" />

---

class: inverse, right, center

# Practical

***

--

### time for some live coding 😱

<br>

<br>

***

_head to the [tutorial](#dataspice)_

---

class: inverse, right, center

# Outro

***

---

# Additional metadata tips

- ### The approach we went for is very general / minimal

--

- #### You can **make your datasets more discoverable** by developing **richer, more domain-specific metadata** files.

--

- e.g. create [Ecological Metadata Language (EML)]( metadata using the R package [`EML`](

--

- deposit your data at [KNB](

--

- KNB allows richer [search and presentation of metadata](

---

# KNB data portal

## Powerful search

<img src="assets/knb.png" width="60%" />

---

# KNB data portal

## Rich interactive metadata

<img src="assets/knb_attributes.png" width="60%" />

---

# Parting words

--

- #### Any metadata documentation is better than none

--

- #### Start small and build up to more complex standards 🎯

--

- #### But make sure to cover the bare minimum ⚠️

--

- #### Reach out for help from your local librarians or try the [rOpenSci discussion board](

--

## 🎯

## Get back [home](