class: top, left, inverse ## ACCE DTP ### _Reproducible Research Data and Project Management in R_ *** .bottom[ # Metadata <br> **<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M400 64h-48V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H160V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V112c0-26.5-21.5-48-48-48zm-6 400H54c-3.3 0-6-2.7-6-6V160h352v298c0 3.3-2.7 6-6 6z"></path> </svg> April-May 2021** <br> **<svg viewBox="0 0 288 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M112 316.94v156.69l22.02 33.02c4.75 7.12 15.22 7.12 19.97 0L176 473.63V316.94c-10.39 1.92-21.06 3.06-32 3.06s-21.61-1.14-32-3.06zM144 0C64.47 0 0 64.47 0 144s64.47 144 144 144 144-64.47 144-144S223.53 0 144 0zm0 76c-37.5 0-68 30.5-68 68 0 6.62-5.38 12-12 12s-12-5.38-12-12c0-50.73 41.28-92 92-92 6.62 0 12 5.38 12 12s-5.38 12-12 12z"></path> </svg> Online** ] --- ## You got data. Is it enough? <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/tomjwebb">@tomjwebb</a> I see tons of spreadsheets that i don't understand anything (or the stduent), making it really hard to share.</p>— Erika Berenguer (@Erika_Berenguer) <a href="https://twitter.com/Erika_Berenguer/status/556111838715580417">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/tomjwebb">@tomjwebb</a> <a href="https://twitter.com/ScientificData">@ScientificData</a> "Document. Everything." Data without documentation has no value.</p>— Sven Kochmann (@indianalytics) <a href="https://twitter.com/indianalytics/status/556120920881115136">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> --- <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="it" dir="ltr"><a href="https://twitter.com/tomjwebb">@tomjwebb</a> Annotate, annotate, annotate!</p>— CanJFishAquaticSci (@cjfas) <a href="https://twitter.com/cjfas/status/556109252788379649">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="und" dir="ltr">Document all the metadata (including protocols).<a href="https://twitter.com/tomjwebb">@tomjwebb</a></p>— Ward Appeltans (@WrdAppltns) <a href="https://twitter.com/WrdAppltns/status/556108414955560961">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> --- <blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">You download a zip file of <a href="https://twitter.com/hashtag/OpenData?src=hash">#OpenData</a>. Apart from your data file(s), what else should it contain?</p>— Leigh Dodds (@ldodds) <a href="https://twitter.com/ldodds/status/828657155863638016">February 6, 2017</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> --- ## **#otherpeoplesdata dream match!** #### **Thought experiment: Imagine a dream open data set** #### **How would you locate it?** - what details would you need to know to determine relevance? - what information would you need to know to use it? <img src="assets/img/missing-unicorn.jpg" height="300px"> --- class: top, right, inverse # metadata = data about data *** --- > ### _"Information that **describes, explains, locates**, or in some way makes it easier to **find, access**, and **use** a resource (in this case, data)."_" .pull-left[ <img src="http://chiphouston.com/wp-content/uploads/2016/06/who-what-when-where-and-why11.jpg" width="200px"> ] .pull-right[ ### **Data Reuse Checklist** <http://mozillascience.github.io/checklist/><img src="https://mozillascience.github.io/working-open-workshop/assets/images/science-lab-logo.svg" width="300px"> ] > ### **Backbone of digital curation** > > **Without it, a digital resource may be irretrievable, unidentifiable or unusable** --- ### **Descriptive** - enables **identification, location** and **retrieval** of data, often includes use of **controlled vocabularies** for classification and indexing. ### **Technical** - describes the **technical processes** used to **produce**, or required to **use** a digital data object. ### **Administrative** - used to manage **administrative aspects** of the digital object e.g. **intellectual property rights and acquisition.** --- ## **Elements of metadata** - #### **Structured data files:** - readable by machines and humans, accessible through the web - #### **Controlled vocabularies** eg. [NERC Vocabulary server](https://www.bodc.ac.uk/resources/products/web_services/vocab/) - allows for connectivity of data ### **KEY TO SEARCH FUNCTION** - By structuring & adhering to controlled vocabularies, data can be **combined, accessed** and **searched!** - **Different communities** develop **different standards** which define both the structure and content of metadata --- class: top, right, inverse # metadata in research *** --- ## Identifying the right metadata standard - **General:** Dublin Core Metadata Initiative [Specification](http://dublincore.org/specifications/) - **[NERC Data Centers:](https://nerc.ukri.org/research/sites/data/)** Check with individual data centers for their metadata specification. - **[Re3data.org](https://www.re3data.org/):** Registry of Research Data Repositories. --- ### **Seek help from support teams** Most university libraries have assistants dedicated to Research Data Management: <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/tomjwebb">@tomjwebb</a> <a href="https://twitter.com/ScientificData">@ScientificData</a> Talk to their librarian for data management strategies <a href="https://twitter.com/hashtag/datainfolit?src=hash">#datainfolit</a></p>— Yasmeen Shorish (@yasmeen_azadi) <a href="https://twitter.com/yasmeen_azadi/status/556129700129800192">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> --- # Key metadata: ## the bare minimum ### document **data coverage** information - **taxonomic coverage**: a table containing **taxonomic information on species in data**. - also record authority / source - **temporal coverage**: temporal range and resolution details - **spatial coverage**: + a human readable geographic description of the study area + spatial range and resolution details + include depth (marine/freshwater) or altitudinal (terrestrial) information Make sure to record units! --- ### document protocols in a `methods` document Keep a dynamic document used to **plan**, **record** and **write up** methods. <blockquote class="twitter-tweet" data-conversation="none" data-lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/tomjwebb">@tomjwebb</a> record every detail about how/where/why it is collected</p>— Sal Keith (@Sal_Keith) <a href="https://twitter.com/Sal_Keith/status/556110605053349888">January 16, 2015</a></blockquote> <script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script> **Any additional information other users would need to combine your data with theirs? Record it** --- class: inverse, middle, center # Practical metadata *** --- # [ACCE DTP RDM](https://acce.shef.ac.uk/event/acce-data-management-workshop/) course <br> Teaching this course has always felt challenging in terms of practical exercises -- - **Defining** Metadata & **explaining importance**: β -- - Advising on domain specific **Controlled Vocabularies** & **structure** β - How can we practice creating metadata? --- # [rOpenSci Unconf 18](http://unconf18.ropensci.org/) ##### May 21 - 22, 2018. Seattle <img src="assets/seattle.svg" height="65%" /> --- # rOpenSci Unconf mission > bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits to get together for a few days and hack on various projects. <br> #### Ideas for projects submitted through GitHub [**issues**](https://github.com/ropensci/unconf18/issues) in the [**runconf18** repo](https://github.com/ropensci/unconf18) --- ## issue [#72](https://github.com/ropensci/unconf18/issues/72) π <img src="assets/issue.png" width="100%"> --- # Metadata team! ------------ Luckily, a **whole bunch of other awesome folks** were also thinking about these topics and interested in working on them! π€© (in alphabetical order): - [Carl Boettiger](https://github.com/cboettig) - [Scott Chamberlain](https://github.com/sckott) - [Auriel Fournier](https://github.com/aurielfournier): #[41](https://github.com/ropensci/unconf18/issues/41) - [Kelly Hondula](https://github.com/khondula) - [Anna Krystalli](https://github.com/annakrystalli) - [Bryce Mecum](https://github.com/amoeba) - [MaΓ«lle Salmon](https://github.com/maelle) - [Kate Webbink](https://github.com/magpiedin): #[52](https://github.com/ropensci/unconf18/issues/52) - [Kara Woo](https://github.com/karawoo): #[68](https://github.com/ropensci/unconf18/issues/68) --- # [rOpenSciLabs](https://github.com/ropenscilabs) pkg [**`dataspice`**](https://github.com/ropenscilabs/dataspice) > Package [**`dataspice`**](https://github.com/ropenscilabs/dataspice) makes it easier for researchers to **create basic, lightweight and concise metadata files for their datasets**. <br> - Metadata **collected in `csv` files** -- - Metadata fields are **based on [schema.org](http://schema.org/Dataset)** + underlies Google [Datasets](https://developers.google.com/search/docs/data-types/dataset) metadata specification -- - Helper functions and shinyapps to **extract and edit metadata files**. -- - Ability to produce: + **structured json-ld metadata file**. + a helpful dataset **README webpage**. <br> --- ### [Google unveils search engine for open data](https://www.nature.com/articles/d41586-018-06201-x) #### _The tool, called Google Dataset Search, should help researchers to find the data they need more easily._ ##### Nature NEWS - 05 SEPTEMBER 2018 <img src="assets/google_search.png" width="100%"> <br> *** <https://toolbox.google.com/datasetsearch> --- ## `dataspice` tutorial <br> The goal of this section is to provide a **practical exercise in creating metadata** for an **example field collected data product** using package `dataspice`. - Understand basic metadata and why it is important -- - Understand where and how to store them -- - Understand how they can feed into more complex metadata objects. --- ## `dataspice` workflow <img src="https://github.com/ropenscilabs/dataspice/raw/main/man/figures/dataspice_workflow.png" width="100%" /> --- class: inverse, right, center # Practical *** -- ### time for some live coding π± <br> <br> *** _head to the [tutorial](#dataspice)_ --- class: inverse, right, center # Outro *** --- # Additional metadata tips - ### The approach we went for is very general / minimal -- - #### You can **make your datasets more discoverable** by developing **richer/more domain specific metadata** files. -- - eg. create [Ecological Metadata Language (EML)](https://knb.ecoinformatics.org/#external//emlparser/docs/index.html) metadata using r pkg [`EML`](https://github.com/ropensci/EML). -- - reposit your data at [KNB](https://knb.ecoinformatics.org/#data) -- - allows richer [search and presentation of metadata](https://knb.ecoinformatics.org/#view/IC.13.1) --- # KNB data portal ## Powerful search <img src="assets/knb.png" width="60%" /> --- # KNB data portal ## Rich interactive metadata <img src="assets/knb_attributes.png" width="60%" /> --- # Parting words -- - #### Any metadata documentation is better than none π -- - #### Start small and build up to more complex standards π― -- - - #### But make sure to cover bare minimum β οΈ -- - #### Reach out for help from your local librarians or try the [rOpenSci discussion board](https://discuss.ropensci.org/) π -- ## π― ## Get back [home](https://annakrystalli.me/rrresearchACCE20/)