class: center, middle, inverse, title-slide

# Practical Data Management w/ dataspice
## ✨ spicing up your data with tasty metadata ✨

---

# 👋 Hello and welcome

### me: Dr Anna Krystalli

- **Research Software Engineer**, _University of Sheffield_
    + twitter **@annakrystalli**
    + github **@annakrystalli**
    + email **a.krystalli[at]sheffield.ac.uk**

--

- **Background**
    + PhD in Marine Macroecology (data parasite 👿)

--

    + [rOpenSci](http://onboarding.ropensci.org/) Editor

--

    + ACCE Doctoral Partnership **Research Data Management** course instructor

---

background-image: url(https://github.com/yihui/xaringan/releases/download/v0.0.2/karl-moustache.jpg)

---

class: inverse, middle, center

# Background

---

# [ACCE DTP RDM](https://acce.shef.ac.uk/event/acce-data-management-workshop/) course

##### May 1 - 2, 2018

<br>

Includes a **section on Metadata** ✨

--

- **Defining** Metadata & **explaining importance**: ✅

--

- Advising on domain-specific **Controlled Vocabularies** & **structure** ❌

---

# [rOpenSci Unconf 18](http://unconf18.ropensci.org/)

##### May 21 - 22, 2018. Seattle

![](assets/seattle.svg)

---

# rOpenSci Unconf mission

> bringing together scientists, developers, and open data enthusiasts from academia, industry, government, and non-profits to get together for a few days and hack on various projects.

<br>

#### Ideas for projects submitted through GitHub [**issues**](https://github.com/ropensci/unconf18/issues) in the [**runconf18** repo](https://github.com/ropensci/unconf18)

---

## issue [#72](https://github.com/ropensci/unconf18/issues/72) 🙋

<img src="assets/issue.png" width="100%">

---

# Metadata team!
------------

Luckily, a **whole bunch of other awesome folks** were also thinking about these topics and interested in working on them! 🤩

(in alphabetical order):

- [Carl Boettiger](https://github.com/cboettig)
- [Scott Chamberlain](https://github.com/sckott)
- [Auriel Fournier](https://github.com/aurielfournier): #[41](https://github.com/ropensci/unconf18/issues/41)
- [Kelly Hondula](https://github.com/khondula)
- [Anna Krystalli](https://github.com/annakrystalli)
- [Bryce Mecum](https://github.com/amoeba)
- [Maëlle Salmon](https://github.com/maelle)
- [Kate Webbink](https://github.com/magpiedin): #[52](https://github.com/ropensci/unconf18/issues/52)
- [Kara Woo](https://github.com/karawoo): #[68](https://github.com/ropensci/unconf18/issues/68)

---

# [rOpenSciLabs](https://github.com/ropenscilabs) pkg [**`dataspice`**](https://github.com/ropenscilabs/dataspice)

> Package [**`dataspice`**](https://github.com/ropenscilabs/dataspice) makes it easier for researchers to **create basic, lightweight and concise metadata files for their datasets**.

<br>

- Metadata **collected in `csv` files**

--

- Metadata fields are **based on [schema.org](http://schema.org/Dataset)**
    + underlies Google [Datasets](https://developers.google.com/search/docs/data-types/dataset) metadata specification

--

- Helper functions and shinyapps to **extract and edit metadata files**.

--

- Ability to produce:
    + **structured json-ld metadata file**.
    + a helpful dataset **README webpage**.
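
To make the "metadata collected in `csv` files" idea concrete, here is a minimal base R sketch of a per-variable metadata table written to a csv file. It is an illustration only, not how `dataspice` is implemented: the helper `sketch_attributes()` and the toy data frame are hypothetical, and the column names (`fileName`, `variableName`, `description`, `unitText`) are assumed to mirror the attributes file that `dataspice` works with. In practice the package's own helper functions should do this job.

```r
# Hypothetical helper (not part of dataspice): build one metadata row per
# variable in a data frame, leaving description and unit to be filled in later.
sketch_attributes <- function(df, file_name) {
  data.frame(
    fileName     = file_name,
    variableName = names(df),
    description  = NA_character_,
    unitText     = NA_character_,
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for a plot-level table; the values are illustrative only.
plots <- data.frame(
  siteID          = c("BART", "BART"),
  decimalLatitude = c(44.1, 44.1),
  stringsAsFactors = FALSE
)

attributes_tbl <- sketch_attributes(plots, "vst_perplotperyear.csv")

# Fill in the metadata for one variable, as you would when editing the csv.
is_lat <- attributes_tbl$variableName == "decimalLatitude"
attributes_tbl$description[is_lat] <- "Latitude of the plot centre"
attributes_tbl$unitText[is_lat]    <- "decimal degrees"

# Write the table to a csv file (a temporary file here) and read it back.
csv_path <- file.path(tempdir(), "attributes.csv")
write.csv(attributes_tbl, csv_path, row.names = FALSE)
read.csv(csv_path, stringsAsFactors = FALSE)
```

Keeping one row per variable in a plain csv means the metadata can be edited in any spreadsheet tool and versioned alongside the data.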
<br>

---

### [Google unveils search engine for open data](https://www.nature.com/articles/d41586-018-06201-x)

#### _The tool, called Google Dataset Search, should help researchers to find the data they need more easily._

##### Nature NEWS - 05 SEPTEMBER 2018

<img src="assets/google_search.png" width="100%">

<br>

***

<https://toolbox.google.com/datasetsearch>

---

class: inverse, middle, center

# `dataspice` tutorial

<br>

#### tutorial repo:
### <https://github.com/annakrystalli/dataspice-tutorial>

#### useful links:
### <http://annakrystalli.me/dataspice-tutorial/useful_links.html>

---

# `dataspice` tutorial

<br>

The goal of dataspice-tutorial is a **practical exercise in creating metadata** for an **example field-collected data product** using package `dataspice`.

- Understand basic metadata and why it is important

--

- Understand where and how to store it

--

- Understand how it can feed into more complex metadata objects.

---

## `dataspice` workflow

<img src="https://github.com/ropenscilabs/dataspice/raw/master/man/figures/dataspice_workflow.png" width="100%">

---

# Example dataset

### Data source:
#### [National Ecological Observatory Network](https://www.neonscience.org/) (NEON) data portal

### Dataset selected:
#### *Woody plant vegetation structure*

This data product contains the quality-controlled, native sampling resolution data from **in-situ measurements of live and standing dead woody individuals and shrub groups**, from all **terrestrial NEON sites** with qualifying woody vegetation.

--

- **Structure and mapping data** are reported **per individual per plot**
- **Sampling metadata**, such as per growth form sampling area, are reported **per plot**.
---

# `dataspice` workshop data

![](http://data.neonscience.org/neon-data-theme/images/logo--blue-neon-data.png)

The data are a **trimmed subset** of data downloaded from the [**NEON data portal**](http://data.neonscience.org/browse-data) after filtering for:

- time periods between **`2015-06` - `2016-06`**
- locations within NEON Domain area **`D01: Northeast`**

The filter returned data from **2 sites** from **`2015-06`** to **`2015-11`**.

<br>

***

##### Citation: _National Ecological Observatory Network. 2018. Data Products: DP1.10098.001. Provisional data downloaded from http://data.neonscience.org on 2018-05-04. Battelle, Boulder, CO, USA_

---

## vst_perplotperyear.csv

Plot level data

```
## # A tibble: 165 x 14
##    date       siteID plotID plotType nlcdClass decimalLatitude
##    <date>     <chr>  <chr>  <chr>    <chr>               <dbl>
##  1 2015-06-06 BART   BART_… tower    deciduou…            44.1
##  2 2015-07-16 BART   BART_… tower    deciduou…            44.1
##  3 2015-07-21 BART   BART_… tower    deciduou…            44.1
##  4 2015-07-22 BART   BART_… tower    mixedFor…            44.1
##  5 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  6 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  7 2015-07-22 BART   BART_… tower    deciduou…            44.1
##  8 2015-07-23 BART   BART_… tower    mixedFor…            44.1
##  9 2015-07-23 BART   BART_… tower    deciduou…            44.1
## 10 2015-07-28 BART   BART_… tower    mixedFor…            44.1
## # … with 155 more rows, and 8 more variables: decimalLongitude <dbl>,
## #   treesPresent <lgl>, shrubsPresent <lgl>, lianasPresent <lgl>,
## #   totalSampledAreaTrees <dbl>, totalSampledAreaShrubSapling <dbl>,
## #   totalSampledAreaLiana <dbl>, recordedBy <chr>
```

---

## vst_mappingandtagging.csv

Individual level data

```
## # A tibble: 1,799 x 7
##    date       siteID plotID  individualID taxonID scientificName recordedBy
##    <date>     <chr>  <chr>   <chr>        <chr>   <chr>          <chr>
##  1 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 6HzkzFDdL…
##  2 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 6HzkzFDdL…
##  3 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  4 2015-08-26 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  5 2015-08-04 BART   BART_0… NEON.PLA.D0… PICEA   Picea sp.      6HzkzFDdL…
##  6 2015-08-04 BART   BART_0… NEON.PLA.D0… TSCA    Tsuga canaden… 0uwWHUCkG…
##  7 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… zODC+zTh3…
##  8 2015-08-12 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… jRr6tAEXv…
##  9 2015-07-22 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… jRr6tAEXv…
## 10 2015-08-25 BART   BART_0… NEON.PLA.D0… FAGR    Fagus grandif… 0uwWHUCkG…
## # … with 1,789 more rows
```

---

class: inverse, middle, center

# Practical

--

### time for some live coding 😱

<br>
<br>

***

_or head to the [tutorial](http://annakrystalli.me/dataspice-tutorial/) if working through this on your own_

---

class: inverse, middle, center

# Outro

---

# Additional metadata tips

- ### The approach we went for is very general / minimal

--

- #### You can **make your datasets more discoverable** by developing **richer/more domain-specific metadata** files.

--

- e.g. create [Ecological Metadata Language (EML)](https://knb.ecoinformatics.org/#external//emlparser/docs/index.html) metadata using r pkg [`eml2`](https://github.com/cboettig/eml2).

--

- deposit your data at [KNB](https://knb.ecoinformatics.org/#data)

--

- allows richer [search and presentation of metadata](https://knb.ecoinformatics.org/#view/IC.13.1)

---

# KNB data portal

## Powerful search

<img src="assets/knb.png" width="100%">

---

# KNB data portal

## Rich interactive metadata

<img src="assets/knb_attributes.png" width="100%">

---

# Parting words

--

- #### Any metadata documentation is better than none 👍

--

- Got ideas for features / found any bugs? Open an [issue](https://github.com/ropenscilabs/dataspice/issues)!

--

- Are you into metadata? [Become a contributor](https://github.com/ropenscilabs/dataspice/)!

--

- If you use the [`xaringan`](https://slides.yihui.name/xaringan/#1) pkg, don't forget to try the `yolo = TRUE` option of `xaringan::moon_reader`.

--

## 👋