class: top, left, inverse ## ACCE DTP ### _Reproducible Research Data and Project Management in R_ *** .bottom[ # Good File Naming <br> **<svg viewBox="0 0 448 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M400 64h-48V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H160V12c0-6.6-5.4-12-12-12h-40c-6.6 0-12 5.4-12 12v52H48C21.5 64 0 85.5 0 112v352c0 26.5 21.5 48 48 48h352c26.5 0 48-21.5 48-48V112c0-26.5-21.5-48-48-48zm-6 400H54c-3.3 0-6-2.7-6-6V160h352v298c0 3.3-2.7 6-6 6z"></path> </svg> April-May 2021** <br> **<svg viewBox="0 0 288 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M112 316.94v156.69l22.02 33.02c4.75 7.12 15.22 7.12 19.97 0L176 473.63V316.94c-10.39 1.92-21.06 3.06-32 3.06s-21.61-1.14-32-3.06zM144 0C64.47 0 0 64.47 0 144s64.47 144 144 144 144-64.47 144-144S223.53 0 144 0zm0 76c-37.5 0-68 30.5-68 68 0 6.62-5.38 12-12 12s-12-5.38-12-12c0-50.73 41.28-92 92-92 6.62 0 12 5.38 12 12s-5.38 12-12 12z"></path> </svg> Online** ] --- # Background ### Let's face it... - There are going to be files - **LOTS** of files - The files will **change over time** - The files will **have relationships to each other** ### It'll probably get complicated --- ![](assets/img/files_messy_tidy.png) --- ## **Strategy against chaos** ### **File organization** and **naming** is a mighty weapon against chaos - Make a file's **name** and **location** ***VERY INFORMATIVE*** about: - what it is, - why it exists, - how it relates to other things - The more things are **self-explanatory**, the better. --- ## File naming <br> ### **Names matter** ![](assets/img/cheers.png) --- ### **What works, what doesn't?** **NO** ~~~ myabstract.docx Joe’s Filenames Use Spaces and Punctuation.xlsx figure 1.png fig 2.png JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt ~~~ **YES** ~~~ 2014-06-08_abstract-for-sla.docx joes-filenames-are-getting-better.xlsx fig01_scatterplot-talk-length-vs-interest.png fig02_histogram-talk-attendance.png 1986-01-28_raw-data-from-challenger-o-rings.txt ~~~ --- # **Three principles for good (file) names** 1. ### **Machine readable** 1. ### **Human readable** 1. ### **Play well with default ordering** --- # **Machine readable** - **Regular expression and globbing friendly** + Avoid spaces, punctuation, accented characters, case sensitivity - **Easy to compute on** + Deliberate use of delimiters --- ## Filtering and search through [Globbing](http://searchsecurity.techtarget.com/definition/globbing) ### **Excerpt of complete file listing:** ![](assets/img/plasmid_names.png) --- ### **Example of globbing to filter file listing:** ![](assets/img/plasmid_glob.png) --- ### **Search using Mac OS Finder search facilities** ![](assets/img/plasmid_mac_os_search.png) --- ### **Search using regex in R** ![](assets/img/plasmid_regex.png) --- ## **Delimit information with punctuation** **Deliberate use of `"-"` and `"_"` allows recovery of metadata from the filenames:** - `"_"` underscore used to delimit units of metadata I want to access later - `"-"` hyphen used to delimit words so our eyes don't bleed ![](assets/img/plasmid_delimiters.png) --- ### Splitting filenames by delimiters ![](assets/img/plasmid_delimiters_code.png) This happens to be `R` but also possible in the `shell`, `Python`, etc. --- ## **Include important metadata** e.g. I'm saving a number of files of temperature data extracted at different resolutions (`res`) and for a number of months (`month`). Including these parameters in the filename allows me to use them to target files to read in. ```r write.csv(df, paste("variable", res, month, sep ="_")) df <- read.csv(paste("variable", res, month, sep ="_")) ``` --- ## **Recap: machine readable** - **Easy to search for files later** - **Easy to filter file lists based on names** - **Easy to extract info from file names, e.g. by splitting** **New to regular expressions and globbing? be kind to yourself and avoid** + Spaces in file names + Punctuation + Accented characters --- # **Human readable** - **Name contains info on content** - **Connects to concept of a** [***slug***](https://en.wikipedia.org/wiki/Semantic_URL#Slug) **from semantic URLs** --- ### **Example** #### **Which set of file(name)s do you want at 3 a.m. before a deadline?** ![](assets/img/human_readable_not_options.png) --- ## **Embrace the slug** <img src="assets/img/slug_filenames.png" height="400px"> --- ### Use slugs to link inputs, scripts & outputs #### **The `R` scripts:** ~~~ 01_marshal-data.r 02_pre-dea-filtering.r 03_dea-with-limma-voom.r 04_explore-dea-results.r 90_limma-model-term-name-fiasco.r ~~~ #### **The figures left behind:** ~~~ 02_pre-dea-filtering-preDE-filtering.png 03-dea-with-limma-voom-voom-plot.png 04_explore-dea-results-focus-term-adjusted-p-values1.png 04_explore-dea-results-focus-term-adjusted-p-values2.png ... 90_limma-model-term-name-fiasco-first-voom.png 90_limma-model-term-name-fiasco-second-voom.png ~~~ --- ## **Recap: Human readable** - `\(\rightarrow\)` **Easy to figure out what the heck something is, based on its name** --- # **Play well with default ordering** - **Put something numeric first** - **Use the ISO 8601 standard for dates** - **Left pad other numbers with zeros** --- ### Examples ### **Chronological order:** ![](assets/img/chronological_order.png) --- ### Dates Use the **ISO 8601** standard for dates: `YYYY-MM-DD` ![](assets/img/chronological_order.png) --- ![iso_psa](assets/img/iso_8601.tiff) --- ### **Logical order:** Put something numeric first ![](assets/img/logical_order.png) --- ## Left pad other numbers with zeros ![](assets/img/logical_order.png) **If you don’t left pad, you get this:** ~~~ 10_final-figs-for-publication.R 1_data-cleaning.R 2_fit-model.R ~~~ which is just sad :( --- ## Recap: Play well with default ordering - Put something numeric first - Use the ISO 8601 standard for dates - Left pad other numbers with zeros --- # **Recap: Three principles for (file) names** 1. Machine readable 2. Human readable 3. Play well with default ordering ## Go forth and use awesome file names :) --- ## Get back [home](index.html)