class: top, right, inverse

## ACCE Research Data and Project Management

***

.bottom[
# File naming
#### 10-11 April 2019, University of Sheffield
#### Dr Anna Krystalli @annakrystalli
]

---

# Background

### Let's face it...

- There are going to be files
- **LOTS** of files
- The files will **change over time**
- The files will **have relationships to each other**

### It'll probably get complicated

---

data:image/s3,"s3://crabby-images/cea7e/cea7e91a68e8e016a5537759c64d018213fb8396" alt=""

---

## **Strategy against chaos**

### **File organization** and **naming** is a mighty weapon against chaos

- Make a file's **name** and **location** ***VERY INFORMATIVE*** about:
    - what it is,
    - why it exists,
    - how it relates to other things
- The more things are **self-explanatory**, the better.

---

## File naming

<br>

### **Names matter**

data:image/s3,"s3://crabby-images/e2f6b/e2f6bb45995cfe08803570ca280ebb2e80f16a77" alt=""

---

### **What works, what doesn't?**

**NO**

~~~
myabstract.docx
Joe’s Filenames Use Spaces and Punctuation.xlsx
figure 1.png
fig 2.png
JW7d^(2sl@deletethisandyourcareerisoverWx2*.txt
~~~

**YES**

~~~
2014-06-08_abstract-for-sla.docx
joes-filenames-are-getting-better.xlsx
fig01_scatterplot-talk-length-vs-interest.png
fig02_histogram-talk-attendance.png
1986-01-28_raw-data-from-challenger-o-rings.txt
~~~

---

# **Three principles for good (file) names**

1. ### **Machine readable**
1. ### **Human readable**
1. ### **Play well with default ordering**

---

# **Machine readable**

- **Regular expression and globbing friendly**
    + Avoid spaces, punctuation, accented characters, case sensitivity
- **Easy to compute on**
    + Deliberate use of delimiters

---

## Filtering and search through [Globbing](http://searchsecurity.techtarget.com/definition/globbing)

### **Excerpt of complete file listing:**

data:image/s3,"s3://crabby-images/07dc2/07dc219877dc45654cdbc13839309c05557a3c18" alt=""

---

### **Example of globbing to filter file listing:**

data:image/s3,"s3://crabby-images/beb6c/beb6c154db4c47c69fcf13a3fd6ce2340316a1d2" alt=""

---

### **Search using Mac OS Finder search facilities**

data:image/s3,"s3://crabby-images/8c146/8c146fa1a7caddc46d52186f003ebeb849d20210" alt=""

---

### **Search using regex in R**

data:image/s3,"s3://crabby-images/c3cad/c3cadd84dff1b26ce5718bd829c70db21f18d2f6" alt=""

---

## **Delimit information with punctuation**

**Deliberate use of `"-"` and `"_"` allows recovery of metadata from the filenames:**

- `"_"` underscore used to delimit units of metadata I want to access later
- `"-"` hyphen used to delimit words so our eyes don't bleed

data:image/s3,"s3://crabby-images/93dff/93dffe23f65c360219453c0848ec6a6210ed038e" alt=""

---

### Splitting filenames by delimiters

data:image/s3,"s3://crabby-images/df753/df75383368c387ba87bdb415721b0369e27b40bd" alt=""

This happens to be `R` but also possible in the `shell`, `Python`, etc.

---

## **Include important metadata**

e.g. I'm saving a number of files of temperature data extracted at different resolutions (`res`) and for a number of months (`month`). Including these parameters in the filename allows me to use them to target files to read in.
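
As a rough sketch of what such names can look like, assuming made-up resolution and month values (the `res_values`, `month_values` and `variable` prefix here are placeholders, not the real data), the names are composed from the parameters and a listing is then narrowed to a single resolution:

```r
# Hypothetical parameter values, for illustration only
res_values <- c("05km", "10km")
month_values <- c("01", "02")

# One file name per res/month combination, units of metadata joined by "_"
combos <- expand.grid(res = res_values, month = month_values,
                      stringsAsFactors = FALSE)
file_names <- paste0(paste("variable", combos$res, combos$month, sep = "_"),
                     ".csv")

# Target only the files extracted at one resolution
target <- grep("^variable_10km_", file_names, value = TRUE)
print(target)
```

On a real project the same pattern could be passed to `list.files(pattern = ...)` instead of filtering an in-memory vector.
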
```r
# Build the file name once so that writing and reading use the same path
file_name <- paste0(paste("variable", res, month, sep = "_"), ".csv")
write.csv(df, file_name, row.names = FALSE)
df <- read.csv(file_name)
```

---

## **Recap: machine readable**

- **Easy to search for files later**
- **Easy to filter file lists based on names**
- **Easy to extract info from file names, e.g. by splitting**

**New to regular expressions and globbing? Be kind to yourself and avoid:**

+ Spaces in file names
+ Punctuation
+ Accented characters

---

# **Human readable**

- **Name contains info on content**
- **Connects to concept of a** [***slug***](https://en.wikipedia.org/wiki/Semantic_URL#Slug) **from semantic URLs**

---

### **Example**

#### **Which set of file(name)s do you want at 3 a.m. before a deadline?**

data:image/s3,"s3://crabby-images/eb1f8/eb1f8bae070c83b28a1089dc23fb596683a759f9" alt=""

---

## **Embrace the slug**

<img src="assets/img/slug_filenames.png" height="400px">

---

### Use slugs to link inputs, scripts & outputs

#### **The `R` scripts:**

~~~
01_marshal-data.r
02_pre-dea-filtering.r
03_dea-with-limma-voom.r
04_explore-dea-results.r
90_limma-model-term-name-fiasco.r
~~~

#### **The figures left behind:**

~~~
02_pre-dea-filtering-preDE-filtering.png
03-dea-with-limma-voom-voom-plot.png
04_explore-dea-results-focus-term-adjusted-p-values1.png
04_explore-dea-results-focus-term-adjusted-p-values2.png
...
90_limma-model-term-name-fiasco-first-voom.png
90_limma-model-term-name-fiasco-second-voom.png
~~~

---

## **Recap: Human readable**

- `\(\rightarrow\)` **Easy to figure out what the heck something is, based on its name**

---

# **Play well with default ordering**

- **Put something numeric first**
- **Use the ISO 8601 standard for dates**
- **Left pad other numbers with zeros**

---

### Examples

### **Chronological order:**

data:image/s3,"s3://crabby-images/dba6f/dba6f17b533c898f74ce17baab92f351e37dd16c" alt=""

---

### Dates

Use the **ISO 8601** standard for dates: `YYYY-MM-DD`

data:image/s3,"s3://crabby-images/dba6f/dba6f17b533c898f74ce17baab92f351e37dd16c" alt=""

---

data:image/s3,"s3://crabby-images/75d8a/75d8aff76b7ead9857dfeb929aa7552d4a6bf5ad" alt="iso_psa"

---

### **Logical order:** Put something numeric first

data:image/s3,"s3://crabby-images/4db2b/4db2b80f77fd93c8fa689bc35fafa2cb3e20ff48" alt=""

---

## Left pad other numbers with zeros

data:image/s3,"s3://crabby-images/4db2b/4db2b80f77fd93c8fa689bc35fafa2cb3e20ff48" alt=""

**If you don’t left pad, you get this:**

~~~
10_final-figs-for-publication.R
1_data-cleaning.R
2_fit-model.R
~~~

which is just sad :(

---

## Recap: Play well with default ordering

- Put something numeric first
- Use the ISO 8601 standard for dates
- Left pad other numbers with zeros

---

# **Recap: Three principles for (file) names**

1. Machine readable
2. Human readable
3. Play well with default ordering

## Go forth and use awesome file names :)

---

## Get back [home](index.html)
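
---

### **Appendix: recovering metadata from file names**

The splitting idea from the machine-readable section can be sketched in a few lines of base `R`. The file names below are invented for illustration and assume a `date_slug_unit` layout; none of them come from a real project.

```r
# Hypothetical file names following a date_slug_unit convention
file_names <- c("2019-04-10_temperature-raw_plate01.csv",
                "2019-04-11_temperature-raw_plate02.csv")

# Drop the extension, then split on "_" to get the units of metadata
stems <- tools::file_path_sans_ext(file_names)
parts <- do.call(rbind, strsplit(stems, "_", fixed = TRUE))

# ISO 8601 dates parse directly with as.Date()
meta <- data.frame(date = as.Date(parts[, 1]),
                   slug = parts[, 2],
                   plate = parts[, 3],
                   stringsAsFactors = FALSE)
print(meta)
```

Because `"_"` only ever separates units of metadata and `"-"` only separates words, the split yields the same number of pieces for every file, which is what makes the `rbind()` step safe.
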