Version Control with Git

Hands up - who has heard of version control software?
What do you think it does?


What is Version control? 🤔

The management of changes to documents, computer programs, large web sites, and other collections of information.

Examples:

  • Numbering of book editions
  • Wikipedia’s Page history


Where did it come from?

The need for a logical way to organize and control revisions has existed for almost as long as writing has existed, but revision control became much more important, and complicated when the era of computing began

Elements of a Version Control system

  • Changes are usually identified by a number or letter code, termed the “revision number”

  • Each revision is associated with a timestamp and the person making the change.

  • Only changes to a file are recorded rather than saving a whole new copy.

  • Revisions can be compared, restored, and with some types of files, merged.


What is git? 🤔

Open source (free to use) Version control software. Usually accessed via the command line, or a client program.

Where did it come from?

Git development began in 2006 after many developers of the Linux kernel gave up access to BitKeeper (at the time the best but proprietary)

Linus Torvalds on the name git:

“I’m an egotistical bastard, and I name all my projects after myself. First ‘Linux’, now ‘git’

More on the name in the source code original readme file

Why use it in research?

Exhibit A

Image: xkcd CC BY-NC 2.5

Figure 1: Image: xkcd CC BY-NC 2.5

What is GitHub 🤔

A website that allows you to store your Git repositories online and makes it easy to collaborate with others. They also provide other services like issue (bug) tracking and wikis. Similar services are GitLab and BitBucket.

Why use it in research:

To enable collaboration and track contributions

images: Mozilla Science Lab CC-BY 4.0


Anatomy of GitHub Repo

  • Readme files. Create a README.md file to explain what your project is, and how to install and use it. README.md is the file that is automatically displayed when you open a GitHub repo.

  • License. Without some sort of licence, the contents of the repository are technically closed. Some allow users of the code to do anything they like with their code - these are known as permissive licences. Examples are the MIT Licence or Apache.

  • Contributing guide - make a file called CONTRIBUTING.md and guidelines for contributors so they know what they should do if they want to help you out.

  • Code of Conduct - good projects have codes of conduct to make sure that people are treated well. Github has an Code of Conduct wizard to make it easy to add one.

  • Issues - use GitHub issues to record and discuss tasks.


Git, Github & Rstudio

Before: git only through the terminal

Rstudio & usethis to the rescue!

Rstudio + usethis 📦 == heavenly Git & GitHub

  • Initialise Rstudio project with Git by just checking a box!

    • Forgot to? use usethis::use_git()
  • visual panel to easily see the status of all your files

  • interactive navigation through file version history

Demo

How does Git work?

When a local directory becomes initialised with git, a hidden .git folder is added to it.

it’s now called a repository

  • New copies of files you tell git to track will be added to that .git folder.

  • After adding, git will track any modifications to those files

first commit - whole file added

Any file unknown to git will have a yellow ? box next to it.

The first time you commit a file you are adding it to .git, effectively telling it to start tracking the file

second commit - only difference highlighted

The first time you commit a file, only the changes are shown and any file that has uncommited modifications is shown with a blue M

When all changes have been committed, the git panel is clear.

Enough theory, how about in practice!

Configure git & GitHub

Configure git

First, git needs to know who you are so your commits can be attributed to you. usethis to the rescue again!

Check your configuration

usethis::git_sitrep()

Set your configuration

Use your github username and and the email you used to sign-up on GitHub

usethis::use_git_config(
    user.name = "Jane",
    user.email = "jane@example.org")

Set up GITHUB PAT

To authenticate with GitHub, you’ll also need a Personal Authorisation Token (PAT). Password-based authentication for Git is deprecated, i.e. you really should not be sending your username and password every time you push or pull. Here, I’m referring to the username and password you would use to login to GitHub in the browser.

What should you do instead?

Get a personal access token (PAT) and use that as your credential for HTTPS operations. (The PAT will actually be sent as the password and the username is somewhat artificial, consulted only for credential lookup.)

How to get a PAT?

GitHub offers instructions for creating a personal access token.

The usethis package has a helper function that takes you to the web form to create a PAT, with the added benefit that it pre-selects the recommended scopes:

usethis::create_github_token()
● Call `gitcreds::gitcreds_set()` to register this token in the local Git credential store
  It is also a great idea to store this token in any password-management software that you use
✓ Opening URL 'https://github.com/settings/tokens/new?scopes=repo,user,gist,workflow&description=R:GITHUB_PAT'

will open up the GitHub panel to generate your PAT.

Once you are happy with the selected scopes, click “Generate token”. As the page says, you must store this token somewhere, because you’ll never be able to see it again, once you leave that page or close the window.

Do not ever hard-wire your PAT into your code! A PAT should always be retrieved implicitly, for example, from the Git credential store or from an environment variable.

Store your credential

Below, we will add your PAT to our .Renviron file as well as the Git credential store as a semi-persistent convenience, sort of like “remember me” on a website. But, just like logging into websites, it is entirely possible that your PAT will somehow be forgotten from the credential store and you will need to re-enter it.

If you goof this up, i.e. generate a PAT but fail to capture it on your system, you’ll have to generate another one. This is not the end of the world, but you should delete the “lost” PAT on GitHub. If you aren’t disciplined about labelling PATs and deleting lost PATs, you will find yourself in an unsettling situation where you can’t be sure which PAT(s) are in use. When logged into your GitHub account, you can manage your PATs here:

ALSO STORE A COPY SOMEWHERE LOCALLY ON YOUR COMPUTER FOR NOW, WE WILL NEED IT IN FUTURE SESSIONS

Store in .Renviron

Copy it and paste it into your .Renviron file as system variable GITHUB_PAT.

usethis::edit_r_environ()

Use edit_r_environ() to open and edit your .Renviron file

Add to the file and save.

Cache with gitcreds package

As of November 2020, there are two R packages for accessing the Git credential store:

It is likely that these packages will eventually combine into one and, even now, they are largely interoperable. You don’t need to follow the instructions for both packages – pick one!

We will use the gitcreds package here.

If you don’t have gitcreds installed, install via install.packages("gitcreds").

Then call gitcreds::gitcreds_set():

gitcreds::gitcreds_set()

gitcreds::gitcreds_set() is a very handy function, since it reports any current credential, allows you to see it, allows you to keep or replace an existing credential, and can also store a credential for the first time.

Respond to the prompt with your personal access token (PAT).

You can check that you’ve stored a credential with gitcreds_get():

gitcreds::gitcreds_get()
#> <gitcreds>
#>   protocol: https
#>   host    : github.com
#>   username: PersonalAccessToken
#>   password: <-- hidden -->
TODO store your PAT with a password manager

Treat this PAT like a password! Currently, we’ve only store a copyable form of the PAT in our .Renviron file. If you use a password management app, such as 1Password or LastPass (which you should), it is highly recommended to add this PAT to your entry for GitHub.

Version Controlling projects

Turn our project into a repository

If you didn’t initialise git at the beginning of your project, you can do so now with usethis::use_git():

This will try to commit everything in the repo so far in one go! Override that behaviour by selecting a negative response when asked.

usethis::use_git()
✔ Initialising Git repo
✔ Adding '.Rhistory', '.RData' to '.gitignore'
There are 10 uncommitted files:
* '.DS_Store'
* '.gitignore'
* '.Rbuildignore'
* 'analysis.R'
* 'data-raw/'
* 'data/'
* 'index.html'
* 'index.Rmd'
* 'R/'
* 'wood-survey.Rproj'
Is it ok to commit them?

1: Negative
2: Nope
3: I agree

Selection: 

Next allow Rstudio to restart when asked:

● A restart of RStudio is required to activate the Git pane
Restart now?

1: Absolutely
2: Absolutely not
3: Negative

Committing files

In our project, let’s have a look at the Rstudio Git tab. It shows all the files currently in the folder. The yellow ? indicates none of the files have been added to git yet.

Add files

To commit changes in a file just select it in the git pane. When changes to a file are commited for the first time, the whole file is indicated as Added (green A).

Let’s focus on the files and analytical data we created so far. For now ignore the data-raw folder and all the other files we didn’t create:

Commit changes

Click on commit and write an appropriate commit message:

Create a README

Our repository also needs a README. We only need a simple plain markdown (.md) file for our README.

We can create a template using usethis::use_readme_md()

usethis::use_readme_md()

Edit README

Adapt the template, adding a short description about your project.

Add and commit your new README

Create repository on GitHub

Now that we have set up a GITHUP_PAT, we can use function usethis::use_github() to create a GitHub repository for our project:

usethis::use_github(protocol = "https")
✔ Checking that current branch is 'master'
● Check title and description
  Name:        wood-survey
  Description: 
Are title and description ok?

1: No
2: Yes
3: Nope

Answer affirmatively for the process to continue. Once the repo is created and any commmited files pushed, the repo is launched in the browser:

Host html content on GitHub

Let’s head to the repo and have a look at what we’ve shared. To host our html content on GitHub, we need to enable gh-pages in our repository.

Go to repo Settings

Enable gh-pages

Go to repo Settings and navigate to the Pages panel on the left hand side and change your settings to the following configuration:

Ensure the Enforce HTTPS option is selected.

Click on the link displayed and go check out your work!

Copy the link. In the main repo page, edit the page details at the right by clicking the gear button and paste copied the url in the website field.

Once added it provides easy access to the rendered content:

Tracking changes

Making a change to our index.Rmd

Let’s add the link to the rendered content to our place holder in index.Rmd

Let’s add and commit our changes

Pushing changes to GitHub

Click on the ⬆️ button on the Git tab to push our changes up to the repository

Let’s go have a look at the history 🕒

You might need to reconfigure your GITHUB_PAT credentials after a break Use usethis::edit_r_environ() to get your GITHUB_PAT and then gitcreds::gitcreds_set() to set them again.

Deleting files

  • Create a new file, any type of file.

  • Add and Commit it.

  • Delete it

  • Commit the deletion

  • Look back through the history

Ignoring files through .gitignore

There may be files that you don’t want to commit to git, e.g. 

  • data files that are too large

  • documents with sensitive information (eg authorisation tokens etc)

  • intermediate files that you don’t need to save copies of.

Tell git to ingnore them by adding them to the .gitignore file.

When we open .gitgnore we see there are a number of files already added. Let’s the rest of the files we want to ignore.

.Rproj.user
.Rhistory
.RData
.Rbuildignore
.DS_Store

gitignore regex

You can use regex (regular expressions) in .gitignore files to ignore files according to a pattern.

  • directoryname/* will ignore all files in a directory.

  • *.html will ignore any file ending in .html

  • prefix “!” which negates the pattern

So let’s use regex to ignore all files in attic/ and all files in data-raw/ apart from individual.R. Add the following to the bottom of .gitignore

attic/*
data-raw/*
!data-raw/individual.R

Commit .gitignore

Now that we’ve determined which files we want to ignore, let’s commit .gitignore so we can have a record of it and track any changes.

Git tips

  • commit early, commit often
  • commit logical bits of work together
  • write meaninful messages

Further Resources

Never forget