Last updated: 2018-09-05

Past versions:
    File Version Author Date Message
    html 1d54c68 annakrystalli 2018-09-05 Build site.
    html 837dbac annakrystalli 2018-09-05 Build site.
    Rmd da82ab8 annakrystalli 2018-09-05 workflowr::wflow_publish("analysis/gis.Rmd")
    html c8c9096 annakrystalli 2018-09-05 Build site.
    Rmd eb17960 annakrystalli 2018-09-05 workflowr::wflow_publish("analysis/gis.Rmd")
    html 4c6e18d annakrystalli 2018-09-05 Build site.
    Rmd 496feae annakrystalli 2018-09-05 workflowr::wflow_publish("analysis/gis.Rmd")
    html d6d0523 annakrystalli 2018-09-05 Build site.
    Rmd 06f1792 annakrystalli 2018-09-05 workflowr::wflow_publish("analysis/gis.Rmd")
    html f22d555 annakrystalli 2018-09-05 Build site.
    Rmd 7403f6b annakrystalli 2018-09-05 workflowr::wflow_publish("analysis/gis.Rmd")
    html 80b8fec annakrystalli 2018-09-04 Build site.
    Rmd 1a58bc5 annakrystalli 2018-09-04 workflowr::wflow_publish("analysis/gis.Rmd")


Geographic Information Systems

Geographic Information Systems (GIS): computer-based systems used to:

  • collect
  • store
  • manage
  • analyse
  • display spatial or geographic data

In science, GIS allow us to examine spatial relationships, patterns and processes.

They are applied to a broad range of phenomena, from physical and biological to cultural and demographic.

Geospatial Data

Geospatial data types

Most common spatial data types:

  • raster: gridded data
  • vector: geographical features
    • points
    • lines
    • polygons

Vector data tends to dominate the social sciences because human settlements tend to have discrete borders.

Raster often dominates in environmental sciences because of the reliance on remote sensing data.

There is much overlap in some fields and raster and vector datasets can be used side-by-side: ecologists and demographers, for example, commonly use both vector and raster data.

Geospatial data = data + spatial metadata

Ultimately, geospatial data are just data with geographic metadata attached in a structured way that both locates the data and makes them interoperable. One of the most important aspects of this geographic metadata is the Coordinate Reference System being used.
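As a minimal sketch of "data + spatial metadata", the snippet below turns an ordinary data frame into a spatial object. It assumes the `sf` package is installed (it is not named in the text above), and the site names and coordinates are invented purely for illustration.

```r
library(sf)  # assumption: sf is the package used for vector data

# Plain tabular data: two hypothetical sites with lon/lat columns
sites <- data.frame(
  name = c("site_a", "site_b"),
  lon  = c(-0.1, -1.5),
  lat  = c(51.5, 53.4)
)

# Attach the spatial metadata: which columns hold the coordinates
# and which CRS those coordinates are expressed in (EPSG:4326, WGS 84)
sites_sf <- st_as_sf(sites, coords = c("lon", "lat"), crs = 4326)

# The CRS now travels with the data
st_crs(sites_sf)$epsg
```

The attribute columns are untouched; only a geometry column and a CRS record have been added.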


Coordinate Reference Systems

Coordinate Reference Systems (CRS) provide a framework for defining real-world locations.

The basis for any Coordinate Reference System is a model of the earth (a Globe Model) on which locations can be defined.

We can think of the differences between Globe Models as being analogous to differences in globes you’ve actually seen in your life: some are perfect balls, others are spheres with elevations, and others capture the full weirdness of the earth’s shape.

Image source: http://www.nickeubank.com/wp-content/uploads/2018/02/CoordinateSystems_And_Projections.pdf

Types of common coordinate systems

There are two types of Coordinate Reference Systems:

  • Geographic coordinate system (GCS): A global or spherical coordinate system such as latitude-longitude.
  • Projected coordinate systems (PCS): A projected coordinate system provides various mechanisms to project maps of the earth’s spherical surface onto a two-dimensional Cartesian coordinate plane. Projected coordinate systems are referred to as map projections.

Geographic Coordinate System

Geographic Coordinate Systems are defined by a Datum.

A Datum essentially defines the Globe Model of the coordinate system.

It defines:

  • the angular units (e.g. degrees)
  • a starting point (i.e. where (0,0) is and, in particular, which longitude is defined to be 0°), so that the angles reference a meaningful spot on the earth
  • the position of the spheroid relative to the center of the earth, and therefore the origin and orientation of latitude and longitude lines

A point is referenced by its longitude and latitude values. Longitude and latitude are angles measured from the earth’s center to a point on the earth’s surface.
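To make the "angles from the earth's center" idea concrete, here is a small base R sketch that turns a longitude/latitude pair into a 3D position. It assumes the simplest possible Globe Model, a perfect sphere of radius 1; a real datum uses a spheroid, so this is an illustration rather than a geodetic calculation. The function name `lonlat_to_xyz` is made up for this example.

```r
# Convert longitude/latitude (in degrees) to x, y, z on a sphere of radius r.
# Assumption: the earth is a perfect sphere (the simplest Globe Model).
lonlat_to_xyz <- function(lon, lat, r = 1) {
  lam <- lon * pi / 180  # longitude in radians
  phi <- lat * pi / 180  # latitude in radians
  c(x = r * cos(phi) * cos(lam),
    y = r * cos(phi) * sin(lam),
    z = r * sin(phi))
}

lonlat_to_xyz(0, 0)       # where the Equator meets the Prime Meridian
lonlat_to_xyz(0, 90)      # the North Pole
lonlat_to_xyz(-0.1, 51.5) # roughly London
```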

Image source: https://datacarpentry.org/r-raster-vector-geospatial/images/map_usa_different_projections.jpg

The right datum for the right job

A particular datum can be selected for use in a specific geographic area, because it does a good job of modelling the surface of the earth in that part of the world.

Image source: http://desktop.arcgis.com/en/arcmap/10.3/guide-books/map-projections/GUID-E94B3A5F-D997-4E6F-B400-BC15C4B0DCDC-web.gif

Orange Peel Analogy: a datum is your choice of fruit to use in the orange peel analogy. Is the earth an orange, a lemon, a lime, a grapefruit?

Citrus fruit on display at the market


Image source: https://farm3.staticflickr.com/2260/2508805118_500f5bba28_n.jpg

Common Datums

Geographic (datum) transformations

To change between geographic coordinate systems, you may need to perform a geographic (datum) transformation. This is a well-defined mathematical method to convert coordinates between two geographic coordinate systems.

Projected coordinate systems (PCS)

Projected coordinate systems include a projection

A Projection is a mathematical transformation of the angular measurements on a round earth to a flat surface (e.g. paper or a computer screen).

Units of measurement are linear, usually meters.
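As one example of such a transformation, the base R sketch below applies the spherical Mercator formulas, which take angles in degrees and return linear units in meters. It assumes a perfect sphere with radius 6378137 m (the WGS 84 semi-major axis); the function name `mercator` is illustrative and not part of any package.

```r
# Spherical Mercator: angles (degrees) in, meters on a flat plane out.
# Assumption: a spherical earth of radius r (default: WGS 84 semi-major axis).
mercator <- function(lon, lat, r = 6378137) {
  lam <- lon * pi / 180  # longitude in radians
  phi <- lat * pi / 180  # latitude in radians
  c(x = r * lam,
    y = r * log(tan(pi / 4 + phi / 2)))
}

mercator(0, 0)        # the origin maps to (approximately) 0, 0
mercator(-0.1, 51.5)  # a lon/lat pair becomes x/y in meters
```

Other projections follow the same pattern, a function from (longitude, latitude) to (x, y), but use different formulas and therefore distort different properties.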

Orange Peel Analogy: a projection is how you peel your orange and then flatten the peel.

Image source: http://blogs.lincoln.ac.nz/gis/2017/03/29/where-on-earth-are-we/


Common projected coordinate systems:

  • Mercator projection: a cylindrical map projection presented by the Flemish geographer and cartographer Gerardus Mercator in 1569. It became the standard map projection for nautical purposes because of its ability to represent lines of constant course as straight segments that conserve the angles with the meridians.

  • Universal Transverse Mercator (UTM) system: The UTM system is not a single map projection. The system instead divides the Earth into sixty zones, each being a six-degree band of longitude, and uses a secant transverse Mercator projection in each zone.

  • Albers Equal Area: a conic, equal area map projection that uses two standard parallels. Although scale and shape are not preserved, distortion is minimal between the standard parallels.
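The sixty six-degree UTM zones described above can be looked up from a longitude with a few lines of base R. This is a sketch only: it assumes the standard numbering (zone 1 starts at 180° W and numbers increase eastwards) and ignores the irregular zones around Norway and Svalbard. The function name `utm_zone` is made up for this example.

```r
# Standard UTM zone number for a longitude in degrees (-180 to 180).
# Assumption: regular six-degree zones only; special cases are ignored.
utm_zone <- function(lon) {
  zone <- floor((lon + 180) / 6) + 1
  pmin(zone, 60)  # keep lon = 180 in zone 60 rather than a non-existent zone 61
}

# London, San Francisco, Sydney (approximate longitudes)
utm_zone(c(-0.1, -122.4, 151.2))
# 30 10 56
```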

Image source: https://datacarpentry.org/r-raster-vector-geospatial/images/map_usa_different_projections.jpg


Q: Why does the WGS 84 map look so different from the other three (projected) maps?

Latitude lines are concentric circles that become smaller near the poles. They form a single point at the poles.

When a lat/lon CRS is flattened, the implicit flattening function is the Plate Carrée projection, one of the simplest there is: it turns the globe into a cylinder and unwraps it, treating longitude and latitude as plain x and y values. The result is increasing distortion of features as one approaches the poles.
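The growth of that distortion can be sketched in base R. On a sphere, a degree of longitude covers a ground distance proportional to the cosine of the latitude, but Plate Carrée draws every degree of longitude at the same width, so features are stretched east-west by a factor of 1 / cos(latitude). The function names and the spherical radius are assumptions made for this illustration.

```r
# Plate Carree on a sphere: longitude and latitude are simply scaled to x and y.
# Assumption: spherical earth of radius r.
plate_carree <- function(lon, lat, r = 6378137) {
  c(x = r * lon * pi / 180,
    y = r * lat * pi / 180)
}

# East-west stretching of features relative to their true size on the sphere
ew_stretch <- function(lat) 1 / cos(lat * pi / 180)

round(ew_stretch(c(0, 30, 60, 80)), 2)
# 1.00 1.15 2.00 5.76
```

At 60° latitude features are drawn twice as wide as they should be, and the factor grows without bound towards the poles.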

Q: Why do the three projected maps differ from each other?

Going back to the orange peel analogy, different projections use a different starting point and method to peel the earth’s surface into a flat surface.


Map reprojections

To change between projected CRSs you need to reproject your data. This often involves unprojecting back to lat/lon, transforming the unprojected coordinates and then reprojecting.


Notation for Coordinate Reference Systems in R

You have two options for identifying a CRS in most R commands. The documentation for a command that requires projection information will tell you which is required. Often you can choose between the two options.

EPSG Code

An EPSG (European Petroleum Survey Group) Code is an ID that has been assigned to most common projections to make reference to a particular projection easy. An EPSG Code is also called an SRID (Spatial Reference Identifier). Technically, EPSG is the authority that assigns SRIDs, but you will hear these terms used interchangeably.

pros/cons

The main advantages to using this method of specifying a projection are that it is standardized and ensures you have the same parameters every time.

The disadvantage is that if you need to know the parameters used by the projection or its name, you have to look them up, but that’s fairly easy to do at spatialreference.org. Also, you can’t customize the parameters if you use an EPSG code.

PROJ.4 String

PROJ.4 is an open source library for defining and converting between coordinate reference systems. It defines a standard way to write projection parameters.

For example, the PROJ.4 string: +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0 encodes:

  • proj=longlat: the data are in a geographic (latitude and longitude) coordinate system
  • datum=WGS84: the datum WGS84 (the datum refers to the 0,0 reference for the coordinate system used in the projection)
  • ellps=WGS84: the ellipsoid (how the earth’s roundness is calculated) is WGS84

Note that there are no specified units above. This is because this geographic coordinate reference system is in latitude and longitude which is most often recorded in decimal degrees.

pros/cons

Two important advantages to using this option are:

  1. the parameters are human-readable and immediately transparent
  2. the strings are easily customized.

The main disadvantage to this option is that it’s easy to make a mistake when you reproduce the string, accidentally changing parameters.

For more, check out Proj4 FAQs
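The two notations can be compared side by side. The sketch below assumes the `sf` package, whose `st_crs()` function accepts either an EPSG code or a PROJ.4 string; other packages may expect only one of the two, so check the documentation of the command you are using.

```r
library(sf)  # assumption: sf is installed

# Option 1: EPSG code (standardized, but parameters are hidden behind the ID)
crs_epsg <- st_crs(4326)

# Option 2: PROJ.4 string (parameters are visible and can be customized)
crs_proj4 <- st_crs("+proj=longlat +datum=WGS84 +no_defs")

# Looking up the parameters behind the EPSG code
crs_epsg$proj4string
```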

The most common mistake to beware of:

You may find yourself in a situation where you’ll need to tell your GIS (in our case R) what the projection/CRS of your data should be. In these cases, the GIS needs to know what the projection/CRS currently is, not what you would like it to be. If you need to change a projection, you need to go through a different process, often called Re-project or Transform.
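A short sketch of what this mistake looks like in practice, assuming the `sf` package and a made-up point whose coordinates are known to be lon/lat. Declaring a CRS with `st_set_crs()` only labels the data; it never changes the numbers.

```r
library(sf)  # assumption: sf is installed

# Hypothetical point recorded as lon/lat, with no CRS attached yet
pt <- st_sfc(st_point(c(-1.5, 53.4)))
is.na(st_crs(pt))  # TRUE: R does not know the CRS

# Correct: declare the CRS the coordinates are already in (WGS 84, EPSG:4326)
pt_lonlat <- st_set_crs(pt, 4326)

# Mistake: declaring the CRS you would like to end up with (British National Grid).
# The numbers are unchanged but are now read as meters from the BNG origin.
pt_wrong <- st_set_crs(pt, 27700)
st_coordinates(pt_wrong)  # still -1.5 and 53.4
```

To actually move the data into another CRS, the correctly labelled object would be passed to `st_transform()` instead.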

A motivating example

(from Geocomputation with R, Section 2.1):

The city of London can be represented by the coordinates c(-0.1, 51.5). This means that its location is -0.1 degrees east and 51.5 degrees north of the origin. The origin in this case is at 0 degrees longitude (the Prime Meridian) and 0 degrees latitude (the Equator) in a geographic (‘lon/lat’) coordinate system (GCS).

The same point could also be approximated in a PCS with ‘Easting/Northing’ values of c(530000, 180000) in the British National Grid (BNG), meaning that London is located 530 km East and 180 km North of the origin of the CRS (right panel).

The location of BNG’s origin, in the sea beyond the South West Peninsula, ensures that most locations in the UK have positive Easting and Northing values.
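The London example can be reproduced with a few lines of R. This is a sketch assuming the `sf` package; EPSG:4326 is WGS 84 lon/lat and EPSG:27700 is the British National Grid. The exact values returned depend on the PROJ version and datum transformation available, so expect numbers close to, not identical to, the rounded c(530000, 180000) quoted above.

```r
library(sf)  # assumption: sf is installed

# London in a geographic CRS: degrees of longitude and latitude
london_lonlat <- st_sfc(st_point(c(-0.1, 51.5)), crs = 4326)

# The same location in a projected CRS: meters east and north of the BNG origin
london_bng <- st_transform(london_lonlat, 27700)

st_coordinates(london_lonlat)
st_coordinates(london_bng)
```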

Sources and inspiration for this section:

more advice from twitter experts

Session information

sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.3

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.0.1   Rcpp_0.12.18      digest_0.6.15    
 [4] rprojroot_1.3-2   R.methodsS3_1.7.1 backports_1.1.2  
 [7] git2r_0.21.0      magrittr_1.5      evaluate_0.11    
[10] stringi_1.2.4     whisker_0.3-2     R.oo_1.21.0      
[13] R.utils_2.6.0     rmarkdown_1.10    tools_3.4.4      
[16] stringr_1.3.1     yaml_2.1.19       compiler_3.4.4   
[19] htmltools_0.3.6   knitr_1.20       

This reproducible R Markdown analysis was created with workflowr 1.0.1