In this project, we’re using Gapminder data to explore the properties of R Markdown.

Gapminder produces free teaching resources that make the world understandable, based on reliable statistics.

We’ll inspect and visualise the gapminder dataset sourced from the gapminder package. The main object in this package is the gapminder data frame or “tibble”.

Installation

Install gapminder from CRAN:

install.packages("gapminder")

To perform our analysis, we also need to install some additional packages:

install.packages(c("ggplot2", "DT", "skimr"))
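
Because an R Markdown document is re-run every time it is knitted, it can help to install only the packages that are actually missing. The following is a minimal sketch of that idea, not part of the original analysis; the helper name `missing_packages` is hypothetical, and the `interactive()` guard is an assumption that keeps the install step from running during a non-interactive knit.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this document
to_install <- missing_packages(c("gapminder", "ggplot2", "DT", "skimr"))

# Install only what is missing, and only in an interactive session
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Run this once from the console before knitting; the document itself then only needs the `library()` calls.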

Data

Get data from the gapminder package

library(gapminder)
DT::datatable(gapminder, caption = "gapminder dataset sourced from the gapminder R package")

Summarise the data

library(skimr)
skim(gapminder)
## Skim summary statistics
##  n obs: 1704 
##  n variables: 6 
## 
## ── Variable type:factor ────────────────────────────────────────────────────────────────
##   variable missing complete    n n_unique
##  continent       0     1704 1704        5
##    country       0     1704 1704      142
##                              top_counts ordered
##  Afr: 624, Asi: 396, Eur: 360, Ame: 300   FALSE
##      Afg: 12, Alb: 12, Alg: 12, Ang: 12   FALSE
## 
## ── Variable type:integer ───────────────────────────────────────────────────────────────
##  variable missing complete    n    mean       sd    p0        p25     p50
##       pop       0     1704 1704 3e+07    1.1e+08 60011 2793664    7e+06  
##      year       0     1704 1704  1979.5 17.27     1952    1965.75  1979.5
##       p75       p100     hist
##  2e+07       1.3e+09 ▇▁▁▁▁▁▁▁
##   1993.25 2007       ▇▃▇▃▃▇▃▇
## 
## ── Variable type:numeric ───────────────────────────────────────────────────────────────
##   variable missing complete    n    mean      sd     p0     p25     p50
##  gdpPercap       0     1704 1704 7215.33 9857.45 241.17 1202.06 3531.85
##    lifeExp       0     1704 1704   59.47   12.92  23.6    48.2    60.71
##      p75      p100     hist
##  9325.46 113523.13 ▇▁▁▁▁▁▁▁
##    70.85     82.6  ▁▂▅▅▅▅▇▃

Analysis

Relationship between GDP per capita and Life Expectancy

Linear model of life expectancy as a function of GDP per capita

# Store the fitted model under its own name so it does not mask the lm() function
fit <- lm(lifeExp ~ gdpPercap, data = gapminder)
fit
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
## 
## Coefficients:
## (Intercept)    gdpPercap  
##   5.396e+01    7.649e-04
summary(fit)
## 
## Call:
## lm(formula = lifeExp ~ gdpPercap, data = gapminder)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -82.754  -7.758   2.176   8.225  18.426 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.396e+01  3.150e-01  171.29   <2e-16 ***
## gdpPercap   7.649e-04  2.579e-05   29.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.49 on 1702 degrees of freedom
## Multiple R-squared:  0.3407, Adjusted R-squared:  0.3403 
## F-statistic: 879.6 on 1 and 1702 DF,  p-value: < 2.2e-16
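
The slope is easier to read when rescaled. The sketch below uses only the two coefficients printed in the model output above; the variable names are illustrative, and the GDP per capita value of 10,000 is an arbitrary example point rather than anything taken from the data.

```r
# Coefficients copied from the model output above
intercept <- 5.396e+01
slope     <- 7.649e-04

# Expected change in life expectancy (years) per additional 1,000 of GDP per capita
change_per_1000 <- slope * 1000
change_per_1000

# Fitted life expectancy at an example GDP per capita of 10,000
fitted_at_10k <- intercept + slope * 10000
fitted_at_10k
```

Under this straight-line model, each additional 1,000 in GDP per capita is associated with roughly three quarters of a year of extra life expectancy. The R-squared of about 0.34 suggests the linear fit leaves much of the variation unexplained.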

Plot life expectancy against GDP per capita, with GDP per capita on a log scale

library(ggplot2)

ggplot(gapminder, 
       aes(x = gdpPercap, y = lifeExp, colour = continent)) +
  geom_point() + 
  scale_x_log10()
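
One way to see why the log scale helps is to look at how widely gdpPercap is spread. The sketch below uses only the minimum (p0) and maximum (p100) values reported in the skim summary earlier; the variable names are illustrative.

```r
# Minimum and maximum gdpPercap, copied from the skim summary above
gdp_min <- 241.17
gdp_max <- 113523.13

# How many times larger the maximum is than the minimum
gdp_ratio <- gdp_max / gdp_min
gdp_ratio

# The same spread expressed in orders of magnitude (powers of ten)
gdp_orders <- log10(gdp_max) - log10(gdp_min)
gdp_orders
```

With values spanning more than two orders of magnitude, a linear x-axis would squeeze most countries into the left edge of the plot, which is what `scale_x_log10()` avoids.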