Project Objectives

Extract word frequencies from the NOW corpus for specific time periods

SAMPLES

Sources

UK:

Daily Mail, The Guardian, The Independent, Financial Times, Independent, Metro, Evening Standard, The National, Daily Star, The Courier, The Sun, The Times

US:

The Atlantic, TIME, Los Angeles Times, Wall Street Journal, Chicago Tribune, New York Daily News, Washington Post, New York Post, The Seattle Times, Washington Times, Dallas Morning News, National Geographic, The Week Magazine, Baltimore Sun, The New Yorker, New York Magazine, Milwaukee Journal Sentinel, Minneapolis Star Tribune, Politico

Time periods

Brexit: 23/05/2016 – 24/07/2016
2012 US Elections: 08/10/2012 – 09/12/2012
Continuous random sample: 03/02/2014 – 06/04/2014
Weekly random sample: 9 randomly selected weeks, excluding any week that falls within the three periods above
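
Each of the three fixed periods above runs from a Monday to a Sunday and covers exactly nine weeks (63 days), so the weekly random sample can be drawn in the same units. The sketch below is one possible way to do that in Python, not the project's R implementation. The corpus start and end dates, the seed, and the names `FIXED_PERIODS`, `monday_weeks` and `sample_weeks` are assumptions made for illustration.

```python
import random
from datetime import date, timedelta

# Fixed sample windows from the list above (inclusive; dd/mm/yyyy in the text).
FIXED_PERIODS = {
    "brexit": (date(2016, 5, 23), date(2016, 7, 24)),
    "us_elections_2012": (date(2012, 10, 8), date(2012, 12, 9)),
    "continuous_random": (date(2014, 2, 3), date(2014, 4, 6)),
}


def monday_weeks(start, end):
    """Yield (monday, sunday) pairs for every full week inside [start, end]."""
    monday = start + timedelta(days=(7 - start.weekday()) % 7)
    while monday + timedelta(days=6) <= end:
        yield monday, monday + timedelta(days=6)
        monday += timedelta(days=7)


def overlaps(week, period):
    """True if two inclusive date ranges share at least one day."""
    return week[0] <= period[1] and period[0] <= week[1]


def sample_weeks(corpus_start, corpus_end, n_weeks=9, seed=0):
    """Pick n_weeks random weeks that avoid all fixed periods."""
    candidates = [
        week
        for week in monday_weeks(corpus_start, corpus_end)
        if not any(overlaps(week, p) for p in FIXED_PERIODS.values())
    ]
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    return sorted(rng.sample(candidates, n_weeks))


# Assumed corpus coverage; replace with the real first and last dates.
weeks = sample_weeks(date(2010, 1, 4), date(2016, 12, 25))
```

Fixing the seed means the same nine weeks are selected on every run, which matters if the frequency tables need to be rebuilt later.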

Words

All words with frequency >= 5
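
A minimal sketch of the frequency cut-off, assuming the threshold applies to a word's total count within one sample (the text does not say whether it is applied per newspaper or overall). The whitespace tokeniser and the names `MIN_FREQ`, `count_words` and `keep_frequent` are illustrative and not taken from the project scripts.

```python
from collections import Counter

MIN_FREQ = 5  # threshold stated above


def count_words(texts):
    """Count lower-cased, whitespace-separated tokens across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def keep_frequent(counts, min_freq=MIN_FREQ):
    """Drop every word whose count is below min_freq."""
    return {word: n for word, n in counts.items() if n >= min_freq}


# Toy input: "the" occurs 8 times, "vote" 5 times, "poll" 3 times.
sample = ["the vote the poll"] * 3 + ["the vote"] * 2
frequent = keep_frequent(count_words(sample))
```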

OUTPUTS

DATAFRAMES

	newspaper1	newspaper2	newspaper3	…	newspapern
word1	f11	f12	f13	…	f1n
word2	f21	f22	f23	…	f2n
word3	f31	f32	f33	…	f3n
…	…	…	…	…	…
wordm	fm1	fm2	fm3	…	fmn
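
The layout above (one row per word, one column per newspaper) could be assembled roughly as follows. This is a sketch under stated assumptions: articles arrive as (newspaper, text) pairs, tokens are split on whitespace, and the first column is labelled `word`, none of which is specified in the original.

```python
from collections import Counter, defaultdict


def frequency_table(articles):
    """Build {word: {newspaper: count}} from (newspaper, text) pairs."""
    table = defaultdict(Counter)
    for newspaper, text in articles:
        for word in text.lower().split():
            table[word][newspaper] += 1
    return table


def to_rows(table, newspapers):
    """Return a header row plus one row per word, with 0 where a word is absent."""
    rows = [["word", *newspapers]]
    for word in sorted(table):
        rows.append([word, *(table[word][paper] for paper in newspapers)])
    return rows


# Toy input with two newspapers.
articles = [
    ("The Guardian", "vote leave vote"),
    ("The Times", "vote remain"),
]
rows = to_rows(frequency_table(articles), ["The Guardian", "The Times"])
```

Passing the newspaper list explicitly fixes the column order, so the UK and US tables keep the same column layout across all four time periods.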

• Keep separate tables for UK and US newspapers and for each of the four time periods

• File format: anything Python- and R-friendly, since there will be many preprocessing steps and descriptive statistics: for example CSV, tab-delimited, or a compressed format that opens easily in both

• Please save all scripts!
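
One format that fits these requirements is gzip-compressed, tab-delimited text, which pandas `read_csv` and R's `read.delim` should both open directly. The sketch below writes one file per country and time period. The `country_period.tsv.gz` naming scheme, the `output` directory and the function names are assumptions, not project conventions.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def write_table(rows, country, period, out_dir="output"):
    """Write one table as gzip-compressed, tab-delimited text; return its path."""
    out_path = Path(out_dir) / f"{country}_{period}.tsv.gz"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(out_path, "wt", encoding="utf-8", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
    return out_path


def read_table(path):
    """Read the table back as a list of rows (all values come back as strings)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# Toy table written to a temporary directory.
rows = [["word", "The Guardian", "The Times"], ["vote", 2, 1]]
path = write_table(rows, "uk", "brexit", out_dir=tempfile.mkdtemp())
```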

Vignettes

Compile the Database

Extract word frequencies

Reports

r01 - Article level data availability

Reproducing the analysis

The full analysis (including rebuilding the database) can be reproduced by running the scripts in the code folder in this sequence:

m01_proc-data.sh: Copy the files required to build the database to an ooominds user directory (shell).
m02_make-db.sh: Create the database in the ooominds user’s directory (shell / sqlite3).
m03_pop-db.R: Do some preliminary cleaning and populate the database (R); see vignette.
m04_extract_word_freq.R: Sample the database for the different time periods and calculate frequencies (R); see vignette.
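
The last step, sampling the database by time period and counting words, can be illustrated with Python's built-in `sqlite3` module. The project's own scripts are written in R, and the schema used here (an `articles` table with `source`, `date` and `text` columns, dates stored as ISO strings) is hypothetical, so treat this as a sketch of the idea and not a port of `m04_extract_word_freq.R`.

```python
import sqlite3
from collections import Counter

# In-memory stand-in for the project database, with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (source TEXT, date TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("The Guardian", "2016-06-01", "vote leave vote"),
        ("The Guardian", "2016-08-01", "summer news"),
        ("The Times", "2016-06-24", "vote result"),
    ],
)


def word_freq_by_source(conn, start, end):
    """Count words per source for articles dated within [start, end] (ISO dates)."""
    counts = {}
    query = "SELECT source, text FROM articles WHERE date BETWEEN ? AND ?"
    for source, text in conn.execute(query, (start, end)):
        counts.setdefault(source, Counter()).update(text.lower().split())
    return counts


# Brexit window from the time periods listed earlier.
brexit = word_freq_by_source(conn, "2016-05-23", "2016-07-24")
```

Storing dates as ISO `YYYY-MM-DD` strings lets `BETWEEN` compare them correctly as text, which is why the sketch converts the dd/mm/yyyy dates before querying.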

All functions associated with the workflow are in the `R` directory.
