Project Objectives

Extract word frequencies from the NOW (News on the Web) corpus for specific time periods



SAMPLES

Sources

UK:

Daily Mail, The Guardian, The Independent, Financial Times, Metro, Evening Standard, The National, Daily Star, The Courier, The Sun, The Times

US:

The Atlantic, TIME, Los Angeles Times, Wall Street Journal, Chicago Tribune, New York Daily News, Washington Post, New York Post, The Seattle Times, Washington Times, Dallas Morning News, National Geographic, The Week Magazine, Baltimore Sun, The New Yorker, New York Magazine, Milwaukee Journal Sentinel, Minneapolis Star Tribune, Politico


Time periods

  • Brexit: 23/05/2016 – 24/07/2016

  • 2012 US Elections: 08/10/2012 – 09/12/2012

  • continuous random sample: 03/02/2014 – 06/04/2014

  • weekly random sample: 9 randomly selected weeks, excluding any week that falls within the three samples above
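
The sampling rule for the weekly random sample could be sketched roughly as follows. This is only an illustration, not the project's actual script: the brief does not say which range of weeks to draw from, so CORPUS_START and CORPUS_END are hypothetical placeholders, as are the function names, the Monday-to-Sunday definition of a week, and the seed. The three fixed windows are taken from the list above (dates read as DD/MM/YYYY); each spans 63 days, i.e. nine weeks.

```python
import random
from datetime import date, timedelta

# Fixed windows from the brief (dates given there as DD/MM/YYYY).
FIXED_PERIODS = {
    "brexit": (date(2016, 5, 23), date(2016, 7, 24)),
    "us_elections_2012": (date(2012, 10, 8), date(2012, 12, 9)),
    "continuous_random": (date(2014, 2, 3), date(2014, 4, 6)),
}

# ASSUMPTION: the brief does not state the range to sample from.
CORPUS_START = date(2012, 1, 2)   # hypothetical
CORPUS_END = date(2016, 12, 25)   # hypothetical


def overlaps(week_start, week_end, periods):
    """True if the week shares at least one day with any fixed period."""
    return any(week_start <= end and week_end >= start
               for start, end in periods.values())


def candidate_weeks(start, end, periods):
    """Monday-to-Sunday weeks inside [start, end] that avoid every fixed period."""
    # Move forward to the first Monday on or after `start`.
    monday = start + timedelta(days=(7 - start.weekday()) % 7)
    weeks = []
    while monday + timedelta(days=6) <= end:
        sunday = monday + timedelta(days=6)
        if not overlaps(monday, sunday, periods):
            weeks.append((monday, sunday))
        monday += timedelta(days=7)
    return weeks


def sample_weeks(n=9, seed=2016):
    """Draw n distinct weeks without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    pool = candidate_weeks(CORPUS_START, CORPUS_END, FIXED_PERIODS)
    return sorted(rng.sample(pool, n))


weekly_sample = sample_weeks()
```

Fixing the seed is a design choice made here so that the same nine weeks are selected on every run, which fits the reproducibility goal of the project.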


Words

All words with frequency >= 5
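
A minimal sketch of the frequency threshold, assuming the counts are already available as a word-to-count mapping. The brief does not say whether the threshold applies per newspaper, per sample, or to the whole corpus, so the function below simply filters whatever counts it is given. The function name and the toy tokens are invented for illustration.

```python
from collections import Counter

MIN_FREQUENCY = 5  # threshold from the brief


def filter_by_frequency(counts, min_frequency=MIN_FREQUENCY):
    """Keep only words whose count is at least min_frequency."""
    return {word: n for word, n in counts.items() if n >= min_frequency}


# Toy tokens, invented purely for illustration.
tokens = ["vote"] * 6 + ["leave"] * 5 + ["remain"] * 4
kept = filter_by_frequency(Counter(tokens))
# "remain" occurs 4 times and is dropped; "vote" and "leave" are kept.
```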



OUTPUTS

DATAFRAMES

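          newspaper1   newspaper2   newspaper3   ...   newspapern
word1     f11          f12          f13          ...   f1n
word2     f21          f22          f23          ...   f2n
word3     f31          f32          f33          ...   f3n
...
wordm     fm1          fm2          fm3          ...   fmn

Rows are words, columns are newspapers, and fij is the frequency of word i in newspaper j.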
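
The table layout above could be assembled along these lines. This is a sketch under assumptions: it presumes per-newspaper word counts already exist as a mapping from newspaper name to word counts, and it records a frequency of 0 when a word does not occur in a newspaper (the brief does not specify how missing words should be handled). The function name and the toy counts are hypothetical.

```python
from collections import Counter


def build_table(counts_by_paper):
    """Return (header, rows) for a word-by-newspaper frequency table.

    counts_by_paper maps a newspaper name to a mapping of word counts.
    A word that is absent from a newspaper gets frequency 0 (assumption).
    """
    papers = sorted(counts_by_paper)
    words = sorted(set().union(*counts_by_paper.values()))
    header = ["word"] + papers
    rows = [[w] + [counts_by_paper[p].get(w, 0) for p in papers] for w in words]
    return header, rows


# Toy counts, invented purely for illustration.
counts = {
    "The Guardian": Counter({"vote": 7, "leave": 5}),
    "The Times": Counter({"vote": 6}),
}
header, rows = build_table(counts)
```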

• Keep separate tables for UK and US newspapers and for each of the four time periods

• File format: anything Python- and R-friendly (there will be a lot of preprocessing and descriptive statistics), e.g. CSV, tab-delimited text, or a compressed format that is easy to open in both

• Please save all scripts!
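
One possible way to meet the file-format request is gzip-compressed, tab-delimited text, written here with the Python standard library. This is a sketch, not a prescribed format: the file naming scheme (one file per country and time period), the period labels, and the function names are all hypothetical. Files of this kind can normally be read directly by pandas read_csv (with sep="\t") and by R's read.delim, both of which handle gzip input.

```python
import csv
import gzip
import tempfile
from pathlib import Path

COUNTRIES = ("UK", "US")
# Hypothetical short labels for the four time criteria.
PERIODS = ("brexit", "us_elections_2012", "continuous_random", "weekly_random")


def table_path(out_dir, country, period):
    """Hypothetical naming scheme: one file per country and time period."""
    return Path(out_dir) / f"{country}_{period}.tsv.gz"


def write_table(path, header, rows):
    """Write a table as gzip-compressed, tab-delimited UTF-8 text."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
    return path


def read_table(path):
    """Read the table back as a list of rows (all values come back as strings)."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# Round trip in a temporary directory, using toy values for illustration.
with tempfile.TemporaryDirectory() as out_dir:
    header = ["word", "The Guardian", "The Times"]
    rows = [["leave", 5, 0], ["vote", 7, 6]]
    path = write_table(table_path(out_dir, "UK", "brexit"), header, rows)
    round_trip = read_table(path)
    file_name = path.name
```

With two countries and four time criteria, this scheme yields eight files in total.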



Reproducing the analysis

The full analysis (including rebuilding the database) can be reproduced by running the scripts in the code folder in this sequence:


All functions associated with the workflow are in the R directory.

This R Markdown site was created with workflowr