UK newspapers: Daily Mail, The Guardian, The Independent, Financial Times, Independent, Metro, Evening Standard, The National, Daily Star, The Courier, The Sun, The Times
US newspapers: The Atlantic, TIME, Los Angeles Times, Wall Street Journal, Chicago Tribune, New York Daily News, Washington Post, New York Post, The Seattle Times, Washington Times, Dallas Morning News, National Geographic, The Week Magazine, Baltimore Sun, The New Yorker, New York Magazine, Milwaukee Journal Sentinel, Minneapolis Star Tribune, Politico
Brexit: 23/05/2016 – 24/07/2016
2012 US Elections: 08/10/2012 – 09/12/2012
continuous random sample: 03/02/2014 – 06/04/2014
weekly random sample: 9 randomly selected weeks, excluding any week that falls within the three periods above
all samples: keep only words with frequency >= 5
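The weekly random sample can be drawn along these lines. This is a minimal sketch, assuming Monday-to-Sunday weeks (each fixed period above starts on a Monday and ends on a Sunday) and an overall sampling window that the notes do not specify; the window used in the usage line and the function names `sample_weeks` and `overlaps` are hypothetical.

```python
import random
from datetime import date, timedelta

# The three fixed periods from the notes (DD/MM/YYYY converted to date objects).
PERIODS = {
    "brexit": (date(2016, 5, 23), date(2016, 7, 24)),
    "us_elections_2012": (date(2012, 10, 8), date(2012, 12, 9)),
    "continuous_random": (date(2014, 2, 3), date(2014, 4, 6)),
}


def overlaps(start, end, other_start, other_end):
    """True if the inclusive ranges [start, end] and [other_start, other_end] share a day."""
    return start <= other_end and other_start <= end


def sample_weeks(first_monday, last_day, n_weeks=9, seed=None):
    """Randomly pick n_weeks Monday-to-Sunday weeks that avoid all fixed periods."""
    rng = random.Random(seed)
    candidates = []
    week_start = first_monday
    while week_start + timedelta(days=6) <= last_day:
        week_end = week_start + timedelta(days=6)
        # Skip any week touching one of the three fixed samples.
        if not any(overlaps(week_start, week_end, s, e) for s, e in PERIODS.values()):
            candidates.append((week_start, week_end))
        week_start += timedelta(weeks=1)
    # Sampling without replacement guarantees 9 distinct weeks.
    return sorted(rng.sample(candidates, n_weeks))
```

Usage (hypothetical window, fixed seed for reproducibility): `sample_weeks(date(2012, 1, 2), date(2016, 12, 31), seed=42)`. Fixing the seed and saving it with the scripts keeps the sample reproducible.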
DATAFRAMES
word | newspaper1 | newspaper2 | newspaper3 | … | newspapern |
---|---|---|---|---|---|
word1 | f11 | f12 | f13 | … | f1n |
word2 | f21 | f22 | f23 | … | f2n |
word3 | f31 | f32 | f33 | … | f3n |
… | … | … | … | … | … |
wordm | fm1 | fm2 | fm3 | … | fmn |
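A table of this shape (one row per word, one count column per newspaper) can be built with the standard library alone. This is a sketch under stated assumptions: the input is a hypothetical mapping from newspaper name to a list of article texts, tokenisation is a simple lower-case regex rather than the project's actual preprocessing, and "frequency >= 5" is read as the word's total count across all newspapers in the table.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_table(texts_by_paper, min_freq=5):
    """Return (papers, rows) where rows maps word -> list of counts, one per paper.

    Words whose total count across all papers is below min_freq are dropped
    (an assumed reading of 'all with frequency >= 5').
    """
    papers = sorted(texts_by_paper)
    counts = {
        p: Counter(tok for text in texts_by_paper[p] for tok in tokenize(text))
        for p in papers
    }
    totals = Counter()
    for c in counts.values():
        totals.update(c)
    # Rows are ordered alphabetically; columns follow the sorted newspaper names.
    return papers, {
        word: [counts[p][word] for p in papers]
        for word, total in sorted(totals.items())
        if total >= min_freq
    }
```

Keeping the counts as plain lists aligned with `papers` makes the result easy to write out as one delimited row per word.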
• Keep separate tables for UK and US newspapers and for each of the four time criteria
• File format: anything that opens easily in both Python and R, since there will be many preprocessing steps and descriptive statistics (e.g. CSV, tab-delimited text, or a compressed format that both can read)
• Please save all scripts.
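One option that satisfies these storage notes is gzip-compressed, tab-delimited text, one file per country and time criterion. The sketch below assumes a hypothetical naming scheme (`uk_brexit.tsv.gz` and so on) and the `(papers, rows)` layout of a word-by-newspaper table; the helper names are illustrative, not part of the project.

```python
import csv
import gzip
from pathlib import Path


def table_path(out_dir, country, period):
    """Hypothetical naming scheme: one file per country (uk/us) and time criterion."""
    return Path(out_dir) / f"{country}_{period}.tsv.gz"


def write_table(path, papers, rows):
    """Write a word x newspaper table as gzip-compressed, tab-separated UTF-8 text."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["word", *papers])
        for word, counts in rows.items():
            writer.writerow([word, *counts])
    return path


def read_table(path):
    """Read a file written by write_table back into (papers, rows)."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        rows = {r[0]: [int(x) for x in r[1:]] for r in reader}
    return header[1:], rows
```

Such files should also open directly with `pandas.read_csv(path, sep="\t")` in Python and `read.delim(path)` in R, both of which handle gzip compression when reading.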
The full analysis (including rebuilding the database) can be reproduced by running the scripts in the code folder in this sequence:
• m01_proc-data.sh: Copy the files required to build the database to an ooominds user directory (shell).
• m02_make-db.sh: Create the database in the ooominds user’s directory (shell / sqlite3).
• m03_pop-db.R: Do some preliminary cleaning and populate the database (R; see vignette).
• m04_extract_word_freq.R: Sample the database for different time periods and calculate word frequencies (R; see vignette).
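The sampling step of m04_extract_word_freq.R amounts to selecting articles within a date range and counting words per newspaper. The actual script is in R and its database schema is not shown here, so the following Python/sqlite3 sketch assumes a hypothetical `articles` table with ISO-formatted dates (so that `BETWEEN` on text orders correctly by date).

```python
import re
import sqlite3
from collections import Counter, defaultdict

# Hypothetical schema: the real tables are created by m02_make-db.sh and may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    newspaper TEXT NOT NULL,
    country   TEXT NOT NULL,  -- 'UK' or 'US'
    pub_date  TEXT NOT NULL,  -- ISO 8601 'YYYY-MM-DD'
    body      TEXT NOT NULL
);
"""


def word_counts_for_period(conn, country, start, end):
    """Count lower-cased word tokens per newspaper for articles dated start..end inclusive."""
    sql = (
        "SELECT newspaper, body FROM articles "
        "WHERE country = ? AND pub_date BETWEEN ? AND ?"
    )
    counts = defaultdict(Counter)
    for newspaper, body in conn.execute(sql, (country, start, end)):
        counts[newspaper].update(re.findall(r"[a-z']+", body.lower()))
    return dict(counts)
```

For example, the Brexit period for UK newspapers would be queried as `word_counts_for_period(conn, "UK", "2016-05-23", "2016-07-24")`. Passing dates as query parameters rather than formatting them into the SQL string keeps the query safe and reusable across the four time criteria.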
… R directory.
This R Markdown site was created with workflowr.