Corpora¶
This page lists corpora relevant to political science research, categorized by country and key source, with a link to where each can be found and a note where access is not free. We are working with many of these to develop Texti.
Parties and elections¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
Manifesto Project | 51 inc. OECD | All political manifestos from the first democratic election onwards. | API; Stata, SPSS, csv, xlsx | |
Speeches | UK | Speeches from party leaders from 1895 to today | HTML on site | |
Regional manifesto | Spain | 1980 to 2019, all regional parties | Download | |
Regional manifesto | Wales and Scotland | 1999 and 2016 | Download | |
Regional manifesto | Italy | Fragmented depending on region | Download | |
Parliament Activity¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
Parliamentary Questions Answered | UK | 278,428 questions; csv | API | |
EP Plenary | European Union | 1997 to 2019 | HTTP resolvable URIs | |
Parliament Debates | France | Debates of the 15th legislature | HTTP resolvable URIs; XML | |
Lords Written Questions | UK | 52,004 questions | API; csv | |
Commons Written Questions | UK | 275,929 questions | API; csv | |
Questions to the Government | France | Since 2017 | HTTP resolvable URIs | |
Questions to the Government - without debates | France | Since 2017 | HTTP resolvable URIs | |
Written questions to the Government | France | Since 2017 | HTTP resolvable URIs | |
Parliamentary Debates on Europe | France | 2002 to 2012 | HTTP resolvable URIs | |
Parliamentary speeches | Austria, Czech Republic, Germany, Denmark, Netherlands, NZ, Spain, Sweden, UK, Ireland | 21 to 32 years of data | API on DataVerse; full-text vectors in rds | |
Parliament Rules | UK | 1811 to 2019 | Download | |
Parliament Rules | Ireland | 1922 to 2020 | Download | |
Debates and Replies to Questions | Ireland | All | API | |
Senate “Dossiers Legislatifs” | France | Documents discussed since 1977 | Download | |
Amendments by the Senate | France | Amendments since 2001 | Download | |
Lords Bill Amendments | UK | 11,727 amendments | API | |
Questions to the Government (Senate) | France | Since 1978 | Download | |
Research Briefings | UK | 9,739 briefings | API, csv with 500 records limit | |
Proceedings | European Union | 1996-2011 | Download, xml | |
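Some of the APIs listed above cap how many records a single request returns (the Research Briefings csv export is noted as limited to 500 records). The sketch below shows one way to collect a full dataset under such a cap. It is an illustration under stated assumptions, not the documented interface of any service on this page: `fetch_page(skip, take)` is a hypothetical callable standing in for the real HTTP request, and the `skip`/`take` parameter names are invented for the example.

```python
from typing import Callable, Iterator, List, Sequence

PAGE_SIZE = 500  # assumed per-request cap, matching the 500-record limit noted above


def iter_records(
    fetch_page: Callable[[int, int], Sequence[dict]],
    page_size: int = PAGE_SIZE,
) -> Iterator[dict]:
    """Yield every record by requesting consecutive pages until one comes back short."""
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        yield from page
        # A page shorter than the cap (including an empty one) means no more data.
        if len(page) < page_size:
            return
        skip += page_size


def fake_fetch(skip: int, take: int) -> List[dict]:
    """Hypothetical stand-in for a real HTTP call: serves 1,234 numbered records."""
    total = 1234
    return [{"id": i} for i in range(skip, min(skip + take, total))]
```

Calling `list(iter_records(fake_fetch))` walks the fake source in three requests. To use it against a real endpoint, replace `fake_fetch` with a function that issues the request and returns the parsed records for that page.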
Legislative Documents¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
All legislation | European Union | Summaries of EU legislation (full corpus exists but wrong license) | HTML on site (can email Dimiter Toshkov for | |
Trade agreements | European Union | All free trade agreements | List of linked PDFs | |
Bills | UK | All bills since 2007 | API | |
All Legal Texts | France | Constitution, laws and decrees, court rulings, treaties (in French and translated) | Downloadable + beta API | |
Legislation | Wales | All Bills, Acts, Marshalled lists | XML export | |
The Record of Proceedings | Wales | All proceedings | XML export | |
International Environment Agency | World | Most environmental treaties and agreements | List of .txt on the website | |
Bills and Acts | Ireland | All | API | |
All trade agreements | All | All | Download | |
Identity and Culture¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
National Anthems | World | 194 countries | Download | |
Presidential & Governmental Activity¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
Political speeches | UK | 8000+ political speeches on British politics | HTML | |
Official correspondence | UK | All official correspondence of PMs | API | |
PM transcripts | Australia | Ministerial transcripts from 1940s to date | API; xml | |
Speeches | EU | All ECB President / VP speeches | Download; csv | |
Speeches | Germany | 6,685 speeches by 71 officials, from 1984 to 2017 | Download, xml | |
Speeches | EU | 18,403 speeches from EU leaders from 2007 to 2015 | API from DataVerse; csv raw speeches, and term-document matrices in R | |
State of the Nation | South Africa | 1990 to 2018 | Download from Kaggle; txt per speech | |
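Several sources above ship one plain-text file per speech, and one ships ready-made term-document matrices. As a rough sketch of how the first form can be turned into the second, the code below reads a folder of `.txt` files and counts terms per document. The folder layout, the file names, and the very simple tokenizer are all assumptions made for illustration. None of them comes from the datasets themselves.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict

# Deliberately naive tokenizer (assumption): lowercase ASCII letters and apostrophes only.
TOKEN = re.compile(r"[a-z']+")


def load_speeches(folder: str, encoding: str = "utf-8") -> Dict[str, str]:
    """Map each .txt file name (without extension) to its full text."""
    speeches = {}
    for path in sorted(Path(folder).glob("*.txt")):
        speeches[path.stem] = path.read_text(encoding=encoding)
    return speeches


def term_document_counts(speeches: Dict[str, str]) -> Dict[str, Counter]:
    """Count token occurrences per document: a sparse term-document matrix."""
    return {
        name: Counter(TOKEN.findall(text.lower()))
        for name, text in speeches.items()
    }
```

Keeping the counts as one `Counter` per document avoids building a dense matrix, which matters once the vocabulary grows. A real analysis would likely swap in a proper tokenizer and stop-word handling.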
Participative democracy¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
Public consultations | France | Recent public consultations | HTTP-resolvable URIs | |
E-petitions | UK | All official e-petitions | API; JSON, xml, csv, HTML | |
News and Media¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
EUvsDisinfo | Europe | Debunked news articles by the European External Action Service | API; HTML | |
New York Times | All | Archive metadata, books, comments, reviews, most popular articles | API; JSON | e.g. Here |
Public debates over European integration | Austria, Britain, France, Germany, Sweden, and Switzerland | 1970s to 2012 from newspapers | csv, dta | |
Public debates over globalization issues | Austria, Britain, France, Germany, the Netherlands, and Switzerland | 2004-2006 from newspapers | csv, dta | |
Archive of Political emails | Australia, Canada, France, Germany, Ireland, Italy, NZ, UK, USA | 348,680 emails | HTML | |
News articles | Not specified | 9+ million articles and metadata for each | CSV split in 1GB zip files, download from GitHub | |
Politwoops | Many countries including USA, UK and most European countries | Deleted tweets by public officials and politicians | API; JSON | |
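A corpus distributed as CSV files split across large zip archives, like the news articles entry above, is usually too big to unpack and load in one go. The sketch below streams rows straight out of the archives using only the standard library. It assumes that each archive holds one or more `.csv` members with a header row and that the files are UTF-8 encoded. Neither assumption has been checked against the actual dataset.

```python
import csv
import io
import zipfile
from pathlib import Path
from typing import Dict, Iterator


def iter_zipped_csv_rows(folder: str, encoding: str = "utf-8") -> Iterator[Dict[str, str]]:
    """Stream rows from every .csv member of every .zip file in `folder`."""
    for zip_path in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if not member.lower().endswith(".csv"):
                    continue  # skip READMEs and other non-CSV members
                with archive.open(member) as raw:
                    # Decode on the fly so the archive is never extracted to disk.
                    text = io.TextIOWrapper(raw, encoding=encoding, newline="")
                    yield from csv.DictReader(text)
```

Because the function is a generator, only one row is held in memory at a time, so filtering (for example by date or outlet) can happen while reading rather than after loading everything.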
Messy list of promising websites¶
Websites that might be goldmines but would require some time to explore.
European Language Resource Coordination
Many legal and official documents, translated and sometimes already processed, e.g. IP case law, audits, and a large number of legal texts from EU countries (their usefulness is unclear, but given the volume some may be interesting).
Clarin
List of 24 parliamentary corpora, not all easy to access
https://www.clarin.eu/resource-families/parliamentary-corpora
EveryCRSReport.com
Reports from the Congressional Research Service, essentially the US Congress’s think-tank.
Supreme court transcripts
Complementary text data¶
Texts that are not necessarily directly relevant to political science research but can be used as context or as a complement, e.g. for annotation.
Wikipedia or other “ground truth” sources
Network data
Dictionaries: e.g. sentiment or emotion dictionaries, so that automated dictionary methods can be applied with one click
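To make the dictionary idea concrete, here is a minimal sketch of an automated dictionary method: each category's score is the share of tokens in a text that appear in that category's word list. The `TOY_DICTIONARY` below is invented purely for illustration and is not a published lexicon, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Naive tokenizer (assumption): lowercase ASCII letters and apostrophes only.
TOKEN = re.compile(r"[a-z']+")

# Toy dictionary for illustration only; a real analysis would load a published lexicon.
TOY_DICTIONARY: Dict[str, set] = {
    "positive": {"good", "great", "hope", "support"},
    "negative": {"bad", "crisis", "fear", "oppose"},
}


def dictionary_scores(text: str, dictionary: Dict[str, Iterable[str]]) -> Dict[str, float]:
    """Return, per category, the share of tokens in `text` that match its word list."""
    tokens = TOKEN.findall(text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        category: sum(counts[word] for word in words) / total
        for category, words in dictionary.items()
    }
```

Normalizing by text length keeps long and short documents comparable. Exact-match lookup like this ignores negation and word forms, which is one reason validated dictionaries and preprocessing matter in practice.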
US Political Science focus¶
Item | Country | Description | Access | Link |
---|---|---|---|---|
General Social Survey | US | General Social Survey (GSS) monitors societal change in the US | Download: for SPSS, STATA | |
The Supreme Court Database | US | Case Centered Data - Total rows: 13,533 | Download: CSV, DTA (STATA), POR (SPSS), RDATA, XLSX | |
The Supreme Court Database | US | Justice Centered Data - Total rows: 121,224 | Download: CSV, DTA (STATA), POR (SPSS), RDATA, XLSX | |
Congressional speech data | US | Congressional-speech corpus with labels for whether each speaker supported or opposed, by-name references between speakers, agreement/disagreement classifier scores, and debate and related extracted information (9.8 MB, tar.gz format) | Download: compressed tar.gz, multiple types including CSV | |
ANES | US | Electoral behavior, political participation, and public opinion studies - Time Series Studies, Pilot Studies, Special Studies | Download | |
CorPS | US | CORPS is a corpus of political speeches tagged with specific audience reactions, such as APPLAUSE or LAUGHTER. | Request from marco.guerini[at]trentorise.eu and strappa[at]fbk.eu | |
Congressional Record for the 43rd-114th Congresses | US | Parsed Speeches and Phrase Counts | Download: zip of organized txt files | |
GDELT | US | All events from broadcast, print, and web news from nearly every corner of every country in over 100 languages | Download: CSV | |
The American Presidency Project | US | Presidential documents, papers, press, orders, memoranda etc | HTML | |
Full text corpus data | US | 10 large corpora of English: iWeb, COCA, COHA, NOW, Coronavirus, GloWbE, TV Corpus, Movie Corpus, Soap Corpus, Wikipedia | Purchase raw data in 3 formats | |
GovInfo | US | Congressional Bills; Bill Status; Bill Summaries; Commerce Business Daily; Code of Federal Regulations (Annual Edition); Electronic Code of Federal Regulations; Federal Register; United States Government Manual; House Rules and Manual; Privacy Act Issuances; Public Papers of the Presidents of the United States; Supreme Court Decisions 1937-1975 (FLITE) | Download: XML | |
DIME PLUS | US | Database on Ideology, Money in Politics, and Elections: Public version 2.0 | Download: compressed CSV | |
Replication data for: Tracing the Flow of Policy Ideas in Legislatures: A Text Reuse Approach | US | Replication Data | Download: compressed archive | |
CONGRESSIONAL & FEDERAL - Government Web Harvests | US | The National Archives and Records Administration (NARA) web harvests (i.e. captures) of Federal Agency public web sites since 2004 | Web harvests | |
Congress.gov - Bill Status | US | Bill Status data includes all data from the existing Bill Summaries data se | XML bulk data; API | |