Using automated event detection to reduce data collection costs with an application to the BFRS dataset

Project active from 8 Jul 2019 to 31 Aug 2019. Theme: State

Researchers

Ali Hasanain

Assistant Professor, Lahore University of Management Sciences

Muhammad Fareed Zaffar

Associate Professor, Lahore University of Management Sciences

The project aims to replicate and extend the BFRS dataset (which measured political violence in Pakistan, based on press reporting, from 1988 to 2011) using machine learning and other algorithmic techniques.

Incident-level data on political violence has been analysed to study causal linkages between terrorism and economic growth. Such research helps policymakers formulate policies focused on reducing the impact of terrorism. Better scholarship and policy built on the BFRS dataset can not only create academic value but also strengthen the case for improving internal security in Pakistan, while better-informed voters can create pressure on the government to improve (Banerjee, Kumar, Pande, and Su; Ferrera, 2011).

BFRS records each incident of political violence by location, consequence, cause, type of violence, and party responsible. However, datasets compiled by manually extracting events from newspapers carry a recurring cost every time they are updated. Advances in text analytics suggest a better way to keep the dataset up to date: automated extraction.

We propose to create a similar dataset covering 2010 to the present day by automating the identification and categorisation of events using textual analysis with pattern recognition. Automation will streamline the process and, once developed, the resulting machine-learning tool will provide quick updates without incurring any additional cost. The project also aims to build the capacity to construct similar datasets on subjects other than violence.
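To make "identification and categorisation of events using textual analysis with pattern recognition" concrete, here is a minimal sketch in Python. It is a rule-based stand-in, not the project's machine-learning tool: the category names, the keyword patterns, and the function name `categorise` are all hypothetical, chosen only for illustration.

```python
import re

# Hypothetical keyword patterns. The project's actual event categories
# and detection rules are not described in the text; these are assumptions.
EVENT_PATTERNS = {
    "bombing": re.compile(r"\b(bomb(?:ing|ed)?|blast|explosion)\b", re.IGNORECASE),
    "assassination": re.compile(r"\b(assassinat\w*|target(?:ed)? killing)\b", re.IGNORECASE),
    "riot": re.compile(r"\b(riot\w*|mob)\b", re.IGNORECASE),
}


def categorise(text):
    """Return the sorted list of event types whose pattern occurs in the text."""
    return sorted(
        label for label, pattern in EVENT_PATTERNS.items() if pattern.search(text)
    )
```

An article that matches no pattern returns an empty list and would be treated as containing no violent event. A trained classifier would replace the hand-written patterns, but the input (article text) and output (event categories) would have the same shape.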

Our proposed dataset’s overlap with the BFRS time period will allow us to compare the two and gauge the accuracy of the tool. We will first search our scraped data for events reported in BFRS; then use these matches to train our algorithms to detect such events; and, finally, apply the trained algorithm to data scraped between 2013 and the present to extract further instances of political violence without human intervention. As an additional check on data quality, human coders will review a sample of the extracted data to detect false positives.
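The two quality checks described here, comparison against BFRS over the overlapping period and human review of a sample, can be sketched as follows. This is an illustration under stated assumptions rather than the project's procedure: events are assumed to be identifiable by a (date, district) key, and the function names `precision_recall` and `review_sample` are hypothetical.

```python
import random


def precision_recall(extracted, reference):
    """Compare extracted event keys with reference (e.g. BFRS) event keys.

    Both arguments are iterables of hashable keys, assumed here to be
    (date, district) tuples. Returns (precision, recall).
    """
    extracted, reference = set(extracted), set(reference)
    matched = extracted & reference
    precision = len(matched) / len(extracted) if extracted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


def review_sample(events, k, seed=0):
    """Draw a reproducible random sample of up to k events for human coders."""
    rng = random.Random(seed)
    events = list(events)
    return rng.sample(events, min(k, len(events)))
```

An extracted event that is absent from BFRS is not necessarily a false positive, since the reference dataset may itself have missed it. That is one reason a human-coded sample is a useful complement to the automated comparison.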

Project outputs

Applied Development Economics Seminar Series: Dr Syed Ali Hasanain

Video

Themes: State
Countries: Pakistan
