Hourly collection of conflict-related news on Sudan from national, regional, and international sources. The toolset aggregates full articles, links, dates, images, and metadata into a transparent, queryable dataset for research, monitoring, and decision support.
Why this exists: Existing datasets (e.g., ACLED, UCDP) are valuable but can be delayed and opaque about sources. This scraper emphasizes timeliness (hourly jobs) and source transparency (URLs + full text where allowed).
- Multi-source coverage – National, regional, and international outlets (APIs + static/dynamic websites).
- Hourly automation – Production runs via cron/Task Scheduler; incremental updates (see the crontab sketch after this list).
- Transparent data – Store source URL + article text (where legally permitted).
- De-duplication – URL-based duplicate detection, with update-on-collision logic (sketch below).
- Export friendly – CSV/Excel examples for quick analysis using pandas (sketch below).
- Modular design – Add or modify crawlers independently (`src/crawlers/`, `src/utils/`; sketch below).
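To schedule hourly runs on Linux/macOS, a crontab entry along these lines would work; the working directory, virtual environment path, and entry script (`run_scraper.py`) are hypothetical placeholders, so substitute your actual locations:

```
# Run the scraper at the top of every hour and append output to a log.
0 * * * * cd /path/to/sudan_web_scraper && .venv/bin/python run_scraper.py >> scraper.log 2>&1
```

On Windows, the same cadence can be configured as an hourly trigger in Task Scheduler.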
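The update-on-collision logic can be sketched as an upsert keyed on the article URL. This is a minimal illustration assuming a hypothetical SQLite table; the repository's actual storage layer may differ:

```python
import sqlite3

def upsert_article(conn: sqlite3.Connection, url: str, text: str, fetched_at: str) -> None:
    """Insert a new article, or refresh its text/timestamp if the URL is already stored."""
    conn.execute(
        """
        INSERT INTO articles (url, text, fetched_at)
        VALUES (?, ?, ?)
        ON CONFLICT(url) DO UPDATE SET
            text = excluded.text,
            fetched_at = excluded.fetched_at
        """,
        (url, text, fetched_at),
    )
    conn.commit()

conn = sqlite3.connect("articles.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS articles "
    "(url TEXT PRIMARY KEY, text TEXT, fetched_at TEXT)"
)
# Running this twice with the same URL updates the row instead of duplicating it.
upsert_article(conn, "https://example.com/story", "Full article text...", "2025-08-01T12:00:00Z")
```

Making the URL the primary key turns duplicate detection into a constraint the database enforces, so hourly re-crawls simply refresh existing rows.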
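Exports can be pulled straight into pandas for quick analysis. The file name and column names below (`date`, `source`) are illustrative assumptions rather than the repository's actual schema:

```python
import pandas as pd

# Load the exported CSV, parsing the publication date column.
df = pd.read_csv("sudan_articles.csv", parse_dates=["date"])

# Articles per outlet, most prolific sources first.
print(df["source"].value_counts())

# Daily article counts, useful for spotting coverage spikes.
print(df.set_index("date").resample("D").size().tail())

# Share a filtered slice as Excel (requires the openpyxl package).
df[df["date"] >= "2025-01-01"].to_excel("sudan_articles_2025.xlsx", index=False)
```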
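New sources plug in as standalone modules under `src/crawlers/`. The sketch below shows one plausible shape for that shared contract; the interface actually used in this repository may differ:

```python
from dataclasses import dataclass

@dataclass
class Article:
    url: str
    title: str
    text: str
    published: str  # ISO-8601 date string

class BaseCrawler:
    """Shared contract: every crawler returns a list of Article records."""
    source_name: str = "base"

    def fetch(self) -> list[Article]:
        raise NotImplementedError

class ExampleOutletCrawler(BaseCrawler):
    """Hypothetical crawler for a single outlet."""
    source_name = "example-outlet"

    def fetch(self) -> list[Article]:
        # A real crawler would request the outlet's site or API here.
        return [
            Article(
                url="https://example.com/story",
                title="Example headline",
                text="Full article text...",
                published="2025-08-01",
            )
        ]
```

Because each crawler is self-contained, adding an outlet means adding one module without touching the others.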
Requirements:
- Python 3.11+
- Dependencies listed in `requirements.txt` or `environment.yml`
Clone the repository and create the environment:
```
git clone https://github.com/stccenter/sudan_web_scraper
cd sudan_web_scraper
conda env create -f environment.yml
conda activate sudan-scraper
```

Or manually install with pip:
```
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

If you use this toolset, please cite:

```bibtex
@article{sudan_scraper_2025,
title = {Automating Data Collection to Support Conflict Analysis: Scraping the Internet for Monitoring Hourly Conflict in Sudan},
author = {Yahya Masri and Anusha Srirenganathan and Samir Ahmed and others},
year = {2025},
note = {Under revision},
url = {https://github.com/stccenter/sudan_web_scraper}
}
```