CASOS-IDeaS-CMU/Detection-and-Discovery-of-Misinformation-Sources

Detection and Discovery of Misinformation Sources using Attributed Webgraphs

Interactive News Webgraph

To explore the webgraph related to these news sites, check out our interactive webgraph exploration tool built on top of the CommonCrawl dataset.

Introduction

These scripts train classifiers on the NewsSEO dataset and are based on the research paper "Detection and Discovery of Misinformation Sources using Attributed Webgraphs" [PDF]. If you use, extend, or build upon this project, please cite the following paper (upcoming at ICWSM 2024):

@article{carragher2024detection,
  title={Detection and Discovery of Misinformation Sources using Attributed Webgraphs},
  author={Carragher, Peter and Williams, Evan M and Carley, Kathleen M},
  journal={arXiv preprint arXiv:2401.02379},
  year={2024}
}

Inputs

  • Follow the readme to populate the data directory with the NewsSEO dataset
  • Webgraph data & SEO attributes have been pulled from ahrefs.com
  • Labels have been scraped from mediabiasfactcheck.com using this open-source scraper

Environment Setup

# Install dependencies
pip3 install -r requirements.txt
# Generate edge weights
cd analysis && python3 weights.py && cd ..
# Run GNN weight scheme experiments
python3 gnns/train.py 0
# Run GNN top N backlink experiments
python3 gnns/train.py 1
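
The same steps can also be driven from Python, which may be convenient when scripting repeated runs. The sketch below is not part of the repository: the names `EXPERIMENTS`, `build_commands`, and `run` are hypothetical, and it assumes only what the commands above show, namely that `weights.py` is run from inside `analysis` and that `gnns/train.py` takes the experiment index (0 or 1) as its single argument. It does not install dependencies.

```python
import subprocess
import sys
from pathlib import Path

# Experiment indices as documented in the setup commands.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPERIMENTS = {
    "weight_schemes": 0,   # GNN weight scheme experiments
    "top_n_backlinks": 1,  # GNN top N backlink experiments
}


def build_commands(experiment, python=sys.executable):
    """Return (argv, working_directory) pairs for one full run."""
    if experiment not in EXPERIMENTS:
        raise ValueError(f"unknown experiment: {experiment!r}")
    return [
        # Edge weights are generated from inside the analysis directory.
        ([python, "weights.py"], "analysis"),
        # Training is launched from the repository root.
        ([python, "gnns/train.py", str(EXPERIMENTS[experiment])], "."),
    ]


def run(experiment, repo_root="."):
    """Run the steps in order and stop at the first failing command."""
    root = Path(repo_root)
    for argv, cwd in build_commands(experiment):
        subprocess.run(argv, cwd=root / cwd, check=True)
```

Calling `run("weight_schemes")` from the repository root would then mirror the second and third commands above.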

Outputs

This code is provided as is, and neither the author nor the university is responsible for maintaining it. It provides the following functionality:

  • A classifier that predicts the reliability of news sources
  • A classifier that predicts the political leaning of news sources
  • A discovery system that finds more unreliable news sources from an initial list of news sites
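
To make the idea of discovery from an initial list concrete, here is a toy sketch. It is not the repository's or the paper's method: it swaps in a simple link-share heuristic over a hand-written edge list, and the function `rank_candidates` and the `.example` domains are all hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, seeds, top_k=3):
    """Rank non-seed sites by the fraction of their outgoing links that hit seeds.

    edges: iterable of (source_site, target_site) hyperlink pairs.
    seeds: initial list of sites already considered unreliable.
    """
    seeds = set(seeds)
    total = defaultdict(int)    # outgoing links per candidate site
    to_seed = defaultdict(int)  # outgoing links that point at a seed
    for source, target in edges:
        if source in seeds:
            continue  # seeds are already known, so they are not candidates
        total[source] += 1
        if target in seeds:
            to_seed[source] += 1
    scores = {site: to_seed[site] / total[site] for site in total}
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy webgraph with placeholder domains (assumed data, for illustration only).
edges = [
    ("a.example", "seed1.example"),
    ("a.example", "seed2.example"),
    ("b.example", "seed1.example"),
    ("b.example", "c.example"),
    ("c.example", "b.example"),
]
ranking = rank_candidates(edges, ["seed1.example", "seed2.example"])
```

In this toy graph `a.example` ranks first because all of its links point at seed sites. A real system would combine such graph signals with the trained classifiers rather than rely on link share alone.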

More specifically, the repository is organized as follows:

License

BSD 3-Clause License

Copyright (c) 2024, Peter Carragher

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
