bissli82/S8-RentScraper

Cleveland Area Section 8 Rent Scraper

A Python tool to scrape Cleveland area Section 8 rent determination data and generate comprehensive visual reports with city mappings and statistical analysis.

Description

This project scrapes rent data from Cleveland area housing authority websites and processes it to create detailed visual summaries. It's designed to help analyze Section 8 housing rental data across different ZIP codes and cities in the Cleveland metropolitan area.

Features

🏠 Data Collection

  • Concurrent web scraping of Cleveland area ZIP codes
  • Automatic form handling and AJAX response parsing
  • Progress tracking with real-time updates
  • Comprehensive error handling and data validation

📊 Visual Reports

  • PNG tables with professional formatting
  • ZIP | City format showing both ZIP codes and corresponding city names
  • Red highlighting for high-rent areas (customizable thresholds)
  • Average calculations integrated directly into the table
  • Gray styling for headers and averages for clear visual separation

🎯 Smart Highlighting

  • 2 Bedrooms: $1,500+ highlighted in red
  • 3 Bedrooms: $1,800+ highlighted in red
  • 4 Bedrooms: $2,200+ highlighted in red

🏙️ Cleveland Area Coverage

Includes comprehensive city mappings for Cleveland area ZIP codes:

  • Cleveland (multiple ZIP codes)
  • Garfield Heights (44125)
  • Parma (44130, 44134)
  • Independence (44131)
  • Maple Heights (44137)
  • Lakewood (44107)
  • Cleveland Heights (44118, 44121)
  • Rocky River (44116)
  • Strongsville (44136, 44149)
  • And many more Cleveland area cities
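
As an illustration of the ZIP | City format, here is a minimal sketch of a lookup table and label helper. The names `ZIP_TO_CITY` and `zip_label` are hypothetical and not taken from the script, and the table holds only a few of the ZIP codes listed above.

```python
# Hypothetical excerpt of a ZIP-to-city lookup table (assumption: the real
# script may store or name its mapping differently).
ZIP_TO_CITY = {
    "44116": "Rocky River",
    "44118": "Cleveland Heights",
    "44121": "Cleveland Heights",
    "44125": "Garfield Heights",
    "44130": "Parma",
    "44131": "Independence",
    "44134": "Parma",
    "44136": "Strongsville",
    "44137": "Maple Heights",
    "44149": "Strongsville",
}


def zip_label(zip_code: str, default_city: str = "Unknown") -> str:
    """Build a 'ZIP | City' row label, falling back to a default city name."""
    city = ZIP_TO_CITY.get(zip_code, default_city)
    return f"{zip_code} | {city}"


print(zip_label("44125"))
```

Keeping ZIP codes as strings avoids losing leading zeros and matches how they appear in form fields and JSON keys.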

📁 Multiple Output Formats

  • PNG images for presentations
  • CSV files for data analysis
  • JSON raw data for further processing
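
For readers who want the CSV form, the following sketch flattens scraped records into CSV text using only the standard library. The record layout (a dict keyed by ZIP code, with bedroom counts mapped to rents) and the function name `records_to_csv` are assumptions, since the structure of the script's JSON output is not documented here. The sample rents are placeholders, not real data.

```python
import csv
import io


def records_to_csv(records, bedroom_sizes=("2", "3", "4")):
    """Convert {zip: {bedrooms: rent}} records into CSV text.

    Missing bedroom sizes are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["zip", *[f"{size}_bedroom" for size in bedroom_sizes]])
    for zip_code in sorted(records):
        rents = records[zip_code]
        writer.writerow([zip_code, *[rents.get(size, "") for size in bedroom_sizes]])
    return buffer.getvalue()


# Placeholder values for illustration only.
sample = {
    "44131": {"2": 1600, "3": 1900, "4": 2300},
    "44125": {"2": 1200, "3": 1500},
}
print(records_to_csv(sample))
```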

Requirements

  • Python 3.9+
  • Dependencies managed with Poetry

Installation

  1. Clone this repository

    git clone <repository-url>
    cd S8-RentScraper
  2. Install Poetry (if you haven't already)

    # On Windows
    pip install poetry
    
    # On macOS/Linux
    curl -sSL https://install.python-poetry.org | python3 -
  3. Install dependencies

    poetry install

Usage

Basic Usage

poetry run python s8_rentscraper.py

What the script does:

  1. Fetches all available ZIP codes from the housing authority website
  2. Scrapes rent data for each ZIP code concurrently (8 parallel requests)
  3. Processes and validates the data
  4. Generates a visual table with city mappings and highlighting
  5. Saves results in multiple formats (PNG, JSON)
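
The concurrent step (step 2) can be sketched with the standard library's `ThreadPoolExecutor`. This is a simplified illustration, not the script's actual code: `fetch_rents` is a hypothetical stand-in for the real HTTP request and parsing, and it returns placeholder values rather than scraped data.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_WORKERS = 8  # matches the 8 parallel requests described above


def fetch_rents(zip_code):
    """Hypothetical stand-in for the real request and response parsing."""
    if not zip_code.isdigit():
        raise ValueError(f"invalid ZIP code: {zip_code!r}")
    return {"2": 1200, "3": 1500, "4": 1900}  # placeholder values, not real data


def scrape_all(zip_codes, fetch=fetch_rents, max_workers=MAX_WORKERS):
    """Fetch every ZIP code in parallel, collecting failures instead of aborting."""
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, zip_code): zip_code for zip_code in zip_codes}
        for future in as_completed(futures):
            zip_code = futures[future]
            try:
                results[zip_code] = future.result()
            except Exception as exc:  # one bad ZIP code should not stop the run
                failures[zip_code] = str(exc)
    return results, failures


results, failures = scrape_all(["44125", "44131", "bad"])
print(f"{len(results)} succeeded, {len(failures)} failed")
```

Threads suit this workload because each task spends most of its time waiting on the network rather than using the CPU.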

Output Files

  • results.json - Raw scraped data
  • rent_table.png - Visual table with highlighting and statistics
  • scraper.log - Execution log with timing information

Configuration

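The highlighting thresholds are defined in the script as a dictionary keyed by bedroom count. Edit the values to change the rent at which a cell is highlighted: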

thresholds = {
    "2": 1500,  # 2-bedroom threshold
    "3": 1800,  # 3-bedroom threshold  
    "4": 2200   # 4-bedroom threshold
}
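
A minimal sketch of how such a dictionary could drive the highlighting decision. The helper `is_high_rent` is hypothetical, and the comparison is assumed to be inclusive because the thresholds are described as "$1,500+".

```python
thresholds = {
    "2": 1500,  # 2-bedroom threshold
    "3": 1800,  # 3-bedroom threshold
    "4": 2200,  # 4-bedroom threshold
}


def is_high_rent(bedrooms: str, rent: float, limits=thresholds) -> bool:
    """Return True when rent meets or exceeds the threshold for that bedroom count.

    Bedroom counts without a configured threshold are never highlighted.
    """
    limit = limits.get(bedrooms)
    return limit is not None and rent >= limit


print(is_high_rent("2", 1500))
print(is_high_rent("3", 1799))
```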

Dependencies

  • requests: HTTP library for web scraping
  • beautifulsoup4: HTML parsing and data extraction
  • pandas: Data manipulation and analysis
  • matplotlib: Plotting and table visualization
  • tqdm: Progress bars and status tracking

Project Structure

S8-RentScraper/
├── s8_rentscraper.py    # Main scraper script
├── pyproject.toml       # Poetry dependencies
├── README.md            # This file
├── .gitignore           # Git ignore patterns
├── results.json         # Generated: Raw scraped data
├── rent_table.png       # Generated: Visual table
└── scraper.log          # Generated: Execution logs

Features in Detail

Visual Table Features

  • ZIP | City labels: Clear identification of areas
  • Red highlighting: Immediate identification of high-rent areas
  • Average row: Quick statistical overview
  • Professional styling: Gray headers and consistent formatting
  • Comprehensive coverage: All Cleveland area ZIP codes with accurate city names
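
The average row could be computed along these lines. This sketch assumes records shaped as a dict of ZIP code to per-bedroom rents and skips missing values. The function name `average_row` and the sample figures are illustrative only.

```python
from statistics import mean


def average_row(rows, bedroom_sizes=("2", "3", "4")):
    """Average each bedroom column across ZIP codes, ignoring missing values."""
    averages = {}
    for size in bedroom_sizes:
        values = [rents[size] for rents in rows.values() if rents.get(size) is not None]
        averages[size] = round(mean(values), 2) if values else None
    return averages


# Placeholder values for illustration only.
rows = {
    "44125": {"2": 1200, "3": 1500, "4": 1900},
    "44131": {"2": 1600, "3": 1900},
}
print(average_row(rows))
```

Returning `None` for an empty column lets the table renderer decide how to display it instead of failing on an empty average.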

Performance

  • Concurrent scraping: 8 parallel requests for faster execution
  • Error resilience: Continues scraping even if individual ZIP codes fail
  • Progress tracking: Real-time updates on scraping progress
  • Timing information: Detailed execution time logging
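
Timing information of this kind can be captured with a small context manager around each stage. The logger name and the `timed` helper below are hypothetical. The demo logs to the console, and passing `filename="scraper.log"` to `logging.basicConfig` would write to a file instead.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("s8_rentscraper_sketch")  # hypothetical logger name


@contextmanager
def timed(step_name, log=logger):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.info("%s finished in %.2f s", step_name, elapsed)


logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
with timed("demo step"):
    time.sleep(0.05)  # stand-in for real work such as scraping
```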

License

This project is for educational and research purposes. Data is sourced from public housing authority websites.
