A Python tool to scrape Cleveland area Section 8 rent determination data and generate comprehensive visual reports with city mappings and statistical analysis.
This project scrapes rent data from Cleveland area housing authority websites and processes it to create detailed visual summaries. It's designed to help analyze Section 8 housing rental data across different ZIP codes and cities in the Cleveland metropolitan area.
- Concurrent web scraping of Cleveland area ZIP codes
- Automatic form handling and AJAX response parsing
- Progress tracking with real-time updates
- Comprehensive error handling and data validation
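The AJAX response format isn't documented in this README. The sketch below assumes the endpoint returns an HTML fragment holding a two-column table that pairs a bedroom label with a dollar amount, and it shows how such a fragment could be turned into validated rent values. It uses the standard library's `html.parser` so that it stays self-contained, whereas the project itself lists beautifulsoup4. The markup, `parse_rent_fragment`, and the `{bedrooms: rent}` shape are all hypothetical.

```python
import re
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_rent_fragment(html):
    """Return {bedrooms: rent} from a hypothetical two-column rent table.

    Rows that don't look like "<N> Bedroom(s) | $1,234" (headers, "N/A",
    blank cells) are skipped, which acts as a simple validation step.
    """
    parser = _CellCollector()
    parser.feed(html)
    rents = {}
    for row in parser.rows:
        if len(row) != 2:
            continue
        beds = re.match(r"(\d+)\s*Bedroom", row[0])
        amount = re.fullmatch(r"\$?(\d[\d,]*)(?:\.\d{2})?", row[1])
        if beds and amount:
            rents[beds.group(1)] = int(amount.group(1).replace(",", ""))
    return rents
```

For example, a fragment with a header row, a "2 Bedrooms | $1,525" row, and a "3 Bedrooms | N/A" row produces `{"2": 1525}`. The unparseable row is dropped rather than stored as a bad value.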
- PNG tables with professional formatting
- ZIP | City format showing both ZIP codes and corresponding city names
- Red highlighting for high-rent areas (customizable thresholds)
- Average calculations integrated directly into the table
- Gray styling for headers and averages for clear visual separation
Default red-highlight thresholds:
- 2 Bedrooms: $1,500+ highlighted in red
- 3 Bedrooms: $1,800+ highlighted in red
- 4 Bedrooms: $2,200+ highlighted in red
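The highlighting rule and the average row can be expressed in plain Python before any matplotlib styling is applied. The following is a minimal sketch under a few assumptions: each ZIP's data is a dict of bedroom count to monthly rent, a rent at or above its threshold ("$1,500+") counts as high, and averages skip missing cells. `build_table_rows`, `is_high_rent`, and the "Average" label are hypothetical names. The thresholds mirror the defaults listed above.

```python
from statistics import mean

# Default red-highlight thresholds from the list above (bedrooms -> dollars).
THRESHOLDS = {"2": 1500, "3": 1800, "4": 2200}


def is_high_rent(bedrooms, rent, thresholds=THRESHOLDS):
    """True when a rent meets or exceeds the threshold for its bedroom count."""
    limit = thresholds.get(bedrooms)
    return limit is not None and rent is not None and rent >= limit


def build_table_rows(data, bedrooms=("2", "3", "4")):
    """Return (rows, highlight) for a table with an appended average row.

    data maps a row label (e.g. "44130 | Parma") to {bedrooms: rent}.
    highlight[i][j] is True when cell (i, j) should be drawn in red.
    """
    rows, highlight = [], []
    for label, rents in data.items():
        values = [rents.get(b) for b in bedrooms]
        rows.append([label, *values])
        highlight.append([False] + [is_high_rent(b, v) for b, v in zip(bedrooms, values)])

    # Average each bedroom column over the ZIP codes that actually have data.
    averages = []
    for j in range(1, len(bedrooms) + 1):
        present = [row[j] for row in rows if row[j] is not None]
        averages.append(round(mean(present)) if present else None)
    rows.append(["Average", *averages])
    # The average row gets the gray styling, never red.
    highlight.append([False] * (len(bedrooms) + 1))
    return rows, highlight
```

Keeping the highlight flags in a separate grid means the plotting code only has to map `True` to a red cell color. The data rows stay plain numbers that can also be written to CSV or JSON.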
Includes comprehensive city mappings for Cleveland area ZIP codes:
- Cleveland (multiple ZIP codes)
- Garfield Heights (44125)
- Parma (44130, 44134)
- Independence (44131)
- Maple Heights (44137)
- Lakewood (44107)
- Cleveland Heights (44118, 44121)
- Rocky River (44116)
- Strongsville (44136, 44149)
- And many more Cleveland area cities
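The "ZIP | City" labels amount to a dictionary lookup with a fallback. Below is a minimal sketch using an excerpt of the ZIP codes listed above. `ZIP_TO_CITY`, `zip_label`, `sort_by_city`, and the "Unknown" placeholder are hypothetical. They are not the script's actual mapping.

```python
# Hypothetical excerpt of a ZIP -> city mapping, using ZIP codes listed above.
ZIP_TO_CITY = {
    "44125": "Garfield Heights",
    "44130": "Parma",
    "44134": "Parma",
    "44131": "Independence",
    "44137": "Maple Heights",
    "44118": "Cleveland Heights",
    "44121": "Cleveland Heights",
    "44116": "Rocky River",
    "44136": "Strongsville",
    "44149": "Strongsville",
}


def zip_label(zip_code, mapping=ZIP_TO_CITY, default="Unknown"):
    """Format a table label such as "44130 | Parma".

    ZIP codes missing from the mapping get a visible placeholder instead of
    raising, so a gap in the mapping doesn't abort report generation.
    """
    zip_code = str(zip_code).strip()
    return f"{zip_code} | {mapping.get(zip_code, default)}"


def sort_by_city(zip_codes, mapping=ZIP_TO_CITY, default="Unknown"):
    """Order ZIP codes by city name, then by ZIP, so a city's rows sit together."""
    return sorted(zip_codes, key=lambda z: (mapping.get(z, default), z))
```

A fallback label such as "99999 | Unknown" makes unmapped ZIP codes easy to spot in the PNG.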
Output formats:
- PNG images for presentations
- CSV files for data analysis
- JSON raw data for further processing
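Writing the raw JSON and a flat CSV needs only the standard library. This sketch assumes each scraped record looks like `{"zip": "44130", "city": "Parma", "rents": {"2": 1450, ...}}`. The `save_results` function, the `rents.csv` filename, and the `2br`/`3br`/`4br` column names are hypothetical, since the README doesn't name the CSV file.

```python
import csv
import json
from pathlib import Path


def save_results(records, out_dir, bedrooms=("2", "3", "4")):
    """Write records to results.json (raw) and a flat CSV for analysis.

    Each record is assumed to look like
    {"zip": "44130", "city": "Parma", "rents": {"2": 1450, "3": 1850}}.
    Returns the two paths written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Raw data, kept as-is for further processing.
    json_path = out_dir / "results.json"
    json_path.write_text(json.dumps(records, indent=2), encoding="utf-8")

    # One row per ZIP code; missing bedroom sizes become empty cells.
    csv_path = out_dir / "rents.csv"
    fieldnames = ["zip", "city"] + [f"{b}br" for b in bedrooms]
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for rec in records:
            row = {"zip": rec["zip"], "city": rec.get("city", "")}
            for b in bedrooms:
                row[f"{b}br"] = rec.get("rents", {}).get(b, "")
            writer.writerow(row)
    return json_path, csv_path
```

Flattening to one column per bedroom size keeps the CSV easy to load into a spreadsheet or `pandas.read_csv`.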
Requirements:
- Python 3.9+
- Dependencies managed with Poetry

Installation:

- Clone this repository:

  ```bash
  git clone <repository-url>
  cd S8-RentScraper
  ```

- Install Poetry (if you haven't already):

  ```bash
  # On Windows
  pip install poetry

  # On macOS/Linux
  curl -sSL https://install.python-poetry.org | python3 -
  ```

- Install dependencies:

  ```bash
  poetry install
  ```

Run the scraper inside the Poetry environment:

```bash
poetry run python s8_rentscraper.py
```

The script:

- Fetches all available ZIP codes from the housing authority website
- Scrapes rent data for each ZIP code concurrently (8 parallel requests)
- Processes and validates the data
- Generates visual table with city mappings and highlighting
- Saves results in multiple formats (PNG, JSON)
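The concurrent scraping step (8 parallel requests, continuing past failed ZIP codes, reporting progress) can be sketched with `concurrent.futures`. `scrape_all` and the injected `fetch` function are hypothetical. `fetch` stands in for whatever performs the form submission and parsing for one ZIP code. Threads are a reasonable fit here because the work is I/O-bound HTTP requests.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

log = logging.getLogger("scraper")


def scrape_all(zip_codes, fetch, max_workers=8, on_progress=None):
    """Run fetch(zip_code) for every ZIP code on a thread pool.

    A failure for one ZIP is logged and recorded instead of stopping the run.
    on_progress(done, total), if given, is called after each ZIP finishes.
    Returns (results, failures): {zip: data} and {zip: error message}.
    """
    results, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, z): z for z in zip_codes}
        # as_completed yields futures in finishing order, not submission order.
        for done, future in enumerate(as_completed(futures), start=1):
            zip_code = futures[future]
            try:
                results[zip_code] = future.result()
            except Exception as exc:  # one bad ZIP shouldn't abort the whole run
                failures[zip_code] = str(exc)
                log.warning("ZIP %s failed: %s", zip_code, exc)
            if on_progress is not None:
                on_progress(done, len(futures))
    return results, failures
```

With tqdm installed, wrapping the loop's iterable as `tqdm(as_completed(futures), total=len(futures))` gives a progress bar instead of the callback.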

Generated files:

- `results.json`: Raw scraped data
- `rent_table.png`: Visual table with highlighting and statistics
- `scraper.log`: Execution log with timing information
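A log file with timing information can be produced with the standard `logging` module and a small timing context manager. This is a minimal sketch. The logger name `"scraper"`, the message format, and the `timed` helper are assumptions rather than the script's actual logging setup.

```python
import logging
import time
from contextlib import contextmanager


def configure_logging(path="scraper.log"):
    """Send INFO-level messages with timestamps to a log file (call once)."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


@contextmanager
def timed(logger, step):
    """Log how long the wrapped block took, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.2f s", step, time.perf_counter() - start)
```

Wrapping each phase (for example, `with timed(log, "scrape"):` around the scraping call) writes one line per phase, such as `... INFO scrape took 12.34 s`.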

Highlight thresholds are set in the script itself (monthly rent in dollars, keyed by bedroom count):

```python
thresholds = {
    "2": 1500,  # 2-bedroom threshold
    "3": 1800,  # 3-bedroom threshold
    "4": 2200,  # 4-bedroom threshold
}
```

Key dependencies:

- requests: HTTP library for web scraping
- beautifulsoup4: HTML parsing and data extraction
- pandas: Data manipulation and analysis
- matplotlib: Plotting and table visualization
- tqdm: Progress bars and status tracking
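The thresholds live in a plain dict in the script, so customizing them means changing or overriding that dict. The helper below is a hypothetical sketch, not part of the script. It applies overrides on top of the defaults and rejects bad values early, so a typo cannot silently turn highlighting off.

```python
# Defaults copied from the thresholds dict in the script (bedrooms -> dollars).
DEFAULT_THRESHOLDS = {"2": 1500, "3": 1800, "4": 2200}


def merge_thresholds(overrides=None, defaults=DEFAULT_THRESHOLDS):
    """Return a new thresholds dict with overrides applied.

    Keys are normalised to strings, so {3: 1900} and {"3": 1900} behave the
    same. Non-numeric or non-positive values raise ValueError.
    The defaults dict itself is never modified.
    """
    merged = dict(defaults)
    for bedrooms, limit in (overrides or {}).items():
        if isinstance(limit, bool) or not isinstance(limit, (int, float)) or limit <= 0:
            raise ValueError(
                f"threshold for {bedrooms} bedrooms must be a positive number, got {limit!r}"
            )
        merged[str(bedrooms)] = limit
    return merged
```

For example, `merge_thresholds({3: 1900})` raises the 3-bedroom cutoff to $1,900 and leaves the 2- and 4-bedroom defaults unchanged.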

Project structure:

```text
S8-RentScraper/
├── s8_rentscraper.py    # Main scraper script
├── pyproject.toml       # Poetry dependencies
├── README.md            # This file
├── .gitignore           # Git ignore patterns
├── results.json         # Generated: Raw scraped data
├── rent_table.png       # Generated: Visual table
└── scraper.log          # Generated: Execution logs
```

- ZIP | City labels: Clear identification of areas
- Red highlighting: Immediate identification of high-rent areas
- Average row: Quick statistical overview
- Professional styling: Gray headers and consistent formatting
- Comprehensive coverage: All Cleveland area ZIP codes with accurate city names
- Concurrent scraping: 8 parallel requests for faster execution
- Error resilience: Continues scraping even if individual ZIP codes fail
- Progress tracking: Real-time updates on scraping progress
- Timing information: Detailed execution time logging
This project is for educational and research purposes. Data is sourced from public housing authority websites.