Open-sourced by AutonCorp - Advanced engineering automation
A powerful, intelligent table extraction system for PDF datasheets. Extracts real data tables while filtering out schematics, pin diagrams, and other false positives.
Built for engineers who need clean, readable data from complex technical documents.
- Smart Table Detection: Uses geometric analysis to identify real data tables
- Beautiful ASCII Rendering: Renders tables as gorgeous ASCII art with proper alignment
- False Positive Filtering: Automatically rejects circuit schematics, corrupted content, and artifacts
- Multiple Output Formats: ASCII art, Markdown, and raw data
- Manufacturer Agnostic: Works with datasheets from any manufacturer
pip install -r requirements.txt

# Extract tables from the first page
python demo.py your_datasheet.pdf
# Extract from a specific page (1-indexed)
python demo.py your_datasheet.pdf 3

from smart_table_extractor import SmartTableExtractor
from true_geometric_renderer import TrueGeometricRenderer
import fitz
# Open PDF
doc = fitz.open("datasheet.pdf")
page = doc[0]
# Extract tables
extractor = SmartTableExtractor()
tables, corrupted_zones = extractor.extract_tables_from_page(page)
# Render beautifully
renderer = TrueGeometricRenderer()
pymupdf_tables = page.find_tables()
for i, table_data in enumerate(tables):
    if i < len(pymupdf_tables.tables):
        ascii_art = renderer.render(pymupdf_tables.tables[i], page)
        print(ascii_art)
doc.close()

- Validates table structure (minimum rows/columns, reasonable dimensions)
- Detects oversized cells (probably graphs, not tables)
- Identifies over-segmentation (too many empty cells)
- Curve Detection: Rejects tables containing curves/circles (circuit schematics)
- Text Corruption Detection: Identifies encoding issues
- Content Density: Ensures tables aren't mostly empty
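The checks listed above can be sketched in a few lines. This is a minimal illustration, not the code in smart_table_extractor.py: the thresholds, the function names, and the reason strings are all hypothetical. It assumes `rows` is a list of rows of cell text (the shape PyMuPDF's `Table.extract()` returns) and that `drawings` is shaped like the output of `page.get_drawings()`, where each entry has a `"rect"` bounding box and an `"items"` list whose curve segments start with `"c"`.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical thresholds, chosen for illustration only.
MIN_ROWS = 2
MIN_COLS = 2
MAX_EMPTY_RATIO = 0.5

Row = Sequence[Optional[str]]
BBox = Tuple[float, float, float, float]


def empty_ratio(rows: Sequence[Row]) -> float:
    """Fraction of cells that are None or whitespace-only."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 1.0
    empty = sum(1 for cell in cells if cell is None or not str(cell).strip())
    return empty / len(cells)


def count_curves(drawings: Sequence[dict], bbox: BBox) -> int:
    """Count curve segments in drawings whose bounding box overlaps bbox."""
    bx0, by0, bx1, by1 = bbox
    total = 0
    for drawing in drawings:
        x0, y0, x1, y1 = drawing["rect"]
        overlaps = x0 < bx1 and x1 > bx0 and y0 < by1 and y1 > by0
        if overlaps:
            total += sum(1 for item in drawing["items"] if item[0] == "c")
    return total


def validate_table(
    rows: Sequence[Row],
    drawings: Sequence[dict] = (),
    bbox: Optional[BBox] = None,
) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a candidate table."""
    if len(rows) < MIN_ROWS:
        return False, "too few rows"
    if max(len(row) for row in rows) < MIN_COLS:
        return False, "too few columns"
    if empty_ratio(rows) > MAX_EMPTY_RATIO:
        return False, "mostly empty"
    if bbox is not None and count_curves(drawings, bbox) > 0:
        return False, "contains curves"
    return True, "ok"
```

Returning a reason alongside the verdict makes it easier to see why a candidate was rejected when tuning thresholds against real datasheets.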
Automatically classifies tables as:
- maximum_ratings
- electrical_characteristics
- pin_configuration
- device_information
- operating_conditions
- And more...
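One simple way to arrive at labels like these is keyword matching on the table's title or header text. The sketch below is an assumption about how such a classifier could look, not the project's actual logic: the keyword lists, the `classify_table` name, and the `"unknown"` fallback are hypothetical.

```python
# Hypothetical keyword lists; the real extractor may use different rules.
CATEGORY_KEYWORDS = {
    "maximum_ratings": ("absolute maximum", "maximum rating"),
    "electrical_characteristics": ("electrical characteristics",),
    "pin_configuration": ("pin configuration", "pin description", "pin function"),
    "device_information": ("device information", "ordering information", "package"),
    "operating_conditions": ("operating conditions", "recommended operating"),
}


def classify_table(text: str) -> str:
    """Return the first category whose keywords appear in the text."""
    # Lowercase and collapse whitespace so matching is layout-insensitive.
    lowered = " ".join(text.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"


print(classify_table("TABLE 1: Maximum Ratings"))
```

Categories are checked in dictionary order, so more specific categories should be listed before broader ones.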
TABLE 1: Maximum Ratings
Size: 4 rows × 3 columns
Position: (50.2, 120.5, 300.8, 180.2)
Beautiful ASCII Rendering:
┌─────────────────────┬────────┬─────────┐
│ Parameter           │ Symbol │ Value   │
├─────────────────────┼────────┼─────────┤
│ Supply Voltage      │ VCC    │ 16V     │
│ Input Voltage       │ VIN    │ VCC     │
│ Operating Temp      │ TA     │ 70°C    │
└─────────────────────┴────────┴─────────┘
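For readers curious how a box-drawn table like the one above can be produced, here is a minimal sketch that pads each column to its widest cell. It is not the code in true_geometric_renderer.py, which works from the page geometry; the `render_ascii` function is hypothetical and takes plain rows of strings, treating the first row as the header.

```python
from typing import Sequence


def render_ascii(rows: Sequence[Sequence[str]]) -> str:
    """Render rows as a box-drawn table; the first row is the header."""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of cells.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[col]) for row in padded) for col in range(n_cols)]

    def border(left: str, mid: str, right: str) -> str:
        return left + mid.join("─" * (width + 2) for width in widths) + right

    def line(row: Sequence[str]) -> str:
        cells = (f" {cell.ljust(width)} " for cell, width in zip(row, widths))
        return "│" + "│".join(cells) + "│"

    out = [border("┌", "┬", "┐"), line(padded[0]), border("├", "┼", "┤")]
    out += [line(row) for row in padded[1:]]
    out.append(border("└", "┴", "┘"))
    return "\n".join(out)


print(render_ascii([
    ["Parameter", "Symbol", "Value"],
    ["Supply Voltage", "VCC", "16V"],
]))
```

Column widths are measured with `len()`, so alignment holds for single-width characters only; full-width or combining characters would need a display-width library.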

- smart_table_extractor.py: Core table detection and validation
- true_geometric_renderer.py: Beautiful ASCII table rendering
- demo.py: Example usage and command-line interface
- Python 3.7+
- PyMuPDF (fitz) 1.23.0+
AutonCorp specializes in advanced engineering automation and productivity tools.
This table extractor emerged from our need to efficiently parse technical documentation and extract structured data from complex PDFs.
Open Source - Share with friends and build amazing things!
Part of AutonCorp's mission to democratize advanced engineering tools.
