Syndicate PDF Table Extractor

Open-sourced by AutonCorp - Advanced engineering automation

A table extraction system for PDF datasheets. It extracts genuine data tables while filtering out circuit schematics, pin diagrams, and other false positives.

Built for engineers who need clean, readable data from complex technical documents.

Features

Smart Table Detection: Uses geometric analysis to identify real data tables
Beautiful ASCII Rendering: Renders tables as text with aligned columns and box-drawing borders
False Positive Filtering: Automatically rejects circuit schematics, corrupted content, and artifacts
Multiple Output Formats: ASCII art, Markdown, and raw data
Manufacturer Agnostic: Not tied to any one manufacturer's datasheet layout
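For the Markdown output format, here is a minimal sketch of converting extracted rows into a Markdown table. It is not the project's code: `rows_to_markdown` is a hypothetical helper, it assumes rows shaped like the output of PyMuPDF's `Table.extract()` (a list of rows whose cells are strings or `None`), and it treats the first row as the header.

```python
from typing import Optional, Sequence


def rows_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render rows (first row = header) as a GitHub-flavored Markdown table."""
    if not rows:
        return ""

    def clean(cell: Optional[str]) -> str:
        # None marks an empty or merged cell; newlines and pipes would break the table
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of columns
    grid = [[clean(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header, body = grid[0], grid[1:]
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join(["---"] * width) + " |"]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


rows = [["Parameter", "Symbol", "Value"],
        ["Supply Voltage", "VCC", "16V"],
        ["Operating Temp", "TA", None]]
print(rows_to_markdown(rows))
```

Escaping `|` and flattening newlines matters for datasheets, where cells often contain multi-line conditions.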

Quick Start

Installation

pip install -r requirements.txt

Usage

# Extract tables from the first page
python demo.py your_datasheet.pdf

# Extract from specific page (1-indexed)
python demo.py your_datasheet.pdf 3
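The CLI takes a 1-indexed page number, while PyMuPDF indexes pages from 0. A sketch of that conversion follows; it is not demo.py's actual code, and `page_index_from_args` is a hypothetical name. It assumes the argument order shown above (PDF path, then optional page).

```python
from typing import Sequence


def page_index_from_args(argv: Sequence[str], page_count: int) -> int:
    """Map the optional 1-indexed page argument (argv[2]) to a 0-indexed page."""
    if len(argv) < 3:
        return 0  # no page argument: default to the first page
    page = int(argv[2])  # raises ValueError for non-numeric input
    if not 1 <= page <= page_count:
        raise ValueError(f"page must be between 1 and {page_count}, got {page}")
    return page - 1
```

With an open document, this could be used as `page = doc[page_index_from_args(sys.argv, doc.page_count)]`.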

Programmatic Usage

from smart_table_extractor import SmartTableExtractor
from true_geometric_renderer import TrueGeometricRenderer
import fitz

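# Open PDF
doc = fitz.open("datasheet.pdf")
page = doc[0]  # pages are 0-indexed here, unlike the CLI page argument

# Extract and validate tables
extractor = SmartTableExtractor()
tables, corrupted_zones = extractor.extract_tables_from_page(page)

# Render as ASCII art; the renderer takes a PyMuPDF Table object plus the page
renderer = TrueGeometricRenderer()
pymupdf_tables = page.find_tables()

for i, table_data in enumerate(tables):
    # This pairs validated tables with PyMuPDF's raw detections by position.
    # If the extractor rejected some candidates, the indices may not line up.
    if i < len(pymupdf_tables.tables):
        ascii_art = renderer.render(pymupdf_tables.tables[i], page)
        print(ascii_art)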

doc.close()

What Makes It Smart?

Geometric Intelligence

Validates table structure (minimum rows/columns, reasonable dimensions)
Detects oversized cells, which usually indicate a graph rather than a table
Identifies over-segmentation (too many empty cells)
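The three geometric checks above can be sketched as one validation function. This is an illustrative approximation, not SmartTableExtractor's implementation: the function name and every threshold are hypothetical, and the inputs only roughly mirror what PyMuPDF exposes for a detected table (its bounding box, cell boxes, and extracted text).

```python
from typing import Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points

# Hypothetical thresholds; the extractor's actual values are not documented here
MIN_ROWS, MIN_COLS = 2, 2
MAX_CELL_AREA_FRACTION = 0.5  # a single cell this large suggests a graph, not a table
MAX_EMPTY_FRACTION = 0.6      # mostly empty grids suggest over-segmentation


def area(b: BBox) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def geometric_rejection(table_bbox: BBox,
                        cell_bboxes: Sequence[Optional[BBox]],
                        texts: Sequence[Sequence[Optional[str]]]) -> Optional[str]:
    """Return a rejection reason, or None if the grid passes the structural checks."""
    n_rows = len(texts)
    n_cols = max((len(row) for row in texts), default=0)
    if n_rows < MIN_ROWS or n_cols < MIN_COLS:
        return "too few rows or columns"
    table_area = area(table_bbox)
    if table_area <= 0:
        return "degenerate bounding box"
    # Oversized cells: any one cell covering a large share of the table
    if any(b is not None and area(b) / table_area > MAX_CELL_AREA_FRACTION
           for b in cell_bboxes):
        return "oversized cell"
    # Over-segmentation: too many cells with no text
    flat = [t for row in texts for t in row]
    empty = sum(1 for t in flat if t is None or not t.strip())
    if empty / len(flat) > MAX_EMPTY_FRACTION:
        return "over-segmented"
    return None
```

Returning a reason string rather than a boolean makes it easier to see why a candidate was dropped when tuning thresholds against real datasheets.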

Content Analysis

Curve Detection: Rejects tables containing curves/circles (circuit schematics)
Text Corruption Detection: Identifies text garbled by encoding issues
Content Density: Rejects candidates whose cells are mostly empty
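A sketch of the curve and corruption checks follows, under stated assumptions. It assumes drawings in the dictionary layout returned by PyMuPDF's `Page.get_drawings()`, where each path has a `"rect"` and a list of `"items"` whose first element is an operator such as `"l"` (line) or `"c"` (cubic Bezier curve). The function names and thresholds are hypothetical, and the corruption heuristic (replacement characters, private-use, control, or unassigned code points) is one plausible choice, not necessarily the project's.

```python
import unicodedata
from typing import Dict, Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points

# Hypothetical thresholds, chosen for illustration only
MAX_CURVES = 2              # tolerate a couple of rounded corners
MAX_CORRUPT_FRACTION = 0.1  # share of suspicious characters allowed


def count_curves(drawings: Iterable[Dict], bbox: BBox) -> int:
    """Count Bezier segments ('c' items) in vector paths that overlap bbox."""
    bx0, by0, bx1, by1 = bbox
    total = 0
    for path in drawings:
        x0, y0, x1, y1 = path["rect"]
        if x1 < bx0 or x0 > bx1 or y1 < by0 or y0 > by1:
            continue  # path lies entirely outside the candidate table
        total += sum(1 for item in path["items"] if item[0] == "c")
    return total


def corrupt_fraction(text: str) -> float:
    """Share of non-space characters that typically signal a broken font encoding."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1 for ch in chars
        # U+FFFD replacement char, private-use glyphs, control or unassigned code points
        if ch == "\ufffd" or unicodedata.category(ch) in ("Co", "Cc", "Cn")
    )
    return bad / len(chars)


def content_rejection(drawings: Iterable[Dict], bbox: BBox, text: str) -> Optional[str]:
    """Return a rejection reason, or None if the content looks like tabular data."""
    if count_curves(drawings, bbox) > MAX_CURVES:
        return "curves (likely a schematic)"
    if corrupt_fraction(text) > MAX_CORRUPT_FRACTION:
        return "corrupted text"
    return None
```

On a real page, the inputs could come from `page.get_drawings()` and `page.get_textbox(rect)` for the candidate table's rectangle. Table grids are drawn almost entirely from lines and rectangles, which is why curve segments are a useful schematic signal.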

Table Classification

Automatically classifies tables as:

maximum_ratings
electrical_characteristics
pin_configuration
device_information
operating_conditions
And more...
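This README does not describe the classifier's rules. One plausible approach is to score each label by whole-word keyword matches in the table's title and cell text, then pick the best scorer. The sketch below does this with a hypothetical `classify_table` function and made-up keyword lists.

```python
import re
from typing import Dict, List, Optional, Sequence

# Hypothetical keyword lists; the extractor's real rules are not documented here
KEYWORDS: Dict[str, List[str]] = {
    "maximum_ratings": ["absolute maximum", "maximum ratings"],
    "electrical_characteristics": ["electrical characteristics", "min", "typ", "max"],
    "pin_configuration": ["pin", "pin name", "i/o"],
    "device_information": ["part number", "package", "orderable"],
    "operating_conditions": ["recommended operating conditions", "operating conditions"],
}


def classify_table(rows: Sequence[Sequence[Optional[str]]], title: str = "") -> str:
    """Pick the label whose keywords occur most often as whole words."""
    text = " ".join([title] + [cell or "" for row in rows for cell in row]).lower()
    scores = {
        # \b boundaries keep "max" from matching inside "maximum"
        label: sum(len(re.findall(r"\b" + re.escape(kw) + r"\b", text)) for kw in kws)
        for label, kws in KEYWORDS.items()
    }
    best = max(scores, key=lambda label: scores[label])
    return best if scores[best] > 0 else "unknown"
```

Ties resolve to the first label in dictionary order. A production classifier would likely weight the title more heavily than cell text.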

Example Output

Command Line Demo

Beautiful ASCII Tables

TABLE 1: Maximum Ratings
Size: 4 rows × 3 columns
Position: (50.2, 120.5, 300.8, 180.2)

Beautiful ASCII Rendering:
┌─────────────────────┬────────┬─────────┐
│ Parameter           │ Symbol │ Value   │
├─────────────────────┼────────┼─────────┤
│ Supply Voltage      │ VCC    │ 16V     │
│ Input Voltage       │ VIN    │ VCC     │
│ Operating Temp      │ TA     │ 70°C    │
└─────────────────────┴────────┴─────────┘
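TrueGeometricRenderer works from PyMuPDF table geometry, which this README does not detail. The box layout above can be approximated from cell text alone. The hypothetical `render_box_table` helper below sizes each column to its widest entry and draws borders with Unicode box-drawing characters (U+2500 to U+257F), treating the first row as the header.

```python
from typing import List, Optional, Sequence


def render_box_table(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Draw rows (first row = header) with Unicode box-drawing characters."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Normalize: None becomes "", short rows are padded to the full column count
    grid = [["" if c is None else str(c) for c in row] + [""] * (width - len(row))
            for row in rows]
    col_w = [max(len(row[i]) for row in grid) for i in range(width)]

    def rule(left: str, mid: str, right: str) -> str:
        # Each segment spans the column width plus one space of padding per side
        return left + mid.join("─" * (w + 2) for w in col_w) + right

    def line(cells: List[str]) -> str:
        return "│" + "│".join(f" {c.ljust(w)} " for c, w in zip(cells, col_w)) + "│"

    out = [rule("┌", "┬", "┐"), line(grid[0]), rule("├", "┼", "┤")]
    out += [line(row) for row in grid[1:]]
    out.append(rule("└", "┴", "┘"))
    return "\n".join(out)


print(render_box_table([["Parameter", "Symbol", "Value"],
                        ["Supply Voltage", "VCC", "16V"],
                        ["Operating Temp", "TA", "70°C"]]))
```

`str.ljust` pads by code-point count, so wide CJK glyphs or combining characters will misalign. A more robust renderer would measure display width, for example with `unicodedata.east_asian_width`.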

Before vs After

[Image: original PDF table]

[Image: the same table transformed into clean ASCII]

Components

smart_table_extractor.py: Core table detection and validation
true_geometric_renderer.py: Beautiful ASCII table rendering
demo.py: Example usage and command-line interface

Requirements

Python 3.7+
PyMuPDF (fitz) 1.23.0+

About AutonCorp

AutonCorp specializes in advanced engineering automation and productivity tools.

This table extractor emerged from our need to efficiently parse technical documentation and extract structured data from complex PDFs.

License

Open Source - Share with friends and build amazing things!

Part of AutonCorp's mission to democratize advanced engineering tools.
