GitHub - cpljames269/2DMatrixScanner: This Python script is designed to process a single PDF file in the same directory, specifically looking for QR codes (or other 1D/2D barcodes) in the corners of each page. It decodes these barcodes to extract specific numeric information related to page sets within the PDF.

PDF Barcode Processor

This Python script is a utility for automatically processing PDF documents to extract structured information from barcodes (such as QR codes) located in the corners of each page. It is designed to work with documents that use a specific barcode format to define logical "sets" of pages, allowing for automated processing, splitting, or data extraction.

The script combines PyMuPDF, Pillow, and PyZXing to:

Identify a single PDF file in its directory.

Scan each page for barcodes in a specific format.

Interpret the decoded barcode data to understand page numbering, set size, and global page numbers.

Generate detailed reports in JSON format for further use.
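The first step above, locating exactly one PDF next to the script, can be sketched with the standard library alone. This is a minimal illustration rather than the script's actual code; the function name `find_single_pdf` and the choice to raise exceptions are assumptions.

```python
from pathlib import Path


def find_single_pdf(directory: Path) -> Path:
    """Return the only PDF in `directory`, or raise if there is not exactly one."""
    # Compare the extension case-insensitively so "REPORT.PDF" is also found.
    pdfs = sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if not pdfs:
        raise FileNotFoundError(f"No PDF file found in {directory}")
    if len(pdfs) > 1:
        names = ", ".join(p.name for p in pdfs)
        raise ValueError(f"Expected exactly one PDF, found {len(pdfs)}: {names}")
    return pdfs[0]
```

A caller could use `find_single_pdf(Path(__file__).parent)` to search the script's own directory and report the exception message to the console.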

Features

Automatic PDF Detection: Automatically finds a single PDF file in the script's directory.

Barcode Scanning: Scans all four corners of each page for 1D or 2D barcodes.

High-Resolution Processing: Renders PDF pages at 600 DPI to ensure reliable barcode decoding.

Structured Output: Generates four JSON files with extracted data for easy integration with other systems:

sets_info.json: Detailed information on each document set.

pages_per_set.json: A list of page counts for each set.

global_page_numbers.json: A list of the global page numbers where each set begins.

check_to_fix.json: Flags potential errors where a barcode might not be on the first page of a set.

Intelligent Navigation: Skips pages within a set after successfully decoding the starting barcode, greatly speeding up the process for multi-page documents.

Error Handling: Provides clear output for missing or invalid PDFs and for decoding failures.

Self-Cleaning: Creates and manages a cropped_images folder to store temporary files, ensuring a clean working environment.
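The page-skipping idea behind Intelligent Navigation can be sketched as a loop that jumps past the remainder of a set once its barcode has been decoded. This is an illustration under assumptions, not the script's actual implementation: `scan_sets`, the `decode_page` callback, and the dictionary keys are hypothetical names, and the rendering and barcode decoding are left to the callback.

```python
from typing import Callable, Dict, List, Optional


def scan_sets(page_count: int,
              decode_page: Callable[[int], Optional[str]]) -> List[Dict[str, int]]:
    """Walk the pages, jumping past each set once its barcode is decoded.

    `decode_page` is a hypothetical callback: given a 0-based page index it
    returns the decoded barcode text, or None if nothing was decoded.
    """
    sets: List[Dict[str, int]] = []
    index = 0
    while index < page_count:
        text = decode_page(index)
        valid = (text is not None and text.isascii()
                 and text.isdigit() and len(text) in (7, 8))
        if not valid:
            index += 1  # no usable barcode here: try the next page
            continue
        page_in_set = int(text[0:2])
        set_size = int(text[2:4])
        sets.append({
            "pdf_page": index + 1,          # 1-based page in the PDF
            "page_in_set": page_in_set,
            "set_size": set_size,
            "global_start": int(text[4:]),
        })
        # Skip the rest of this set, but always advance by at least one page.
        index += max(set_size - page_in_set + 1, 1)
    return sets
```

With a result like this, a list of page counts per set could be derived as `[s["set_size"] for s in sets]`, and entries whose `page_in_set` is not 1 would be natural candidates for a review list such as check_to_fix.json.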

Requirements

The script depends on several Python libraries. You can install them using pip:

pip install PyMuPDF Pillow pyzxing

PyMuPDF (fitz): For processing PDF documents.

Pillow (PIL): For image manipulation and cropping.

PyZXing: A wrapper for the ZXing barcode decoding library. This requires a working Java Runtime Environment (JRE) to be installed on your system.
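Because the Java requirement is easy to overlook, a quick pre-flight check can save a confusing failure later. The sketch below only looks for a `java` executable on `PATH` using the standard library; the helper name `find_java` is hypothetical, and a Java installation that is not on `PATH` would not be detected this way.

```python
import shutil
from typing import Optional


def find_java() -> Optional[str]:
    """Return the path of the `java` executable on PATH, or None if absent."""
    return shutil.which("java")


if __name__ == "__main__":
    java = find_java()
    if java is None:
        print("Java was not found on PATH; barcode decoding may fail.")
    else:
        print(f"Found Java at {java}")
```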

Usage

Placement: Place the Python script (.py file) in the same directory as the PDF file you want to process.

Execution: Open a terminal or command prompt in that directory and run the script:

python your_script_name.py

Review Output: The script will print its progress to the console and generate several JSON files and a cropped_images folder containing the temporary image files.

Barcode Format

The script is configured to look for a specific numeric barcode format. The expected format is a 7- or 8-digit number structured as follows:

[2 digits] - Page number within the current set (e.g., 01, 02).

[2 digits] - Total pages in the current set (e.g., 12, 04).

[3-4 digits] - Global page number where the set starts.

For example, a barcode decoded as 01120005 would be interpreted as:

Page 1 of 12.

The set starts on global page 5.
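The format described above can be split with simple string slicing. The following is a minimal sketch, assuming the decoded text contains only the digits; `BarcodeInfo` and `parse_barcode` are illustrative names, not identifiers taken from the script.

```python
from typing import NamedTuple


class BarcodeInfo(NamedTuple):
    page_in_set: int   # first 2 digits
    set_size: int      # next 2 digits
    global_start: int  # remaining 3 or 4 digits


def parse_barcode(text: str) -> BarcodeInfo:
    """Split a 7- or 8-digit barcode string into its three fields."""
    text = text.strip()
    if not (text.isascii() and text.isdigit() and len(text) in (7, 8)):
        raise ValueError(f"Expected a 7- or 8-digit number, got {text!r}")
    return BarcodeInfo(int(text[:2]), int(text[2:4]), int(text[4:]))


print(parse_barcode("01120005"))
# BarcodeInfo(page_in_set=1, set_size=12, global_start=5)
```

Rejecting anything that is not 7 or 8 ASCII digits keeps unrelated barcodes on the page from being misread as set markers.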

Configuration

You can easily adjust the following parameters within the script:

output_folder: The name of the folder for temporary images.

corner_crop_size: The size of the square region (in pixels) to crop from each corner. Increasing this value (900 pixels or more) can improve decoding accuracy on low-quality PDFs, while a smaller value (400 pixels) is faster.
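To make the effect of corner_crop_size concrete, the sketch below computes the four corner regions for a rendered page of a given pixel size. It is an illustration under assumptions rather than the script's own code: the function name `corner_boxes` and the dictionary keys are hypothetical, and the boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
from typing import Dict, Tuple

# (left, upper, right, lower), the box order used by Pillow's Image.crop
Box = Tuple[int, int, int, int]


def corner_boxes(width: int, height: int, corner_crop_size: int) -> Dict[str, Box]:
    """Return one square crop box per page corner, clamped to the page size."""
    w = min(corner_crop_size, width)
    h = min(corner_crop_size, height)
    return {
        "top_left": (0, 0, w, h),
        "top_right": (width - w, 0, width, h),
        "bottom_left": (0, height - h, w, height),
        "bottom_right": (width - w, height - h, width, height),
    }


# A US Letter page (8.5 x 11 in) rendered at 600 DPI is 5100 x 6600 pixels.
boxes = corner_boxes(5100, 6600, 900)
print(boxes["bottom_right"])  # (4200, 5700, 5100, 6600)
```

A 900-pixel crop covers about five times the area of a 400-pixel crop (810,000 versus 160,000 pixels), which is consistent with the trade-off between decoding accuracy and speed described above.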

Contribution

If you find a bug or have a suggestion for an improvement, feel free to open an issue or submit a pull request on GitHub.
