pdf-processing

Here are 164 public repositories matching this topic...

document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

  • Updated Oct 25, 2021
  • Python

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

  • Updated Jul 13, 2024
  • Python

📚 AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

  • Updated Sep 28, 2025
  • Python

MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.

  • Updated Jan 9, 2026
  • Python

A Python library for extracting tables from PDF documents using computer vision and image processing techniques. It converts PDF pages to images, detects tables, recognizes their structure, and outputs clean data in JSON format.

  • Updated Oct 18, 2025
  • Python

SafePDF

SafePDF is a privacy-focused offline tool for PDF manipulation. Merge, compress, split, and organize your PDF files securely: No internet required, your documents stay local and safe.

  • Updated Jan 6, 2026
  • Python

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

  • Updated Feb 3, 2025
  • Python

🤖 RAGbot – a RAG chatbot ✨ featuring a React frontend with 📝 Markdown rendering & ➗ LaTeX support, 🐍 Python FastAPI backend, 🔍 FAISS vector database for semantic search, 🧠 Sentence Transformers embeddings (all-MiniLM-L6-v2), 🦙 LongCat LLM integration, 📄 PDF/Markdown document indexing, and 🎨 responsive dark mode UI!

  • Updated Jan 3, 2026
  • Python
