library supporting NLP and CV research on scientific papers
Updated Nov 8, 2024 - Python
Text extraction from multiple and large PDF documents.
The Privacy Firewall for LLMs
A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.
Official Python client library for Nutrient Document Web Services API - PDF processing, OCR, watermarking, and document manipulation with automatic Office format conversion
Python scripts that convert PDF files to text, split them into chunks, and store their vector representations using GPT4All embeddings in a Chroma DB. A separate script queries the Chroma DB for similarity search based on user input.
Local, privacy-friendly resume analysis: convert, classify, and get advice using TF-IDF, Logistic Regression, and sentence-transformer embeddings.
LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.
A powerful, multi-modal Telegram bot leveraging cutting-edge AI technologies including Gemini, DeepSeek, OpenRouter, and 50+ AI models for comprehensive conversational assistance, media processing, and collaborative features with MCP (Model Context Protocol) integration.
AI-Powered Book EPUB Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.
MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.
LegalEase AI is a document simplification tool built using Gemini API, Streamlit, and Hugging Face models. It allows users to upload legal PDFs and automatically receive simplified summaries, clause-level insights, and structured information designed for clarity and accessibility.
Local RAG app with zero-config Docker setup. FastAPI + Streamlit + Qdrant + Ollama. Just run `docker-compose up --build`!
An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.
A Python library for extracting tables from PDF documents using computer vision and image processing techniques. It converts PDF pages to images, detects tables, recognizes their structure, and outputs clean data in JSON format.
Built for HackRx 6.0, Bajaj Finserv's Annual Hackathon, this backend system enables intelligent query-retrieval over large documents using LLMs, semantic search, and explainable decision logic.
SafePDF is a privacy-focused offline tool for PDF manipulation. Merge, compress, split, and organize your PDF files securely: No internet required, your documents stay local and safe.
PDF AI Chatbot: Turn your PDFs into knowledge! Explore, summarize, and ask questions using Artificial Intelligence and RAG.
PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.
RAGbot: a RAG chatbot featuring a React frontend with Markdown rendering & LaTeX support, a Python FastAPI backend, a FAISS vector database for semantic search, Sentence Transformers embeddings (all-MiniLM-L6-v2), LongCat LLM integration, PDF/Markdown document indexing, and a responsive dark mode UI!