RagX is a high-performance, scalable, and extensible RAG (Retrieval-Augmented Generation) core service, refactored from WeKnora. It focuses on providing high-quality document parsing and maintaining a loosely coupled architecture for services like document slicing, vectorization, retrieval, and reranking.
- High-Quality Document Parsing: Supports various document formats, including text, PDF, DOCX, and images (with OCR), with a particular focus on handling complex layouts and tables.
- Advanced Image and Table Processing:
  - Multimodal Processing: Extracts and analyzes images from documents, using OCR to get text and VLM to generate captions.
  - Smart Table Handling: Converts tables from `.docx` to HTML. For tables in images or PDFs, it uses a "PDF to image" strategy, leveraging VLM-based OCR for high-quality structural extraction.
- Pluggable Components: Easily integrate with different vector databases, embedding models, and rerankers.
- Dynamic OCR Engine: Implements a two-stage OCR process. It first uses a fast, local OCR engine (like PaddleOCR) for initial text extraction. If it detects a potential table, it then uses a more powerful VLM-based OCR (like Nanonets) for more accurate, structured data extraction. This approach balances cost, speed, and accuracy.
- Scalable Architecture: Designed for high-concurrency and low-latency serving.
- Developer-Friendly API: Clear and concise HTTP API for all core functionalities.
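
The two-stage flow described under "Dynamic OCR Engine" can be sketched roughly as follows. This is a minimal illustration and not RagX's actual implementation: `OCREngine`, `TwoStageOCR`, `looks_like_table`, and `StubEngine` are hypothetical names, and the table heuristic (counting rows with several column-like gaps) is an assumption, since this README does not say how a potential table is detected. In a real deployment the fast engine would wrap something like PaddleOCR and the structured engine a VLM-based OCR.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol


class OCREngine(Protocol):
    """Hypothetical interface: any engine that turns image bytes into text."""

    def extract(self, image: bytes) -> str: ...


def looks_like_table(text: str, min_rows: int = 2, min_cols: int = 3) -> bool:
    """Assumed heuristic: a table has several rows with multiple cells.

    Cells are separated by tabs, pipes, or runs of two or more spaces.
    """
    table_rows = 0
    for line in text.splitlines():
        cells = [c for c in re.split(r"\t| {2,}|\|", line.strip()) if c.strip()]
        if len(cells) >= min_cols:
            table_rows += 1
    return table_rows >= min_rows


@dataclass
class TwoStageOCR:
    fast: OCREngine        # cheap local engine, always runs first
    structured: OCREngine  # slower, more accurate engine, runs only on demand
    detect: Callable[[str], bool] = looks_like_table

    def extract(self, image: bytes) -> str:
        text = self.fast.extract(image)
        if self.detect(text):
            # Escalate only when the fast pass suggests tabular content.
            return self.structured.extract(image)
        return text


class StubEngine:
    """Stand-in engine for demonstration; returns a fixed string."""

    def __init__(self, output: str) -> None:
        self.output = output
        self.calls = 0

    def extract(self, image: bytes) -> str:
        self.calls += 1
        return self.output
```

Because both stages share one small interface, either engine can be swapped without touching the dispatch logic, which fits the pluggable-components goal listed above. The expensive engine is only invoked when the detector fires, which is where the cost and speed savings come from.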
- `cmd/server`: Main application entry point.
- `internal/`: Core application logic.
  - `application/`: Application services and repositories.
  - `config/`: Configuration management.
  - `container/`: Dependency injection.
  - `router/`: HTTP routing.
- `services/docreader`: Python-based document parsing service.
- `config/`: Configuration files.
(To be updated)