iXimNet/RagX

RagX

RagX is a high-performance, scalable, and extensible RAG (Retrieval-Augmented Generation) core service, refactored from WeKnora. It focuses on high-quality document parsing and keeps document slicing, vectorization, retrieval, and reranking loosely coupled, so each service can evolve independently.

Core Features

  • High-Quality Document Parsing: Supports various document formats, including text, PDF, DOCX, and images (with OCR), with a particular focus on complex layouts and tables.
  • Advanced Image and Table Processing:
    • Multimodal Processing: Extracts and analyzes images from documents, using OCR to extract text and a VLM (vision-language model) to generate captions.
    • Smart Table Handling: Converts tables from .docx to HTML. For tables in images or PDFs, it uses a "PDF to image" strategy, leveraging VLM-based OCR for high-quality structural extraction.
  • Pluggable Components: Easily integrate with different vector databases, embedding models, and rerankers.
  • Dynamic OCR Engine: Implements a two-stage OCR process. It first uses a fast, local OCR engine (like PaddleOCR) for initial text extraction. If it detects a potential table, it then uses a more powerful VLM-based OCR (like Nanonets) for more accurate, structured data extraction. This approach balances cost, speed, and accuracy.
  • Scalable Architecture: Designed for high-concurrency, low-latency serving.
  • Developer-Friendly API: Clear and concise HTTP API for all core functionality.
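
The two-stage OCR flow described under Dynamic OCR Engine can be sketched roughly as follows. This is a minimal illustration and not RagX's actual implementation: OcrEngine, OcrResult, looks_like_table, two_stage_ocr, and StubEngine are hypothetical names, and the table heuristic (counting lines that split into three or more cells) is an assumption, since the README does not say how a potential table is detected. In a real deployment the fast engine might wrap PaddleOCR and the accurate one a VLM-based OCR such as Nanonets; stubs stand in for both here so the sketch runs without external dependencies.

```python
import re
from dataclasses import dataclass
from typing import Callable, Protocol


class OcrEngine(Protocol):
    """Hypothetical interface: any engine that turns image bytes into text."""

    def extract(self, image: bytes) -> str: ...


@dataclass
class OcrResult:
    text: str
    engine: str       # "fast" or "accurate": which stage produced the text
    escalated: bool   # True if the second stage was invoked


# Cells are assumed to be separated by runs of spaces, tabs, or pipes.
_CELL_SPLIT = re.compile(r"\s{2,}|\t|\|")


def looks_like_table(text: str, min_rows: int = 2) -> bool:
    """Assumed heuristic: at least `min_rows` lines with three or more cells."""
    row_like = 0
    for line in text.splitlines():
        cells = [c for c in _CELL_SPLIT.split(line.strip()) if c.strip()]
        if len(cells) >= 3:
            row_like += 1
    return row_like >= min_rows


def two_stage_ocr(
    image: bytes,
    fast: OcrEngine,
    accurate: OcrEngine,
    detect_table: Callable[[str], bool] = looks_like_table,
) -> OcrResult:
    """Run the cheap engine first; escalate only when a table is suspected."""
    draft = fast.extract(image)
    if not detect_table(draft):
        return OcrResult(text=draft, engine="fast", escalated=False)
    return OcrResult(text=accurate.extract(image), engine="accurate", escalated=True)


class StubEngine:
    """Stand-in engine for demonstration; returns canned text and counts calls."""

    def __init__(self, output: str) -> None:
        self.output = output
        self.calls = 0

    def extract(self, image: bytes) -> str:
        self.calls += 1
        return self.output
```

Because both engines and the detector are passed in as parameters, either stage can be swapped without touching the control flow, in keeping with the pluggable-components goal. The expensive engine is called only for pages that trip the detector, which is where the cost and speed savings would come from.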

Project Structure

  • cmd/server: Main application entry point.
  • internal/: Core application logic.
    • application/: Application services and repositories.
    • config/: Configuration management.
    • container/: Dependency injection.
    • router/: HTTP routing.
  • services/docreader: Python-based document parsing service.
  • config/: Configuration files.

Getting Started

(To be updated)
