RagX

RagX is a RAG (Retrieval-Augmented Generation) core service refactored from WeKnora, designed to be high-performance, scalable, and extensible. It focuses on high-quality document parsing and keeps the services for document slicing, vectorization, retrieval, and reranking loosely coupled.

Core Features

High-Quality Document Parsing: Supports text, PDF, DOCX, and image documents (images via OCR), with particular attention to complex layouts and tables.
Advanced Image and Table Processing:
- Multimodal Processing: Extracts and analyzes images from documents, using OCR to get text and VLM to generate captions.
- Smart Table Handling: Converts tables in .docx files to HTML. For tables in PDFs, pages are first rendered to images (a "PDF to image" strategy). These images, like tables in standalone images, are then passed to VLM-based OCR for high-quality structural extraction.
Pluggable Components: Easily integrate with different vector databases, embedding models, and rerankers.
Dynamic OCR Engine: Uses a two-stage OCR process. A fast, local OCR engine (such as PaddleOCR) performs the initial text extraction. If it detects a potential table, a more powerful VLM-based OCR model (such as Nanonets) then performs a more accurate, structured extraction. This approach balances cost, speed, and accuracy.
Scalable Architecture: Designed for high-concurrency and low-latency serving.
Developer-Friendly API: A clear, concise HTTP API covering all core functionality.

Project Structure

cmd/server: Main application entry point.
internal/: Core application logic.
- application/: Application services and repositories.
- config/: Configuration management.
- container/: Dependency injection.
- router/: HTTP routing.
services/docreader: Python-based document parsing service.
config/: Configuration files.

Getting Started

(To be updated)
