CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter
pip install uv
uv sync
pip install lab-1806-vec-db==0.2.3 python-dotenv sentence-transformers openai pybloom_live
export HF_ENDPOINT=https://hf-mirror.com
python -m spacy download zh_core_web_sm
Arguments:
vec-db-key: The key for the vector database.
tree-num-max: The maximum number of trees to build.
entities-file-name: The name of the entities file.
search-method: The search method to use:
  0 for Vector Database Only
  1 for Naive Tree-RAG
  2 for Bloom Filter Search in Tree-RAG
  5 for improved Bloom Filter Search in Tree-RAG
  7 for Cuckoo Filter in Tree-RAG
  8 for Approximate Nearest Neighbors in Tree-RAG
  9 for Approximate Nearest Neighbors in Graph-RAG
node-num-max: The maximum number of nodes to build.
Example:
python main.py --tree-num-max 50 --search-method 7
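For orientation, the sketch below shows one way the arguments listed above could be parsed with Python's standard `argparse` module. It is an illustration only: the argument types, the `None` defaults, the `SEARCH_METHODS` table, and the `build_parser` helper are assumptions, and the actual parsing in main.py may differ.

```python
import argparse

# Search-method codes as listed in the Arguments section.
SEARCH_METHODS = {
    0: "Vector Database Only",
    1: "Naive Tree-RAG",
    2: "Bloom Filter Search in Tree-RAG",
    5: "Improved Bloom Filter Search in Tree-RAG",
    7: "Cuckoo Filter in Tree-RAG",
    8: "Approximate Nearest Neighbors in Tree-RAG",
    9: "Approximate Nearest Neighbors in Graph-RAG",
}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; types and defaults are assumptions, not taken from main.py."""
    parser = argparse.ArgumentParser(description="CFT-RAG runner (illustrative sketch)")
    parser.add_argument("--vec-db-key", type=str, default=None,
                        help="The key for the vector database.")
    parser.add_argument("--tree-num-max", type=int, default=None,
                        help="The maximum number of trees to build.")
    parser.add_argument("--entities-file-name", type=str, default=None,
                        help="The name of the entities file.")
    parser.add_argument("--search-method", type=int, default=None,
                        choices=sorted(SEARCH_METHODS),
                        help="The search method to use (see the code list above).")
    parser.add_argument("--node-num-max", type=int, default=None,
                        help="The maximum number of nodes to build.")
    return parser


# Parse the same flags as the example command above.
args = build_parser().parse_args(["--tree-num-max", "50", "--search-method", "7"])
print(SEARCH_METHODS[args.search_method])  # prints "Cuckoo Filter in Tree-RAG"
```

Restricting `--search-method` to the listed codes makes an unsupported value such as 3 fail at parse time rather than later in the run.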
To test the performance of the improved Cuckoo filter and the sorting results on their own:
python test_tree.py
TRAG-cuckoofilter is based on https://github.com/efficient/cuckoofilter.
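For readers unfamiliar with the data structure, here is a minimal Python sketch of a standard cuckoo filter (fingerprints stored in two candidate buckets, with eviction on collision). It is not the TRAG-cuckoofilter code and does not include this project's improvements. The class name, parameters, and hashing scheme are all assumptions chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Illustrative cuckoo filter; not the TRAG-cuckoofilter implementation."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR-based alternate index in range.
        if num_buckets < 1 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # 16-bit fingerprint in the range 1..65535.
        return self._hash(b"fp:" + item.encode("utf-8")) % 0xFFFF + 1

    def _index(self, item: str) -> int:
        return self._hash(item.encode("utf-8")) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Partial-key cuckoo hashing: applying this twice returns the original index.
        return (index ^ self._hash(fp.to_bytes(2, "big"))) % self.num_buckets

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter is considered full; the last evicted fingerprint is dropped.
        return False

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


entity_filter = CuckooFilter()
entity_filter.insert("aspirin")
print(entity_filter.contains("aspirin"))  # True: an inserted item is always found
entity_filter.delete("aspirin")
```

Unlike a Bloom filter, this structure supports deletion, at the cost of a small false-positive rate determined by the fingerprint width.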
Datasets used:
| Dataset | MedQA | AESLC | DART | Rui'an People's Hospital |
|---|---|---|---|---|
| Scale | Large | Medium | Medium | Small |
| Source | https://github.com/jind11/MedQA | https://huggingface.co/datasets/Yale-LILY/aeslc | https://github.com/Yale-LILY/dart | https://www.rahos.gov.cn |

