Commit d82884f

docs: readme features
1 parent a1d5323 commit d82884f

3 files changed

+30
-112
lines changed

README.md

Lines changed: 7 additions & 43 deletions
Original file line number | Diff line number | Diff line change
@@ -57,16 +57,6 @@ DB-GPT is an experimental open-source project that uses localized GPT large mode
5757
Run on an RTX 4090 GPU.
5858
##### Chat Excel
5959
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0474d220-2a9f-449f-a940-92c8a25af390)
60-
##### Chat Plugin
61-
![auto_plugin_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/7d95c347-f4b7-4fb6-8dd2-c1c02babaa56)
62-
##### LLM Management
63-
![llm_manage](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/501d6b3f-c4ce-4197-9a6f-f016f8150a11)
64-
##### FastChat && vLLM
65-
![vllm](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0c9475d2-45ee-4573-aa5a-814f7fd40213)
66-
##### Trace
67-
![trace_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/69bd14b8-14d0-4ca9-9cb7-6cef44a2bc93)
68-
##### Chat Knowledge
69-
![kbqa_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/72266a48-edef-4c6d-88c6-fbb1a24a6c3e)
7060

7161
## Install
7262
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -97,23 +87,23 @@ Run on an RTX 4090 GPU.
9787
## Features
9888

9989
Currently, we have released multiple key features, which are listed below to demonstrate our current capabilities:
100-
- Private KBQA & data processing
90+
- **Private Domain Q&A & Data Processing**
10191

10292
The DB-GPT project offers a range of features to enhance knowledge base construction and enable efficient storage and retrieval of both structured and unstructured data. These include built-in support for uploading multiple file formats, the ability to integrate plug-ins for custom data extraction, and unified vector storage and retrieval capabilities for managing large volumes of information.
10393

104-
- Multiple data sources & visualization
94+
- **Multi-Data Source & GBI(Generative Business intelligence)**
10595

10696
The DB-GPT project enables seamless natural language interaction with various data sources, including Excel, databases, and data warehouses. It facilitates effortless querying and retrieval of information from these sources, allowing users to engage in intuitive conversations and obtain insights. Additionally, DB-GPT supports the generation of analysis reports, providing users with valuable summaries and interpretations of the data.
10797

108-
- Multi-Agents&Plugins
98+
- **Multi-Agents&Plugins**
10999

110100
It supports custom plug-ins to perform tasks, natively supports the Auto-GPT plug-in model, and the Agents protocol adopts the Agent Protocol standard.
111101

112-
- Fine-tuning text2SQL
113-
102+
- **Automated Fine-tuning text2SQL**
103+
114104
An automated fine-tuning lightweight framework built around large language models, Text2SQL data sets, LoRA/QLoRA/Pturning, and other fine-tuning methods, making TextSQL fine-tuning as convenient as an assembly line. [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub)
115105

116-
- Multi LLMs Support, Supports multiple large language models, currently supporting
106+
- **SMMF(Service-oriented Multi-model Management Framework)**
117107

118108
Massive model support, including dozens of large language models such as open source and API agents. Such as LLaMA/LLaMA2, Baichuan, ChatGLM, Wenxin, Tongyi, Zhipu, etc.
119109
- [Vicuna](https://huggingface.co/Tribbiani/vicuna-13b)
@@ -126,30 +116,14 @@ Currently, we have released multiple key features, which are listed below to dem
126116
- [falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
127117
- [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
128118
- [Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
129-
- [RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
130-
- [CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
131-
- [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
132-
- [h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
133-
- [fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
134-
- [mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
135-
- [gpt4all-13b-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy)
136-
- [Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
137-
- [codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
138-
- [guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
139-
- [WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
140-
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
141-
- [Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
142-
- [OpenLLaMa OpenInstruct](https://huggingface.co/VMware/open-llama-7b-open-instruct)
143-
144-
Etc.
145119

146120
- Support API Proxy LLMs
147121
- [x] [ChatGPT](https://api.openai.com/)
148122
- [x] [Tongyi](https://www.aliyun.com/product/dashscope)
149123
- [x] [Wenxin](https://cloud.baidu.com/product/wenxinworkshop?track=dingbutonglan)
150124
- [x] [ChatGLM](http://open.bigmodel.cn/)
151125

152-
- Privacy and security
126+
- **Privacy and Security**
153127

154128
The privacy and security of data are ensured through various technologies, such as privatized large models and proxy desensitization.
155129

@@ -313,16 +287,6 @@ The core capabilities mainly consist of the following parts:
313287

314288
As of October 10, 2023, by fine-tuning an open-source model of 13 billion parameters using this project, the execution accuracy on the Spider evaluation dataset has surpassed that of GPT-4!
315289

316-
| name | Execution Accuracy | reference |
317-
| ----------------------------------| ------------------ | ------------------------------------------------------------------------------------------------------------------------------ |
318-
| **GPT-4** | **0.762** | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
319-
| ChatGPT | 0.728 | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
320-
| **CodeLlama-13b-Instruct-hf_lora**| **0.789** | sft train by our this project,only used spider train dataset ,the same eval way in this project with lora SFT |
321-
| CodeLlama-13b-Instruct-hf_qlora | 0.774 | sft train by our this project,only used spider train dataset ,the same eval way in this project with qlora and nf4,bit4 SFT |
322-
| wizardcoder | 0.610 | [text-to-sql-wizardcoder](https://github.com/cuplv/text-to-sql-wizardcoder/tree/main) |
323-
| CodeLlama-13b-Instruct-hf | 0.556 | eval in this project default param |
324-
| llama2_13b_hf_lora_best | 0.744 | sft train by our this project,only used spider train dataset ,the same eval way in this project |
325-
326290
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
327291

328292
## Licence

README.zh.md

Lines changed: 22 additions & 68 deletions
Original file line numberDiff line numberDiff line change
@@ -34,12 +34,9 @@
3434
</div>
3535

3636
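## What is DB-GPT?
37+
DB-GPT is an open-source large-model framework for the database domain. Its goal is to build infrastructure for the large-model field: by developing technical capabilities such as multi-model management, Text2SQL performance optimization, a RAG framework and its optimization, and Multi-Agents framework collaboration, it makes building large-model applications around databases simpler and more convenient.
3738

38-
As large models are released and iterated, they become ever more intelligent, but using them raises major data-security and privacy challenges. When we draw on large-model capabilities, our private data and environment must stay in our own hands and fully under our control, avoiding any data-privacy leak or security risk. For this reason we started the DB-GPT project, to build a complete private large-model solution for all database-based scenarios. Because the solution supports local deployment, it can be used in standalone private environments and can also be deployed and isolated independently per business module, keeping large-model capabilities absolutely private, secure, and controllable. Our vision is to make building large-model applications around databases simpler and more convenient.
39-
40-
DB-GPT is an open-source, database-based experimental GPT project that uses localized GPT large models to interact with your data and environment, with no risk of data leakage and 100% privacy.
41-
42-
39+
In the Data 3.0 era, enterprises and developers can build their own custom applications on top of models and databases with less code.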
4340

4441
## Contents
4542

@@ -59,19 +56,8 @@ DB-GPT is an open-source, database-based experimental GPT project that uses local
5956

6057
##### Chat Excel
6158
![excel](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0474d220-2a9f-449f-a940-92c8a25af390)
62-
#### Chat Plugin
63-
![auto_plugin_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/7d95c347-f4b7-4fb6-8dd2-c1c02babaa56)
64-
#### LLM Management
65-
![llm_manage](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/501d6b3f-c4ce-4197-9a6f-f016f8150a11)
66-
#### FastChat && vLLM
67-
![vllm](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/0c9475d2-45ee-4573-aa5a-814f7fd40213)
68-
#### Trace
69-
![trace_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/69bd14b8-14d0-4ca9-9cb7-6cef44a2bc93)
70-
#### Chat Knowledge
71-
![kbqa_new](https://github.com/eosphoros-ai/DB-GPT/assets/13723926/72266a48-edef-4c6d-88c6-fbb1a24a6c3e)
7259

7360
#### Generate analysis charts from natural-language conversation
74-
7561
<p align="left">
7662
<img src="./assets/chat_excel/chat_excel_6.png" width="800px" />
7763
</p>
@@ -80,10 +66,6 @@ DB-GPT is an open-source, database-based experimental GPT project that uses local
8066
<img src="./assets/dashboard.png" width="800px" />
8167
</p>
8268

83-
<p align="left">
84-
<img src="./assets/chat_dashboard/chat_dashboard_2.png" width="800px" />
85-
</p>
86-
8769
## 安装
8870

8971
![Docker](https://img.shields.io/badge/docker-%230db7ed.svg?style=for-the-badge&logo=docker&logoColor=white)
@@ -111,26 +93,23 @@ DB-GPT is an open-source, database-based experimental GPT project that uses local
11193
- [**FAQ**](https://db-gpt.readthedocs.io/en/latest/getting_started/faq/deploy/deploy_faq.html)
11294

11395
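## Features
114-
115-
We have released a number of key features so far; they are listed here to show the capabilities currently available.
116-
117-
- Private Domain Q&A & Data Processing
96+
- **Private Domain Q&A & Data Processing & RAG**
11897

11998
Supports building a custom knowledge base through built-in sources, multi-format file upload, plug-in self-crawling, and other methods, with unified vector storage and retrieval over massive structured and unstructured data.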
120-
121-
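- Multiple Data Sources & Visualization
99+
100+
- **Multiple Data Sources & GBI**
122101

123102
Supports natural-language interaction with multiple data sources such as Excel, databases, and data warehouses, and supports analysis reports.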
124103

125-
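- Automated Fine-tuning
104+
- **Automated Fine-tuning**
126105

127106
A lightweight automated fine-tuning framework built around large language models, Text2SQL datasets, and fine-tuning methods such as LoRA/QLoRA/Pturning, making TextSQL fine-tuning as convenient as an assembly line. See: [DB-GPT-Hub](https://github.com/eosphoros-ai/DB-GPT-Hub)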
128107

129-
- Multi-Agents&Plugins
108+
- **Data-Driven Multi-Agents&Plugins**
130109

131110
Supports custom plug-ins to perform tasks, natively supports the Auto-GPT plug-in model, and the Agents protocol adopts the Agent Protocol standard.
132111

133-
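- Multi-Model Support and Management
112+
- **Multi-Model Support and Management**
134113

135114
Massive model support, including dozens of large language models, both open source and API proxies, such as LLaMA/LLaMA2, Baichuan, ChatGLM, Wenxin, Tongyi, Zhipu, etc.
136115
- Supports multiple large language models; the following are currently supported: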
@@ -141,30 +120,14 @@ DB-GPT is an open-source, database-based experimental GPT project that uses local
141120
- [baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
142121
- [chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
143122
- [chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
144-
- [falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
145-
- [internlm-chat-7b](https://huggingface.co/internlm/internlm-chat-7b)
146-
- [Qwen-7B-Chat/Qwen-14B-Chat](https://huggingface.co/Qwen/)
147-
- [RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
148-
- [CAMEL-13B-Combined-Data](https://huggingface.co/camel-ai/CAMEL-13B-Combined-Data)
149-
- [dolly-v2-12b](https://huggingface.co/databricks/dolly-v2-12b)
150-
- [h2ogpt-gm-oasst1-en-2048-open-llama-7b](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-open-llama-7b)
151-
- [fastchat-t5-3b-v1.0](https://huggingface.co/lmsys/fastchat-t5)
152-
- [mpt-7b-chat](https://huggingface.co/mosaicml/mpt-7b-chat)
153-
- [gpt4all-13b-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy)
154-
- [Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
155-
- [codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
156-
- [guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
157-
- [WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
158-
- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
159-
- [Llama2-Chinese-13b-Chat](https://huggingface.co/FlagAlpha/Llama2-Chinese-13b-Chat)
160-
- [OpenLLaMa OpenInstruct](https://huggingface.co/VMware/open-llama-7b-open-instruct)
123+
161124
- Supports online proxy models
162125
- [x] [ChatGPT](https://api.openai.com/)
163126
- [x] [Tongyi](https://www.aliyun.com/product/dashscope)
164127
- [x] [Wenxin](https://cloud.baidu.com/product/wenxinworkshop?track=dingbutonglan)
165128
- [x] [ChatGLM](http://open.bigmodel.cn/)
166129

167-
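- Privacy and Security
130+
- **Privacy and Security**
168131

169132
The privacy and security of data are ensured through various technologies, such as privatized large models and proxy desensitization.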
170133

@@ -192,22 +155,23 @@ DB-GPT is an open-source, database-based experimental GPT project that uses local
192155
| [StarRocks](https://github.com/StarRocks/starrocks) | No | TODO |
193156

194157
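## Architecture
195-
DB-GPT builds its large-model runtime environment on [FastChat](https://github.com/lm-sys/FastChat). In addition, we provide private-domain knowledge-base Q&A through LangChain. We also support a plug-in mode, with native support for Auto-GPT plug-ins by design. Our vision is to make building applications around databases and LLMs simpler and more convenient.
196-
197158
The overall DB-GPT architecture is shown in the figure below.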
198-
199159
<p align="center">
200160
<img src="./assets/DB-GPT_zh.png" width="800px" />
201161
</p>
202162

203-
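The core capabilities mainly consist of the following parts.
204-
1. Multi-model: supports multiple LLMs such as LLaMA/LLaMA2, CodeLLaMA, ChatGLM, QWen, and Vicuna, as well as proxy models such as ChatGPT, Baichuan, tongyi, and wenxin
205-
2. Private-domain knowledge-base Q&A: enables high-quality intelligent Q&A based on local documents (such as pdf, word, and excel data).
206-
3. Unified data vector storage and indexing: embeds data as vectors and stores them in a vector database, providing content-similarity search.
207-
4. Multiple data sources: used to connect different modules and data sources, enabling data flow and interaction.
208-
5. Agents and plug-ins: provides an Agent and plug-in mechanism that lets users customize and extend the system's behavior.
209-
6. Privacy and security: rest assured that there is no risk of data leakage; your data is 100% private and secure.
210-
7. Text2SQL: we improve text-to-SQL performance through supervised fine-tuning (SFT) of large language models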
163+
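The core capabilities mainly consist of the following parts:
164+
- **RAG (Retrieval Augmented Generation)**: RAG is currently the area with the most real-world deployments and the most pressing demand. DB-GPT has implemented a RAG-based framework, and users can build knowledge applications on DB-GPT's RAG capabilities.
165+
166+
- **GBI**: Generative BI is one of the core capabilities of the DB-GPT project, providing the foundational data-intelligence technology for building enterprise report analysis and business insights.
167+
168+
- **Fine-tuning framework**: Model fine-tuning is an indispensable capability for any enterprise deploying in vertical and niche domains. DB-GPT provides a complete fine-tuning framework that integrates seamlessly with the DB-GPT project; in recent fine-tuning runs, accuracy on Spider has reached 82.5%
169+
170+
- **Data-driven Multi-Agents framework**: DB-GPT provides a data-driven, self-evolving fine-tuning framework whose goal is to make decisions and execute continuously based on data.
171+
172+
- **Data Factory**: The Data Factory mainly handles the cleaning and processing of trusted knowledge and data in the era of large models.
173+
174+
- **Data Sources**: Connects to various data sources so that production business data flows seamlessly into DB-GPT's core capabilities.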
211175

212176
### RAG production deployment architecture
213177
<p align="center">
@@ -345,16 +309,6 @@ The MIT License (MIT)
345309
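- SFT model accuracy
346310
As of October 10, 2023, by fine-tuning an open-source 13B-parameter model using this project, the execution accuracy on the Spider evaluation set has surpassed that of GPT-4!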
347311

348-
| Model name | Execution accuracy | Notes |
349-
| ----------------------------------| ------------------ | ------------------------------------------------------------------------------------------------------------------------------ |
350-
| **GPT-4** | **0.762** | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
351-
| ChatGPT | 0.728 | [numbersstation-eval-res](https://www.numbersstation.ai/post/nsql-llama-2-7b) |
352-
| **CodeLlama-13b-Instruct-hf_lora**| **0.789** | sft train by our this project,only used spider train dataset ,the same eval way in this project with lora SFT |
353-
| CodeLlama-13b-Instruct-hf_qlora | 0.774 | sft train by our this project,only used spider train dataset ,the same eval way in this project with qlora and nf4,bit4 SFT |
354-
| wizardcoder | 0.610 | [text-to-sql-wizardcoder](https://github.com/cuplv/text-to-sql-wizardcoder/tree/main) |
355-
| CodeLlama-13b-Instruct-hf | 0.556 | eval in this project default param |
356-
| llama2_13b_hf_lora_best | 0.744 | sft train by our this project,only used spider train dataset ,the same eval way in this project |
357-
358312
[More Information about Text2SQL finetune](https://github.com/eosphoros-ai/DB-GPT-Hub)
359313

360314
## Contact Us

docs/index.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@
66
Overview
77
------------------
88

9-
| DB-GPT is an open-source framework for large models in the database field. Its purpose is to build infrastructure for the domain of large models, making it easier and more convenient to develop applications around databases. By developing various technical capabilities such as:
9+
| DB-GPT is an open-source framework for large models in the databases fields. It's purpose is to build infrastructure for the domain of large models, making it easier and more convenient to develop applications around databases. By developing various technical capabilities such as:
1010
1111
1. **SMMF(Service-oriented Multi-model Management Framework)**
1212
2. **Text2SQL Fine-tuning**
