Commit d6eb283

fix(chat-excel):Explicitly create data tables from the df (#2437) (#2464)
# Description

Fixes #2437. Optimize the prompts for reconstructing data tables so that the output field names comply with SQL standards and do not start with numbers.

# How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

# Snapshots:

Include snapshots for easier review.

# Checklist:

- [x] My code follows the style guidelines of this project
- [x] I have already rebased the commits and made the commit message conform to the project standard.
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [x] Any dependent changes have been merged and published in downstream modules
2 parents 6a79c8c + 9bcc8e2 commit d6eb283

2 files changed: +11 -5 lines changed

packages/dbgpt-app/src/dbgpt_app/scene/chat_data/chat_excel/excel_learning/prompt.py

Lines changed: 7 additions & 4 deletions
@@ -52,9 +52,11 @@
 4. If it's in other languages, translate them to English, and replace spaces with \
 underscores
 5. If it's special characters, delete them directly
-6. All column fields must be analyzed and converted, remember to output in JSON
+6. DuckDB adheres to the SQL standard, which requires that identifiers \
+(column names, table names) cannot start with a number.
+7. All column fields must be analyzed and converted, remember to output in JSON
 Avoid phrases like ' // ... (similar analysis for other columns) ...'
-7. You need to provide the original column names and the transformed new column names \
+8. You need to provide the original column names and the transformed new column names \
 in the JSON, as well as your analysis of the meaning and function of that column. If \
 it's a time type, please provide the time format, such as: \
 yyyy-MM-dd HH:MM:ss
@@ -111,9 +113,10 @@
 3. 如果是中文,将中文字段名翻译为英文,并且将空格替换为下划线
 4. 如果是其它语言,将其翻译为英文,并且将空格替换为下划线
 5. 如果是特殊字符,直接删除
-6. 所以列的字段都必须分析和转换,切记在 JSON 中输出
+6. DuckDB遵循SQL标准,要求标识符(列名、表名)不能以数字开头
+7. 所以列的字段都必须分析和转换,切记在 JSON 中输出
 ' // ... (其他列的类似分析) ...)' 之类的话术
-7. 你需要在json中提供原始列名和转化后的新的列名,以及你分析\
+8. 你需要在json中提供原始列名和转化后的新的列名,以及你分析\
 的该列的含义和作用,如果是时间类型请给出时间格式类似:\
 yyyy-MM-dd HH:MM:ss
 
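The renaming rules in this prompt are carried out by the model, not by code in this commit. As a rough illustration only, the mechanical part of those rules (spaces to underscores, special characters deleted, no leading digit) could be sketched as below. The helper name `normalize_column_name`, the `col_` prefix, and the `col` fallback are assumptions made for this sketch and do not appear in the commit. The sketch cannot translate non-English names, so any non-ASCII characters are simply dropped.

```python
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical helper: apply the mechanical renaming rules from the prompt."""
    # Trim surrounding whitespace and replace inner spaces with underscores.
    cleaned = name.strip().replace(" ", "_")
    # Delete special characters: keep only ASCII letters, digits and underscores.
    cleaned = re.sub(r"[^0-9A-Za-z_]", "", cleaned)
    # Assumption: fall back to a placeholder when nothing is left.
    if not cleaned:
        cleaned = "col"
    # Identifiers must not start with a digit, so add a prefix (assumed: "col_").
    if cleaned[0].isdigit():
        cleaned = f"col_{cleaned}"
    return cleaned


if __name__ == "__main__":
    for raw in [" 2023 sales ", "unit price($)", "order_id"]:
        print(repr(raw), "->", normalize_column_name(raw))
```

A deterministic check like this could run on the model's JSON output as a safety net, since a prompt instruction alone does not guarantee that every returned name is a valid identifier.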

packages/dbgpt-app/src/dbgpt_app/scene/chat_data/chat_excel/excel_reader.py

Lines changed: 4 additions & 1 deletion
@@ -171,7 +171,10 @@ def read_from_df(
 
 df = df.rename(columns=lambda x: x.strip().replace(" ", "_"))
 # write data in duckdb
-db.register(table_name, df)
+db.register("temp_df_table", df)
+# The table is explicitly created due to the issue at
+# https://github.com/eosphoros-ai/DB-GPT/issues/2437.
+db.execute(f"CREATE TABLE {table_name} AS SELECT * FROM temp_df_table")
 return table_name
 
 
