
Commit 4377fe3

refactor(data_juicer_agent): update imports and add tests
1 parent 5572595 commit 4377fe3

7 files changed, 308 insertions(+), 196 deletions(-)

data_juicer_agent/.gitignore

Lines changed: 0 additions & 126 deletions
This file was deleted.

data_juicer_agent/README.md

Lines changed: 34 additions & 26 deletions
@@ -4,31 +4,32 @@ A multi-agent data processing system built on [AgentScope](https://github.com/mo

 ## 📋 Table of Contents

-- [📋 Table of Contents](#-table-of-contents)
-- [What Does This Agent Do?](#what-does-this-agent-do)
-- [Architecture](#architecture)
-- [Quick Start](#quick-start)
-- [System Requirements](#system-requirements)
-- [Installation](#installation)
-- [Configuration](#configuration)
-- [Usage](#usage)
-- [Agent Introduction](#agent-introduction)
-- [Data Processing Agent](#data-processing-agent)
-- [Code Development Agent (DJ Dev Agent)](#code-development-agent-dj-dev-agent)
-- [Advanced Features](#advanced-features)
-- [Operator Retrieval](#operator-retrieval)
-- [Retrieval Modes](#retrieval-modes)
-- [Usage](#usage-1)
-- [MCP Agent](#mcp-agent)
-- [MCP Server Types](#mcp-server-types)
-- [Configuration](#configuration-1)
-- [Usage Methods](#usage-methods)
-- [Feature Preview](#feature-preview)
-- [Data-Juicer Q\&A Agent (Demo Available)](#data-juicer-qa-agent-demo-available)
-- [Data Analysis and Visualization Agent (In Development)](#data-analysis-and-visualization-agent-in-development)
-- [Troubleshooting](#troubleshooting)
-- [Common Issues](#common-issues)
-- [Optimization Recommendations](#optimization-recommendations)
+- [DataJuicer Agent](#datajuicer-agent)
+- [📋 Table of Contents](#-table-of-contents)
+- [What Does This Agent Do?](#what-does-this-agent-do)
+- [Architecture](#architecture)
+- [Quick Start](#quick-start)
+- [System Requirements](#system-requirements)
+- [Installation](#installation)
+- [Configuration](#configuration)
+- [Usage](#usage)
+- [Agent Introduction](#agent-introduction)
+- [Data Processing Agent](#data-processing-agent)
+- [Code Development Agent (DJ Dev Agent)](#code-development-agent-dj-dev-agent)
+- [Advanced Features](#advanced-features)
+- [Operator Retrieval](#operator-retrieval)
+- [Retrieval Modes](#retrieval-modes)
+- [Usage](#usage-1)
+- [MCP Agent](#mcp-agent)
+- [MCP Server Types](#mcp-server-types)
+- [Configuration](#configuration-1)
+- [Usage Methods](#usage-methods)
+- [Feature Preview](#feature-preview)
+- [Data-Juicer Q\&A Agent (Demo Available)](#data-juicer-qa-agent-demo-available)
+- [Data Analysis and Visualization Agent (In Development)](#data-analysis-and-visualization-agent-in-development)
+- [Troubleshooting](#troubleshooting)
+- [Common Issues](#common-issues)
+- [Optimization Recommendations](#optimization-recommendations)

 ## What Does This Agent Do?

@@ -68,7 +69,14 @@ Router Agent ──┐
 ### Installation

 ```bash
-uv pip install -e .
+# Recommended to use uv
+uv pip install -r requirements.txt
+```
+
+or
+
+```bash
+pip install -r requirements.txt
 ```

 ### Configuration
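The TOC links above depend on the anchors GitHub derives from heading text: `📋 Table of Contents` becomes `#-table-of-contents`, and the second `Usage` heading becomes `#usage-1`. Below is a simplified Python sketch of that rule, handy for sanity-checking entries such as the new `#datajuicer-agent` link. It is an approximation that reproduces the anchors used in these READMEs, not GitHub's actual implementation, and the function name is hypothetical.

```python
import re
from collections import Counter


def github_style_slug(heading: str, seen: Counter) -> str:
    """Approximate the anchor GitHub derives from a Markdown heading.

    Simplified sketch: lowercase, drop characters that are not word
    characters, hyphens, or spaces, turn spaces into hyphens, and suffix
    repeated slugs with -1, -2, ...
    """
    slug = heading.strip().lower()
    # Python 3's \w is Unicode-aware: CJK characters are kept, while emoji
    # such as 📋 and punctuation such as ?, &, (, ) are removed.
    slug = re.sub(r"[^\w\- ]", "", slug)
    slug = slug.replace(" ", "-")
    # Count earlier occurrences of the same slug to disambiguate repeats.
    count = seen[slug]
    seen[slug] += 1
    return slug if count == 0 else f"{slug}-{count}"


if __name__ == "__main__":
    seen = Counter()
    for heading in [
        "DataJuicer Agent",
        "📋 Table of Contents",
        "Usage",
        "Code Development Agent (DJ Dev Agent)",
        "Data-Juicer Q&A Agent (Demo Available)",
        "Usage",
        "MCP 智能体",
    ]:
        print(f"[{heading}](#{github_style_slug(heading, seen)})")
```

The `seen` counter must be shared across the headings of one document, in document order. That is how a repeated heading such as `Usage` receives the `-1` suffix used in the TOC.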

data_juicer_agent/README_ZH.md

Lines changed: 33 additions & 25 deletions
@@ -4,30 +4,31 @@

 ## 📋 Table of Contents

-- [📋 Table of Contents](#-目录)
-- [What Does This Agent Do?](#这个智能体做了什么)
-- [Architecture](#架构)
-- [Quick Start](#快速开始)
-- [System Requirements](#系统要求)
-- [Installation](#安装)
-- [Configuration](#配置)
-- [Usage](#使用)
-- [Agent Introduction](#智能体介绍)
-- [Data Processing Agent](#数据处理智能体)
-- [Code Development Agent](#代码开发智能体)
-- [Advanced Features](#高级功能)
-- [Operator Retrieval](#算子检索)
-- [Retrieval Modes](#检索模式)
-- [Usage](#使用-1)
-- [MCP Agent](#mcp-智能体)
-- [MCP Server Types](#mcp-服务器类型)
-- [Configuration](#配置-1)
-- [Usage Methods](#使用方法)
-- [Feature Preview](#功能预览)
-- [Data-Juicer Q&A Agent (Demo Available)](#data-juicer-问答智能体-演示可用)
-- [Data Analysis and Visualization Agent (In Development)](#数据分析与可视化智能体-开发中)
-- [Common Issues](#常见问题)
-- [Optimization Recommendations](#优化建议)
+- [DataJuicer Agent](#datajuicer-智能体)
+- [📋 Table of Contents](#-目录)
+- [What Does This Agent Do?](#这个智能体做了什么)
+- [Architecture](#架构)
+- [Quick Start](#快速开始)
+- [System Requirements](#系统要求)
+- [Installation](#安装)
+- [Configuration](#配置)
+- [Usage](#使用)
+- [Agent Introduction](#智能体介绍)
+- [Data Processing Agent](#数据处理智能体)
+- [Code Development Agent](#代码开发智能体)
+- [Advanced Features](#高级功能)
+- [Operator Retrieval](#算子检索)
+- [Retrieval Modes](#检索模式)
+- [Usage](#使用-1)
+- [MCP Agent](#mcp-智能体)
+- [MCP Server Types](#mcp-服务器类型)
+- [Configuration](#配置-1)
+- [Usage Methods](#使用方法)
+- [Feature Preview](#功能预览)
+- [Data-Juicer Q&A Agent (Demo Available)](#data-juicer-问答智能体-演示可用)
+- [Data Analysis and Visualization Agent (In Development)](#数据分析与可视化智能体-开发中)
+- [Common Issues](#常见问题)
+- [Optimization Recommendations](#优化建议)

 ## What Does This Agent Do?

@@ -67,7 +68,14 @@ Data-Juicer (DJ) is a one-stop system for large-model text and multimodal
 ### Installation

 ```bash
-uv pip install -e .
+# uv is recommended
+uv pip install -r requirements.txt
+```
+
+
+
+```bash
+pip install -r requirements.txt
 ```

 ### Configuration

data_juicer_agent/main.py

Lines changed: 16 additions & 7 deletions
@@ -7,11 +7,16 @@
 from agentscope.formatter import DashScopeChatFormatter
 from agentscope.memory import InMemoryMemory
 from agentscope.agent import UserAgent
-from agentscope.tool import Toolkit

-from agent_factory import create_agent
-from prompts import DJ_SYS_PROMPT, DJ_DEV_SYS_PROMPT, ROUTER_SYS_PROMPT, MCP_SYS_PROMPT
-from tools import dj_toolkit, dj_dev_toolkit, mcp_tools, get_mcp_toolkit, agents2toolkit
+from .agent_factory import create_agent
+from .prompts import DJ_SYS_PROMPT, DJ_DEV_SYS_PROMPT, ROUTER_SYS_PROMPT, MCP_SYS_PROMPT
+from .tools import (
+    dj_toolkit,
+    dj_dev_toolkit,
+    mcp_tools,
+    get_mcp_toolkit,
+    agents2toolkit,
+)

 # Create shared configuration
 model = DashScopeChatModel(
@@ -145,10 +150,14 @@ async def main(
 if __name__ == "__main__":
     # Example tasks
     # project_root = os.path.abspath(os.path.dirname(__file__))
-    # task = f"The data is stored in {project_root}/data/demo-dataset-images.jsonl. Filter out samples whose text field length is less than 5, as well as samples whose image size is less than 100Kb, and save the output results to the ./outputs path."
+    # task = (
+    # f"The data is stored in {project_root}/data/demo-dataset-images.jsonl. "
+    # "Among the samples, the text field length is less than 5 "
+    # "and the image size is less than 100Kb. "
+    # "And save the output results to the ./outputs path."
+    # )
     #
     # DJ Development example task:
-    # task = "I want to develop a new DataJuicer filter operator to filter out audio files that contain no human voice"
+    # task = "I want to develop a new DataJuicer filter operator to filter out audio files without vocals"
     #
-    # MCP Agent will be automatically selected for advanced processing tasks
     fire.Fire(main)
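With this change, `main.py` imports its sibling modules relatively (`from .agent_factory import create_agent`, `from .tools import (...)`). Python resolves relative imports only when a file is loaded as a module of a package, so launching it as a plain script (`python main.py`) generally fails, while launching it as a module from the directory that contains `data_juicer_agent` (for example `python -m data_juicer_agent.main`) should work. The commit itself does not state the new launch command. The sketch below only illustrates the underlying Python behavior, using a throwaway package named `demo_agent` (hypothetical) created in a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_relative_import() -> dict:
    """Build a tiny throwaway package and run its entry module two ways."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_agent"  # hypothetical stand-in for data_juicer_agent
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "prompts.py").write_text('SYS_PROMPT = "hello"\n')
        # Mirrors the new style in main.py: a package-relative import.
        (pkg / "main.py").write_text(
            "from .prompts import SYS_PROMPT\nprint(SYS_PROMPT)\n"
        )

        # 1) Run the file directly as a script: it has no parent package.
        as_script = subprocess.run(
            [sys.executable, str(pkg / "main.py")],
            capture_output=True, text=True, cwd=root,
        )
        # 2) Run it as a module from the directory containing the package.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_agent.main"],
            capture_output=True, text=True, cwd=root,
        )
    return {
        "script_ok": as_script.returncode == 0,
        "script_stderr": as_script.stderr,
        "module_ok": as_module.returncode == 0,
        "module_stdout": as_module.stdout.strip(),
    }


if __name__ == "__main__":
    result = demo_relative_import()
    print("script run succeeded:", result["script_ok"])
    print("module run succeeded:", result["module_ok"])
```

In the script run, the interpreter reports `ImportError: attempted relative import with no known parent package`. The module run prints `hello`.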

data_juicer_agent/pyproject.toml

Lines changed: 0 additions & 12 deletions
This file was deleted.

data_juicer_agent/requirements.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+agentscope>=1.0.5
+py-data-juicer>=1.4.2
+faiss-cpu>=1.12.0
+fire>=0.7.1
+langchain-community
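The new `requirements.txt` sets only lower bounds, and leaves `langchain-community` unpinned. Below is a minimal standard-library sketch for checking the current environment against those lines. It assumes each line is either a bare distribution name or `name>=X.Y.Z` with a numeric version, and the helper names are hypothetical. For the full requirement grammar, the `packaging` library's `Requirement` and `Version` classes are the usual tools.

```python
from __future__ import annotations

import re
from importlib import metadata

# Contents of requirements.txt as added in this commit.
REQUIREMENTS = """\
agentscope>=1.0.5
py-data-juicer>=1.4.2
faiss-cpu>=1.12.0
fire>=0.7.1
langchain-community
"""


def parse_requirements(text: str) -> dict[str, tuple[int, ...] | None]:
    """Map each distribution name to its minimum version, or None if unpinned.

    Only bare names and `name>=X.Y.Z` are supported; anything else raises.
    """
    reqs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        m = re.fullmatch(r"([A-Za-z0-9_.\-]+)\s*(?:>=\s*(\d+(?:\.\d+)*))?", line)
        if m is None:
            raise ValueError(f"unsupported requirement line: {line!r}")
        name, minimum = m.groups()
        reqs[name] = tuple(int(p) for p in minimum.split(".")) if minimum else None
    return reqs


def version_tuple(version: str) -> tuple[int, ...]:
    """Keep the leading numeric components, e.g. '1.4.2.post1' -> (1, 4, 2)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
        if m.group() != piece:  # stop at suffixes such as '0rc1'
            break
    return tuple(parts)


def at_least(installed: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so (1, 12) equals (1, 12, 0)."""
    width = max(len(installed), len(minimum))
    padded_installed = installed + (0,) * (width - len(installed))
    padded_minimum = minimum + (0,) * (width - len(minimum))
    return padded_installed >= padded_minimum


def check(text: str = REQUIREMENTS) -> dict[str, str]:
    """Report each requirement's installed version and whether it is new enough."""
    report = {}
    for name, minimum in parse_requirements(text).items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = minimum is None or at_least(version_tuple(installed), minimum)
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name}: {status}")
```

Pre-release and local-version suffixes are truncated rather than ordered properly. This is acceptable for a quick environment check, but it is one reason to prefer `packaging` in real tooling.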
