[api][integration] Support agent skills in python#596
[api][integration] Support agent skills in python#596wenjin272 wants to merge 4 commits intoapache:mainfrom
Conversation
|
Regarding the skill e2e testing, I evaluated qwen3:8b, qwen3.5:4b, and qwen3.5:9b. I found that only qwen3.5:9b with thinking enabled could consistently follow the prompts. However, in the GitHub CI environment, the lack of GPU resources results in very slow model inference, causing tests to take over 30 minutes to complete. Consequently, I have temporarily skipped the skill e2e tests. I plan to dedicate time later to investigate and resolve this issue. |
80caeb0 to
2352ecd
Compare
alnzng
left a comment
There was a problem hiding this comment.
Left a few comments, will continue the review. BTW, for future PRs, I'd suggest separating code formatting / refactoring changes from actual logic changes into separate PRs.
| ) | ||
|
|
||
|
|
||
| # TODO: Implement |
There was a problem hiding this comment.
I would suggest to remove this unimplemented class to avoid confusion.
|
|
||
| @staticmethod | ||
| def _split_commands(command: str) -> List[str]: | ||
| """Split a command string by shell operators, respecting quotes. |
There was a problem hiding this comment.
IIUC, The intention of this method is to extract individual executable commands from a compound command string, not to perform a full shell/bash tokenization. Shell has many operators (e.g. redirection >, >>, <), but this method only targets command-separating operators, those that connect multiple executables. Can you confirm? If yes, let's make it clear in the comment.
Also can we confirm all command-separating operators are covered? For example, are && and || handled correctly?
Out of curiosity, is there any native Python lib or thirdparty lib could provide this kind of functionality? I am not sure if we covered all the necessary shell / bash operators, so I'm thinking if there is a such kind of lib then we could just use it.
There was a problem hiding this comment.
Yes, the functionality of this function is indeed as you described.
I could not find an existing third-party library that provides this functionality, so I implemented one myself by referencing agentscope-java.
I added some tests to confirm that delimiters such as &&, |, and ; are handled correctly. Additional tests are needed for ||.
Actually, I currently feel that the security validation for our execute_shell_command might be implemented too simply. I mainly referenced agentscope-java, but the bash tool in Claude Code's implementation includes extensive security validation rules. Therefore, I may refactor it by referencing Claude Code.
|
|
||
| if skill.name != skill_dir.name: | ||
| logger.warning( | ||
| f"The skill name {skill.name}is different from the base directory {skill_dir.name}." |
There was a problem hiding this comment.
{skill.name}is -> {skill.name} is
| ) | ||
|
|
||
| # Pattern to match key: value format | ||
| KEY_VALUE_PATTERN = re.compile(r"^([a-zA-Z_][a-zA-Z0-9_-]*)\s*:\s*(.*)$") |
There was a problem hiding this comment.
Looks like we can remove this? I didn't see it is used in any other place.
There was a problem hiding this comment.
Initially, the coding agent generated code that used regular expressions to parse YAML frontmatter. I later switched to using a YAML library but forgot to remove these two PATTERNs.
I will remove this.
| return ParsedMarkdown(metadata={}, content=markdown_content) | ||
|
|
||
| try: | ||
| metadata = yaml.safe_load(yaml_content) |
There was a problem hiding this comment.
does markdown frontmatter always follow YAML syntax?
There was a problem hiding this comment.
According to the Agent Skills specification: https://agentskills.io/specification#skill-md-format, The SKILL.md file must contain YAML frontmatter followed by Markdown content.
| class MarkdownSkillParser: | ||
| """Utility for parsing Markdown files with YAML frontmatter.""" | ||
|
|
||
| # Pattern to match frontmatter: starts with ---, ends with --- |
There was a problem hiding this comment.
is below a valid frontmatter?
---
name: tricky
description: |
This has a
--- inside
---
There was a problem hiding this comment.
This is a valid frontmatter, and the parsed result is {'description': 'This has a', 'name': 'tricky'}, the left content will be regarded as markdown content inside\n---\n.
| and optional resource files: | ||
|
|
||
| baseDir/ | ||
| ├── skill-name-1/ |
There was a problem hiding this comment.
What will happen if the users define folder with other different names? For example, this Claude official skill: https://github.com/anthropics/skills/tree/main/skills/skill-creator
There was a problem hiding this comment.
I don't quite understand this question. Regarding skill-creator, then the baseDir can be any dir the flink-agents job can access, like /flink/usrlib/skills, and the skill-name-1 will be skill-creator.
| return func | ||
|
|
||
|
|
||
| def skills(func: Callable) -> Callable: |
| "so we will not fix this issue for now." | ||
| ) | ||
| def test_long_term_memory_async_execution_in_action(tmp_path: Path) -> None: # noqa: D103 | ||
| def test_long_term_memory_async_execution_in_action(tmp_path: Path) -> None: |
| Only the first token of a command is checked against this list. | ||
| """ | ||
|
|
||
| paths: List[str] = Field(default_factory=list) |
There was a problem hiding this comment.
I'd suggest to provide some factory methods, like Skills.from_local_dir() or Skills.from_url(). This aligns with Prompt.from_text() / from_messages(), and is more easy-to-use.
There was a problem hiding this comment.
And maybe hide the implementation of Skills from the users.
| """Get another resource declared in the same Agent.""" | ||
|
|
||
| @abstractmethod | ||
| def generate_skill_discovery_prompt(self, *skill_names: str) -> str: |
There was a problem hiding this comment.
I'd suggest the name generate_available_skills_prompt
| VECTOR_STORE = "vector_store" | ||
| PROMPT = "prompt" | ||
| MCP_SERVER = "mcp_server" | ||
| SKILL = "skill" |
| TOOL("tool"), | ||
| MCP_SERVER("mcp_server"); | ||
| MCP_SERVER("mcp_server"), | ||
| SKILL("skill"); |
| from flink_agents.runtime.skill.skill_repository import SkillRepository | ||
|
|
||
|
|
||
| class RegisteredSkill: |
There was a problem hiding this comment.
Why do we need both AgentSkill and RegisteredSkill ?
| allowed_script_types: List[str] = Field( | ||
| default_factory=lambda: ["shell", "python"] | ||
| ) | ||
| allowed_commands: List[str] = Field(default_factory=list) |
There was a problem hiding this comment.
This doesn't sounds right. How is allowed script types / commands an attribute of the skills? If the model decide to execute something, how do we know it's following the instructions of a skill or not?
| class ExecuteCommandArgs(BaseModel): | ||
| """Arguments for ExecuteCommandTool.""" | ||
|
|
||
| skill_name: str = Field( |
There was a problem hiding this comment.
Not sure about requiring a skill name for executing a command. Skills are not executable programs like tools, but are a set of knowledges, scripts and resource for doing something. It's the LLM/agent that learns a skill and perform actions. You cannot execute a skill, nor saying an action belongs to a certain skill.
Linked issue: #590
Purpose of change
This PR provides basic support for using agent skills in the Flink Agents Python API. We will further enhance usability in subsequent PRs.
Tests
ut & e2e
API
Yes, add abstraction for agnet skill, and modify the chat model api.
Documentation
doc-neededdoc-not-neededdoc-included