-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Open
Description
Motivation
Tool call tests today only cover a small model (Llama-3.2-1B in test/registered/openai_server/function_call/test_openai_function_calling.py). Large models like DeepSeek V3.2 have no tool call coverage in CI, so bugs like #17593 and #17551 only get caught when users hit them. This issue tracks adding tool call tests to the nightly 8-GPU suite.
Scenarios
These should be common across all models that support tool calling:
Basic
- Format check —
tool_callsis a non-empty list,function.name/function.argumentspresent, arguments is valid JSON,finish_reasonis"tool_calls" - Field placement — tool call goes in
tool_calls, notcontent([Bug] DeepSeek-V3.2 tool calls incorrectly output to content field instead of tool_calls field #17593) - Streaming — chunks concatenate to valid JSON,
finish_reasoncorrect
tool_choice
"required"— always returns tool call"none"— never returns tool call- Specific function — returns the specified one
Multi-turn
- Tool result follow-up — pass tool result back, model replies based on it
- Thinking + tool call — after tool result, output in
contentnotreasoning_content([Bug] DeepSeek V3.2: All output marked as reasoning_content after tool call result #17551, DeepSeek specific for now — might be model-internal, will write the test first and see)
Other
- Parallel tool calls — multiple tool calls in one request
- Strict mode —
strict: trueenforces schema
CI integration
Add to test/registered/8-gpu-models/test_deepseek_v32.py, two variants:
Non-MTP:
--tp=8 --dp=8 --enable-dp-attention
--tool-call-parser deepseekv32
--reasoning-parser deepseek-v3
MTP (speculative decoding):
same as above +
--speculative-algorithm=EAGLE
--speculative-num-steps=3
--speculative-eagle-topk=1
--speculative-num-draft-tokens=4
env: SGLANG_ENABLE_SPEC_V2=1
Both in nightly-8-gpu-common.
Plan
Start with DeepSeek V3.2, then extend to GLM / Qwen / others reusing the same scenarios.
Refs
- Small model tests:
test/registered/openai_server/function_call/test_openai_function_calling.py - Parser tests:
test/registered/function_call/test_parallel_tool_calls.py - [Bug] DeepSeek-V3.2 tool calls incorrectly output to content field instead of tool_calls field #17593, [Bug] DeepSeek V3.2: All output marked as reasoning_content after tool call result #17551
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels