
[FEAT] Add Anthropic compatible API endpoint #18630

Merged
Fridge003 merged 8 commits into sgl-project:main from JustinTong0323:feat-authropic-api
Feb 21, 2026
Conversation

@JustinTong0323 (Collaborator) commented Feb 11, 2026

Summary

  • Add Anthropic-compatible /v1/messages and /v1/messages/count_tokens endpoints that translate between Anthropic Messages API format and
    SGLang's existing OpenAI-compatible chat completion infrastructure
  • Supports non-streaming, streaming (SSE), tool use, system messages, and all standard Anthropic request parameters
  • Enables tools like Claude Code to use SGLang-served models as drop-in Anthropic API replacements via ANTHROPIC_BASE_URL
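As a usage sketch only (the model name, port, and launch flags below are illustrative assumptions, not taken from this PR), pointing an Anthropic-compatible client such as Claude Code at an SGLang server might look like:

```shell
# Launch SGLang serving any chat model (model path and port are hypothetical)
python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct --port 30000

# Point Anthropic-compatible tooling (e.g. Claude Code) at the local server
export ANTHROPIC_BASE_URL=http://localhost:30000
export ANTHROPIC_API_KEY=dummy-key
```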

Details

Architecture: A translation layer (AnthropicServing) that delegates to OpenAIServingChat internally. Anthropic requests are converted to
ChatCompletionRequest, processed through existing infrastructure, and responses are converted back to Anthropic format.
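A minimal sketch of the request-side conversion this architecture describes (the function and field subset are illustrative, not the PR's actual code):

```python
def anthropic_to_openai(req: dict) -> dict:
    """Map core Anthropic Messages fields onto an OpenAI chat-completions
    payload (illustrative subset only)."""
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI expects it as the first message in the list.
    if req.get("system"):
        messages.append({"role": "system", "content": req["system"]})
    messages.extend(req.get("messages", []))
    return {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req["max_tokens"],  # required in Anthropic's API
        "stream": req.get("stream", False),
    }
```

The response path runs the same mapping in reverse, converting the chat-completion result back into Anthropic content blocks.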

New files:

  • python/sglang/srt/entrypoints/anthropic/protocol.py — Pydantic models for Anthropic Messages API
  • python/sglang/srt/entrypoints/anthropic/serving.py — Core handler with request/response conversion and streaming state machine
  • test/registered/openai_server/basic/test_anthropic_server.py — 19 basic API tests
  • test/registered/openai_server/function_call/test_anthropic_tool_use.py — 10 tool use tests
  • test/manual/vlm/test_anthropic_vision.py — visual understanding test

Modified files:

  • python/sglang/srt/entrypoints/http_server.py — Register endpoints and initialize handler

Test plan

  • 19 basic tests passing (non-streaming, streaming, system messages, content blocks, error handling, count tokens)
  • 10 tool use tests passing (tool format, tool_choice auto/any/specific, multi-turn, streaming tool calls, event sequence)
  • Manual integration test with cherry-studio anthropic client
  • Manual integration test with Claude Code via setting ANTHROPIC_BASE_URL

Note: This PR is vibed by Claude Code; I believe it is more familiar with its own API than I am 🤣

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
…odels

This commit introduces a new endpoint for counting tokens in messages compatible with the Anthropic API. It includes the implementation of the `AnthropicCountTokensRequest` and `AnthropicCountTokensResponse` models, as well as the necessary handling logic in the `AnthropicServing` class. Additionally, tests for the new endpoint have been added to ensure proper functionality and validation of token counting behavior.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@gemini-code-assist (Contributor)

Summary of Changes

Hello @JustinTong0323, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds Anthropic API compatibility to the SGLang server, allowing users to interact with SGLang models using the Anthropic Messages API. It introduces new endpoints, implements a translation layer for request/response conversion, and includes thorough testing to ensure proper functionality.

Highlights

  • New Anthropic-compatible API endpoints: Introduces /v1/messages and /v1/messages/count_tokens endpoints, enabling translation between Anthropic Messages API format and SGLang's OpenAI-compatible chat completion infrastructure.
  • Comprehensive Support: Supports non-streaming, streaming (SSE), tool use, system messages, and all standard Anthropic request parameters.
  • Seamless Integration: Allows tools like Claude Code to use SGLang-served models as drop-in Anthropic API replacements via ANTHROPIC_BASE_URL.
  • Translation Layer Architecture: Implements an AnthropicServing translation layer that delegates to OpenAIServingChat internally, converting Anthropic requests to ChatCompletionRequest and responses back to Anthropic format.
  • Extensive Testing: Includes 19 basic API tests and 10 tool use tests, ensuring comprehensive functionality and reliability.



@gemini-code-assist (Contributor) left a comment
Code Review

This pull request introduces Anthropic-compatible API endpoints by creating a translation layer over the existing OpenAI-compatible infrastructure. This is a significant and well-executed feature, covering streaming, tool use, and other standard parameters. The code is well-structured and includes a comprehensive set of tests. My review focuses on improving robustness by using uuid for ID generation, removing some dead code, and suggesting a minor refactoring in the tests to reduce code duplication. Overall, this is a high-quality contribution.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323 JustinTong0323 changed the title Add Anthropic compatible api endpoint [FEAT] Add Anthropic compatible API endpoint Feb 11, 2026
@jhinpan (Collaborator) left a comment
Comprehensive Review: Anthropic-Compatible API Endpoints (PR #18630)

Thanks for this well-structured contribution! The translation-layer approach of delegating to OpenAIServingChat is clean and avoids duplicating business logic. However, this review identified several issues that need attention before merge, organized by severity.


🔴 Critical Bugs (5)

C1. Image URL conversion is broken (serving.py:128)
Raw base64 data is passed directly as a URL. OpenAI format expects a data URI: data:<media_type>;base64,<data>. All image inputs will silently fail.

# Current (broken):
"url": block.source.get("data", ""),
# Fix:
media_type = block.source.get("media_type", "image/png")
data = block.source.get("data", "")
"url": f"data:{media_type};base64,{data}",

C2. tool_result uses id instead of tool_use_id (protocol.py:40)
The Anthropic spec defines tool_use_id for tool_result blocks, but AnthropicContentBlock only has id. SDK clients sending spec-compliant tool_use_id will have it silently ignored, breaking tool-use round-trips. The serving.py code at line 151 reads block.id which won't be populated from tool_use_id in the JSON payload.
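A sketch of a spec-compliant fix for C2 (names are illustrative; the fallback to id preserves compatibility with clients sending the non-standard key):

```python
def tool_result_to_openai(block: dict) -> dict:
    """Convert an Anthropic tool_result block to an OpenAI tool message.
    Reads the spec-compliant 'tool_use_id' field first, falling back to
    'id' for non-standard clients."""
    tool_call_id = block.get("tool_use_id") or block.get("id")
    if tool_call_id is None:
        raise ValueError("tool_result block is missing tool_use_id")
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": block.get("content", ""),
    }
```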

C3. tool_result.content with list content produces garbled output (serving.py:147)
When block.content is a list[dict] (Anthropic allows content blocks inside tool_result), str() produces Python repr like [{'type': 'text', 'text': '...'}] instead of extracting text. Should iterate the list and concatenate text block values.
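A sketch of the text-extraction fix for C3 (helper name is illustrative):

```python
def flatten_tool_result_content(content) -> str:
    """Anthropic allows tool_result content to be a plain string or a list
    of content blocks; join the text blocks instead of str()-ing the list,
    which would emit a Python repr."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            b.get("text", "") for b in content if b.get("type") == "text"
        )
    return str(content)
```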

C4. Non-streaming response id uses OpenAI chatcmpl-* format (serving.py:594)
id=response.id passes through the OpenAI ID. Anthropic IDs must be msg_*. Streaming correctly generates msg_* IDs (line 361), but non-streaming doesn't. Fix: use f"msg_{uuid.uuid4().hex}" as in the streaming path.

C5. Missing thinking content block type and request parameter (protocol.py:35, 94-108)
Extended thinking is a major Anthropic API feature. The type literal only allows text|image|tool_use|tool_result, missing thinking and redacted_thinking. No thinking request parameter exists. While not strictly required for an MVP, SDK clients using thinking blocks will get validation errors — this should at least be documented as a known limitation.


🟠 Important Issues (8)

I1. stop_sequence stop_reason is never emitted (serving.py:46-50)
Both natural EOS and stop-sequence stops map to "end_turn". The code doesn't check matched_stop from the OpenAI response. Clients relying on stop_reason == "stop_sequence" will never see it.
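A sketch of the missing mapping for I1, checking matched_stop to distinguish the two cases (function name and signature are illustrative):

```python
def map_stop_reason(finish_reason: str, matched_stop=None) -> str:
    """Map an OpenAI finish_reason (plus matched_stop, set when a
    user-supplied stop sequence fired) to an Anthropic stop_reason."""
    if finish_reason == "length":
        return "max_tokens"
    if finish_reason == "tool_calls":
        return "tool_use"
    if finish_reason == "stop" and matched_stop is not None:
        return "stop_sequence"
    return "end_turn"
```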

I2. Streaming errors are silently swallowed (serving.py:410-414)
If the OpenAI stream emits an error, parsing as ChatCompletionStreamResponse fails and the continue silently skips it. The stream ends without an error event or message_stop. Should emit an Anthropic error event.

I3. Empty responses produce no content blocks (streaming state machine)
If the model emits only a finish_reason with no text/tool deltas, the stream emits message_start → message_delta → message_stop with zero content blocks, violating the Anthropic event schema which expects at least one content block.

I4. Missing "none" tool_choice type (protocol.py:74)
The Anthropic spec defines "none" to disallow all tool use. Only auto|any|tool are accepted.

I5. Missing disable_parallel_tool_use on AnthropicToolChoice
This is a commonly-used Anthropic API field with no equivalent. Should be documented as unsupported at minimum.

I6. System text blocks concatenated without separator (serving.py:105)
["Hello", "World"] becomes "HelloWorld". Should use "\n" as separator.

I7. Tight coupling to OpenAIServingChat private methods (serving.py:248, 260, 269, 351, 652)
Five underscore-prefixed private methods are called. Any internal refactor of OpenAIServingChat will silently break this. Consider making these methods a stable internal API (e.g., remove underscore prefix and document them) or add a comment acknowledging the coupling.

I8. Exception details exposed in error responses (serving.py:77, 277, 323, 669)
message=str(e) can leak internal details (stack frames, module paths) to clients. Use generic messages for 500 errors; only expose specifics for 400 errors.
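A sketch of the suggested sanitization for I8 (error-type strings follow Anthropic's published error taxonomy; the helper itself is illustrative):

```python
def to_anthropic_error(exc: Exception, status: int) -> dict:
    """Build an Anthropic-style error body; expose exception details only
    for client (4xx) errors and return a generic message for 5xx so
    internals are not leaked."""
    if 400 <= status < 500:
        err_type = "invalid_request_error"
        message = str(exc)
    else:
        err_type = "api_error"
        message = "Internal server error"
    return {"type": "error", "error": {"type": err_type, "message": message}}
```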


🟡 Minor Issues (7)

M1. x-api-key header is never validated — Anthropic clients using only x-api-key (the standard Anthropic auth pattern) without Authorization: Bearer will be rejected. Tests misleadingly send both headers.

M2. AnthropicUsage is missing cache_creation_input_tokens and cache_read_input_tokens in the message_delta usage tracking (they're defined in the model but never populated from the OpenAI response).

M3. response.choices[0] at serving.py:564 can IndexError on empty choices (defensive check recommended).

M4. AnthropicError.type should be a Literal enum of known error types (invalid_request_error, authentication_error, permission_error, not_found_error, rate_limit_error, api_error, overloaded_error).

M5. input_schema validator mutates input (adds type: "object") rather than rejecting invalid schemas.

M6. message_delta usage includes both input_tokens and output_tokens — Anthropic spec says message_delta should only have output_tokens (input_tokens goes in message_start).

M7. No ping event generation — useful for keep-alive on long-running streams.


🟢 Test Coverage Gaps

  1. No streaming error test — no test for error events during streaming
  2. No auth failure test — wrong/missing API key not tested
  3. No tool_result with list content test — only string content tested
  4. No stop_sequence stop_reason assertion — test_stop_sequences only checks status 200
  5. No concurrent request test
  6. _parse_sse_events silently swallows parse errors — could mask test failures
  7. Several tool use tests have conditional assertions (e.g. if len(tool_use_starts) > 0:) that pass even if no tool use occurred

✅ What's Done Well

  • Clean architecture — Translation-layer pattern is correct; no duplicated business logic
  • SSE event format — Correct event: <type>\ndata: <json>\n\n formatting
  • Content block lifecycle — start→deltas→stop ordering is mostly correct for both text and tool_use
  • tool_choice mapping — auto→auto, any→required, tool→function is correct
  • stream_options injection — Properly enables usage tracking for streaming
  • Streaming arg reconstruction test — test_tool_use_streaming_args_parsing is excellent
  • Comparative token count test — Smart approach that's model-independent
  • No route conflicts — /v1/messages and /v1/messages/count_tokens are clean additions
  • Test isolation — Each test class manages its own server lifecycle
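The event: <type> / data: <json> framing praised above can be sketched as (helper name is illustrative):

```python
import json

def sse_event(event_type: str, payload: dict) -> str:
    """Serialize one Anthropic-style SSE event: a named event line,
    a JSON data line, and a blank-line terminator."""
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"
```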

Recommendation

Request changes. The 5 critical bugs (especially C1-C4) will produce incorrect behavior for common API usage patterns. C1 breaks all image inputs. C2 breaks spec-compliant tool use round-trips. C3 produces garbled tool results. C4 violates the ID format contract. These should be fixed before merge. The important issues (I1-I3 especially) should also be addressed to prevent silent failures in production.

JustinTong0323 and others added 2 commits February 12, 2026 08:16
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323 (Collaborator, Author)

/tag-and-rerun-ci

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@github-actions github-actions bot added the Multi-modal multi-modal language model label Feb 16, 2026
JustinTong0323 and others added 2 commits February 16, 2026 01:36
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
@JustinTong0323 (Collaborator, Author)

/rerun-failed-ci

@JustinTong0323 (Collaborator, Author)

/rerun-failed-ci

@Fridge003 Fridge003 merged commit cc45167 into sgl-project:main Feb 21, 2026
217 of 235 checks passed

Labels

high priority, Multi-modal (multi-modal language model), run-ci
