
fix: turbomind backend config in cli serve #3784

Merged
lvhan028 merged 1 commit into InternLM:main from
PeymanRM:fix-cli-arg-passing
Jul 28, 2025

@PeymanRM
Contributor

Motivation

When serving through the CLI, the max_prefill_token_num and num_tokens_per_iter arguments were not being set on the TurboMind backend config.

Modification

Pass max_prefill_token_num and num_tokens_per_iter to TurbomindEngineConfig in cli api_serve after they have been parsed.
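
The shape of the fix can be sketched as follows. This is a minimal, self-contained illustration rather than the actual diff: EngineConfigSketch is a hypothetical stand-in for TurbomindEngineConfig that carries only the two fields named above, and the flag spellings, default values, and helper names (build_parser, build_backend_config) are assumptions made for the example.

```python
import argparse
from dataclasses import dataclass


@dataclass
class EngineConfigSketch:
    """Hypothetical stand-in for TurbomindEngineConfig (only the two fields from this PR)."""
    max_prefill_token_num: int = 8192  # assumed default, for illustration only
    num_tokens_per_iter: int = 0       # assumed default, for illustration only


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the flag names are assumed, not copied from the CLI.
    parser = argparse.ArgumentParser(prog="serve-sketch")
    parser.add_argument("--max-prefill-token-num", type=int, default=8192)
    parser.add_argument("--num-tokens-per-iter", type=int, default=0)
    return parser


def build_backend_config(args: argparse.Namespace) -> EngineConfigSketch:
    # The point of the fix: forward the parsed values explicitly.
    # If these keyword arguments are omitted, the config silently keeps
    # its defaults and the command-line values have no effect.
    return EngineConfigSketch(
        max_prefill_token_num=args.max_prefill_token_num,
        num_tokens_per_iter=args.num_tokens_per_iter,
    )


args = build_parser().parse_args(
    ["--max-prefill-token-num", "4096", "--num-tokens-per-iter", "2048"]
)
config = build_backend_config(args)
print(config)
```

Forwarding each value by keyword keeps the mapping between a CLI flag and its config field explicit, which makes an omission like the one fixed here easier to spot in review.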

Use cases

Enables "Dynamic SplitFuse"-like behavior when serving from the CLI.

Checklist

  • Pre-commit or other linting tools are used to fix the potential lint issues.
  • The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  • If the modification has a dependency on downstream projects of a newer version, this PR should be tested with all supported versions of downstream projects.
  • The documentation has been modified accordingly, like docstring or example tutorials.

@PeymanRM PeymanRM marked this pull request as ready for review July 27, 2025 14:10
@lvhan028 lvhan028 merged commit 5f0647f into InternLM:main Jul 28, 2025
5 checks passed
2 participants