Commit 947927b

[V3.2] Change default CP token split method to --round-robin-split (sgl-project#18613)
1 parent 4f7422f commit 947927b

3 files changed, 5 additions and 5 deletions

docs/advanced_features/server_arguments.md

Lines changed: 1 addition & 1 deletion
@@ -462,7 +462,7 @@ Please consult the documentation below and [server_args.py](https://github.com/s
 | `--rl-on-policy-target` | The training system that SGLang needs to match for true on-policy. | `None` | `fsdp` |
 | `--enable-attn-tp-input-scattered` | Allow input of attention to be scattered when only using tensor parallelism, to reduce the computational load of operations such as qkv latent. | `False` | bool flag (set to enable) |
 | `--enable-nsa-prefill-context-parallel` | Enable context parallelism used in the long sequence prefill phase of DeepSeek v3.2. | `False` | bool flag (set to enable) |
-| `--nsa-prefill-cp-mode` | Token splitting mode for the prefill phase of DeepSeek v3.2 under context parallelism. Optional values: `in-seq-split` (default), `round-robin-split`. `round-robin-split` distributes tokens across ranks based on `token_idx % cp_size`. It supports multi-batch prefill, fused MoE, and FP8 KV cache. | `in-seq-split` | `in-seq-split`, `round-robin-split` |
+| `--nsa-prefill-cp-mode` | Token splitting mode for the prefill phase of DeepSeek v3.2 under context parallelism. Optional values: `round-robin-split`(default),`in-seq-split`. `round-robin-split` distributes tokens across ranks based on `token_idx % cp_size`. It supports multi-batch prefill, fused MoE, and FP8 KV cache. | `in-seq-split` | `in-seq-split`, `round-robin-split` |
 | `--enable-fused-qk-norm-rope` | Enable fused qk normalization and rope rotary embedding. | `False` | bool flag (set to enable) |
 | `--enable-precise-embedding-interpolation` | Enable corner alignment for resize of embeddings grid to ensure more accurate(but slower) evaluation of interpolated embedding values. | `False` | bool flag (set to enable) |

docs/basic_usage/deepseek_v32.md

Lines changed: 2 additions & 2 deletions
@@ -306,7 +306,7 @@ DeepSeek-V3.2-Speciale:
 For context parallel in DeepSeek V3.2 model, we provide two different modes of splitting tokens, which can be controlled with argument `--nsa-prefill-cp-mode`.

-### In sequence splitting (default setting)
+### In sequence splitting

 The first mode can be enabled by `--nsa-prefill-cp-mode in-seq-split`. This mode implements context parallel for DSA by splitting the sequence uniformly between context parallel ranks. At attention stage, each cp rank computes the indexer results of sharded sequence, and collects the whole kv cache through all gather operator.
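The uniform split described for `in-seq-split` can be pictured with a small sketch. This is only an illustration under stated assumptions: `in_seq_split` is a hypothetical helper, not a function from SGLang, and both the contiguous layout and the handling of leftover tokens are guesses about what "splitting the sequence uniformly" means.

```python
def in_seq_split(num_tokens: int, cp_size: int) -> list[list[int]]:
    """Hypothetical helper: give each CP rank one contiguous slice of token indices."""
    base, extra = divmod(num_tokens, cp_size)
    shards: list[list[int]] = []
    start = 0
    for rank in range(cp_size):
        # Assumption: the first `extra` ranks absorb one leftover token each.
        length = base + (1 if rank < extra else 0)
        shards.append(list(range(start, start + length)))
        start += length
    return shards


print(in_seq_split(num_tokens=8, cp_size=4))  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

In this picture every rank holds neighbouring tokens, which is consistent with the note above that each rank still has to gather the whole KV cache at the attention stage.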

@@ -326,7 +326,7 @@ Example:
 python -m sglang.launch_server --model deepseek-ai/DeepSeek-V3.2-Exp --tp 8 --ep 8 --dp 2 --enable-dp-attention --enable-nsa-prefill-context-parallel --nsa-prefill-cp-mode in-seq-split --max-running-requests 32
 ```

-### Round robin splitting
+### Round robin splitting (default setting)

 This mode can be enabled by specifying the parameter `--nsa-prefill-cp-mode round-robin-split`, which distributes tokens across ranks based on `token_idx % cp_size`.
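The `token_idx % cp_size` rule can be illustrated with a few lines of Python. This is a minimal sketch, not SGLang code: `round_robin_split` is a hypothetical name, and the real implementation presumably operates on tensors rather than Python lists.

```python
def round_robin_split(num_tokens: int, cp_size: int) -> list[list[int]]:
    """Hypothetical helper: assign token i to rank i % cp_size."""
    shards: list[list[int]] = [[] for _ in range(cp_size)]
    for token_idx in range(num_tokens):
        shards[token_idx % cp_size].append(token_idx)
    return shards


print(round_robin_split(num_tokens=10, cp_size=4))
# [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Each rank receives an interleaved subset of the sequence, and shard sizes differ by at most one token even when `num_tokens` is not a multiple of `cp_size`.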

python/sglang/srt/server_args.py

Lines changed: 2 additions & 2 deletions
@@ -628,7 +628,7 @@ class ServerArgs:
 enable_attn_tp_input_scattered: bool = False
 # Context parallelism used in the long sequence prefill phase of DeepSeek v3.2
 enable_nsa_prefill_context_parallel: bool = False
-nsa_prefill_cp_mode: str = "in-seq-split"
+nsa_prefill_cp_mode: str = "round-robin-split"
 enable_fused_qk_norm_rope: bool = False
 enable_precise_embedding_interpolation: bool = False

@@ -4684,7 +4684,7 @@ def add_cli_args(parser: argparse.ArgumentParser):
 type=str,
 default=ServerArgs.nsa_prefill_cp_mode,
 choices=NSA_PREFILL_CP_SPLIT_CHOICES,
-help="Token splitting mode for the prefill phase of DeepSeek v3.2 under context parallelism. Optional values: 'in-seq-split' (default), 'round-robin-split'. "
+help="Token splitting mode for the prefill phase of DeepSeek v3.2 under context parallelism. Optional values: 'round-robin-split'(default), 'in-seq-split' "
 "'round-robin-split' distributes tokens across ranks based on token_idx %% cp_size. It supports multi-batch prefill, fused MoE, and FP8 KV cache.",
 )
 parser.add_argument(
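Because the CLI default is read from the dataclass field, changing the field value is enough to change what an omitted `--nsa-prefill-cp-mode` resolves to. The sketch below reproduces that pattern in isolation. `DemoServerArgs`, `CP_SPLIT_CHOICES`, and `build_parser` are stand-ins invented for illustration; the choice values are taken from the documentation table, but the actual definition of `NSA_PREFILL_CP_SPLIT_CHOICES` is assumed. It also shows why the help string writes `%%`: argparse applies `%`-formatting to help text, so a literal percent sign has to be escaped.

```python
import argparse
from dataclasses import dataclass

# Stand-in for NSA_PREFILL_CP_SPLIT_CHOICES (assumed to be a list of these two strings).
CP_SPLIT_CHOICES = ["in-seq-split", "round-robin-split"]


@dataclass
class DemoServerArgs:
    # Mirrors the new default introduced by this commit.
    nsa_prefill_cp_mode: str = "round-robin-split"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument(
        "--nsa-prefill-cp-mode",
        type=str,
        # The class attribute holds the dataclass default, so the CLI follows it.
        default=DemoServerArgs.nsa_prefill_cp_mode,
        choices=CP_SPLIT_CHOICES,
        # "%%" renders as a single "%" in the formatted help output.
        help="Split by token_idx %% cp_size.",
    )
    return parser


parser = build_parser()
default_mode = parser.parse_args([]).nsa_prefill_cp_mode
explicit_mode = parser.parse_args(
    ["--nsa-prefill-cp-mode", "in-seq-split"]
).nsa_prefill_cp_mode
help_text = parser.format_help()

print(default_mode)   # round-robin-split
print(explicit_mode)  # in-seq-split
```

Under this pattern, deployments that relied on the previous implicit default would need to pass `--nsa-prefill-cp-mode in-seq-split` explicitly to keep the old behavior.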
