
Contrib: Add Qwen3.6-27B (post-training update of Qwen3.5-27B) #140

Open

jimburtoft wants to merge 1 commit into aws-neuron:main from jimburtoft:contrib/qwen3.6-27b

Conversation

@jimburtoft (Contributor)

Summary

  • Adds an NxDI contrib implementation of Qwen3.6-27B, a 27B-parameter dense model with a hybrid DeltaNet + GQA attention architecture
  • Qwen3.6-27B is a post-training update of Qwen3.5-27B (PR #128, "Contrib: Add Qwen3.5-27B with hybrid DeltaNet + GQA architecture") with an identical architecture (qwen3_5 model_type); only the weights differ, bringing improved agentic coding and thinking preservation
  • Same NxDI implementation as Qwen3.5-27B, with updated documentation, Qwen3.6-27B benchmarks, quality validation, and cross-references between the two contribs

Relationship to PR #128 (Qwen3.5-27B)

This contrib uses the same Qwen35* classes and modeling_qwen35*.py filenames as the Qwen3.5-27B contrib (PR #128). The code is identical: both models share the qwen3_5 model_type. Only the HuggingFace model ID and weights differ.
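A minimal sketch of what "only the model ID differs" means in practice (the local checkpoint paths below are assumptions for illustration):

```python
# Illustrative only: the modeling code is shared, so switching models is a
# one-line checkpoint change. Paths to the downloaded configs are assumptions.
import json

with open("Qwen3.5-27B/config.json") as f:
    cfg_35 = json.load(f)
with open("Qwen3.6-27B/config.json") as f:
    cfg_36 = json.load(f)

# Same model_type, so the same Qwen35* classes load either checkpoint;
# only the weights (and the HF model ID) differ.
assert cfg_35["model_type"] == cfg_36["model_type"] == "qwen3_5"
```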

Config Compatibility

Qwen3.6-27B adds output_gate_type="swish" to text_config. Investigation confirmed this field is completely unused by HF transformers (zero references across v4.57.6, v5.6.0, and GitHub main) and by this NxDI code. No code changes required.
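A hedged way to reproduce that check locally (the config path is an assumption; the grep mirrors the "zero references" search described above):

```python
# Sketch: the new field is present in the checkpoint config but read by
# nothing in this contrib's source. Paths are assumptions.
import json
import subprocess

with open("Qwen3.6-27B/config.json") as f:
    raw = json.load(f)

# Field exists in the Qwen3.6-27B config...
assert raw["text_config"]["output_gate_type"] == "swish"

# ...and nothing in the contrib source references it.
hits = subprocess.run(
    ["grep", "-r", "output_gate_type", "contrib/models/Qwen3.6-27B/src"],
    capture_output=True, text=True,
)
assert hits.stdout == ""  # zero references, so no code changes are needed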

Test Results

Unit Tests (42/42 PASS, CPU only)

| Module | Tests |
| --- | --- |
| test_config.py | 26/26 |
| test_weight_conversion.py | 16/16 |
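To reproduce the unit-test run on a CPU-only host (standard pytest invocation; the path follows the file layout below):

```python
# Run the 42 CPU-only unit tests; no Neuron device is required.
import pytest

pytest.main(["contrib/models/Qwen3.6-27B/test/unit", "-v"])
```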

Architecture-level tests produce results identical to Qwen3.5-27B.

Quality Validation (7/7 PASS, trn2.3xlarge, TP=4, SDK 2.29)

| Test | Result |
| --- | --- |
| Speed of light | PASS |
| 17 * 23 = 391 | PASS |
| 60 mph * 2.5 h = 150 miles | PASS |
| is_prime function | PASS |
| French translation | PASS |
| Capital of Japan | PASS |
| sqrt(144) = 12 | PASS |
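The prompts map directly to substring checks; a minimal, hypothetical harness in that spirit (the prompt wordings, expected strings, and the `generate_fn` callable are assumptions, not the contrib's actual validation code):

```python
# Hypothetical quality harness: a prompt passes if the reply contains the
# expected substring. generate_fn is any prompt -> text callable (e.g. a
# compiled NxDI model's generate loop).
QUALITY_CHECKS = {
    "What is 17 * 23?": "391",
    "Traveling at 60 mph for 2.5 hours covers how many miles?": "150",
    "What is the capital of Japan?": "Tokyo",
    "What is sqrt(144)?": "12",
}

def run_quality_checks(generate_fn):
    return {
        prompt: "PASS" if expected in generate_fn(prompt) else "FAIL"
        for prompt, expected in QUALITY_CHECKS.items()
    }
```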

Performance (trn2.3xlarge, TP=4, LNC=2, BF16, SDK 2.29)

| Metric | Qwen3.6-27B | Qwen3.5-27B | Delta |
| --- | --- | --- | --- |
| TPOT (P50) | 54.2 ms | 53 ms | +2.3% |
| Throughput | 18.5 tok/s | 18.9 tok/s | -2.1% |
| TTFT (P50) | 306 ms | 576 ms | * |

* TTFT difference is due to the compilation config (256-token vs. 128-token bucket), not the model; architectural performance is equivalent.
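For reference, the table's metrics follow the usual serving-latency definitions (a sketch; the actual benchmark script may compute them differently):

```python
# Standard serving-latency definitions behind TTFT / TPOT / throughput.
def ttft_ms(first_token_s: float, start_s: float) -> float:
    """Time to first token: prefill latency, in ms."""
    return (first_token_s - start_s) * 1000.0

def tpot_ms(total_s: float, ttft_s: float, n_tokens: int) -> float:
    """Time per output token: decode time over tokens after the first."""
    return (total_s - ttft_s) / (n_tokens - 1) * 1000.0

# Sanity check against the table: 54.2 ms/token is about 18.5 tok/s decode.
assert abs(1000.0 / 54.2 - 18.45) < 0.1
```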

Files (15 files, ~6600 lines)

```
contrib/models/Qwen3.6-27B/
├── README.md
├── src/
│   ├── __init__.py
│   ├── modeling_qwen35.py              (text decoder)
│   ├── modeling_qwen35_vision.py       (vision encoder)
│   ├── modeling_qwen35_vl.py           (VL pipeline)
│   └── nki_kernels/
│       ├── __init__.py
│       ├── nki_deltanet.py             (recurrent kernel)
│       ├── nki_deltanet_chunked.py     (per-chunk kernel)
│       └── nki_deltanet_fused.py       (fused chunked kernel)
└── test/
    ├── unit/
    │   ├── test_config.py              (26 tests)
    │   └── test_weight_conversion.py   (16 tests)
    └── integration/
        └── test_model.py               (8 tests)
```

Checklist

  • Contrib-only (no changes to NxDI src/)
  • Unit tests (42/42 pass)
  • Quality validation (7/7 pass on trn2.3xlarge, SDK 2.29)
  • Benchmarks (TPOT=54.2ms, 18.5 tok/s)
  • README with architecture details, benchmarks, cross-reference to Qwen3.5-27B, and config compatibility notes
  • Apache 2.0 license headers
  • SDK 2.29+ / NKI 0.3.0 required

