
fix: initialize embeddings_pre_norm_masked=false in llama_context #23256

Merged
ggerganov merged 1 commit into ggml-org:master from abetlen:fix/qwen35-pre-norm-mask-init
May 18, 2026

Conversation

@abetlen
Collaborator

@abetlen abetlen commented May 18, 2026

Overview

This PR fixes a bug introduced in #23198, which added the embeddings_pre_norm_masked struct member to llama_context. The member was left uninitialised, so its value was indeterminate when Qwen3.5 graphs were constructed; as a result, get_rows_f32 failed an assert because it tried to read an invalid row index.
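To illustrate the class of bug, here is a minimal sketch, not the actual llama.cpp code. Only the flag name embeddings_pre_norm_masked comes from this PR; the struct context_sketch, the helper select_rows, and the way the flag drives row selection are hypothetical. The sketch assumes the fix takes the form of a default member initializer, which the PR title suggests but this page does not confirm.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the context struct (not the real llama_context).
struct context_sketch {
    // With "= false" the flag has a defined value as soon as the object exists.
    // Without an initializer, reading a default-initialized bool member is
    // undefined behaviour and may take either branch below.
    bool embeddings_pre_norm_masked = false;
};

// Hypothetical helper: pick the row indices to gather from a table of n_rows rows.
// If the flag is set, use the caller-supplied masked indices; otherwise use all rows.
std::vector<int32_t> select_rows(const context_sketch & ctx,
                                 int32_t n_rows,
                                 const std::vector<int32_t> & masked_ids) {
    std::vector<int32_t> ids;
    if (ctx.embeddings_pre_norm_masked) {
        ids = masked_ids;
    } else {
        for (int32_t i = 0; i < n_rows; ++i) {
            ids.push_back(i);
        }
    }
    // Bounds check in the spirit of the assert that fired in CI:
    // every index must refer to an existing row.
    for (int32_t id : ids) {
        assert(id >= 0 && id < n_rows);
    }
    return ids;
}
```

In this sketch, a default-constructed context_sketch always takes the unmasked path, so a stale or garbage flag can no longer route the graph through indices that were never populated.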

Additional information

Failing CI run with the relevant assert

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: yes, gpt 5.5 xhigh was used through the codex cli to find the root cause of this bug when the CI job failed.

@abetlen abetlen requested a review from ggerganov as a code owner May 18, 2026 08:40
@ggerganov ggerganov merged commit 49c21f9 into ggml-org:master May 18, 2026
45 of 49 checks passed
@ggerganov
Member

Thanks @abetlen!

Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request May 19, 2026
* master: (100 commits)
  Agent update
  hexagon: add support for TRI op (ggml-org#22822)
  ggml-hexagon: add PAD op HVX kernel (ggml-org#23078)
  docker : add OCI image labels for version and build date (ggml-org#21653)
  common : remove hf cache migration (ggml-org#23266)
  ui: Update KaTeX package and clean up logs from `sass` warnings (ggml-org#23275)
  feat: add scroll-to-bottom button to chat + prevent forced scroll down (ggml-org#23270)
  ui: Refactor models store, MCP service, and gate logs behind VITE_DEBUG (ggml-org#23236)
  ui: Centralize monospace font styles in app.css (ggml-org#23272)
  webui: fix Tailwind v4 utility classes missing when built via cmake (ggml-org#23253)
  llama: initialize pre-norm embedding mask flag (ggml-org#23256)
  add myself to conversion (ggml-org#23261)
  ci : added kleidiai-server to server-self-hosted workflow (ggml-org#22435)
  scripts : allow wc2wt with an existing branch (ggml-org#23189)
  sycl: scalar SWAR byte-subtract in Q6_K MMVQ dot product (ggml-org#22156)
  sycl: route small f32 matmuls to oneMKL, bypass oneDNN (ggml-org#22150)
  sycl : fix error when use -mg 1 error (ggml-org#23140)
  update bid to match each layers MTP source (ggml-org#23237)
  cmake : do not check for bin install dir (ggml-org#23234)
  feat: Support d_conv=15 for ssm-conv.cu (ggml-org#23017)
  ...
kgrama pushed a commit to kgrama/llama.cpp that referenced this pull request May 19, 2026
xxmustafacooTR pushed a commit to xxPlayground/llama-cpp-turboquant that referenced this pull request May 19, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 19, 2026
ArberSephirotheca pushed a commit to ArberSephirotheca/llama.cpp that referenced this pull request May 19, 2026
fhnmor21 pushed a commit to fhnmor21/llama-cpp-turboquant that referenced this pull request May 19, 2026
jimbothigpen added a commit to jimbothigpen/llama.cpp that referenced this pull request May 21, 2026
- Known issue: v326 vulkan cpy bf16->f32 SIGSEGV on GFX1103 PHOENIX (remediation pending)
- v326: vulkan BF16 copy pipelines (mainline PR ggml-org#22677 cherry-pick)
- v325: pre-norm embedding mask init fix (mainline PR ggml-org#23256 cherry-pick)
- v324: Vulkan BF16 FA dispatch via inline uvec2 dequant (COOPMAT1 path)
- v323: KV cache CPU fallback for types lacking GPU SET_ROWS support
dbrain pushed a commit to dbrain/hbd-llama-cpp-turboquant that referenced this pull request May 21, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026
jimbothigpen added a commit to jimbothigpen/llama.cpp that referenced this pull request May 25, 2026
- Known issue: v326 vulkan cpy bf16->f32 SIGSEGV on GFX1103 PHOENIX (remediation pending)
- v326: vulkan BF16 copy pipelines (mainline PR ggml-org#22677 cherry-pick)
- v325: pre-norm embedding mask init fix (mainline PR ggml-org#23256 cherry-pick)
- v324: Vulkan BF16 FA dispatch via inline uvec2 dequant (COOPMAT1 path)
- v323: KV cache CPU fallback for types lacking GPU SET_ROWS support
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

3 participants