build: bump gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0 #657

Open: dependabot[bot] wants to merge 1 commit into main from dependabot/pip/gptqmodel-7.0.0

dependabot[bot] commented on behalf of github on May 5, 2026

Bumps gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0.

Release notes

Sourced from gptqmodel's releases.

🚀 GPTQModel v7.0.0

🔥 Major

  • New Huawei Ascend NPU quantization support with torch-based kernels for inference
  • All CUDA/ROCm compiled kernels are now JIT (just-in-time) compiled on first use
  • Pip/UV install no longer requires the --no-build-isolation flag

🧠 New model support and compatibility wins

  • Added support for GLM 5/5.1, GLM OCR, GLM ASR, Gemma 3n, Falcon Mamba, and InternVL Chat.
  • Extended OpenVINO GPTQ patching to understand GPTQModel's newer kernels.
  • Fixed Qwen3 dtype handling, Qwen3.5 MoE module-tree assertions, Qwen2-VL calibration input capture, and Qwen 3.6 MoE regressions.
  • Fixed Llama4Router replacement behavior, Phi-3 defused MLP module mapping, Phi-4 runtime requirements, Instella rope-scaling compatibility, Ling compatibility, Mixtral MoE checkpoint module names, Brumby thread safety, Baichuan compatibility, and Gemma 3 saving.
  • Fixed exllamav3_torch import under meta-device context.

⚡ Kernels, JIT, and hardware acceleration

  • Moved all kernels that require compilation to JIT compilation on first use, and cleaned up Marlin import probing, CUDA header handling, nvcc flag checks, and Torch/CUDA mismatch handling.
  • Synced Marlin/Machete kernels with upstream and added hardware-specific Marlin boost paths.
  • Guarded CUTLASS version mismatches and fixed generated-kernel staleness.
  • Added global kernel rebuild support for CI and safer shared extension locks.
  • Added Ascend NPU support.
  • Fixed AWQ JIT cache invalidation, illegal memory access, SM120 execution, GEMM_Fast shared-memory launch, and BF16 bias validation.
  • Fixed BACKEND.MARLIN loading for gptq_v2 format and added Marlin import coverage.
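
The "JIT compilation on first use" idea in the list above can be illustrated with a generic build-once, reuse-afterwards cache. This is only a minimal sketch of the pattern, not GPTQModel's loader: `get_kernel`, `run`, and `build_calls` are hypothetical names, and the "build" step is simulated rather than invoking a real compiler.

```python
import functools

# Counts how many times the simulated build step actually runs.
build_calls = {"count": 0}


@functools.lru_cache(maxsize=None)
def get_kernel(name: str):
    """Build a kernel on first request, then reuse the cached result.

    Hypothetical stand-in: a real loader would invoke a compiler here.
    """
    build_calls["count"] += 1
    # The "compiled kernel" is simulated by a function that doubles its input.
    return lambda x: 2 * x


def run(name: str, x: int) -> int:
    """Look up (and lazily build) the named kernel, then call it."""
    return get_kernel(name)(x)


first = run("demo_kernel", 3)   # triggers the simulated build
second = run("demo_kernel", 5)  # served from the cache, no rebuild
```

The practical consequence of this pattern is that the first call pays the build cost while later calls do not, which is worth keeping in mind when timing the first inference after upgrading.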

🔥 Quantization, AWQ, FP8, and dequant

  • Added FP8/FP4 CPU dequant and DeepSeek FP8 .scale dequant export.
  • Added dtype auto-decoding and decode path updates.
  • Reduced AWQ scale-search activation memory and split AWQ integration tests for cleaner coverage.
  • Unsupported act-group-aware GPTQ shapes now fail fast instead of continuing into invalid layouts.
  • Fixed INT3 qzero format conversion, GAR width compatibility, and GPTQ batched keep-mask handling.
  • Improved AWQ W4A8 and BF16 validation paths, plus post-quant MoE routing behavior.
  • Used loader device selection for EoRA adapter generation.

🐢 LazyTurtle, loading, and model plumbing

  • Refactored input capture into BaseQModel and model-specific QModels for cleaner replay and calibration flows.
  • Renamed and hardened the turtle path into LazyTurtle, with stricter materialization failures and better expected-skip handling.
  • Fixed LazyTurtle materialization for non-square fused experts, PhiMoE, nested HF weight renames, reversed WeightRenaming semantics, and non-Safetensors checkpoints.
  • Improved out-of-model tensor handling for MTP prefix/files paths.
  • Removed BaseModel.loader_requires_dtype and normalized config dtype handling through get_hf_config_dtype().
  • Fixed multi-GPU replay output retention, GPTQ finalizer overlap, and quantization OOMs from retained callable cache keys.

🧰 CI, packaging, and developer workflow

  • Cleaned up CI shell logic, environment setup, UV cache handling, reusable Torch tests, CPU-only grouping, runner selection, retry behavior, and offload temp paths.
  • Kept CI and Torch CUDA versions aligned, moved to newer Docker images, and surfaced real exit codes and GPU names.
  • Removed lm-eval, deprecated tests, deprecated artifact IDs, pause UI lifecycle code, and tabulate from CI/test paths.

... (truncated)

@github-actions

This PR has been inactive for 10 days and is now marked as stale.

@github-actions github-actions Bot added the stale label May 16, 2026
@dependabot dependabot Bot force-pushed the dependabot/pip/gptqmodel-7.0.0 branch from 399c489 to 19cac70 on May 23, 2026 07:28
Bumps [gptqmodel](https://github.com/ModelCloud/GPTQModel) from 4.0.0.dev0+cu126torch2.7 to 7.0.0.
- [Release notes](https://github.com/ModelCloud/GPTQModel/releases)
- [Commits](https://github.com/ModelCloud/GPTQModel/commits/v7.0.0)

---
updated-dependencies:
- dependency-name: gptqmodel
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
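
The `update-type: version-update:semver-major` label in the commit metadata above follows from comparing the leading release numbers of the two versions. The sketch below reproduces that comparison with the standard library only. `release_tuple` and `update_type` are hypothetical helpers written for this illustration, not Dependabot's implementation, and they deliberately ignore the `.dev0` and `+cu126torch2.7` suffixes.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return (major, minor, patch) from the leading numeric part of a version.

    Dev/pre-release and local suffixes such as ".dev0" or "+cu126torch2.7"
    are ignored; missing minor or patch components default to 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def update_type(old: str, new: str) -> str:
    """Classify a version change by the first release component that differs."""
    old_release = release_tuple(old)
    new_release = release_tuple(new)
    for label, before, after in zip(
        ("semver-major", "semver-minor", "semver-patch"), old_release, new_release
    ):
        if before != after:
            return label
    return "none"


kind = update_type("4.0.0.dev0+cu126torch2.7", "7.0.0")
print(kind)
```

Because the major component moves from 4 to 7, the bump is classified as a major update, which is the category where breaking changes such as the removed `BaseModel.loader_requires_dtype` are expected.
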
@dependabot dependabot Bot force-pushed the dependabot/pip/gptqmodel-7.0.0 branch from 19cac70 to dcadc5a on May 23, 2026 10:29