build: bump gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0 by dependabot[bot] · Pull Request #657 · PrunaAI/pruna

dependabot · 2026-05-05T01:24:39Z

Bumps gptqmodel from 4.0.0.dev0+cu126torch2.7 to 7.0.0.

Release notes

🚀 GPTQModel v7.0.0

🔥 Major

New Huawei Ascend NPU quantization support, with Torch-based kernels for inference.

All compiled CUDA/ROCm kernels are now JIT (just-in-time) compiled on first use.

Installing with pip or uv no longer requires the --no-build-isolation flag.

🧠 New model support and compatibility wins

Added support for GLM 5/5.1, GLM OCR, GLM ASR, Gemma 3n, Falcon Mamba, and InternVL Chat.

Extended OpenVINO GPTQ patching to understand GPTQModel's newer kernels.

Fixed Qwen3 dtype handling, Qwen3.5 MoE module-tree assertions, Qwen2-VL calibration input capture, and Qwen 3.6 MoE regressions.

Fixed Llama4Router replacement behavior, Phi-3 defused MLP module mapping, Phi-4 runtime requirements, Instella rope-scaling compatibility, Ling compatibility, Mixtral MoE checkpoint module names, Brumby thread safety, Baichuan compatibility, and Gemma 3 saving.

Fixed the exllamav3_torch import under a meta-device context.

⚡ Kernels, JIT, and hardware acceleration

Moved all kernels that require compilation to JIT compilation on first use, and cleaned up Marlin import probing, CUDA header handling, nvcc flag checks, and Torch/CUDA mismatch handling.

Synced Marlin/Machete kernels with upstream and added hardware-specific Marlin boost paths.

Guarded against CUTLASS version mismatches and fixed stale generated kernels.

Added global kernel rebuild support for CI and safer shared extension locks.

Added Ascend NPU support.

Fixed AWQ JIT cache invalidation, illegal memory access, SM120 execution, GEMM_Fast shared-memory launch, and BF16 bias validation.

Fixed BACKEND.MARLIN loading for gptq_v2 format and added Marlin import coverage.

🔥 Quantization, AWQ, FP8, and dequant

Added FP8/FP4 CPU dequant and DeepSeek FP8 .scale dequant export.

Added dtype auto-decoding and decode path updates.

Reduced AWQ scale-search activation memory and split AWQ integration tests for cleaner coverage.

Quantization now fails fast on unsupported act-group-aware GPTQ shapes instead of continuing into invalid layouts.

Fixed INT3 qzero format conversion, GAR width compatibility, and GPTQ batched keep-mask handling.

Improved AWQ W4A8 and BF16 validation paths, plus post-quant MoE routing behavior.

EoRA adapter generation now uses the loader's device selection.

🐢 LazyTurtle, loading, and model plumbing

Refactored input capture into BaseQModel and model-specific QModels for cleaner replay and calibration flows.

Renamed the turtle path to LazyTurtle and hardened it, with stricter materialization failures and better handling of expected skips.

Fixed LazyTurtle materialization for non-square fused experts, PhiMoE, nested HF weight renames, reversed WeightRenaming semantics, and non-Safetensors checkpoints.

Improved out-of-model tensor handling for MTP prefix/files paths.

Removed BaseModel.loader_requires_dtype and normalized config dtype handling through get_hf_config_dtype().

Fixed multi-GPU replay output retention, GPTQ finalizer overlap, and quantization OOMs from retained callable cache keys.

🧰 CI, packaging, and developer workflow

Cleaned up CI shell logic, environment setup, UV cache handling, reusable Torch tests, CPU-only grouping, runner selection, retry behavior, and offload temp paths.

Kept CI and Torch CUDA versions aligned, moved to newer Docker images, and surfaced real exit codes and GPU names.

Removed lm-eval, deprecated tests, deprecated artifact IDs, pause UI lifecycle code, and tabulate from CI/test paths.

... (truncated)

Commits

See full diff in compare view

github-actions · 2026-05-16T00:24:07Z

This PR has been inactive for 10 days and is now marked as stale.

Bumps [gptqmodel](https://github.com/ModelCloud/GPTQModel) from 4.0.0.dev0+cu126torch2.7 to 7.0.0.
- [Release notes](https://github.com/ModelCloud/GPTQModel/releases)
- [Commits](https://github.com/ModelCloud/GPTQModel/commits/v7.0.0)

---
updated-dependencies:
- dependency-name: gptqmodel
  dependency-version: 7.0.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
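The commit metadata tags this bump as version-update:semver-major. As a rough illustration of why, here is a minimal Python sketch that assumes a simplified subset of PEP 440 (a numeric release segment, an optional .devN suffix, and an optional +local label; pre- and post-release segments are not handled). The helpers parse and update_type are hypothetical and are not Dependabot's actual implementation. Under PEP 440, 4.0.0.dev0 is a development pre-release of 4.0.0, and the +cu126torch2.7 local label appears to identify a CUDA 12.6 / Torch 2.7 build.

```python
import re

# Hypothetical helpers, NOT Dependabot's real logic. They handle only a
# simplified PEP 440 subset: N(.N)*, optional ".devN", optional "+local".
_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"    # numeric release segment, e.g. 4.0.0
    r"(?:\.dev(?P<dev>\d+))?"         # optional development release, e.g. .dev0
    r"(?:\+(?P<local>[a-z0-9.]+))?$"  # optional local label, e.g. +cu126torch2.7
)


def parse(version: str) -> dict:
    """Split a version string into a release tuple, dev number, and local label."""
    m = _VERSION_RE.match(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    dev = m.group("dev")
    return {
        "release": tuple(int(part) for part in m.group("release").split(".")),
        "dev": int(dev) if dev is not None else None,
        "local": m.group("local"),
    }


def update_type(old: str, new: str) -> str:
    """Classify a bump by the first differing MAJOR.MINOR.PATCH component.

    The dev and local parts are ignored, so 4.0.0.dev0+cu126torch2.7 is
    compared as 4.0.0.
    """
    # Pad or truncate each release tuple to exactly three components.
    a = (parse(old)["release"] + (0, 0, 0))[:3]
    b = (parse(new)["release"] + (0, 0, 0))[:3]
    if a[0] != b[0]:
        return "semver-major"
    if a[1] != b[1]:
        return "semver-minor"
    return "semver-patch"


if __name__ == "__main__":
    print(parse("4.0.0.dev0+cu126torch2.7"))
    print(update_type("4.0.0.dev0+cu126torch2.7", "7.0.0"))  # semver-major
```

Ignoring the dev and local parts is a deliberate simplification for this sketch; real tooling should rely on a full PEP 440 implementation, such as the Version class in the packaging library.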

dependabot[bot] added the python-dependencies label on May 5, 2026.

github-actions[bot] added the stale label on May 16, 2026.

dependabot[bot] force-pushed the dependabot/pip/gptqmodel-7.0.0 branch from 399c489 to 19cac70 on May 23, 2026 at 07:28.

dependabot[bot] force-pushed the dependabot/pip/gptqmodel-7.0.0 branch from 19cac70 to dcadc5a on May 23, 2026 at 10:29.

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/gptqmodel-7.0.0.
