- You can clone the GPTQ-for-LLaMA repository and replace its `qwen.py` file with the one from the `GPTQ-for-Qwen` directory.
```
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
```

For more details about the GPTQ-for-LLaMA repository, please refer to this link.
Alternatively, you can simply clone our repository; the `GPTQ-for-Qwen` directory can be run and used directly.
- Install required packages:

```
conda create -n gptq python=3.10 -y
conda activate gptq
pip install --upgrade pip  # enable PEP 660 support
```

Before starting quantization, you need to replace some files to support the Qwen3 model:
- Add the `eval_my` directory: place the `eval_my` directory under the `GPTQ-for-Qwen` directory.
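As a sketch of the file-placement steps above (directory names are taken from this guide, but the exact paths depend on where you cloned each repository; a plain `cp` works just as well):

```python
# Illustrative sketch only: copy the Qwen-specific files into the
# cloned GPTQ-for-LLaMa checkout. Paths are assumptions, not prescribed.
import shutil
from pathlib import Path

ours = Path("GPTQ-for-Qwen")   # our repository's directory
repo = Path("GPTQ-for-LLaMa")  # the cloned GPTQ-for-LLaMa checkout

# Stand-in files so this sketch runs end to end; in practice these
# already exist in the two checkouts.
(ours / "eval_my").mkdir(parents=True, exist_ok=True)
(ours / "qwen.py").touch()
repo.mkdir(exist_ok=True)

# Replace qwen.py and add the eval_my directory.
shutil.copy(ours / "qwen.py", repo / "qwen.py")
shutil.copytree(ours / "eval_my", repo / "eval_my", dirs_exist_ok=True)
```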
- Quantize with group-wise quantization (`--groupsize 128`):

```
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path c4 \
    --wbits 4 --true-sequential --act-order \
    --groupsize 128 --model_name gptq_4B_w4_128 \
    --save gptq_4B_w4_128.pth
```

- Evaluate the group-wise quantized model:

```
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path c4 \
    --wbits 4 --true-sequential --act-order \
    --groupsize 128 --load gptq_4B_w4_128.pth \
    --eval
```

- Quantize with per-channel quantization (`--groupsize -1`):

```
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path c4 \
    --wbits 4 --true-sequential --act-order \
    --groupsize -1 --model_name gptq_4B_w4_128 \
    --save gptq_4B_w4_128.pth
```

- Evaluate the per-channel quantized model:

```
CUDA_VISIBLE_DEVICES=0 python path_of_qwen.py your_model_path c4 \
    --wbits 4 --true-sequential --act-order \
    --groupsize -1 --load gptq_4B_w4_128.pth \
    --eval
```

Here `c4` is the calibration (validation) dataset.

Notes:
- Available quantization bit-widths (`--wbits`): 2, 4, and 8.
- Use the `--groupsize` parameter (e.g., 128) for group-wise quantization.
- Set `--groupsize` to -1 for per-channel quantization.
- The calibration dataset is used during the GPTQ quantization process; you can choose a different one (the default is C4).
- The `--model_name` parameter is only used as a suffix when storing the MMLU result tables; it is not critical.
- Make sure you have sufficient GPU memory to run a 32B-sized model.
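As a rough illustration of what the `--wbits` setting controls, here is a minimal uniform-quantization round trip in plain Python (an illustrative sketch, not the repository's actual GPTQ kernels):

```python
# Sketch of b-bit uniform (asymmetric) quantization: map floats onto
# 2**bits integer levels and back. Illustrative only.

def quantize(weights, bits):
    """Return (integer codes, dequantized floats) for a list of weights."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]   # ints in [0, levels]
    dequant = [c * scale + lo for c in codes]            # approximate floats
    return codes, dequant

w = [-0.9, -0.2, 0.0, 0.35, 0.8]
codes4, approx4 = quantize(w, 4)   # 16 levels, as with --wbits 4
codes2, approx2 = quantize(w, 2)   # 4 levels, as with --wbits 2
```

Fewer bits mean fewer representable levels, so the 2-bit reconstruction drifts further from the original weights than the 4-bit one.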
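To make the `--groupsize` distinction concrete: group-wise quantization gives each block of (e.g.) 128 weights its own scale, while `-1` uses a single scale for the whole output channel. A small sketch (plain Python, illustrative only; not the repository's code):

```python
# Sketch: how --groupsize partitions one output channel's weights into
# chunks that would each share a quantization scale. Illustrative only.

def scale_groups(row, groupsize):
    """Split one weight row into the chunks that share a scale."""
    if groupsize == -1:            # per-channel: one scale for the whole row
        groupsize = len(row)
    return [row[i:i + groupsize] for i in range(0, len(row), groupsize)]

row = list(range(512))                  # one channel with 512 weights
print(len(scale_groups(row, 128)))      # 4 groups -> 4 scales
print(len(scale_groups(row, -1)))       # 1 group  -> 1 scale
```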