This package contains a PyTorch implementation of UniQuanF.
UniQuanF (Unified Quantization with Flexible Mapping) is an accurate quantization method for large language models (LLMs). We propose UniQuan (Unified Quantization), which combines the strong optimizability of uniform quantization (UQ) and the powerful expressiveness of binary-coding quantization (BCQ) by unifying their quantization processes. Based on UniQuan, we propose UniQuanF by unifying FlexRound and ALTERNATING, the best-performing UQ and BCQ methods, respectively.
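To illustrate the unification of UQ and BCQ, here is a minimal sketch (our illustration, not the paper's implementation) showing that the levels of a b-bit uniform quantizer can be reproduced exactly by binary-coding quantization with power-of-two scaling factors and an offset; the function names and the specific parameterization below are our assumptions:

```python
import numpy as np
from itertools import product

def uq_levels(s, z, n_bits):
    # levels of a uniform quantizer: s * (q - z) for q in [0, 2^b - 1]
    return np.array([s * (q - z) for q in range(2 ** n_bits)])

def bcq_levels(alphas, offset):
    # levels of binary-coding quantization: sum_i alpha_i * b_i + offset,
    # with each binary code b_i in {-1, +1}
    return np.sort([sum(a * b for a, b in zip(alphas, bits)) + offset
                    for bits in product([-1.0, 1.0], repeat=len(alphas))])

# power-of-two scales and an offset that reproduce the UQ grid exactly
s, z, n_bits = 0.5, 3, 3
alphas = [s * (2 ** i) / 2 for i in range(n_bits)]   # s/2, s, 2s, ...
offset = s * (2 ** n_bits - 1) / 2 - s * z
assert np.allclose(bcq_levels(alphas, offset), np.sort(uq_levels(s, z, n_bits)))
```

Because UQ is expressible in BCQ form, a single optimization process can cover both mappings, which is the intuition behind unifying the two methods.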
The following is an overview of our code.
UniQuanF
│
│
├─ src : a directory for source codes
│ ├─ main.py : a main code running UniQuanF
│ ├─ arguments.py : descriptions for arguments
│ ├─ uniquanf.py : codes for optimization
│ ├─ cached_loader.py : codes for managing cached inputs
│ ├─ swap_linear.py : codes for swapping linear layers into quantized ones
│ ├─ bcq_quant_layer.py : codes for quantized linear layers
│ ├─ alternating.py : an implementation of a general alternating update
│ ├─ loss.py : codes for loss functions
│ ├─ evaluation.py : codes for comprehensive evaluation of quantized models
│ ├─ categories.py : codes defining task categories for evaluation
│ ├─ data_utils.py : utility codes pertaining to datasets
│ ├─ general_utils.py : utility codes for general purpose
│ └─ evaluation_utils.py : utility codes for evaluation
│
├─ scripts : a directory for script files
│ ├─ evaluate.sh : a script file for evaluating the quantized model
│ └─ run.sh : a script file for running UniQuanF
│
└─ README.md
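As background for alternating.py: a common form of the alternating BCQ update (a sketch under standard assumptions, not necessarily this repository's exact implementation) alternately solves for the scaling factors by least squares given fixed binary codes, and greedily re-selects the binary codes given fixed scaling factors:

```python
import numpy as np

def fit_bcq(w, n_bits, iters=5):
    """Approximate a weight vector w as B @ alphas, with B in {-1, +1}."""
    B = np.empty((w.size, n_bits))
    alphas = np.empty(n_bits)
    # greedy initialization: peel off the sign of the residual at each step
    r = w.astype(np.float64).copy()
    for i in range(n_bits):
        B[:, i] = np.where(r >= 0, 1.0, -1.0)
        alphas[i] = np.abs(r).mean()
        r -= alphas[i] * B[:, i]
    for _ in range(iters):
        # scale step: least-squares alphas for the current binary codes
        alphas, *_ = np.linalg.lstsq(B, w, rcond=None)
        # code step: greedily re-pick binary codes for the current scales
        r = w.astype(np.float64).copy()
        for i in range(n_bits):
            B[:, i] = np.where(r >= 0, 1.0, -1.0)
            r -= alphas[i] * B[:, i]
    return alphas, B

# example: a vector exactly representable with scales (1.0, 0.5)
w = np.array([1.5, -1.5, 0.5, -0.5])
alphas, B = fit_bcq(w, n_bits=2)
assert np.allclose(B @ alphas, w)
```

The example vector is chosen so the reconstruction is exact; for general weights, the alternation only reduces the approximation error rather than eliminating it.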
The list of dependencies is as follows:
- python >= 3.10.12
- tqdm 4.66.5
- numpy 1.26.3
- torch 2.3.1
- datasets 2.21.0
- transformers 4.42.0
Install dependencies using the following command:
pip install -r requirements.txt
Install lm-eval package using the following command:
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout tags/v0.4.2
pip install -e .
cd ..
Install evaluate package using the following command:
git clone https://github.com/huggingface/evaluate.git
cd evaluate
pip install -e .
cd ..
Our code automatically downloads the datasets needed when you run our main.py file, except for MMLU.
MMLU is already located in the data/mmmlu/ directory, so you do not have to manually download any datasets.
Experimental settings
- model_name_or_path: the path of the directory for the dense model
- dataset_name: the name of the sample dataset
- num_samples: the number of samples in the sample dataset
- seed: a random seed
- n_bits_w: a desired bit-width for weights
- group_size: the size of weight groups
Hyperparameters of UniQuanF
- u_lr: a learning rate for the quantization parameters of UQ
- b_lr: a learning rate for the quantization parameters of BCQ
- iters_w: the number of iterations for optimization
- per_device_train_batch_size: a batch size for optimization
- period: a remapping period (p)
- grid_search_iters: the number of iterations for grid search (G)
- alternating_update_iters: the number of iterations for an alternating update (T)
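Putting the arguments and hyperparameters together, an invocation might look like the following. Every value here is a placeholder, and the flag spellings are assumptions derived from the argument names above; defer to scripts/run.sh and src/arguments.py for the canonical usage:

```shell
# Hypothetical invocation; the actual flag names are defined in
# src/arguments.py and the canonical values are in scripts/run.sh.
python src/main.py \
    --model_name_or_path /path/to/dense/model \
    --dataset_name c4 \
    --num_samples 128 \
    --seed 42 \
    --n_bits_w 4 \
    --group_size 128 \
    --u_lr 5e-3 \
    --b_lr 1e-3 \
    --iters_w 5000 \
    --per_device_train_batch_size 4 \
    --period 100 \
    --grid_search_iters 5 \
    --alternating_update_iters 3
```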
We provide the code for running UniQuanF as in scripts/run.sh. Run the script file as follows:
bash scripts/run.sh
If you want to evaluate the quantized model, use the evaluation.py file as in scripts/evaluate.sh. Run the script file as follows:
bash scripts/evaluate.sh
If you find UniQuanF useful or relevant to your research, please kindly cite our paper:
@inproceedings{park2025uniquanf,
title={Unifying Uniform and Binary-coding Quantization for
Accurate Compression of Large Language Models},
author={Park, Seungcheol and Bae, Jeongin and Kwon,
Beomseok and Kim, Minjun and Kim, Byeongwook and Kwon,
Se Jung and Kang, U and Lee, Dongsoo},
booktitle={Proceedings of the 63rd Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers),
{ACL} 2025, Vienna, Austria, July 27-August 1st, 2025},
year={2025}
}
This repository is for research purposes only. For any other purposes, please contact the authors.