
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

This package contains a PyTorch implementation of UniQuanF.

Overview

Overview of UniQuanF

UniQuanF (Unified Quantization with Flexible Mapping) is an accurate quantization method for large language models (LLMs). We first propose UniQuan (Unified Quantization), which combines the strong optimizability of uniform quantization (UQ) with the powerful expressiveness of binary-coding quantization (BCQ) by unifying their quantization processes. We then instantiate UniQuan as UniQuanF by unifying FlexRound and ALTERNATING, the best-performing UQ and BCQ methods, respectively.
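The unification rests on the fact that a b-bit uniform quantizer can be rewritten in BCQ form, i.e., as a weighted sum of {-1, +1} binary codes plus an offset. The sketch below is our own illustration of that equivalence, not code from this repository:

```python
import torch

def uq_quantize(w, s, z, bits):
    """Uniform quantization: map each weight to a level s * (q - z)."""
    q = torch.clamp(torch.round(w / s + z), 0, 2**bits - 1)
    return s * (q - z), q

def uq_as_bcq(q, s, z, bits):
    """Rewrite the UQ levels in BCQ form: w_hat = sum_i alpha_i * c_i + offset.
    Writing q in binary as sum_i q_i * 2**i with q_i = (c_i + 1) / 2 and
    c_i in {-1, +1} gives alpha_i = s * 2**(i - 1) and a constant offset."""
    alphas = [s * 2**(i - 1) for i in range(bits)]
    codes = [(2 * ((q.long() >> i) & 1) - 1).float() for i in range(bits)]
    offset = s * ((2**bits - 1) / 2 - z)
    return sum(a * c for a, c in zip(alphas, codes)) + offset

torch.manual_seed(0)
w = torch.randn(8)
s, z, bits = 0.2, 4, 3
w_uq, q = uq_quantize(w, s, z, bits)
w_bcq = uq_as_bcq(q, s, z, bits)
print(torch.allclose(w_uq, w_bcq))  # the two representations coincide
```

Because the UQ levels are a special case of BCQ levels, a single unified quantization process can optimize both parameterizations.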

Code Description

The following is an overview of our codes.

UniQuanF
│
├─ src                      : a directory for source codes
│   ├─ main.py              : the main code for running UniQuanF
│   ├─ arguments.py         : descriptions of arguments
│   ├─ uniquanf.py          : codes for optimization
│   ├─ cached_loader.py     : codes for managing cached inputs
│   ├─ swap_linear.py       : codes for swapping linear layers with quantized ones
│   ├─ bcq_quant_layer.py   : codes for quantized linear layers
│   ├─ alternating.py       : an implementation of a general alternating update
│   ├─ loss.py              : codes for loss functions
│   ├─ evaluation.py        : codes for comprehensive evaluation of quantized models
│   ├─ categories.py        : utility codes for evaluation task categories
│   ├─ data_utils.py        : utility codes pertaining to datasets
│   ├─ general_utils.py     : utility codes for general purposes
│   └─ evaluation_utils.py  : utility codes for evaluation
│
├─ scripts                  : a directory for script files
│   ├─ evaluate.sh          : a script file for evaluating the quantized model
│   └─ run.sh               : a script file for running UniQuanF
│
└─ README.md
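The layer-swapping step follows a common PyTorch pattern: walk the module tree and replace each nn.Linear with a quantized counterpart. The sketch below illustrates that pattern only; it is not the repository's swap_linear.py, and FakeQuantLinear is a hypothetical stand-in for the real quantized layer in bcq_quant_layer.py.

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized linear layer: round-trips
    the weight through 4-bit uniform quantization at construction."""
    def __init__(self, linear, n_bits=4):
        super().__init__()
        w = linear.weight.data
        scale = (w.amax() - w.amin()) / (2**n_bits - 1)
        zero = torch.round(-w.amin() / scale)
        q = torch.clamp(torch.round(w / scale) + zero, 0, 2**n_bits - 1)
        self.weight = nn.Parameter(scale * (q - zero))
        self.bias = linear.bias

    def forward(self, x):
        return nn.functional.linear(x, self.weight, self.bias)

def swap_linear(module):
    """Recursively replace every nn.Linear with the quantized version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, FakeQuantLinear(child))
        else:
            swap_linear(child)
    return module

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
swap_linear(model)
```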

Prerequisite

Install dependencies

The list of dependencies is as follows:

  • python >= 3.10.12
  • tqdm 4.66.5
  • numpy 1.26.3
  • torch 2.3.1
  • datasets 2.21.0
  • transformers 4.42.0

Install dependencies using the following command:

pip install -r requirements.txt

Install lm-eval package using the following command:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout tags/v0.4.2
pip install -e .
cd ..

Install evaluate package using the following command:

git clone https://github.com/huggingface/evaluate.git
cd evaluate
pip install -e .
cd ..

Datasets

Our code automatically downloads the datasets it needs when you run main.py, except for MMLU, which is already included in the data/mmmlu/ directory. You do not have to download any datasets manually.

Running

Key arguments of UniQuanF

Experimental settings

  • model_name_or_path: the path of the directory for the dense model
  • dataset_name: the name of the sample dataset
  • num_samples: the number of samples in the sample dataset
  • seed: a random seed
  • n_bits_w: a desired bit-width for weights
  • group_size: the size of weight groups
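To illustrate how n_bits_w and group_size interact: each run of group_size weights along the input dimension shares one scale and zero-point, giving 2**n_bits_w levels per group. The helper below is a hypothetical illustration of that setup, not code from this repository:

```python
import torch

def group_quantize(weight, n_bits_w=4, group_size=128):
    # Hypothetical helper (not from this repository): each group of
    # `group_size` weights along the input dimension shares one scale
    # and zero-point, yielding 2**n_bits_w quantization levels per group.
    out_f, in_f = weight.shape
    w = weight.reshape(out_f, in_f // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min) / (2**n_bits_w - 1)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 2**n_bits_w - 1)
    return (scale * (q - zero)).reshape(out_f, in_f), scale

torch.manual_seed(0)
w = torch.randn(8, 256)
w_hat, scale = group_quantize(w)  # 2 groups of 128 per output row
```

Smaller groups track local weight statistics more closely at the cost of storing more quantization parameters.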

Hyperparameters of UniQuanF

  • u_lr: a learning rate for the quantization parameters of UQ
  • b_lr: a learning rate for the quantization parameters of BCQ
  • iters_w: the number of iterations for optimization
  • per_device_train_batch_size: a batch size for optimization
  • period: a remapping period (p)
  • grid_search_iters: the number of iterations for grid search (G)
  • alternating_update_iters: the number of iterations for an alternating update (T)
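For intuition about the alternating update controlled by alternating_update_iters (T): BCQ fits w ≈ Bα with binary codes B and scales α, alternating an exact code step (enumerate all 2^k code rows) with a closed-form least-squares scale step. The following is a simplified single-group sketch of that classic procedure, not the repository's alternating.py:

```python
import itertools
import torch

def alternating_update(w, k=3, iters=5):
    """Fit a 1-D tensor w with B @ alpha, B in {-1,+1}^(n, k).
    Alternates: (1) code step: pick, per weight, the code row whose
    value is closest to w; (2) scale step: least-squares alpha."""
    combos = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=k)))
    alpha = w.abs().mean() * 2.0 ** -torch.arange(k).float()  # crude init
    for _ in range(iters):
        values = combos @ alpha                       # value of each code row
        idx = (w.unsqueeze(1) - values).abs().argmin(dim=1)
        B = combos[idx]                               # (n, k) chosen codes
        alpha = torch.linalg.lstsq(B, w.unsqueeze(1)).solution.squeeze(1)
    return B, alpha

torch.manual_seed(0)
w = torch.randn(1024)
B, alpha = alternating_update(w)
rel_err = torch.norm(w - B @ alpha) / torch.norm(w)
```

Each iteration can only decrease the fitting error, since both steps are optimal given the other variable fixed.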

Running UniQuanF

We provide a script for running UniQuanF in scripts/run.sh. Run the script file as follows:

bash scripts/run.sh

To evaluate the quantized model, use the evaluation.py file via scripts/evaluate.sh. Run the script file as follows:

bash scripts/evaluate.sh

Reference

If you find UniQuanF useful or relevant to your research, please kindly cite our paper:

@inproceedings{park2025uniquanf,
  title={Unifying Uniform and Binary-coding Quantization for
         Accurate Compression of Large Language Models},
  author={Park, Seungcheol and Bae, Jeongin and Kwon,
          Beomseok and Kim, Minjun and Kim, Byeongwook and Kwon,
          Se Jung and Kang, U and Lee, Dongsoo},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association
             for Computational Linguistics (Volume 1: Long Papers),
             {ACL} 2025, Vienna, Austria, July 27-August 1st, 2025},
  year={2025}
}

License

This repository is for research purposes only. For any other purposes, please contact the authors.
