
Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

This package contains a PyTorch implementation of UniQuanF.

Overview

Overview of UniQuanF

UniQuanF (Unified Quantization with Flexible Mapping) is an accurate quantization method for large language models (LLMs). We first propose UniQuan (Unified Quantization), which combines the strong optimizability of uniform quantization (UQ) with the powerful expressiveness of binary-coding quantization (BCQ) by unifying their quantization processes. We then instantiate UniQuan as UniQuanF by unifying FlexRound and ALTERNATING, the best-performing UQ and BCQ methods, respectively.
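The unification rests on the fact that a b-bit uniform quantizer can be rewritten in BCQ form, i.e., as a weighted sum of {-1, +1} binary codes plus an offset. The sketch below is our own illustration of that equivalence, not code from this repository:

```python
import torch

def uq_quantize(w, s, z, bits):
    """Uniform quantization: map each weight to a level s * (q - z)."""
    q = torch.clamp(torch.round(w / s + z), 0, 2**bits - 1)
    return s * (q - z), q

def uq_as_bcq(q, s, z, bits):
    """Rewrite the UQ levels in BCQ form: w_hat = sum_i alpha_i * c_i + offset.
    Writing q in binary as sum_i q_i * 2**i with q_i = (c_i + 1) / 2 and
    c_i in {-1, +1} gives alpha_i = s * 2**(i - 1) and a constant offset."""
    alphas = [s * 2**(i - 1) for i in range(bits)]
    codes = [(2 * ((q.long() >> i) & 1) - 1).float() for i in range(bits)]
    offset = s * ((2**bits - 1) / 2 - z)
    return sum(a * c for a, c in zip(alphas, codes)) + offset

torch.manual_seed(0)
w = torch.randn(8)
s, z, bits = 0.2, 4, 3
w_uq, q = uq_quantize(w, s, z, bits)
w_bcq = uq_as_bcq(q, s, z, bits)
print(torch.allclose(w_uq, w_bcq))  # the two representations coincide
```

Because the UQ levels are a special case of BCQ levels, a single unified quantization process can optimize both parameterizations.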

Code Description

The following is an overview of our codes.

UniQuanF
│
├─ src                      : a directory for source codes
│   ├─ main.py              : the main code for running UniQuanF
│   ├─ arguments.py         : descriptions of arguments
│   ├─ uniquanf.py          : codes for optimization
│   ├─ cached_loader.py     : codes for managing cached inputs
│   ├─ swap_linear.py       : codes for swapping linear layers with quantized ones
│   ├─ bcq_quant_layer.py   : codes for quantized linear layers
│   ├─ alternating.py       : an implementation of a general alternating update
│   ├─ loss.py              : codes for loss functions
│   ├─ evaluation.py        : codes for comprehensive evaluation of quantized models
│   ├─ categories.py        : utility codes for evaluation task categories
│   ├─ data_utils.py        : utility codes pertaining to datasets
│   ├─ general_utils.py     : utility codes for general purposes
│   └─ evaluation_utils.py  : utility codes for evaluation
│
├─ scripts                  : a directory for script files
│   ├─ evaluate.sh          : a script file for evaluating the quantized model
│   └─ run.sh               : a script file for running UniQuanF
│
└─ README.md
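The layer-swapping step follows a common PyTorch pattern: walk the module tree and replace each nn.Linear with a quantized counterpart. The sketch below illustrates that pattern only; it is not the repository's swap_linear.py, and FakeQuantLinear is a hypothetical stand-in for the real quantized layer in bcq_quant_layer.py.

```python
import torch
import torch.nn as nn

class FakeQuantLinear(nn.Module):
    """Hypothetical stand-in for a quantized linear layer: round-trips
    the weight through 4-bit uniform quantization at construction."""
    def __init__(self, linear, n_bits=4):
        super().__init__()
        w = linear.weight.data
        scale = (w.amax() - w.amin()) / (2**n_bits - 1)
        zero = torch.round(-w.amin() / scale)
        q = torch.clamp(torch.round(w / scale) + zero, 0, 2**n_bits - 1)
        self.weight = nn.Parameter(scale * (q - zero))
        self.bias = linear.bias

    def forward(self, x):
        return nn.functional.linear(x, self.weight, self.bias)

def swap_linear(module):
    """Recursively replace every nn.Linear with the quantized version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, FakeQuantLinear(child))
        else:
            swap_linear(child)
    return module

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
swap_linear(model)
```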

Prerequisite

Install dependencies

The list of dependencies is as follows:

  • python >= 3.10.12
  • tqdm 4.66.5
  • numpy 1.26.3
  • torch 2.3.1
  • datasets 2.21.0
  • transformers 4.42.0

Install dependencies using the following command:

pip install -r requirements.txt

Install lm-eval package using the following command:

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
git checkout tags/v0.4.2
pip install -e .
cd ..

Install evaluate package using the following command:

git clone https://github.com/huggingface/evaluate.git
cd evaluate
pip install -e .
cd ..

Datasets

Our code automatically downloads the datasets it needs when you run main.py, except for MMLU, which is already included in the data/mmmlu/ directory. You do not have to download any datasets manually.

Running

Key arguments of UniQuanF

Experimental settings

  • model_name_or_path: the path of the directory for the dense model
  • dataset_name: the name of the sample dataset
  • num_samples: the number of samples in the sample dataset
  • seed: a random seed
  • n_bits_w: a desired bit-width for weights
  • group_size: the size of weight groups
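To illustrate how n_bits_w and group_size interact: each run of group_size weights along the input dimension shares one scale and zero-point, giving 2**n_bits_w levels per group. The helper below is a hypothetical illustration of that setup, not code from this repository:

```python
import torch

def group_quantize(weight, n_bits_w=4, group_size=128):
    # Hypothetical helper (not from this repository): each group of
    # `group_size` weights along the input dimension shares one scale
    # and zero-point, yielding 2**n_bits_w quantization levels per group.
    out_f, in_f = weight.shape
    w = weight.reshape(out_f, in_f // group_size, group_size)
    w_min = w.amin(dim=-1, keepdim=True)
    w_max = w.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min) / (2**n_bits_w - 1)
    zero = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero, 0, 2**n_bits_w - 1)
    return (scale * (q - zero)).reshape(out_f, in_f), scale

torch.manual_seed(0)
w = torch.randn(8, 256)
w_hat, scale = group_quantize(w)  # 2 groups of 128 per output row
```

Smaller groups track local weight statistics more closely at the cost of storing more quantization parameters.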

Hyperparameters of UniQuanF

  • u_lr: a learning rate for the quantization parameters of UQ
  • b_lr: a learning rate for the quantization parameters of BCQ
  • iters_w: the number of iterations for optimization
  • per_device_train_batch_size: a batch size for optimization
  • period: a remapping period (p)
  • grid_search_iters: the number of iterations for grid search (G)
  • alternating_update_iters: the number of iterations for an alternating update (T)
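For intuition about the alternating update controlled by alternating_update_iters (T): BCQ fits w ≈ Bα with binary codes B and scales α, alternating an exact code step (enumerate all 2^k code rows) with a closed-form least-squares scale step. The following is a simplified single-group sketch of that classic procedure, not the repository's alternating.py:

```python
import itertools
import torch

def alternating_update(w, k=3, iters=5):
    """Fit a 1-D tensor w with B @ alpha, B in {-1,+1}^(n, k).
    Alternates: (1) code step: pick, per weight, the code row whose
    value is closest to w; (2) scale step: least-squares alpha."""
    combos = torch.tensor(list(itertools.product([-1.0, 1.0], repeat=k)))
    alpha = w.abs().mean() * 2.0 ** -torch.arange(k).float()  # crude init
    for _ in range(iters):
        values = combos @ alpha                       # value of each code row
        idx = (w.unsqueeze(1) - values).abs().argmin(dim=1)
        B = combos[idx]                               # (n, k) chosen codes
        alpha = torch.linalg.lstsq(B, w.unsqueeze(1)).solution.squeeze(1)
    return B, alpha

torch.manual_seed(0)
w = torch.randn(1024)
B, alpha = alternating_update(w)
rel_err = torch.norm(w - B @ alpha) / torch.norm(w)
```

Each iteration can only decrease the fitting error, since both steps are optimal given the other variable fixed.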

Running UniQuanF

We provide a script for running UniQuanF in scripts/run.sh. Run the script file as follows:

bash scripts/run.sh

To evaluate the quantized model, use the evaluation.py file via scripts/evaluate.sh. Run the script file as follows:

bash scripts/evaluate.sh

Reference

If you find UniQuanF useful or relevant to your research, please kindly cite our paper:

@inproceedings{park2025uniquanf,
  title={Unifying Uniform and Binary-coding Quantization for
         Accurate Compression of Large Language Models},
  author={Park, Seungcheol and Bae, Jeongin and Kwon,
          Beomseok and Kim, Minjun and Kim, Byeongwook and Kwon,
          Se Jung and Kang, U and Lee, Dongsoo},
  booktitle={Proceedings of the 63rd Annual Meeting of the Association
             for Computational Linguistics (Volume 1: Long Papers),
             {ACL} 2025, Vienna, Austria, July 27-August 1st, 2025},
  year={2025}
}

License

This repository is for research purposes only. For any other purposes, please contact the authors.
