GitHub - bugboy1769/llm_serving: LLM inferencing using huggingface transformers on Apple silicon. Batch processing, KV caching, Continuous batching. ToDo: LoRA from scratch, Quantization.

LLM Inference Optimisation on a Local Macbook

The code is largely from https://learn.deeplearning.ai/courses/efficiently-serving-llms; this is my implementation of it on a MacBook. The key change is moving everything to MPS. Some torch initialisations happen on the CPU by default, and without explicitly moving them to MPS you get a SIGBUS error. Also, while debugging, Claude failed to identify the problem over multiple exchanges, while ChatGPT (Think Longer) one-shotted it. Currently in repo:

Vanilla Generation
KV Caching - Doesn't work too well for Llama-style models, as they are much stricter about how past_key_values is passed.
Continuous Batching - Currently fixing an error which causes KV caching to fail at the last token to be generated in the first batch.
Nice Graphs and Conceptual Comments
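
To make the continuous batching item concrete, here is a minimal scheduling sketch in plain Python. It is not the code in this repo: `Request`, `fake_decode_step` and `continuous_batching` are hypothetical names, the "model" is a stand-in that appends one dummy token per request per step, and every request is assumed to want at least one new token. It only shows the core idea of refilling a freed batch slot as soon as a request finishes, instead of waiting for the whole batch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical generation request: an id and a new-token budget."""
    rid: str
    max_new_tokens: int
    generated: list = field(default_factory=list)

    def done(self) -> bool:
        return len(self.generated) >= self.max_new_tokens


def fake_decode_step(batch):
    """Stand-in for one batched forward pass: one new token per request."""
    for req in batch:
        req.generated.append(f"{req.rid}-t{len(req.generated)}")


def continuous_batching(requests, batch_size):
    """Run all requests, refilling freed slots after every decode step.

    Returns the request ids in completion order and the number of steps.
    """
    waiting = deque(requests)
    active, finished = [], []
    steps = 0
    while waiting or active:
        # Fill any free slots before the next decode step.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        fake_decode_step(active)
        steps += 1
        # Retire finished requests immediately so their slots free up.
        still_running = []
        for req in active:
            (finished if req.done() else still_running).append(req)
        active = still_running
    return [r.rid for r in finished], steps


reqs = [Request("a", 2), Request("b", 5), Request("c", 1), Request("d", 3)]
order, steps = continuous_batching(reqs, batch_size=2)
print(order, steps)
```

In this toy model the four requests finish in 6 decode steps. Static batches of two would need 5 + 3 = 8 steps for the same workload, because the short requests would sit idle until the longest one in their batch is done. A real implementation also has to merge and trim the KV cache when requests join or leave the batch, which this sketch leaves out.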

ToDo: Quantization and LoRA.
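
For the MPS point in the introduction, a minimal sketch of explicit device placement in torch might look like the following. The tiny `torch.nn.Embedding` is a hypothetical stand-in for the real model and `pick_device` is an illustrative helper, not something taken from this repo. The idea is simply to choose the device once, move the model and inputs to it, and create any new tensors directly on it.

```python
import torch


def pick_device() -> torch.device:
    """Prefer MPS on Apple silicon, fall back to CPU otherwise."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Tensors created without a device argument live on the CPU by default.
input_ids = torch.tensor([[1, 2, 3]])

# Hypothetical tiny model standing in for the real LLM.
model = torch.nn.Embedding(10, 4).to(device)

# Move inputs explicitly before the forward pass...
input_ids = input_ids.to(device)

# ...and create new tensors directly on the target device.
attention_mask = torch.ones_like(input_ids)  # inherits input_ids' device
next_token = torch.zeros((1, 1), dtype=torch.long, device=device)

hidden = model(input_ids)
print(hidden.shape, hidden.device)
```

Passing `device=device` at creation time (or using `*_like` helpers on tensors that are already placed) avoids having to track down stray CPU tensors later.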

Folders and files (17 commits):
__pycache__
time_graphs
.DS_Store
.gitattributes
.gitignore
README.md
batch_gen_rewrite.py
batch_generation_w_plot.py
batch_helpers.py
cont_batching_functions.py
continuous_batching_implement.py
kv_cache_rewrite.py
kv_caching.py
tensor_test_file.py
vanilla_batching.py
vanilla_gpt2_generation.py
weird_market_interest function.py
