# V0.1.5 Iteration Plan

## New Model Support

- [x] [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
  - Supported in #54

## Feature Support

- [x] Remove the `pycuda` dependency (#20)
  - Supported in #30
- [x] Make the `flash_attn` dependency optional (#23) @liyucheng09
- [x] Add unit tests @liyucheng09
  - Supported in #31
- [x] Support multi-GPU (#25)
  - Supported in #30
- [x] Add an end-to-end benchmark script using vLLM (#18)
  - Supported in #49

## Bugfix

- [x] Fix the `apply_rotary_pos_emb_single` function (#25)
  - Fixed in #30
- [x] Fix the import warning (#28)
  - Fixed in #30
- [x] Fix compatibility with vLLM >= 0.4.1 (#42)
  - Fixed in #44
- [x] Fix the `is_flash_attn_2_available` issue
  - Fixed in #54