feat(model): add support for 4-bit quantization #209
Conversation
Codecov Report
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #209      +/-   ##
==========================================
- Coverage   94.54%   94.27%   -0.27%
==========================================
  Files           7        7
  Lines         330      332       +2
==========================================
+ Hits          312      313       +1
- Misses         18       19       +1
```
☔ View full report in Codecov by Sentry.
@peakji Thanks for working on this. Just wanted to mention that while QLoRA 4-bit is a good option to have, its inference is around 8x slower than GPTQ/AutoGPTQ, so I hope AutoGPTQ support is added one day in addition to this. (Also, GPTQ model downloads are roughly a quarter the size of full-precision models, so it can save a lot of disk space as well.)
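For readers less familiar with the trade-off being discussed: 4-bit quantization maps full-precision weights onto a 16-level grid, trading accuracy for memory. The sketch below is a minimal symmetric absmax quantizer for illustration only; it is not the code in this PR, and real QLoRA uses the nonuniform NF4 format rather than this uniform grid. All function names here are hypothetical.

```python
def quantize_4bit(weights):
    """Symmetric absmax quantization to the signed 4-bit range [-7, 7].

    Returns the integer codes and the per-tensor scale needed to
    reconstruct approximate float values.
    """
    absmax = max(abs(w) for w in weights)
    if absmax == 0.0:
        # All-zero tensor: every code is 0, scale is arbitrary.
        return [0] * len(weights), 1.0
    scale = absmax / 7.0
    return [round(w / scale) for w in weights], scale


def dequantize_4bit(quants, scale):
    """Recover approximate float weights from 4-bit codes and the scale."""
    return [q * scale for q in quants]


weights = [0.5, -1.0, 0.25, 0.0]
quants, scale = quantize_4bit(weights)
restored = dequantize_4bit(quants, scale)
```

Each weight is stored as a 4-bit integer plus one shared scale, roughly a 4x size reduction versus FP16, and the reconstruction error is bounded by one quantization step (`scale`). The inference slowdown mentioned above comes from having to dequantize weights on the fly, which GPTQ kernels handle much faster than the bitsandbytes path.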
No description provided.