Feature request
I'd like to request an 8-bit variant of the Schedule-Free AdamW optimizer.
Motivation
Schedule-Free optimizers are useful because they remove the need for an external learning-rate schedule, but their AdamW variant still carries optimizer-state memory overhead. A related request was raised in the Schedule-Free repository, where the maintainer suggested that projects already maintaining 8-bit optimizers might be a better home for 8-bit Schedule-Free variants. Since bitsandbytes already provides memory-efficient 8-bit optimizers, an 8-bit Schedule-Free AdamW variant seems like a natural fit here.
This would be useful for users who want the training behavior of ScheduleFree AdamW but are constrained by GPU memory.
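As a rough illustration of where the savings would come from, here is a back-of-the-envelope estimate of optimizer-state size. It is only a sketch under assumptions that are mine, not measurements: two per-parameter state tensors, fp32 storage at 4 bytes per value, and blockwise 8-bit storage with one fp32 scale per block of 256 values. The function name and the parameter count are hypothetical. It ignores weights, gradients, and activations, so it does not predict the overall memory reduction of a training run.

```python
def optimizer_state_bytes(num_params, num_states=2, bytes_per_value=4, block_size=None):
    """Estimate optimizer-state memory in bytes.

    num_states:      state tensors kept per parameter (assumed to be 2 here).
    bytes_per_value: 4 for fp32 state, 1 for 8-bit state.
    block_size:      if set, add one fp32 scale per block per state tensor
                     (the overhead of blockwise quantization; assumed layout).
    """
    total = num_params * num_states * bytes_per_value
    if block_size is not None:
        num_blocks = -(-num_params // block_size)  # ceiling division
        total += num_blocks * num_states * 4       # one fp32 scale per block
    return total


n = 1_000_000  # hypothetical parameter count, for illustration only
fp32 = optimizer_state_bytes(n)
int8 = optimizer_state_bytes(n, bytes_per_value=1, block_size=256)

print(f"fp32 state:  {fp32 / 2**20:.2f} MiB")
print(f"8-bit state: {int8 / 2**20:.2f} MiB")
```

Under these assumptions the state tensors themselves shrink to roughly a quarter of their fp32 size. The end-to-end saving will be smaller, since the rest of the training footprint is unchanged.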
Your contribution
I made a small experimental prototype and, in limited experiments, observed loss curves similar to those of the official fp32 ScheduleFree AdamW implementation. So far I have only tested small CNNs and ViT-Tiny on CIFAR-10 due to GPU constraints. In those tests, the prototype reduced memory usage by roughly 40% in my setup, although throughput was somewhat lower. These results are preliminary and not meant as a broad benchmark.