A list of ideas to explore:

* [x] Lazy transfers (so we don't load all the data onto the GPU at once)
* [x] FP16 on load
* [x] FP16 policies on Axon
* [x] ~[Attention slicing](https://github.com/huggingface/diffusers/pull/366)~ (no longer applicable, see https://github.com/huggingface/diffusers/issues/4487)
* [x] ~[Flash attention](https://huggingface.co/docs/diffusers/optimization/fp16#memory-efficient-attention) ([JAX version](https://github.com/lucidrains/flash-attention-jax))~ (see notes in https://github.com/elixir-nx/bumblebee/pull/300)
* [ ] [DPM-Solver++](https://mobile.twitter.com/pcuenq/status/1590665645233881089) (more schedulers [here](https://github.com/ozanciga/diffusion-for-beginners), [here](https://stable-diffusion-art.com/samplers/#Samplers_overview), and in the comments below) ([another PyTorch implementation](https://github.com/lucidrains/denoising-diffusion-pytorch/pull/148/files))
* [ ] [TokenMerging](https://arxiv.org/abs/2303.17604)
* [ ] LCM+LoRA
* [x] ~DeepCache~ (not applicable, see https://github.com/elixir-nx/bumblebee/issues/147#issuecomment-1963787773)