diff --git a/.github/workflows/UnitTests.yml b/.github/workflows/UnitTests.yml
index ac85a6d43..7954c285d 100644
--- a/.github/workflows/UnitTests.yml
+++ b/.github/workflows/UnitTests.yml
@@ -57,11 +57,11 @@ jobs:
       - name: PyTest
         run: | #--deselect=src/maxdiffusion/tests/input_pipeline_interface_test.py
           export LIBTPU_INIT_ARGS='--xla_tpu_scoped_vmem_limit_kib=65536'
-          HF_HUB_CACHE=/mnt/disks/github-runner-disk/ HF_HOME=/mnt/disks/github-runner-disk/ TOKENIZERS_PARALLELISM=false python3 -m pytest --deselect=src/maxdiffusion/tests/ltx_transformer_step_test.py -x
-# add_pull_ready:
+          HF_HUB_CACHE=/mnt/disks/github-runner-disk/ HF_HOME=/mnt/disks/github-runner-disk/ TOKENIZERS_PARALLELISM=false python3 -m pytest --ignore=src/maxdiffusion/kernels/ --deselect=src/maxdiffusion/tests/ltx_transformer_step_test.py -x
+# add_pull_ready
 # if: github.ref != 'refs/heads/main'
 # permissions:
 #   checks: read
 #   pull-requests: write
 # needs: build
-# uses: ./.github/workflows/AddLabel.yml
\ No newline at end of file
+# uses: ./.github/workflows/AddLabel.yml
diff --git a/README.md b/README.md
index 62bb25f90..71ed5d9c0 100755
--- a/README.md
+++ b/README.md
@@ -17,6 +17,7 @@
 [![Unit Tests](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml/badge.svg)](https://github.com/AI-Hypercomputer/maxdiffusion/actions/workflows/UnitTests.yml)
 
 # What's new?
+- **`2026/04/16`**: The Tokamax Ring Attention kernel is now supported
 - **`2026/03/31`**: Wan2.2 SenCache inference is now supported for T2V and I2V (up to 1.4x speedup)
 - **`2026/03/25`**: Wan2.1 and Wan2.2 Magcache inference is now supported
 - **`2026/03/25`**: LTX-2 Video Inference is now supported
@@ -623,6 +624,24 @@ To generate images, run the following command:
 ...
 ```
 
+### Ring Attention
+We added ring attention support for Wan models. Below are end-to-end generation times, in seconds, for a single `720p` (81-frame) video generation (with CFG DP):
+| Accelerator | Model | Attention Type | Inference Steps | Sharding | e2e Generation Time (s) |
+| -- | -- | -- | -- | -- | -- |
+| v7x-8 | WAN 2.1 | Tokamax Flash | 50 | dp2-fsdp1-context4-tp1 | 264.2 |
+| v7x-8 | WAN 2.1 | Tokamax Ring | 50 | dp2-fsdp1-context4-tp1 | **252.4** |
+| v7x-8 | WAN 2.2 | Tokamax Flash | 40 | dp2-fsdp1-context4-tp1 | 212.7 |
+| v7x-8 | WAN 2.2 | Tokamax Ring | 40 | dp2-fsdp1-context4-tp1 | **201.7** |
+
+| Accelerator | Model | Attention Type | Inference Steps | Sharding | e2e Generation Time (s) |
+| -- | -- | -- | -- | -- | -- |
+| v7x-16 | WAN 2.1 | Tokamax Flash | 50 | dp2-fsdp1-context8-tp1 | 146.6 |
+| v7x-16 | WAN 2.1 | Tokamax Ring | 50 | dp2-fsdp1-context8-tp1 | **137.2** |
+| v7x-16 | WAN 2.2 | Tokamax Flash | 40 | dp2-fsdp1-context8-tp1 | **117.8** |
+| v7x-16 | WAN 2.2 | Tokamax Ring | 40 | dp2-fsdp1-context8-tp1 | 137.5 |
+
+(* Ring attention has known stability issues on 16 TPUs; use `tokamax_flash` attention instead.)
+
 ## Flux
 
 First make sure you have permissions to access the Flux repos in Huggingface.
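
As a usage note for the Ring Attention section added in the README hunk above, here is a minimal invocation sketch. It is not part of the patch: the `attention=tokamax_ring` value is an assumption, named by analogy with the `tokamax_flash` value in the stability note, and the `generate_wan.py` / `base_wan_14b.yml` paths are assumed to match the repo's existing Wan examples; verify both against `src/maxdiffusion/configs/` before relying on it.

```bash
# Hedged sketch: ring attention for a Wan 2.1 720p generation.
# attention=tokamax_ring is an assumed config value (by analogy with the
# tokamax_flash value in the note above); the script and config paths are
# assumed from the repo layout. Check both before running.
python src/maxdiffusion/generate_wan.py src/maxdiffusion/configs/base_wan_14b.yml \
  attention=tokamax_ring \
  num_inference_steps=50
```

On 16-TPU topologies, swap in `attention=tokamax_flash` per the stability note above.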