Hi,
Since DeepSpeed ZeRO with offloading is commonly used when training large LLMs, does TE currently support this mode?
Currently, DeepSpeed support appears to be covered only by a unit test, as referenced in TE's README: deepspeedai/DeepSpeed#3731
Thx~