Custom Kernel Roadmap for 2025 H2
DeepEP-compatible EP Library
The goal of this library is to provide a set of APIs consistent with DeepSeek's DeepEP, backed by our custom kernels.
Low-latency Kernels
Note that the low-latency kernels are currently restricted to BS <= 512, which means they can be used only in decode instances, not in prefill instances.
- [Aug 15th] Remove the restriction of HiddenSize = 7168, thus supporting more MoE models.
Normal Kernels
- [Aug 30th] Support prefill with SeqLen <= 2048
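The limits listed for the low-latency and normal kernels can be read as a simple routing rule. The following is a minimal sketch of that rule, not the library's API: `Request`, `pick_kernel_path`, the constant names, and the `hidden_size_restricted` flag are all hypothetical and exist only to illustrate the constraints stated in this plan.

```python
from dataclasses import dataclass

# Limits copied from this roadmap. All identifiers below are hypothetical.
LOW_LATENCY_MAX_BS = 512        # low-latency kernels: BS <= 512
LOW_LATENCY_HIDDEN_SIZE = 7168  # restriction planned for removal
NORMAL_MAX_SEQ_LEN = 2048       # planned prefill support: SeqLen <= 2048


@dataclass(frozen=True)
class Request:
    phase: str          # "decode" or "prefill"
    batch_size: int
    hidden_size: int
    seq_len: int = 1


def pick_kernel_path(req: Request, hidden_size_restricted: bool = True) -> str:
    """Return "low_latency" or "normal", or raise if no limit is satisfied."""
    if req.phase == "decode":
        if req.batch_size > LOW_LATENCY_MAX_BS:
            raise ValueError(f"batch size {req.batch_size} exceeds {LOW_LATENCY_MAX_BS}")
        # Assumption: once the HiddenSize restriction is lifted, callers pass
        # hidden_size_restricted=False and any hidden size is accepted.
        if hidden_size_restricted and req.hidden_size != LOW_LATENCY_HIDDEN_SIZE:
            raise ValueError(f"hidden size {req.hidden_size} is not {LOW_LATENCY_HIDDEN_SIZE}")
        return "low_latency"
    if req.phase == "prefill":
        if req.seq_len > NORMAL_MAX_SEQ_LEN:
            raise ValueError(f"sequence length {req.seq_len} exceeds {NORMAL_MAX_SEQ_LEN}")
        return "normal"
    raise ValueError(f"unknown phase: {req.phase!r}")
```

For example, `pick_kernel_path(Request("decode", 128, 7168))` selects the low-latency path, while a decode request with a hidden size of 4096 is rejected unless `hidden_size_restricted=False` is passed.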
Quantization Kernels
- [TBD] Simple AWQ dequant kernels
- [TBD] Simple GPTQ dequant kernels
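As a reference point for the dequant items above, here is a minimal pure-Python sketch of generic group-wise 4-bit dequantization, `w = (q - zero) * scale`. It is an assumption-laden illustration rather than the planned kernels: the function names are hypothetical, the packing (two values per byte, low nibble first) is chosen for simplicity, and real AWQ and GPTQ checkpoints use their own packing and zero-point conventions, which this sketch does not attempt to match.

```python
from typing import List


def unpack_int4(packed: bytes) -> List[int]:
    """Unpack two 4-bit values per byte, low nibble first (assumed layout)."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble
        out.append(byte >> 4)    # high nibble
    return out


def dequant_row(q: List[int], scales: List[float], zeros: List[int],
                group_size: int) -> List[float]:
    """Dequantize one row: each group of `group_size` values shares a scale and zero."""
    if group_size <= 0 or len(q) % group_size != 0:
        raise ValueError("row length must be a positive multiple of group_size")
    num_groups = len(q) // group_size
    if len(scales) != num_groups or len(zeros) != num_groups:
        raise ValueError("expected one scale and one zero point per group")
    return [
        (value - zeros[i // group_size]) * scales[i // group_size]
        for i, value in enumerate(q)
    ]


# Usage: two packed bytes hold four 4-bit values, split into two groups of two.
row = unpack_int4(bytes([0x21, 0x43]))            # [1, 2, 3, 4]
weights = dequant_row(row, [0.5, 2.0], [1, 2], 2)  # [0.0, 0.5, 2.0, 4.0]
```

A slow reference like this can be useful for checking a kernel's output on small inputs before tuning for performance.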