[Feature] Add a FP8 Gemm backend for choosing FP8 gemm kernel #13773

@Fridge003

Motivation

Currently in SGLang, the choice of FP8 GEMM kernel is controlled by a series of environment variables and implicit dispatch logic, as in https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/layers/quantization/fp8_utils.py#L151

To make this choice explicit and easier to control, we need a server argument like --fp8-gemm-runner-backend, similar to --moe-runner-backend.
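A minimal sketch of how such an argument could look, to make the proposal concrete. Only the flag name --fp8-gemm-runner-backend comes from this issue. The backend names ("auto", "deep_gemm", "cutlass", "triton"), the helper names `build_parser` and `resolve_fp8_gemm_backend`, and the fallback order are assumptions for illustration, not SGLang's actual choices or API.

```python
import argparse
from typing import Sequence

# Hypothetical backend names, chosen only for illustration.
# "auto" means "let the server pick", mirroring today's implicit dispatch.
FP8_GEMM_BACKENDS = ("auto", "deep_gemm", "cutlass", "triton")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the proposed server argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--fp8-gemm-runner-backend",
        type=str,
        choices=FP8_GEMM_BACKENDS,
        default="auto",
        help="Which FP8 GEMM kernel backend to use (hypothetical choices).",
    )
    return parser


def resolve_fp8_gemm_backend(requested: str, available: Sequence[str]) -> str:
    """Turn the user's request into a concrete backend name.

    `available` is the set of backends usable on the current machine;
    how that set is detected is outside the scope of this sketch.
    """
    if requested != "auto":
        # An explicit request is honored or rejected, never silently replaced.
        if requested not in available:
            raise ValueError(
                f"FP8 GEMM backend {requested!r} is not available; "
                f"available: {sorted(available)}"
            )
        return requested
    # "auto": take the first available backend in an assumed preference order.
    for name in FP8_GEMM_BACKENDS[1:]:
        if name in available:
            return name
    raise RuntimeError("No FP8 GEMM backend is available")


if __name__ == "__main__":
    args = build_parser().parse_args(["--fp8-gemm-runner-backend", "cutlass"])
    print(resolve_fp8_gemm_backend(args.fp8_gemm_runner_backend, ["cutlass", "triton"]))
```

The point of the design is that an explicit request fails loudly when the backend is unavailable instead of falling back silently, while "auto" keeps the current behavior of choosing on the user's behalf.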

Related resources

No response
