Approximate prefix cache scorer should incorporate the absolute match length when calculating the score #2561

@ahg-g

Description

What would you like to be added:
Currently, the approximate prefix cache scorer calculates its score from the match percentage alone, that is, the fraction of the prompt that matches a cached prefix. The absolute prompt length should also be a factor in the score.

Why is this needed:
The current approach may not be ideal for short sequences. Because the denominator is small, they typically get a high match percentage and therefore a high prefix match score, which could make routing more susceptible to hot spots.
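
To make the concern concrete, here is a minimal sketch rather than the scorer's actual code. It assumes the match is counted in fixed-size blocks, and both function names and the `saturation_blocks` parameter are hypothetical. The second function shows just one possible way to fold the absolute match length into the score.

```python
def percentage_score(matched_blocks: int, total_blocks: int) -> float:
    """Match percentage only: fraction of the prompt's blocks found in the cache."""
    if total_blocks == 0:
        return 0.0
    return matched_blocks / total_blocks


def length_aware_score(
    matched_blocks: int,
    total_blocks: int,
    saturation_blocks: int = 64,  # hypothetical tuning knob, not an existing setting
) -> float:
    """Hypothetical variant: scale the match fraction by the absolute match length.

    The weight grows linearly with the number of matched blocks and is
    capped at 1.0 once the match reaches `saturation_blocks`.
    """
    if total_blocks == 0:
        return 0.0
    fraction = matched_blocks / total_blocks
    length_weight = min(matched_blocks, saturation_blocks) / saturation_blocks
    return fraction * length_weight


# A short prompt that matches fully vs. a long prompt that matches 75%.
short = (2, 2)      # 2 of 2 blocks matched
long = (96, 128)    # 96 of 128 blocks matched

print(percentage_score(*short), percentage_score(*long))
print(length_aware_score(*short), length_aware_score(*long))
```

With the percentage-only score, the two-block prompt outranks the long prompt even though reusing its cached prefix saves far less work. With the length-aware variant the ordering flips. A linear weight with a cap is only one option; a logarithmic weight or a minimum match length threshold would address the same problem.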

Metadata

Labels

triage/accepted: Indicates an issue or PR is ready to be actively worked on.
