[Relax][Web] Add ApplyPresenceAndRequencyPenalty #16504
Merged
MasterJH5574 merged 1 commit into apache:main on Feb 1, 2024
Conversation
MasterJH5574 (Member, Author) approved these changes on Feb 1, 2024
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request on Feb 15, 2024
This PR adds `GenerationConfig`, which allows per-generation configs.
See `get-started.ts` for its example usage:
```typescript
let genConfig: webllm.GenerationConfig = {
  presence_penalty: 0.5,
  frequency_penalty: 0.5,
  max_gen_len: 20,
  // stop: ["is", "Canada"]  // for demonstration purposes
};
const prompt0 = "What is the capital of Canada?";
const reply0 = await chat.generate(prompt0, generateProgressCallback, 1, genConfig);
```
In addition to the existing fields in `mlc-chat-config.json`, we also
support OpenAI-like fields `frequency_penalty`, `presence_penalty`, and
`stop` to prepare for the incoming OpenAI-like APIs.
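For illustration, a minimal sketch of how a `stop` list could be checked against generated text. The helper name and signature here are hypothetical, not WebLLM's actual implementation:

```typescript
// Hypothetical helper, not WebLLM's actual API: returns the text truncated at
// the earliest occurrence of any stop string, or null if no stop string matched.
function truncateAtStop(text: string, stop: string[]): string | null {
  let cut = -1;
  for (const s of stop) {
    const i = text.indexOf(s);
    if (i !== -1 && (cut === -1 || i < cut)) cut = i;
  }
  return cut === -1 ? null : text.slice(0, cut);
}
```

With `stop: ["is", "Canada"]` as in the example above, generation would be cut at whichever stop string appears first in the output.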
This PR also sets up unit tests; use `npm test` to run tests. However,
some work needs to be done to support end-to-end testing (e.g. accessing
WebGPU in a test environment).
All prebuilt WASMs are updated correspondingly
(mlc-ai/binary-mlc-llm-libs#90), since we introduced a
new API in tvmjs's `runtime.ts` via
apache/tvm#16504.
Note that the update of the Llama WASMs is breaking: users
will have to update their WebLLM npm package.
atebites-hub pushed a commit to atebites-hub/web-llm that referenced this pull request on Oct 4, 2025
This PR adds `ApplyPresenceAndFrequencyPenalty()` to `lm_support.cc` and exposes it to the Web runtime. This is essentially the same as `applyRepetitionPenalty`, except that we follow a different way of penalizing repeated tokens, following https://platform.openai.com/docs/guides/text-generation/frequency-and-presence-penalties. Tested end-to-end with WebLLM.
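To show the difference from a repetition penalty, here is a minimal TypeScript sketch of the OpenAI-style formula the linked guide describes: each token's logit is reduced by the frequency penalty scaled by its occurrence count, plus a flat presence penalty if it has appeared at all. The function name, array shapes, and count map are illustrative assumptions, not the actual `lm_support.cc` implementation:

```typescript
// Illustrative sketch of OpenAI-style presence/frequency penalties.
// Names and shapes are hypothetical, not the actual lm_support.cc API.
function applyPresenceAndFrequencyPenalty(
  logits: number[],                  // one logit per vocabulary token
  tokenCounts: Map<number, number>,  // occurrences of each token so far
  presencePenalty: number,
  frequencyPenalty: number
): number[] {
  const out = logits.slice();
  for (const [token, count] of tokenCounts) {
    // frequency penalty scales with the count; presence penalty is a flat
    // one-time subtraction for any token that has appeared at least once
    out[token] -= count * frequencyPenalty + (count > 0 ? presencePenalty : 0);
  }
  return out;
}

// Example: token 2 appeared twice, token 5 once
const logits = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0];
const counts = new Map([[2, 2], [5, 1]]);
const penalized = applyPresenceAndFrequencyPenalty(logits, counts, 0.5, 0.5);
```

Unlike `applyRepetitionPenalty`, which multiplies or divides the logit by a factor, this formulation subtracts an amount that grows linearly with how often the token has already been emitted.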