
[Relax][Web] Add ApplyPresenceAndFrequencyPenalty #16504

Merged
MasterJH5574 merged 1 commit into apache:main from CharlieFRuan:pr-0201-freq-penalty on Feb 1, 2024

Conversation

@CharlieFRuan
Member

This PR adds ApplyPresenceAndFrequencyPenalty() to lm_support.cc and exposes it to Web runtime.

This is essentially the same as applyRepetitionPenalty, except that repeated tokens are penalized in a different way, following https://platform.openai.com/docs/guides/text-generation/frequency-and-presence-penalties.
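
For reference, the OpenAI-style rule subtracts from each candidate token's logit a term proportional to how many times the token has already been generated (frequency penalty), plus a flat term if it has appeared at all (presence penalty). A minimal TypeScript sketch of that rule is below; the names are illustrative, and the actual implementation is the C++ function in lm_support.cc:

```typescript
// Sketch of the OpenAI-style presence/frequency penalty (illustrative names).
function penalizeLogits(
  logits: Float32Array,              // raw scores, one per vocabulary token
  tokenCounts: Map<number, number>,  // token id -> occurrences in the output so far
  presencePenalty: number,
  frequencyPenalty: number,
): void {
  for (const [tokenId, count] of tokenCounts) {
    // The frequency term grows with the repetition count; the presence term
    // is a one-time cost for any token that has appeared at least once.
    logits[tokenId] -= count * frequencyPenalty + (count > 0 ? presencePenalty : 0);
  }
}
```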

Tested end-to-end with WebLLM.

@CharlieFRuan CharlieFRuan changed the title from [Relax][WebGPU] Add ApplyPresenceAndFrequencyPenalty to [Relax][Web] Add ApplyPresenceAndFrequencyPenalty on Feb 1, 2024
@CharlieFRuan
Member Author

cc @tqchen @MasterJH5574

@MasterJH5574 MasterJH5574 merged commit 5c45ae8 into apache:main Feb 1, 2024
CharlieFRuan added a commit to mlc-ai/web-llm that referenced this pull request Feb 15, 2024
This PR adds `GenerationConfig`, which allows per-generation configs.
See `get-started.ts` for its example usage:

```typescript
let genConfig: webllm.GenerationConfig = {
  presence_penalty: 0.5,
  frequency_penalty: 0.5,
  max_gen_len: 20,
  // stop: ["is", "Canada"]  // for demonstration purposes
};

const prompt0 = "What is the capital of Canada?";
const reply0 = await chat.generate(prompt0, generateProgressCallback, 1, genConfig);
```

In addition to the existing fields in `mlc-chat-config.json`, we also
support the OpenAI-like fields `frequency_penalty`, `presence_penalty`, and
`stop`, preparing for the upcoming OpenAI-like APIs.
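
For illustration only, the per-generation fields named above could be typed roughly as follows; this is a sketch, not necessarily the exact `GenerationConfig` interface shipped in WebLLM:

```typescript
// Rough sketch of the per-generation config fields named above.
// The actual GenerationConfig interface in web-llm may differ.
interface GenerationConfig {
  presence_penalty?: number;   // OpenAI-like; flat penalty for any reuse
  frequency_penalty?: number;  // OpenAI-like; scales with repetition count
  max_gen_len?: number;        // cap on the number of generated tokens
  stop?: string[];             // stop strings that end generation early
}
```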

This PR also sets up unit tests; use `npm test` to run them. However,
some work remains to support end-to-end testing (e.g. accessing
WebGPU in a test environment).

All prebuilt WASMs are updated correspondingly in
mlc-ai/binary-mlc-llm-libs#90, since we introduced a
new API in tvmjs's `runtime.ts` via
apache/tvm#16504.

Note that the update of the Llama WASMs is breaking: users
will have to update their WebLLM npm package.
atebites-hub pushed a commit to atebites-hub/web-llm that referenced this pull request Oct 4, 2025