
Commit 312ed40

DarkLight1337 and Isotr0py authored and committed
[Doc] Show default pooling method in a table (vllm-project#11904)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
1 parent 7ad4af9 commit 312ed40

File tree

2 files changed: +45 −22 lines


docs/source/models/generative_models.md

Lines changed: 4 additions & 4 deletions
@@ -8,14 +8,14 @@ In vLLM, generative models implement the {class}`~vllm.model_executor.models.Vll
 Based on the final hidden states of the input, these models output log probabilities of the tokens to generate,
 which are then passed through {class}`~vllm.model_executor.layers.Sampler` to obtain the final text.
 
+For generative models, the only supported `--task` option is `"generate"`.
+Usually, this is automatically inferred so you don't have to specify it.
+
 ## Offline Inference
 
 The {class}`~vllm.LLM` class provides various methods for offline inference.
 See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
-For generative models, the only supported {code}`task` option is {code}`"generate"`.
-Usually, this is automatically inferred so you don't have to specify it.
-
 ### `LLM.generate`
 
 The {class}`~vllm.LLM.generate` method is available to all generative models in vLLM.
@@ -33,7 +33,7 @@ for output in outputs:
 ```
 
 You can optionally control the language generation by passing {class}`~vllm.SamplingParams`.
-For example, you can use greedy sampling by setting {code}`temperature=0`:
+For example, you can use greedy sampling by setting `temperature=0`:
 
 ```python
 llm = LLM(model="facebook/opt-125m")
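The doc text above says `temperature=0` yields greedy sampling. As a rough illustration of that semantics (this is not vLLM's actual `Sampler`; `sample_token` is a hypothetical helper), temperature-scaled sampling collapses to an argmax when the temperature is zero:

```python
import math
import random

def sample_token(logits, temperature=1.0):
    """Pick a token id from raw logits.

    temperature=0 means greedy decoding: always take the argmax.
    Otherwise, sample from the temperature-scaled softmax distribution.
    """
    if temperature == 0:
        # Greedy: the highest-scoring token wins deterministically.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]

logits = [1.0, 3.5, 0.2]
print(sample_token(logits, temperature=0))  # 1: greedy picks the max logit
```

Lower temperatures sharpen the distribution toward the argmax; higher ones flatten it, which is why `temperature=0` is the deterministic limit.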

docs/source/models/pooling_models.md

Lines changed: 41 additions & 18 deletions
@@ -14,30 +14,53 @@ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM feature
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
-## Offline Inference
-
-The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
-
-For pooling models, we support the following {code}`task` options:
-
-- Embedding ({code}`"embed"` / {code}`"embedding"`)
-- Classification ({code}`"classify"`)
-- Sentence Pair Scoring ({code}`"score"`)
-- Reward Modeling ({code}`"reward"`)
+For pooling models, we support the following `--task` options.
+The selected option sets the default pooler used to extract the final hidden states:
+
+```{list-table}
+:widths: 50 25 25 25
+:header-rows: 1
+
+* - Task
+  - Pooling Type
+  - Normalization
+  - Softmax
+* - Embedding (`embed`)
+  - `LAST`
+  - ✅︎
+  - ✗
+* - Classification (`classify`)
+  - `LAST`
+  - ✗
+  - ✅︎
+* - Sentence Pair Scoring (`score`)
+  - \*
+  - \*
+  - \*
+* - Reward Modeling (`reward`)
+  - `ALL`
+  - ✗
+  - ✗
+```
 
-The selected task determines the default {class}`~vllm.model_executor.layers.Pooler` that is used:
+\*The default pooler is always defined by the model.
 
-- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
-- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Sentence Pair Scoring: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Reward Modeling: Extract all of the hidden states and return them directly.
+```{note}
+If the model's implementation in vLLM defines its own pooler, the default pooler is set to that instead of the one specified in this table.
+```
 
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
-we attempt to override the default pooler based on its Sentence Transformers configuration file ({code}`modules.json`).
+we attempt to override the default pooler based on its Sentence Transformers configuration file (`modules.json`).
 
-You can customize the model's pooling method via the {code}`override_pooler_config` option,
+```{tip}
+You can customize the model's pooling method via the `--override-pooler-config` option,
 which takes priority over both the model's and Sentence Transformers's defaults.
+```
+
+## Offline Inference
+
+The {class}`~vllm.LLM` class provides various methods for offline inference.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
 ### `LLM.encode`
 