@@ -14,30 +14,53 @@ As shown in the [Compatibility Matrix](#compatibility-matrix), most vLLM feature
 pooling models as they only work on the generation or decode stage, so performance may not improve as much.
 ```
 
-## Offline Inference
-
-The {class}`~vllm.LLM` class provides various methods for offline inference.
-See [Engine Arguments](#engine-args) for a list of options when initializing the model.
-
-For pooling models, we support the following {code}`task` options:
-
-- Embedding ({code}`"embed"` / {code}`"embedding"`)
-- Classification ({code}`"classify"`)
-- Sentence Pair Scoring ({code}`"score"`)
-- Reward Modeling ({code}`"reward"`)
+For pooling models, we support the following `--task` options.
+The selected option sets the default pooler used to extract the final hidden states:
+
+```{list-table}
+:widths: 50 25 25 25
+:header-rows: 1
+
+* - Task
+  - Pooling Type
+  - Normalization
+  - Softmax
+* - Embedding (`embed`)
+  - `LAST`
+  - ✅︎
+  - ✗
+* - Classification (`classify`)
+  - `LAST`
+  - ✗
+  - ✅︎
+* - Sentence Pair Scoring (`score`)
+  - \*
+  - \*
+  - \*
+* - Reward Modeling (`reward`)
+  - `ALL`
+  - ✗
+  - ✗
+```
 
-The selected task determines the default {class}`~vllm.model_executor.layers.Pooler` that is used:
+\* The default pooler is always defined by the model.
 
-- Embedding: Extract only the hidden states corresponding to the last token, and apply normalization.
-- Classification: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Sentence Pair Scoring: Extract only the hidden states corresponding to the last token, and apply softmax.
-- Reward Modeling: Extract all of the hidden states and return them directly.
+```{note}
+If the model's implementation in vLLM defines its own pooler, the default pooler is set to that instead of the one specified in this table.
+```
 
 When loading [Sentence Transformers](https://huggingface.co/sentence-transformers) models,
-we attempt to override the default pooler based on its Sentence Transformers configuration file ({code}`modules.json`).
+we attempt to override the default pooler based on its Sentence Transformers configuration file (`modules.json`).
 
-You can customize the model's pooling method via the {code}`override_pooler_config` option,
+```{tip}
+You can customize the model's pooling method via the `--override-pooler-config` option,
 which takes priority over both the model's and Sentence Transformers's defaults.
+```
+
+## Offline Inference
+
+The {class}`~vllm.LLM` class provides various methods for offline inference.
+See [Engine Arguments](#engine-args) for a list of options when initializing the model.
 
 ### `LLM.encode`
 
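The task table in this diff pairs each `--task` option with a pooling type and an optional post-processing step. As a rough illustration only, the plain-Python sketch below mimics those defaults on a list of per-token hidden states: `LAST` keeps the final token's vector, `ALL` keeps every vector, and normalization (assumed here to be L2, i.e. unit length) or softmax is then applied to the pooled vector. The helper names (`pool_last`, `pool_all`, `l2_normalize`, `default_pool`) are hypothetical and this is not vLLM's `Pooler` implementation. It also ignores model-defined poolers, `modules.json` overrides, and `--override-pooler-config`.

```python
import math
from typing import List

Vector = List[float]


def pool_last(hidden_states: List[Vector]) -> Vector:
    """LAST pooling: keep only the hidden state of the final token."""
    return list(hidden_states[-1])


def pool_all(hidden_states: List[Vector]) -> List[Vector]:
    """ALL pooling: return every token's hidden state unchanged."""
    return [list(h) for h in hidden_states]


def l2_normalize(vec: Vector) -> Vector:
    """Scale a vector to unit Euclidean length (zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def softmax(vec: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]


def default_pool(task: str, hidden_states: List[Vector]):
    """Hypothetical helper applying the per-task defaults from the table."""
    if task == "embed":
        return l2_normalize(pool_last(hidden_states))  # LAST + normalization
    if task == "classify":
        return softmax(pool_last(hidden_states))  # LAST + softmax
    if task == "reward":
        return pool_all(hidden_states)  # ALL, no post-processing
    # "score" (and anything else) has no generic default: the model defines it.
    raise ValueError(f"no generic default pooler for task {task!r}")


# Two tokens with 2-dimensional hidden states.
states = [[1.0, 0.0], [3.0, 4.0]]
embedding = default_pool("embed", states)
probabilities = default_pool("classify", states)
rewards = default_pool("reward", states)
```

For this input, `embed` yields the unit vector `[0.6, 0.8]`, `classify` yields probabilities over the last token's vector that sum to 1, and `reward` returns both token vectors untouched.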