Conversation
Pull request overview
This pull request refactors the perplexity calculation logic to fix issues with long context processing by replacing step tracking through a separate steps parameter with proper session state management.
Changes:
- Replaced the `steps` parameter with a `sessions` parameter in `async_get_logits` and `_get_ppl` to track sequence state through session objects
- Sessions are now created at the caller level (`get_ppl`) and passed down to processing methods
- Session step tracking is now managed through the session object's state via `session.update(step=i)` and `session.step`
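To illustrate the pattern described above, here is a minimal sketch of session-based step tracking for chunked long-context processing. The `Session` class and `get_ppl_chunks` helper are hypothetical stand-ins for illustration only, not lmdeploy's actual implementation:

```python
class Session:
    """Hypothetical stand-in for a session object carrying per-sequence state."""

    def __init__(self, session_id: int):
        self.session_id = session_id
        self.step = 0  # number of tokens already processed for this sequence

    def update(self, step: int) -> None:
        """Record how far into the sequence this session has progressed."""
        self.step = step


def get_ppl_chunks(input_ids, max_input_len):
    """Split a long sequence into chunks, carrying position via the session.

    Each chunk is paired with the session's step at the time it is issued,
    which is the state the engine would use to resume processing.
    """
    session = Session(session_id=0)
    chunks = []
    for i in range(0, len(input_ids), max_input_len):
        chunk = input_ids[i:i + max_input_len]
        chunks.append((session.step, chunk))
        session.update(step=i + len(chunk))
    return chunks
```

The key difference from the old `steps`-list approach is that the offset travels with the session object itself, so downstream methods need no separate, parallel bookkeeping.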
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lmdeploy/serve/core/async_engine.py | Modified async_get_logits to accept sessions parameter instead of steps, using session state for step tracking; removed session cleanup logic |
| lmdeploy/pipeline.py | Updated get_ppl, _get_long_text_ppl, and _get_ppl to create and pass session objects for proper state management across long context chunks |
Comments suppressed due to low confidence (1)
lmdeploy/pipeline.py:296
- The sessions created at lines 285 and 290 are never explicitly cleaned up or closed. When `sequence_end=True` (the default in `_get_ppl`), sessions should be properly closed to release resources. The session cleanup logic that previously lived in `async_get_logits` (lines 631-633 in the old version) was removed, but no replacement cleanup was added at the callers. This can lead to resource leaks, especially for long-running operations.

Consider adding session cleanup in the `get_ppl` method after `_get_ppl` or `_get_long_text_ppl` completes, or ensure that sessions are released through the session manager's `remove` method when they are no longer needed.
```python
    session = self.session_mgr.get()
    res = self._get_long_text_ppl(session, input_ids=_input_ids, max_input_len=max_input_len)
    result.append(res)
else:
    _input_ids = [input_ids[indices[i]] for i in range(start, end)]
    sessions = [self.session_mgr.get() for _ in range(start, end)]
    res = self._get_ppl(
        sessions=sessions,
        input_ids=_input_ids,
        max_input_len=max_input_len,
    )
    result.extend(res)
```
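One way to address the cleanup concern is to wrap the processing call in `try`/`finally` so sessions are released even when an exception is raised. The `SessionManager` below is a minimal stand-in that assumes only a `get`/`remove` contract; it is not lmdeploy's actual session manager:

```python
class SessionManager:
    """Hypothetical manager tracking which session ids are currently active."""

    def __init__(self):
        self._next_id = 0
        self.active = set()

    def get(self):
        """Allocate a new session id and mark it active."""
        sid = self._next_id
        self._next_id += 1
        self.active.add(sid)
        return sid

    def remove(self, sid):
        """Release a session id, making its resources reclaimable."""
        self.active.discard(sid)


def with_cleanup(mgr, n_sequences, compute):
    """Acquire one session per sequence and guarantee release afterwards."""
    sessions = [mgr.get() for _ in range(n_sequences)]
    try:
        return compute(sessions)
    finally:
        # Runs on both success and failure, preventing session leaks.
        for sid in sessions:
            mgr.remove(sid)
```

Applied to `get_ppl`, the same shape would release every session after `_get_ppl` or `_get_long_text_ppl` returns, regardless of errors.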
```diff
-    steps: List[int] | None = None,
+    sessions: List['Session'] | None = None,
     sequence_start: bool = True,
     sequence_end: bool = True) -> List[torch.Tensor]:
```
The `async_get_logits` method signature was changed to accept a `sessions` parameter, but the method lacks a docstring explaining this parameter and its expected behavior. This leaves several points unclear to users:
- When they should pass `sessions` versus letting the method create them internally
- Who is responsible for session cleanup when sessions are passed in
- The relationship between `sessions` and `input_ids`

Consider adding a comprehensive docstring that documents all parameters, return values, and session management responsibilities.
```python
        sequence_end: bool = True) -> List[torch.Tensor]:
        """Compute logits for one or more input sequences asynchronously.

        This is a low-level helper used, for example, by
        :meth:`async_get_reward_score` to obtain per-token logits without
        performing full text generation.

        Args:
            input_ids (List[List[int]]): A list of token id sequences. Each
                inner list represents a single input sequence. The length of
                this list determines the number of parallel requests issued
                by this method.
            sessions (List[Session] | None, optional): Optional list of
                pre-created :class:`Session` objects to use for each input
                sequence. When provided, ``len(sessions)`` must equal
                ``len(input_ids)`` and each session at index ``i`` is used
                exclusively for ``input_ids[i]``.

                - If ``sessions`` is ``None`` (default), the method acquires
                  sessions internally via ``self.session_mgr.get()``. The
                  lifecycle of these sessions (reuse, pooling, and cleanup)
                  is managed outside this method by the engine/session
                  manager.
                - If a list of sessions is supplied, the caller retains
                  ownership of those sessions. This method will not close,
                  end, or free any session; callers are responsible for any
                  required cleanup (for example via ``SessionManager`` or
                  :meth:`free_cache`) and for deciding whether to reuse the
                  sessions for subsequent calls.
            sequence_start (bool, optional): Whether the provided
                ``input_ids`` should be treated as the start of a new
                sequence for each session. This flag is forwarded to
                :meth:`safe_run` unchanged.
            sequence_end (bool, optional): Whether this call marks the end
                of the sequence for each session. This flag is forwarded to
                :meth:`safe_run` unchanged.

        Returns:
            List[torch.Tensor]: A list of tensors containing the logits for
                each input sequence. The list is index-aligned with
                ``input_ids`` such that element ``i`` corresponds to
                ``input_ids[i]``. For each sequence, the returned tensor has
                shape ``(L_i, V)`` where ``L_i`` is the length of
                ``input_ids[i]`` and ``V`` is the vocabulary size. Only
                logits corresponding to the provided input tokens are
                returned; any generated tokens (if produced internally) are
                not included.
        """
```
|
base evaluate testcase passed!