Conversation
Pull request overview
This pull request refactors the perplexity calculation logic to fix issues with long context processing by replacing step tracking through a separate steps parameter with proper session state management.
Changes:
- Replaced the `steps` parameter with a `sessions` parameter in `async_get_logits` and `_get_ppl` to track sequence state through session objects
- Sessions are now created at the caller level (`get_ppl`) and passed down to processing methods
- Session step tracking is now managed through the session object's state via `session.update(step=i)` and `session.step`
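To illustrate the pattern described above, here is a minimal sketch of session-based step tracking for chunked long-context processing. The `Session` class and `get_ppl_chunks` helper are hypothetical stand-ins for illustration only, not lmdeploy's actual implementation:

```python
class Session:
    """Hypothetical stand-in for a session object carrying per-sequence state."""

    def __init__(self, session_id: int):
        self.session_id = session_id
        self.step = 0  # number of tokens already processed for this sequence

    def update(self, step: int) -> None:
        """Record how far into the sequence this session has progressed."""
        self.step = step


def get_ppl_chunks(input_ids, max_input_len):
    """Split a long sequence into chunks, carrying position via the session.

    Each chunk is paired with the session's step at the time it is issued,
    which is the state the engine would use to resume processing.
    """
    session = Session(session_id=0)
    chunks = []
    for i in range(0, len(input_ids), max_input_len):
        chunk = input_ids[i:i + max_input_len]
        chunks.append((session.step, chunk))
        session.update(step=i + len(chunk))
    return chunks
```

The key difference from the old `steps`-list approach is that the offset travels with the session object itself, so downstream methods need no separate, parallel bookkeeping.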
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| lmdeploy/serve/core/async_engine.py | Modified async_get_logits to accept sessions parameter instead of steps, using session state for step tracking; removed session cleanup logic |
| lmdeploy/pipeline.py | Updated get_ppl, _get_long_text_ppl, and _get_ppl to create and pass session objects for proper state management across long context chunks |
Comments suppressed due to low confidence (1)
lmdeploy/pipeline.py:296
- The sessions created at lines 285 and 290 are never explicitly cleaned up or closed. When `sequence_end=True` (the default in `_get_ppl`), sessions should be properly closed to release resources. The session cleanup logic that previously lived in `async_get_logits` (lines 631-633 in the old version) was removed, but no replacement cleanup was added at the callers. This can lead to resource leaks, especially for long-running operations.

Consider adding session cleanup in the `get_ppl` method after `_get_ppl` or `_get_long_text_ppl` completes, or ensure that sessions are released through the session manager's `remove` method when they are no longer needed.
```python
    session = self.session_mgr.get()
    res = self._get_long_text_ppl(session, input_ids=_input_ids, max_input_len=max_input_len)
    result.append(res)
else:
    _input_ids = [input_ids[indices[i]] for i in range(start, end)]
    sessions = [self.session_mgr.get() for _ in range(start, end)]
    res = self._get_ppl(
        sessions=sessions,
        input_ids=_input_ids,
        max_input_len=max_input_len,
    )
    result.extend(res)
```
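One way to address the cleanup concern is to wrap the processing call in `try`/`finally` so sessions are released even when an exception is raised. The `SessionManager` below is a minimal stand-in that assumes only a `get`/`remove` contract; it is not lmdeploy's actual session manager:

```python
class SessionManager:
    """Hypothetical manager tracking which session ids are currently active."""

    def __init__(self):
        self._next_id = 0
        self.active = set()

    def get(self):
        """Allocate a new session id and mark it active."""
        sid = self._next_id
        self._next_id += 1
        self.active.add(sid)
        return sid

    def remove(self, sid):
        """Release a session id, making its resources reclaimable."""
        self.active.discard(sid)


def with_cleanup(mgr, n_sequences, compute):
    """Acquire one session per sequence and guarantee release afterwards."""
    sessions = [mgr.get() for _ in range(n_sequences)]
    try:
        return compute(sessions)
    finally:
        # Runs on both success and failure, preventing session leaks.
        for sid in sessions:
            mgr.remove(sid)
```

Applied to `get_ppl`, the same shape would release every session after `_get_ppl` or `_get_long_text_ppl` returns, regardless of errors.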
```diff
-    steps: List[int] | None = None,
+    sessions: List['Session'] | None = None,
     sequence_start: bool = True,
     sequence_end: bool = True) -> List[torch.Tensor]:
```
The `async_get_logits` method signature was changed to accept a `sessions` parameter, but the method lacks a docstring explaining this parameter and its expected behavior. This leaves several points unclear to users:
- When they should pass `sessions` versus letting the method create them internally
- Who is responsible for session cleanup when sessions are passed in
- The relationship between `sessions` and `input_ids`

Consider adding a comprehensive docstring that documents all parameters, return values, and session management responsibilities.
```python
        sequence_end: bool = True) -> List[torch.Tensor]:
        """Compute logits for one or more input sequences asynchronously.

        This is a low-level helper used, for example, by
        :meth:`async_get_reward_score` to obtain per-token logits without
        performing full text generation.

        Args:
            input_ids (List[List[int]]): A list of token id sequences. Each
                inner list represents a single input sequence. The length of
                this list determines the number of parallel requests issued
                by this method.
            sessions (List[Session] | None, optional): Optional list of
                pre-created :class:`Session` objects to use for each input
                sequence. When provided, ``len(sessions)`` must equal
                ``len(input_ids)`` and each session at index ``i`` is used
                exclusively for ``input_ids[i]``.

                - If ``sessions`` is ``None`` (default), the method acquires
                  sessions internally via ``self.session_mgr.get()``. The
                  lifecycle of these sessions (reuse, pooling, and cleanup)
                  is managed outside this method by the engine/session
                  manager.
                - If a list of sessions is supplied, the caller retains
                  ownership of those sessions. This method will not close,
                  end, or free any session; callers are responsible for any
                  required cleanup (for example via ``SessionManager`` or
                  :meth:`free_cache`) and for deciding whether to reuse the
                  sessions for subsequent calls.
            sequence_start (bool, optional): Whether the provided
                ``input_ids`` should be treated as the start of a new
                sequence for each session. This flag is forwarded to
                :meth:`safe_run` unchanged.
            sequence_end (bool, optional): Whether this call marks the end
                of the sequence for each session. This flag is forwarded to
                :meth:`safe_run` unchanged.

        Returns:
            List[torch.Tensor]: A list of tensors containing the logits for
                each input sequence. The list is index-aligned with
                ``input_ids`` such that element ``i`` corresponds to
                ``input_ids[i]``. For each sequence, the returned tensor has
                shape ``(L_i, V)`` where ``L_i`` is the length of
                ``input_ids[i]`` and ``V`` is the vocabulary size. Only
                logits corresponding to the provided input tokens are
                returned; any generated tokens (if produced internally) are
                not included.
        """
```
|
base evaluate testcase passed!