fix get_ppl for long context#4314

Merged
lvhan028 merged 2 commits intoInternLM:mainfrom
lvhan028:fix-ppl-long-context
Feb 3, 2026
Conversation

@lvhan028 (Collaborator) commented Feb 2, 2026

No description provided.

Copilot AI review requested due to automatic review settings February 2, 2026 07:33
@lvhan028 lvhan028 added the Bug:P0 label Feb 2, 2026
@lvhan028 lvhan028 requested a review from lzhangzz February 2, 2026 07:34
Copilot AI (Contributor) left a comment

Pull request overview

This pull request refactors the perplexity calculation logic to fix long-context processing: instead of tracking progress through a separate steps parameter, sequence state is now carried by session objects.

Changes:

  • Replaced the steps parameter with sessions parameter in async_get_logits and _get_ppl methods to track sequence state through session objects
  • Sessions are now created at the caller level (get_ppl) and passed down to processing methods
  • Session step tracking is now managed through the session object's state via session.update(step=i) and session.step
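The chunked, session-tracked evaluation the bullets describe can be sketched as follows. This is a simplified illustration only: the Session class, its update method, and chunked_ppl_steps are stand-ins for this example, not the actual lmdeploy implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Minimal stand-in for a session object that tracks decode progress."""
    session_id: int
    step: int = 0

    def update(self, step: int) -> None:
        self.step = step


def chunked_ppl_steps(session: Session, n_tokens: int, max_input_len: int):
    """Yield (start, end) chunk boundaries, advancing the session's step.

    Each chunk resumes from session.step, so the engine can continue from
    the state built by previous chunks instead of re-prefilling.
    """
    for start in range(0, n_tokens, max_input_len):
        end = min(start + max_input_len, n_tokens)
        yield start, end
        session.update(step=end)  # record how far this session has processed


session = Session(session_id=0)
chunks = list(chunked_ppl_steps(session, n_tokens=10, max_input_len=4))
print(chunks)        # [(0, 4), (4, 8), (8, 10)]
print(session.step)  # 10
```

The point of carrying the step inside the session, rather than in a parallel list, is that each sequence's progress travels with the object that owns its engine-side state.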

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File: lmdeploy/serve/core/async_engine.py
  Modified async_get_logits to accept a sessions parameter instead of steps, using session state for step tracking; removed session cleanup logic.

File: lmdeploy/pipeline.py
  Updated get_ppl, _get_long_text_ppl, and _get_ppl to create and pass session objects for proper state management across long-context chunks.
Comments suppressed due to low confidence (1)

lmdeploy/pipeline.py:296

  • The sessions created at line 285 and 290 are never explicitly cleaned up or closed. When sequence_end=True (the default in _get_ppl), sessions should be properly closed to release resources. The old code removed session cleanup logic from async_get_logits (lines 631-633 in the old version), but no cleanup logic was added at the callers. This can lead to resource leaks, especially for long-running operations.

Consider adding session cleanup in the get_ppl method after _get_ppl or _get_long_text_ppl completes, or ensure that sessions are properly managed through the session manager's remove method when they are no longer needed.
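One shape such cleanup could take is a try/finally around the computation, so sessions are released even on error. This is a hedged sketch: the toy SessionManager below (and its get/remove methods) is modeled on the review comment's mention of the session manager's remove method, not verified against the actual lmdeploy API.

```python
class SessionManager:
    """Toy session manager standing in for lmdeploy's; get()/remove() only."""

    def __init__(self):
        self._next_id = 0
        self.active = set()

    def get(self):
        sid = self._next_id
        self._next_id += 1
        self.active.add(sid)
        return sid

    def remove(self, sid):
        self.active.discard(sid)


def get_ppl_with_cleanup(mgr, input_ids, compute_ppl):
    """Acquire one session per sequence, compute, then always release."""
    sessions = [mgr.get() for _ in input_ids]
    try:
        return compute_ppl(sessions, input_ids)
    finally:
        for s in sessions:  # runs even if compute_ppl raises
            mgr.remove(s)


mgr = SessionManager()
res = get_ppl_with_cleanup(mgr, [[1, 2], [3]], lambda s, ids: [0.0] * len(ids))
print(res, mgr.active)  # [0.0, 0.0] set()
```

The finally block guarantees no session leaks regardless of whether the PPL computation succeeds, which addresses the resource-leak concern for long-running operations.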

                session = self.session_mgr.get()
                res = self._get_long_text_ppl(session, input_ids=_input_ids, max_input_len=max_input_len)
                result.append(res)
            else:
                _input_ids = [input_ids[indices[i]] for i in range(start, end)]
                sessions = [self.session_mgr.get() for _ in range(start, end)]
                res = self._get_ppl(
                    sessions=sessions,
                    input_ids=_input_ids,
                    max_input_len=max_input_len,
                )
                result.extend(res)


steps: List[int] | None = None,
sessions: List['Session'] | None = None,
sequence_start: bool = True,
sequence_end: bool = True) -> List[torch.Tensor]:
Copilot AI commented Feb 2, 2026
The async_get_logits method signature was changed to accept a sessions parameter, but the method lacks documentation (docstring) explaining this parameter and its expected behavior. This makes it unclear to users:

  • When they should pass sessions vs letting the method create them internally
  • Who is responsible for session cleanup when sessions are passed in
  • The relationship between sessions and input_ids

Consider adding a comprehensive docstring that documents all parameters, return values, and session management responsibilities.

Suggested change
sequence_end: bool = True) -> List[torch.Tensor]:
sequence_end: bool = True) -> List[torch.Tensor]:
"""Compute logits for one or more input sequences asynchronously.
This is a low-level helper used, for example, by :meth:`async_get_reward_score`
to obtain per-token logits without performing full text generation.
Args:
input_ids (List[List[int]]): A list of token id sequences. Each inner
list represents a single input sequence. The length of this list
determines the number of parallel requests issued by this method.
sessions (List[Session] | None, optional): Optional list of pre-created
:class:`Session` objects to use for each input sequence. When
provided, ``len(sessions)`` must equal ``len(input_ids)`` and
each session at index ``i`` is used exclusively for
``input_ids[i]``.
- If ``sessions`` is ``None`` (default), the method acquires
sessions internally via ``self.session_mgr.get()``. The
lifecycle of these sessions (reuse, pooling, and cleanup) is
managed outside this method by the engine/session manager.
- If a list of sessions is supplied, the caller retains
ownership of those sessions. This method will not close, end,
or free any session; callers are responsible for any required
cleanup (for example via ``SessionManager`` or
:meth:`free_cache`) and for deciding whether to reuse the
sessions for subsequent calls.
sequence_start (bool, optional): Whether the provided ``input_ids``
should be treated as the start of a new sequence for each
session. This flag is forwarded to :meth:`safe_run` unchanged.
sequence_end (bool, optional): Whether this call marks the end of
the sequence for each session. This flag is forwarded to
:meth:`safe_run` unchanged.
Returns:
List[torch.Tensor]: A list of tensors containing the logits for each
input sequence. The list is index-aligned with ``input_ids`` such
that element ``i`` corresponds to ``input_ids[i]``. For each
sequence, the returned tensor has shape ``(L_i, V)`` where ``L_i``
is the length of ``input_ids[i]`` and ``V`` is the vocabulary size.
Only logits corresponding to the provided input tokens are
returned; any generated tokens (if produced internally) are not
included.
"""

@zhulinJulia24 (Collaborator) commented:
base evaluate testcase passed!

@lvhan028 lvhan028 merged commit 13e277f into InternLM:main Feb 3, 2026
4 of 5 checks passed
