[FLINK-40003][runtime] Fix IOMetrics not visible to state transition listeners #28555

Open
cjohnson-confluent wants to merge 1 commit into apache:master from confluentinc:FLINK-40003-iometrics-state-listeners

@cjohnson-confluent

What is the purpose of the change

ExecutionStateUpdateListeners registered on the ExecutionGraph are notified
inline during Execution.transitionState(). When a listener reads
execution.getIOMetrics() during a terminal state notification, it always
gets null because updateAccumulatorsAndMetrics() is called after
transitionState() in both markFinished() and processFail().

completeCancelling() already has the correct ordering -- metrics are stored
before the state transition. This PR aligns the other two methods.

Brief change log

  • Execution.markFinished(): move updateAccumulatorsAndMetrics() before
    transitionState(current, FINISHED)
  • Execution.processFail(): move updateAccumulatorsAndMetrics() before
    transitionState(current, FAILED)
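
The effect of the reordering can be illustrated with a toy model. This is a minimal sketch, not Flink code: `ToyExecution`, its `Long` stand-in for IOMetrics, the `metricsFirst` flag, and `observedByListener` are all hypothetical names introduced here for illustration. It only assumes what the description states, namely that listeners run inline during the state transition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model only: every name here is hypothetical and none of it is Flink's API.
class ToyExecution {
    enum State { RUNNING, FINISHED }

    private State state = State.RUNNING;
    private Long ioMetrics; // stands in for IOMetrics; null until stored
    private final List<Consumer<ToyExecution>> listeners = new ArrayList<>();

    void addListener(Consumer<ToyExecution> listener) {
        listeners.add(listener);
    }

    Long getIOMetrics() {
        return ioMetrics;
    }

    State getState() {
        return state;
    }

    // Listeners are notified inline, before this method returns.
    private void transitionState(State target) {
        state = target;
        for (Consumer<ToyExecution> listener : listeners) {
            listener.accept(this);
        }
    }

    private void updateMetrics(long value) {
        ioMetrics = value;
    }

    // metricsFirst = false mimics the old ordering; true mimics the proposed one.
    void markFinished(long metrics, boolean metricsFirst) {
        if (metricsFirst) {
            updateMetrics(metrics);
            transitionState(State.FINISHED);
        } else {
            transitionState(State.FINISHED);
            updateMetrics(metrics);
        }
    }

    // Returns what a listener read from getIOMetrics() at notification time.
    static Long observedByListener(boolean metricsFirst) {
        ToyExecution execution = new ToyExecution();
        Long[] seen = new Long[1];
        execution.addListener(e -> seen[0] = e.getIOMetrics());
        execution.markFinished(42L, metricsFirst);
        return seen[0];
    }
}
```

In this model, `ToyExecution.observedByListener(false)` returns null because the listener runs before the metrics are stored, while `ToyExecution.observedByListener(true)` returns the stored value.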

Verifying this change

New test testIOMetricsVisibleToListenersDuringStateTransition in
DefaultExecutionGraphDeploymentTest registers an ExecutionStateUpdateListener
and asserts that getIOMetrics() is non-null (and correct) at notification time
for both FINISHED and FAILED transitions.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

…listeners

Execution.markFinished() and processFail() called transitionState()
before updateAccumulatorsAndMetrics(), so listeners notified inline
during the transition always saw null from getIOMetrics().
completeCancelling() already had the correct ordering.

Move updateAccumulatorsAndMetrics() before transitionState() in both
methods to match.

Co-Authored-By: Claude <noreply@anthropic.com>
@flinkbot

flinkbot commented Jun 26, 2026

Collaborator

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@spuru9 spuru9 left a comment

Contributor

There appears to be an issue with missing variables in the CI. Can you check and fix those?

@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Jun 27, 2026

3 participants