Fix: More accurate perf metrics for slow-starting streams #385

Merged
Aaron ("AJ") Steers (aaronsteers) merged 3 commits into main from feat/improved-perf-metrics-for-slow-starting-streams
Sep 20, 2024

Conversation

Aaron ("AJ") Steers (aaronsteers) (Member) commented Sep 20, 2024

Prior to this PR, PyAirbyte recorded a stream's start time when the first record was received. This works well in most cases, but if a stream takes a while to send its first record, our metrics are skewed and the sync looks faster than it really is.

This update uses the start trace event to get a better start time for each stream.
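
To illustrate the skew, here is a minimal sketch with made-up numbers. The function name `records_per_second` and all timings are hypothetical and are not taken from PyAirbyte; the sketch only compares throughput measured from the first record against throughput measured from the stream's start event.

```python
def records_per_second(record_count: int, start: float, end: float) -> float:
    """Return throughput over the window [start, end], in records per second."""
    elapsed = end - start
    if elapsed <= 0:
        raise ValueError("end must be later than start")
    return record_count / elapsed


# Hypothetical timeline (seconds since the sync began):
stream_started_at = 0.0   # source emits the stream's start trace event
first_record_at = 40.0    # source spends 40 s before sending any record
stream_ended_at = 50.0    # last record arrives
record_count = 1_000

# Old behavior: the clock starts at the first record, so the slow start is ignored.
rate_from_first_record = records_per_second(record_count, first_record_at, stream_ended_at)

# New behavior: the clock starts at the start trace event.
rate_from_start_event = records_per_second(record_count, stream_started_at, stream_ended_at)

print(f"From first record: {rate_from_first_record:.0f} records/s")
print(f"From start event:  {rate_from_start_event:.0f} records/s")
```

With these numbers, the first-record measurement reports 100 records/s while the start-event measurement reports 20 records/s for the same sync.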

Summary by CodeRabbit

  • New Features

    • Improved logging functionality for stream reading, ensuring clearer and more efficient tracking of stream start and end events.
    • Enhanced handling of state messages for better clarity and robustness in data processing.
  • Bug Fixes

    • Simplified logic for logging stream statuses, reducing complexity and potential errors in state message handling.

coderabbitai Bot (Contributor) commented Sep 20, 2024

Walkthrough

The pull request introduces modifications to the airbyte/progress.py file, focusing on enhancing the logging functionality for stream reading and state messages. Key changes include the removal of an unused type import, updates to the methods responsible for logging the start and end of stream reads, and a simplification of state message handling. These adjustments aim to improve clarity and maintainability in the logging process.

Changes

File: airbyte/progress.py
Change summary:
  • Removed unused import for Type.
  • Replaced _log_stream_read_start with log_stream_start, ensuring single logging per stream read.
  • Simplified end logging logic by checking stream status directly.
  • Updated state message handling to access message.state directly and ensure record count is robust against None.

Possibly related PRs

Suggested labels

enable-ai-review


Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 8adc080 and bb485ec.

Files selected for processing (1)
  • airbyte/progress.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • airbyte/progress.py

Aaron ("AJ") Steers (aaronsteers) (Member, Author) commented Sep 20, 2024

/fix-pr

Auto-Fix Job Info

This job attempts to auto-fix any linting or formatting issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.
(This job requires that the PR author has "Allow edits from maintainers" enabled.)

PR auto-fix job started... Check job output.

✅ Changes applied successfully.

coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 0

Outside diff range and nitpick comments (1)
airbyte/progress.py (1)

425-425: Consider splitting the long line, wdyt?

Ruff flagged line 425 for exceeding the line length limit of 100 characters. To improve readability and adhere to the coding style, how about splitting it into multiple lines like this:

self._print_info_message(
    f"Read started on stream `{stream_name}` at "
    f"`{pendulum.now().format('HH:mm:ss')}`..."
)
Tools
Ruff

425-425: Line too long (101 > 100)

(E501)

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f4a3170 and 8adc080.

Files selected for processing (1)
  • airbyte/progress.py (3 hunks)
Additional context used
Ruff
airbyte/progress.py

425-425: Line too long (101 > 100)

(E501)

Additional comments not posted (3)
airbyte/progress.py (3)

421-427: LGTM!

The log_stream_start function looks good. It consolidates the logging functionality and ensures the start time is only recorded once per stream. Nice work!
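
The "recorded once per stream" behavior mentioned above can be sketched as follows. This is a hypothetical stand-in, not the code in airbyte/progress.py: the class name `StreamTimer` and its `stream_start_times` attribute are assumptions made purely for illustration.

```python
import time


class StreamTimer:
    """Hypothetical tracker that keeps the first start time seen per stream."""

    def __init__(self) -> None:
        self.stream_start_times: dict[str, float] = {}

    def log_stream_start(self, stream_name: str) -> bool:
        """Record the start time once; return True only on the first call."""
        if stream_name in self.stream_start_times:
            return False  # Already started: keep the original timestamp.
        self.stream_start_times[stream_name] = time.time()
        return True


timer = StreamTimer()
first = timer.log_stream_start("users")
recorded = timer.stream_start_times["users"]
second = timer.log_stream_start("users")  # Ignored: start time is unchanged.
```

Guarding on the stream name means a repeated start signal cannot move the start time later and shrink the measured duration.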


Line range hint 429-432: Looks good!

The _log_stream_read_end function is straightforward and does its job of logging the stream read completion and recording the end time. No issues found.


279-287: Nicely done!

The changes in tally_records_read look great. The function now correctly handles the AirbyteStreamStatus.STARTED and AirbyteStreamStatus.COMPLETE trace messages, calling the appropriate logging functions. The logic is clean and easy to follow.
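
As a rough illustration of the dispatch described here, the sketch below routes stream status values to start and end handling. `StreamStatus`, `handle_status`, and the two dictionaries are hypothetical stand-ins defined locally; they do not reproduce the Airbyte protocol models or the actual `tally_records_read` implementation.

```python
from enum import Enum


class StreamStatus(Enum):
    """Local stand-in for the STARTED and COMPLETE statuses named above."""

    STARTED = "STARTED"
    COMPLETE = "COMPLETE"


def handle_status(
    stream_name: str,
    status: StreamStatus,
    now: float,
    start_times: dict[str, float],
    end_times: dict[str, float],
) -> None:
    """Record a start or end time depending on the stream status."""
    if status is StreamStatus.STARTED:
        # Keep the earliest start time if STARTED is seen more than once.
        start_times.setdefault(stream_name, now)
    elif status is StreamStatus.COMPLETE:
        end_times[stream_name] = now


start_times: dict[str, float] = {}
end_times: dict[str, float] = {}

# Hypothetical sequence of (stream, status, timestamp) events.
events = [
    ("users", StreamStatus.STARTED, 0.0),
    ("users", StreamStatus.COMPLETE, 50.0),
]
for name, status, ts in events:
    handle_status(name, status, ts, start_times, end_times)

elapsed = end_times["users"] - start_times["users"]
```

Because the start time comes from the status event rather than from the first record, `elapsed` covers the full duration of the stream, including any slow start.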

Aaron ("AJ") Steers (aaronsteers) changed the title from "Feat: Improve perf metrics for slow-starting streams" to "Fix: More accurate perf metrics for slow-starting streams" Sep 20, 2024
Aaron ("AJ") Steers (aaronsteers) deleted the feat/improved-perf-metrics-for-slow-starting-streams branch September 20, 2024 22:52