MINOR: Add readiness check for connector and separate Kafka cluster in ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic #16306

Merged
C0urante merged 1 commit into apache:trunk from C0urante:fix-flaky-testSeparateOffsetsTopic
Jun 13, 2024

@C0urante C0urante (Contributor) commented Jun 12, 2024

Similar to #16286.

This test is pretty flaky and has failed on 7% of all trunk builds in the last 90 days (see Gradle Enterprise).

Part of this test includes bringing up a separate Kafka cluster that is targeted by a source connector. We do not currently wait for that Kafka cluster to start successfully before starting the connector. We also do not wait for the connector and its tasks to start successfully before waiting, within a bounded timeout, for the connector to produce records.

By adding assertions that the separate Kafka cluster and the connector+tasks are healthy before waiting for the connector to produce records, we accomplish two things:

  • We reduce the chance of flaky failures by allowing more time to pass for more resource-intensive operations to complete (5 minutes for Kafka cluster startup and 2 minutes for connector+tasks startup, vs. 30 seconds for record production)
  • We also provide more granularity into possible causes of failure; if the separate Kafka cluster or the connector+tasks fail to start, tests should report that failure directly, instead of simply reporting that not enough records were produced in time

Although there is a decent chance that this change will reduce flakiness for the affected test, the second benefit (more informative failure messages) is IMO significant enough that a close examination of logs for failed builds, multiple CI runs with this change, or other time-consuming efforts are not warranted.
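
For illustration, the staged waits described above could be sketched roughly as follows. This is a minimal, self-contained sketch and not the code in this PR: `ReadinessChecks`, `waitFor`, and the three condition suppliers are hypothetical stand-ins (the real test relies on Kafka's embedded cluster test utilities), and only the timeouts (5 minutes, 2 minutes, 30 seconds) come from the description.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class ReadinessChecks {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * On timeout, fails with a message that names the stage, so the
     * cause of the failure is visible directly in the test report.
     */
    static void waitFor(String stage, Duration timeout, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new AssertionError(stage + " did not complete within " + timeout);
            }
            try {
                Thread.sleep(10); // short pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError(stage + " was interrupted", e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();

        // Hypothetical stand-ins for real health checks; each becomes true after a short delay.
        BooleanSupplier clusterReady = () -> System.currentTimeMillis() - start >= 50;
        BooleanSupplier connectorRunning = () -> System.currentTimeMillis() - start >= 100;
        BooleanSupplier recordsProduced = () -> System.currentTimeMillis() - start >= 150;

        // Slower, more resource-intensive stages get longer timeouts (values from the PR description).
        waitFor("Kafka cluster startup", Duration.ofMinutes(5), clusterReady);
        waitFor("Connector and task startup", Duration.ofMinutes(2), connectorRunning);
        waitFor("Record production", Duration.ofSeconds(30), recordsProduced);

        System.out.println("all stages passed");
    }
}
```

Because each stage fails with its own message, a cluster that never starts would be reported as such, rather than as a shortage of produced records.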

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@C0urante C0urante added the connect and tests (Test fixes, including flaky tests) labels Jun 12, 2024
@C0urante C0urante force-pushed the fix-flaky-testSeparateOffsetsTopic branch 2 times, most recently from 083b1ea to 2bf4407 Compare June 12, 2024 17:24
…n ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic
@C0urante C0urante force-pushed the fix-flaky-testSeparateOffsetsTopic branch from 2bf4407 to d774fe2 Compare June 12, 2024 17:25
@gharris1727 gharris1727 (Contributor) left a comment

LGTM, thanks Chris!

@C0urante C0urante merged commit 9ddd58b into apache:trunk Jun 13, 2024
@C0urante C0urante deleted the fix-flaky-testSeparateOffsetsTopic branch June 13, 2024 03:43
C0urante added a commit that referenced this pull request Jun 13, 2024
…n ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic (#16306)

Reviewers: Greg Harris <gharris1727@gmail.com>
apourchet added a commit to apourchet/kafka that referenced this pull request Jun 13, 2024
commit f380cd1
Author: Edoardo Comar <ecomar@uk.ibm.com>
Date:   Thu Jun 13 15:01:08 2024 +0100

    MINOR: Add integration tag to AdminFenceProducersIntegrationTest (apache#16326)

    Add @Tag("integration") to AdminFenceProducersIntegrationTest

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 11c85a9
Author: Dongnuo Lyu <139248811+dongnuo123@users.noreply.github.com>
Date:   Thu Jun 13 05:11:01 2024 -0400

    MINOR: Make online downgrade failure logs less noisy and update the timeouts scheduled in `convertToConsumerGroup` (apache#16290)

    This patch:
    - changes the order of the checks in `validateOnlineDowngrade`, so that `online downgrade is disabled` is logged only when the last member using the consumer protocol leaves, the group still has classic member(s), and the policy doesn't allow downgrade.
    - changes the session timeout in `convertToConsumerGroup` from `consumerGroupSessionTimeoutMs` to `member.classicProtocolSessionTimeout().get()`.

    Reviewers: David Jacot <djacot@confluent.io>

commit ea60666
Author: Ken Huang <100591800+m1a2st@users.noreply.github.com>
Date:   Thu Jun 13 17:11:37 2024 +0900

    KAFKA-16921 [1/N] Migrate all junit 4 code to junit 5 for connect module (apache#16253)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 596b945
Author: gongxuanzhang <gongxuanzhang@foxmail.com>
Date:   Thu Jun 13 15:39:32 2024 +0800

    KAFKA-16643 Add ModifierOrder checkstyle rule (apache#15890)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 103ff5c
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Thu Jun 13 01:32:39 2024 -0600

    KAFKA-15045: (KIP-924 pt. 24) internal TaskAssignor rename to LegacyTaskAssignor (apache#16318)

    Since the new public API for TaskAssignor shared a name, this rename will prevent users from confusing the internal definition with the public one.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit e59c887
Author: brenden20 <118419078+brenden20@users.noreply.github.com>
Date:   Thu Jun 13 02:30:05 2024 -0500

    KAFKA-16557 Implemented OffsetFetchRequestState toStringBase and added a test for it (apache#16291)

    Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

commit dd6fcc6
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 14:35:33 2024 +0800

    KAFKA-16901 Add unit tests for ConsumerRecords#records(String) (apache#16227)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit fe98888
Author: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Date:   Thu Jun 13 08:31:16 2024 +0200

    MINOR: Improving log for outstanding requests on close and cleanup (apache#16304)

    Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>

commit 9ddd58b
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 05:43:33 2024 +0200

    MINOR: Add readiness check for connector and separate Kafka cluster in ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic (apache#16306)

    Reviewers: Greg Harris <gharris1727@gmail.com>

commit 0a203a9
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 09:47:51 2024 +0800

    KAFKA-16938 non-dynamic props gets corrupted due to circular reference between DynamicBrokerConfig and DynamicConfig. (apache#16302)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 6d1f8f8
Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
Date:   Thu Jun 13 02:42:39 2024 +0100

    MINOR: Clean up for KafkaAdminClientTest (apache#16285)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit e76e1da
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 02:18:23 2024 +0200

    KAFKA-16935: Automatically wait for cluster startup in embedded Connect integration tests (apache#16288)

    Reviewers: Greg Harris <gharris1727@gmail.com>
apourchet added a commit to apourchet/kafka that referenced this pull request Jun 13, 2024
commit 4333af5
Author: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Date:   Thu Jun 13 11:27:50 2024 -0700

    KAFKA-15045: (KIP-924 pt. 25) Rename old internal StickyTaskAssignor to LegacyStickyTaskAssignor (apache#16322)

    To avoid confusion in 3.8/until we fully remove all the old task assignors and internal config, we should rename the old internal assignor classes like the StickyTaskAssignor so that they won't be mixed up with the new version of the assignor (which is also named StickyTaskAssignor)

    Reviewers: Bruno Cadonna <cadonna@apache.org>, Josep Prat <josep.prat@aiven.io>
