Skip to content

KAFKA-16935: Automatically wait for cluster startup in embedded Connect integration tests#16288

Merged
C0urante merged 3 commits intoapache:trunkfrom
C0urante:kafka-16935
Jun 13, 2024
Merged

KAFKA-16935: Automatically wait for cluster startup in embedded Connect integration tests#16288
C0urante merged 3 commits intoapache:trunkfrom
C0urante:kafka-16935

Conversation

@C0urante
Copy link
Copy Markdown
Contributor

@C0urante C0urante commented Jun 11, 2024

Jira

Alters the EmbeddedConnect::start method to automatically wait for the successful startup of both the embedded Kafka and Connect clusters.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

final Collection<String> topics = Arrays.asList(topic1, topic2, topic3);

Map<String, String> connectorProps = baseSinkConnectorProps(String.join(",", topics));
// Need an eager assignor here; round robin is as good as any
Copy link
Copy Markdown
Contributor Author

@C0urante C0urante Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary for this PR. Just cleaning up (this comment is inaccurate) while I'm in the neighborhood.

Comment on lines -123 to -125
TestUtils.waitForCondition(
() -> connect.requestPost(connectorTasksEndpoint, "[]", invalidSignatureHeaders).getStatus() == FORBIDDEN.getStatusCode(),
30000L, "Timed out waiting for workers to start");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be no need for retry logic in this part.

@@ -169,7 +164,6 @@ public void testFailToStartWhenInternalTopicsAreNotCompacted() throws Interrupte
// Start the brokers but not Connect
log.info("Starting {} Kafka brokers, but no Connect workers yet", numBrokers);
connect.start();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, this sets numWorkers=0, and therefore can call the blocking start() method safely.

WDYT about changing testFailToCreateInternalTopicsWithMoreReplicasThanBrokers to use the same pattern? That would eliminate the need for the non-blocking start method, and simplify the control flow.

The minority of call-sites are going to expect the workers to fail to start up, so I think it's okay to use a workaround instead of giving them a first-class method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to rework testFailToCreateInternalTopicsWithMoreReplicasThanBrokers, but just in case someone downstream is using start in a way that relies on existing behavior, I'd also like to keep in the non-blocking method, even if we don't use it in any of our tests. I know we're not under any obligation to but it's such a small change that the negligible maintenance burden seems worth the tradeoff. Does that work for you?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For downstream users intentionally giving the embedded connect bad configs, they will need to alter the test anyway, so I'd prefer that they follow the pattern used by upstream.

I don't think that anyone should really be relying on this behavior, otherwise I would make an argument for start() remaining non-blocking (which i definitely don't want).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great point. I've removed the overloaded variant and tweaked the only remaining integration test that uses it. That test itself is actually subtly broken since it doesn't (and never did) synchronously wait for worker startup before checking that internal topics don't exist, but since this PR doesn't introduce a regression I've thrown a TODO into the codebase and left everything else for a follow-up.

Copy link
Copy Markdown
Contributor

@gharris1727 gharris1727 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Chris!

Could you create a newbie ticket for fixing the wait condition InternalTopicsIntegrationTest#testFailToCreateInternalTopicsWithMoreReplicasThanBrokers?

@C0urante
Copy link
Copy Markdown
Contributor Author

Done! Filed KAFKA-16943.

@C0urante C0urante merged commit e76e1da into apache:trunk Jun 13, 2024
@C0urante C0urante deleted the kafka-16935 branch June 13, 2024 00:18
C0urante added a commit that referenced this pull request Jun 13, 2024
…ct integration tests (#16288)

Reviewers: Greg Harris <gharris1727@gmail.com>
apourchet added a commit to apourchet/kafka that referenced this pull request Jun 13, 2024
commit f380cd1
Author: Edoardo Comar <ecomar@uk.ibm.com>
Date:   Thu Jun 13 15:01:08 2024 +0100

    MINOR: Add integration tag to AdminFenceProducersIntegrationTest (apache#16326)

    Add @tag("integration") to AdminFenceProducersIntegrationTest

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 11c85a9
Author: Dongnuo Lyu <139248811+dongnuo123@users.noreply.github.com>
Date:   Thu Jun 13 05:11:01 2024 -0400

    MINOR: Make online downgrade failure logs less noisy and update the timeouts scheduled in `convertToConsumerGroup` (apache#16290)

    This patch:
    - changes the order of the checks in `validateOnlineDowngrade`, so that only when the last member using the consumer protocol leave and the group still has classic member(s), `online downgrade is disabled` is logged if the policy doesn't allow downgrade.
    - changes the session timeout in `convertToConsumerGroup` from `consumerGroupSessionTimeoutMs` to `member.classicProtocolSessionTimeout().get()`.

    Reviewers: David Jacot <djacot@confluent.io>

commit ea60666
Author: Ken Huang <100591800+m1a2st@users.noreply.github.com>
Date:   Thu Jun 13 17:11:37 2024 +0900

    KAFKA-16921 [1/N] Migrate all junit 4 code to junit 5 for connect module (apache#16253)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 596b945
Author: gongxuanzhang <gongxuanzhang@foxmail.com>
Date:   Thu Jun 13 15:39:32 2024 +0800

    KAFKA-16643 Add ModifierOrder checkstyle rule (apache#15890)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 103ff5c
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Thu Jun 13 01:32:39 2024 -0600

    KAFKA-15045: (KIP-924 pt. 24) internal TaskAssignor rename to LegacyTaskAssignor (apache#16318)

    Since the new public API for TaskAssignor shared a name, this rename will prevent users from confusing the internal definition with the public one.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit e59c887
Author: brenden20 <118419078+brenden20@users.noreply.github.com>
Date:   Thu Jun 13 02:30:05 2024 -0500

    KAFKA-16557 Implemented OffsetFetchRequestState toStringBase and added a test for it (apache#16291)

    Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

commit dd6fcc6
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 14:35:33 2024 +0800

    KAFKA-16901 Add unit tests for ConsumerRecords#records(String) (apache#16227)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit fe98888
Author: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Date:   Thu Jun 13 08:31:16 2024 +0200

    MINOR: Improving log for outstanding requests on close and cleanup (apache#16304)

    Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>

commit 9ddd58b
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 05:43:33 2024 +0200

    MINOR: Add readiness check for connector and separate Kafka cluster in ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic (apache#16306)

    Reviewers: Greg Harris <gharris1727@gmail.com>

commit 0a203a9
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 09:47:51 2024 +0800

    KAFKA-16938 non-dynamic props gets corrupted due to circular reference between DynamicBrokerConfig and DynamicConfig. (apache#16302)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 6d1f8f8
Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
Date:   Thu Jun 13 02:42:39 2024 +0100

    MINOR: Clean up for KafkaAdminClientTest (apache#16285)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit e76e1da
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 02:18:23 2024 +0200

    KAFKA-16935: Automatically wait for cluster startup in embedded Connect integration tests (apache#16288)

    Reviewers: Greg Harris <gharris1727@gmail.com>
apourchet added a commit to apourchet/kafka that referenced this pull request Jun 13, 2024
commit 4333af5
Author: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Date:   Thu Jun 13 11:27:50 2024 -0700

    KAFKA-15045: (KIP-924 pt. 25) Rename old internal StickyTaskAssignor to LegacyStickyTaskAssignor (apache#16322)

    To avoid confusion in 3.8/until we fully remove all the old task assignors and internal config, we should rename the old internal assignor classes like the StickyTaskAssignor so that they won't be mixed up with the new version of the assignor (which is also named StickyTaskAssignor)

    Reviewers: Bruno Cadonna <cadonna@apache.org>, Josep Prat <josep.prat@aiven.io>

commit f380cd1
Author: Edoardo Comar <ecomar@uk.ibm.com>
Date:   Thu Jun 13 15:01:08 2024 +0100

    MINOR: Add integration tag to AdminFenceProducersIntegrationTest (apache#16326)

    Add @tag("integration") to AdminFenceProducersIntegrationTest

    Reviewers: Chris Egerton <chrise@aiven.io>

commit 11c85a9
Author: Dongnuo Lyu <139248811+dongnuo123@users.noreply.github.com>
Date:   Thu Jun 13 05:11:01 2024 -0400

    MINOR: Make online downgrade failure logs less noisy and update the timeouts scheduled in `convertToConsumerGroup` (apache#16290)

    This patch:
    - changes the order of the checks in `validateOnlineDowngrade`, so that only when the last member using the consumer protocol leave and the group still has classic member(s), `online downgrade is disabled` is logged if the policy doesn't allow downgrade.
    - changes the session timeout in `convertToConsumerGroup` from `consumerGroupSessionTimeoutMs` to `member.classicProtocolSessionTimeout().get()`.

    Reviewers: David Jacot <djacot@confluent.io>

commit ea60666
Author: Ken Huang <100591800+m1a2st@users.noreply.github.com>
Date:   Thu Jun 13 17:11:37 2024 +0900

    KAFKA-16921 [1/N] Migrate all junit 4 code to junit 5 for connect module (apache#16253)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 596b945
Author: gongxuanzhang <gongxuanzhang@foxmail.com>
Date:   Thu Jun 13 15:39:32 2024 +0800

    KAFKA-16643 Add ModifierOrder checkstyle rule (apache#15890)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 103ff5c
Author: Antoine Pourchet <antoine@responsive.dev>
Date:   Thu Jun 13 01:32:39 2024 -0600

    KAFKA-15045: (KIP-924 pt. 24) internal TaskAssignor rename to LegacyTaskAssignor (apache#16318)

    Since the new public API for TaskAssignor shared a name, this rename will prevent users from confusing the internal definition with the public one.

    Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>

commit e59c887
Author: brenden20 <118419078+brenden20@users.noreply.github.com>
Date:   Thu Jun 13 02:30:05 2024 -0500

    KAFKA-16557 Implemented OffsetFetchRequestState toStringBase and added a test for it (apache#16291)

    Reviewers: Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>

commit dd6fcc6
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 14:35:33 2024 +0800

    KAFKA-16901 Add unit tests for ConsumerRecords#records(String) (apache#16227)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit fe98888
Author: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Date:   Thu Jun 13 08:31:16 2024 +0200

    MINOR: Improving log for outstanding requests on close and cleanup (apache#16304)

    Reviewers: Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>

commit 9ddd58b
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 05:43:33 2024 +0200

    MINOR: Add readiness check for connector and separate Kafka cluster in ExactlyOnceSourceIntegrationTest::testSeparateOffsetsTopic (apache#16306)

    Reviewers: Greg Harris <gharris1727@gmail.com>

commit 0a203a9
Author: TingIāu "Ting" Kì <kitingiao@gmail.com>
Date:   Thu Jun 13 09:47:51 2024 +0800

    KAFKA-16938 non-dynamic props gets corrupted due to circular reference between DynamicBrokerConfig and DynamicConfig. (apache#16302)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit 6d1f8f8
Author: Gantigmaa Selenge <39860586+tinaselenge@users.noreply.github.com>
Date:   Thu Jun 13 02:42:39 2024 +0100

    MINOR: Clean up for KafkaAdminClientTest (apache#16285)

    Reviewers: Chia-Ping Tsai <chia7712@gmail.com>

commit e76e1da
Author: Chris Egerton <chrise@aiven.io>
Date:   Thu Jun 13 02:18:23 2024 +0200

    KAFKA-16935: Automatically wait for cluster startup in embedded Connect integration tests (apache#16288)

    Reviewers: Greg Harris <gharris1727@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

connect tests Test fixes (including flaky tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants