[KAFKA-18442] Remove broken upgrade-downgrade-upgrade path. #18424

Closed
josefk31 wants to merge 1 commit into apache:trunk from josefk31:jhooper/nobrokenupgradedowngrade

@josefk31
Contributor

@josefk31 josefk31 commented Jan 7, 2025

Fixes broken system tests.

This has likely been broken for at least a year but may have only been detected now, when new tests were added.

The test does the following:

  1. Starts Kafka on an older version.
  2. Upgrades Kafka to the newest version (stopping, starting).
  3. Downgrades to the old version again (stopping, starting).

Throughout, a verifiable producer and consumer run to make sure all messages are read.

The test suite fails only for kafka-3.3.2, during the start phase of step 3. According to the logs, the cause is:

kafka.common.InconsistentBrokerMetadataException: BrokerMetadata is not consistent across log.dirs. This could happen if multiple brokers shared a log directory (log.dirs) or partial data was manually copied from another broker. Found:
- /mnt/kafka/kafka-metadata-logs -> {node.id=1, directory.id=ItAoMTrsidYVfoRnX3gsAA, version=1, cluster.id=I2eXt9rvSnyhct8BYmW6-w}
- /mnt/kafka/kafka-data-logs-2 -> {node.id=1, directory.id=MiQDnIX6WuYL0NdMaLOsRQ, version=1, cluster.id=I2eXt9rvSnyhct8BYmW6-w}
- /mnt/kafka/kafka-data-logs-1 -> {node.id=1, directory.id=F1m5lsdOIsGtTpTYT0Ao9g, version=1, cluster.id=I2eXt9rvSnyhct8BYmW6-w}

	at kafka.server.BrokerMetadataCheckpoint$.getBrokerMetadataAndOfflineDirs(BrokerMetadataCheckpoint.scala:194)
	at kafka.server.KafkaRaftServer$.initializeLogDirs(KafkaRaftServer.scala:184)
	at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:61)
	at kafka.Kafka$.buildServer(Kafka.scala:79)
	at kafka.Kafka$.main(Kafka.scala:87)
	at kafka.Kafka.main(Kafka.scala)

This is only broken in the 3.3.2 version of Kafka's BrokerMetadataCheckpoint.scala. In 3.3.2, Kafka loads information about metadata directories from metadata.properties files and expects the properties to be identical across all log directories. It crashes with a fatal error if they differ, which at that time would have meant that another instance of Kafka was using the same log directories.
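
To illustrate the failure, here is a minimal Python sketch of a strict check of the kind described above. It is not the Scala code from BrokerMetadataCheckpoint.scala: `strict_check` and `InconsistentBrokerMetadataError` are hypothetical names, and the property values are copied from the log output.

```python
class InconsistentBrokerMetadataError(Exception):
    """Hypothetical stand-in for kafka.common.InconsistentBrokerMetadataException."""


def strict_check(props_by_dir):
    """Require the properties of every log dir to be identical (assumes a non-empty dict)."""
    distinct = {tuple(sorted(props.items())) for props in props_by_dir.values()}
    if len(distinct) > 1:
        found = "\n".join(f"- {d} -> {p}" for d, p in sorted(props_by_dir.items()))
        raise InconsistentBrokerMetadataError(
            "BrokerMetadata is not consistent across log.dirs. Found:\n" + found
        )
    return next(iter(props_by_dir.values()))


CLUSTER_ID = "I2eXt9rvSnyhct8BYmW6-w"
# Values taken from the exception message; only directory.id differs per directory.
PROPS_BY_DIR = {
    "/mnt/kafka/kafka-metadata-logs": {
        "node.id": "1", "directory.id": "ItAoMTrsidYVfoRnX3gsAA",
        "version": "1", "cluster.id": CLUSTER_ID,
    },
    "/mnt/kafka/kafka-data-logs-2": {
        "node.id": "1", "directory.id": "MiQDnIX6WuYL0NdMaLOsRQ",
        "version": "1", "cluster.id": CLUSTER_ID,
    },
    "/mnt/kafka/kafka-data-logs-1": {
        "node.id": "1", "directory.id": "F1m5lsdOIsGtTpTYT0Ao9g",
        "version": "1", "cluster.id": CLUSTER_ID,
    },
}
```

Calling `strict_check(PROPS_BY_DIR)` raises, because the three directories differ in `directory.id` even though `node.id` and `cluster.id` agree.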

However, at some point before #14628 we dropped the requirement that the metadata.properties files be identical in each directory, since each file now contains its own directory.id field. This has no effect on versions of Kafka running in KRaft mode from 3.4.x onward (they care only that node.id and cluster.id match across directories) but does affect 3.3.2, since it expects every metadata.properties file to be the same and fails otherwise.
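
For contrast, a relaxed check that looks only at the broker identity fields could be sketched as follows. This is an assumption about the shape of the newer behaviour based on the description above, not the actual Kafka implementation; `relaxed_check`, `IDENTITY_KEYS`, and the sample values are hypothetical.

```python
# Fields that must agree across log dirs; directory.id is deliberately excluded.
IDENTITY_KEYS = ("node.id", "cluster.id")


def relaxed_check(props_by_dir):
    """Compare only node.id and cluster.id across dirs (assumes a non-empty dict)."""
    identities = {
        tuple(props.get(key) for key in IDENTITY_KEYS)
        for props in props_by_dir.values()
    }
    if len(identities) > 1:
        raise ValueError(f"log dirs disagree on {IDENTITY_KEYS}: {identities}")
    return dict(zip(IDENTITY_KEYS, next(iter(identities))))


# Two dirs with the same identity but different directory.id values.
SAMPLE = {
    "/mnt/kafka/kafka-data-logs-1": {
        "node.id": "1", "cluster.id": "abc", "directory.id": "dir-one",
    },
    "/mnt/kafka/kafka-data-logs-2": {
        "node.id": "1", "cluster.id": "abc", "directory.id": "dir-two",
    },
}
```

Here `relaxed_check(SAMPLE)` succeeds despite the differing `directory.id` values, while a directory reporting a different `node.id` or `cluster.id` would still be rejected.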

If we assume that this has been broken for a while and there is not a forward compat requirement, I propose removing this specific test version.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@github-actions github-actions bot added the triage (PRs from the community), tests (Test fixes, including flaky tests), and small (Small PRs) labels Jan 7, 2025
@cmccabe
Contributor

cmccabe commented Jan 7, 2025

Apache Kafka 3.3 is no longer supported

Does this affect all versions of kafka earlier than 3.4?

We should have a JIRA pointing out the software versions that we can't downgrade to, not just a "MINOR" PR

@josefk31 josefk31 changed the title from "[MINOR] Remove broken upgrade-downgrade-upgrade path." to "[KAFKA-18442] Remove broken upgrade-downgrade-upgrade path." Jan 8, 2025
@josefk31
Contributor Author

josefk31 commented Jan 8, 2025

> Apache Kafka 3.3 is no longer supported
>
> Does this affect all versions of kafka earlier than 3.4?
>
> We should have a JIRA pointing out the software versions that we can't downgrade to, not just a "MINOR" PR

  1. This specific issue appears in versions from 2.8.x up to and including 3.3.x. The first Kafka release with the check was 2.8.

  2. Created KAFKA-18442 to track this issue; it contains the details.
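
The range in point 1 can be expressed as a small helper, for example to decide which versions to skip as downgrade targets in a test matrix. This is only a sketch: `parse_version` and `has_strict_metadata_check` are hypothetical names, and the bounds come solely from the statement above.

```python
def parse_version(version):
    """Turn a string like '3.3.2' into (3, 3, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_strict_metadata_check(version):
    """True for 2.8.x up to and including 3.3.x, the range stated above."""
    major_minor = parse_version(version)[:2]
    return (2, 8) <= major_minor < (3, 4)
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as "3.10" sorting before "3.4".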

@github-actions

A label of 'needs-attention' was automatically added to this PR in order to raise the
attention of the committers. Once this issue has been triaged, the triage label
should be removed to prevent this automation from happening again.

@TaiJuWu
Collaborator

TaiJuWu commented Jan 16, 2025

This was resolved by #18386

@josefk31
Contributor Author

Closing since #18386 resolves this.

Labels

needs-attention, small (Small PRs), tests (Test fixes, including flaky tests), triage (PRs from the community)

3 participants