KAFKA-16977: Reapply dynamic remote configs after broker restart #16353

Merged
satishd merged 1 commit into apache:trunk from kamalcph:KAFKA-15265
Jun 18, 2024
Conversation

@kamalcph
Contributor

@kamalcph kamalcph commented Jun 15, 2024

The below remote log configs can be configured dynamically:

  1. remote.log.manager.copy.max.bytes.per.second
  2. remote.log.manager.fetch.max.bytes.per.second
  3. remote.log.index.file.cache.total.size.bytes

If those values are configured dynamically, then after a broker restart, the broker loads the static values from the config file instead of the dynamic values. Note that this issue occurs only when running the server with ZooKeeper.
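To illustrate the reported behavior, here is a minimal sketch (in Python, with hypothetical names — not Kafka's actual classes): a sub-config object snapshotted from the broker config at construction time keeps the static file values, while the live config view reflects the dynamic overrides reapplied at startup.

```python
# Hypothetical sketch of the bug: a sub-config snapshot taken at
# construction time does not see dynamic overrides applied later.

class RemoteCfgSnapshot:
    def __init__(self, props):
        self.props = props

    def copy_max_bytes_per_second(self):
        return self.props["remote.log.manager.copy.max.bytes.per.second"]


class BrokerConfig:
    def __init__(self, static_props):
        self.props = dict(static_props)  # values from server.properties
        # eager snapshot, analogous to a cached sub-config object
        self._remote_cfg = RemoteCfgSnapshot(dict(self.props))

    def apply_dynamic_overrides(self, overrides):
        # e.g. overrides persisted externally, reapplied after restart
        self.props.update(overrides)  # the live view is updated ...
        # ... but self._remote_cfg is NOT rebuilt: it keeps stale values

    def copy_max_bytes_per_second(self):
        # live accessor: always sees the dynamic value
        return self.props["remote.log.manager.copy.max.bytes.per.second"]

    def remote_log_manager_config(self):
        # stale snapshot: still sees the static value
        return self._remote_cfg


static = {"remote.log.manager.copy.max.bytes.per.second": 10 * 1024 * 1024}
cfg = BrokerConfig(static)
cfg.apply_dynamic_overrides(
    {"remote.log.manager.copy.max.bytes.per.second": 1024 * 1024})

print(cfg.copy_max_bytes_per_second())  # 1048576 (dynamic value)
print(cfg.remote_log_manager_config().copy_max_bytes_per_second())  # 10485760 (stale static value)
```

This mirrors the approach discussed below: read through the top-level config's live accessors rather than a sub-config snapshot cached at construction time.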

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@kamalcph kamalcph requested review from chia7712, satishd and showuon June 15, 2024 09:16
@kamalcph kamalcph added the tiered-storage (Related to the Tiered Storage feature) and backport-candidate (This pull request is a candidate to be backported to previous versions) labels Jun 15, 2024
Member

@chia7712 chia7712 left a comment


@kamalcph thanks for this patch. Please take a look at following questions.


indexCache = new RemoteIndexCache(rlmConfig.remoteLogIndexFileCacheTotalSizeBytes(), remoteLogStorageManager, logDir);
RemoteLogManagerConfig rlmConfig = config.remoteLogManagerConfig();
indexCache = new RemoteIndexCache(config.remoteLogIndexFileCacheTotalSizeBytes(), remoteLogStorageManager, logDir);
Member


Not sure whether I have caught the context, so please feel free to correct me.

  1. IIRC, the dynamic configs are loaded by another thread, so we may NOT see the latest dynamically updated configs when creating RemoteLogManager, right?
  2. Those configs (remote.log.manager.copy.max.bytes.per.second, remote.log.manager.fetch.max.bytes.per.second) can be updated by the reconfigure process, so it should be fine to initialize them with "stale" (static) configs after a broker restart, right?

Contributor Author

@kamalcph kamalcph Jun 16, 2024


  1. IIRC, the dynamic configs are loaded by another thread, so we may NOT see the latest dynamically updated configs when creating RemoteLogManager, right?

No, KafkaServer/BrokerServer calls config.dynamicConfig.initialize before creating the RemoteLogManager instance, so the dynamic configs get updated in the KafkaConfig object but not in KafkaConfig.remoteLogManagerConfig().

I have tested the patch only with ZooKeeper. I think the behavior should be similar for KRaftMetadataCache/ConfigRepository.

  1. Those configs (remote.log.manager.copy.max.bytes.per.second, remote.log.manager.fetch.max.bytes.per.second) can be updated by the reconfigure process, so it should be fine to initialize them with "stale" (static) configs after a broker restart, right?

This is correct: reconfigure updates the dynamic value, but we are referring to the static value, as explained above.

Member


I have tested the patch only with ZooKeeper. I think the behavior should be similar for KRaftMetadataCache/ConfigRepository.

That is a good point, and maybe they do have some differences.

zkClientOpt.foreach { zkClient =>

In KRaft, zkClientOpt is None, so it does not update the config with the dynamic parts.

Member


Yes, I confirmed that in KRaft it won't have this issue.

Member


Yes, I confirmed that in KRaft it won't have this issue.

Sorry, I'm not sure which issue you confirmed. If we are talking about dynamic configs at startup: according to the comments above, it seems to me that this fix, which tries to return the latest (dynamic) configs, works well only if Kafka is in ZK mode. In KRaft, this fix is a no-op, as it still returns static configs.

Please correct me if I'm lost.

Member


@chia7712, yes, you're right! For KRaft, this fix is a no-op.

LOGGER.debug("Received leadership changes for leaders: {} and followers: {}", partitionsBecomeLeader, partitionsBecomeFollower);

if (this.rlmConfig.isRemoteStorageSystemEnabled() && !isRemoteLogManagerConfigured()) {
if (config.remoteLogManagerConfig().isRemoteStorageSystemEnabled() && !isRemoteLogManagerConfigured()) {
Member


This is unrelated to this PR, but config.remoteLogManagerConfig always returns the same config even though the config is updated dynamically. That seems to be error-prone, since it means the "updated" config can return a "remoteLogManagerConfig" with "previous" configs.

Contributor Author

@kamalcph kamalcph Jun 16, 2024


Yes, this is the main issue. We access the remote configurations using KafkaConfig.remoteLogManagerConfig.xyz(), but it reflects the value in the static server.properties file, not the dynamically updated one.

So, I changed the usages to KafkaConfig.xyz() to get the dynamically updated value.

Member


As we have the great RemoteLogManagerConfig for remote storage, could we avoid moving configs out of RemoteLogManagerConfig? Especially since we are trying to reduce the size of KafkaConfig.

If this PR aims to fix it for ZK mode, maybe we can make sure the remoteLogManagerConfig returned by config always has the latest configs. For example:

  1. config.remoteLogManagerConfig always creates a new RemoteLogManagerConfig based on currentConfig
  2. KafkaConfig#updateCurrentConfig should also update the inner _remoteLogManagerConfig

@kamalcph WDYT?
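Option 1 above can be sketched as follows (a minimal Python sketch with hypothetical names, not Kafka's actual API; option 2 would instead refresh a cached field inside updateCurrentConfig):

```python
# Hypothetical sketch of the suggested alternative: instead of caching a
# RemoteLogManagerConfig snapshot at construction, derive it from the
# current (possibly dynamically updated) properties on every access.

class RemoteLogManagerConfig:
    def __init__(self, props):
        self.index_file_cache_total_size_bytes = props[
            "remote.log.index.file.cache.total.size.bytes"]

class BrokerConfig:
    def __init__(self, static_props):
        self.current_props = dict(static_props)

    def update_current_config(self, overrides):
        # dynamic reconfiguration path
        self.current_props.update(overrides)

    @property
    def remote_log_manager_config(self):
        # always rebuild the sub-config from currentConfig, so callers
        # never see a stale snapshot
        return RemoteLogManagerConfig(self.current_props)

cfg = BrokerConfig({"remote.log.index.file.cache.total.size.bytes": 1 << 30})
cfg.update_current_config(
    {"remote.log.index.file.cache.total.size.bytes": 1 << 28})
print(cfg.remote_log_manager_config.index_file_cache_total_size_bytes)  # 268435456
```

The trade-off is an allocation per access; updating the cached sub-config inside update_current_config would avoid that at the cost of keeping two views in sync.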

Contributor Author


Can we take this refactoring as part of KAFKA-16976 ticket?

Member


@chia7712 Good point. A similar issue is raised in the comment. We can fix the bug with the current approach in 3.8 and finish cleaning up the raised issues in a follow-up as part of KAFKA-16976.

Member


@kamalcph @satishd sounds good to me :)

Member

@satishd satishd left a comment


Thanks @kamalcph for finding this issue and the offline discussion on the changes.

The root cause here is that the RemoteLogManagerConfig instance is already initialized with the earlier values, and it is not updated when KafkaConfig.updateCurrentConfig is invoked during server initialization.

We should avoid passing KafkaConfig to RemoteLogManagerConfig, and we need to make sure that whenever KafkaConfig is updated, RemoteLogManagerConfig is updated as well. With the current changes, it is happening in two different ways, which we should avoid.

We should address it with the below behavior:

  • Update RemoteLogManagerConfig with the respective dynamic configs when a broker is initialized. In this case, the respective configs in RemoteLogManagerConfig get updated and there won't be any dangling old config references.
  • Update the respective runtime components (RLM) based on the modified dynamic configs. This can continue to happen as part of DynamicRemoteLogConfig.

@chia7712
Member

Update RemoteLogManagerConfig with the respective dynamic configs when a broker is initialized. In this case, the respective configs in RemoteLogManagerConfig should get updated and there won't be any dangling old config references.

There was a discussion about "can we see the (latest) dynamic configs when a broker is initialized" (#16353 (comment)). In KRaft, we create remoteLogManager [0] before updating dynamic configs [1]. Hence, it seems possible that remoteLogManager still sees the static configs. This behavior is different from ZK mode.

Those remoteLogManager configs [2] can be updated later when publishing metadata (latest configs), so that is possibly fine (?). Fixing the startup order seems to be a big issue, and maybe it is hard to guarantee that "all" components can see the latest configs in KRaft.

[0] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L208
[1] https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/BrokerServer.scala#L499
[2]

if (isChangedLongValue(RemoteLogManagerConfig.REMOTE_LOG_INDEX_FILE_CACHE_TOTAL_SIZE_BYTES_PROP)) {

@kamalcph
Contributor Author

Yes, the KRaft config updates look good to me. The DynamicConfigPublisher will eventually call BrokerReconfigurable#reconfigure, and then the configs get updated on the respective components.
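As a rough sketch of that mechanism (hypothetical names, not Kafka's actual interfaces): a component is constructed with the static value, and a later reconfigure callback from the publisher pushes the dynamic value into the running component.

```python
# Hypothetical sketch of the KRaft-mode flow described above: the
# component starts with the static value, and a later reconfigure()
# call applies the dynamic value at runtime.

class RemoteLogManager:
    def __init__(self, copy_quota_bytes_per_sec):
        self.copy_quota_bytes_per_sec = copy_quota_bytes_per_sec

    # BrokerReconfigurable-style hook
    def reconfigure(self, new_props):
        quota = new_props.get("remote.log.manager.copy.max.bytes.per.second")
        if quota is not None and quota != self.copy_quota_bytes_per_sec:
            self.copy_quota_bytes_per_sec = quota  # apply at runtime

class ConfigPublisher:
    def __init__(self):
        self.reconfigurables = []

    def register(self, component):
        self.reconfigurables.append(component)

    def publish(self, new_props):
        # invoked when metadata (including dynamic configs) is published
        for component in self.reconfigurables:
            component.reconfigure(new_props)

rlm = RemoteLogManager(copy_quota_bytes_per_sec=10_000_000)  # static value
publisher = ConfigPublisher()
publisher.register(rlm)
publisher.publish({"remote.log.manager.copy.max.bytes.per.second": 2_000_000})
print(rlm.copy_quota_bytes_per_sec)  # 2000000
```

So even though the component briefly runs with the static value, the publish step converges it to the dynamic value, which is why startup ordering is tolerable in KRaft mode.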

@kamalcph
Contributor Author

kamalcph commented Jun 17, 2024

We should avoid passing KafkaConfig to RemoteLogManagerConfig, and we need to make sure that whenever KafkaConfig is updated, RemoteLogManagerConfig is updated as well. With the current changes, it is happening in two different ways, which we should avoid.

This requires a good amount of refactoring in the RemoteLogManagerConfig class. Shall we continue with this patch to backport it to the 3.8 branch? Or do we have to refactor the code and land the changes only in trunk/3.9?

Member

@showuon showuon left a comment


LGTM! Could you mention in the PR description that this issue only happens in ZK mode? Thanks.

@kamalcph
Contributor Author

LGTM! Could you mention in the PR description that this issue only happens in ZK mode? Thanks.

done.

@satishd
Member

satishd commented Jun 17, 2024

This requires a good amount of refactoring in the RemoteLogManagerConfig class. Shall we continue with this patch to backport it to the 3.8 branch? Or do we have to refactor the code and land the changes only in trunk/3.9?

We can make the suggested improvements as a follow-up in trunk. Filed follow-up KAFKA-16976.

@satishd satishd changed the title KAFKA-15265: Reapply dynamic remote configs after broker restart KAFKA-16977: Reapply dynamic remote configs after broker restart Jun 17, 2024
Member

@chia7712 chia7712 left a comment


LGTM

@satishd
Member

satishd commented Jun 17, 2024

Retriggered the CI job as one of the test runs timed out.

@satishd
Member

satishd commented Jun 18, 2024

A few unrelated test failures; merging it to trunk for now.

@satishd satishd merged commit 8abeaf3 into apache:trunk Jun 18, 2024
@satishd
Member

satishd commented Jun 18, 2024

@jlprat This is a bug fix that should be pushed to 3.8. wdyt?

@jlprat
Contributor

jlprat commented Jun 18, 2024

Yes, it can be ported to 3.8. Thanks @satishd

satishd pushed a commit that referenced this pull request Jun 18, 2024
)

The below remote log configs can be configured dynamically:
1. remote.log.manager.copy.max.bytes.per.second
2. remote.log.manager.fetch.max.bytes.per.second and
3. remote.log.index.file.cache.total.size.bytes

If those values are configured dynamically, then during the broker restart, it ensures the dynamic values are loaded instead of the static values from the config.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
@satishd
Member

satishd commented Jun 18, 2024

Thanks @jlprat , merged to 3.8.

@kamalcph kamalcph deleted the KAFKA-15265 branch June 18, 2024 09:55
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jul 4, 2024
…che#16353)

The below remote log configs can be configured dynamically:
1. remote.log.manager.copy.max.bytes.per.second
2. remote.log.manager.fetch.max.bytes.per.second and
3. remote.log.index.file.cache.total.size.bytes

If those values are configured dynamically, then during the broker restart, it ensures the dynamic values are loaded instead of the static values from the config.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
