Optimize vault from load test #22595

Open
russell-stern wants to merge 32 commits into develop from optimize_vault_responses

Conversation

@russell-stern
Contributor

No description provided.

@github-actions
Contributor

github-actions Bot commented May 21, 2026

CORA - Pending Reviewers

| Codeowners Entry | Overall | Num Files | Owners |
|---|---|---|---|
| * | 💬 | 6 | @smartcontractkit/foundations, @smartcontractkit/core |
| /core/capabilities/ | 💬 | 2 | @smartcontractkit/keystone, @smartcontractkit/capabilities-team |
| /core/services/ocr*/ | 💬 | 8 | @smartcontractkit/foundations, @smartcontractkit/core |
| /core/services/workflows/ | 💬 | 2 | @smartcontractkit/keystone |
| /.github/** | 💬 | 1 | @smartcontractkit/devex-cicd, @smartcontractkit/devex-tooling, @smartcontractkit/core |
| go.mod | 💬 | 6 | @smartcontractkit/core, @smartcontractkit/foundations |
| go.sum | 💬 | 6 | @smartcontractkit/core, @smartcontractkit/foundations |
| integration-tests/go.mod | 💬 | 1 | @smartcontractkit/core, @smartcontractkit/devex-tooling, @smartcontractkit/foundations |
| integration-tests/go.sum | 💬 | 1 | @smartcontractkit/core, @smartcontractkit/devex-tooling, @smartcontractkit/foundations |

Legend: ✅ Approved | ❌ Changes Requested | 💬 Commented | 🚫 Dismissed | ⏳ Pending | ❓ Unknown

For more details, see the full review summary.

@github-actions
Contributor

github-actions Bot commented May 21, 2026

✅ No conflicts with other open PRs targeting develop

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

@russell-stern russell-stern added the build-publish Build and Publish image to SDLC label May 21, 2026
@trunk-io

trunk-io Bot commented May 21, 2026

@russell-stern russell-stern marked this pull request as ready for review May 22, 2026 17:32
@russell-stern russell-stern requested review from a team as code owners May 22, 2026 17:32
if share.EncryptionKey == workflowNodeEncryptionPublicKeyStr {
	localNodeShares = share.Shares
	if len(share.BinaryShares) > 0 {
		localNodeBinaryShares = share.BinaryShares
Contributor

@russell-stern Is each request guaranteed to have either 100% binary shares or 100% string shares? When you were working on the base64 implementation, I think it was possible for the formats to be mixed; that would suggest you actually want to combine regular + binary shares.

Contributor Author

We moved the binary share encoding behind a feature flag, so each request is guaranteed to have one or the other.


var currentBatch []*vaultcommon.StoredPendingQueueItem

flushBatch := func() error {
Contributor

@cedric-cordenier cedric-cordenier May 25, 2026

This is too long to inline IMO; just move it to a top-level private function called flushBatch. This also has the benefit of making the dependencies explicit, rather than relying on closure variables which could get accidentally shadowed, etc.

return nil
}

queueLoop:
Contributor

😬 If you're using gotos and labeled loops in Go, that should be a signal to refactor the code so you can avoid them.

Contributor

The high-level logic should just be to:

  • loop over localQueueItems
  • maintain a variable of the current batch
  • clone this variable, add the new item to it
  • proto.Size it to see if we've exceeded the batch size limit
  • check the blob count
  • if we have exceeded a limit -> return the current batch
  • otherwise assign the new cloned batch (with the additional item) to the current batch
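
The steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: item, batchSize, and packBatches are hypothetical names, and batchSize is a stand-in for calling proto.Size on a real batch message. The blob-count check is modelled as a cap on the number of batches, which is also an assumption.

```go
package main

import "fmt"

// item is a hypothetical stand-in for *vaultcommon.StoredPendingQueueItem.
type item struct {
	ID      string
	Payload []byte
}

// batchSize is a hypothetical stand-in for proto.Size on a batch message.
func batchSize(batch []item) int {
	n := 0
	for _, it := range batch {
		n += len(it.ID) + len(it.Payload)
	}
	return n
}

// packBatches greedily groups items into batches of at most maxBytes and
// returns at most maxBatches batches. Items that do not fit are left out
// (in this sketch they would be picked up in a later round).
func packBatches(items []item, maxBytes, maxBatches int) [][]item {
	if maxBatches <= 0 {
		return nil
	}
	var batches [][]item
	var current []item
	for _, it := range items {
		// Clone the current batch and add the new item to the clone.
		candidate := append(append([]item(nil), current...), it)
		if batchSize(candidate) <= maxBytes {
			current = candidate
			continue
		}
		// The candidate is too large: emit the current batch first.
		if len(current) > 0 {
			batches = append(batches, current)
			if len(batches) >= maxBatches {
				return batches
			}
		}
		// Start a new batch with this item, unless it can never fit.
		if batchSize([]item{it}) > maxBytes {
			current = nil
			continue
		}
		current = []item{it}
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}

func main() {
	items := []item{
		{ID: "a", Payload: []byte("xxx")},
		{ID: "b", Payload: []byte("xxx")},
		{ID: "c", Payload: []byte("xxx")},
	}
	for i, b := range packBatches(items, 8, 2) {
		fmt.Printf("batch %d: %d items\n", i, len(b))
	}
}
```

Written this way, the loop needs no labels or gotos, and the limits are plain parameters rather than closure variables.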

Contributor

+1, haven't seen gotos in golang before.
So a good time to refactor.

I would also suggest, maybe in a later PR if not now, splitting this into multiple files based on components. We have too many things going on in this one file.

Contributor Author

Agreed, I want to restructure the entire plugin file after we get these changes in

return nil, fmt.Errorf("could not marshal pending queue item: %w", ierr2)
var pack pendingQueueBlobPack
if optimizations {
pack, err = r.prepareObservationPendingQueueBlobs(ctx, seqNr, localQueueItems, pendingQueueHasID, maxBlobBytes, maxBlobHandleCount)
Contributor

Does this imply that a blob could be one of two message types? How will you distinguish between them when unmarshalling?

return errors.New("failed to unmarshal observations: " + err.Error())
}

if len(ao.Observation) > r.maxObservationBytes {
Contributor

I'm pretty sure this is unnecessary; I'd expect OCR to reject oversized observations automatically

}

// unmarshalPendingQueueBlob decodes a BroadcastBlob payload (legacy single item or StoredPendingQueueBatch).
func unmarshalPendingQueueBlob(blob []byte) ([]*vaultcommon.StoredPendingQueueItem, error) {
Contributor

This worries me a little bit; the unmarshaller should know exactly what format the message is in.
Proto messages are untyped on the wire (the encoding does not carry the message type), so we can't guarantee that there won't be cases where a payload unmarshals as the wrong type.

Contributor

I'd recommend adding an unambiguous prefix to the message or adding a field to the current wrapper type
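
One way the prefix suggestion could look is sketched below. The tag values and the names tagSingleItem, tagBatch, encodeBlob, and decodeBlob are all hypothetical, and the payload is treated as already-marshalled bytes. The sketch does not address how legacy blobs written without a prefix would be handled.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical one-byte format tags; the values are assumptions.
const (
	tagSingleItem byte = 0x01
	tagBatch      byte = 0x02
)

// encodeBlob prepends the format tag to an already-marshalled payload.
func encodeBlob(tag byte, payload []byte) []byte {
	out := make([]byte, 0, len(payload)+1)
	out = append(out, tag)
	return append(out, payload...)
}

// decodeBlob splits a blob into its tag and payload. Unknown tags are
// rejected instead of guessing which message type the payload holds.
func decodeBlob(blob []byte) (byte, []byte, error) {
	if len(blob) == 0 {
		return 0, nil, errors.New("empty blob")
	}
	switch tag := blob[0]; tag {
	case tagSingleItem, tagBatch:
		return tag, blob[1:], nil
	default:
		return 0, nil, fmt.Errorf("unknown blob format tag 0x%02x", tag)
	}
}

func main() {
	blob := encodeBlob(tagBatch, []byte("payload"))
	tag, payload, err := decodeBlob(blob)
	fmt.Println(tag == tagBatch, string(payload), err)
}
```

The caller would then switch on the returned tag and unmarshal the payload into exactly one message type.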

keptItems = keptItems[:errBoundLimited.Limit]
}
r.metrics.trackPendingQueueWrittenSize(ctx, len(keptItems))
r.lggr.Infow("VAULT_OCR_STATE_TRANSITION_PENDING_WRITE", "seqNr", seqNr, "writtenCount", len(keptItems))
Contributor

Could we follow the existing style for log messages, which favours human-readable messages?

I notice you have this throughout, btw; please adjust the others too.

vaultRequestID := vaultRequestIDFromMetadata(metadata)
lggr := logger.With(s.lggr, "requestedKeys", logKeys, "metadata", metadata, "vaultRequestID", vaultRequestID)
lggr.Infow("dispatching vault GetSecrets request")
lggr.Debug("fetching secrets...")
Contributor

nit: One of these 2 logs can be deleted.

Comment thread core/services/workflows/v2/secrets.go Outdated
}

// vaultRequestIDFromMetadata mirrors the request ID formula in vault capability Execute.
func vaultRequestIDFromMetadata(md capabilities.RequestMetadata) string {
Contributor

Could you please move this function to a library, and let both secrets.go and capability.go share it?
This way the two won't accidentally diverge.

Contributor Author

I removed this altogether. We added metadata to the logger which has this id in it. There's a pull request to update our skill to be able to use the new metadata fields: https://github.com/smartcontractkit/cre-docs/pull/42

}

func (s *secretsFetcher) decryptSecret(lggr logger.Logger, encryptedSecretBytes []byte, encryptedDecryptionShares []string, cfg *vaultConfig) (string, error) {
func encryptedDecryptionShareBytes(binaryShares [][]byte, hexShares []string) ([][]byte, error) {
Contributor

Would it be better to put this function in a library here, so it can be shared across repos?
https://github.com/smartcontractkit/chainlink-common/blob/f9b356d61ca95d5a13d0105018432e884759778c/pkg/capabilities/actions/vault/vault.go

Contributor Author

Since this is just for the transition while switching to the new optimizations, I'd rather keep it out of common. That way we can just delete this code once the optimizations are enabled, without needing more changes in common or leaving extra code there that we can't delete.

shareBytes, err := hex.DecodeString(share)
if err != nil {
return nil, fmt.Errorf("failed to decode share: %w", err)
if len(es.BinaryShares) > 0 {
Contributor

So we have 3 places in total where we handle binary or hex shares. Better to consolidate them into one library.

Contributor Author

Same as above: once we make the switch to the optimizations, I want to delete all this code.
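
For reference, a single consolidated helper for the binary-or-hex handling discussed in this thread might look like the sketch below. The name shareBytes is hypothetical; its parameters mirror the encryptedDecryptionShareBytes signature shown in the diff, but the body is an assumption. Rejecting mixed input is also an assumption, based on the statement above that the feature flag guarantees one format or the other.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// shareBytes returns decryption shares as raw bytes. It uses binaryShares
// when present and otherwise hex-decodes hexShares. Receiving both is
// treated as an error in this sketch.
func shareBytes(binaryShares [][]byte, hexShares []string) ([][]byte, error) {
	if len(binaryShares) > 0 && len(hexShares) > 0 {
		return nil, errors.New("request contains both binary and hex shares")
	}
	if len(binaryShares) > 0 {
		return binaryShares, nil
	}
	out := make([][]byte, 0, len(hexShares))
	for i, s := range hexShares {
		b, err := hex.DecodeString(s)
		if err != nil {
			return nil, fmt.Errorf("failed to decode share %d: %w", i, err)
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	shares, err := shareBytes(nil, []string{"0a0b", "ff"})
	fmt.Println(len(shares), err)
}
```

Keeping the helper local, as the author prefers, would still let the three call sites share one code path until the transition code is deleted.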

@russell-stern russell-stern force-pushed the optimize_vault_responses branch from 40638ec to 307dd2d Compare May 26, 2026 12:39
@russell-stern russell-stern removed the build-publish Build and Publish image to SDLC label May 26, 2026
@cl-sonarqube-production

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)
