Skip to content

UUID parsing to include pre-pended shard ID#515

Closed
lkatalin wants to merge 1 commit intosigstore:mainfrom
lkatalin:uuidparsing
Closed

UUID parsing to include pre-pended shard ID#515
lkatalin wants to merge 1 commit intosigstore:mainfrom
lkatalin:uuidparsing

Conversation

@lkatalin
Copy link
Copy Markdown
Contributor

@lkatalin lkatalin commented Nov 30, 2021

Summary

This is a draft! Feedback welcome, including overall design considerations. I've got some to-dos that I am still sorting out as listed here and in code comments. It's not many lines of code, but I did a fair amount of experimentation to get to this point as it's my first "real" Rekor PR. : )

Here is how it's working currently:

Screenshot from 2021-11-30 13-01-40

Ticket Link

Fixes #487

Release Note


This would change the format of the UUID that people use on the CLI and pre-pend the 64-digit hex UUID with 
a 6-digit human-readable decimal shard ID, like so: 
`ssssss-uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuuu`
where `s` is a digit of the shard ID and `u` is a digit of the old UUID / hex-encoded trillian merkle leaf 
hash.
  • TODO: Allow for backwards-compatibility by letting people search for the old 64-digit UUIDs and internally prepending those with the current shard ID. Is there a way to allow two different valid lengths for a parameter in pflags.go and openapi.yaml? I'm looking into this.

@lkatalin
Copy link
Copy Markdown
Contributor Author

@lukehinds @asraa @bobcallaway @dlorenc
This is what I've got so far... I'm actively working on the remaining TODOs. If you have any thoughts I'm happy to incorporate them. Thanks!

return valueFactory(uuidFlag, validateString("required,len=64,hexadecimal"), "")
// this corresponds to the merkle leaf hash of entries, which is represented by a 64 character hexadecimal string, prepended by the 6-digit shard ID and '-' separator
// TODO: does this have to be a literal? can we not use the const to define the len?
// TODO: is there a way to validate only the UUID part of this string as hexadecimal? or validate different parts of the string differently? can / should they be split into different flags?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could make a custom validator! instead of validateString you could make validateShardedUUID with a custom impl like the ones found here

func validateSHAValue(v string) error {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @asraa ! That's exactly what I want to do. I appreciate the example.


queuedLeaf := resp.getAddResult.QueuedLeaf.Leaf
uuid := hex.EncodeToString(queuedLeaf.GetMerkleLeafHash())
shardID := "000000"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the thought that the shardID would be a base10 integer or base16 (hex) like the rest of the UUID?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobcallaway I was going to have it as a base10 integer so that it's more easily human-readable, but I'm definitely open to suggestions here. Is there an advantage to having it in hex?

@lkatalin
Copy link
Copy Markdown
Contributor Author

lkatalin commented Dec 6, 2021

I've got an updated version here that

  • Fixed the linter errors
  • Created a custom validator for the new fullID
  • Added some custom types / structs in sharding.go so the shardID is no longer hardcoded in entries.go

Outstanding todos:

  • Make sure tests pass
  • Don't hardcode the CurrentShardID as 0 but instead grab it from / increment it in... somewhere the state is kept? Maybe in a separate PR? I don't understand this part well yet.
  • Update custom validator function to allow the old 64-digit UUIDs
  • Maybe make the shardID hex? Do we want to do this?
  • Add some (possibly dummy at first) routing in get.go to route the request to the proper shard. Maybe in a separate PR?

@lkatalin
Copy link
Copy Markdown
Contributor Author

lkatalin commented Dec 6, 2021

Also, I am not sure whether it is normal with go code to have go.mod and go.sum update as part of a PR. I'm not doing this deliberately so if that's not meant to happen I can reset them.

@lukehinds
Copy link
Copy Markdown
Member

Also, I am not sure whether it is normal with go code to have go.mod and go.sum update as part of a PR. I'm not doing this deliberately so if that's not meant to happen I can reset them.

That's fine, its different to cargo where you tend to only push the toml file.

@lkatalin lkatalin changed the title Preliminary draft of UUID parsing (wip with todos) UUID parsing to include pre-pended shard ID Dec 13, 2021
@lkatalin
Copy link
Copy Markdown
Contributor Author

I think this is almost ready for review, but I'm not sure how to reproduce the failing e2e test locally (and therefore figure out what's going on with it). It seems to pass:

[rekor@fedora rekor]$ sudo ./tests/e2e-test.sh 
starting services
rekor_redis-server_1 is up-to-date
rekor_mysql_1 is up-to-date
rekor_trillian-log-server_1 is up-to-date
rekor_trillian-log-signer_1 is up-to-date
rekor_rekor-server_1 is up-to-date
building CLI and server
waiting up to 60 sec for system to start
running tests
ok  	github.com/sigstore/rekor/tests	13.996s

@dlorenc
Copy link
Copy Markdown
Member

dlorenc commented Dec 14, 2021

Could be a flake if it passes locally trying to re-run it now.

The other thing to try might be making sure you're fully rebased.

@lkatalin
Copy link
Copy Markdown
Contributor Author

Could be a flake if it passes locally trying to re-run it now.

The other thing to try might be making sure you're fully rebased.

Hmm, I had rebased on main last night before running the tests and I just did so again. Puzzling.


// validate the ShardID
shardID := v[:sharding.ShardIDLen]
shardIDTag := "required," + "len=" + fmt.Sprintf("%v", sharding.ShardIDLen) + ",number"
Copy link
Copy Markdown

@jeffvance jeffvance Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a reason this is not shardIDTag := "required, len=" + fmt.Sprintf("%v",?
Ditto for L215

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeffvance No, there isn't. : ) Thanks for the catch, I will update this.

@lkatalin lkatalin marked this pull request as ready for review December 14, 2021 17:37
@lkatalin lkatalin force-pushed the uuidparsing branch 2 times, most recently from af3e1bc to 1f7be40 Compare December 15, 2021 21:24
@lkatalin
Copy link
Copy Markdown
Contributor Author

It does look like it's an empty attestation on the failing test:

--- FAIL: TestIntoto (0.23s)
    e2e_test.go:439: g.Attesation:  []
    e2e_test.go:443: unexpected end of JSON input

@dlorenc
Copy link
Copy Markdown
Member

dlorenc commented Dec 18, 2021

I've managed to reproduce this locally! Sorry it took a bit longer than expected, I'm on a new M1 Macbook and it took a bit to get the dev environment running there.

--- FAIL: TestIntoto (0.52s)
    e2e_test.go:439: g.Attesation:  []
    e2e_test.go:443: unexpected end of JSON input
FAIL
FAIL	github.com/sigstore/rekor/tests	0.710s
FAIL

I think I have a guess at the issue...

queuedLeaf := resp.getAddResult.QueuedLeaf.Leaf
uuid := hex.EncodeToString(queuedLeaf.GetMerkleLeafHash())
shardID := sharding.NewCurrent()
uuid := shardID.ShardIDString + sharding.FullIDSeparator + hex.EncodeToString(queuedLeaf.GetMerkleLeafHash())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to make this a helper function, and call it anywhere we go from a MerkleLeafHash to a UUID.

To fix the e2e test, we also need it right here: https://github.com/sigstore/rekor/blob/main/pkg/api/entries.go#L95

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dlorenc I can change it to a helper function, that's a great idea.
Thank you for figuring out what's going on with the e2e test!

@dlorenc
Copy link
Copy Markdown
Member

dlorenc commented Dec 18, 2021

OK, got a passing test over here! #548

I think I see the issue now. Let me know if you want to chat a bit more this week, I have a few ideas.

@lkatalin
Copy link
Copy Markdown
Contributor Author

OK, got a passing test over here! #548

I think I see the issue now. Let me know if you want to chat a bit more this week, I have a few ideas.

Thanks @dlorenc , I really appreciate you looking into it. I still need to understand how the changes you suggested are related to the e2e test. I'm also happy to incorporate those changes in whatever way you think is best to get this to pass CI and merge it asap.

@dlorenc
Copy link
Copy Markdown
Member

dlorenc commented Dec 19, 2021

I still need to understand how the changes you suggested are related to the e2e test

At a high level it's just a case of confusion between MerkleLeafHash and UUID - the code was storing something using MerkleTreeHash (an In-Toto Attestation), and then trying to retrieve it later by UUID.

The In-Toto Attestations are stored separately from the Transparency Log itself, to make them redactable if we ever need to. The failing test tests that behavior. The change in entries.go fixes that up so we retrieve it by the UUID instead of the MerkleTreeHash, so we're consistent.

The change in get.go is unnecessary, but helped me when debugging and is otherwise safe. You can drop it if you prefer and we can handle that one separately.

// TODO: The shardID will be part of the state and will be updated. Store it in state.go?
// TODO: Add a check that the CurrentShardID cannot be updated beyond 999,999

var CurrentShardID uint32 = 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will need to be passed or parsed out of the log index range flag, it would basically be the last one in the list: https://github.com/sigstore/rekor/blob/main/cmd/rekor-server/app/flags.go#L29

The client shouldn't need to know the active shard id - it's purely controlled server side by that flag.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will use the ActiveIndex() from here #549

}

// Create a 6-digit string from shardID
func IntToString(i uint32) string {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use strconv.ParseInt here, it'll also handle the error cases.

metricNewEntries.Inc()

queuedLeaf := resp.getAddResult.QueuedLeaf.Leaf
uuid := hex.EncodeToString(queuedLeaf.GetMerkleLeafHash())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not 100% sure this change is safe as is - I don't think it would be backwards compatible (we need a test for this). It means the server would start returning the shard-prefixed UUID to clients who create one - but older clients wouldn't have the change you added in here to get.go.

Older clients would then have a UUID that's invalid to them. I think we might need to drop this part for a release or two, to make sure clients can handle the new format.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I had not considered that older clients might still be used. I can write a test to make sure this will work when we include it.

dlorenc added a commit to dlorenc/rekor-cli that referenced this pull request Dec 19, 2021
This should help out with sigstore#515.

Signed-off-by: Dan Lorenc <lorenc.d@gmail.com>
dlorenc added a commit to dlorenc/rekor-cli that referenced this pull request Dec 19, 2021
This should help out with sigstore#515.

Signed-off-by: Dan Lorenc <lorenc.d@gmail.com>
Signed-off-by: Lily Sturmann <lsturman@redhat.com>
@lkatalin
Copy link
Copy Markdown
Contributor Author

I still need to understand how the changes you suggested are related to the e2e test

At a high level it's just a case of confusion between MerkleLeafHash and UUID - the code was storing something using MerkleTreeHash (an In-Toto Attestation), and then trying to retrieve it later by UUID.

The In-Toto Attestations are stored separately from the Transparency Log itself, to make them redactable if we ever need to. The failing test tests that behavior. The change in entries.go fixes that up so we retrieve it by the UUID instead of the MerkleTreeHash, so we're consistent.

The change in get.go is unnecessary, but helped me when debugging and is otherwise safe. You can drop it if you prefer and we can handle that one separately.

@dlorenc Thank you for this explanation - I had totally missed that there is another part of the code using the UUID. I'm glad the CI caught this.

@lkatalin
Copy link
Copy Markdown
Contributor Author

I'm still making changes to this.

lukehinds pushed a commit that referenced this pull request Dec 21, 2021
This should help out with #515.

Signed-off-by: Dan Lorenc <lorenc.d@gmail.com>
} else if len(uuid) == sharding.FullIDLen {
params.EntryUUID = uuid[sharding.FullIDLen-sharding.UUIDLen:]
} else {
return nil, fmt.Errorf("invalid UUID length: %v", len(uuid))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we augment this error message to tell the user what the valid length/format should be?

continue
}

// TODO: understand why we don't need k to match params.EntryUUID
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k should match the UUID returned but as Dan mentioned in his other review comment there would be a concern about backward compatibility.

Comment on lines +120 to +125
if len(uuid) == sharding.UUIDLen {
params.EntryUUID = uuid
} else if len(uuid) == sharding.FullIDLen {
params.EntryUUID = uuid[sharding.FullIDLen-sharding.UUIDLen:]
} else {
return nil, fmt.Errorf("invalid UUID length: %v", len(uuid))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, if you end up writing a helper method for creating this within the sharding package then would it make sense to move the validation logic in there, as well as add a new helper method to parse the input?

@lkatalin
Copy link
Copy Markdown
Contributor Author

lkatalin commented Feb 1, 2022

Closed in favor of #583 #587 and #623 which together resolve #487

@lkatalin lkatalin closed this Feb 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support UUID --> shard ID parsing

6 participants