[db-queries] Allow join expressions in `paginated-multicolumn` by hawkw · Pull Request #6530 · oxidecomputer/omicron

hawkw · 2024-09-05T22:35:25Z

Currently, the paginated_multicolumn utility in
nexus_db_queries::pagination only works when the select expression to
paginate is a table, and both columns to order by come from that table.
This means that it cannot easily be used to fix the bug in
instance_and_vmm_list_by_sled_agent that @davepacheco describes in
this comment, which would require using paginated_multicolumn to
paginate on two columns in an inner join expression.

This commit changes the giant wad of Diesel type ceremony on
paginated_multicolumn in order to ~~make it even worse~~ allow
expressions which are not tables to be paginated. I've added a test
demonstrating that this does, in fact, work.

Figuring out how to do this was...certainly an experience which I have
had. I think I need to lie down now.

sunshowers

😭 💀 🙃

iximeow

thank you for your service

@davepacheco

The `instance_and_vmm_list_by_sled_agent` query in `nexus-db-queries` returns a list of all instances, their projects, their active VMMs, and the sleds those active VMMs are running on. This is used by the `instance_watcher` background task to find VMMs to perform health checks for. The query is paginated by sled IDs to limit the amount of data fetched in one query, and so that the returned VMMs are grouped by sled, allowing the `instance_watcher` task to reuse the same sled-agent client connection pool for all VMMs on a sled. Unfortunately, the current pagination is a bit wrong, as @davepacheco kindly pointed out to me in [this comment][1]: > This is paginated by just `sled_id`, but that doesn't quite work. The > pagination marker must be unique for each row. I think you'll want to > use `paginated_multicolumn` with another id here, maybe either the > instance id or vmm id? As is, I'd expect this code to break when the > list of instances returned for a particular sled spans a pagination > boundary. (What I'd expect would happen there is: we process each page > fine up through the first page of results containing that sled id in > it, then we use that sled id as the marker, and the subsequent query > would fetch results with sled ids `>=` the one given, and so you'd > miss all the rows for that sled id that weren't on the first page of > results for that sled id.) I was surprised to see that this is not > documented on either `Paginator::found_batch()` nor `paginated()` :( The only reason we haven't encountered this bug yet is that currently, the batch size for the query is fairly high. However, PR #6527 reduces the batch size substantially to serve as a limit on the number of concurrently in flight health checks, meaning we'll see some VMMs not being health checked any time a sled has more than 16 VMMs on it. This commit fixes the query by changing it to use `paginated_multicolumn` to paginate the results by both the sled's ID *and* the VMM's ID. This is made possible by #6530, which changed `paginated_multicolumn` to allow the query fragment that's paginated to be a `FROM` clause that's a `JOIN`, in addition to a single table --- because we must join the `sled` and `vmm` tables in order to get both IDs. This change required restructuring the order in which the query is constructed, due to Diesel Reasons, so I took the liberty to add some comments while I was moving stuff around. [1]: #6519 (review)

@davepacheco

The `instance_and_vmm_list_by_sled_agent` query in `nexus-db-queries` returns a list of all instances, their projects, their active VMMs, and the sleds those active VMMs are running on. This is used by the `instance_watcher` background task to find VMMs to perform health checks for. The query is paginated by sled IDs to limit the amount of data fetched in one query, and so that the returned VMMs are grouped by sled, allowing the `instance_watcher` task to reuse the same sled-agent client connection pool for all VMMs on a sled. Unfortunately, the current pagination is a bit wrong, as @davepacheco kindly pointed out to me in [this comment][1]: > This is paginated by just `sled_id`, but that doesn't quite work. The > pagination marker must be unique for each row. I think you'll want to > use `paginated_multicolumn` with another id here, maybe either the > instance id or vmm id? As is, I'd expect this code to break when the > list of instances returned for a particular sled spans a pagination > boundary. (What I'd expect would happen there is: we process each page > fine up through the first page of results containing that sled id in > it, then we use that sled id as the marker, and the subsequent query > would fetch results with sled ids `>=` the one given, and so you'd > miss all the rows for that sled id that weren't on the first page of > results for that sled id.) I was surprised to see that this is not > documented on either `Paginator::found_batch()` nor `paginated()` :( The only reason we haven't encountered this bug yet is that currently, the batch size for the query is fairly high. However, PR #6527 reduces the batch size substantially to serve as a limit on the number of concurrently in flight health checks, meaning we'll see some VMMs not being health checked any time a sled has more than 16 VMMs on it. This commit fixes the query by changing it to use `paginated_multicolumn` to paginate the results by both the sled's ID *and* the VMM's ID. This is made possible by #6530, which changed `paginated_multicolumn` to allow the query fragment that's paginated to be a `FROM` clause that's a `JOIN`, in addition to a single table --- because we must join the `sled` and `vmm` tables in order to get both IDs. This change required restructuring the order in which the query is constructed, due to Diesel Reasons, so I took the liberty to add some comments while I was moving stuff around. [1]: #6519 (review)

hawkw added 2 commits September 5, 2024 13:09

my god...it's full of trait bounds

82c7953

oh my god it actually works

80e6b06

hawkw requested review from davepacheco and smklein September 5, 2024 22:35

sunshowers approved these changes Sep 5, 2024

View reviewed changes

iximeow approved these changes Sep 5, 2024

View reviewed changes

hawkw enabled auto-merge (squash) September 5, 2024 23:00

hawkw merged commit 9c81109 into main Sep 5, 2024

hawkw deleted the eliza/paginated-multicolumn branch September 5, 2024 23:46

hawkw mentioned this pull request Sep 6, 2024

[nexus] Fix pagination in instance-by-sled query #6534

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[db-queries] Allow join expressions in `paginated-multicolumn`#6530

[db-queries] Allow join expressions in `paginated-multicolumn`#6530
hawkw merged 2 commits into
mainfrom
eliza/paginated-multicolumn

hawkw commented Sep 5, 2024 •

edited

Loading

Uh oh!

sunshowers left a comment

Uh oh!

iximeow left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hawkw commented Sep 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sunshowers left a comment

Choose a reason for hiding this comment

Uh oh!

iximeow left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hawkw commented Sep 5, 2024 •

edited

Loading