
fix: tighten _operator user ACL permissions (#132) #136

Merged
jdheyburn merged 1 commit into valkey-io:main from daanvinken:fix/tighten-operator-acl on Apr 22, 2026

@daanvinken (Contributor) commented Apr 15, 2026

This PR closes #132

Summary

The _operator system user was created with +@all as a placeholder while the operator's command surface was still evolving. Now that the core functionality is stable, this PR tightens it to the minimal set of commands the operator actually issues.

Implementation

Grepped all .B() client calls in internal/ to find every command the operator issues, then added +@connection to cover the valkey-go client handshake (CLIENT TRACKING, CLIENT SETINFO, etc.), which runs under the hood during connection setup.
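A minimal sketch of what the resulting rule could look like, assuming the command set is exactly the one exercised in the Testing section (the merged list may differ). `operatorACLRules` and `operatorCommands` are hypothetical names, not the operator's actual code, and the password and on/off flags are omitted. The sketch puts `-@all` first because `ACL SETUSER` applies rules incrementally: without it, re-applying the rule to an existing user that still has `+@all` would not revoke anything.

```go
package main

import (
	"fmt"
	"strings"
)

// operatorCommands lists the commands named in this PR's Testing section.
// Assumption: the real list derived from internal/ may differ.
var operatorCommands = []string{
	"cluster|meet",
	"cluster|addslotsrange",
	"cluster|replicate",
	"cluster|myid",
	"cluster|myshardid",
	"cluster|info",
	"cluster|nodes",
	"cluster|migrateslots",
	"cluster|getslotmigrations",
	"cluster|forget",
	"info",
}

// operatorACLRules is a hypothetical helper that builds the ACL rule tokens
// for the _operator user. "-@all" comes first because ACL SETUSER applies
// rules on top of the existing ones, so an old +@all would otherwise survive.
func operatorACLRules(commands []string) []string {
	rules := []string{"-@all", "+@connection"}
	for _, c := range commands {
		rules = append(rules, "+"+c)
	}
	return rules
}

func main() {
	rules := operatorACLRules(operatorCommands)
	fmt.Println("ACL SETUSER _operator " + strings.Join(rules, " "))
}
```

Granting `command|subcommand` pairs rather than whole commands like `+cluster` keeps administrative subcommands the operator never issues out of reach.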

Testing

All permissions are exercised by the existing E2E tests:

  • Cluster creation covers CLUSTER MEET, CLUSTER ADDSLOTSRANGE, CLUSTER REPLICATE, and the state discovery commands (CLUSTER MYID, CLUSTER MYSHARDID, CLUSTER INFO, CLUSTER NODES, INFO)
  • Scale-out test covers CLUSTER MIGRATESLOTS and CLUSTER GETSLOTMIGRATIONS (rebalancing slots to new shards)
  • Scale-in test covers CLUSTER FORGET (removing drained nodes) and the same migration commands
  • +@connection is exercised by every test, since it covers the client handshake

Also tested locally: created a 3-shard cluster, scaled out to 4, scaled back in to 2, and checked the operator logs for NOPERM errors after each step. All clean.

Adding a unit test for this would just mean maintaining another static list to compare against, which doesn't make sense to me.

Replace +@ALL with the minimal set of commands the operator needs,
derived from grepping all .B() client calls in the codebase plus
+@connection for the valkey-go client handshake.

Signed-off-by: Daan Vinken <daanvinken@tythus.com>

@jdheyburn (Collaborator)

Thanks for raising!

This is great to have, but before we merge this in, could we get a CI check in place to verify that the _operator user has the permissions it needs? I worry this makes things brittle: new contributors to the codebase might not think about missing permissions.

What do you think?

@daanvinken (Contributor, Author)

Thanks @jdheyburn!
The E2E tests already run the operator with _operator credentials, so any new command missing from the ACL would make the E2E tests fail with a NOPERM error.

That said, I admit the feedback loop is slow: you'd only find out when the E2E tests run. A static grep-based check would catch it earlier but is fragile, because regex can't reliably parse Go method chains, Arbitrary() calls pass the command as string arguments, and builder names don't map 1:1 to ACL command names.

Did you have a particular type of CI check in mind? An AST parser sounds brittle as well. We could run ACL LOG after the E2E tests to see whether any commands were denied. Not sure whether that would cause any real issues, though.
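The ACL LOG idea could look roughly like this: run `ACL LOG RESET` on every node before the E2E suite, fetch `ACL LOG` from every node afterwards (the log is per server and bounded by `acllog-max-len`), and fail the job if any entry belongs to `_operator`. The sketch covers only the filtering step and assumes the replies have already been parsed; `aclLogEntry` and `operatorDenials` are hypothetical names whose fields mirror the `reason`, `object`, and `username` fields of an ACL LOG entry. The sample entries in `main` are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// aclLogEntry holds the subset of ACL LOG fields this check needs.
// Assumption: fetching and parsing the server reply happens elsewhere.
type aclLogEntry struct {
	Reason   string // e.g. "command", "key", "channel", "auth"
	Object   string // the denied command, key, or channel
	Username string
}

// operatorDenials returns the distinct denials recorded for user, formatted
// as "reason:object" and sorted so the CI output is stable across runs.
func operatorDenials(entries []aclLogEntry, user string) []string {
	seen := map[string]bool{}
	var out []string
	for _, e := range entries {
		if e.Username != user {
			continue
		}
		key := e.Reason + ":" + e.Object
		if !seen[key] {
			seen[key] = true
			out = append(out, key)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up entries, standing in for parsed ACL LOG replies from all nodes.
	entries := []aclLogEntry{
		{Reason: "command", Object: "cluster|forget", Username: "_operator"},
		{Reason: "command", Object: "flushall", Username: "default"},
		{Reason: "command", Object: "cluster|forget", Username: "_operator"},
	}
	if denied := operatorDenials(entries, "_operator"); len(denied) > 0 {
		fmt.Println("ACL denials for _operator:", denied)
	}
}
```

Deduplicating matters because the same denial is typically hit on every reconcile loop and may be reported by several nodes.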

@jdheyburn (Collaborator)

I don't think it should be blocking; I raised #144 so that we can keep track of it in the future.

jdheyburn merged commit 2e20665 into valkey-io:main on Apr 22, 2026
7 checks passed


Linked issue: Tighten the ACL of the operator user

3 participants