fix: tighten _operator user ACL permissions (#132)#136
Conversation
Replace +@ALL with the minimal set of commands the operator needs, derived from grepping all .B() client calls in the codebase plus +@connection for the valkey-go client handshake. Signed-off-by: Daan Vinken <daanvinken@tythus.com>
|
Thanks for raising! This is great to have, but before we merge this in, could we get a CI check in place to verify that the _operator user has the permissions it needs? I worry about adding this and making it too brittle, and new contributors to the code base might not think about missing permissions. What do you think? |
|
Thanks @jdheyburn ! That said, I admit the feedback loop is slow (you'd only find out when the E2E runs). A static grep-based check would catch it earlier but is fragile (regex can't reliably parse Go method chains, Arbitrary() calls use string args, builder names don't map 1:1 to ACL names). Did you have a certain type of CI check in mind? An AST parser sounds brittle as well. We could call |
|
I don't think it should be blocking - I raised #144 so that we can keep track of it in the future. |
This PR closes #132
Summary
The
_operatorsystem user was created with+@allas a placeholder while the operator's command surface was still evolving. Now that the core functionality is stable, this tightens it down to the minimal set of commands.Implementation
Grepped all
.B()client calls ininternal/to find every command the operator issues, then added+@connectionfor the valkey-go client handshake (CLIENT TRACKING,CLIENT SETINFO, etc.) which happens under the hood during connection setup.Testing
All permissions are exercised by the existing E2E tests:
CLUSTER MEET,CLUSTER ADDSLOTSRANGE,CLUSTER REPLICATE, and the state discovery commands (CLUSTER MYID,CLUSTER MYSHARDID,CLUSTER INFO,CLUSTER NODES,INFO)CLUSTER MIGRATESLOTSandCLUSTER GETSLOTMIGRATIONS(rebalancing slots to new shards)CLUSTER FORGET(removing drained nodes) and the same migration commands+@connectionis exercised on every single test since it's the client handshakeAlso tested locally: created a 3-shard cluster, scaled out to 4, scaled back to 2, checked operator logs for NOPERM errors after each step. All clean.
Adding a unit test for this would just maintain another static list that we compare against which doesn't make sense to me.