
(feat) Support for creating users on cluster init #82

Merged
jdheyburn merged 6 commits into valkey-io:main from utdrmac:users-cluster-init
Feb 27, 2026


@utdrmac
Contributor

@utdrmac utdrmac commented Feb 10, 2026

Resolves #36

This feature allows Valkey users to be created on cluster initialization. It abstracts Valkey ACL permissions into several objects, improving readability and giving users who may not be intimately familiar with Valkey ACLs more flexibility when creating them.
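As a rough illustration of this kind of abstraction, the Go sketch below renders a structured user definition into one line of a Valkey ACL file. `UserSpec`, its fields, and `buildACLLine` are hypothetical names, not the operator's actual CRD types; only the rule keywords (`on`/`off`, `nopass`, `>password`, `~pattern`, `+@category`) are standard Valkey ACL syntax.

```go
package main

import "strings"

// UserSpec is a hypothetical, simplified user definition; the operator's
// real CRD types may use different names and fields.
type UserSpec struct {
	Name       string
	Enabled    bool
	NoPassword bool     // emit "nopass" instead of a password rule
	Keys       []string // key patterns, rendered as "~pattern"
	Categories []string // command categories, rendered as "+@category"
	RawACL     string   // extra ACL rules appended verbatim
}

// buildACLLine renders a UserSpec as one ACL file line of the form
// "user <name> <rules...>". password is ignored when NoPassword is set.
func buildACLLine(u UserSpec, password string) string {
	parts := []string{"user", u.Name}
	if u.Enabled {
		parts = append(parts, "on")
	} else {
		parts = append(parts, "off")
	}
	if u.NoPassword {
		parts = append(parts, "nopass")
	} else if password != "" {
		parts = append(parts, ">"+password)
	}
	for _, k := range u.Keys {
		parts = append(parts, "~"+k)
	}
	for _, c := range u.Categories {
		parts = append(parts, "+@"+c)
	}
	// Raw rules act as an escape hatch for anything not modeled above.
	if raw := strings.TrimSpace(u.RawACL); raw != "" {
		parts = append(parts, raw)
	}
	return strings.Join(parts, " ")
}
```

With this sketch, a disabled user with `NoPassword` set renders as `user charlie off nopass`, and any `RawACL` text is appended unchanged.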

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Feb 10, 2026

e2e tests pass in my local kind cluster

=== RUN   TestE2E
  Starting valkey-operator e2e test suite
Running Suite: e2e suite - /home/utdrmac/dockers/valkey-operator/test/e2e
=======================================================================
Random Seed: 1770743918

Will run 10 of 10 specs
------------------------------
...
Ran 10 of 10 Specs in 135.056 seconds
SUCCESS! -- 10 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestE2E (135.06s)
PASS
ok  	valkey.io/valkey-operator/test/e2e	135.065s
make cleanup-test-e2e
...

var _ = BeforeSuite(func() {
By("building the manager image")
cmd := exec.Command("make", "docker-build", fmt.Sprintf("IMG=%s", managerImage))
By("purging old events")
Contributor Author

Kubernetes holds on to events for an extended period of time, even after a cluster is deleted. This call to purge events was added because the e2e test fetches all events, so if one test run failed, subsequent runs would also fail on the stale events it left behind.

Collaborator

Good catch. We could move it to valkeycluster_test.go to keep it close to where it's needed.
This is also the only testcase/context where we use involvedObject.name=valkeycluster-sample.
I guess it needs to be in a BeforeEach instead then.
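The scoped purge suggested above could be expressed as a `kubectl delete events` call filtered on `involvedObject.name`. The Go sketch below only builds the command (`-n`, `--all`, and `--field-selector` are standard `kubectl delete` flags); the helper names, and calling it from a Ginkgo `BeforeEach` in valkeycluster_test.go, are assumptions for illustration.

```go
package main

import "os/exec"

// purgeEventsArgs builds kubectl arguments for deleting events in a
// namespace (hypothetical helper). If objectName is non-empty, only events
// whose involvedObject.name matches are deleted; otherwise all events in
// the namespace are removed.
func purgeEventsArgs(namespace, objectName string) []string {
	args := []string{"delete", "events", "-n", namespace}
	if objectName != "" {
		args = append(args, "--field-selector", "involvedObject.name="+objectName)
	} else {
		args = append(args, "--all")
	}
	return args
}

// purgeEventsCmd wraps the arguments in an *exec.Cmd; the caller is
// expected to run it, for example from a BeforeEach block.
func purgeEventsCmd(namespace, objectName string) *exec.Cmd {
	return exec.Command("kubectl", purgeEventsArgs(namespace, objectName)...)
}
```

For example, `purgeEventsCmd(ns, "valkeycluster-sample").Run()` would remove only that object's events in namespace `ns`, leaving events for other objects untouched.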

Comment thread Makefile

.PHONY: build
-build: manifests generate fmt vet ## Build manager binary.
+build: manifests generate fmt vet lint ## Build manager binary.
Contributor Author

golangci-lint previously ran only during the GitHub e2e tests, and it was frustrating to see it fail remotely. This simple change runs it locally first.

Contributor Author

@utdrmac utdrmac left a comment

I added comments on two changes that are outside the scope of this PR.

Comment thread internal/controller/valkeycluster_controller.go
Collaborator

@jdheyburn jdheyburn left a comment

This is great to see, thank you for putting the time in! Just left some comments for clarity

Comment thread internal/controller/users.go Outdated
Comment thread internal/controller/users.go
Comment thread internal/controller/users.go Outdated
Comment thread internal/controller/users.go
Comment thread internal/controller/valkeycluster_controller.go
Comment thread internal/controller/users.go
// Do not apply a password to this user
// +kubebuilder:default=false
NoPassword bool `json:"nopass,omitempty"`

Member

I think passwordEnabled or enabledPassword would be a better name here than NoPassword. Also, I am not sure attaching nopass to a user is a general use case; if someone really wants to achieve this, they can always use rawacl.

Collaborator

It might not be general or recommended, but if Valkey exposes it as an option it might be useful to abstract it away.

https://valkey.io/topics/acl/#configure-acls-with-the-acl-command

This boolean is used in a few places in the PR. If we were to remove NoPassword here, would we change those conditions to check for 'nopass' in RawAcl (pseudocode)?

Contributor Author

The idea was that, since the ACL keyword is nopass, this field maps roughly 1:1 to it. Since it defaults to false when not specified, is there a strong reason to remove it? As the code stands, removing it would mean passwords are always looked up, even in a nopass case. The code would then need to check rawAcl for the presence of nopass to achieve the same logic.
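The trade-off discussed here can be sketched in Go: with a dedicated boolean, skipping the password lookup is a single field check; without it, the controller would have to tokenize the raw ACL string and look for the `nopass` keyword. `UserSpec`, `RawACL`, and both helper functions are hypothetical names for illustration.

```go
package main

import "strings"

// UserSpec is a hypothetical, trimmed-down user definition.
type UserSpec struct {
	NoPassword bool   // explicit field, maps to the "nopass" ACL rule
	RawACL     string // free-form ACL rules, e.g. "on nopass ~* +@read"
}

// needsPasswordFromField uses the dedicated boolean.
func needsPasswordFromField(u UserSpec) bool {
	return !u.NoPassword
}

// needsPasswordFromRawACL infers the same thing by scanning the raw rules.
// Tokens are matched exactly (case-insensitively), so a key pattern such
// as "~nopass*" is not mistaken for the keyword. Only nopass and resetpass
// are considered; all other rules are ignored.
func needsPasswordFromRawACL(u UserSpec) bool {
	noPass := false
	for _, tok := range strings.Fields(u.RawACL) {
		switch {
		case strings.EqualFold(tok, "nopass"):
			noPass = true
		case strings.EqualFold(tok, "resetpass"):
			// resetpass removes a previously set nopass status.
			noPass = false
		}
	}
	return !noPass
}
```

The token scan also has to respect rule ordering (a later `resetpass` clears an earlier `nopass`, per the Valkey ACL docs), which is part of why a dedicated field is simpler to reason about.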

Comment thread internal/controller/utils.go Outdated
Signed-off-by: utdrmac <matthew.boehm@percona.com>
@jdheyburn
Collaborator

I think we're almost there with this PR; it's just pending @sandeepkunusoth's approval on their remaining comments.

for _, shard := range s.Shards {
for _, slot := range shard.Slots {
-var next []SlotsRange
+var next []SlotsRange //nolint:prealloc
Collaborator

Interesting that this isn't showing up in CI; our current main (provided by kubebuilder) uses

make lint
...
Downloading github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.7.2
0 issues.

which is using
go: downloading github.com/alexkohler/prealloc v1.0.0

Is this seen with newer versions of golangci-lint/prealloc?

Contributor Author

Must be. I'm using golangci-lint version 2.8.0, built with go1.25.5.
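For context, prealloc reports slices that are declared with `var` and then only grown with `append` inside a loop whose length is known up front, roughly the pattern below. The `SlotsRange` type and both functions are hypothetical stand-ins, not the operator's code; preallocating with `make(..., 0, n)` is the change the linter suggests, while a `//nolint:prealloc` directive (as in this PR) suppresses the report instead.

```go
package main

// SlotsRange is a hypothetical stand-in for a hash-slot range.
type SlotsRange struct {
	Start, End int
}

// singleSlotRanges uses the pattern prealloc reports: a var-declared
// slice that is only appended to inside a range loop.
func singleSlotRanges(slots []int) []SlotsRange {
	var next []SlotsRange
	for _, s := range slots {
		next = append(next, SlotsRange{Start: s, End: s})
	}
	return next
}

// singleSlotRangesPrealloc reserves capacity up front, which avoids
// repeated reallocations while appending and satisfies the linter.
func singleSlotRangesPrealloc(slots []int) []SlotsRange {
	next := make([]SlotsRange, 0, len(slots))
	for _, s := range slots {
		next = append(next, SlotsRange{Start: s, End: s})
	}
	return next
}
```

One behavioral difference to keep in mind: for an empty input the first version returns `nil`, while the second returns an empty, non-nil slice.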

bjosv
bjosv previously approved these changes Feb 13, 2026
Collaborator

@bjosv bjosv left a comment

LGTM, my previous comments are nits

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Feb 19, 2026

Resolved conflict with c679809

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@jdheyburn
Collaborator

@utdrmac I think there is a bad merge pulling in some functions that no longer exist. Would you mind double checking to see what can be removed? Once that's done I'll approve and merge. Thanks again!

@utdrmac
Contributor Author

utdrmac commented Feb 26, 2026

Hmm, I didn't run into any issues with git merge main. 🤷 Anyooo, I merged main and reran the test suite:

...
  STEP: uninstalling CRDs @ 02/26/26 15:08:00.503
  running: "make uninstall"
  STEP: removing manager namespace @ 02/26/26 15:08:01.891
  running: "kubectl delete ns valkey-operator-system"
[AfterSuite] PASSED [26.256 seconds]
------------------------------

Ran 11 of 11 Specs in 220.267 seconds
SUCCESS! -- 11 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestE2E (220.27s)
PASS
ok  	valkey.io/valkey-operator/test/e2e	220.277s

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Feb 26, 2026

f976ef7 is my forgotten sign-off of 318c67e

318c67e is a quick refactor that removes the extra boolean variable and moves the logic around so it's a bit cleaner.

// primary, regardless of its node-index label. This handles the post-failover
// case where node-index=1 (or higher) was promoted by Valkey.
// Returns ("", "") if no primary is found.
func findShardPrimary(state *valkey.ClusterState, shardIndex int, pods *corev1.PodList) (nodeID, ip string) {
Collaborator

Is this, and the other functions above it, the result of a bad merge?

Contributor Author

upsertAnnotation is added by this PR; findShardPrimary was added in d777c83. I'm not seeing any merge issues.

Collaborator

My apologies, the comparison I was making made it appear as something different :(
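The idea behind findShardPrimary's doc comment (pick a shard's primary by its current role rather than by its `node-index` label) can be sketched without the Kubernetes types. The `Node`, `Shard`, and `ClusterState` structs and `findShardPrimarySketch` below are hypothetical simplifications (the real function also takes a `*corev1.PodList`), but the contract follows the comment: return the primary's node ID and IP, or `("", "")` if none is found.

```go
package main

// Node, Shard and ClusterState are hypothetical, simplified stand-ins
// for the operator's valkey.ClusterState types.
type Node struct {
	ID        string
	IP        string
	IsPrimary bool
}

type Shard struct {
	Nodes []Node
}

type ClusterState struct {
	Shards []Shard
}

// findShardPrimarySketch returns the node currently acting as primary in
// the given shard, regardless of which pod originally held that role
// (e.g. after a failover promoted a replica). It returns ("", "") when the
// shard index is out of range or no primary is present.
func findShardPrimarySketch(state *ClusterState, shardIndex int) (nodeID, ip string) {
	if state == nil || shardIndex < 0 || shardIndex >= len(state.Shards) {
		return "", ""
	}
	for _, n := range state.Shards[shardIndex].Nodes {
		if n.IsPrimary {
			return n.ID, n.IP
		}
	}
	return "", ""
}
```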

@jdheyburn jdheyburn merged commit ed15c7f into valkey-io:main Feb 27, 2026
4 checks passed
@jdheyburn
Collaborator

Thanks for your time and effort on this, @utdrmac!

@jdheyburn
Collaborator

Hey @utdrmac, I just deployed the sample cluster to a test cluster with this commit.

I want to check the expected behavior for the charlie user, which does not have a password (although the user is disabled). The logs show an error that the password key for the user is missing.

2026-02-27T17:29:09Z    ERROR   missing password key in secret  {"controller": "valkeycluster", "controllerGroup": "valkey.io", "controllerKind": "ValkeyCluster", "ValkeyCluster": {"name":"valkeycluster-sample","namespace":"valkey-operator-system"}, "namespace": "valkey-operator-system", "name": "valkeycluster-sample", "reconcileID": "d110d30b-4d32-45ff-adae-023fbb440395", "user": "charlie", "secret": "valkeycluster-sample-users", "key": "charlie"}

@utdrmac
Contributor Author

utdrmac commented Feb 27, 2026

@jdheyburn That is expected, given the logic flow. The first thing checked is the password. If NoPassword is true, fetchUserPasswords exits early and the controller continues building the user. If NoPassword is false (as is the case for charlie), the controller looks up the password in the Secret. This returns an error because we can't add a user without a password. If charlie did have a password, or NoPassword were true, he would be added as disabled ("user charlie off").

I guess it comes down to a design decision: should disabled users appear in the ACL file? If yes (current behavior), the error is correct. If no, the logic can be changed to skip disabled users entirely (and log an info message saying the user was skipped because it is disabled).

sandeepkunusoth pushed a commit to sandeepkunusoth/valkey-k8s-operator that referenced this pull request Feb 28, 2026
Resolves valkey-io#36

This feature allows for the creation of Valkey users on cluster
initialization. The feature abstracts Valkey ACL permissions into
several objects, giving better readability, and creation flexibility to
users that may not be intimately familiar with Valkey ACLs.

---------

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@hieu2102
Contributor

@utdrmac I think we should still create users if no password is provided. Otherwise, modules that override the authentication flow (e.g., valkey-ldap) will not work.

@sandeepkunusoth
Member

> @utdrmac I think we should still create users if no password is provided. Otherwise, modules that override the authentication flow (e.g., valkey-ldap) will not work.

for nopass users we need to allow users with resetpass

@jdheyburn
Collaborator

jdheyburn commented Mar 10, 2026

> for nopass users we need to allow users with resetpass

@sandeepkunusoth Can you expand on the workflow you're describing and how it might be used?

@utdrmac @hieu2102 I think we should still create the user, but perhaps the logs could be tidied. If NoPass is defined then we don't need to error out, because we're not expecting a password.
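One possible shape for this: never fail on a missing password; emit `nopass` when NoPassword is set, a password rule when one is found, and otherwise `resetpass`, so the user still exists but has no valid passwords (leaving authentication to a module such as valkey-ldap). Mapping the missing-password case to `resetpass` is one reading of the suggestion above, not the merged behavior, and `passwordRule` with its signature is hypothetical.

```go
package main

// UserSpec is a hypothetical, trimmed-down user definition.
type UserSpec struct {
	Name       string
	NoPassword bool
}

// passwordRule picks the password-related ACL rule for a user without ever
// returning an error. The second result reports whether the caller should
// log a warning (a password was expected but not found).
func passwordRule(u UserSpec, secretData map[string][]byte) (rule string, warn bool) {
	if u.NoPassword {
		return "nopass", false // no password expected, nothing to log
	}
	if pw, ok := secretData[u.Name]; ok && len(pw) > 0 {
		return ">" + string(pw), false
	}
	// No password available: create the user anyway, but with no valid
	// passwords, so password-based AUTH cannot succeed for it.
	return "resetpass", true
}
```

Returning a warning flag instead of an error keeps reconciliation going while still letting the controller log a tidier message for users that were expected to have a password.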

@utdrmac
Contributor Author

utdrmac commented Mar 10, 2026

@hieu2102 Gotcha; easy enough to change
@jdheyburn That makes sense on the logs.


Development

Successfully merging this pull request may close these issues.

(feat) Support ACL user creation on cluster init

5 participants