Change GroupRef from bare string to typed MCPGroupRef struct #4809

Merged: ChrisJBurns merged 6 commits into main from cburns/groupref-typed-struct on Apr 14, 2026

@ChrisJBurns ChrisJBurns commented Apr 14, 2026

Summary

  • Why: GroupRef was the only cross-CRD reference using a bare string — every other ref (ExternalAuthConfigRef, ToolConfigRef, AuthServerRef, EmbeddingServerRef) uses a typed struct. This inconsistency prevented extending GroupRef with additional fields (like namespace) without a breaking change, and blocked struct-level validation markers. This must be fixed before v1beta1.
  • What: Define MCPGroupRef struct with Name field and nil-safe GetName() helper. Replace GroupRef string with *MCPGroupRef on MCPServerSpec, MCPRemoteProxySpec, MCPServerEntrySpec. Add a new top-level GroupRef *MCPGroupRef field on VirtualMCPServerSpec (with ResolveGroupName() helper) that takes precedence over the deprecated config.groupRef string path. Update all controllers, field indexes, backend reconciler, workload discoverer, tests, YAML examples, chainsaw tests, and documentation.
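
A minimal sketch of the typed reference and its nil-safe accessor, as described in the bullet above. The JSON tags and the trimmed-down `MCPServerSpec` are assumptions inferred from the `groupRef: {name: ...}` wire format, not copied from the diff.

```go
package main

import "fmt"

// MCPGroupRef is a sketch of the typed reference described in this PR.
// The JSON tag is an assumption based on the documented wire format.
type MCPGroupRef struct {
	// Name is the name of the referenced MCPGroup.
	Name string `json:"name"`
}

// GetName returns the referenced group name, or "" when the reference is nil.
// The pointer receiver is what makes the call safe on a nil *MCPGroupRef.
func (r *MCPGroupRef) GetName() string {
	if r == nil {
		return ""
	}
	return r.Name
}

// MCPServerSpec is trimmed to the single field relevant here (assumed layout).
type MCPServerSpec struct {
	GroupRef *MCPGroupRef `json:"groupRef,omitempty"`
}

func main() {
	unset := MCPServerSpec{}
	set := MCPServerSpec{GroupRef: &MCPGroupRef{Name: "my-group"}}
	// Neither call panics, even though unset.GroupRef is nil.
	fmt.Printf("%q %q\n", unset.GroupRef.GetName(), set.GroupRef.GetName())
}
```

Callers can then treat "no reference" and "empty name" the same way by checking `GetName() == ""`, without a separate nil check.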

Closes #4634

Type of change

  • Refactoring (no behavior change)

Test plan

  • Unit tests (task test)
  • Linting (task lint-fix)
  • go build ./... compiles cleanly
  • task operator-test passes
  • task operator-generate && task operator-manifests regenerates successfully

Changes

| File | Change |
|------|--------|
| cmd/thv-operator/api/v1alpha1/mcpserver_types.go | Define MCPGroupRef struct with Name field + GetName() helper; change MCPServerSpec.GroupRef from string to *MCPGroupRef |
| cmd/thv-operator/api/v1alpha1/mcpremoteproxy_types.go | Change MCPRemoteProxySpec.GroupRef from string to *MCPGroupRef |
| cmd/thv-operator/api/v1alpha1/mcpserverentry_types.go | Change MCPServerEntrySpec.GroupRef from string to *MCPGroupRef; update printcolumn JSONPath to .spec.groupRef.name |
| cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go | Add top-level GroupRef *MCPGroupRef field; add ResolveGroupName() helper; update Validate() to accept either spec.groupRef or config.groupRef |
| cmd/thv-operator/controllers/*.go | Update validateGroupRef() methods on all 4 controllers + MCPGroup controller watch handlers to use GetName()/nil checks |
| cmd/thv-operator/main.go | Update field index extractors to use GetName() |
| pkg/vmcp/k8s/backend_reconciler.go | Update GroupRef extraction from CRD types to internal strings |
| pkg/vmcp/workloads/k8s.go | Update group name comparisons |
| deploy/charts/operator-crds/ | Regenerated CRD YAMLs (groupRef becomes object type) |
| examples/operator/ | Update all YAML examples to struct format |
| docs/ | Update architecture and guide documentation |

Does this introduce a user-facing change?

Yes — this is a breaking wire-format change. The CRD field groupRef changes from a bare string to an object:

# Before
groupRef: my-group

# After
groupRef:
  name: my-group

Existing resources with the old format will fail validation against the new CRD schema. Users must update their manifests.
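
To illustrate why the old format no longer fits, here is a small sketch that decodes both shapes into a Go struct with a `*MCPGroupRef` field. It only shows Go-level JSON decoding with the standard library; in a cluster the rejection comes from the CRD schema, and the `spec` type and `decodeGroupName` helper here are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MCPGroupRef mirrors the typed reference from this PR (JSON tag assumed).
type MCPGroupRef struct {
	Name string `json:"name"`
}

// spec is a hypothetical stand-in for any of the affected CRD specs.
type spec struct {
	GroupRef *MCPGroupRef `json:"groupRef,omitempty"`
}

// decodeGroupName decodes a JSON-encoded spec and returns the group name.
func decodeGroupName(raw string) (string, error) {
	var s spec
	if err := json.Unmarshal([]byte(raw), &s); err != nil {
		return "", err
	}
	if s.GroupRef == nil {
		return "", nil
	}
	return s.GroupRef.Name, nil
}

func main() {
	// Old wire format: a bare string cannot be decoded into the struct.
	_, err := decodeGroupName(`{"groupRef":"my-group"}`)
	fmt.Println("old format rejected:", err != nil)

	// New wire format: an object with a name field.
	name, err := decodeGroupName(`{"groupRef":{"name":"my-group"}}`)
	fmt.Println(name, err)
}
```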

Implementation plan

Approved implementation plan

Design Decision: Option B1

Keep pkg/vmcp/config/config.go Config.Group as a string (platform-agnostic config model shared with CLI). Change all four CRD types to use MCPGroupRef. For VirtualMCPServer, add a top-level GroupRef that takes precedence over config.groupRef, following the existing pattern where IncomingAuth supersedes config.IncomingAuth.

Internal types (vmcpserver.Config.GroupRef, StatusResponse.GroupRef, BackendReconciler.GroupRef) stay as strings — they're not CRD API types.
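
The precedence rule could look roughly like the sketch below. The receiver type, the shape of the embedded `Config`, and the field names are assumptions for illustration; only the behavior (top-level `spec.groupRef` wins, deprecated `config.groupRef` is the fallback) comes from the plan above.

```go
package main

import "fmt"

// MCPGroupRef mirrors the typed reference from this PR.
type MCPGroupRef struct{ Name string }

// GetName is nil-safe: a nil reference yields "".
func (r *MCPGroupRef) GetName() string {
	if r == nil {
		return ""
	}
	return r.Name
}

// Config stands in for the platform-agnostic config model; only Group is shown.
type Config struct{ Group string }

// VirtualMCPServerSpec is trimmed to the two group-related paths (assumed layout).
type VirtualMCPServerSpec struct {
	GroupRef *MCPGroupRef // new top-level spec.groupRef
	Config   Config       // deprecated spec.config.groupRef lives in here
}

// ResolveGroupName prefers spec.groupRef and falls back to config.groupRef.
func (s *VirtualMCPServerSpec) ResolveGroupName() string {
	if name := s.GroupRef.GetName(); name != "" {
		return name
	}
	return s.Config.Group
}

func main() {
	both := VirtualMCPServerSpec{
		GroupRef: &MCPGroupRef{Name: "new-group"},
		Config:   Config{Group: "old-group"},
	}
	legacy := VirtualMCPServerSpec{Config: Config{Group: "old-group"}}
	fmt.Println(both.ResolveGroupName(), legacy.ResolveGroupName())
}
```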

Full plan: #4634

Special notes for reviewers

  • The diff is large (71 files) but highly mechanical: the production code changes are only ~116 lines across 13 files. The rest is generated CRD YAMLs, test updates (GroupRef: "x" → GroupRef: &MCPGroupRef{Name: "x"}), YAML examples, and documentation.
  • config.Config.Group (used by CLI vMCP) intentionally stays as a string — it's a platform-agnostic model, not a CRD type.
  • VirtualMCPServer's spec.config.groupRef (string) is deprecated but still works via ResolveGroupName() for backwards compatibility.

Large PR Justification

  • The groupRef type change touches every consumer of the field, so all of the changes across the codebase are made together in this PR.

Generated with Claude Code

Migration Guide

This is a breaking wire-format change for v1alpha1 CRDs. Existing resources in a cluster will fail validation against the new CRD schema after upgrading.

What changed: The groupRef field on MCPServer, MCPRemoteProxy, MCPServerEntry, and VirtualMCPServer changed from a bare string to a typed struct:

# Before
groupRef: my-group

# After
groupRef:
  name: my-group

How to migrate:

  1. Update all YAML manifests to use the new struct format
  2. Apply the new CRDs: kubectl apply -f deploy/charts/operator-crds/files/crds/
  3. Delete and re-apply affected resources (MCPServer, MCPRemoteProxy, MCPServerEntry, VirtualMCPServer)

For VirtualMCPServer, move spec.config.groupRef to the new top-level spec.groupRef field. The deprecated config.groupRef still works but should be migrated.
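
For step 1, a manifest rewrite could be scripted along these lines. This is a hypothetical helper, not tooling shipped with the PR: it works on a manifest already decoded into a generic map (for example from JSON, or from YAML via a library of your choice) and assumes the field paths described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// migrateGroupRef rewrites spec.groupRef from a bare string to {name: ...}.
// For VirtualMCPServer it also moves spec.config.groupRef to spec.groupRef,
// unless a top-level spec.groupRef is already present.
func migrateGroupRef(manifest map[string]any) {
	spec, ok := manifest["spec"].(map[string]any)
	if !ok {
		return
	}
	if name, ok := spec["groupRef"].(string); ok {
		spec["groupRef"] = map[string]any{"name": name}
	}
	if manifest["kind"] != "VirtualMCPServer" {
		return
	}
	cfg, ok := spec["config"].(map[string]any)
	if !ok {
		return
	}
	if name, ok := cfg["groupRef"].(string); ok {
		if _, exists := spec["groupRef"]; !exists {
			spec["groupRef"] = map[string]any{"name": name}
		}
		delete(cfg, "groupRef")
	}
}

func main() {
	raw := `{"kind":"MCPServer","spec":{"groupRef":"my-group"}}`
	var manifest map[string]any
	if err := json.Unmarshal([]byte(raw), &manifest); err != nil {
		panic(err)
	}
	migrateGroupRef(manifest)
	out, err := json.Marshal(manifest)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Manifests that already use the struct format pass through unchanged, so the helper can be run repeatedly.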

Large PR Justification

This is an atomic breaking wire-format change that cannot be split without leaving CRDs in an inconsistent intermediate state. The production code changes are ~116 lines across 13 files. The remaining diff is generated CRD YAMLs, mechanical test updates, YAML examples, and documentation — all review-exempt categories.

GroupRef was a bare string while every other cross-CRD reference
(ExternalAuthConfigRef, ToolConfigRef, AuthServerRef, EmbeddingServerRef)
uses a typed struct. This inconsistency prevented extending GroupRef
with additional fields (like namespace) without a breaking change.

Define MCPGroupRef struct with Name field and nil-safe GetName() helper.
Replace GroupRef string with *MCPGroupRef on MCPServerSpec,
MCPRemoteProxySpec, MCPServerEntrySpec, and add a new top-level GroupRef
field on VirtualMCPServerSpec that takes precedence over the deprecated
config.groupRef string path.

Internal types (vmcpserver.Config, StatusResponse, BackendReconciler,
config.Config) remain as strings since they are not CRD API types.

This is a breaking wire-format change (groupRef: "name" becomes
groupRef: {name: "name"}) that must happen before v1beta1.

Closes #4634

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-struct

# Conflicts:
#	cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
#	deploy/charts/operator-crds/files/crds/toolhive.stacklok.dev_virtualmcpservers.yaml
#	deploy/charts/operator-crds/templates/toolhive.stacklok.dev_virtualmcpservers.yaml
@github-actions github-actions Bot added the size/XL Extra large PR: 1000+ lines changed label Apr 14, 2026
@github-actions github-actions Bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions Bot dismissed their stale review April 14, 2026 14:15

Large PR justification has been provided. Thank you!

@github-actions

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

Comment thread cmd/thv-operator/controllers/mcpserverentry_controller.go Outdated
@ChrisJBurns ChrisJBurns left a comment

Multi-Agent Consensus Review

Agents consulted: kubernetes-expert, go-expert-developer, code-reviewer, toolhive-expert

Consensus Summary

| # | Finding | Consensus | Severity | Action |
|---|---------|-----------|----------|--------|
| 1 | MCPServerEntry validateGroupRef nil guard | 8/10 | MEDIUM | Fix |
| 2 | Missing unit tests for ResolveGroupName/GetName | 8/10 | MEDIUM | Fix |
| 3 | Inconsistent nil-check patterns across indexers | 8/10 | LOW | Discuss |
| 4 | config.Config.Group still marked Required despite deprecation | 7/10 | MEDIUM | Discuss |
| 5 | Validate() error message only mentions spec.groupRef | 7/10 | LOW | Discuss |

Overall

This is a well-executed CRD API refactoring that replaces bare-string GroupRef fields with a typed MCPGroupRef struct across all four CRD types. The design is sound — MCPGroupRef follows the established patterns of ExternalAuthConfigRef, ToolConfigRef, and AuthServerRef, and the nil-safe GetName() helper is a good addition. The ResolveGroupName() deprecation bridge for VirtualMCPServer is clean and correctly preserves backward compatibility.

The diff is large (71 files) but highly mechanical — the production code changes are concentrated in ~13 files with ~116 lines of real logic changes. The consensus findings are mostly about defensive programming (nil guard), test coverage (the new precedence logic is untested), and consistency (mixed nil-check idioms). None are correctness blockers, but F1 and F2 are worth addressing before merge to avoid a potential controller panic and to protect the precedence logic from regressions.

Documentation

The PR updates docs/arch/09-operator-architecture.md and the Kubernetes guide, but several other doc files still reference the old config.groupRef string format: docs/operator/virtualmcpserver-api.md, docs/operator/virtualmcpcompositetooldefinition-guide.md, docs/operator/virtualmcpserver-observability.md. Per CLAUDE.md, task crdref-gen should also be re-run to regenerate docs/operator/crd-api.md. These could be a follow-up PR.

Finding #4 (file-level): config.Config.Group still marked Required

File: pkg/vmcp/config/config.go (not in diff)
Severity: MEDIUM | Consensus: 7/10

The Group field in config.Config retains its +kubebuilder:validation:Required marker, so groupRef remains in the required list of the config object in the CRD schema. Users who set spec.groupRef (the new preferred path) but also need config for telemetry/audit will see a misleading schema that still requires config.groupRef. Consider marking this field +optional and regenerating CRDs.

Raised by: toolhive-expert, kubernetes-expert
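
A sketch of what the suggested marker change might look like on the Go side. The `Config` type is trimmed to one field, and the tags are assumptions; the `+optional` comment only affects CRD generation, while the runnable part below demonstrates the `omitempty` effect on serialization.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a trimmed stand-in for the config model; only the group field is shown.
type Config struct {
	// Group is the deprecated config.groupRef path.
	// Marked optional so the generated schema no longer lists it as required.
	// +optional
	Group string `json:"groupRef,omitempty" yaml:"groupRef,omitempty"`
}

// encode serializes a Config to JSON for display.
func encode(c Config) string {
	b, err := json.Marshal(c)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// With omitempty, an unset Group is dropped from the output entirely.
	fmt.Println(encode(Config{}))
	fmt.Println(encode(Config{Group: "my-group"}))
}
```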


Generated with Claude Code

Comment thread cmd/thv-operator/controllers/mcpserverentry_controller.go Outdated
Comment thread cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go
Comment thread cmd/thv-operator/main.go
Comment thread cmd/thv-operator/api/v1alpha1/virtualmcpserver_types.go Outdated
- Add nil guard to MCPServerEntry validateGroupRef using GetName()
  instead of direct .Name access to prevent potential panic
- Add unit tests for MCPGroupRef.GetName() and
  VirtualMCPServer.ResolveGroupName() covering precedence logic
- Normalize nil-check pattern across all field index extractors to
  use GetName() == "" (handles both nil and empty name)
- Mark config.Config.Group as optional and deprecated since
  spec.groupRef is now the preferred path
- Improve Validate() error message to mention both spec.groupRef
  and config.groupRef paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
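
The normalized extractor pattern from the third bullet could be sketched as follows. `groupRefIndexValues` is a hypothetical helper name, and the real extractors in cmd/thv-operator/main.go take a client object rather than the reference itself; only the `GetName() == ""` check is taken from the commit message.

```go
package main

import "fmt"

// MCPGroupRef mirrors the typed reference from this PR.
type MCPGroupRef struct{ Name string }

// GetName is nil-safe: a nil reference yields "".
func (r *MCPGroupRef) GetName() string {
	if r == nil {
		return ""
	}
	return r.Name
}

// groupRefIndexValues returns the values an object should be indexed under.
// A single GetName() == "" check covers both a nil reference and an empty name.
func groupRefIndexValues(ref *MCPGroupRef) []string {
	name := ref.GetName()
	if name == "" {
		return nil
	}
	return []string{name}
}

func main() {
	fmt.Println(len(groupRefIndexValues(nil)))
	fmt.Println(len(groupRefIndexValues(&MCPGroupRef{})))
	fmt.Println(groupRefIndexValues(&MCPGroupRef{Name: "my-group"}))
}
```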
@ChrisJBurns

Re Finding #4 (config.Config.Group still marked Required):

Fixed in 7412b0f. Changed the +kubebuilder:validation:Required marker to +optional and added omitempty to the JSON/YAML tags, so config.groupRef is no longer in the required list of the CRD schema. Users setting spec.groupRef (the preferred path) won't see misleading schema validation for the deprecated config.groupRef field.

Note: The CLI validator (pkg/vmcp/config/validator.go) still requires Group for standalone vMCP usage where config.groupRef is the only way to specify it — this is correct since CLI doesn't have spec.groupRef.

Re Documentation note: Agreed the other docs (virtualmcpserver-api.md, virtualmcpcompositetooldefinition-guide.md, virtualmcpserver-observability.md) should be updated. Will address in a follow-up PR as suggested.

JAORMX commented Apr 14, 2026

Hey! Did a thorough review of this one with a few expert lenses (K8s API design, Go code quality, vMCP/docs consistency). The prior review round already caught the important nil-safety and test gaps, and those fixes in 7412b0f3 look good. Nice work there.

Here's what's still standing out to me:

1. Missed example file: vmcp_with_telemetry_ref.yaml

So, examples/operator/virtual-mcps/vmcp_with_telemetry_ref.yaml still has the old bare-string format at lines 64 and 93:

groupRef: telemetry-demo

Every other example in that directory was updated, but this one slipped through. After the CRD schema lands, anyone copying this example gets a validation error. Would be good to fix it in this PR since it's a one-liner.

2. virtualmcpserver-api.md still documents the old API

The hand-written API docs at docs/operator/virtualmcpserver-api.md still show spec.config.groupRef as the primary and required field (lines 27, 31, 36-37, 611, 613). Type is listed as string, examples show the old format, etc.

I saw your comment about addressing docs in a follow-up, and I get that this PR is already large. But... shipping a breaking wire-format change with stale docs is risky. At minimum, updating the header (line 27) and the type (line 31) in this PR would prevent the worst confusion. The full example sweep can be follow-up.

3. crd-api.md needs regeneration

docs/operator/crd-api.md is auto-generated and wasn't re-run. A quick task crdref-gen should sort this out.

4. Pointer type for a required field on MCPServerEntry

This one's more of a design question, but worth raising before v1beta1. MCPServerEntry.Spec.GroupRef is +kubebuilder:validation:Required but uses *MCPGroupRef (pointer). In K8s API conventions, required fields typically use value types... pointers semantically mean "can be absent." The other three CRDs correctly use *MCPGroupRef because their GroupRef is optional, but MCPServerEntry's is always required.

Using MCPGroupRef (non-pointer, no omitempty) for MCPServerEntry would make the Go type contract match the API contract. The GetName() nil-safe helper would still work on a value type. Not a blocker for this PR, but something to consider.

5. No migration path documented

This is a breaking wire-format change for existing CRs in a cluster. The PR description acknowledges it (which is great!), but there's no guidance on what users actually need to do. Since it's v1alpha1, breaking changes are fair game, but a quick note in the description or release notes saying "delete and re-apply your MCPServer/MCPRemoteProxy/MCPServerEntry resources" would save folks some confusion. Especially since the error they'd get from etcd would be... not super obvious.

Overall

The design is solid. MCPGroupRef follows the existing ref patterns nicely, the ResolveGroupName() backward compat logic is clean, and the test coverage additions are good. The bulk of this PR is mechanical and correct. Fixing #1 and #3 should be quick, #2 is partially deferrable, and #4-5 are discussion items.

Generated with Claude Code

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 63.75000% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.93%. Comparing base (be40131) to head (3883a63).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| cmd/thv-operator/main.go | 0.00% | 9 Missing ⚠️ |
| pkg/vmcp/k8s/backend_reconciler.go | 40.00% | 6 Missing ⚠️ |
| ...md/thv-operator/controllers/mcpgroup_controller.go | 66.66% | 5 Missing ⚠️ |
| ...-operator/controllers/mcpremoteproxy_controller.go | 16.66% | 5 Missing ⚠️ |
| ...hv-operator/api/v1alpha1/virtualmcpserver_types.go | 66.66% | 1 Missing and 1 partial ⚠️ |
| ...perator/controllers/virtualmcpserver_controller.go | 86.66% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4809      +/-   ##
==========================================
- Coverage   68.98%   68.93%   -0.05%     
==========================================
  Files         518      518              
  Lines       54985    55002      +17     
==========================================
- Hits        37932    37917      -15     
- Misses      14125    14156      +31     
- Partials     2928     2929       +1     

- Fix vmcp_with_telemetry_ref.yaml (missed in initial pass)
- Update virtualmcpserver-api.md to document spec.groupRef as primary
  field and update all YAML examples to struct format
- Update virtualmcpcompositetooldefinition-guide.md and
  virtualmcpserver-observability.md examples
- Regenerate crd-api.md to show MCPGroupRef typed struct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ChrisJBurns

Thanks for the thorough review! All 5 points addressed in 3883a63:

#1 Missed vmcp_with_telemetry_ref.yaml: Fixed. Both MCPServer and VirtualMCPServer groupRef entries updated to struct format.

#2 virtualmcpserver-api.md stale docs: Fixed. Updated the header, type, all YAML examples, and validation section. Also updated virtualmcpcompositetooldefinition-guide.md and virtualmcpserver-observability.md.

#3 crd-api.md regeneration: Done. Ran crd-ref-docs manually (the task has a path resolution issue in worktrees). Now correctly shows MCPGroupRef typed struct.

#4 Pointer for required field on MCPServerEntry: Valid design observation. Using *MCPGroupRef (pointer) for a required field is inconsistent with K8s API conventions where required fields use value types. However, changing to a non-pointer now would mean MCPServerEntry uses MCPGroupRef while the other three use *MCPGroupRef — a different kind of inconsistency. I think the pointer approach is pragmatic for this PR since it keeps all four CRDs using the same type. Worth revisiting as a separate cleanup before v1beta1 if the team prefers strict convention adherence.

#5 Migration guidance: Added a "Migration Guide" section to the PR description with concrete steps for users to update their manifests.

@ChrisJBurns ChrisJBurns merged commit 0c5213e into main Apr 14, 2026
121 of 125 checks passed
@ChrisJBurns ChrisJBurns deleted the cburns/groupref-typed-struct branch April 14, 2026 15:49

Linked issue (may be closed by merging this pull request): Change GroupRef from bare string to typed struct for API consistency

2 participants