feat(migrate-eppo): add Eppo migration kit and extract shared core by fabriziodemaria · Pull Request #17 · spotify/confidence-ai-plugins

fabriziodemaria · 2026-05-27T10:52:00Z

Summary

Adds a /migrate-eppo slash command and skill that mirror the existing PostHog migration, refactors both skills to share their Confidence-side conventions, and ships a local fake-Eppo fixture server (schema-aligned with Eppo's public OpenAPI 3.0 spec) so the whole flow can be tested without an Eppo account. After an audit of the open-source Confidence resolver, it also translates several targeting cases that earlier looked un-migratable.

Commits, intentionally separable:

feat(migrate-eppo) — new command, skill, README/CLAUDE.md.
refactor(migrations) — extracts the ~60% duplicated content into skills/_shared/migration-core.md.
chore + fix(migrate-eppo) — local fake-Eppo server, then realigned to Eppo's public OpenAPI 3.0 spec.
chore(migrate-eppo) — verify_migration.py behavioral-equivalence harness.
test(migrate-eppo) — archived-flag fixture for the list-filter path.
feat(migrate-eppo) — translate SemVer, regex alternation, and reusable audiences instead of blocking them.
feat(migrate-eppo) — translate IS_NULL via a ruleless presence criterion (not(exists)), instead of blocking or relying on a default-fallthrough trick.
feat(migrate-eppo) — emit Eppo's default allocation as an explicit catch-all rule (Confidence has no server-side flag default).

Operator mapping — what migrates vs what's blocked

After auditing the open-source resolver (spotify/confidence-resolver / state-to-wasm: target.proto, ir_builder.rs, value.rs, version.rs, spec payloads) and the admin API (epx-flags-admin TargetingValidator), these all have clean translations:

| Eppo | Confidence translation |
| --- | --- |
| `ONE_OF` / `NOT_ONE_OF` | `setRule` (latter wrapped in `not`) |
| `MATCHES` prefix/suffix anchor | `startsWithRule` / `endsWithRule` |
| `MATCHES` alternation (`.*@(a\|b\|c)$`) | OR of `startsWith`/`endsWith`, one per branch |
| `GT`/`GTE`/`LT`/`LTE` numeric | `rangeRule` with `numberValue` |
| `GT`/`GTE`/`LT`/`LTE` SemVer | `rangeRule` with `versionValue` |
| `IS_NULL` (any shape) | ruleless presence criterion `{ attribute: { attributeName } }` referenced under `not`; composes with `and`/`or`, so combined and non-default cases work too |
| reusable `audiences[]` (`IS_IN`/`IS_NOT_IN`) | a Confidence segment per audience via `createSegment`, referenced by a segment criterion |
| `is_default` allocation | final catch-all `addTargetingRule` (no payload, 100% → default variant) |
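
The `MATCHES` rows above can be sketched as a small classifier. This is a minimal illustration, not the skill's actual logic: the function name `decompose_matches`, the returned `(kind, literal)` tuples, and the treatment of `\.` as a literal dot are all assumptions made for the example.

```python
import re

# Regex metacharacters; any of these left in a fragment makes it non-literal.
_META = set(".^$*+?{}[]\\|()")
# Optional literal head, at most one (...) group, optional literal tail.
_SHAPE = re.compile(r"^(?P<head>[^()]*)(?:\((?P<alts>[^()]*)\))?(?P<tail>[^()]*)$")
_DOT = "\0"  # placeholder for an escaped dot while scanning for metacharacters


def decompose_matches(pattern):
    """Return [(kind, literal), ...] for an anchored pattern, or None if blocked.

    Hypothetical helper: kind is "startsWith" or "endsWith", one entry per
    alternation branch. Anything that is not a plain anchored literal
    (character classes, quantifiers, bare ".", several groups) yields None.
    """
    if pattern.startswith("^") and pattern.endswith(".*"):
        kind, body = "startsWith", pattern[1:-2]
    elif pattern.startswith(".*") and pattern.endswith("$"):
        kind, body = "endsWith", pattern[2:-1]
    else:
        return None
    shape = _SHAPE.match(body.replace("\\.", _DOT))
    if shape is None:  # nested or multiple alternation groups
        return None
    branches = shape["alts"].split("|") if shape["alts"] is not None else [""]
    literals = [shape["head"] + branch + shape["tail"] for branch in branches]
    if any(not lit or _META & set(lit) for lit in literals):
        return None
    return [(kind, lit.replace(_DOT, ".")) for lit in literals]
```

Under these assumptions, `.*@(a|b|c)$` yields three `endsWith` entries that would be ORed together, while a pattern such as `^user-[0-9]+.*` returns `None` and stays on the blocked path.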

Genuinely blocked — now just two whole-flag cases:

Generic MATCHES regex — character classes, quantifiers, wildcard ., backrefs, multiple alternation groups.
SWITCHBACK allocations: Eppo rotates a subject through different variations across time windows, and Confidence has no time-bucketed exposure primitive. The gap is the time-window rotation, not sticky assignment: consistent per-subject assignment is native to Confidence (the opposite of switchback), and there is also a materialization API.

(Unnormalizable version strings — not v-strippable into 2–4 numeric segments — also fall back to per-condition manual review.)
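
One way to read the "v-strippable into 2–4 numeric segments" rule is sketched below. The helper name `normalize_version` is hypothetical, and the sketch deliberately makes no attempt to handle pre-release suffixes or an uppercase `V`; how the skill treats those is not stated here.

```python
import re

# Optional leading "v", then 2 to 4 dot-separated numeric segments.
_VERSION = re.compile(r"v?(\d+(?:\.\d+){1,3})")


def normalize_version(raw):
    """Return the version without its "v" prefix, or None if it does not fit.

    None means the condition would fall back to per-condition manual review
    in this sketch.
    """
    match = _VERSION.fullmatch(raw.strip())
    return match.group(1) if match else None
```

A value such as `v1.2.3` normalizes to `1.2.3`, while `latest` or a single bare number does not match and returns `None`.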

On `IS_NULL`

Confidence's resolver compiles an attribute criterion with no inner rule to an existence check (ir_builder.rs _ => arm → I64Neqz), the resolver's own spec fixtures contain a bare { "attributeName": "country" } criterion, and epx-flags-admin's TargetingValidator does no structural validation for ATTRIBUTE criteria — so it's creatable and resolvable. IS_NULL(attr) therefore maps directly to not(exists(attr)). (Confirmed in-product: the segment editor exposes a first-class is null operator.) One cosmetic caveat: the web segment editor may render a ruleless criterion as empty even though it resolves correctly.
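
The `not(exists(attr))` translation can be illustrated with a toy evaluator. Only the ruleless `{ attribute: { attributeName } }` criterion comes from the description above; the surrounding `criteria` / `expression` / `ref` / `operands` layout and the `eqRule` field are assumed shapes for illustration, and the evaluator is not the resolver's implementation. It also assumes that a missing key and an explicit `None` both count as null.

```python
def exists(name):
    # Ruleless attribute criterion: no inner rule, so it acts as a presence test.
    return {"attribute": {"attributeName": name}}


def equals(name, value):
    # Assumed shape for an equality rule; real payload field names may differ.
    return {"attribute": {"attributeName": name, "eqRule": {"value": value}}}


def evaluate(expr, criteria, ctx):
    """Evaluate a ref / not / and / or expression tree against a context dict."""
    if "ref" in expr:
        criterion = criteria[expr["ref"]]["attribute"]
        value = ctx.get(criterion["attributeName"])
        if value is None:  # absent attribute never satisfies a criterion
            return False
        rule = criterion.get("eqRule")
        return True if rule is None else value == rule["value"]
    if "not" in expr:
        return not evaluate(expr["not"], criteria, ctx)
    if "and" in expr:
        return all(evaluate(e, criteria, ctx) for e in expr["and"]["operands"])
    if "or" in expr:
        return any(evaluate(e, criteria, ctx) for e in expr["or"]["operands"])
    raise ValueError(f"unsupported expression: {expr!r}")


# IS_NULL(country) AND plan == free, as in the null-and-condition fixture.
targeting = {
    "criteria": {
        "has_country": exists("country"),
        "plan_free": equals("plan", "free"),
    },
    "expression": {
        "and": {"operands": [{"not": {"ref": "has_country"}}, {"ref": "plan_free"}]}
    },
}
```

With this targeting, a context of `{"plan": "free"}` matches, and adding any `country` value makes it stop matching.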

On the default allocation

This is a separate concern from IS_NULL. Confidence has no server-side flag default: the Flag proto carries variants + an ordered rule list but no default field (createFlag accepts none), and an unmatched resolve returns the caller's code default (ClientDefaultAssignment). Migrating Eppo's is_default allocation as a phantom "flag default value" was a no-op. The skill now emits it as a final catch-all rule — addTargetingRule with variantAllocations { <default>: 100 } and no payload (an empty payload targets all contexts; a no-targeting segment matches everyone) — placed after every specific rule so it only catches no-match subjects. The shared-core instruction is gated on the source platform actually defining a default variation, so the PostHog flow is unaffected.
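
The effect of the catch-all rule can be seen in a simplified first-match model. This is a sketch under assumptions: rules are reduced to a predicate plus a variant, rollout percentages are ignored, and `resolve` is a hypothetical stand-in for the real resolver.

```python
def resolve(rules, ctx, client_default):
    """Return the variant of the first matching rule, in order.

    With no server-side flag default, an unmatched resolve hands back
    whatever default the calling code supplied.
    """
    for rule in rules:
        if rule["matches"](ctx):
            return rule["variant"]
    return client_default


specific = {"matches": lambda ctx: ctx.get("country") == "SE", "variant": "treatment"}
# Catch-all: matches every context and serves Eppo's default variation.
catch_all = {"matches": lambda ctx: True, "variant": "control"}

without_catch_all = resolve([specific], {"country": "US"}, "code-default")
with_catch_all = resolve([specific, catch_all], {"country": "US"}, "code-default")
```

Here `without_catch_all` is the caller's `"code-default"`, whereas `with_catch_all` is `"control"`. Because the catch-all sits last, a subject in `SE` still receives `"treatment"`.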

How to test end-to-end

```bash
python3 skills/migrate-eppo/test-fixtures/server.py
# → 13 fixture flags, 2 audiences, 3 environments
export EPPO_API_KEY=fake-key-for-testing
# /migrate-eppo plan flags ; API base http://127.0.0.1:3000/api/v1 ; env Production (id 1)
```
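
For poking at the fixture server outside the skill, a listing loop along these lines may help. It is a sketch, not the skill's own client: the `X-Eppo-Token` header, the `/feature-flags` path, the `offset`/`limit` parameters, and the bare-array response are taken from the commit notes on this PR, while the function names and the stop condition are assumptions.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

BASE = "http://127.0.0.1:3000/api/v1"  # fixture server base URL


def http_get_json(path, params):
    """GET a JSON document from the fixture server, sending the API key header."""
    request = urllib.request.Request(
        f"{BASE}{path}?{urlencode(params)}",
        headers={"X-Eppo-Token": os.environ["EPPO_API_KEY"]},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_flags(fetch=http_get_json, limit=5, include_archived=False):
    """Collect all flags with offset + limit paging over a bare-array response."""
    flags, offset = [], 0
    while True:
        params = {"offset": offset, "limit": limit}
        if include_archived:
            params["include_archived"] = "true"
        page = fetch("/feature-flags", params)
        flags.extend(page)
        # Assumed stop condition: anything other than a full page is the last one.
        if len(page) != limit:
            return flags
        offset += len(page)
```

The `fetch` parameter is injectable so the paging logic can be exercised without a running server.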

Fixtures (12 active + 1 archived):

| `id` | `key` | Tests | Status |
| --- | --- | --- | --- |
| 1 | `internal-tools-gate` | `MATCHES .*suffix$` → `endsWithRule` | Migrate |
| 2 | `pricing-experiment` | waterfall + 50/50 multivariant + `ONE_OF` + catch-all default | Migrate |
| 3 | `legacy-search-rollout` | `NOT_ONE_OF` + numeric `GTE` + AND | Migrate |
| 4 | `subject-id-targeting` | special `id` → entity-field rewrite | Migrate |
| 5 | `legacy-checkout-redesign` | inactive in prod → 0%-rollout | Migrate (warning) |
| 6 | `mobile-only-feature` | SemVer `>=` → `versionValue` | Migrate |
| 7 | `general-regex-flag` | suffix alternation → OR of `endsWithRule` | Migrate |
| 8 | `missing-attribute-fallback` | `IS_NULL` → non-default variant via `not(exists)` | Migrate |
| 9 | `delivery-pricing-switchback` | `SWITCHBACK` → flag-level BLOCKED | BLOCKED |
| 10 | `premium-users-only` | `audiences[]` (`IS_IN`/`IS_NOT_IN`) → segments | Migrate |
| 11 | `regex-id-format` | generic regex → BLOCKED | BLOCKED |
| 12 | `null-and-condition` | `IS_NULL` ∧ `plan==free` → `and(not(exists), eq)` | Migrate |
| 13 | `old-onboarding-flow` | `is_archived: true` → hidden unless opted in | Skipped |

Test plan checklist

Server smoke test — 401 without token; archived flag hidden by default (12) / visible with include_archived=true (13); per-env overrides apply.
Schema validation against Eppo's real OpenAPI 3.0 spec — zero violations.
Behavioral equivalence (verify_migration.py) — Eppo ground-truth waterfall (regex, SemVer, set membership, IS_NULL via not(exists), audience IS_IN/IS_NOT_IN, default allocation as catch-all) over 10 migratable flags × 9 contexts = 90 cases, 0 mismatches; 2 BLOCKED fixtures listed separately. Positive null paths covered (no-plan → on; no-country ∧ free → on).
BLOCKED-path fixtures — regex-id-format and delivery-pricing-switchback fire with the exact reason strings.
Regression — run /migrate-posthog plan flags end-to-end to confirm the shared-core refactor didn't break the existing PostHog flow.
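
The behavioral-equivalence item boils down to a flags × contexts comparison. The sketch below shows that shape only; it is not `verify_migration.py`, and the rule representation (predicate, variant pairs) and all names are invented for the example.

```python
from itertools import product


def first_match(rules, ctx):
    """Waterfall evaluation: the first rule whose predicate holds wins."""
    for matches, variant in rules:
        if matches(ctx):
            return variant
    return None


def equivalence_matrix(flags, contexts):
    """Compare source and translated rules for every flag and context pair."""
    mismatches = []
    for (flag_key, flag), (ctx_name, ctx) in product(flags.items(), contexts.items()):
        expected = first_match(flag["eppo"], ctx)
        actual = first_match(flag["confidence"], ctx)
        if expected != actual:
            mismatches.append((flag_key, ctx_name, expected, actual))
    return len(flags) * len(contexts), mismatches


flags = {
    "missing-attribute-fallback": {
        # Source semantics: IS_NULL(plan) serves "on", default allocation "off".
        "eppo": [(lambda c: c.get("plan") is None, "on"), (lambda c: True, "off")],
        # Translated semantics: not(exists(plan)), then the catch-all rule.
        "confidence": [
            (lambda c: not c.get("plan") is not None, "on"),
            (lambda c: True, "off"),
        ],
    }
}
contexts = {"no-plan": {}, "free": {"plan": "free"}}
total_cases, mismatches = equivalence_matrix(flags, contexts)
```

Each mismatch tuple names the flag, the context, and both variants, which is enough to spot-check a single translated rule by hand.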

Known follow-ups

Phase 2 (code migration) is untested — needs a sample Eppo-SDK codebase.

Draft because

Wanted feedback on the shared-core extraction before merging.
PostHog regression run still outstanding.

Adds a /migrate-eppo slash command and skill that mirror the existing PostHog migration: two-phase flow (flag definitions then code), opt-in gating per flag, progressive plan files, and the same execute sequence with positive- and negative-case resolve verification.

Eppo-specific adjustments:

- Uses Eppo's REST admin API via curl since no Claude MCP exists yet; requires an EPPO_API_KEY and never persists it to the plan file.
- Asks the user to pick a source environment before scanning, because Eppo flag state is per-environment.
- Maps Eppo's subjectKey to a single Confidence entity field; rewrites rules that target the special id attribute to use that field.
- Emits one Confidence targeting rule per Eppo allocation in waterfall order, with trafficExposure -> rolloutPercentage and variation weights -> variant splits inside a single rule.
- Constrains MATCHES to ^prefix.* / .*suffix$ and blocks SemVer comparisons, with execute refusing to proceed on unresolved blocks.
- Flags disabled in the source environment are migrated at 0% rollout so they cannot accidentally activate during migration.

Co-authored-by: Cursor <cursoragent@cursor.com>

Both platform migration skills (migrate-posthog, migrate-eppo) shared roughly 60% of their content: the Confidence targeting payload format, flag setup sequence, naming rules, execute flow, plan-file resume pattern, client-selection step, step-tracker conventions, and the multivariant split handling. Duplicating that content across N skills makes Confidence-side bug fixes like the targeting-payload-format correction in #15 a multi-place chore and gets worse with every new source platform.

This commit extracts all platform-agnostic content into skills/_shared/migration-core.md. Each platform skill now starts with a "read the core file first" instruction and only contains:

- Its migration overview ASCII art
- Source-platform prerequisites (PostHog MCP install vs Eppo REST API key)
- The platform-specific scan logic for Step 1 of plan flags
- The platform-specific randomization mapping for Step 3
- The Operator Mapping table (source operator -> Confidence payload strategy) plus any blocked-operator guidance
- Plan-flag template fields that differ across platforms
- The platform-specific code-scan and transform-rule generation for plan code Steps 3 and 4
- Any platform-specific execute notes (e.g. Eppo's disabled-in-env handling)

Net effect: -28% total lines, each platform skill ~500 lines instead of ~1100, and adding a third source platform (LaunchDarkly, Statsig, GrowthBook, ConfigCat, ...) is now mechanical: copy a platform skill template and fill in the four platform-specific sections above.

Pure refactor; no user-facing behavior change. The slash commands, plan-file paths, MCP tool names, and execute flow are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

Adds a Python stdlib HTTP server under skills/migrate-eppo/test-fixtures/ that mimics Eppo's REST admin API for the four read endpoints the skill calls (/environments, /feature-flags, /feature-flags/{id}, /feature-flags/{id}/environments/{envId}). Lets us drive the migrate-eppo skill end-to-end without an Eppo account, which has become important because Eppo's signup is sales-gated and not self-service.

The 10 fixture flags are deliberately chosen to exercise every branch of the skill's operator-mapping table:

- internal-tools-gate: MATCHES suffix anchor (endsWithRule)
- pricing-experiment: waterfall, Feature Gate + Experiment, multivariant 50/50, ONE_OF
- legacy-search-rollout: NOT_ONE_OF + GTE numeric + AND
- subject-id-targeting: special id attribute -> entity rewrite
- legacy-checkout-redesign: disabled in Production (0%-rollout path)
- mobile-only-feature: SemVer GTE -> BLOCKED path
- general-regex-flag: regex with alternation -> BLOCKED path
- extra-flag-1..3: pagination filler (per_page=5 -> 2 pages)

The fixture JSON is our best guess at Eppo's actual response schema, modeled on the Swagger summary at https://eppo.cloud/api/docs and the public docs. A future Tier 3 pass with a real Eppo account should diff real responses against these fixtures and update either side if they drift; until then this server gives us deterministic coverage of the skill's logic.

Smoke-tested: auth gate returns 401 without X-Eppo-Token, all four endpoints return well-formed JSON, pagination loop terminates after page 3 returns an empty array, and the per-env override flips legacy-checkout-redesign to enabled=false in Production while leaving Staging/Test enabled.

Also adds a one-paragraph callout in skills/migrate-eppo/SKILL.md under Prerequisites pointing future contributors at the fixture server.

Co-authored-by: Cursor <cursoragent@cursor.com>
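
The auth gate described in this commit can be pictured with a few lines of stdlib code. This is a reduced sketch and not the contents of `server.py`: it serves one endpoint with one made-up fixture entry, and `Handler`, `make_server`, and the error bodies are invented for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Illustrative fixture data only; the real fixtures carry full flag objects.
FLAGS = [{"id": 1, "key": "internal-tools-gate", "is_archived": False}]


class Handler(BaseHTTPRequestHandler):
    def _send(self, status, body):
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Auth gate: any request without the token header is rejected.
        if not self.headers.get("X-Eppo-Token"):
            self._send(401, {"error": "missing X-Eppo-Token"})
        elif self.path.split("?")[0] == "/api/v1/feature-flags":
            self._send(200, FLAGS)
        else:
            self._send(404, {"error": "not found"})

    def log_message(self, *args):
        pass  # keep test output quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), Handler)
```

Calling `make_server().serve_forever()` in a background thread gives a throwaway endpoint for smoke tests; the gate only checks that the header is present, not its value.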

…penAPI schema

The prior REST reference and fake-server fixtures were modeled from Eppo's high-level docs; they were structurally close but wrong on field naming almost everywhere. Diffed against the real OpenAPI 3.0 spec (publicly served at https://eppo.cloud/api/docs/swagger-ui-init.js, no auth required) and corrected:

* snake_case throughout: variation_type, is_archived, targeting_rules, variation_weight, percent_exposure, is_default, environment_id, etc.
* Numeric IDs (Eppo Object IDs) instead of slug strings
* variation_weight is an array of {variation_id, weight}, not a map keyed by variant_key
* Condition values are always arrays, even for single-value operators
* Default variation lives on the allocation with is_default: true, not on the flag itself
* Environment status uses active + is_production, not enabled
* List pagination is offset + limit, not page + per_page
* List response is a bare array, not a {flags, has_more, total} wrapper
* Added include_archived, include_detailed_allocations query params

Surfaced two new BLOCKED cases the spec made visible:

* IS_NULL operator -> Confidence has no native null-check rule, so any allocation using it is BLOCKED
* SWITCHBACK allocation type -> Eppo time-windowed experiments are not modeled in Confidence; the whole flag is BLOCKED

Plus an audiences[]-references BLOCKED path (allocations that reference reusable Eppo audience definitions via the IS_IN / IS_NOT_IN type require fetching /audiences/{id} and inlining, which is non-trivial).

Test-fixture coverage updated to exercise every operator and allocation type in the spec: 10 fixture flags covering MATCHES suffix, waterfall + multivariant, NOT_ONE_OF + GTE + AND, special id attribute, inactive-in-env, SemVer BLOCKED, regex BLOCKED, IS_NULL BLOCKED, SWITCHBACK BLOCKED, and audience-reference BLOCKED.

Validated end-to-end against the OpenAPI required-fields and enums via an automated schema-check script: zero violations across 10 flags and 3 envs.
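
To make the array-shaped `variation_weight` concrete, here is a sketch of turning it into per-variant percentages. The `variation_id` and `weight` fields come from the list above; the `id` and `key` fields on variations, the function name, and the normalisation to a 100-point scale are assumptions (the unit of `weight` is not stated here).

```python
def variant_splits(variation_weight, variations):
    """Map [{variation_id, weight}, ...] to {variant_key: percentage}.

    Weights are normalised against their sum, so the result does not
    depend on whether the source uses fractions or basis points.
    """
    key_by_id = {v["id"]: v["key"] for v in variations}  # assumed field names
    total = sum(entry["weight"] for entry in variation_weight)
    if total <= 0:
        raise ValueError("variation weights must sum to a positive number")
    return {
        key_by_id[entry["variation_id"]]: 100 * entry["weight"] / total
        for entry in variation_weight
    }


variations = [{"id": 101, "key": "control"}, {"id": 102, "key": "treatment"}]
weights = [{"variation_id": 101, "weight": 5000}, {"variation_id": 102, "weight": 5000}]
splits = variant_splits(weights, variations)
```

For the two equal weights above, `splits` maps each variant key to `50.0`.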

Computes Eppo's waterfall evaluation locally over the fake-server fixtures and prints a 40-case test matrix (5 migratable flags x 8 test contexts) so the human running the migration can spot-check that Confidence resolves match what Eppo would have returned.

Already used in the end-to-end test pass on this PR: a curated 18-case subset showed 0 mismatches across 11 distinct operator-mapping branches (endsWithRule, eqRule singleton + OR, NOT + AND, rangeRule startInclusive, id->user_id rewrite, 2-rule waterfall, inactive flag handling, probabilistic split). Worth keeping around as a tracked regression harness for future skill changes.

Aligned with the commit-4 schema rewrite: reads snake_case fields, arrays of `values`, array `variation_weight` with `variation_id` lookups against `flag.variations`, and `is_default`-allocation default handling. Numeric operators coerce both sides to float; SemVer-looking values fail coercion and return False (those flags are BLOCKED in the migration anyway).

Also adds a small `.gitignore` to keep `__pycache__/` out of the working tree now that there's Python code under test-fixtures.
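
The float-coercion behaviour described above can be sketched as follows. This illustrates the stated rule and is not the harness's code; `numeric_holds` is a hypothetical name, and the condition's comparison value is read from the first element of its `values` array.

```python
import operator

_NUMERIC = {
    "GT": operator.gt,
    "GTE": operator.ge,
    "LT": operator.lt,
    "LTE": operator.le,
}


def numeric_holds(op, subject_value, values):
    """Compare as floats; anything that cannot be coerced evaluates to False."""
    try:
        return _NUMERIC[op](float(subject_value), float(values[0]))
    except (TypeError, ValueError):
        # TypeError: missing attribute (None); ValueError: e.g. "2.1.0".
        return False
```

A three-segment version string such as `2.1.0` raises `ValueError` inside `float` and therefore evaluates to `False`. A two-segment string like `1.2` would coerce successfully, which is one reason a real harness needs separate numeric-vs-version detection.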

All 10 existing fixture flags had is_archived: false, leaving the archive-filtering logic in the list endpoint untested. Add flag #11 (old-onboarding-flow) with is_archived: true so the default list returns 10 flags and include_archived=true returns 11. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… audiences instead of blocking

Investigated the open-source Confidence resolver (spotify/confidence-resolver: target.proto, value.rs, version.rs, and its spec test payloads) to verify what the targeting engine actually supports. Four cases previously marked BLOCKED have clean translations:

- SemVer comparisons → rangeRule with versionValue (Confidence has a first-class SemanticVersion value type; the resolver even strips pre-release suffixes)
- MATCHES alternation → decompose into an OR of startsWith/endsWith rules
- IS_NULL (sole condition, variant == default) → drop the rule; null subjects fall through to the flag default
- Eppo audiences → Confidence segments via createSegment + segment criteria (IS_IN / IS_NOT_IN map to ref / not-wrapped ref)

The genuinely-blocked set narrows from five to three: generic regex (char classes, quantifiers), IS_NULL combined with other conditions or serving a non-default variant, and SWITCHBACK (no time-windowed exposure primitive).

Shared core gains Version/Set/Timestamp/Segment criteria with worked examples and a Reusable Segments (createSegment) section. The Eppo skill adds the /audiences/{id} fetch step, numeric-vs-version detection, regex decomposition, the precise IS_NULL semantics, audience->segment translation, a Segments plan section, and segment-first execute ordering.

Fixtures: server.py serves /audiences and now has 12 flags + 2 audiences (four previously-blocked cases flipped to migratable, two new genuinely-blocked added). verify_migration.py models Eppo's SemVer/regex/audience evaluation as ground truth; 72 cases across 8 contexts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ot sticky assignment

The previous reason text read as if Confidence lacked sticky/consistent assignment. It doesn't: that's native, plus there's a materialization API. Reframe the blocker around the time-window rotation that genuinely has no Confidence equivalent.

Co-authored-by: Cursor <cursoragent@cursor.com>

… instead of blocking

Confidence does have a null/existence check: an attribute criterion with no inner rule is a presence test (resolver ir_builder.rs existence arm; it appears in the resolver's own spec fixtures), and the admin API accepts it on create (epx-flags-admin TargetingValidator does no structural validation for ATTRIBUTE criteria). Wrapping it in `not` expresses "attribute is null".

So IS_NULL maps directly (emit `{ attribute: { attributeName } }` referenced under `not`), even when it serves a non-default variant or is ANDed with other conditions. This removes the previous "drop only if variant == default" trick and unblocks the combined case. Genuinely-blocked set narrows to two whole-flag cases: generic regex and SWITCHBACK.

- shared core: add existence/null criterion form, combinator row, and two worked examples (IS null, IS null combined)
- Eppo skill: rewrite the IS_NULL section as a direct not(exists) translation, drop the combined/non-default BLOCKED rows, note the empty-in-editor caveat, add a "Null rules emitted" plan field
- fixtures: flip #8 to serve a non-default variant for no-plan subjects and #12 (IS_NULL ANDed with plan==free) to migratable; both now exercise the not(exists) path. verify_migration.py moves null-and-condition to MIGRATED_FLAGS and adds a no-country/free context; 90 cases pass, 2 blocked.

Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

…l rule

Confidence has no server-side flag default (createFlag takes none; an unmatched resolve returns the caller's code default via ClientDefaultAssignment). Migrating Eppo's is_default allocation as a "flag default value" was therefore a no-op. Emit it instead as a final catch-all addTargetingRule (no payload, 100% -> default variant) so the default variation is preserved for no-match subjects.

Co-authored-by: Cursor <cursoragent@cursor.com>

Align the eppo and posthog skills on the same "Available / Selected" layout for the subject-mapping section, and ignore local .claude/ dev artifacts (CLI symlinks + generated plans). Co-authored-by: Cursor <cursoragent@cursor.com>

fabriziodemaria and others added 12 commits May 27, 2026 11:33

docs(migrate-eppo): note IS_NULL is no longer in the blocked list

dd33a47

Co-authored-by: Cursor <cursoragent@cursor.com>

fabriziodemaria mentioned this pull request Jun 1, 2026

feat(migrate-eppo): phase 2 code-migration — client SDKs, struct keys, resolve modes #18

Open

6 tasks

fabriziodemaria wants to merge 12 commits into main from feat/migrate-eppo
