feat(migrate-eppo): add Eppo migration kit and extract shared core #17
Draft
fabriziodemaria wants to merge 12 commits into
Adds a /migrate-eppo slash command and skill that mirror the existing PostHog migration: two-phase flow (flag definitions then code), opt-in gating per flag, progressive plan files, and the same execute sequence with positive- and negative-case resolve verification.
Eppo-specific adjustments:
- Uses Eppo's REST admin API via curl since no Claude MCP exists yet; requires an EPPO_API_KEY and never persists it to the plan file.
- Asks the user to pick a source environment before scanning, because Eppo flag state is per-environment.
- Maps Eppo's subjectKey to a single Confidence entity field; rewrites rules that target the special id attribute to use that field.
- Emits one Confidence targeting rule per Eppo allocation in waterfall order, with trafficExposure -> rolloutPercentage and variation weights -> variant splits inside a single rule.
- Constrains MATCHES to ^prefix.* / .*suffix$ and blocks SemVer comparisons, with execute refusing to proceed on unresolved blocks.
- Flags disabled in the source environment are migrated at 0% rollout so they cannot accidentally activate during migration.
Co-authored-by: Cursor <cursoragent@cursor.com>
Both platform migration skills (migrate-posthog, migrate-eppo) shared roughly 60% of their content: the Confidence targeting payload format, flag setup sequence, naming rules, execute flow, plan-file resume pattern, client-selection step, step-tracker conventions, and the multivariant split handling. Duplicating that content across N skills makes Confidence-side bug fixes like the targeting-payload-format correction in #15 a multi-place chore and gets worse with every new source platform.
This commit extracts all platform-agnostic content into skills/_shared/migration-core.md. Each platform skill now starts with a "read the core file first" instruction and only contains:
- Its migration overview ASCII art
- Source-platform prerequisites (PostHog MCP install vs Eppo REST API key)
- The platform-specific scan logic for Step 1 of plan flags
- The platform-specific randomization mapping for Step 3
- The Operator Mapping table (source operator -> Confidence payload strategy) plus any blocked-operator guidance
- Plan-flag template fields that differ across platforms
- The platform-specific code-scan and transform-rule generation for plan code Steps 3 and 4
- Any platform-specific execute notes (e.g. Eppo's disabled-in-env handling)
Net effect: -28% total lines, each platform skill ~500 lines instead of ~1100, and adding a third source platform (LaunchDarkly, Statsig, GrowthBook, ConfigCat, ...) is now mechanical - copy a platform skill template and fill in the four platform-specific sections above.
Pure refactor; no user-facing behavior change. The slash commands, plan-file paths, MCP tool names, and execute flow are unchanged.
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds a Python stdlib HTTP server under skills/migrate-eppo/test-fixtures/
that mimics Eppo's REST admin API for the four read endpoints the skill
calls (/environments, /feature-flags, /feature-flags/{id},
/feature-flags/{id}/environments/{envId}). Lets us drive the migrate-eppo
skill end-to-end without an Eppo account, which has become important
because Eppo's signup is sales-gated and not self-service.
The 10 fixture flags are deliberately chosen to exercise every branch of
the skill's operator-mapping table:
- internal-tools-gate MATCHES suffix anchor (endsWithRule)
- pricing-experiment waterfall, Feature Gate + Experiment,
multivariant 50/50, ONE_OF
- legacy-search-rollout NOT_ONE_OF + GTE numeric + AND
- subject-id-targeting special id attribute -> entity rewrite
- legacy-checkout-redesign disabled in Production (0%-rollout path)
- mobile-only-feature SemVer GTE -> BLOCKED path
- general-regex-flag regex with alternation -> BLOCKED path
- extra-flag-1..3 pagination filler (per_page=5 -> 2 pages)
The fixture JSON is our best guess at Eppo's actual response schema,
modeled on the Swagger summary at https://eppo.cloud/api/docs and the
public docs. A future Tier 3 pass with a real Eppo account should diff
real responses against these fixtures and update either side if they
drift; until then this server gives us deterministic coverage of the
skill's logic.
Smoke-tested: auth gate returns 401 without X-Eppo-Token, all four
endpoints return well-formed JSON, pagination loop terminates after
page 3 returns an empty array, and the per-env override flips
legacy-checkout-redesign to enabled=false in Production while leaving
Staging/Test enabled.
Also adds a one-paragraph callout in skills/migrate-eppo/SKILL.md
under Prerequisites pointing future contributors at the fixture server.
Co-authored-by: Cursor <cursoragent@cursor.com>
…penAPI schema
The prior REST reference and fake-server fixtures were modeled from Eppo's high-level docs; they were structurally close but had wrong field names almost everywhere. Diffed against the real OpenAPI 3.0 spec (publicly served at https://eppo.cloud/api/docs/swagger-ui-init.js, no auth required) and corrected:
* snake_case throughout: variation_type, is_archived, targeting_rules, variation_weight, percent_exposure, is_default, environment_id, etc.
* Numeric IDs (Eppo Object IDs) instead of slug strings
* variation_weight is an array of {variation_id, weight}, not a map keyed by variant_key
* Condition values are always arrays, even for single-value operators
* Default variation lives on the allocation with is_default: true, not on the flag itself
* Environment status uses active + is_production, not enabled
* List pagination is offset + limit, not page + per_page
* List response is a bare array, not a {flags, has_more, total} wrapper
* Added include_archived, include_detailed_allocations query params
Surfaced two new BLOCKED cases the spec made visible:
* IS_NULL operator -> Confidence has no native null-check rule, so any allocation using it is BLOCKED
* SWITCHBACK allocation type -> Eppo time-windowed experiments are not modeled in Confidence; the whole flag is BLOCKED
Plus an audiences[]-references BLOCKED path (allocations that reference reusable Eppo audience definitions via the IS_IN / IS_NOT_IN type require fetching /audiences/{id} and inlining, which is non-trivial).
Test-fixture coverage updated to exercise every operator and allocation type in the spec: 10 fixture flags covering MATCHES suffix, waterfall + multivariant, NOT_ONE_OF + GTE + AND, special id attribute, inactive-in-env, SemVer BLOCKED, regex BLOCKED, IS_NULL BLOCKED, SWITCHBACK BLOCKED, and audience-reference BLOCKED.
Validated end-to-end against the OpenAPI required-fields and enums via an automated schema-check script: zero violations across 10 flags and 3 envs.
Computes Eppo's waterfall evaluation locally over the fake-server fixtures and prints a 40-case test matrix (5 migratable flags x 8 test contexts) so the human running the migration can spot-check that Confidence resolves match what Eppo would have returned.
Already used in the end-to-end test pass on this PR: a curated 18-case subset produced 0 mismatches across 11 distinct operator-mapping branches (endsWithRule, eqRule singleton + OR, NOT + AND, rangeRule startInclusive, id->user_id rewrite, 2-rule waterfall, inactive flag handling, probabilistic split). Worth keeping around as a tracked regression harness for future skill changes.
Aligned with the commit-4 schema rewrite: reads snake_case fields, arrays of `values`, array `variation_weight` with `variation_id` lookups against `flag.variations`, and `is_default`-allocation default handling. Numeric operators coerce both sides to float; SemVer-looking values fail coercion and return False (those flags are BLOCKED in the migration anyway).
Also adds a small `.gitignore` to keep `__pycache__/` out of the working tree now that there's Python code under test-fixtures.
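The waterfall described above can be sketched roughly as follows. This is not the code in `verify_migration.py`; it is a minimal illustration that assumes each allocation carries `targeting_rules` (OR-ed), each rule carries `conditions` (AND-ed), and each allocation serves a single `variation`. The `conditions`, `attribute`, `operator`, and `variation` field names, the helper names, and the missing-attribute behavior are hypothetical simplifications, and weighted splits are left out.

```python
from typing import Any, Dict, List, Optional

Subject = Dict[str, Any]


def _as_float(value: Any) -> Optional[float]:
    """Coerce to float; SemVer-looking strings such as '3.1.0' fail and yield None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def condition_matches(cond: Dict[str, Any], subject: Subject) -> bool:
    actual = subject.get(cond["attribute"])
    op, values = cond["operator"], cond["values"]  # values is always an array
    if op in ("ONE_OF", "NOT_ONE_OF"):
        if actual is None:  # assumption: a missing attribute never matches
            return False
        found = str(actual) in {str(v) for v in values}
        return found if op == "ONE_OF" else not found
    if op in ("GT", "GTE", "LT", "LTE"):
        left, right = _as_float(actual), _as_float(values[0])
        if left is None or right is None:
            return False  # failed coercion means no match
        return {"GT": left > right, "GTE": left >= right,
                "LT": left < right, "LTE": left <= right}[op]
    return False  # operators outside this sketch are treated as non-matching


def evaluate(allocations: List[Dict[str, Any]], subject: Subject) -> Optional[str]:
    """Return the first matching allocation's variation, else the is_default one."""
    default = None
    for alloc in allocations:
        if alloc.get("is_default"):
            default = alloc["variation"]
            continue
        rules = alloc.get("targeting_rules", [])
        if not rules or any(
            all(condition_matches(c, subject) for c in rule["conditions"])
            for rule in rules
        ):
            return alloc["variation"]
    return default


ALLOCATIONS = [
    {"variation": "treatment", "targeting_rules": [
        {"conditions": [{"attribute": "country", "operator": "ONE_OF", "values": ["SE", "NO"]}]}]},
    {"variation": "beta", "targeting_rules": [
        {"conditions": [{"attribute": "app_version", "operator": "GTE", "values": ["3"]}]}]},
    {"variation": "control", "is_default": True},
]
```

Running `evaluate(ALLOCATIONS, ctx)` over a small set of contexts and comparing against Confidence resolves is the kind of spot-check the matrix is meant to support.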
All 10 existing fixture flags had is_archived: false, leaving the archive-filtering logic in the list endpoint untested. Add flag #11 (old-onboarding-flow) with is_archived: true so the default list returns 10 flags and include_archived=true returns 11. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
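The list behavior this fixture targets (offset + limit pagination over a bare array, with archived flags hidden by default) can be illustrated with a small sketch. Nothing here talks to Eppo or to the fixture server: `list_all_flags`, `make_fake_endpoint`, and `FIXTURES` are hypothetical names, and the in-memory "endpoint" only imitates the filtering described in the commit messages.

```python
from typing import Any, Callable, Dict, List

Flag = Dict[str, Any]
FetchPage = Callable[[int, int], List[Flag]]


def list_all_flags(fetch_page: FetchPage, limit: int = 5) -> List[Flag]:
    """Collect every flag by advancing `offset` until a page comes back empty."""
    flags: List[Flag] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not page:  # a bare empty array ends the loop
            break
        flags.extend(page)
        offset += len(page)
    return flags


def make_fake_endpoint(all_flags: List[Flag], include_archived: bool = False) -> FetchPage:
    """In-memory stand-in for the list endpoint, filtering archived flags by default."""
    visible = [f for f in all_flags if include_archived or not f["is_archived"]]

    def fetch_page(offset: int, limit: int) -> List[Flag]:
        return visible[offset:offset + limit]

    return fetch_page


# Ten active flags plus one archived flag, mirroring the counts in this commit.
FIXTURES: List[Flag] = [{"id": i, "is_archived": False} for i in range(1, 11)]
FIXTURES.append({"id": 11, "is_archived": True})
```

With these fixtures the default listing yields 10 flags and the opted-in listing yields 11, which is the assertion the new fixture makes possible.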
… audiences instead of blocking
Investigated the open-source Confidence resolver (spotify/confidence-resolver:
target.proto, value.rs, version.rs, and its spec test payloads) to verify what
the targeting engine actually supports. Four cases previously marked BLOCKED have
clean translations:
- SemVer comparisons → rangeRule with versionValue (Confidence has a first-class
SemanticVersion value type; the resolver even strips pre-release suffixes)
- MATCHES alternation → decompose into an OR of startsWith/endsWith rules
- IS_NULL (sole condition, variant == default) → drop the rule; null subjects fall
through to the flag default
- Eppo audiences → Confidence segments via createSegment + segment criteria
(IS_IN / IS_NOT_IN map to ref / not-wrapped ref)
The genuinely-blocked set narrows from five to three: generic regex (char classes,
quantifiers), IS_NULL combined with other conditions or serving a non-default
variant, and SWITCHBACK (no time-windowed exposure primitive).
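To make the boundary between "alternation" and "generic regex" concrete, here is a hedged sketch of one way the decomposition could work. It is not the skill's actual logic: the accepted shapes (`^literal.*`, `.*literal$`, with at most one `(a|b|c)` group) and the function names are assumptions, and anything else is conservatively reported as not decomposable.

```python
import re
from typing import List, Optional, Tuple

_META = set(".^$*+?{}[]()|\\")
# A body is: literal head, at most one (alt|alt) group, literal tail.
_SHAPE = re.compile(r"^(?P<head>[^()]*)(?:\((?P<alts>[^()]*)\))?(?P<tail>[^()]*)$")


def _literal(fragment: str) -> Optional[str]:
    """Return the plain text of a regex fragment, or None if it uses regex syntax."""
    out = []
    i = 0
    while i < len(fragment):
        ch = fragment[i]
        if ch == "\\" and i + 1 < len(fragment) and fragment[i + 1] in _META:
            out.append(fragment[i + 1])  # escaped metacharacter such as \.
            i += 2
        elif ch in _META:
            return None  # character class, quantifier, wildcard, \d, ...
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def decompose_matches(pattern: str) -> Optional[List[Tuple[str, str]]]:
    """Map a MATCHES pattern to (rule, literal) pairs, or None if it must stay BLOCKED."""
    if pattern.startswith("^") and pattern.endswith(".*"):
        kind, body = "startsWithRule", pattern[1:-2]
    elif pattern.startswith(".*") and pattern.endswith("$"):
        kind, body = "endsWithRule", pattern[2:-1]
    else:
        return None
    shape = _SHAPE.match(body)
    if shape is None:  # nested or multiple alternation groups
        return None
    head, tail = _literal(shape["head"]), _literal(shape["tail"])
    branches = shape["alts"].split("|") if shape["alts"] is not None else [""]
    literals = [_literal(b) for b in branches]
    if head is None or tail is None or None in literals:
        return None
    return [(kind, head + lit + tail) for lit in literals]
```

Each returned pair would become one operand of an OR; a `None` result corresponds to the generic-regex case that stays blocked.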
Shared core gains Version/Set/Timestamp/Segment criteria with worked examples and a
Reusable Segments (createSegment) section. The Eppo skill adds the /audiences/{id}
fetch step, numeric-vs-version detection, regex decomposition, the precise IS_NULL
semantics, audience->segment translation, a Segments plan section, and segment-first
execute ordering.
Fixtures: server.py serves /audiences and now has 12 flags + 2 audiences (four
previously-blocked cases flipped to migratable, two new genuinely-blocked added).
verify_migration.py models Eppo's SemVer/regex/audience evaluation as ground truth;
72 cases across 8 contexts pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
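The numeric-vs-version detection mentioned above might look something like the sketch below. The real heuristic lives in the skill text and is not shown in this PR page, so the ordering (try a plain number first, then a `v`-stripped version with 2 to 4 numeric segments) and the `classify_operand` name are assumptions. Pre-release suffixes are deliberately not handled here and fall through to manual review.

```python
import re
from typing import Optional, Tuple

# Optional leading "v", then 2 to 4 dot-separated numeric segments.
_VERSION = re.compile(r"^v?(\d+(?:\.\d+){1,3})$")


def classify_operand(raw: str) -> Tuple[str, Optional[str]]:
    """Decide which Confidence value type a comparison operand would use."""
    text = raw.strip()
    try:
        float(text)  # plain numbers win, so "1.2" is treated as numeric here
        return ("numberValue", text)
    except ValueError:
        pass
    match = _VERSION.match(text)
    if match:
        return ("versionValue", match.group(1))  # "v" prefix stripped
    return ("manual_review", None)  # unnormalizable: leave for a human
```

Note the ambiguity this ordering creates: a two-segment value like `1.2` is classified as a number, which may or may not be what a given flag intends.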
…ot sticky assignment The previous reason text read as if Confidence lacked sticky/consistent assignment. It doesn't — that's native, plus there's a materialization API. Reframe the blocker around the time-window rotation that genuinely has no Confidence equivalent. Co-authored-by: Cursor <cursoragent@cursor.com>
… instead of blocking
Confidence does have a null/existence check: an attribute criterion with no
inner rule is a presence test (resolver ir_builder.rs existence arm; it appears
in the resolver's own spec fixtures), and the admin API accepts it on create
(epx-flags-admin TargetingValidator does no structural validation for ATTRIBUTE
criteria). Wrapping it in `not` expresses "attribute is null".
So IS_NULL maps directly — emit `{ attribute: { attributeName } }` referenced
under `not` — even when it serves a non-default variant or is ANDed with other
conditions. This removes the previous "drop only if variant == default" trick
and unblocks the combined case.
Genuinely-blocked set narrows to two whole-flag cases: generic regex and
SWITCHBACK.
- shared core: add existence/null criterion form, combinator row, and two worked
examples (IS null, IS null combined)
- Eppo skill: rewrite the IS_NULL section as a direct not(exists) translation,
drop the combined/non-default BLOCKED rows, note the empty-in-editor caveat,
add a "Null rules emitted" plan field
- fixtures: flip #8 to serve a non-default variant for no-plan subjects and #12
(IS_NULL ANDed with plan==free) to migratable; both now exercise the not(exists)
path. verify_migration.py moves null-and-condition to MIGRATED_FLAGS and adds a
no-country/free context; 90 cases pass, 2 blocked.
Co-authored-by: Cursor <cursoragent@cursor.com>
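As an illustration of the `not(exists)` shape, the sketch below builds the two payloads discussed in this commit as plain dictionaries. Only `{ attribute: { attributeName } }`, `not`, and `ref` come from the text above; the surrounding `criteria` / `expression` / `and.operands` layout, the `eqRule` shape, and the helper names are assumptions that should be checked against the shared core before use.

```python
from typing import Any, Dict, Optional

Criterion = Dict[str, Any]


def exists_criterion(attribute_name: str) -> Criterion:
    """Attribute criterion with no inner rule: a presence test."""
    return {"attribute": {"attributeName": attribute_name}}


def is_null_payload(null_attribute: str,
                    and_criterion: Optional[Criterion] = None) -> Dict[str, Any]:
    """Build a targeting payload for IS_NULL, optionally ANDed with one more criterion."""
    criteria: Dict[str, Criterion] = {"c0": exists_criterion(null_attribute)}
    expression: Dict[str, Any] = {"not": {"ref": "c0"}}  # "attribute is null"
    if and_criterion is not None:
        criteria["c1"] = and_criterion
        # Assumed combinator layout: and(not(exists), <other criterion>)
        expression = {"and": {"operands": [expression, {"ref": "c1"}]}}
    return {"criteria": criteria, "expression": expression}


# Hypothetical eq criterion for plan == "free"; the eqRule shape is an assumption.
PLAN_IS_FREE: Criterion = {
    "attribute": {"attributeName": "plan", "eqRule": {"value": {"stringValue": "free"}}}
}

simple = is_null_payload("plan")
combined = is_null_payload("country", PLAN_IS_FREE)
```

`simple` corresponds to the sole-condition case and `combined` to the `IS_NULL` ANDed with `plan==free` fixture.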
…l rule Confidence has no server-side flag default (createFlag takes none; an unmatched resolve returns the caller's code default via ClientDefaultAssignment). Migrating Eppo's is_default allocation as a "flag default value" was therefore a no-op. Emit it instead as a final catch-all addTargetingRule (no payload, 100% -> default variant) so the default variation is preserved for no-match subjects. Co-authored-by: Cursor <cursoragent@cursor.com>
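The ordering this implies can be sketched as below: specific allocations keep their waterfall order, and the `is_default` allocation, if the source defines one, becomes a final rule with no payload and 100% to the default variant. The dictionaries stand in for `addTargetingRule` arguments; `variation_key`, `payload`, and the function name are hypothetical, and no real API call is made.

```python
from typing import Any, Dict, List, Optional

Rule = Dict[str, Any]


def build_rule_sequence(allocations: List[Dict[str, Any]]) -> List[Rule]:
    """Order rules so the default allocation becomes a trailing catch-all."""
    rules: List[Rule] = []
    default_variant: Optional[str] = None
    for alloc in allocations:
        if alloc.get("is_default"):
            default_variant = alloc["variation_key"]
            continue  # emitted last, regardless of its position in the source
        rules.append({
            "payload": alloc["payload"],
            "variantAllocations": alloc["variantAllocations"],
        })
    if default_variant is not None:  # gated: only when the source defines a default
        rules.append({"payload": None, "variantAllocations": {default_variant: 100}})
    return rules


EXAMPLE = [
    {"is_default": True, "variation_key": "control"},
    {"payload": {"criteria": {}, "expression": {}}, "variantAllocations": {"treatment": 100}},
]
```

Because the catch-all has no payload it matches every context, so placing it anywhere but last would shadow the specific rules.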
Align the eppo and posthog skills on the same "Available / Selected" layout for the subject-mapping section, and ignore local .claude/ dev artifacts (CLI symlinks + generated plans). Co-authored-by: Cursor <cursoragent@cursor.com>
Summary

Adds a `/migrate-eppo` slash command and skill that mirror the existing PostHog migration, refactors both skills to share their Confidence-side conventions, ships a local fake-Eppo fixture server (schema-aligned with real Eppo) so the whole thing can be tested without an Eppo account, and, after auditing the open-source Confidence resolver, translates several targeting cases that earlier looked un-migratable.

Commits, intentionally separable:

- `feat(migrate-eppo)`: new command, skill, README/CLAUDE.md.
- `refactor(migrations)`: extracts the ~60% duplicated content into `skills/_shared/migration-core.md`.
- `chore` + `fix(migrate-eppo)`: local fake-Eppo server, then realigned to Eppo's public OpenAPI 3.0 spec.
- `chore(migrate-eppo)`: `verify_migration.py` behavioral-equivalence harness.
- `test(migrate-eppo)`: archived-flag fixture for the list-filter path.
- `feat(migrate-eppo)`: translate SemVer, regex alternation, and reusable audiences instead of blocking them.
- `feat(migrate-eppo)`: translate `IS_NULL` via a ruleless presence criterion (`not(exists)`), instead of blocking or relying on a default-fallthrough trick.
- `feat(migrate-eppo)`: emit Eppo's default allocation as an explicit catch-all rule (Confidence has no server-side flag default).

Operator mapping: what migrates vs what's blocked

After auditing the open-source resolver (`spotify/confidence-resolver/state-to-wasm`: `target.proto`, `ir_builder.rs`, `value.rs`, `version.rs`, spec payloads) and the admin API (`epx-flags-admin` `TargetingValidator`), these all have clean translations:

| Eppo construct | Confidence translation |
| --- | --- |
| `ONE_OF` / `NOT_ONE_OF` | `setRule` (latter wrapped in `not`) |
| `MATCHES` prefix/suffix anchor | `startsWithRule` / `endsWithRule` |
| `MATCHES` alternation (`.*@(a\|b\|c)$`) | `startsWith` / `endsWith`, one per branch |
| `GT` / `GTE` / `LT` / `LTE` numeric | `rangeRule` with `numberValue` |
| `GT` / `GTE` / `LT` / `LTE` SemVer | `rangeRule` with `versionValue` |
| `IS_NULL` (any shape) | `{ attribute: { attributeName } }` referenced under `not`; composes with `and` / `or`, so combined and non-default cases work too |
| `audiences[]` (`IS_IN` / `IS_NOT_IN`) | `createSegment`, referenced by a segment criterion |
| `is_default` allocation | `addTargetingRule` (no payload, 100% → default variant) |

Genuinely blocked, now just two whole-flag cases:

- `MATCHES` regex: character classes, quantifiers, wildcard `.`, backrefs, multiple alternation groups.
- `SWITCHBACK` allocations: Eppo rotates a subject through different variations across time windows; Confidence has no time-bucketed exposure (its per-subject assignment is sticky, the opposite of switchback). Not a sticky-assignment gap: consistent assignment is native, plus there's a materialization API.

(Unnormalizable version strings, meaning those not `v`-strippable into 2–4 numeric segments, also fall back to per-condition manual review.)

On `IS_NULL`

Confidence's resolver compiles an attribute criterion with no inner rule to an existence check (`ir_builder.rs` `_ =>` arm → `I64Neqz`), the resolver's own spec fixtures contain a bare `{ "attributeName": "country" }` criterion, and `epx-flags-admin`'s `TargetingValidator` does no structural validation for `ATTRIBUTE` criteria, so it's creatable and resolvable. `IS_NULL(attr)` therefore maps directly to `not(exists(attr))`. (Confirmed in-product: the segment editor exposes a first-class `is null` operator.) One cosmetic caveat: the web segment editor may render a ruleless criterion as empty even though it resolves correctly.

On the default allocation

This is a separate concern from `IS_NULL`. Confidence has no server-side flag default: the `Flag` proto carries variants + an ordered rule list but no default field (`createFlag` accepts none), and an unmatched resolve returns the caller's code default (`ClientDefaultAssignment`). Migrating Eppo's `is_default` allocation as a phantom "flag default value" was a no-op. The skill now emits it as a final catch-all rule: `addTargetingRule` with `variantAllocations { <default>: 100 }` and no payload (an empty payload targets all contexts; a no-targeting segment matches everyone), placed after every specific rule so it only catches no-match subjects. The shared-core instruction is gated on the source platform actually defining a default variation, so the PostHog flow is unaffected.

How to test end-to-end

Fixtures (12 active + 1 archived):

| id | key | what it exercises |
| --- | --- | --- |
| | `internal-tools-gate` | `MATCHES .*suffix$` → `endsWithRule` |
| | `pricing-experiment` | `ONE_OF` + catch-all default |
| | `legacy-search-rollout` | `NOT_ONE_OF` + numeric `GTE` + AND |
| | `subject-id-targeting` | `id` → entity-field rewrite |
| | `legacy-checkout-redesign` | |
| | `mobile-only-feature` | `>=` → `versionValue` |
| | `general-regex-flag` | `endsWithRule` |
| | `missing-attribute-fallback` | `IS_NULL` → non-default variant via `not(exists)` |
| | `delivery-pricing-switchback` | `SWITCHBACK` → flag-level BLOCKED |
| | `premium-users-only` | `audiences[]` (`IS_IN` / `IS_NOT_IN`) → segments |
| | `regex-id-format` | |
| | `null-and-condition` | `IS_NULL` ∧ `plan==free` → `and(not(exists), eq)` |
| | `old-onboarding-flow` | `is_archived: true` → hidden unless opted in |

Test plan checklist

- `include_archived=true` (13); per-env overrides apply.
- `verify_migration.py`: Eppo ground-truth waterfall (regex, SemVer, set membership, `IS_NULL` via `not(exists)`, audience `IS_IN` / `IS_NOT_IN`, default allocation as catch-all) over 10 migratable flags × 9 contexts = 90 cases, 0 mismatches; 2 BLOCKED fixtures listed separately. Positive null paths covered (no-plan → on; no-country ∧ free → on).
- `regex-id-format` and `delivery-pricing-switchback` fire with the exact reason strings.
- `/migrate-posthog plan flags` end-to-end to confirm the shared-core refactor didn't break the existing PostHog flow.

Known follow-ups
Draft because