
feat(migrate-eppo): add Eppo migration kit and extract shared core #17

Draft
fabriziodemaria wants to merge 12 commits into main from feat/migrate-eppo

fabriziodemaria commented May 27, 2026

Summary

Adds a /migrate-eppo slash command and skill that mirror the existing PostHog migration, and refactors both skills to share their Confidence-side conventions. Also ships a local fake-Eppo fixture server (schema-aligned with real Eppo) so the whole flow can be tested without an Eppo account. After an audit of the open-source Confidence resolver, the skill now translates several targeting cases that earlier looked un-migratable.

Commits, intentionally separable:

  1. feat(migrate-eppo): new command, skill, README/CLAUDE.md.
  2. refactor(migrations): extracts the ~60% duplicated content into skills/_shared/migration-core.md.
  3. chore + fix(migrate-eppo): local fake-Eppo server, then realigned to Eppo's public OpenAPI 3.0 spec.
  4. chore(migrate-eppo): verify_migration.py behavioral-equivalence harness.
  5. test(migrate-eppo): archived-flag fixture for the list-filter path.
  6. feat(migrate-eppo): translate SemVer, regex alternation, and reusable audiences instead of blocking them.
  7. feat(migrate-eppo): translate IS_NULL via a ruleless presence criterion (not(exists)) instead of blocking or relying on a default-fallthrough trick.
  8. feat(migrate-eppo): emit Eppo's default allocation as an explicit catch-all rule (Confidence has no server-side flag default).

Operator mapping — what migrates vs what's blocked

After auditing the open-source resolver (spotify/confidence-resolver / state-to-wasm: target.proto, ir_builder.rs, value.rs, version.rs, spec payloads) and the admin API (epx-flags-admin TargetingValidator), these all have clean translations:

| Eppo | Confidence translation |
| --- | --- |
| ONE_OF / NOT_ONE_OF | setRule (latter wrapped in not) |
| MATCHES prefix/suffix anchor | startsWithRule / endsWithRule |
| MATCHES alternation (`.*@(a\|b\|c)$`) | OR of startsWith/endsWith, one per branch |
| GT/GTE/LT/LTE numeric | rangeRule with numberValue |
| GT/GTE/LT/LTE SemVer | rangeRule with versionValue |
| IS_NULL (any shape) | ruleless presence criterion { attribute: { attributeName } } referenced under not; composes with and/or, so combined and non-default cases work too |
| reusable audiences[] (IS_IN/IS_NOT_IN) | a Confidence segment per audience via createSegment, referenced by a segment criterion |
| is_default allocation | final catch-all addTargetingRule (no payload, 100% → default variant) |
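
The MATCHES rows above can be sketched as a small classifier. This is a minimal illustration, not the skill's actual logic: the function name `decompose_matches`, its return shape, and the choice to treat only escaped punctuation as literal text are all assumptions made for the example.

```python
import re

# Regex metacharacters that disqualify a fragment from being a plain literal.
_META = set(".^$*+?{}[]|()")


def _unescape_literal(text):
    """Return text with regex escapes removed, or None if it is not a literal."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            # Only escaped punctuation (an escaped dot, say) counts as a literal
            # character; classes such as backslash-d make the regex generic.
            if i + 1 >= len(text) or text[i + 1].isalnum():
                return None
            out.append(text[i + 1])
            i += 2
        elif ch in _META:
            return None
        else:
            out.append(ch)
            i += 1
    return "".join(out) or None


def decompose_matches(pattern):
    """Return (rule_kind, literals) or None when the pattern must stay BLOCKED.

    Hypothetical helper: rule_kind is "startsWith" or "endsWith", and
    literals holds one entry per alternation branch.
    """
    # Suffix forms: .*literal$ or .*stem(a|b|c)$
    kind = "endsWith"
    m = re.fullmatch(r"\.\*([^()]*)(?:\(([^()]*)\))?\$", pattern)
    if m is None:
        # Prefix forms: ^literal.* or ^stem(a|b).*
        kind = "startsWith"
        m = re.fullmatch(r"\^([^()]*)(?:\(([^()]*)\))?\.\*", pattern)
    if m is None:
        return None
    stem, group = m.group(1), m.group(2)
    branches = group.split("|") if group is not None else [""]
    literals = [_unescape_literal(stem + branch) for branch in branches]
    if any(lit is None for lit in literals):
        return None
    return kind, literals
```

Under these assumptions, a single suffix alternation yields one `endsWith` literal per branch, while character classes or multiple alternation groups return `None` and would remain BLOCKED.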

Genuinely blocked — now just two whole-flag cases:

  • Generic MATCHES regex: character classes, quantifiers, wildcard ., backrefs, multiple alternation groups.
  • SWITCHBACK allocations: Eppo rotates a subject through different variations across time windows, and Confidence has no time-bucketed exposure primitive. The gap is the time-window rotation, not sticky assignment: consistent per-subject assignment is native to Confidence (the opposite of switchback), and there is also a materialization API.

(Version strings that cannot be normalized, i.e. that do not reduce to 2–4 numeric segments after stripping a leading v, also fall back to per-condition manual review.)
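
A minimal sketch of the version check described above, assuming normalization means only "strip one leading v, then require 2 to 4 dot-separated numeric segments". The name `normalize_version` is hypothetical, and the skill may apply additional rules not shown here.

```python
import re

# Two to four dot-separated groups of digits, e.g. "2.10" or "1.2.3.4".
_VERSION_RE = re.compile(r"\d+(\.\d+){1,3}")


def normalize_version(raw):
    """Return the normalized version string, or None for manual review."""
    candidate = raw.strip()
    if candidate[:1] in ("v", "V"):
        candidate = candidate[1:]
    return candidate if _VERSION_RE.fullmatch(candidate) else None
```

Anything that returns `None` here (a bare major version, a pre-release tag, a non-numeric label) would be routed to per-condition manual review under this assumption.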

On IS_NULL

Confidence's resolver compiles an attribute criterion with no inner rule to an existence check (ir_builder.rs _ => arm → I64Neqz), the resolver's own spec fixtures contain a bare { "attributeName": "country" } criterion, and epx-flags-admin's TargetingValidator does no structural validation for ATTRIBUTE criteria — so it's creatable and resolvable. IS_NULL(attr) therefore maps directly to not(exists(attr)). (Confirmed in-product: the segment editor exposes a first-class is null operator.) One cosmetic caveat: the web segment editor may render a ruleless criterion as empty even though it resolves correctly.
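
The not(exists) mapping can be illustrated with a tiny local model. The bare `{ attribute: { attributeName } }` criterion and the `not` / `ref` combinators come from the text above; the enclosing `criteria` / `expression` layout, the `c0` key, and both function names are assumptions for the sketch and may not match the real targeting payload.

```python
def is_null_payload(attribute_name):
    """Build an illustrative 'attribute is null' payload (shape is assumed)."""
    return {
        # A criterion with no inner rule acts as a presence check.
        "criteria": {"c0": {"attribute": {"attributeName": attribute_name}}},
        # Wrapping the reference in `not` turns "exists" into "is null".
        "expression": {"not": {"ref": "c0"}},
    }


def evaluate(payload, context):
    """Evaluate the sketch payload locally; models presence checks only."""

    def walk(node):
        if "ref" in node:
            criterion = payload["criteria"][node["ref"]]
            name = criterion["attribute"]["attributeName"]
            return context.get(name) is not None
        if "not" in node:
            return not walk(node["not"])
        raise ValueError(f"unsupported expression node: {node!r}")

    return walk(payload["expression"])
```

With this model, a context that lacks `plan` (or carries it as `None`) matches `is_null_payload("plan")`, and a context with `plan` set does not.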

On the default allocation

This is a separate concern from IS_NULL. Confidence has no server-side flag default: the Flag proto carries variants and an ordered rule list but no default field (createFlag accepts none), and an unmatched resolve returns the caller's code default (ClientDefaultAssignment). Migrating Eppo's is_default allocation as a phantom "flag default value" was therefore a no-op. The skill now emits it as a final catch-all rule: addTargetingRule with variantAllocations { <default>: 100 } and no payload (an empty payload targets all contexts; a no-targeting segment matches everyone), placed after every specific rule so it only catches no-match subjects. The shared-core instruction is gated on the source platform actually defining a default variation, so the PostHog flow is unaffected.
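
A rough sketch of the ordering described above. The rule dictionaries are simplified stand-ins (a callable or `None` in place of a real targeting payload), and `with_catch_all` and `resolve` are hypothetical names, so this only models the first-match behavior, not the Confidence API.

```python
def with_catch_all(specific_rules, default_variant):
    """Append a catch-all rule after the specific rules, if a default exists."""
    rules = list(specific_rules)
    if default_variant is not None:
        # No payload means the rule targets every context (per the PR text).
        rules.append({"payload": None, "variantAllocations": {default_variant: 100}})
    return rules


def resolve(rules, context, client_default):
    """First matching rule wins; no match returns the caller's code default."""
    for rule in rules:
        predicate = rule["payload"]  # stand-in: a callable, or None for match-all
        if predicate is None or predicate(context):
            # Simplification: assumes a single 100% variant per rule.
            return next(iter(rule["variantAllocations"]))
    return client_default
```

Because the catch-all is appended last, it can only serve subjects that no specific rule matched; when the source defines no default variation, nothing is appended and unmatched subjects still get the code default.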

How to test end-to-end

python3 skills/migrate-eppo/test-fixtures/server.py
# → 13 fixture flags, 2 audiences, 3 environments
export EPPO_API_KEY=fake-key-for-testing
# /migrate-eppo plan flags ; API base http://127.0.0.1:3000/api/v1 ; env Production (id 1)
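
For poking at the fixture server by hand, a request could be built roughly as follows. The `/feature-flags` path, the `X-Eppo-Token` header, and the `offset` / `limit` / `include_archived` parameters are taken from this PR's commit messages; the helper name `build_list_request` and the default `limit` of 50 are assumptions.

```python
import urllib.parse
import urllib.request


def build_list_request(api_base, token, offset=0, limit=50, include_archived=False):
    """Build (but do not send) a GET request for the flag list endpoint."""
    query = urllib.parse.urlencode(
        {
            "offset": offset,
            "limit": limit,
            # The checklist in this PR shows the flag spelled as "true".
            "include_archived": str(include_archived).lower(),
        }
    )
    return urllib.request.Request(
        f"{api_base}/feature-flags?{query}",
        headers={"X-Eppo-Token": token},
    )
```

Passing the result to `urllib.request.urlopen` with the fixture server running should return the flag list; omitting the token header is expected to produce the 401 mentioned in the smoke test.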

Fixtures (12 active + 1 archived):

| id | key | Tests | Status |
| --- | --- | --- | --- |
| 1 | internal-tools-gate | MATCHES .*suffix$ → endsWithRule | Migrate |
| 2 | pricing-experiment | waterfall + 50/50 multivariant + ONE_OF + catch-all default | Migrate |
| 3 | legacy-search-rollout | NOT_ONE_OF + numeric GTE + AND | Migrate |
| 4 | subject-id-targeting | special id → entity-field rewrite | Migrate |
| 5 | legacy-checkout-redesign | inactive in prod → 0%-rollout | Migrate (warning) |
| 6 | mobile-only-feature | SemVer >= → versionValue | Migrate |
| 7 | general-regex-flag | suffix alternation → OR of endsWithRule | Migrate |
| 8 | missing-attribute-fallback | IS_NULL → non-default variant via not(exists) | Migrate |
| 9 | delivery-pricing-switchback | SWITCHBACK → flag-level BLOCKED | BLOCKED |
| 10 | premium-users-only | audiences[] (IS_IN/IS_NOT_IN) → segments | Migrate |
| 11 | regex-id-format | generic regex → BLOCKED | BLOCKED |
| 12 | null-and-condition | IS_NULL ∧ plan==free → and(not(exists), eq) | Migrate |
| 13 | old-onboarding-flow | is_archived: true → hidden unless opted in | Skipped |

Test plan checklist

  • Server smoke test: 401 without token; archived flag hidden by default (12) / visible with include_archived=true (13); per-env overrides apply.
  • Schema validation against Eppo's real OpenAPI 3.0 spec: zero violations.
  • Behavioral equivalence (verify_migration.py): Eppo ground-truth waterfall (regex, SemVer, set membership, IS_NULL via not(exists), audience IS_IN/IS_NOT_IN, default allocation as catch-all) over 10 migratable flags × 9 contexts = 90 cases, 0 mismatches; 2 BLOCKED fixtures listed separately. Positive null paths covered (no-plan → on; no-country ∧ free → on).
  • BLOCKED-path fixtures: regex-id-format and delivery-pricing-switchback fire with the exact reason strings.
  • Regression: run /migrate-posthog plan flags end-to-end to confirm the shared-core refactor didn't break the existing PostHog flow.

Known follow-ups

  • Phase 2 (code migration) is untested — needs a sample Eppo-SDK codebase.

Draft because

  1. Wanted feedback on the shared-core extraction before merging.
  2. PostHog regression run still outstanding.

fabriziodemaria and others added 12 commits May 27, 2026 11:33
Adds a /migrate-eppo slash command and skill that mirror the existing
PostHog migration: two-phase flow (flag definitions then code), opt-in
gating per flag, progressive plan files, and the same execute sequence
with positive- and negative-case resolve verification.

Eppo-specific adjustments:
- Uses Eppo's REST admin API via curl since no Claude MCP exists yet;
  requires an EPPO_API_KEY and never persists it to the plan file.
- Asks the user to pick a source environment before scanning, because
  Eppo flag state is per-environment.
- Maps Eppo's subjectKey to a single Confidence entity field; rewrites
  rules that target the special id attribute to use that field.
- Emits one Confidence targeting rule per Eppo allocation in waterfall
  order, with trafficExposure -> rolloutPercentage and variation
  weights -> variant splits inside a single rule.
- Constrains MATCHES to ^prefix.* / .*suffix$ and blocks SemVer
  comparisons, with execute refusing to proceed on unresolved blocks.
- Flags disabled in the source environment are migrated at 0% rollout
  so they cannot accidentally activate during migration.
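
The exposure and weight mapping in the list above might look roughly like this. It is a sketch under stated assumptions: traffic exposure is taken to be a fraction between 0 and 1, variant splits are taken to be integer percentages summing to 100, and `allocation_to_rule` with its argument names is hypothetical. Only `rolloutPercentage` and `variantAllocations` are names that appear in this PR.

```python
def allocation_to_rule(weights, traffic_exposure, env_active):
    """Translate one allocation's weights and exposure into rule fields.

    weights maps variant key -> relative weight (assumed input shape).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("variation weights must sum to a positive number")
    # Largest-remainder rounding so the integer splits always sum to 100.
    exact = {key: weight * 100 / total for key, weight in weights.items()}
    splits = {key: int(value) for key, value in exact.items()}
    leftover = 100 - sum(splits.values())
    by_remainder = sorted(exact, key=lambda key: exact[key] - splits[key], reverse=True)
    for key in by_remainder[:leftover]:
        splits[key] += 1
    # Flags disabled in the source environment migrate at 0% rollout.
    rollout = round(traffic_exposure * 100) if env_active else 0
    return {"rolloutPercentage": rollout, "variantAllocations": splits}
```

The rounding step matters for splits such as three equal variants, where naive truncation would leave one percentage point unassigned.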

Co-authored-by: Cursor <cursoragent@cursor.com>

Both platform migration skills (migrate-posthog, migrate-eppo) shared
roughly 60% of their content: the Confidence targeting payload format,
flag setup sequence, naming rules, execute flow, plan-file resume
pattern, client-selection step, step-tracker conventions, and the
multivariant split handling. Duplicating that content across N skills
makes Confidence-side bug fixes like the targeting-payload-format
correction in #15 a multi-place chore and gets worse with every new
source platform.

This commit extracts all platform-agnostic content into
skills/_shared/migration-core.md. Each platform skill now starts with
a "read the core file first" instruction and only contains:

- Its migration overview ASCII art
- Source-platform prerequisites (PostHog MCP install vs Eppo REST
  API key)
- The platform-specific scan logic for Step 1 of plan flags
- The platform-specific randomization mapping for Step 3
- The Operator Mapping table (source operator -> Confidence payload
  strategy) plus any blocked-operator guidance
- Plan-flag template fields that differ across platforms
- The platform-specific code-scan and transform-rule generation for
  plan code Steps 3 and 4
- Any platform-specific execute notes (e.g. Eppo's disabled-in-env
  handling)

Net effect: -28% total lines, each platform skill ~500 lines instead
of ~1100, and adding a third source platform (LaunchDarkly, Statsig,
GrowthBook, ConfigCat, ...) is now mechanical: copy a platform skill
template and fill in the platform-specific sections listed above.

Pure refactor; no user-facing behavior change. The slash commands,
plan-file paths, MCP tool names, and execute flow are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>

Adds a Python stdlib HTTP server under skills/migrate-eppo/test-fixtures/
that mimics Eppo's REST admin API for the four read endpoints the skill
calls (/environments, /feature-flags, /feature-flags/{id},
/feature-flags/{id}/environments/{envId}). Lets us drive the migrate-eppo
skill end-to-end without an Eppo account, which has become important
because Eppo's signup is sales-gated and not self-service.

The 10 fixture flags are deliberately chosen to exercise every branch of
the skill's operator-mapping table:

- internal-tools-gate         MATCHES suffix anchor (endsWithRule)
- pricing-experiment          waterfall, Feature Gate + Experiment,
                              multivariant 50/50, ONE_OF
- legacy-search-rollout       NOT_ONE_OF + GTE numeric + AND
- subject-id-targeting        special id attribute -> entity rewrite
- legacy-checkout-redesign    disabled in Production (0%-rollout path)
- mobile-only-feature         SemVer GTE -> BLOCKED path
- general-regex-flag          regex with alternation -> BLOCKED path
- extra-flag-1..3             pagination filler (per_page=5 -> 2 pages)

The fixture JSON is our best guess at Eppo's actual response schema,
modeled on the Swagger summary at https://eppo.cloud/api/docs and the
public docs. A future Tier 3 pass with a real Eppo account should diff
real responses against these fixtures and update either side if they
drift; until then this server gives us deterministic coverage of the
skill's logic.

Smoke-tested: auth gate returns 401 without X-Eppo-Token, all four
endpoints return well-formed JSON, pagination loop terminates after
page 3 returns an empty array, and the per-env override flips
legacy-checkout-redesign to enabled=false in Production while leaving
Staging/Test enabled.

Also adds a one-paragraph callout in skills/migrate-eppo/SKILL.md
under Prerequisites pointing future contributors at the fixture server.

Co-authored-by: Cursor <cursoragent@cursor.com>

…penAPI schema

The prior REST reference and fake-server fixtures were modeled from
Eppo's high-level docs and were structurally close but field-naming
wrong almost everywhere. Diffed against the real OpenAPI 3.0 spec
(publicly served at https://eppo.cloud/api/docs/swagger-ui-init.js,
no auth required) and corrected:

* snake_case throughout: variation_type, is_archived, targeting_rules,
  variation_weight, percent_exposure, is_default, environment_id, etc.
* Numeric IDs (Eppo Object IDs) instead of slug strings
* variation_weight is an array of {variation_id, weight}, not a map
  keyed by variant_key
* Condition values are always arrays, even for single-value operators
* Default variation lives on the allocation with is_default: true,
  not on the flag itself
* Environment status uses active + is_production, not enabled
* List pagination is offset + limit, not page + per_page
* List response is a bare array, not a {flags, has_more, total} wrapper
* Added include_archived, include_detailed_allocations query params
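
The offset + limit pagination and bare-array response noted in the list above suggest a loop along these lines. It is a sketch only: `fetch_all` and the injected `fetch_page(offset, limit)` callable are hypothetical, and stopping on a short page is an assumed termination rule rather than something the spec is quoted as requiring.

```python
def fetch_all(fetch_page, limit=50):
    """Collect every item from an offset/limit endpoint that returns bare arrays.

    fetch_page is any callable (offset, limit) -> list, so the loop can be
    exercised without a network connection.
    """
    items = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        if not isinstance(page, list):
            raise TypeError("expected a bare JSON array, not a wrapper object")
        items.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            return items
        offset += limit
```

Injecting the fetch function keeps the loop testable against an in-memory list and leaves authentication and URL building to the caller.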

Surfaced two new BLOCKED cases the spec made visible:

* IS_NULL operator -> Confidence has no native null-check rule, so any
  allocation using it is BLOCKED
* SWITCHBACK allocation type -> Eppo time-windowed experiments are not
  modeled in Confidence; the whole flag is BLOCKED

Plus an audiences[]-references BLOCKED path (allocations that reference
reusable Eppo audience definitions via the IS_IN / IS_NOT_IN type
require fetching /audiences/{id} and inlining, which is non-trivial).

Test-fixture coverage updated to exercise every operator and
allocation type in the spec: 10 fixture flags covering MATCHES suffix,
waterfall + multivariant, NOT_ONE_OF + GTE + AND, special id
attribute, inactive-in-env, SemVer BLOCKED, regex BLOCKED, IS_NULL
BLOCKED, SWITCHBACK BLOCKED, and audience-reference BLOCKED.

Validated end-to-end against the OpenAPI required-fields and enums
via an automated schema-check script: zero violations across 10 flags
and 3 envs.

Computes Eppo's waterfall evaluation locally over the fake-server
fixtures and prints a 40-case test matrix (5 migratable flags x 8
test contexts) so the human running the migration can spot-check
that Confidence resolves match what Eppo would have returned.

Already used in the end-to-end test pass on this PR: a curated 18-case
subset produced 0 mismatches across 11 distinct operator-mapping
branches (endsWithRule, eqRule singleton + OR, NOT + AND, rangeRule
startInclusive, id->user_id rewrite, 2-rule waterfall, inactive flag
handling, probabilistic split). Worth keeping around as a tracked
regression harness for future skill changes.

Aligned with the commit-4 schema rewrite: reads snake_case fields,
arrays of `values`, array `variation_weight` with `variation_id`
lookups against `flag.variations`, and `is_default`-allocation default
handling. Numeric operators coerce both sides to float; SemVer-looking
values fail coercion and return False (those flags are BLOCKED in the
migration anyway).

Also adds a small `.gitignore` to keep `__pycache__/` out of the
working tree now that there's Python code under test-fixtures.
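
The waterfall evaluation this commit describes can be sketched as below. This is not the contents of verify_migration.py: the `conditions`, `attribute`, `operator`, and `variation` keys, the rule semantics (rules ORed, conditions within a rule ANDed), and treating a missing attribute as a non-match are assumptions. The float coercion that rejects SemVer-looking values follows the commit text.

```python
def _to_float(value):
    """Coerce to float; SemVer-looking strings such as '1.2.3' yield None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def condition_matches(cond, subject):
    actual = subject.get(cond["attribute"])
    op, values = cond["operator"], cond["values"]  # values is always an array
    if actual is None:
        return False
    if op == "ONE_OF":
        return str(actual) in values
    if op == "NOT_ONE_OF":
        return str(actual) not in values
    if op in ("GT", "GTE", "LT", "LTE"):
        left, right = _to_float(actual), _to_float(values[0])
        if left is None or right is None:
            return False
        return {"GT": left > right, "GTE": left >= right,
                "LT": left < right, "LTE": left <= right}[op]
    raise ValueError(f"unsupported operator in this sketch: {op}")


def evaluate(allocations, subject):
    """Return the variation of the first matching allocation, else None."""
    for alloc in allocations:
        rules = alloc.get("targeting_rules") or []
        # An allocation without rules matches everyone (assumed semantics).
        if not rules or any(
            all(condition_matches(c, subject) for c in rule["conditions"])
            for rule in rules
        ):
            return alloc["variation"]
    return None
```

Traffic exposure and weighted splits are left out deliberately; the sketch only shows how the first matching allocation is selected.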
All 10 existing fixture flags had is_archived: false, leaving the
archive-filtering logic in the list endpoint untested. Add flag #11
(old-onboarding-flow) with is_archived: true so the default list
returns 10 flags and include_archived=true returns 11.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… audiences instead of blocking

Investigated the open-source Confidence resolver (spotify/confidence-resolver:
target.proto, value.rs, version.rs, and its spec test payloads) to verify what
the targeting engine actually supports. Four cases previously marked BLOCKED have
clean translations:

- SemVer comparisons → rangeRule with versionValue (Confidence has a first-class
  SemanticVersion value type; the resolver even strips pre-release suffixes)
- MATCHES alternation → decompose into an OR of startsWith/endsWith rules
- IS_NULL (sole condition, variant == default) → drop the rule; null subjects fall
  through to the flag default
- Eppo audiences → Confidence segments via createSegment + segment criteria
  (IS_IN / IS_NOT_IN map to ref / not-wrapped ref)

The genuinely-blocked set narrows from five to three: generic regex (char classes,
quantifiers), IS_NULL combined with other conditions or serving a non-default
variant, and SWITCHBACK (no time-windowed exposure primitive).

Shared core gains Version/Set/Timestamp/Segment criteria with worked examples and a
Reusable Segments (createSegment) section. The Eppo skill adds the /audiences/{id}
fetch step, numeric-vs-version detection, regex decomposition, the precise IS_NULL
semantics, audience->segment translation, a Segments plan section, and segment-first
execute ordering.

Fixtures: server.py serves /audiences and now has 12 flags + 2 audiences (four
previously-blocked cases flipped to migratable, two new genuinely-blocked added).
verify_migration.py models Eppo's SemVer/regex/audience evaluation as ground truth;
72 cases across 8 contexts pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ot sticky assignment

The previous reason text read as if Confidence lacked sticky/consistent
assignment. It doesn't — that's native, plus there's a materialization API.
Reframe the blocker around the time-window rotation that genuinely has no
Confidence equivalent.

Co-authored-by: Cursor <cursoragent@cursor.com>

… instead of blocking

Confidence does have a null/existence check: an attribute criterion with no
inner rule is a presence test (resolver ir_builder.rs existence arm; it appears
in the resolver's own spec fixtures), and the admin API accepts it on create
(epx-flags-admin TargetingValidator does no structural validation for ATTRIBUTE
criteria). Wrapping it in `not` expresses "attribute is null".

So IS_NULL maps directly — emit `{ attribute: { attributeName } }` referenced
under `not` — even when it serves a non-default variant or is ANDed with other
conditions. This removes the previous "drop only if variant == default" trick
and unblocks the combined case.

Genuinely-blocked set narrows to two whole-flag cases: generic regex and
SWITCHBACK.

- shared core: add existence/null criterion form, combinator row, and two worked
  examples (IS null, IS null combined)
- Eppo skill: rewrite the IS_NULL section as a direct not(exists) translation,
  drop the combined/non-default BLOCKED rows, note the empty-in-editor caveat,
  add a "Null rules emitted" plan field
- fixtures: flip #8 to serve a non-default variant for no-plan subjects and #12
  (IS_NULL ANDed with plan==free) to migratable; both now exercise the not(exists)
  path. verify_migration.py moves null-and-condition to MIGRATED_FLAGS and adds a
  no-country/free context; 90 cases pass, 2 blocked.

Co-authored-by: Cursor <cursoragent@cursor.com>

…l rule

Confidence has no server-side flag default (createFlag takes none; an
unmatched resolve returns the caller's code default via
ClientDefaultAssignment). Migrating Eppo's is_default allocation as a
"flag default value" was therefore a no-op. Emit it instead as a final
catch-all addTargetingRule (no payload, 100% -> default variant) so the
default variation is preserved for no-match subjects.

Co-authored-by: Cursor <cursoragent@cursor.com>

Align the eppo and posthog skills on the same "Available / Selected"
layout for the subject-mapping section, and ignore local .claude/
dev artifacts (CLI symlinks + generated plans).

Co-authored-by: Cursor <cursoragent@cursor.com>