feat(auth): Microsoft Entra JWT authentication for the Context Intelligence Server #29
Open
colombod wants to merge 15 commits into
Adds pyjwt[crypto]>=2.8.0 (resolves pyjwt 2.13.0 + cryptography 49) and an import smoke test. Foundation dependency for the EntraResolver (T4); not yet used.
…(T2)
Pure refactor, no behaviour change. BearerTokenMiddleware now delegates token resolution to a PrincipalResolver protocol; StaticKeyResolver wraps the existing sha256 keystore lookup; the resolver is constructed in create_asgi_app(). Prepares the seam for an EntraResolver (T4) without adding it. All existing test_auth.py tests pass unchanged; 15 new seam tests added.
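The T2 seam can be pictured with a minimal sketch. The middleware depends only on a protocol, and the static resolver looks a key up by its sha256 digest. The method name `resolve`, the `AuthError` shape and the keystore layout (hex digest -> contributor id) are assumptions made for illustration. They are not the server's actual signatures.

```python
import hashlib
from typing import Mapping, Protocol


class AuthError(Exception):
    """Hypothetical stand-in: a denial that carries the HTTP status to return."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


class PrincipalResolver(Protocol):
    """The seam: anything that turns a bearer token into a contributor id."""

    def resolve(self, token: str) -> str:
        """Return the contributor id, or raise AuthError."""
        ...


class StaticKeyResolver:
    """Assumed keystore shape: sha256 hex digest of the API key -> contributor id."""

    def __init__(self, keystore: Mapping[str, str]) -> None:
        self._keystore = keystore

    def resolve(self, token: str) -> str:
        # Look the key up by digest, so the keystore never needs to hold raw keys.
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        contributor = self._keystore.get(digest)
        if contributor is None:
            raise AuthError(401, "unknown API key")
        return contributor


def authenticate(resolver: PrincipalResolver, token: str) -> str:
    # The middleware side depends only on the protocol, never on a concrete type.
    return resolver.resolve(token)
```

With this seam in place, a second resolver (EntraResolver in T4) only has to provide the same `resolve` shape. The middleware itself does not change.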
…dators (T3)
Adds the Entra-auth config surface to Settings (no resolver yet — that's T4):
- auth_mode: Literal[static, entra] (default static; existing behaviour unchanged)
- azure_client_id / azure_tenant_id (empty/whitespace normalized to None)
- entra_identities: oid->contributor map, exact api_keys parity, value {id} only
- _validate_entra_identities (mirrors _validate_api_keys): GUID re.fullmatch keys
(rejects braces / urn:uuid / trailing-junk / all-zeros), non-dict value and
missing/empty/whitespace id rejection, lowercase-normalized keys
- model_validator (AC7): auth_mode=entra requires client_id + tenant_id + identities,
else a loud startup refusal
- build_identity_map() mirrors build_keystore()
56 new config tests cover the plan's edge matrix incl. the env-var (production) path,
GUID edges (unicode / zero-width / braces), coexistence with api_keys, and the
duplicate-oid last-wins documentation. tester-breaker-reviewed (verdict CONCERN:
nothing breaks; gaps were test-coverage, now closed). 106 config tests green; 1438 suite green.
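The T3 key and value rules for `entra_identities` can be sketched as a plain function. The sketch uses a GUID `re.fullmatch` with explicit ASCII ranges, lowercase normalisation, all-zeros rejection, and a required non-empty `id`. The function name and error messages are hypothetical, and the real validator runs inside Settings. It may also rely on pydantic for the dict-type check, which the T7 cleanup suggests.

```python
import re
from typing import Any, Mapping

# Canonical 8-4-4-4-12 GUID. Explicit ASCII ranges (not \d or \w) keep Unicode digits
# out, and fullmatch rejects braces, "urn:uuid:" prefixes, zero-width chars and trailing junk.
_GUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NIL_GUID = "00000000-0000-0000-0000-000000000000"


def validate_entra_identities(raw: Mapping[Any, Any]) -> dict[str, dict[str, str]]:
    """Return {lowercase oid: {"id": contributor}} or raise ValueError."""
    validated: dict[str, dict[str, str]] = {}
    for oid, meta in raw.items():
        if not isinstance(oid, str) or _GUID.fullmatch(oid) is None:
            raise ValueError(f"entra_identities key {oid!r} is not a bare GUID")
        key = oid.lower()
        if key == _NIL_GUID:
            raise ValueError("entra_identities key must not be the all-zeros GUID")
        if not isinstance(meta, Mapping):
            raise ValueError(f"entra_identities[{oid!r}] must be a mapping like {{'id': ...}}")
        contributor = meta.get("id")
        if not isinstance(contributor, str) or not contributor.strip():
            raise ValueError(f"entra_identities[{oid!r}] needs a non-empty 'id'")
        # Keys that differ only in case collapse to one entry; the later one wins.
        validated[key] = {"id": contributor}
    return validated
```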
Second PrincipalResolver; drops into the T2 seam with no middleware rewrite.
- RS256-pinned jwt.decode, dual audience [client_id, api://client_id], v2 issuer; explicit tid check; scp must contain access_as_user (space-split, no substring trap); oid extracted and mapped oid->contributor.
- AuthError(status): 401 for invalid/missing-oid token; 403 for a valid token whose oid is not in entra_identities (the 403 names the oid for operator diagnosis).
- Eager JWKS prefetch at construction, fail-closed: raises if the endpoint is unreachable OR returns zero signing keys.
- BearerTokenMiddleware maps AuthError->status and has a fail-closed catch-all (an unexpected resolver exception denies with 401 + a loud log, never a 500).
Adversarially reviewed (tester-breaker): found and fixed two live 500-crash bugs (non-string oid/scp) and a 403-vs-401 semantic bug.
46 tests incl. a real-crypto tier proving expired / wrong-aud / tampered / alg=none / HS256 rejection, dual-aud (bare GUID + api://), nbf, app-only(roles)/no-scp, alg case variants, empty/garbage bearer, aud-array, and empty-JWKS-at-startup. Full suite 1484 green.
Not yet wired into create_asgi_app (auth_mode switch = T7); JWKS global cap + per-kid dedup lock = T5 (TODO left in code).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
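The claim checks that run after signature verification fit in one small function. It applies the tid match, whole-word `scp` membership (so `access_as_user_admin` cannot pass), a type-checked `oid`, and the 401-vs-403 split. The sketch assumes that signature, exp/nbf, audience and issuer were already verified by pyjwt's `jwt.decode`, as described in the commit. The function name, the flat oid -> contributor map and the lowercasing of the token oid are illustrative assumptions.

```python
from typing import Any, Mapping


class AuthError(Exception):
    """Hypothetical stand-in carrying the HTTP status (401 or 403)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


REQUIRED_SCOPE = "access_as_user"


def contributor_from_claims(
    claims: Mapping[str, Any], tenant_id: str, identities: Mapping[str, str]
) -> str:
    """Post-decode checks; assumes signature, exp/nbf, aud and iss were already verified.

    With pyjwt, that earlier step would look roughly like:
        jwt.decode(token, signing_key, algorithms=["RS256"],
                   audience=[client_id, f"api://{client_id}"],
                   issuer=f"https://login.microsoftonline.com/{tenant_id}/v2.0")
    """
    if claims.get("tid") != tenant_id:
        raise AuthError(401, "token tid does not match the configured tenant")
    scp = claims.get("scp")
    # Whole-word membership on the space-separated list, never a substring test.
    if not isinstance(scp, str) or REQUIRED_SCOPE not in scp.split():
        raise AuthError(401, "token lacks the access_as_user delegated scope")
    oid = claims.get("oid")
    # Type-check first: a non-string oid must become a 401, not a crash (500).
    if not isinstance(oid, str) or not oid:
        raise AuthError(401, "token has no usable oid claim")
    contributor = identities.get(oid.lower())
    if contributor is None:
        # Valid token, unknown person: 403, naming the oid so an operator can map it.
        raise AuthError(403, f"oid {oid} is not in entra_identities")
    return contributor
```

Keeping these checks separate from the crypto step makes each denial path testable with plain dicts. The real-crypto tier then only has to cover what `jwt.decode` itself rejects.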
create_asgi_app() now selects the resolver from settings.auth_mode: entra -> EntraResolver(client_id, tenant_id, build_identity_map()); static -> StaticKeyResolver(build_keystore()). The middleware no longer special-cases a concrete resolver type -- PrincipalResolver gains an auth_enabled property (StaticKeyResolver: bool(keystore); EntraResolver: True).
Closes a CRITICAL silent fail-open (AC13/H2): a server with no auth configured (static + no keys -- previously a pass-through) now REFUSES to start with a loud RuntimeError, unless the explicit, default-false allow_unauthenticated flag is set (test harness only). A six-lens council review flagged this as the #1 issue (restless-old-brian verdict: FAIL) -- auth_mode=entra was previously inert and silently unauthenticated.
Cleanups (cranky-old-sam):
- delete dead _is_hex() + its test class
- drop @runtime_checkable + the circular protocol test
- remove dead isinstance(meta,dict) branches in both validators
- correct the PrincipalResolver / _validate_api_keys docstrings (resolvers raise AuthError 401 OR 403)
13 new switch/fail-closed tests incl. AC13 startup-refusal (RED->GREEN) and AC8 (auth_mode actually changes the resolver). Full suite 1493 green.
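The T7 switch and its fail-closed gate reduce to a few lines. Pick the resolver from `auth_mode`, ask it whether it enforces anything, and refuse to start unless the harness-only `allow_unauthenticated` flag is set. The `Settings` subset, the stub resolvers and the `select_resolver` name are hypothetical. The real EntraResolver takes client_id, tenant_id and the identity map.

```python
from dataclasses import dataclass, field


@dataclass
class Settings:
    """Hypothetical subset of the server settings."""
    auth_mode: str = "static"            # "static" | "entra"
    keystore: dict[str, str] = field(default_factory=dict)
    allow_unauthenticated: bool = False  # test harness only


class StaticKeyResolver:
    def __init__(self, keystore: dict[str, str]) -> None:
        self.keystore = keystore

    @property
    def auth_enabled(self) -> bool:
        return bool(self.keystore)  # no keys -> nothing would be enforced


class EntraResolver:
    """Stub: the real resolver is built from client_id, tenant_id and the identity map."""

    @property
    def auth_enabled(self) -> bool:
        return True


def select_resolver(settings: Settings):
    if settings.auth_mode == "entra":
        resolver = EntraResolver()
    else:
        resolver = StaticKeyResolver(settings.keystore)
    # Fail closed: a server that would enforce nothing refuses to start.
    if not resolver.auth_enabled and not settings.allow_unauthenticated:
        raise RuntimeError(
            "no authentication configured; refusing to start "
            "(set allow_unauthenticated only in a test harness)"
        )
    return resolver
```

The gate asks the resolver through `auth_enabled` rather than using `isinstance`, so the startup check works for any future resolver without changes.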
… real HTTP (T8)
Proves what the seam-level unit tests can't: a real RS256-signed token (in-test keypair, stub JWKS) through httpx -> asgi_app in auth_mode=entra:
- valid mapped-oid token -> 202 and created_by == the mapped contributor in the queued payload (AC2/AC9, the load-bearing provenance chain, asserted via the durable-queue capture technique, no Neo4j needed)
- valid unmapped-oid token -> HTTP 403 (real response, not a mocked ASGI send)
- expired / garbage / missing bearer -> HTTP 401
- /status and /skills/* still exempt under entra mode
- static auth path regression intact
Tests only, no production change. Full non-neo4j suite 1543 green.
…tra section
Adds docs/entra-auth-setup.md (the council's #1 unblock -- an operator can't build the oid->contributor map and a developer can't get a token without it):
- operator guide: config shape (YAML + env), `az ad user show` for an oid, a bold PII/secret-hygiene warning, the 403-names-oid recovery loop, the real startup-validator messages, and the fail-closed allow_unauthenticated note
- developer guide: scope access_as_user on api://<client-id>, az account get-access-token, the Bearer header + a full curl (incl. data.timestamp), and a 401-vs-403 table from the caller's POV
- ops runbook: write-once wrong-oid permanence (+ verify-before-apply), JWKS ~5-min cache / ~6-week rotation guidance, and reading auth logs
AGENTS.md gains an Entra-auth subsection alongside the static-key section plus a secret-hygiene rule. Placeholders only -- no real oids/client-ids/tenant-ids in the product repo.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
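The developer-guide flow is: get a token with `az account get-access-token --resource api://<client-id> --query accessToken -o tsv`, then POST to /events with a Bearer header. It can also be sketched with the Python standard library. The base URL, the token value and the `build_event_request` helper are placeholders. Apart from `data.timestamp`, the event payload shape is an assumption. The request is built here but not sent.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_event_request(base_url: str, token: str, event: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST to /events."""
    data = event.get("data")
    if not isinstance(data, dict) or not data.get("timestamp"):
        # Client-side echo (presence only) of the server rule: no data.timestamp -> 400.
        raise ValueError("event needs data.timestamp (ISO-8601)")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/events",
        data=json.dumps(event).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder payload: data.timestamp is the only field known to be required.
event = {"data": {"timestamp": datetime.now(timezone.utc).isoformat()}}
req = build_event_request("http://localhost:8000/", "<access-token>", event)
# urllib.request.urlopen(req) would send it: 202 for a mapped oid, 403 unmapped, 401 bad token.
```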
…d-run + log tags
Council-trimmed T5 (cranky-old-sam + crusty: no custom per-kid lock or global cap -- PyJWKClient handles per-kid caching + lifespan-bounded refresh natively):
- pin PyJWKClient(lifespan=JWKS_CACHE_LIFESPAN_SECONDS), with JWKS_CACHE_LIFESPAN_SECONDS = 300, so the cache-TTL contract is visible in code
- distinct, greppable auth log tags: auth_event=auth_denied (INFO, normal denial) vs auth_event=resolver_unexpected_exception (ERROR, catch-all) so an operator can tell 'rejected a bad token' from 'resolver is broken'; raw token never logged
- prove post-startup JWKS-unreachable is fail-closed: a signing-key fetch that fails mid-run (connection error OR malformed JWKS) -> AuthError(401), not an unhandled 500; a previously-cached kid still resolves
14 new tests; zero concurrency primitives added (council direction). Full suite 1516 green.
🤖 Generated with Amplifier
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
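The middleware-side contract can be sketched in a few lines. An expected `AuthError` is logged at INFO under `auth_event=auth_denied`. Any other exception is logged at ERROR under `auth_event=resolver_unexpected_exception` and still denied with 401. The raw token is never logged. The function name, the `(contributor, status)` return shape and the use of 200 to mean "let through" are hypothetical choices for this sketch.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("auth")


class AuthError(Exception):
    """Hypothetical stand-in carrying the HTTP status (401 or 403)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


def resolve_or_deny(resolve: Callable[[str], str], token: str) -> tuple[Optional[str], int]:
    """Return (contributor, 200) to let the request through, else (None, 401/403). Never raises."""
    try:
        return resolve(token), 200
    except AuthError as exc:
        # Expected denial: INFO, greppable tag; only status and reason, never the token.
        log.info("auth_event=auth_denied status=%d reason=%s", exc.status, exc)
        return None, exc.status
    except Exception:
        # Resolver bug or JWKS outage: ERROR with traceback, but still fail closed (401, not 500).
        log.exception("auth_event=resolver_unexpected_exception")
        return None, 401
```

The two distinct tags let `grep auth_event=resolver_unexpected_exception` separate a broken resolver from routine rejections.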
…kdown
Council revised plan §3b (factory -> one conditional): a single `web_ui_enabled`
settings flag (default True), read once at app construction -- NOT a two-factory
split (cranky-old-sam: two profiles, trivial divergence). When web_ui_enabled=false
(the locked-down pilot profile):
- FastAPI built with docs_url/redoc_url/openapi_url=None (no Swagger, no schema leak)
- browser routes not registered: /, /dashboard, /static, /logs/stream
- kept: the API, /status, /version, and /skills/* (the bundle fetches skills here)
- the auth-exempt set narrows to {/status,/version}; /logs/stream LEAVES the exempt
set (it was auth-exempt and only the dashboard used it -> no unauthenticated log
drain). Unauth /logs/stream -> 401; with a token -> 404 (route absent).
20 tests incl. the tester-breaker F4 bypass guards: /openapi.json not 200, /dashboard
invalid-token not bypassed, /logs/stream unauth -> 401, /skills still reachable;
web_ui_enabled=true regression intact. Full suite 1536 green.
Generated with Amplifier
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Adds ops-runbook note: an authenticated event missing data.timestamp is still accepted (202, created_by stamped) but the durable drainer dead-letters it (no graph node). Ingest validation, not auth -- surfaced in the live AC10 run; matters for curl/hand-rolled payloads.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
…silent dead-letter
The /events endpoint accepted events whose data.timestamp was missing/empty (202,
created_by stamped) but the durable drainer then crashed building the graph node
(datetime.fromisoformat('') -> ValueError), retried, and dead-lettered them -- no node,
and no error surfaced to the caller. Now:
- post_events validates data.timestamp is present, a non-empty string, and valid
ISO-8601 BEFORE queuing; otherwise HTTP 400 with a clear, value-naming message.
- make_node_id wraps the parse and re-raises a NAMED error (event + session in the
message) so any malformed event that bypasses ingest dead-letters legibly, not as a
bare 'Invalid isoformat string'.
Verified safe against real traffic: 224,530 real events across 759 on-disk records all
carry data.timestamp, so the 400 only catches malformed/hand-rolled payloads (the gap
surfaced in the live AC10 run). 11 new tests; 15 pre-existing /events tests that sent
timestamp-less payloads updated to well-formed bodies (assertions unchanged). Suite 1376 green.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
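The two layers of the fix can be sketched as follows. An ingest check rejects a missing, empty or non-ISO `data.timestamp` before anything is queued. A drainer-side parse re-raises with the event and session named. `IngestValidationError`, both function names and the example event and session values are hypothetical. One caveat: `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11, so the server's exact accepted forms may differ from this sketch.

```python
from datetime import datetime
from typing import Any


class IngestValidationError(ValueError):
    """Hypothetical: the /events handler would turn this into HTTP 400."""


def validate_event_timestamp(event: dict[str, Any]) -> str:
    """Ingest-time check, run BEFORE the event is queued."""
    data = event.get("data")
    ts = data.get("timestamp") if isinstance(data, dict) else None
    if not isinstance(ts, str) or not ts.strip():
        raise IngestValidationError(
            f"data.timestamp must be a non-empty ISO-8601 string, got {ts!r}"
        )
    try:
        datetime.fromisoformat(ts)
    except ValueError:
        raise IngestValidationError(f"data.timestamp is not valid ISO-8601: {ts!r}") from None
    return ts


def parse_event_time(ts: Any, event_name: str, session_id: str) -> datetime:
    """Defense in depth for the drainer: fail with a NAMED error, not a bare parse error."""
    try:
        return datetime.fromisoformat(ts)
    except (TypeError, ValueError) as exc:
        raise ValueError(
            f"event {event_name!r} in session {session_id!r} has invalid "
            f"data.timestamp {ts!r}"
        ) from exc
```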
… after folding the ingest fix
The folded-in ingest validation (HTTP 400 on missing data.timestamp) required making the entra created_by integration test's event body well-formed; the other entra integration tests are short-circuited by auth (401/403) before ingest and were unaffected. Full suite 1547 green.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Collaborator
Author
Folded in the data.timestamp ingest-validation fix (was PR #30): /events now returns HTTP 400 for a missing/empty/invalid data.timestamp instead of accepting it (202) and silently dead-lettering it in the graph drainer; make_node_id re-raises a named error as defense-in-depth. Verified safe against real traffic (224,530 on-disk events, 0 missing the field). Full suite 1547 green.
…iagrams
Audited every server doc and DOT diagram against the implemented Entra-auth feature:
- entra-auth-setup.md: §4.4 rewritten to state that a missing/invalid data.timestamp now returns HTTP 400 at ingest (it was documented as a silent 202 -> dead-letter); §4.3 states the live auth_event=auth_denied / resolver_unexpected_exception log tags (dropped the stale "finalized in T5" note); §3.4 notes the exempt set shrinks under web_ui_enabled=false.
- README.md: added the six missing settings rows (auth_mode, azure_client_id, azure_tenant_id, entra_identities, allow_unauthenticated, web_ui_enabled) and an Entra option in First-Run Setup.
- service-setup.md and managing-api-keys.md cross-reference entra mode; AGENTS.md corrects the auth.py description to the Bearer-token middleware / resolver model.
- architecture diagrams: NEW 06-auth-flow.dot (per-request: bearer -> middleware -> resolver[static|entra] -> 401/403 -> created_by -> 202) and 07-auth-startup.dot (create_asgi_app auth_mode switch + fail-closed gate + web_ui_enabled exempt selection); extended 05-durable-ingest-queue.dot with the auth middleware, the data.timestamp 400, and created_by stamping; architecture/README.md indexes both.
PNGs rendered (graphviz); both new diagrams vision-checked (readable + correct flow). Docs only; no code change. Verified against the live source.
…app (fail-open)
The context-intelligence-server-dev DTU profile launched `uvicorn ...main:app` -- the raw FastAPI app with BearerTokenMiddleware NOT in the chain -- so an unauthenticated write returned 422 (body validation), not 401, even though the profile generates and configures an API key. Silent fail-open.
Now serves `main:asgi_app` (auth-wrapped) in both the start and update flows and exports AMPLIFIER_CONTEXT_INTELLIGENCE_SERVER_API_KEY=$CI_KEY so the generated key is actually enforced (authenticated write -> 202, unauthenticated -> 401). Mirrors the live-verified AC10 entra variant.
Rule: launchers/profiles/Dockerfiles MUST serve main:asgi_app, never main:app.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
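A small pure-ASGI sketch shows why serving `main:app` bypasses auth. The middleware is a wrapper around the app, so a server pointed at the inner object never passes through it. The stub `app` below accepts everything, so it returns 202 where the real FastAPI app returned 422. The point is only that neither yields a 401. `BearerGate`, `status_of` and the `dev-key` token are hypothetical stand-ins, not BearerTokenMiddleware itself.

```python
import asyncio


async def app(scope, receive, send):
    """Stands in for the raw FastAPI app (main:app): it never checks credentials."""
    await send({"type": "http.response.start", "status": 202, "headers": []})
    await send({"type": "http.response.body", "body": b"accepted"})


class BearerGate:
    """Hypothetical, minimal stand-in for BearerTokenMiddleware."""

    def __init__(self, inner, valid_tokens):
        self.inner = inner
        self.valid_tokens = set(valid_tokens)

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            # ASGI header names arrive as lowercase bytes.
            headers = dict(scope.get("headers", []))
            scheme, _, token = headers.get(b"authorization", b"").decode("latin-1").partition(" ")
            if scheme.lower() != "bearer" or token not in self.valid_tokens:
                await send({"type": "http.response.start", "status": 401, "headers": []})
                await send({"type": "http.response.body", "body": b"unauthorized"})
                return
        await self.inner(scope, receive, send)


asgi_app = BearerGate(app, {"dev-key"})  # the object launchers must serve


def status_of(asgi, headers):
    """Drive one POST /events through an ASGI callable and return the status code."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "POST", "path": "/events", "headers": headers}
    asyncio.run(asgi(scope, receive, send))
    return sent[0]["status"]
```

In uvicorn terms, `uvicorn main:app` serves the first object and `uvicorn main:asgi_app` serves the gated one.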
…egated az-CLI token model
Adds an "Authentication model & Entra App Registration" section to entra-auth-setup.md
making the current setup explicit:
- the server accepts DELEGATED (user) tokens only — scp must contain access_as_user, a
claim that exists only on user-context tokens
- the single App Registration: Expose an API (api://<client>, delegated scope access_as_user,
admin+user consent); the scope GUID is internal and never referenced by callers; a
"what the server checks" table tied to auth.py (RS256, dual aud, v2 issuer, tid, scp,
oid -> created_by)
- how a token is obtained today: `az login` + `az account get-access-token --resource
api://<client>`; and DefaultAzureCredential().get_token("api://<client>/.default"),
compatible ONLY when it resolves to a user-context credential (AzureCliCredential / VS Code
/ interactive)
- limitation: app-only credentials (Managed Identity / SP client-secret) carry `roles`, not
`scp`, and are rejected; supporting them needs an App Role + server `roles` handling (not done)
AGENTS.md gains a one-line capture of the delegated-only model. Placeholders only; verified
against auth.py (algorithms=[RS256], expected_aud=[client_id, api://client_id], tid check,
scp.split() membership).
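When a caller gets a 401 and it is unclear whether the credential was delegated, peeking at the token payload settles it: `scp` containing `access_as_user` means a user token, while `roles` without `scp` means app-only. The sketch below decodes the payload with the standard library and does NOT verify the signature, so it is for diagnostics only. pyjwt's `jwt.decode(token, options={"verify_signature": False})` does the same job. The helper names and the returned labels are hypothetical.

```python
import base64
import json


def peek_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature: diagnostics only, never auth."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT (expected header.payload.signature)")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def token_kind(claims: dict) -> str:
    scp = claims.get("scp")
    if isinstance(scp, str) and "access_as_user" in scp.split():
        return "delegated"       # the only kind this server accepts
    if "roles" in claims and "scp" not in claims:
        return "app-only"        # Managed Identity / client secret: rejected
    return "missing-scope"       # no access_as_user: rejected
```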
colombod added a commit that referenced this pull request Jun 29, 2026
… + live static keystore
T1-T3 of runtime identity-map management:
- config: admin_api_key (YAML config and/or env, consistent Settings pattern); api-keys & entra store paths
- IdentityStore (identity_store.py): durable JSON map with write-file-then-swap-memory commit order and fail-closed-on-corrupt load (never crash-loop); a live flat_dict reference
- wire the static keystore to the live store (first-boot seed from config, store-wins); a put() is visible to the resolver immediately, no restart
41 new tests; suite 1406 green.
NOTE: branched from main, which lacks the Entra auth code (auth_mode / EntraResolver / entra_identities; those are on PR #29 / feat/entra-auth). The entra-side store wiring is deferred until this work is rebased onto feat/entra-auth.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
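The IdentityStore commit order can be sketched with the standard library. `put()` writes a temp file in the same directory and renames it over the store with `os.replace`, then mutates the in-memory dict in place. Because the dict is mutated rather than rebound, any resolver holding the `flat_dict` reference sees the change without a restart. A corrupt file loads as an empty map (deny everything) instead of raising. Only the `flat_dict` term comes from the commit; the class layout, the str -> str value type and the log text are assumptions.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

log = logging.getLogger("identity_store")


class IdentityStore:
    """Hypothetical sketch: a durable str -> str map with a live in-memory view."""

    def __init__(self, path: Path) -> None:
        self._path = Path(path)
        self._map: dict[str, str] = self._load()

    @property
    def flat_dict(self) -> dict[str, str]:
        # The SAME dict object every time, so a resolver holding it sees put() at once.
        return self._map

    def _load(self) -> dict[str, str]:
        if not self._path.exists():
            return {}
        try:
            data = json.loads(self._path.read_text(encoding="utf-8"))
            if not isinstance(data, dict) or not all(
                isinstance(k, str) and isinstance(v, str) for k, v in data.items()
            ):
                raise ValueError("expected a JSON object of string -> string")
        except (OSError, ValueError) as exc:
            # Fail closed: an empty map denies every lookup, and not raising means
            # a corrupt file cannot crash-loop the server.
            log.error("identity store %s unreadable, failing closed: %s", self._path, exc)
            return {}
        return data

    def put(self, key: str, value: str) -> None:
        updated = {**self._map, key: value}
        # 1) Commit to disk first: temp file in the same directory, then atomic rename.
        fd, tmp = tempfile.mkstemp(dir=self._path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as fh:
                json.dump(updated, fh)
                fh.flush()
                os.fsync(fh.fileno())
            os.replace(tmp, self._path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        # 2) Only then swap memory, mutating in place so the live reference stays valid.
        self._map[key] = value
```

Writing the file before touching memory means a failed write leaves both the file and the live map on the old state. A reader therefore never sees a mapping that would vanish on restart.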
Adds an optional `auth_mode=entra` that validates Microsoft Entra JWTs (RS256, dual-audience, tid/scp/oid checks) and maps the token `oid` to a write-once `created_by` contributor, with a clean static<->entra switch via a `PrincipalResolver` seam (`StaticKeyResolver` | `EntraResolver`).

Highlights:
- … `az` token -> isolated DTU server (`auth_mode=entra`) -> `created_by=colombod` read back from the Neo4j graph.
- … (`docs/entra-auth-setup.md`) + AGENTS.md Entra section (placeholders only).
- `web_ui_enabled` API-only lockdown: docs/openapi off, web routes + `/logs/stream` exempt removed.

Deferred (named, off the pilot critical path): the app-profile factory, JWKS concurrency/rotation tests, and a `data.timestamp` ingest-validation gap (see ops runbook §4.4).