Refactor JWT verification and implement session versioning by jfrench9 · Pull Request #647 · RoboFinSystems/robosystems

jfrench9 · 2026-05-01T02:02:04Z

Summary

Refactors the JWT verification pipeline and auth cache invalidation logic, introducing session versioning to enable reliable, granular token revocation without relying solely on token blocklists.

Key Changes

Session Versioning

Added session_version column to both users and API keys tables via a new Alembic migration
JWT tokens now embed a session version at creation time, which is validated against the current database value during verification
Incrementing a user's session version instantly invalidates all previously issued tokens without needing to track individual JWTs

JWT Verification Refactor

Restructured the JWT verification function to cleanly separate token decoding, claims validation, and session version checking
Improved the return value structure from the verification pipeline to carry richer context for downstream consumers

Auth Cache Invalidation Improvements

Enhanced the auth cache layer with session-version-aware invalidation logic
Cache entries are now tied to session versions, ensuring stale cached auth state is properly evicted when tokens are revoked

Auth Router Updates

Updated all auth-related routers (login, register, SSO, email verification, password reset, session management) to integrate with the new session versioning system
Session router now supports explicit session version bumps for "logout everywhere" functionality

Model Enhancements

Extended User and UserApiKey models with session version fields and methods for version management

Breaking Changes

JWT payload structure: Tokens now include a session_version claim. Existing tokens without this claim will fail verification and require re-authentication.
Database migration required: A new migration adds session_version columns to the users and API keys tables.
Verification return type: The verify function return value has changed, which affects any code consuming its output directly (all internal call sites and tests have been updated).

Testing

Comprehensive test updates across 8 test files covering:
- Auth dependency injection with new verification return values
- Session versioning lifecycle (creation, increment, validation)
- JWT revocation via session version bumps
- Graph access control with updated mock signatures
- Email verification and password reset flows
- SSO token exchange with session versioning
Net addition of ~624 lines of test code reflecting expanded coverage

Infrastructure Considerations

A database migration must be applied before deploying this change to add the session_version columns
Cache infrastructure should be monitored after deployment for any increase in invalidation rates during the transition period as existing sessions re-authenticate
Rolling deployments should account for the JWT payload change — old application instances will not recognize the new session_version claim, so a coordinated deployment is recommended

🤖 Generated with Claude Code

Branch Info:

Source: refactor/jwt-cache-invalidation-fix
Target: main
Type: refactor

Co-Authored-By: Claude noreply@anthropic.com

- Added session_version to User model to track session state changes. - Updated JWT creation and verification to include session_version claims. - Implemented session invalidation on password reset and user deactivation. - Enhanced JWT verification to check session_version against the current user session. - Updated SSO token generation to include session_version. - Modified API key model to include a deterministic fingerprint for cache invalidation. - Improved tests to cover new session_version logic and cache interactions.

…n logic - Renamed `verify_jwt_token` to `verify_jwt_claims` for clarity in its purpose. - Updated all references to the renamed function across the codebase, including middleware, routers, and tests. - Enhanced `_invalidate_auth_cache` method in the User model to ensure both cache invalidation attempts succeed before considering the operation successful, with added logging for failures. - Added regression tests for the `_invalidate_auth_cache` method to cover various scenarios, including success, transient failures, and critical failure logging. - Updated comments and documentation to reflect changes in the cache invalidation strategy and JWT verification process.

jfrench9 · 2026-05-01T02:02:06Z

@claude please review this PR

claude · 2026-05-01T02:02:53Z

Claude finished @jfrench9's task in 4m 24s —— View job

PR Review: Refactor JWT verification and implement session versioning

Tasks

Overall this is a well-structured security refactor with a clear design rationale. The session-versioning approach is sound, the backward-compatibility story is good, and the PR found and fixed at least one pre-existing bug (API key cache invalidation). A few issues worth addressing before merge, plus several smaller notes below.

Significant Find: Pre-existing API Key Cache Invalidation Bug — Fixed ✅

In validate_api_key (utils.py:156), the cache key has always been sha256(plaintext_key). But the old _invalidate_cache was calling api_key_cache.invalidate_api_key(self.key_hash) where key_hash is a bcrypt hash — a completely different value. Every call to _invalidate_cache was trying to delete a cache entry that didn't exist. This PR corrects it by introducing key_fingerprint (SHA-256) and using that instead. Worth calling out in the PR description since it's a real security fix, not just a refactor artifact.

Medium Issues

1. create_jwt_token performs a redundant DB query

_get_user_session_version opens a new DB session (or re-queries via the passed session) to read user.session_version. In every call site the caller already holds a loaded User object with session_version populated — yet the function re-queries:

# robosystems/middleware/auth/jwt.py
"session_version": _get_user_session_version(user_id, session=session) or 0,

vs. every caller having:

create_jwt_token(str(user.id), device_fingerprint, session=session)
# user.session_version is right there

The cleaner approach — one that also eliminates any sync-vs-async session ambiguity — is to accept session_version: int directly:

def create_jwt_token(
    user_id: str,
    device_fingerprint: dict[str, Any] | None = None,
    session_version: int = 0,
) -> str:

Callers already have the value: create_jwt_token(user.id, device_fingerprint, session_version=user.session_version). The helper _get_user_session_version could be removed entirely. Fix this →

2. Dead code: cache_jwt_validation / get_cached_jwt_validation are now unreachable

The old per-token jwt:* cache entries (cache.py:949, cache.py:1006) are no longer written to or read from anywhere in the new code. They should be removed. Similarly, the api_key_cache.invalidate_jwt_token(jwt_token) call in session.py:264 is now a no-op (no code writes to those entries after this refactor) — it can be removed to avoid misleading future readers.

3. time.sleep() inside a model method (user.py)

_invalidate_auth_cache calls time.sleep(0.05) (line ~196) for retry backoff. This blocks the calling thread. In this codebase routers use sync sessions, so it won't stall the event loop directly, but it's still a 50ms penalty on every password reset / deactivation that hits a Redis blip. A better pattern for retry backoff here would be to let the caller (router) decide — or accept that one Redis failure is logged loudly and operators page on it. The code comment already acknowledges the tradeoff; just worth reconsidering the sleep.

Low / Nit

4. PR description overstates the breaking change

"Existing tokens without this claim will fail verification and require re-authentication."

This is not accurate. verify_jwt_claims defaults payload.get("session_version", 0) to 0, and all existing users get session_version = 0 (via server_default). Old tokens are silently treated as version 0 and continue to work. This is actually a nice backward-compatible design — the description should reflect it so operators don't brace for a forced re-auth wave.

5. get_cached_jwt_user_data returns the cache envelope, not user data

The function is documented as returning "cached JWT user data" but actually returns cache_data — the full wrapper dict ({ user_data, session_version, cached_at, cache_version }). The caller in _get_user_for_verified_jwt correctly does cached_data.get("user_data", {}), but the function name and docstring create a false expectation. Consider renaming to get_cached_jwt_auth_entry or similar.

6. Stale session_version-mismatch cache entries not evicted

In get_cached_jwt_user_data, when cached_version != token_session_version, the stale entry is returned as None but not deleted:

# cache.py ~line 858
if cached_version != int(session_version):
    logger.debug(...)
    return None   # ← no delete

Under normal operation _invalidate_auth_cache already deleted this entry, so the case is uncommon. But if invalidation failed (the "fail-open" window), the stale version-0 entry survives for the full cache TTL and every version-1 request falls back to DB for that whole window. Evicting it here on version mismatch would halve the DB overhead during that window. Fix this →

7. Redundant is_active check in get_current_user_with_graph

# dependencies.py:358
if user and bool(user.is_active):

_get_user_for_verified_jwt already returns None for inactive users, so user.is_active is always True here. The outer check is harmless but misleading.

8. Session version bump on deactivate()

deactivate() now increments session_version. This means if a user is temporarily deactivated and then reactivated, they'll be forced to re-authenticate. That may be intentional but it's a behavior change worth documenting on the method.

Positive Security Notes

Grace period path hardened: refresh_session's expired-token decode now validates issuer and audience claims (previously missing for the verify_exp=False decode path). Good catch.
Password reset revocation fix: The old code iterated UserToken email/reset records and called revoke_jwt_token on them — those are never JWTs, so the revocation was a no-op. The new user.invalidate_sessions() correctly bumps the version and invalidates all JWT sessions.
Fail-open is explicitly documented: The comment in _invalidate_auth_cache naming the race condition and its bounded window is exactly the right level of documentation for a security-adjacent tradeoff.
key_fingerprint backfill strategy is clearly documented: The deploy note in user_api_key.py about pre-migration rows is thorough.

Test Coverage

The test restructuring is clean — separating TestJWTTokenVerification (claim-level, no DB) from TestGetUserForVerifiedJWT (DB/version comparison layer) mirrors the new design boundary well. The autouse _stub_session_version fixture in test_jwt_revocation.py is a good pattern for keeping these tests database-independent.

One gap: there's no test for the case where get_cached_jwt_user_data finds a cached entry but its session_version doesn't match the token (i.e., the invalidation raced with a cache hit). A unit test for that path in TestGetUserForVerifiedJWT would be good to have.

… refresh

jfrench9 · 2026-05-01T02:25:57Z

@claude have we resolved all issues

claude · 2026-05-01T02:26:22Z

Claude finished @jfrench9's task in 3m 23s —— View job

Tasks

Issue Status Review

Not all issues from the previous review have been resolved. Here's the current status:

✅ Resolved

Pre-existing API key cache bug — _invalidate_cache now uses key_fingerprint (SHA-256) matching the cache key, fixing the bcrypt/SHA-256 mismatch that silently no-op'd every cache invalidation.
Dead code: cache_jwt_validation / get_cached_jwt_validation — these old per-token JWT cache functions are no longer present in the codebase.
api_key_cache.invalidate_jwt_token call in session.py:264 — that call is gone; the correct revoke_jwt_token is used instead.
Missing test for cache version mismatch path — test_dependencies.py:413 (test_cache_version_mismatch_falls_through_to_db) covers the dependency behavior when get_cached_jwt_user_data returns None on a version mismatch.

❌ Still Open

Medium #1 — create_jwt_token performs a redundant DB query (jwt.py:252)

_get_user_session_version still exists and is called unconditionally; when a session is passed it queries the DB even though every caller already holds a loaded User. Refactoring to accept session_version: int directly would eliminate both the helper and the extra query.

# jwt.py:252 — still queries DB:
"session_version": _get_user_session_version(user_id, session=session) or 0,

Fix this →

Medium #3 — time.sleep(0.05) inside _invalidate_auth_cache (user.py:217)

The blocking sleep is still there on the retry path. Not critical, but a 50ms thread block on every Redis blip that hits a password reset or deactivation.

CodeQL false positive — SHA-256 flagged as "weak password hashing" (user_api_key.py:310)

The CodeQL alert is a false positive: _fingerprint_api_key uses SHA-256 as a cache lookup key for API keys, not for password storage. The actual password hashing uses bcrypt (lines 323-330). Adding a suppression comment or inline # nosec annotation would silence the scanner without changing behavior.

Code quality — Silent empty except in cache.py:878-880

except Exception:
    pass   # ← no log, no explanation

The github-code-quality bot flagged this. Changing to except Exception as e: logger.debug(f"cleanup failed: {e}") would satisfy the linter and preserve the best-effort semantics.
Fix this →

Low #5 — get_cached_jwt_user_data name/docstring mismatch (cache.py:816)

The function returns the full auth envelope ({ user_data, session_version, cached_at, cache_version }) but its name and docstring imply it returns user data. Callers correctly unwrap with .get("user_data", {}), but the mismatch is a readability hazard. Consider renaming to get_cached_jwt_auth_entry.

Low #6 — Stale cache entries not evicted on session_version mismatch (cache.py:844-850)

When cached_version != int(session_version), the stale entry is returned as None but not deleted. During the fail-open window (invalidation failed), every request for the new version falls back to DB for the full cache TTL.

if cached_version != int(session_version):
    logger.debug(...)
    return None   # ← should also delete cache_key, signature_key here

Fix this →

Low #7 — Redundant is_active check (dependencies.py:358)

if user and bool(user.is_active):   # user.is_active is always True here

_get_user_for_verified_jwt already returns None for inactive users (line 150), so the outer bool(user.is_active) guard is always True and misleads readers into thinking user might be inactive at this point.

Low #8 — deactivate() session version bump is undocumented (user.py:138)

deactivate() increments session_version, meaning a reactivated user must re-authenticate. This behavior change is not noted in the method docstring.

Summary

4 of the 12 flagged issues are resolved. The 8 remaining are all low-to-medium severity with the most impactful being the redundant DB query in create_jwt_token (medium) and the empty except flagged by the automated quality scanner (will block merge if gated on CodeQL/quality checks).

…logic

jfrench9 added 3 commits April 30, 2026 20:27

fix(tests): update mock_verify return value for graph access tests

5b1068c

github-advanced-security AI found potential problems May 1, 2026

View reviewed changes

Comment thread robosystems/models/core/user/user_api_key.py Fixed

github-code-quality Bot found potential problems May 1, 2026

View reviewed changes

Comment thread robosystems/middleware/auth/cache.py Fixed

jfrench9 added 2 commits April 30, 2026 21:23

refactor(auth): remove JWT cache invalidation from logout and session…

90dac6b

… refresh

fix(dependencies): update robosystems-client to version 0.3.18

87da979

refactor(auth): enhance JWT cache invalidation and user deactivation …

f373b7e

…logic

github-advanced-security AI found potential problems May 1, 2026

View reviewed changes

Comment thread robosystems/models/core/user/user_api_key.py Dismissed

jfrench9 merged commit feb4f1f into main May 1, 2026
11 checks passed

jfrench9 deleted the refactor/jwt-cache-invalidation-fix branch May 1, 2026 02:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor JWT verification and implement session versioning#647

Refactor JWT verification and implement session versioning#647
jfrench9 merged 6 commits into
mainfrom
refactor/jwt-cache-invalidation-fix

jfrench9 commented May 1, 2026

Uh oh!

jfrench9 commented May 1, 2026

Uh oh!

claude Bot commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

jfrench9 commented May 1, 2026

Uh oh!

claude Bot commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jfrench9 commented May 1, 2026

Summary

Key Changes

Session Versioning

JWT Verification Refactor

Auth Cache Invalidation Improvements

Auth Router Updates

Model Enhancements

Breaking Changes

Testing

Infrastructure Considerations

Uh oh!

jfrench9 commented May 1, 2026

Uh oh!

claude Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Refactor JWT verification and implement session versioning

Significant Find: Pre-existing API Key Cache Invalidation Bug — Fixed ✅

Medium Issues

Low / Nit

Positive Security Notes

Test Coverage

Uh oh!

Uh oh!

Uh oh!

jfrench9 commented May 1, 2026

Uh oh!

claude Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issue Status Review

✅ Resolved

❌ Still Open

Summary

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude Bot commented May 1, 2026 •

edited

Loading

claude Bot commented May 1, 2026 •

edited

Loading