feat: add configurable token cache for customer flow#133
Conversation
prashantrakheja
left a comment
There was a problem hiding this comment.
changes look good, some minor feedback, in particular i would like to remove usage of app_tid from token caching because its an optional parameter and likely to be removed going forward..
| Returns: | ||
| System-scoped access token. | ||
| """ | ||
| cached = cache.get_system_token(app_tid) |
There was a problem hiding this comment.
i am not sure if app_tid would be needed for fetching the tokens, is there any other way to do this?
| logger.debug("Loaded %d tool(s) from %s", len(server_tools), dep.ord_id) | ||
| except Exception: | ||
| except Exception as exc: | ||
| unwrapped = _unwrap_exception_group(exc) |
There was a problem hiding this comment.
do u feel the exception block is unusually big? can we put this stuff inside a method?
There was a problem hiding this comment.
Updated logic here - request system_token when needed. Added helper closure to refetch token
| ) | ||
| continue | ||
| except Exception: | ||
| logger.exception( |
There was a problem hiding this comment.
i feel we should not swallow the exception here, if there is a failure we should fail-fast, what's ur take?
There was a problem hiding this comment.
I'll extract logic to a method, good catch.
Regarding exception - it basically follows what was implemented before - log with logger.exception and continue
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
this operation is "internal" to the functioning of the sdk, do u feel it makes sense to capture metrics for this?
There was a problem hiding this comment.
I didn't see value exposing this method
There was a problem hiding this comment.
Removed. My assumption was that consumers of SDK should be able to set/unset/cleanup cache should they require it. I had an impression that it's usual for SDKs :) removed as considered unwanted
| self._config = config or ClientConfig() | ||
| self._token_cache = _TokenCache(self._config) | ||
|
|
||
| @record_metrics(Module.AGENTGATEWAY, Operation.AGENTGATEWAY_CLEAR_TOKEN_CACHE) |
There was a problem hiding this comment.
I didn't see value exposing this method
| credentials: CustomerCredentials, | ||
| grant_type: str, | ||
| timeout: float, | ||
| config: ClientConfig, |
There was a problem hiding this comment.
Isn't timeout part of client config?
| @@ -25,6 +25,8 @@ | |||
| IntegrationDependency, | |||
There was a problem hiding this comment.
Shouldn't we also add token cache for LoB flow to keep consistency?
Description
Add in-process token cache for customer agent flow. Previously every
list_mcp_tools()/call_mcp_tool()call fetched fresh IAS token via mTLS - unnecessary latency in agentic loops.Changes:
_token_cache.py(new):_TokenCache- TTL + LRU eviction for system tokens (key:client_id) and user tokens (key:sha256(user_jwt+"|"+client_id)[:16]). Expiry is fromexpires_in,id_tokenexp claim, or fallback TTL._customer.py:get_system_token_mtls/exchange_user_tokento consult/populate cache. 401 response from MCP server → invalidate stale token + retry once.agw_client.py:AgentGatewayClientowns_TokenCacheconfig.py: 4 newClientConfigfields added -token_expiry_buffer_seconds(60 s),max_system_token_cache_size(10 s),max_user_token_cache_size(10 s),fallback_token_ttl_seconds(300 s).LoB flow unaffected - delegates to BTP Destination Service.
Type of Change
Please check the relevant option:
How to Test
Describe how reviewers can test your changes:
Checklist
Before submitting your PR, please review and check the following:
Breaking Changes
None. Cache internal to
AgentGatewayClient- existingcreate_client()calls get caching automatically.ClientConfignew fields all have defaults.Additional Notes
Thread safety: GIL makes individual
OrderedDictops atomic, but check-then-set is not. Concurrent coroutines on same key may both miss and both fetch - redundant requests, not corruption. Acceptable for agentic loop use case.401 retry: Both
get_mcp_tools_customerandcall_mcp_tool_customerinvalidate + retry once on 401, handling server-side revocation before token expiry.