perf(sdk): cache check_beta_features userinfo response per credentials #599
RapidPoseidon wants to merge 1 commit into
Customers (e.g. Reve.art) construct many short-lived RapidataClient instances during request bursts (1500+ per minute from a single process). Each __init__ called _check_beta_features, which fired a GET /connect/userinfo with a 1s timeout; under burst load identity-service slowed enough for the calls to time out and surface as SDK failures.

The userinfo response is stable for the access token's lifetime (~1 day per OIDC config), so cache it process-wide, keyed by (environment, client_id), with a 24h TTL. The cache is guarded by a threading.Lock for concurrent client construction. reset_credentials clears the matching entry and clear_all_caches wipes the whole cache.

Co-Authored-By: luca <luca@rapidata.ai>
Code Review
Overall: Solid, well-motivated fix for a real production pain point. The structure is clean, the locking is correct, and the extracted [...]

Bugs / Correctness
1. When a user authenticates via browser flow or token (no [...] After a credential reset, the next [...] Mitigation: store the resolved [...]
2. No cache hit for the [...] The cache lookup block is guarded by [...]

Design / Thread Safety
3. Lock released between cache miss and HTTP call (double-fetch)
    # lock released here
    if cached_result is not None:
        ...
        return
    result = ...  # HTTP call happens without the lock
Two threads with the same [...]
4. [...]

Minor
5. Hardcoded 24 h TTL
6. Mutable dict stored in cache
    _userinfo_cache[effective_key] = _UserInfoCacheEntry(
        result=result.copy(),
        expires_at=time.monotonic() + _USERINFO_CACHE_TTL_SECONDS,
    )

What's good
Recommend: Address issue #1 (stale cache after [...]
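The double-fetch raised in item 3 of the review could be avoided with a per-key lock and a second cache check, sketched below. This is one possible approach under stated assumptions, not something the PR or the reviewer prescribes: `get_once`, `_cache`, `_cache_lock`, `_key_locks`, and the `fetch` callback are all hypothetical names, and the TTL is left out to keep the example short. Returning copies also keeps callers from mutating the cached dict, which relates to item 6.

```python
import threading
from typing import Any, Callable, Hashable

_cache: dict[Hashable, dict[str, Any]] = {}
_key_locks: dict[Hashable, threading.Lock] = {}
_cache_lock = threading.Lock()  # guards both _cache and _key_locks


def get_once(key: Hashable, fetch: Callable[[], dict[str, Any]]) -> dict[str, Any]:
    """Return the value for key, letting at most one thread per key run fetch()."""
    with _cache_lock:
        cached = _cache.get(key)
        if cached is not None:
            return cached.copy()
        # One lock per key, so unrelated credentials never wait on each other.
        key_lock = _key_locks.setdefault(key, threading.Lock())

    with key_lock:
        # Re-check: another thread may have filled the cache while we waited.
        with _cache_lock:
            cached = _cache.get(key)
        if cached is not None:
            return cached.copy()
        result = fetch()  # stands in for the HTTP call; runs once per key
        with _cache_lock:
            _cache[key] = result.copy()
        return result
```

If `fetch` raises, nothing is cached and the next waiting thread retries. Whether the extra lock is worth it depends on how often concurrent misses for the same key actually occur; with a 24h TTL the duplicate calls are bounded either way.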
Summary
Cache the /connect/userinfo response that RapidataClient.__init__ fires from _check_beta_features, so subsequent client constructions in the same process reuse the result instead of calling identity-service again.
Why
A customer creates many short-lived RapidataClient instances inside one process. Each __init__ calls _check_beta_features, which fires GET https://auth.{env}/connect/userinfo with _request_timeout=1. During traffic bursts observed on 2026-05-22 the same process was generating 1500+ userinfo calls per minute; identity-service slowed under that load and the 1s timeout started failing, surfacing as SDK errors.

Userinfo is stable for the access token's lifetime (~1 day per OIDC config), so the call is redundant across short-lived clients sharing the same credentials.
What changes
- rapidata_client.py: dict[(environment, client_id), _UserInfoCacheEntry] with 24h TTL.
- threading.Lock protects the cache for concurrent client construction.
- _check_beta_features checks the cache before making the HTTP call; on miss, calls userinfo, stores the response, then applies the user-info / Admin-role side effects.
- reset_credentials evicts the entry for the current (environment, client_id).
- clear_all_caches wipes the whole cache.
- _request_timeout=1 is left alone: the cache, not a higher timeout, is what fixes the burst symptom.
Test plan
- uv run pyright src/rapidata/rapidata_client → 0 errors
- uv run python3 -c "from rapidata import RapidataClient; print('OK')" → OK

🔗 Session: https://session-a1ce5546.poseidon.rapidata.internal/