perf(sdk): cache check_beta_features userinfo response per credentials by RapidPoseidon · Pull Request #599 · RapidataAI/rapidata-python-sdk

RapidPoseidon · 2026-05-22T12:19:11Z

Summary

Cache the /connect/userinfo response that RapidataClient.__init__ fires from _check_beta_features, so subsequent client constructions in the same process reuse the result instead of calling identity-service again.

Why

A customer creates many short-lived RapidataClient instances inside one process. Each __init__ calls _check_beta_features, which fires GET https://auth.{env}/connect/userinfo with _request_timeout=1. During traffic bursts observed on 2026-05-22 the same process was generating 1500+ userinfo calls per minute; identity-service slowed under that load and the 1s timeout started failing, surfacing as SDK errors.

Userinfo is stable for the access token's lifetime (~1 day per OIDC config), so the call is redundant across short-lived clients sharing the same credentials.

What changes

Module-level cache in rapidata_client.py: dict[(environment, client_id), _UserInfoCacheEntry] with 24h TTL.
threading.Lock protects the cache for concurrent client construction.
_check_beta_features checks the cache before making the HTTP call; on miss, calls userinfo, stores the response, then applies the user-info / Admin-role side effects.
reset_credentials evicts the entry for the current (environment, client_id).
clear_all_caches wipes the whole cache.
No public API change; no docs/examples updated.
The _request_timeout=1 is left alone — the cache, not a higher timeout, is what fixes the burst symptom.

Test plan

uv run pyright src/rapidata/rapidata_client → 0 errors
uv run python3 -c "from rapidata import RapidataClient; print('OK')" → OK

🔗 Session: https://session-a1ce5546.poseidon.rapidata.internal/

Customers (e.g. Reve.art) construct many short-lived RapidataClient instances during request bursts (~1500+ per minute from a single process). Each __init__ called _check_beta_features which fired a GET /connect/userinfo with a 1s timeout; under burst load identity-service slowed enough for the calls to time out and surface as SDK failures. The userinfo response is stable for the access token's lifetime (~1 day per OIDC config), so cache it process-wide keyed by (environment, client_id) with a 24h TTL. The cache is guarded by a threading.Lock for concurrent client construction. reset_credentials clears the matching entry and clear_all_caches wipes the whole cache. Co-Authored-By: luca <luca@rapidata.ai>

claude · 2026-05-22T12:21:16Z

Code Review

Overall: Solid, well-motivated fix for a real production pain point. The structure is clean, the locking is correct, and the extracted _apply_userinfo helper is an improvement. A few issues worth addressing before merge.

Bugs / Correctness

1. reset_credentials doesn't evict the cache when client_id=None

When a user authenticates via browser flow or token (no client_id passed), self._client_id is None at construction. The cache entry is then stored under (self._environment, sub) — the sub from the API response. But reset_credentials only evicts when self._client_id is not None, so the stale entry is never removed.

After a credential reset, the next RapidataClient() call (with no client_id) will find None cache key, skip the cache lookup, make the HTTP call, and write a fresh entry — but any concurrent instance could still read the stale one in the window between them. More importantly, calling reset_credentials on a no-client_id client is a no-op for the cache.

Mitigation: store the resolved sub on self after a successful userinfo fetch (e.g. self._userinfo_sub), and use it in reset_credentials as a fallback key.

2. No cache hit for the client_id=None path on first call — the core use-case may not be fully covered

The cache lookup block is guarded by if cache_key is not None. For clients constructed without an explicit client_id, the first N concurrent calls all reach the HTTP endpoint (classic thundering-herd on startup). The fix lands writes but misses reads until after the first response. If the customer in question uses env-var credentials (RAPIDATA_CLIENT_ID), this is a non-issue (env vars are resolved to client_id before self._client_id is set). Worth confirming that is the actual path.

Design / Thread Safety

3. Lock released between cache miss and HTTP call (double-fetch)

# lock released here
if cached_result is not None:
    ...
    return

result = ...  # HTTP call happens without the lock

Two threads with the same cache_key can both read a miss, both call identity-service, and both write to cache. This is a benign double-fetch (not a correctness bug), but it could produce a small burst on a cold start. For the reported 1500 req/min pattern this is likely negligible. Just document or add a TODO if you want to close it later.

4. clear_all_caches clears entries for every client in the process

_userinfo_cache is module-level. Calling clear_all_caches() on one RapidataClient instance invalidates cached entries for all other instances in the same process (including those with different credentials). The docstring doesn't mention this. Worth at minimum adding a note, or scoping the clear to (self._environment, self._client_id) unless a full wipe is intentional.

Minor

5. Hardcoded 24 h TTL

_USERINFO_CACHE_TTL_SECONDS is correct for the stated OIDC config, but baking it in makes it hard to tune if the token lifetime changes. A class-level constant or env-var override would help. Low priority — mentioning for visibility.

6. Mutable dict stored in cache

_UserInfoCacheEntry.result holds the exact dict returned by json.loads. If _apply_userinfo ever gains a mutation (even accidentally), all cache readers share the side effect. A defensive result.copy() at write time costs nothing and removes the risk:

_userinfo_cache[effective_key] = _UserInfoCacheEntry(
    result=result.copy(),
    expires_at=time.monotonic() + _USERINFO_CACHE_TTL_SECONDS,
)

What's good

from __future__ import annotations added correctly, consistent with project conventions.
time.monotonic() — right call for TTL; immune to clock adjustments.
threading.Lock placement is correct; no lock held during the HTTP call (avoids blocking other threads).
_apply_userinfo extraction makes the logic readable and avoids duplication between cache-hit and cache-miss paths.
reset_credentials integration is a good touch — prevents stale admin-flag after a credential swap (for the client_id case).

Recommend: Address issue #1 (stale cache after reset_credentials on no-client_id clients) before merging. The rest are low-risk and can be follow-ups.

LucStr requested a review from LinoGiger May 22, 2026 12:38

LucStr assigned LucStr and LinoGiger and unassigned LucStr May 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(sdk): cache check_beta_features userinfo response per credentials#599

perf(sdk): cache check_beta_features userinfo response per credentials#599
RapidPoseidon wants to merge 1 commit into
mainfrom
perf(sdk)/cache-userinfo-check-beta-features

RapidPoseidon commented May 22, 2026 •

edited by LucStr

Loading

Uh oh!

claude Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

RapidPoseidon commented May 22, 2026 • edited by LucStr Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

What changes

Test plan

Uh oh!

claude Bot commented May 22, 2026

Code Review

Bugs / Correctness

Design / Thread Safety

Minor

What's good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

RapidPoseidon commented May 22, 2026 •

edited by LucStr

Loading