fix(self-host): reduce hosted-only assumptions#409
fix(self-host): reduce hosted-only assumptions#409JustinMissmahl wants to merge 6 commits intodatabuddy-analytics:mainfrom
Conversation
|
Someone is attempting to deploy a commit to the Databuddy OSS Team on Vercel. A member of the Team first needs to authorize it. |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
izadoesdev
left a comment
There was a problem hiding this comment.
Welcome to Databuddy, @JustinMissmahl! 🎉 Thanks for this contribution — it's clear you put real effort into improving the self-hosting experience, and the PR description is excellent.
Summary: This PR makes Databuddy's self-hosting path significantly more viable by parameterizing hardcoded URLs (tracker, API, auth, basket), adding a dashboard.Dockerfile and init job to the compose stack, centralizing origin/CORS handling, gracefully handling missing optional services (Resend, Stripe metadata), fixing organization-scoped queries across dashboard hooks, and fixing a real bug in the Redis proxy.
What needs attention:
🔴 The CORS change in apps/api/src/routes/public/index.ts switches the public API from origin: true to a restricted allowlist. This would break any customer website calling the public API from non-databuddy.cc domains (feature flags, agent telemetry, etc.). The authenticated API change is fine, but the public route should remain open.
🟡 The self-hosted tracker script serving (/databuddy.js, /errors.js, /vitals.js) defaults to rejecting cross-origin requests when CLIENT_APP_ALLOWED_ORIGINS is empty, which is the default in .env.selfhost.example. Self-hosted users would hit 403s from customer sites until they discover this env var.
See inline comments for the full details. The auth changes, organization fallback logic, Resend/Stripe defensiveness, Docker setup, and origin utility module all look solid.
apps/api/src/routes/public/index.ts
Outdated
| cors({ | ||
| credentials: false, | ||
| origin: true, | ||
| origin: getAllowedCorsOrigins(), |
There was a problem hiding this comment.
🔴 Breaking change for hosted Databuddy. The public API previously used origin: true, which allowed requests from any origin. This is intentional — customer websites (e.g., myapp.com) call these public endpoints for feature flags, agent telemetry, etc.
Switching to getAllowedCorsOrigins() restricts this to *.databuddy.cc plus explicitly configured origins. Any customer site not on a databuddy.cc subdomain would start getting CORS errors.
The main app CORS change (in apps/api/src/index.ts) is fine since that serves authenticated endpoints. But this public route needs to remain open — or at minimum, the self-host path should keep origin: true as the default when CLIENT_APP_ALLOWED_ORIGINS is not set.
|
|
||
| if (origin && !isAllowedTrackerAssetOrigin(origin, allowedOrigins)) { | ||
| return new NextResponse("Forbidden", { | ||
| status: 403, |
There was a problem hiding this comment.
🟡 When CLIENT_APP_ALLOWED_ORIGINS is empty (which is the default in .env.selfhost.example), isAllowedTrackerAssetOrigin falls back to isDatabuddyOrigin() — meaning only *.databuddy.cc origins can load the script.
For self-hosted users, their customer websites (e.g., myapp.com) would get a 403 when trying to load /databuddy.js cross-origin, since the crossorigin="anonymous" attribute on the generated script tag triggers CORS preflight.
Consider either:
- Defaulting to allow-all when
CLIENT_APP_ALLOWED_ORIGINSis unset (matching current CDN behavior) - Or at minimum, documenting in
.env.selfhost.examplethat this must be set for tracker scripts to work cross-origin
There was a problem hiding this comment.
Yeah, the idea here was that if you self-host it, the intent is most likely private use, so denying by default for security was the thought process. What do you think? How would you like it to be handled? Then I can add a patch for it.
#409 (comment)
goes in the same direction
| null; | ||
|
|
||
| return { | ||
| organizations, |
There was a problem hiding this comment.
🟡 This pattern is repeated in ~10 components across the PR:
const { activeOrganization, activeOrganizationId } = useOrganizationsContext();
const organizationId = activeOrganization?.id ?? activeOrganizationId ?? undefined;Since you're already modifying useOrganizationsContext() in this PR, consider adding a derived organizationId field directly to the context return value. Something like:
return {
organizations,
activeOrganization,
activeOrganizationId,
organizationId: activeOrganization?.id ?? activeOrganizationId ?? null,
// ...
};Would clean up all the call sites.
| }, [ | ||
| isLoadingOrgs, | ||
| isLoadingSession, | ||
| organizationsData, |
There was a problem hiding this comment.
🔵 The .catch() resets the ref so it can retry, but there's no logging or user feedback if setActive fails. A transient network error could silently leave the user without an active org, causing queries to stay disabled (enabled: !!organizationId). Consider logging the error or adding a retry limit.
| import { expiredRoute } from "./routes/expired"; | ||
| import { redirectRoute } from "./routes/redirect"; | ||
|
|
||
| const LINKS_ROOT_REDIRECT_URL = |
There was a problem hiding this comment.
🔵 The LINKS_ROOT_REDIRECT_URL variable isn't included in .env.selfhost.example, but it's referenced here. Worth adding it to the example env for completeness — self-hosted users would likely want this to point to their own dashboard URL.
There was a problem hiding this comment.
its in the .env.selfhost.example as third to last entry
| }); | ||
| set(_target, prop, value, receiver) { | ||
| const client = getRedisCache(); | ||
| return Reflect.set(client, prop, value, receiver); |
There was a problem hiding this comment.
🔵 Nice catch on the Redis proxy. The original code didn't bind this context, which would break when methods like get, set, etc. are called through the proxy since they lose their this reference to the actual Redis client. Good fix.
Greptile SummaryThis PR meaningfully improves the self-hosting experience: it adds a
Confidence Score: 4/5Safe to merge after addressing the tracker route CORS default, which directly breaks the primary self-hosted use case this PR is intended to fix. One P1 finding: the default-empty CLIENT_APP_ALLOWED_ORIGINS causes tracker asset routes to return 403 for cross-origin requests, conflicting with the crossorigin="anonymous" attribute on generated code snippets. Remaining findings are P2. apps/dashboard/lib/tracker-script.ts and .env.selfhost.example need attention for the CLIENT_APP_ALLOWED_ORIGINS default/documentation gap. Important Files Changed
Sequence DiagramsequenceDiagram
participant Site as Tracked Website
participant DB as Dashboard (Next.js)
participant API as API (Elysia)
participant Basket as Basket (ingestion)
Note over Site,DB: crossorigin="anonymous" script load
Site->>DB: GET /databuddy.js (Origin: https://mysite.com)
alt CLIENT_APP_ALLOWED_ORIGINS includes mysite.com
DB-->>Site: 200 + Access-Control-Allow-Origin
Site->>Basket: POST /event (analytics)
Basket->>Basket: Check CLIENT_APP_ALLOWED_ORIGINS bypass
Basket-->>Site: 200
else CLIENT_APP_ALLOWED_ORIGINS empty (default)
DB-->>Site: 403 Forbidden
Note over Site: Tracker fails to load
end
Note over Site,API: Dashboard auth flow
Site->>API: Request (Origin: dashboard_url)
API->>API: getAllowedCorsOrigins() checks AUTH_TRUSTED_ORIGINS + env URLs
API-->>Site: Response + CORS headers
Reviews (1): Last reviewed commit: "self_host1" | Re-trigger Greptile |
| const origin = request.headers.get("origin"); | ||
| const allowedOrigins = getClientAppAllowedOrigins(); | ||
|
|
||
| if (origin && !isAllowedTrackerAssetOrigin(origin, allowedOrigins)) { | ||
| return new NextResponse("Forbidden", { | ||
| status: 403, | ||
| headers: createCorsHeaders(request), |
There was a problem hiding this comment.
Self-hosted cross-origin tracker requests blocked by default
When CLIENT_APP_ALLOWED_ORIGINS is empty (the default in .env.selfhost.example), isAllowedTrackerAssetOrigin falls back to isDatabuddyOrigin, which returns false for any non-databuddy.cc hostname. For self-hosted deployments the tracked website is almost always on a different origin than the dashboard. The generated code snippets in code-generators.ts and step-install-tracking.tsx now include crossorigin="anonymous", which causes browsers to send an Origin header for cross-origin script loads — hitting this 403 gate and preventing the tracker from executing entirely.
Self-hosters need to set CLIENT_APP_ALLOWED_ORIGINS to their tracked domains for this to work, but the self-host example env provides no guidance on this. Consider either documenting the requirement clearly or loosening the fallback when the env var is absent (e.g., allow all origins when empty, mirroring the old origin: true behavior on the public API).
| <Databuddy | ||
| apiUrl={ | ||
| isLocalhost | ||
| ? "http://localhost:4000" | ||
| : "https://basket.databuddy.cc" | ||
| } | ||
| clientId={ | ||
| isLocalhost | ||
| ? "5ced32e5-0219-4e75-a18a-ad9826f85698" | ||
| : "3ed1fce1-5a56-4cb6-a977-66864f6d18e3" | ||
| } | ||
| apiUrl={basketUrl} |
There was a problem hiding this comment.
clientId may be undefined with no guard
trackingClientId is process.env.NEXT_PUBLIC_DATABUDDY_CLIENT_ID with no fallback, so it is undefined for any self-hosted install that doesn't set that env var. The FlagsProviderWrapper was given an explicit if (!clientId) return <>{children}</> guard to handle this, but the <Databuddy> component here has no equivalent. Depending on the SDK's type contract for clientId, this may produce a TypeScript error or cause the SDK to emit noise at runtime.
| const trimmed = value.trim().replace(TRAILING_SLASHES_REGEX, ""); | ||
|
|
||
| try { | ||
| return new URL(trimmed).origin; | ||
| } catch { | ||
| return trimmed; | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
normalizeOrigin diverges from the shared implementation on parse failure
The shared packages/shared/src/utils/origins.ts version returns null when new URL(trimmed) throws, letting callers filter bad values out. This local copy returns trimmed (the raw string) instead. A malformed entry in AUTH_TRUSTED_ORIGINS — for example mysite.com without a scheme — would be passed to BetterAuth as-is. The inconsistency makes the behaviour hard to reason about; applying the same return null / filter pattern here would keep both code paths consistent.
| process.env.BETTER_AUTH_URL, | ||
| process.env.NEXT_PUBLIC_APP_URL, | ||
| process.env.NEXT_PUBLIC_API_URL, | ||
| process.env.DASHBOARD_URL, |
There was a problem hiding this comment.
NEXT_PUBLIC_API_URL included as a CORS-allowed origin for the API itself
CORS allowed origins represent browsers/apps that may issue cross-origin requests to the API, not the API itself. Including NEXT_PUBLIC_API_URL (e.g. http://localhost:3001) means the API treats its own URL as a permitted front-end origin, which is a no-op in practice but adds unnecessary noise. Consider removing it from the defaults array.
|
I have addressed all the AI-generated findings that I believe are valid and need to be handled directly. The rest either raise open questions—such as how you would like to handle an unset CLIENT_APP_ALLOWED_ORIGINS and its access control—or are, in my view, not valid. |
|
i'm gonna be honest here, i have no idea why my account is leaving AI generated reviews, I don't remember setting up any agents on my github to review code lmao, but I will take a look at this PR soon |
For me also, it's the first time hearing of a code review agent that uses an account to post. I needed to double-check if the comments were user-generated or AI-generated. It sounded like AI, but it was a user account.🤣 |
yeah LMAO i literally didn't setup anything to use my account OR leave reviews, so i'm confused and trying to figure it out rn |
Asking claude i got: Here's an actionable checklist:
|
Databuddy’s self-hosting path still carries a few hosted-only assumptions, especially around runtime configuration, Docker Compose coverage, tracker URLs, and auth/email-related environment handling. This change set makes those areas more configurable and better aligned with a self-hosted deployment.
I originally found Databuddy through the TanStack showcase and wanted to try it on my own VPS. The hosted product experience worked well, but self-hosting exposed a number of rough edges that made the included self-host flow harder to get running end to end.
Summary of changes:
.env.selfhost.exampledocker-compose.selfhost.ymlto pass the env vars the app needs at runtimedashboard.Dockerfileso the dashboard can be built and run as part of the self-host stack/databuddy.js,/errors.js,/vitals.js)The goal here is to make the existing self-host setup easier to run while keeping the default hosted behavior intact where possible.