fix(self-host): reduce hosted-only assumptions#409

Open
JustinMissmahl wants to merge 6 commits into databuddy-analytics:main from JustinMissmahl:self_host1

@JustinMissmahl

Databuddy’s self-hosting path still carries a few hosted-only assumptions, especially around runtime configuration, Docker Compose coverage, tracker URLs, and auth/email-related environment handling. This change set makes those areas more configurable and better aligned with a self-hosted deployment.

I originally found Databuddy through the TanStack showcase and wanted to try it on my own VPS. The hosted product experience worked well, but self-hosting exposed a number of rough edges that made the included self-host flow harder to get running end to end.

Summary of changes:

  • add a dedicated .env.selfhost.example
  • expand docker-compose.selfhost.yml to pass the env vars the app needs at runtime
  • add a dashboard.Dockerfile so the dashboard can be built and run as part of the self-host stack
  • parameterize tracker, app, API, basket, and auth-related URLs more consistently for self-host installs
  • expose self-host tracker asset routes from the dashboard app (/databuddy.js, /errors.js, /vitals.js)
  • centralize origin parsing/allowlist logic for tracker and CORS-related behavior
  • improve handling when optional provider/email env vars are missing
  • fix organization-scoped query/cache behavior in several dashboard hooks and flows
  • update README self-host troubleshooting guidance, especially around Postgres password/volume mismatches

The goal here is to make the existing self-host setup easier to run while keeping the default hosted behavior intact where possible.

@vercel

vercel bot commented Apr 11, 2026

Someone is attempting to deploy a commit to the Databuddy OSS Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Contributor

coderabbitai bot commented Apr 11, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3b5f5078-b94e-4686-acb1-7fe37352ec56

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@CLAassistant

CLAassistant commented Apr 11, 2026

CLA assistant check
All committers have signed the CLA.

Member

@izadoesdev izadoesdev left a comment

Welcome to Databuddy, @JustinMissmahl! 🎉 Thanks for this contribution — it's clear you put real effort into improving the self-hosting experience, and the PR description is excellent.

Summary: This PR makes Databuddy's self-hosting path significantly more viable by parameterizing hardcoded URLs (tracker, API, auth, basket), adding a dashboard.Dockerfile and init job to the compose stack, centralizing origin/CORS handling, gracefully handling missing optional services (Resend, Stripe metadata), fixing organization-scoped queries across dashboard hooks, and fixing a real bug in the Redis proxy.

What needs attention:

🔴 The CORS change in apps/api/src/routes/public/index.ts switches the public API from origin: true to a restricted allowlist. This would break any customer website calling the public API from non-databuddy.cc domains (feature flags, agent telemetry, etc.). The authenticated API change is fine, but the public route should remain open.

🟡 The self-hosted tracker script serving (/databuddy.js, /errors.js, /vitals.js) defaults to rejecting cross-origin requests when CLIENT_APP_ALLOWED_ORIGINS is empty, which is the default in .env.selfhost.example. Self-hosted users would hit 403s from customer sites until they discover this env var.

See inline comments for the full details. The auth changes, organization fallback logic, Resend/Stripe defensiveness, Docker setup, and origin utility module all look solid.

  cors({
    credentials: false,
-   origin: true,
+   origin: getAllowedCorsOrigins(),
Member

🔴 Breaking change for hosted Databuddy. The public API previously used origin: true, which allowed requests from any origin. This is intentional — customer websites (e.g., myapp.com) call these public endpoints for feature flags, agent telemetry, etc.

Switching to getAllowedCorsOrigins() restricts this to *.databuddy.cc plus explicitly configured origins. Any customer site not on a databuddy.cc subdomain would start getting CORS errors.

The main app CORS change (in apps/api/src/index.ts) is fine since that serves authenticated endpoints. But this public route needs to remain open — or at minimum, the self-host path should keep origin: true as the default when CLIENT_APP_ALLOWED_ORIGINS is not set.
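A minimal sketch of the fallback suggested above, assuming CLIENT_APP_ALLOWED_ORIGINS is a comma-separated list. The helper name `resolvePublicCorsOrigin` and the `CorsOrigin` type are hypothetical and not part of the PR; whether the CORS plugin accepts a string array in this exact shape is also an assumption.

```typescript
// Hypothetical helper, not from the PR: decides what to pass as the
// `origin` option of the public route's CORS configuration.
type CorsOrigin = true | string[];

function resolvePublicCorsOrigin(rawAllowedOrigins: string | undefined): CorsOrigin {
  // Assumption: the env var holds a comma-separated list of origins.
  const origins = (rawAllowedOrigins ?? "")
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);

  // No explicit allowlist configured: stay open, as the route was with `origin: true`.
  return origins.length === 0 ? true : origins;
}
```

With this shape, an unset or blank variable keeps the public route open, and only an explicit list restricts it.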


if (origin && !isAllowedTrackerAssetOrigin(origin, allowedOrigins)) {
return new NextResponse("Forbidden", {
status: 403,
Member

🟡 When CLIENT_APP_ALLOWED_ORIGINS is empty (which is the default in .env.selfhost.example), isAllowedTrackerAssetOrigin falls back to isDatabuddyOrigin() — meaning only *.databuddy.cc origins can load the script.

For self-hosted users, their customer websites (e.g., myapp.com) would get a 403 when trying to load /databuddy.js cross-origin, since the crossorigin="anonymous" attribute on the generated script tag makes the browser send a CORS-mode request with an Origin header.

Consider either:

  • Defaulting to allow-all when CLIENT_APP_ALLOWED_ORIGINS is unset (matching current CDN behavior)
  • Or at minimum, documenting in .env.selfhost.example that this must be set for tracker scripts to work cross-origin
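One way to make the default explicit is to pass the empty-allowlist policy as a parameter, so either choice is visible at the call site. This is only a sketch: `shouldServeTrackerAsset` and `EmptyAllowlistPolicy` are hypothetical names, and it leaves out the isDatabuddyOrigin() fallback that the PR's isAllowedTrackerAssetOrigin reportedly applies.

```typescript
// Hypothetical names, not from the PR.
type EmptyAllowlistPolicy = "allow-all" | "deny-cross-origin";

function shouldServeTrackerAsset(
  originHeader: string | null,
  allowedOrigins: readonly string[],
  emptyPolicy: EmptyAllowlistPolicy
): boolean {
  // Requests without an Origin header are not gated, matching the `origin &&` check in the route.
  if (!originHeader) {
    return true;
  }
  // Empty allowlist: the behavior is whatever the chosen policy says.
  if (allowedOrigins.length === 0) {
    return emptyPolicy === "allow-all";
  }
  // Otherwise require an exact match against the configured origins.
  return allowedOrigins.includes(originHeader);
}
```

A hosted-style default would pass "allow-all", while a private install could opt into "deny-cross-origin".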

Author

Yeah, the idea here was that if you self-host it, the intent is most likely private use, so denying by default for security was the thought process. What do you think? How would you like it to be handled? Then I can add a patch for it.
#409 (comment)
goes in the same direction

null;

return {
organizations,
Member

🟡 This pattern is repeated in ~10 components across the PR:

const { activeOrganization, activeOrganizationId } = useOrganizationsContext();
const organizationId = activeOrganization?.id ?? activeOrganizationId ?? undefined;

Since you're already modifying useOrganizationsContext() in this PR, consider adding a derived organizationId field directly to the context return value. Something like:

return {
  organizations,
  activeOrganization,
  activeOrganizationId,
  organizationId: activeOrganization?.id ?? activeOrganizationId ?? null,
  // ...
};

Would clean up all the call sites.

}, [
isLoadingOrgs,
isLoadingSession,
organizationsData,
Member

🔵 The .catch() resets the ref so it can retry, but there's no logging or user feedback if setActive fails. A transient network error could silently leave the user without an active org, causing queries to stay disabled (enabled: !!organizationId). Consider logging the error or adding a retry limit.
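A retry limit with logging could look roughly like the sketch below. `RetryBudget` is a hypothetical helper, not something in the PR; the provider would check `canAttempt()` before calling setActive and call `recordFailure()` from the existing .catch().

```typescript
// Hypothetical helper, not from the PR: caps how often a failing call is retried
// and reports every failure through an injected logger.
class RetryBudget {
  private failures = 0;
  private readonly maxFailures: number;
  private readonly log: (message: string, error: unknown) => void;

  constructor(maxFailures: number, log: (message: string, error: unknown) => void) {
    this.maxFailures = maxFailures;
    this.log = log;
  }

  // True while another attempt is still allowed.
  canAttempt(): boolean {
    return this.failures < this.maxFailures;
  }

  // Call from the .catch() handler: counts the failure and logs it.
  recordFailure(error: unknown): void {
    this.failures += 1;
    this.log(`setActive failed (${this.failures}/${this.maxFailures})`, error);
  }
}
```

Once the budget is spent, the provider could surface an error state instead of leaving queries silently disabled.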

import { expiredRoute } from "./routes/expired";
import { redirectRoute } from "./routes/redirect";

const LINKS_ROOT_REDIRECT_URL =
Member

🔵 The LINKS_ROOT_REDIRECT_URL variable isn't included in .env.selfhost.example, but it's referenced here. Worth adding it to the example env for completeness — self-hosted users would likely want this to point to their own dashboard URL.

Author

It's in .env.selfhost.example as the third-to-last entry.

});
set(_target, prop, value, receiver) {
const client = getRedisCache();
return Reflect.set(client, prop, value, receiver);
Member

🔵 Nice catch on the Redis proxy. The original code didn't bind this context, which would break when methods like get, set, etc. are called through the proxy since they lose their this reference to the actual Redis client. Good fix.
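The this-binding issue can be reproduced with a small stand-in. In the sketch below, `FakeClient`, `getClient`, and `lazyClient` are hypothetical and only mimic the lazy-proxy pattern described above; this is not the code in packages/redis/redis.ts.

```typescript
// Stand-in for a real Redis client. The #store field makes methods fail
// if they are invoked with the wrong `this` (for example the proxy itself).
class FakeClient {
  #store = new Map<string, string>();
  label = "primary";

  get(key: string): string | undefined {
    return this.#store.get(key);
  }

  set(key: string, value: string): void {
    this.#store.set(key, value);
  }
}

let instance: FakeClient | null = null;

// Lazily creates the client on first use.
function getClient(): FakeClient {
  if (instance === null) {
    instance = new FakeClient();
  }
  return instance;
}

const lazyClient = new Proxy({} as FakeClient, {
  get(_target, prop) {
    const client = getClient();
    const value = Reflect.get(client, prop, client);
    // Bind functions so `this` is the real client, not the proxy.
    return typeof value === "function" ? value.bind(client) : value;
  },
  set(_target, prop, value) {
    // No receiver argument, so the write lands on the real client.
    return Reflect.set(getClient(), prop, value);
  },
});
```

Without the `bind`, calling `lazyClient.get("k")` would run `get` with the proxy as `this`, and the private field access would throw.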

@greptile-apps
Contributor

greptile-apps bot commented Apr 11, 2026

Greptile Summary

This PR meaningfully improves the self-hosting experience: it adds a dashboard.Dockerfile, expands docker-compose.selfhost.yml to build from source and run a DB init job, centralizes CORS/origin logic, makes auth cookie and trusted-origin configuration env-driven, fixes the Redis proxy this-binding bug, and adds org-scoped query keys across several dashboard hooks.

  • Tracker asset routes block self-hosted cross-origin traffic by default. The new /databuddy.js etc. routes return 403 when CLIENT_APP_ALLOWED_ORIGINS is empty and the request carries an Origin header. Generated code snippets include crossorigin="anonymous", which causes browsers to send Origin on cross-origin script loads, meaning self-hosters tracking sites on a different domain will see their tracker blocked until they populate CLIENT_APP_ALLOWED_ORIGINS. The self-host example env has the variable but provides no guidance on what to set.
  • <Databuddy clientId={trackingClientId}> in layout.tsx lacks the same undefined guard added to FlagsProviderWrapper, so self-hosted installs without NEXT_PUBLIC_DATABUDDY_CLIENT_ID pass undefined to the SDK component.

Confidence Score: 4/5

Safe to merge after addressing the tracker route CORS default, which directly breaks the primary self-hosted use case this PR is intended to fix.

One P1 finding: the default-empty CLIENT_APP_ALLOWED_ORIGINS causes tracker asset routes to return 403 for cross-origin requests, conflicting with the crossorigin="anonymous" attribute on generated code snippets. Remaining findings are P2.

apps/dashboard/lib/tracker-script.ts and .env.selfhost.example need attention for the CLIENT_APP_ALLOWED_ORIGINS default/documentation gap.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/dashboard/lib/tracker-script.ts | New file serving tracker assets from the dashboard; blocks cross-origin requests when CLIENT_APP_ALLOWED_ORIGINS is empty, breaking default self-hosted cross-origin usage with crossorigin="anonymous" script tags |
| packages/shared/src/utils/origins.ts | New shared origin utility module with normalizeOrigin, parseOriginList, isOriginInList, and isAllowedTrackerAssetOrigin helpers; logic is clean and well-tested |
| apps/api/src/lib/cors-origins.ts | New file centralizing CORS origin logic for the API; includes NEXT_PUBLIC_API_URL as an allowed origin which is semantically unusual |
| packages/auth/src/auth.ts | Auth configuration made self-host friendly: configurable cookie domain, dynamic trusted origins from env, and secure-cookie detection; local normalizeOrigin returns raw value on parse failure unlike the shared version |
| dashboard.Dockerfile | New multi-stage Dockerfile for the dashboard; uses turbo prune correctly, bakes NEXT_PUBLIC_ vars at build time via ARG, and separately builds the tracker package |
| docker-compose.selfhost.yml | Substantially expanded to build from source, add dashboard service, db init job, and propagate new env vars; healthcheck switched to clickhouse-client for better auth coverage |
| apps/dashboard/components/providers/organizations-provider.tsx | Adds auto-sync of active organization when session has no activeOrganizationId; uses ref guard to prevent concurrent calls and resets on failure |
| apps/dashboard/app/layout.tsx | Tracker API URL and client ID are now env-driven; clientId may be undefined for self-hosted installs and is passed to Databuddy component without a guard |
| packages/redis/redis.ts | Proxy now correctly binds methods to the Redis client instance and adds a set trap, fixing a this-context bug in the original get-only proxy |
| .env.selfhost.example | New self-host env template covering all required variables; CLIENT_APP_ALLOWED_ORIGINS is present but empty with no documentation on when or what to set |

Sequence Diagram

sequenceDiagram
    participant Site as Tracked Website
    participant DB as Dashboard (Next.js)
    participant API as API (Elysia)
    participant Basket as Basket (ingestion)

    Note over Site,DB: crossorigin="anonymous" script load
    Site->>DB: GET /databuddy.js (Origin: https://mysite.com)
    alt CLIENT_APP_ALLOWED_ORIGINS includes mysite.com
        DB-->>Site: 200 + Access-Control-Allow-Origin
        Site->>Basket: POST /event (analytics)
        Basket->>Basket: Check CLIENT_APP_ALLOWED_ORIGINS bypass
        Basket-->>Site: 200
    else CLIENT_APP_ALLOWED_ORIGINS empty (default)
        DB-->>Site: 403 Forbidden
        Note over Site: Tracker fails to load
    end

    Note over Site,API: Dashboard auth flow
    Site->>API: Request (Origin: dashboard_url)
    API->>API: getAllowedCorsOrigins() checks AUTH_TRUSTED_ORIGINS + env URLs
    API-->>Site: Response + CORS headers

Reviews (1): Last reviewed commit: "self_host1" | Re-trigger Greptile

Comment on lines +38 to +44
const origin = request.headers.get("origin");
const allowedOrigins = getClientAppAllowedOrigins();

if (origin && !isAllowedTrackerAssetOrigin(origin, allowedOrigins)) {
return new NextResponse("Forbidden", {
status: 403,
headers: createCorsHeaders(request),
Contributor

P1 Self-hosted cross-origin tracker requests blocked by default

When CLIENT_APP_ALLOWED_ORIGINS is empty (the default in .env.selfhost.example), isAllowedTrackerAssetOrigin falls back to isDatabuddyOrigin, which returns false for any non-databuddy.cc hostname. For self-hosted deployments the tracked website is almost always on a different origin than the dashboard. The generated code snippets in code-generators.ts and step-install-tracking.tsx now include crossorigin="anonymous", which causes browsers to send an Origin header for cross-origin script loads — hitting this 403 gate and preventing the tracker from executing entirely.

Self-hosters need to set CLIENT_APP_ALLOWED_ORIGINS to their tracked domains for this to work, but the self-host example env provides no guidance on this. Consider either documenting the requirement clearly or loosening the fallback when the env var is absent (e.g., allow all origins when empty, mirroring the old origin: true behavior on the public API).

Comment on lines 136 to +137
  <Databuddy
-   apiUrl={
-     isLocalhost
-       ? "http://localhost:4000"
-       : "https://basket.databuddy.cc"
-   }
-   clientId={
-     isLocalhost
-       ? "5ced32e5-0219-4e75-a18a-ad9826f85698"
-       : "3ed1fce1-5a56-4cb6-a977-66864f6d18e3"
-   }
+   apiUrl={basketUrl}
Contributor

P2 clientId may be undefined with no guard

trackingClientId is process.env.NEXT_PUBLIC_DATABUDDY_CLIENT_ID with no fallback, so it is undefined for any self-hosted install that doesn't set that env var. The FlagsProviderWrapper was given an explicit if (!clientId) return <>{children}</> guard to handle this, but the <Databuddy> component here has no equivalent. Depending on the SDK's type contract for clientId, this may produce a TypeScript error or cause the SDK to emit noise at runtime.
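A guard equivalent to the one on FlagsProviderWrapper could be expressed as a small resolver that yields nothing when the client ID is missing. `resolveTrackerProps` and `TrackerProps` are hypothetical names used for illustration; the real component props may differ.

```typescript
// Hypothetical shape of the props passed to the tracking component.
interface TrackerProps {
  apiUrl: string;
  clientId: string;
}

// Returns null when no client ID is configured, so the caller can skip
// rendering the tracking component instead of passing undefined to it.
function resolveTrackerProps(apiUrl: string, clientId: string | undefined): TrackerProps | null {
  if (!clientId) {
    return null;
  }
  return { apiUrl, clientId };
}
```

In layout.tsx the caller would render the component only when the result is non-null.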

Comment on lines +95 to +103
const trimmed = value.trim().replace(TRAILING_SLASHES_REGEX, "");

try {
return new URL(trimmed).origin;
} catch {
return trimmed;
}
}

Contributor

P2 normalizeOrigin diverges from the shared implementation on parse failure

The shared packages/shared/src/utils/origins.ts version returns null when new URL(trimmed) throws, letting callers filter bad values out. This local copy returns trimmed (the raw string) instead. A malformed entry in AUTH_TRUSTED_ORIGINS — for example mysite.com without a scheme — would be passed to BetterAuth as-is. The inconsistency makes the behaviour hard to reason about; applying the same return null / filter pattern here would keep both code paths consistent.
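The return null / filter pattern might look like the sketch below. The regex definition and the `parseTrustedOrigins` helper are assumptions for illustration (the PR's actual TRAILING_SLASHES_REGEX and list parsing are not shown in this thread), as is the comma-separated format.

```typescript
// Assumed definition; the PR's actual regex is not shown in this thread.
const TRAILING_SLASHES_REGEX = /\/+$/;

// Returns the canonical origin, or null when the value is not a parseable URL.
function normalizeOrigin(value: string): string | null {
  const trimmed = value.trim().replace(TRAILING_SLASHES_REGEX, "");
  try {
    return new URL(trimmed).origin;
  } catch {
    return null;
  }
}

// Hypothetical helper: parses a comma-separated list and drops malformed entries.
function parseTrustedOrigins(raw: string): string[] {
  return raw
    .split(",")
    .map(normalizeOrigin)
    .filter((origin): origin is string => origin !== null);
}
```

An entry such as mysite.com (no scheme) fails URL parsing and is dropped rather than forwarded as a raw string.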

Comment on lines +10 to +13
process.env.BETTER_AUTH_URL,
process.env.NEXT_PUBLIC_APP_URL,
process.env.NEXT_PUBLIC_API_URL,
process.env.DASHBOARD_URL,
Contributor

P2 NEXT_PUBLIC_API_URL included as a CORS-allowed origin for the API itself

CORS allowed origins represent browsers/apps that may issue cross-origin requests to the API, not the API itself. Including NEXT_PUBLIC_API_URL (e.g. http://localhost:3001) means the API treats its own URL as a permitted front-end origin, which is a no-op in practice but adds unnecessary noise. Consider removing it from the defaults array.

@JustinMissmahl
Author

I have addressed all the AI-generated findings that I believe are valid and need to be handled directly. The rest either raise open questions—such as how you would like to handle an unset CLIENT_APP_ALLOWED_ORIGINS and its access control—or are, in my view, not valid.

@izadoesdev
Member

i'm gonna be honest here, i have no idea why my account is leaving AI generated reviews, I don't remember setting up any agents on my github to review code lmao, but I will take a look at this PR soon

@JustinMissmahl
Author

JustinMissmahl commented Apr 13, 2026

> i'm gonna be honest here, i have no idea why my account is leaving AI generated reviews, I don't remember setting up any agents on my github to review code lmao, but I will take a look at this PR soon

For me also, it's the first time hearing of a code review agent that uses an account to post. I needed to double-check if the comments were user-generated or AI-generated. It sounded like AI, but it was a user account.🤣

@izadoesdev
Member

yeah LMAO i literally didn't setup anything to use my account OR leave reviews, so i'm confused and trying to figure it out rn

@JustinMissmahl
Author

> yeah LMAO i literally didn't setup anything to use my account OR leave reviews, so i'm confused and trying to figure it out rn

Asking claude i got:

Here's an actionable checklist:

  1. Check GitHub Authorized Apps
    → github.com/settings/applications — look for any OAuth apps or GitHub Apps that have been granted access
  2. Check Personal Access Tokens
    → github.com/settings/tokens — look for any PATs with repo or pull_requests write scope, especially ones shared with local tools
  3. Check MCP Configs in IDE
    → Look in ~/.cursor/mcp.json or claude_desktop_config.json for a GitHub MCP server entry using their PAT
  4. Check the Repo's GitHub Actions Workflows
    → .github/workflows/ — look for any workflow that uses secrets.PERSONAL_ACCESS_TOKEN instead of secrets.GITHUB_TOKEN (the latter posts as github-actions[bot], the former posts as the user)
  5. Look at the Review Comment Formatting
    → AI-generated reviews from tools like Cursor/MCP tend to have very structured, bullet-pointed, comprehensive formatting — different from how a human would casually comment
  6. Check Cursor's Background Agent Settings
    → In newer Cursor versions, background agents can be configured to trigger on PRs automatically
