6 changes: 6 additions & 0 deletions .env.example
@@ -50,3 +50,9 @@ GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=

# Analytics (WS4 behavioral-intelligence funnel events → New Relic custom events)
# "noop" (default) = inert; drops every event, zero deps, never errors. Set to
# "newrelic" to emit InstantFunnel custom events via the existing New Relic app
# (requires NEW_RELIC_LICENSE_KEY). Leaving it unset/noop is the safe default.
ANALYTICS_BACKEND=noop
85 changes: 85 additions & 0 deletions docs/OBSERVABILITY-FUNNEL-EVENTS.md
@@ -0,0 +1,85 @@
# Observability — WS4 Behavioral-Intelligence Funnel Events

This note documents the New Relic custom event the api emits at each conversion
funnel step, and the Prometheus counter that counts the events the bridge drops.
It satisfies CLAUDE.md rule 25 (every observability signal ships with its
documentation). The NR **alert + dashboard tile** for these signals live in the
separate `infra` repo (`infra/newrelic/alerts/`, `infra/newrelic/dashboards/`,
`infra/observability/METRICS-CATALOG.md`) — that repo has no auto-apply, so the
operator wires the tiles/alerts there; this note is the source-of-truth for the
event/attribute contract the dashboards FACET on.

## Why this exists

`instant_conversion_funnel_total{step}` (Prometheus) is an **aggregate count**:
it answers "how many provisions today" but cannot be keyed on a stable entity
(team / anonymous fingerprint bucket / cohort), so it cannot compute the
per-entity / cohorted funnel KPIs the WS4 plan needs:

- anonymous → claimed (target **> 2%**)
- claimed → paid (target **> 20%**)

The `InstantFunnel` New Relic custom event is the **per-entity** companion. Both
are emitted at every funnel point — the Prometheus counter is **not** removed.
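The two KPI targets above are ratios of distinct entities between adjacent steps. A minimal sketch of that arithmetic, assuming the per-step distinct counts have already been read out of `InstantFunnel` (`stepCounts`, `conversionRate`, and `meetsTargets` are hypothetical helpers, not part of the api):

```go
package main

// stepCounts holds distinct-entity counts per funnel step, e.g. read from a
// uniqueCount(...) NRQL query. Hypothetical type for illustration only.
type stepCounts struct {
	Anonymous int // distinct anonymous fingerprint buckets
	Claimed   int // distinct teams created by a claim
	Paid      int // distinct teams that reached paid
}

// conversionRate returns to/from as a percentage, or 0 when from is 0 so an
// empty window never divides by zero.
func conversionRate(from, to int) float64 {
	if from == 0 {
		return 0
	}
	return 100 * float64(to) / float64(from)
}

// meetsTargets checks both WS4 KPIs with the strict inequalities the targets
// use: anonymous→claimed > 2% and claimed→paid > 20%.
func meetsTargets(c stepCounts) (claimOK, paidOK bool) {
	return conversionRate(c.Anonymous, c.Claimed) > 2,
		conversionRate(c.Claimed, c.Paid) > 20
}
```

For example, 1000 anonymous buckets and 25 claims give 2.5% (target met), while 4 paid teams out of those 25 give 16% (target missed).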

## `InstantFunnel` custom event

Emitted via `common/analyticsevent` (factory-wrapped: fail-open + PII-sanitized).
The api wires the emitter once at boot (`router.wireAnalyticsEmitter`) from
`ANALYTICS_BACKEND` (default `noop` — **inert** until New Relic is configured;
the noop default is the flag protection, no separate feature flag needed). The
`newrelic` backend reuses the api's existing `*newrelic.Application`.

| Attribute | Present on | Values / notes |
|----------------|-----------------|----------------|
| `funnelStep` | all steps | `landing` \| `provision` \| `claim` \| `paid` |
| `serviceName` | all steps | `api` (FACET to attribute a step to the emitting service) |
| `tier` | most steps | `anonymous`/`free`/`hobby`/`pro`/… (omitted at `landing`) |
| `env` | `provision` | `development`/`production`/… (resolved env of the provision) |
| `fingerprint` | anonymous steps | **already-hashed** SHA256(/24+ASN) bucket, never a raw IP |
| `teamId` | `claim`, `paid` | team UUID (opaque id, not PII) |

PII policy: the attribute map passes through `analyticsevent.Sanitize` (explicit
allowlist + email-hashing) before any backend sees it, so no raw email / token /
connection string can leak even if a future emit site passes one.

### Emit sites (api)

| Step | File:func | Trigger |
|-------------|-----------|---------|
| `landing` | `onboarding.go` `StartLanding` | GET `/start` (top of funnel) |
| `provision` | `db.go`/`cache.go`/`nosql.go`/`vector.go`/`queue.go`/`storage.go`/`webhook.go` `New*` (anonymous path) | anonymous resource provisioned |
| `claim` | `onboarding.go` `Claim` | anonymous → claimed (account created) |
| `paid` | `billing.go` `handleSubscriptionCharged` | claimed → paid (subscription active) |

### NRQL starters

```sql
-- anon->claimed denominator: distinct fingerprint buckets reaching landing
-- (excludes synthetic prober traffic)
SELECT uniqueCount(fingerprint) FROM InstantFunnel
WHERE funnelStep = 'landing' AND cohort != 'synthetic' SINCE 1 day ago

-- claimed->paid numerator: distinct teams reaching paid, by tier
SELECT uniqueCount(teamId) FROM InstantFunnel
WHERE funnelStep = 'paid' AND cohort != 'synthetic' FACET tier SINCE 7 days ago
```

> **Exclude `cohort = 'synthetic'` from all funnel analysis.** Synthetic
> flow-test traffic (`InstantFlowTest`, emitted by the worker's prober) carries
> `cohort='synthetic'`; the real-traffic funnel `InstantFunnel` events carry no
> cohort attribute, so `WHERE cohort != 'synthetic'` keeps the two separated.

## `instant_analytics_emit_failed_total{reason}` (Prometheus)

Counts behavioral-intelligence custom events **dropped** before reaching the
analytics sink, by `reason`.

- `reason="nil_app"` — the New Relic sink had no `*newrelic.Application` (NR not
configured). This is the **expected steady state** until
`ANALYTICS_BACKEND=newrelic` + a license key are wired, so a flat non-zero
value in that configuration is benign.
- A **sudden climb after** NR is configured means the bridge is dropping real
funnel events — that is the alertable condition (suggested: P2 observability,
warn on `rate(...[10m]) > 0` once `ANALYTICS_BACKEND=newrelic`).

The counter is a lazily populated `*Vec`: no series appears at `/metrics` until
the first dropped emit observes a `reason` label.
12 changes: 11 additions & 1 deletion internal/config/config.go
@@ -150,6 +150,15 @@ type Config struct {
MetricsToken string // METRICS_TOKEN — if set, required as Bearer token to access /metrics
DashboardBaseURL string // DASHBOARD_BASE_URL — where to redirect onboarding flows (default: http://localhost:5173)

// AnalyticsBackend selects the behavioral-intelligence custom-event sink
// (common/analyticsevent). Read from ANALYTICS_BACKEND. One of "noop"
// (default — drops every event, zero deps, never errors) or "newrelic"
// (emits InstantFunnel/InstantFlowTest custom events via the existing
// *newrelic.Application). Defaulting to "noop" makes funnel emission INERT
// in any environment where New Relic is not configured — the safe,
// fail-open default, so no separate feature flag is needed.
AnalyticsBackend string

// APIPublicURL is the externally-routable base URL the API runs at
// — used to construct fully-qualified links in outbound emails
// (deletion-confirm, etc). Empty in local dev where the dashboard
@@ -421,7 +430,8 @@ func Load() *Config {
cfg.DeployDomain = getenv("DEPLOY_DOMAIN", "instant.dev")
cfg.ComputeProvider = getenv("COMPUTE_PROVIDER", "noop")
cfg.KubeNamespaceApps = getenv("KUBE_NAMESPACE_APPS", "instant-apps")
cfg.MetricsToken = os.Getenv("METRICS_TOKEN") // empty = open (local dev)
cfg.AnalyticsBackend = getenv("ANALYTICS_BACKEND", "noop") // noop = inert (no NR sink)
cfg.DashboardBaseURL = getenv("DASHBOARD_BASE_URL", "http://localhost:5173")
cfg.APIPublicURL = strings.TrimRight(getenv("API_PUBLIC_URL", ""), "/")
// Parse DELETION_CONFIRMATION_TTL_MINUTES; fall back to 15 on
144 changes: 144 additions & 0 deletions internal/handlers/analytics.go
@@ -0,0 +1,144 @@
package handlers

import (
"context"
"sync/atomic"

"instant.dev/common/analyticsevent"
)

// WS4 behavioral-intelligence funnel events.
//
// This file is the api's bridge from the existing Prometheus conversion-funnel
// counter (instant_conversion_funnel_total — an AGGREGATE count) to the
// per-entity / per-cohort New Relic custom event (InstantFunnel) that the WS4
// observability plan needs for funnel + retention analysis (landing→provision→
// claim→paid). The Prometheus counter stays exactly where it is; every
// funnel emit site now ALSO records an InstantFunnel custom event alongside it.
//
// Why a package-level emitter instead of a struct field on every handler: the
// funnel emit sites live across nine independently-constructed handler structs
// (DBHandler, CacheHandler, OnboardingHandler, BillingHandler, …), each with
// its own constructor. The api already shares process-wide observability deps
// (the `metrics` package globals) the same way. The router wires the concrete
// emitter ONCE at boot via [SetAnalyticsEmitter]; until then — and in every
// unit test that doesn't opt in — the default is the no-op emitter, so funnel
// emission is INERT by default and can NEVER block, slow, or error a request.
//
// Fail-open + inert-by-default IS the flag protection: the analyticsevent
// package wraps every backend so a panic in the sink is swallowed and a nil /
// unconfigured backend is a silent drop. No separate feature flag is needed —
// ANALYTICS_BACKEND defaulting to "noop" means this code path does nothing in
// prod until New Relic is explicitly configured.

// emitterBox wraps the [analyticsevent.Emitter] interface in a single concrete
// struct type so [analyticsEmitter] (an atomic.Value) always sees ONE concrete
// type across Stores — atomic.Value panics if successive Store calls pass
// different concrete types, which a bare interface value would (noop{} vs the
// factory's wrapped{}). The box is the invariant concrete type.
type emitterBox struct{ e analyticsevent.Emitter }

// analyticsEmitter holds the process-wide emitter (boxed). atomic.Value so
// [SetAnalyticsEmitter] (called once at boot, before serving) and the per-request
// reads in [recordFunnelEvent] are race-free. Defaults to the no-op emitter via
// the package init below.
var analyticsEmitter atomic.Value // stores emitterBox

func init() {
// Inert default: no analytics sink until the router wires one. The no-op
// emitter drops every event with zero deps and can never error.
analyticsEmitter.Store(emitterBox{e: analyticsevent.NewNoop()})
}

// SetAnalyticsEmitter installs the process-wide analytics emitter. Called once
// from the router at boot with the emitter built from ANALYTICS_BACKEND (noop by
// default; the New Relic sink when configured). A nil emitter is ignored so a
// mis-wire degrades to the existing no-op rather than panicking on first emit.
func SetAnalyticsEmitter(e analyticsevent.Emitter) {
if e == nil {
return
}
analyticsEmitter.Store(emitterBox{e: e})
}

// getAnalyticsEmitter returns the current process-wide emitter, never nil.
func getAnalyticsEmitter() analyticsevent.Emitter {
if box, ok := analyticsEmitter.Load().(emitterBox); ok && box.e != nil {
return box.e
}
return analyticsevent.NewNoop()
}

// serviceNameAPI is the AttrServiceName value every funnel event from this
// service carries, so a dashboard can FACET by which service emitted the step.
const serviceNameAPI = "api"

// Funnel-step values re-exported from analyticsevent so the per-handler emit
// sites (db/cache/nosql/…/onboarding/billing) reference one in-package constant
// and don't each need to import common/analyticsevent. These MUST stay equal to
// the analyticsevent constants — funnelStepsMatchCanonical (in the test) asserts
// it, and the wire contract (dashboards FACET on these exact strings) depends on
// it.
const (
funnelStepProvision = analyticsevent.FunnelStepProvision
funnelStepClaim = analyticsevent.FunnelStepClaim
funnelStepPaid = analyticsevent.FunnelStepPaid
funnelStepLanding = analyticsevent.FunnelStepLanding
)

// recordFunnelEvent emits one [analyticsevent.EventFunnel] custom event for the
// given funnel step alongside the existing Prometheus counter. It is the single
// chokepoint every funnel emit site routes through so the attribute set stays
// uniform and PII-safe.
//
// Attributes are allowlisted (the analyticsevent wrapper drops anything not on
// the PII allowlist before the event leaves the process): step, tier, env and
// service, which are low-cardinality, plus (when known) the per-entity keys:
// the already-hashed fingerprint bucket (SHA256(/24+ASN), never a raw IP) and
// the team id (an opaque UUID, not PII). Empty values are omitted so an absent
// field reads as "missing" in NRQL rather than "".
//
// FAIL-OPEN: this never returns an error and the wrapper swallows any panic, so
// a funnel emit can never affect the request path. Callers MUST NOT wrap it in
// error handling.
func recordFunnelEvent(ctx context.Context, step string, attrs funnelAttrs) {
getAnalyticsEmitter().Record(ctx, analyticsevent.EventFunnel, attrs.toMap(step))
}

// funnelAttrs is the typed, PII-safe attribute payload for a funnel event. Only
// these fields can reach an event; the package allowlist is the backstop.
type funnelAttrs struct {
// Tier is the plan tier the funnel step occurred at ("anonymous", "free",
// "pro", …). Low cardinality.
Tier string
// Env is the resolved environment ("development", "production", …).
Env string
// Fingerprint is the already-hashed SHA256(/24+ASN) anonymous bucket, or ""
// for an authenticated step. Never a raw IP.
Fingerprint string
// TeamID is the owning team UUID (opaque id, not PII), or "" when unknown
// (e.g. anonymous provisions before a claim).
TeamID string
}

// toMap renders funnelAttrs + the step into the flat attribute map the emitter
// consumes, omitting empty values so NRQL facets stay clean.
func (a funnelAttrs) toMap(step string) map[string]any {
out := map[string]any{
analyticsevent.AttrFunnelStep: step,
analyticsevent.AttrServiceName: serviceNameAPI,
}
if a.Tier != "" {
out[analyticsevent.AttrTier] = a.Tier
}
if a.Env != "" {
out[analyticsevent.AttrEnv] = a.Env
}
if a.Fingerprint != "" {
out[analyticsevent.AttrFingerprint] = a.Fingerprint
}
if a.TeamID != "" {
out[analyticsevent.AttrTeamID] = a.TeamID
}
return out
}