From 62bc647abf6ddc67c71a0914f2b55eb2a89d6574 Mon Sep 17 00:00:00 2001
From: James Broadhead
Date: Wed, 27 May 2026 23:01:26 +0000
Subject: [PATCH] skills: bundle databricks-apps (AppKit) and databricks-core from databricks-agent-skills
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CODA bundles 43 skills today but none cover AppKit. AppKit is the canonical TypeScript + React stack for Databricks Apps and lives in the stable `databricks-apps` skill in databricks/databricks-agent-skills. Without it, the agent has no AppKit-aware skill to route generic "build a Databricks app" requests to, and the sibling PR that asks `databricks-app-apx` to defer to AppKit has nothing to defer *to*.

Vendors the two skills verbatim:

- `.claude/skills/databricks-apps/` — 19 files (SKILL.md, agents/openai.yaml, assets/, references/ including the 12-file `appkit/` subdirectory).
- `.claude/skills/databricks-core/` — 7 files. databricks-apps declares `parent: databricks-core`; bundling the parent so inheritance works.

CODA already has a narrower `databricks-config` skill that overlaps with databricks-core's auth/profile content; keeping both is intentional — let the agent pick the better fit per query.

Source: skills/databricks-apps and skills/databricks-core in databricks/databricks-agent-skills @ origin/main (a6da289).

This PR was prepared by Claude.
---
 .claude/skills/databricks-apps/SKILL.md            | 223 ++++++++
 .../skills/databricks-apps/agents/openai.yaml      |   7 +
 .../databricks-apps/assets/databricks.png          | Bin 0 -> 15366 bytes
 .../databricks-apps/assets/databricks.svg          |   3 +
 .../references/appkit/appkit-sdk.md                | 106 ++++
 .../references/appkit/files.md                     | 268 +++++++++
 .../references/appkit/frontend.md                  | 174 ++++++
 .../references/appkit/genie.md                     | 303 ++++++++++
 .../databricks-apps/references/appkit/jobs.md      | 141 +++++
 .../references/appkit/lakebase.md                  | 438 +++++++++++++++
 .../references/appkit/model-serving.md             | 222 ++++++++
 .../references/appkit/overview.md                  | 149 +++++
 .../references/appkit/proto-contracts.md           | 201 +++++++
 .../references/appkit/proto-first.md               | 306 ++++++++++
 .../references/appkit/sql-queries.md               | 267 +++++++++
 .../databricks-apps/references/appkit/trpc.md      | 146 +++++
 .../references/other-frameworks.md                 | 269 +++++++++
 .../references/platform-guide.md                   | 173 ++++++
 .../databricks-apps/references/testing.md          |  99 ++++
 .claude/skills/databricks-core/SKILL.md            | 142 +++++
 .../skills/databricks-core/agents/openai.yaml      |   7 +
 .../databricks-core/assets/databricks.png          | Bin 0 -> 15366 bytes
 .../databricks-core/assets/databricks.svg          |   3 +
 .../databricks-core/data-exploration.md            | 330 +++++++++++
 .../databricks-core/databricks-cli-auth.md         | 527 ++++++++++++++++++
 .../databricks-core/databricks-cli-install.md      | 178 ++++++
 26 files changed, 4682 insertions(+)
 create mode 100644 .claude/skills/databricks-apps/SKILL.md
 create mode 100644 .claude/skills/databricks-apps/agents/openai.yaml
 create mode 100644 .claude/skills/databricks-apps/assets/databricks.png
 create mode 100644 .claude/skills/databricks-apps/assets/databricks.svg
 create mode 100644 .claude/skills/databricks-apps/references/appkit/appkit-sdk.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/files.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/frontend.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/genie.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/jobs.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/lakebase.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/model-serving.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/overview.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/proto-contracts.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/proto-first.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/sql-queries.md
 create mode 100644 .claude/skills/databricks-apps/references/appkit/trpc.md
 create mode 100644 .claude/skills/databricks-apps/references/other-frameworks.md
 create mode 100644 .claude/skills/databricks-apps/references/platform-guide.md
 create mode 100644 .claude/skills/databricks-apps/references/testing.md
 create mode 100644 .claude/skills/databricks-core/SKILL.md
 create mode 100644 .claude/skills/databricks-core/agents/openai.yaml
 create mode 100644 .claude/skills/databricks-core/assets/databricks.png
 create mode 100644 .claude/skills/databricks-core/assets/databricks.svg
 create mode 100644 .claude/skills/databricks-core/data-exploration.md
 create mode 100644 .claude/skills/databricks-core/databricks-cli-auth.md
 create mode 100644 .claude/skills/databricks-core/databricks-cli-install.md

diff --git a/.claude/skills/databricks-apps/SKILL.md b/.claude/skills/databricks-apps/SKILL.md
new file mode 100644
index 0000000..889043b
--- /dev/null
+++ b/.claude/skills/databricks-apps/SKILL.md
@@ -0,0 +1,223 @@
+---
+name: databricks-apps
+description: "Build apps on Databricks Apps platform. Use when asked to create dashboards, data apps, analytics tools, or visualizations. Auto-detects need for Lakebase when app stores state; evaluates data access patterns (analytics vs Lakebase synced tables) before scaffolding. Invoke BEFORE starting implementation."
+compatibility: Requires databricks CLI (>= v0.294.0)
+metadata:
+  version: "0.1.2"
+parent: databricks-core
+---
+
+# Databricks Apps Development
+
+**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.
+
+Build apps that deploy to Databricks Apps platform.
+
+## Required Reading by Phase
+
+| Phase | READ BEFORE proceeding |
+|-------|------------------------|
+| Scaffolding | **⚠️ STOP — evaluate the State Storage Rule and Data Access Decision Gate below before scaffolding.** Parent `databricks-core` skill (auth, warehouse discovery); then run `databricks apps manifest` + `databricks apps init` with `--features` and `--set` (see AppKit section below) |
+| Writing SQL queries | [SQL Queries Guide](references/appkit/sql-queries.md) |
+| Writing UI components | [Frontend Guide](references/appkit/frontend.md) |
+| Using `useAnalyticsQuery` | [AppKit SDK](references/appkit/appkit-sdk.md) |
+| Adding API endpoints | [tRPC Guide](references/appkit/trpc.md) |
+| Using Lakebase (OLTP database) | [Lakebase Guide](references/appkit/lakebase.md) |
+| Adding Genie chat / Genie-powered apps | [Genie Guide](references/appkit/genie.md) — follow the Genie agent workflow below |
+| Using Model Serving (ML inference) | [Model Serving Guide](references/appkit/model-serving.md) |
+| Typed data contracts (proto-first design) | [Proto-First Guide](references/appkit/proto-first.md) and [Plugin Contracts](references/appkit/proto-contracts.md) |
+| Managing files in UC Volumes | [Files Guide](references/appkit/files.md) |
+| Triggering / monitoring Lakeflow Jobs from the app | [Jobs Guide](references/appkit/jobs.md) |
+| Platform rules (permissions, deployment, limits) | [Platform Guide](references/platform-guide.md) — READ for ALL apps including AppKit |
+| Non-AppKit app (Streamlit, FastAPI, Flask, Gradio, Next.js, etc.) | [Other Frameworks](references/other-frameworks.md) |
+
+## Generic Guidelines
+
+- **App name**: ≤26 characters, lowercase letters/numbers/hyphens only (no underscores). dev- prefix adds 4 chars, max 30 total.
+- **Validation**: `databricks apps validate --profile ` before deploying.
+- **Smoke tests** (AppKit only): ALWAYS update `tests/smoke.spec.ts` selectors BEFORE running validation. Default template checks for "Minimal Databricks App" heading and "hello world" text — these WILL fail in your custom app. See [testing guide](references/testing.md).
+- **Smoke test selectors**: use only Playwright locator APIs — `getByRole`, `getByText`, `getByPlaceholder`, `getByLabel`. `getByLabelText` does not exist in Playwright (it is a React Testing Library method) and throws `TypeError` at runtime. See [testing guide](references/testing.md) or `npx playwright codegen`.
+- **Smoke test data**: keep result sets under the 1 MB analytics-event payload cap. Queries returning thousands of rows cause `INVALID_REQUEST: Event exceeds max size of 1048576 bytes` and `net::ERR_ABORTED`, leaving every asserted UI element absent. Use `LIMIT` or an aggregated query (e.g. `COUNT(*) GROUP BY status`) — never raw row dumps.
+- **AppKit version**: never override the `@databricks/appkit` or `@databricks/appkit-ui` version in `package.json` — `databricks apps init` sets the correct version. Do not run `npm install @databricks/appkit@` unless explicitly asked by the user. If you need a different version, re-scaffold with `databricks apps init --version `.
+- **Authentication**: covered by parent `databricks-core` skill.
+- **AppKit API surface**: before writing code that calls AppKit APIs (`createApp`, plugin shapes, `useAnalyticsQuery`, etc.), run `npx @databricks/appkit docs ` and use the actual signature. Training data has stale shapes; a single invented signature fails `tsc --noEmit` during validate. The docs ship with the installed AppKit and are the authoritative source.
+- **TypeScript casts**: never use `as unknown as ` double-assertions — `appkit lint` enforces `no-double-type-assertion` and one violation fails the entire validate step. Instead: narrow with Zod (`z.infer`), use a runtime type guard, or write a typed mapper function. If a query result needs reshaping, type the row schema via queryKey types rather than casting.
+
+## Project Structure (after `databricks apps init --features analytics`)
+- `client/src/App.tsx` — main React component (start here)
+- `config/queries/*.sql` — SQL query files (queryKey = filename without .sql)
+- `server/server.ts` — backend entry (tRPC routers)
+- `tests/smoke.spec.ts` — smoke test (⚠️ MUST UPDATE selectors for your app)
+- `client/src/appKitTypes.d.ts` — auto-generated types (`npm run typegen`)
+
+## Project Structure (after `databricks apps init --features lakebase`)
+- `server/server.ts` — backend with Lakebase pool + tRPC routes
+- `client/src/App.tsx` — React frontend
+- `app.yaml` — manifest with `database` resource declaration
+- `package.json` — includes `@databricks/lakebase` dependency
+- Note: **No `config/queries/`** — Lakebase apps use `pool.query()` in tRPC, not SQL files
+
+## Data Discovery
+
+Before writing any SQL, use the parent `databricks-core` skill for data exploration — search `information_schema` by keyword, then batch `discover-schema` for the tables you need. Do NOT skip this step.
+
+**State Storage Rule (evaluate BEFORE the Decision Gate):**
+
+If the user's app description implies storing or persisting data — forms, CRUD operations, user input, preferences, bookmarks, orders, todos, comments, votes, or any user-generated content — the app needs a Lakebase database. Do not wait for the user to ask for one.
+
+1. Use the **`databricks-lakebase`** skill to create a Lakebase project (if one doesn't already exist) and obtain the branch and database resource names.
+2. Scaffold with `--features lakebase` and pass `--set lakebase.postgres.branch= --set lakebase.postgres.database=`.
+3. If the app **also** reads from Unity Catalog tables, proceed to the Data Access Decision Gate below to determine whether to add `--features analytics` or use Lakebase synced tables.
+
+This rule governs **state storage** only. For how the app reads existing lakehouse data, proceed to the Decision Gate below. This is not optional — any app that writes user-generated data needs Lakebase.
+
+## Development Workflow (FOLLOW THIS ORDER)
+
+**Data Access Decision Gate (REQUIRED before scaffolding):**
+
+If the app reads from Unity Catalog / lakehouse tables, you MUST show the comparison below to the user and ask them to choose. Do not skip this. Do not choose for them.
+
+| | **(A) Lakebase synced tables** | **(B) Analytics** |
+|--|---|---|
+| Speed | Sub-second responses | Takes a few seconds |
+| Best for | Search, lookups, catalogs, real-time data, operational apps | Dashboards, charts, aggregations, KPIs |
+| How it works | Data synced from Delta into Lakebase Postgres | Queries run on SQL warehouse at read time |
+
+After showing the table, add a brief recommendation. Default to recommending Lakebase synced tables (A) unless the use case is clearly about aggregations, charts, or dashboards where seconds of latency is acceptable. For lookups, searches, serving data to users, or any interactive use case, recommend Lakebase synced tables. Always let the user make the final call.
+
+After the user chooses:
+- (A) Lakebase synced tables → scaffold with `--features lakebase`. See [Lakebase Guide](references/appkit/lakebase.md) for full workflow.
+- (B) Analytics → scaffold with `--features analytics`.
+- Both → scaffold with `--features analytics,lakebase` if the app needs both patterns.
+- If the app does NOT read UC data (pure CRUD, Genie, Model Serving), skip this gate. For pure CRUD/state apps, the State Storage Rule above already applies — scaffold with `--features lakebase`. For Genie or Model Serving, scaffold with the corresponding `--features` flag.
+
+**Analytics apps** (`--features analytics`):
+
+1. Create SQL files in `config/queries/`
+2. Run `npm run typegen` — verify all queries show ✓
+3. Read `client/src/appKitTypes.d.ts` to see generated types
+4. **THEN** write `App.tsx` using the generated types
+5. Update `tests/smoke.spec.ts` selectors
+6. Run `databricks apps validate --profile `
+
+**DO NOT** write UI code before running typegen — types won't exist and you'll waste time on compilation errors.
+
+**Lakebase apps** (`--features lakebase`): No SQL files or typegen. See [Lakebase Guide](references/appkit/lakebase.md) for the tRPC pattern: initialize schema at startup, write procedures in `server/server.ts`, then build the React frontend.
+
+## When to Use What
+
+After completing the decision gate above, use this routing table:
+
+- **Read analytics data → display in chart/table**: Use visualization components with `queryKey` prop
+- **Read analytics data → custom display (KPIs, cards)**: Use `useAnalyticsQuery` hook
+- **Read analytics data → need computation before display**: Still use `useAnalyticsQuery`, transform client-side
+- **Read lakehouse data at low latency (lookups, search, catalogs)**: Use Lakebase synced tables — see [Lakebase Guide](references/appkit/lakebase.md)
+- **Read/write persistent data (users, orders, CRUD state)**: Use Lakebase pool via tRPC — see [Lakebase Guide](references/appkit/lakebase.md)
+- **Natural language query interface over tables (Genie)**: Use `genie()` plugin — see [Genie Guide](references/appkit/genie.md)
+- **Call ML model endpoint**: Use `serving()` plugin — see [Model Serving Guide](references/appkit/model-serving.md)
+- **Trigger or monitor a Lakeflow Job from the app**: Use the `jobs()` plugin — see [Jobs Guide](references/appkit/jobs.md)
+- **⚠️ NEVER use tRPC to run SELECT queries against the warehouse** — always use SQL files in `config/queries/`
+- **⚠️ NEVER use `useAnalyticsQuery` for Lakebase data** — it queries the SQL warehouse only
+
+## Frameworks
+
+### AppKit (Recommended)
+
+TypeScript/React framework with type-safe SQL queries and built-in components.
+
+**Official Documentation** — the source of truth for all API details:
+
+```bash
+npx @databricks/appkit docs # ← ALWAYS start here to see available pages
+npx @databricks/appkit docs # view a section by name or doc path
+npx @databricks/appkit docs --full # full index with all API entries
+npx @databricks/appkit docs "appkit-ui API reference" # example: section by name
+npx @databricks/appkit docs ./docs/plugins/analytics.md # example: specific doc file
+```
+
+**DO NOT guess doc paths.** Run without args first, pick from the index. The `` argument accepts both section names (from the index) and file paths. Docs are the authority on component props, hook signatures, and server APIs — skill files only cover anti-patterns and gotchas.
+
+**App Manifest and Scaffolding**
+
+**Agent workflow for scaffolding: get the manifest first, then build the init command.**
+
+1. **Get the manifest** (JSON schema describing plugins and their resources):
+   ```bash
+   databricks apps manifest --profile
+   # See plugins available in a specific AppKit version:
+   databricks apps manifest --version --profile
+   # Custom template:
+   databricks apps manifest --template --profile
+   ```
+   The output defines:
+   - **Plugins**: each has a key (plugin ID for `--features`), plus `requiredByTemplate`, and `resources`.
+   - **requiredByTemplate**: If **true**, that plugin is **mandatory** for this template — do **not** add it to `--features` (it is included automatically); you must still supply all of its required resources via `--set`. If **false** or absent, the plugin is **optional** — add it to `--features` only when the user's prompt indicates they want that capability (e.g. analytics/SQL), and then supply its required resources via `--set`.
+   - **Resources**: Each plugin has `resources.required` and `resources.optional` (arrays). Each item has `resourceKey` and `fields` (object: field name → description/env). Use `--set ..=` for each required resource field of every plugin you include.
+
+2. **Scaffold** (DO NOT use `npx`; use the CLI only):
+   ```bash
+   databricks apps init --name --features , \
+     --set ..= \
+     --set ..= \
+     --description "" --run none --profile
+   # --run none: skip auto-run after scaffolding (review code first)
+   # With custom template:
+   databricks apps init --template --name --features ... --set ... --profile
+   ```
+   Optionally use `--version ` to target a specific AppKit version.
+   - **Required**: `--name`, `--profile`. Name: ≤26 chars, lowercase letters/numbers/hyphens only. Use `--features` only for **optional** plugins the user wants (plugins with `requiredByTemplate: false` or absent); mandatory plugins must not be listed in `--features`.
+   - **Resources**: Pass `--set` for every required resource (each field in `resources.required`) for (1) all plugins with `requiredByTemplate: true`, and (2) any optional plugins you added to `--features`. Add `--set` for `resources.optional` only when the user requests them.
+   - **Discovery**: Use the parent `databricks-core` skill to resolve IDs (e.g. warehouse: `databricks warehouses list --profile ` or `databricks experimental aitools tools get-default-warehouse --profile `).
+
+**DO NOT guess** plugin names, resource keys, or property names — always derive them from `databricks apps manifest` output. Example: if the manifest shows plugin `analytics` with a required resource `resourceKey: "sql-warehouse"` and `fields: { "id": ... }`, include `--set analytics.sql-warehouse.id=`.
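
The derivation rule above (mandatory plugins stay out of `--features`, and every required resource field of an included plugin needs a `--set` key) can be sketched as a small planning function. This is only an illustration under stated assumptions: the `Manifest` interfaces are inferred from the prose in this skill, not taken from the real CLI output, the `plugins` map keyed by plugin ID is a guess at the JSON layout, and the `server` plugin in the sample data is hypothetical. Check the real `databricks apps manifest` output before relying on any of these shapes.

```typescript
// Hypothetical manifest shape, inferred from the description in this skill.
// The real JSON emitted by `databricks apps manifest` may be laid out differently.
interface ManifestResource {
  resourceKey: string;
  fields: Record<string, unknown>;
}

interface ManifestPlugin {
  requiredByTemplate?: boolean;
  resources?: { required?: ManifestResource[]; optional?: ManifestResource[] };
}

interface Manifest {
  // Assumption: plugins are keyed by plugin ID.
  plugins: Record<string, ManifestPlugin>;
}

interface InitPlan {
  features: string[];        // optional plugins to pass via --features
  requiredSetKeys: string[]; // keys that each need a --set <key>=<value>
}

function planInit(manifest: Manifest, wanted: string[]): InitPlan {
  // Refuse plugin names that the manifest does not list, instead of guessing.
  for (const id of wanted) {
    if (!Object.prototype.hasOwnProperty.call(manifest.plugins, id)) {
      throw new Error(`plugin "${id}" is not in the manifest`);
    }
  }

  const features: string[] = [];
  const requiredSetKeys: string[] = [];

  for (const pluginId of Object.keys(manifest.plugins)) {
    const plugin = manifest.plugins[pluginId];
    const mandatory = plugin.requiredByTemplate === true;
    const selected = wanted.indexOf(pluginId) !== -1;
    if (!mandatory && !selected) continue;

    // Mandatory plugins are included automatically, so only optional ones
    // go into the --features list.
    if (!mandatory) features.push(pluginId);

    // Required resources need a --set key whether the plugin is mandatory or optional.
    for (const resource of plugin.resources?.required ?? []) {
      for (const field of Object.keys(resource.fields)) {
        requiredSetKeys.push(`${pluginId}.${resource.resourceKey}.${field}`);
      }
    }
  }

  return { features, requiredSetKeys };
}

// Sample data: "server" is a made-up mandatory plugin; "analytics" mirrors the
// example in the text above.
const exampleManifest: Manifest = {
  plugins: {
    server: { requiredByTemplate: true, resources: { required: [] } },
    analytics: {
      resources: {
        required: [{ resourceKey: "sql-warehouse", fields: { id: "SQL warehouse ID" } }],
      },
    },
  },
};

const plan = planInit(exampleManifest, ["analytics"]);
console.log(plan.features.join(","));  // analytics
console.log(plan.requiredSetKeys[0]);  // analytics.sql-warehouse.id
```

The function only produces the keys. The values (for example the warehouse ID) still have to come from discovery, as described under **Discovery** above.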
+
+**Scaffolding Rules Protocol** — `databricks apps manifest` may emit `scaffolding.rules` at the template level (top-level `scaffolding.rules`) and on individual plugins (`plugins[].scaffolding.rules`). Each block has `must` / `should` / `never` arrays of short directive strings. Consume them as follows:
+
+1. **Gather** — for every plugin in your final `--features` list AND every plugin with `requiredByTemplate: true`, read `plugins[].scaffolding.rules`. Union those with the top-level template `scaffolding.rules` into one working set, tagged by source (template vs ``).
+2. **Precedence** — manifest rules override the directives baked into this skill. Where the manifest is silent on a topic, this skill's content is the floor.
+3. **Phase ordering** — rules whose text begins with `Before init` MUST be executed before `databricks apps init`. Rules beginning with `After init` MUST be executed after init completes (e.g. migrations, typegen, connectivity checks). Rules without a phase prefix apply throughout the scaffold/develop loop.
+4. **Conflict detection** — if a plugin `must` rule contradicts a template `never` rule on the same target (or vice versa), STOP and ask the user which to follow before proceeding. Do not silently pick one. Treat `must` vs `never` on the same action as a conflict; `should` is advisory and does not block.
+5. **Reporting** — before running `databricks apps init`, surface the merged working set to the user grouped by phase (Before init / After init / Always) and by severity (must / should / never), so the active guardrails are explicit.
+
+**READ [AppKit Overview](references/appkit/overview.md)** for project structure, workflow, and pre-implementation checklist.
+
+**Genie Agent Workflow** — when the user wants a Genie-powered app, do **not** start by asking for a Genie Space ID. Instead:
+
+1. Ask which Unity Catalog tables the app should query (fully qualified: `catalog.schema.table`).
+2. Ask whether to reuse an existing Genie space or create a new one.
+3. If creating: discover the warehouse, then create the space with `databricks genie create-space` (see [Genie Guide](references/appkit/genie.md) for syntax and serialized space format).
+4. If reusing: discover existing spaces with `databricks genie list-spaces --profile ` and let the user pick.
+5. Scaffold or wire the space ID into the app — derive `--set` keys from `databricks apps manifest`.
+
+Read the [Genie Guide](references/appkit/genie.md) for configuration, SSE endpoints, and frontend integration.
+
+### Common Scaffolding Mistakes
+
+```bash
+# ❌ WRONG: name is NOT a positional argument
+databricks apps init --features analytics my-app-name
+# → "unknown command" error
+
+# ✅ CORRECT: use --name flag
+databricks apps init --name my-app-name --features analytics --set "..." --profile
+```
+
+### Directory Naming
+
+`databricks apps init` creates directories in kebab-case matching the app name.
+App names must be lowercase with hyphens only (≤26 chars).
+
+### Other Frameworks (Streamlit, FastAPI, Flask, Gradio, Dash, Next.js, etc.)
+
+Databricks Apps supports any framework that runs as an HTTP server. LLMs already know these frameworks — the challenge is Databricks platform integration.
+
+**READ [Other Frameworks Guide](references/other-frameworks.md) BEFORE building any non-AppKit app.** It covers port/host configuration, `app.yaml` and `databricks.yml` setup, dependency management, networking, and framework-specific gotchas.
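
As a rough illustration of the port/host point, the sketch below resolves listen options for a hand-rolled HTTP server. It rests on assumptions that are not stated in this skill: that the platform passes the port in an environment variable named `DATABRICKS_APP_PORT`, that binding to `0.0.0.0` is expected, and that `8000` is a sensible local fallback. The function name `resolveListenOptions` is made up for this example. Treat the Other Frameworks Guide as the authority on the actual values.

```typescript
interface ListenOptions {
  host: string;
  port: number;
}

// Assumed variable name; confirm it against the Other Frameworks Guide.
const PORT_ENV_VAR = "DATABRICKS_APP_PORT";

function resolveListenOptions(
  env: Record<string, string | undefined>,
  fallbackPort: number = 8000, // assumed fallback for local runs
): ListenOptions {
  // Bind on all interfaces so a fronting proxy can reach the process (assumption).
  const host = "0.0.0.0";

  const raw = env[PORT_ENV_VAR];
  if (raw === undefined || raw === "") {
    return { host, port: fallbackPort };
  }

  const port = Number(raw);
  const valid = isFinite(port) && Math.floor(port) === port && port >= 1 && port <= 65535;
  if (!valid) {
    // Fail loudly instead of silently listening on the wrong port.
    throw new Error(`${PORT_ENV_VAR} is not a valid port: "${raw}"`);
  }
  return { host, port };
}

const local = resolveListenOptions({});
console.log(`${local.host}:${local.port}`); // 0.0.0.0:8000
```

In a real Node server the call would typically be `resolveListenOptions(process.env)`, with the result passed to the framework's listen call.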
+
+### Post-Deploy Verification
+
+After deploying, verify the app is running:
+
+```bash
+databricks apps get --profile -o json # Check app_status.state: RUNNING
+databricks apps logs --follow --profile # Stream live logs (Ctrl+C to stop)
+```
diff --git a/.claude/skills/databricks-apps/agents/openai.yaml b/.claude/skills/databricks-apps/agents/openai.yaml
new file mode 100644
index 0000000..1e3827e
--- /dev/null
+++ b/.claude/skills/databricks-apps/agents/openai.yaml
@@ -0,0 +1,7 @@
+interface:
+  display_name: "Databricks Apps"
+  short_description: "Apps development and deployment"
+  icon_small: "./assets/databricks.svg"
+  icon_large: "./assets/databricks.png"
+  brand_color: "#FF3621"
+  default_prompt: "Use $databricks-apps for Databricks Apps development and deployment."
diff --git a/.claude/skills/databricks-apps/assets/databricks.png b/.claude/skills/databricks-apps/assets/databricks.png
new file mode 100644
index 0000000000000000000000000000000000000000..263fe98b84e8ff3516edc93e7c99230fb8fb3113
GIT binary patch
literal 15366
[base85-encoded PNG payload omitted]

[diff for .claude/skills/databricks-apps/assets/databricks.svg: 3 added lines of SVG markup, not recoverable]
\ No newline at end of file
diff --git a/.claude/skills/databricks-apps/references/appkit/appkit-sdk.md b/.claude/skills/databricks-apps/references/appkit/appkit-sdk.md
new file mode 100644
index 0000000..008c6b6
--- /dev/null
+++ b/.claude/skills/databricks-apps/references/appkit/appkit-sdk.md
@@ -0,0 +1,106 @@
+# Databricks App Kit SDK
+
+## TypeScript Import Rules
+
+This template uses strict TypeScript settings with `verbatimModuleSyntax: true`. **Always use `import type` for type-only imports**.
+
+Template enforces `noUnusedLocals` - remove unused imports immediately or build fails.
+
+```typescript
+// ✅ CORRECT - use import type for types
+import type { MyInterface, MyType } from './types';
+
+// ❌ WRONG - will fail compilation
+import { MyInterface, MyType } from './types';
+```
+
+## Server Setup
+
+For server configuration, see: `npx @databricks/appkit docs ./docs/plugins.md`
+
+## useAnalyticsQuery Hook
+
+**ONLY use when displaying data in a custom way that isn't a chart or table.** For charts/tables, pass `queryKey` directly to the component — don't double-fetch. Charts also accept a `format` option (`"json"` | `"arrow"` | `"auto"`, default `"auto"`) to control the data transfer format.
+ +Use cases: +- Custom HTML layouts (cards, lists, grids) +- Summary statistics and KPIs +- Conditional rendering based on data values +- Data that needs transformation before display + +### ⚠️ Memoize Parameters to Prevent Infinite Loops + +```typescript +// ❌ WRONG - creates new object every render → infinite refetch loop +const { data } = useAnalyticsQuery('query', { id: sql.string(selectedId) }); + +// ✅ CORRECT - memoize parameters +const params = useMemo(() => ({ id: sql.string(selectedId) }), [selectedId]); +const { data } = useAnalyticsQuery('query', params); +``` + +### Conditional Queries + +```typescript +// ❌ WRONG - `enabled` is NOT a valid option (this is a React Query pattern) +const { data } = useAnalyticsQuery('query', params, { enabled: !!selectedId }); + +// ✅ CORRECT - use autoStart: false +const { data } = useAnalyticsQuery('query', params, { autoStart: false }); + +// ✅ ALSO CORRECT - conditional rendering (component only mounts when data exists) +{selectedId && } +``` + +### Type Inference + +When `appKitTypes.d.ts` has been generated (via `npm run typegen`), types are inferred automatically: +```typescript +// ✅ After typegen - types are automatic, no generic needed +const { data } = useAnalyticsQuery('my_query', params); + +// ⚠️ Before typegen - data is `unknown`, you must provide type manually +const { data } = useAnalyticsQuery('my_query', params); +``` + +**Common mistake** — don't define interfaces that duplicate generated types: +```typescript +// ❌ WRONG - manual interface may conflict with generated QueryRegistry +interface MyData { id: string; value: number; } +const { data } = useAnalyticsQuery('my_query', params); + +// ✅ CORRECT - run `npm run typegen` and let it provide types +const { data } = useAnalyticsQuery('my_query', params); +``` + +### Basic Usage + +```typescript +import { useAnalyticsQuery, Skeleton } from '@databricks/appkit-ui/react'; +import { sql } from '@databricks/appkit-ui/js'; +import { useMemo } from 
'react';
+
+function CustomDisplay() {
+  const params = useMemo(() => ({
+    start_date: sql.date('2024-01-01'),
+    category: sql.string("tools")
+  }), []);
+
+  const { data, loading, error } = useAnalyticsQuery('query_name', params);
+
+  // Show a placeholder while the query is in flight
+  if (loading) return <Skeleton />;
+  if (error) return <div>Error: {error}</div>;
+  if (!data) return null;
+
+  return (
+    <div>
+      {data.map(row => (
+        <div>
+          <h3>{row.column_name}</h3>
+          <p>{Number(row.value).toFixed(2)}</p>
+        </div>
+      ))}
+    </div>
+ ); +} +``` diff --git a/.claude/skills/databricks-apps/references/appkit/files.md b/.claude/skills/databricks-apps/references/appkit/files.md new file mode 100644 index 0000000..108d9c0 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/files.md @@ -0,0 +1,268 @@ +# Files: Unity Catalog Volume Operations + +**For full Files plugin API (routes, types, config options)**: run `npx @databricks/appkit docs` → Files plugin. + +Use the `files()` plugin when your app needs to **browse, upload, download, or manage files** in Databricks Unity Catalog Volumes. For analytics dashboards reading from a SQL warehouse, use `config/queries/` instead. For persistent CRUD storage, use Lakebase. + +## When to Use Files vs Other Patterns + +| Pattern | Use Case | Data Source | +| --- | --- | --- | +| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse | +| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase) | +| Files | File uploads, downloads, browsing, previews | Unity Catalog Volumes | +| Files + Analytics | Upload CSVs then query warehouse tables | Volumes + SQL Warehouse | + +## Scaffolding + +```bash +databricks apps init --name --features files \ + --run none --profile +``` + +**Files + analytics:** + +```bash +databricks apps init --name --features analytics,files \ + --set "analytics.sql-warehouse.id=" \ + --run none --profile +``` + +Configure volume paths via environment variables in `app.yaml` or `.env`: + +``` +DATABRICKS_VOLUME_UPLOADS=/Volumes/catalog/schema/uploads +DATABRICKS_VOLUME_EXPORTS=/Volumes/catalog/schema/exports +``` + +The env var suffix (after `DATABRICKS_VOLUME_`) becomes the volume key, lowercased. 
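
The naming rule above can be illustrated with a small sketch. `volumeKeysFromEnv` is a hypothetical helper written for this note, not an AppKit function, and the real plugin may treat edge cases (empty values, unusual suffixes) differently.

```typescript
// Hypothetical helper: mirrors only the documented rule
// "suffix after DATABRICKS_VOLUME_ becomes the key, lowercased".
const VOLUME_PREFIX = "DATABRICKS_VOLUME_";

function volumeKeysFromEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const volumes: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    // Skip unrelated variables and variables without a value.
    if (!name.startsWith(VOLUME_PREFIX) || !value) continue;
    const key = name.slice(VOLUME_PREFIX.length).toLowerCase();
    if (key) volumes[key] = value;
  }
  return volumes;
}

// Example input using the two variables shown above plus an unrelated one.
const exampleVolumes = volumeKeysFromEnv({
  DATABRICKS_VOLUME_UPLOADS: "/Volumes/catalog/schema/uploads",
  DATABRICKS_VOLUME_EXPORTS: "/Volumes/catalog/schema/exports",
  PATH: "/usr/bin",
});
console.log(exampleVolumes);
```

Under this reading, `DATABRICKS_VOLUME_UPLOADS` would be reachable as `appkit.files("uploads")` and `DATABRICKS_VOLUME_EXPORTS` as `appkit.files("exports")`.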
+ +## Plugin Setup + +```typescript +import { createApp, files, server } from "@databricks/appkit"; + +await createApp({ + plugins: [ + server(), + files(), + ], +}); +``` + +## Server-Side API (Programmatic) + +Access volumes through the `files()` callable, which returns a `VolumeHandle`: + +```typescript +// ✅ CORRECT — OBO access (recommended) +const entries = await appkit.files("uploads").asUser(req).list(); +const content = await appkit.files("exports").asUser(req).read("report.csv"); + +// ❌ WRONG — omitting .asUser(req) +const entries = await appkit.files("uploads").list(); +// In dev: silently falls back to service principal credentials, bypassing user-level UC permissions +// In production: throws an error +``` + +**ALWAYS use `.asUser(req)`** — without it, dev mode silently uses the app's service principal (masking permission issues that will crash in production). + +## Frontend Components + +Import file browser components from `@databricks/appkit-ui/react`. Full component props: `npx @databricks/appkit docs "FileBreadcrumb"`. + +### File Browser Example + +```typescript +import type { DirectoryEntry, FilePreview } from '@databricks/appkit-ui/react'; +import { + DirectoryList, + FileBreadcrumb, + FilePreviewPanel, +} from '@databricks/appkit-ui/react'; +import { useCallback, useEffect, useState } from 'react'; + +export function FilesPage() { + const [volumeKey] = useState('uploads'); + const [currentPath, setCurrentPath] = useState(''); + const [entries, setEntries] = useState([]); + const [selectedFile, setSelectedFile] = useState(null); + const [preview, setPreview] = useState(null); + + const apiUrl = useCallback( + (action: string, params?: Record) => { + const base = `/api/files/${volumeKey}/${action}`; + if (!params) return base; + return `${base}?${new URLSearchParams(params).toString()}`; + }, + [volumeKey], + ); + + const loadDirectory = useCallback(async (path?: string) => { + const url = path ? 
apiUrl('list', { path }) : apiUrl('list'); + const res = await fetch(url); + if (!res.ok) { + const errBody = await res.json().catch(() => null); + console.error('Failed to load directory', errBody ?? res.statusText); + return; + } + const data: DirectoryEntry[] = await res.json(); + // Sort: directories first, then alphabetically + data.sort((a, b) => { + if (a.is_directory && !b.is_directory) return -1; + if (!a.is_directory && b.is_directory) return 1; + return (a.name ?? '').localeCompare(b.name ?? ''); + }); + setEntries(data); + setCurrentPath(path ?? ''); + }, [apiUrl]); + + useEffect(() => { loadDirectory(); }, [loadDirectory]); + + const segments = currentPath.split('/').filter(Boolean); + + return ( +
+
+ loadDirectory()} + onNavigateToSegment={(i) => + loadDirectory(segments.slice(0, i + 1).join('/')) + } + /> + { + const entryPath = currentPath + ? `${currentPath}/${entry.name}` + : entry.name ?? ''; + if (entry.is_directory) { + loadDirectory(entryPath); + } else { + setSelectedFile(entryPath); + fetch(apiUrl('preview', { path: entryPath })) + .then(async (r) => { + if (!r.ok) { + const errBody = await r.json().catch(() => null); + console.error('Failed to load file preview', errBody ?? r.statusText); + return null; + } + return r.json(); + }) + .then((data) => { + if (data) { + setPreview(data); + } + }); + } + }} + resolveEntryPath={(entry) => + currentPath ? `${currentPath}/${entry.name}` : entry.name ?? '' + } + isAtRoot={!currentPath} + selectedPath={selectedFile} + /> +
+ + window.open(apiUrl('download', { path }), '_blank', 'noopener,noreferrer') + } + imagePreviewSrc={(p) => apiUrl('raw', { path: p })} + /> +
+ ); +} +``` + +### Upload Pattern + +```typescript +const handleUpload = async (file: File) => { + const uploadPath = currentPath ? `${currentPath}/${file.name}` : file.name; + const response = await fetch(apiUrl('upload', { path: uploadPath }), { + method: 'POST', + body: file, + }); + if (!response.ok) { + const data = await response.json().catch(() => ({})); + throw new Error(data.error ?? `Upload failed (${response.status})`); + } + // Reload directory after upload + await loadDirectory(currentPath || undefined); +}; +``` + +### Delete Pattern + +```typescript +const handleDelete = async (filePath: string) => { + const response = await fetch( + `/api/files/${volumeKey}?path=${encodeURIComponent(filePath)}`, + { method: 'DELETE' }, + ); + if (!response.ok) { + const data = await response.json().catch(() => ({})); + throw new Error(data.error ?? `Delete failed (${response.status})`); + } +}; +``` + +### Create Directory Pattern + +```typescript +const handleCreateDirectory = async (name: string) => { + const dirPath = currentPath ? `${currentPath}/${name}` : name; + const response = await fetch(apiUrl('mkdir'), { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ path: dirPath }), + }); + if (!response.ok) { + const data = await response.json().catch(() => ({})); + throw new Error(data.error ?? `Create directory failed (${response.status})`); + } +}; +``` + +## Resource Requirements + +Each volume key requires a resource with `WRITE_VOLUME` permission. Declare in `databricks.yml`: + +```yaml +resources: + apps: + my_app: + user_api_scopes: + - files.files # Needed when using .asUser(req) programmatic API + resources: + - name: uploads-volume + volume: + path: /Volumes/catalog/schema/uploads + permission: WRITE_VOLUME +``` + +> **Note:** The scaffolded HTTP routes (`/api/files/...`) execute as the service principal and do not require `user_api_scopes`. 
The scope is needed when using the programmatic `appkit.files("key").asUser(req)` API for per-user Volume access. + +Wire the env var in `app.yaml`: + +```yaml +env: + - name: DATABRICKS_VOLUME_UPLOADS + valueFrom: uploads-volume +``` + +## Troubleshooting + +| Error | Cause | Solution | +| --- | --- | --- | +| `Unknown volume key "X"` | Volume env var not set or misspelled | Check `DATABRICKS_VOLUME_X` is set in `app.yaml` or `.env` | +| 413 on upload | File exceeds `maxUploadSize` | Increase `maxUploadSize` in plugin config or per-volume config | +| `read()` rejects large file | File > 10 MB default limit | Use `download()` for large files or pass `{ maxSize: }` | +| Blocked content type on `/raw` | Dangerous MIME type (html, js, svg) | Use `/download` instead — these types are forced to attachment | +| Service principal access blocked | Called volume method without `.asUser(req)` | Always use `appkit.files("key").asUser(req).method()` | +| `path traversal` error | Path contains `../` | Use relative paths from volume root or absolute `/Volumes/...` paths | diff --git a/.claude/skills/databricks-apps/references/appkit/frontend.md b/.claude/skills/databricks-apps/references/appkit/frontend.md new file mode 100644 index 0000000..fc617f6 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/frontend.md @@ -0,0 +1,174 @@ +# Frontend Guidelines + +**For full component API**: run `npx @databricks/appkit docs` and navigate to the component you need. 
+ +## Common Anti-Patterns + +These mistakes appear frequently — check the official docs for actual prop names: + +| Mistake | Why it's wrong | What to do | +|---------|---------------|------------| +| `xAxisKey`, `dataKey` on charts | Recharts naming, not AppKit | Use `xKey`, `yKey` (auto-detected from schema if omitted) | +| `yAxisKeys`, `yKeys` on charts | Recharts naming | Use `yKey` (string or string[]) | +| `config` on charts | Not a valid prop name | Use `options` for ECharts overrides | +| ``, `` children | AppKit charts are ECharts-based, NOT Recharts wrappers — configure via props only | | +| `columns` on DataTable | DataTable auto-generates columns from data | Use `queryKey` + `parameters`; use `transform` for formatting | +| Double-fetching with `useAnalyticsQuery` + chart component | Components handle their own fetching | Just pass `queryKey` to the component | + +**Always verify props against docs before using a component.** + +## Chart Data Modes + +All chart/data components support two modes: + +- **Query mode**: pass `queryKey` + `parameters` — component fetches data automatically. `parameters` is REQUIRED even if empty (`parameters={{}}`). +- **Data mode**: pass static data via `data` prop (JSON array or Arrow Table) — no `queryKey`/`parameters` needed. 
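
One way to picture the two modes is as a discriminated pair of prop shapes. The types below are a hypothetical model for illustration only: they are not AppKit's real prop types, and they cover only the JSON-array form of `data`, not Arrow Tables.

```typescript
// Hypothetical model of the two modes; AppKit's actual prop types may differ.
type QueryModeProps = {
  queryKey: string;
  parameters: Record<string, unknown>; // required even when empty: {}
  data?: never; // static data is not combined with a queryKey
};

type DataModeProps = {
  data: ReadonlyArray<Record<string, unknown>>;
  queryKey?: never;
  parameters?: never;
};

type ChartSourceProps = QueryModeProps | DataModeProps;

// Report which mode a given set of props would select.
function describeMode(props: ChartSourceProps): "query" | "data" {
  return props.queryKey !== undefined ? "query" : "data";
}

console.log(describeMode({ queryKey: "sales_data", parameters: {} }));
console.log(describeMode({ data: [{ region: "EMEA", revenue: 10 }] }));
```

With this model, omitting `parameters` in query mode or passing both `queryKey` and `data` would be rejected by the type checker, which matches the rules stated in the list above.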
+ +```tsx +// Query mode (recommended for Databricks SQL) + + +// Data mode (static/pre-fetched data) + +``` + +## Chart Props Quick Reference + +All charts accept these core props (verify full list via `npx @databricks/appkit docs`): + +```tsx + d} // transform raw data before rendering + colors={['#40d1f5']} // custom colors (overrides colorPalette) + colorPalette="categorical" // "categorical" | "sequential" | "diverging" + title="Sales by Region" // chart title + showLegend // show legend + options={{}} // additional ECharts options to merge + height={400} // default: 300 + orientation="vertical" // "vertical" | "horizontal" (BarChart/LineChart/AreaChart) + stacked // stack bars/areas (BarChart/AreaChart) +/> + + +``` + +Charts are **ECharts-based** — configure via props, not Recharts-style children. Components handle data fetching, loading, and error states internally. + +> ⚠️ **`parameters` is REQUIRED on all data components**, even when the query has no params. Always include `parameters={{}}`. + +```typescript +// ❌ Don't double-fetch +const { data } = useAnalyticsQuery('sales_data', {}); +return ; // fetches again! +``` + +## DataTable + +DataTable auto-generates columns from data and handles fetching, loading, error, and empty states. + +**For full props**: `npx @databricks/appkit docs "DataTable"`. + +```tsx +// ❌ WRONG - missing required `parameters` prop + + +// ✅ CORRECT - minimal + + +// ✅ CORRECT - with filtering and pagination + + +// ✅ CORRECT - with row selection + console.log(selection)} +/> +``` + +**Custom column formatting** — use the `transform` prop or format in SQL: + +```typescript + data.map(row => ({ + ...row, + price: `$${Number(row.price).toFixed(2)}`, + }))} +/> +``` + +## Available Components (Quick Reference) + +**For full prop details**: `npx @databricks/appkit docs "appkit-ui API reference"`. + +All data components support both query mode (`queryKey` + `parameters`) and data mode (static `data` prop). 
Common props across all charts: `format`, `transformer`, `colors`, `colorPalette`, `title`, `showLegend`, `height`, `options`, `ariaLabel`, `testId`. + +### Data Components (`@databricks/appkit-ui/react`) + +| Component | Extra Props | Use For | +|-----------|-------------|---------| +| `BarChart` | `xKey`, `yKey`, `orientation`, `stacked` | Categorical comparisons | +| `LineChart` | `xKey`, `yKey`, `smooth`, `showSymbol`, `orientation` | Time series, trends | +| `AreaChart` | `xKey`, `yKey`, `smooth`, `showSymbol`, `stacked`, `orientation` | Cumulative/stacked trends | +| `PieChart` | `xKey`, `yKey`, `innerRadius`, `showLabels`, `labelPosition` | Part-of-whole | +| `DonutChart` | `xKey`, `yKey`, `innerRadius`, `showLabels`, `labelPosition` | Donut (pie with inner radius) | +| `ScatterChart` | `xKey`, `yKey`, `symbolSize` | Correlation, distribution | +| `HeatmapChart` | `xKey`, `yKey`, `yAxisKey`, `min`, `max`, `showLabels` | Matrix-style data | +| `RadarChart` | `xKey`, `yKey`, `showArea` | Multi-dimensional comparison | +| `DataTable` | `filterColumn`, `filterPlaceholder`, `transform`, `pageSize`, `enableRowSelection`, `children` | Tabular data display | + +### UI Components (`@databricks/appkit-ui/react`) + +| Component | Common Props | +|-----------|-------------| +| `Card`, `CardHeader`, `CardTitle`, `CardContent` | Standard container | +| `Badge` | `variant`: "default" \| "secondary" \| "destructive" \| "outline" | +| `Button` | `variant`, `size`, `onClick` | +| `Input` | `placeholder`, `value`, `onChange` | +| `Select`, `SelectTrigger`, `SelectContent`, `SelectItem` | Dropdown; `SelectItem` value cannot be "" | +| `Skeleton` | `className` — use for loading states | +| `Separator` | Visual divider | +| `Tabs`, `TabsList`, `TabsTrigger`, `TabsContent` | Tabbed interface | + +All data components **require `parameters={{}}`** even when the query has no params. + +## Layout Structure + +```tsx +
+

Page Title

+
{/* form inputs */}
+
{/* list items */}
+
+``` + +## Component Organization + +- Shared UI components: `@databricks/appkit-ui/react` +- Feature components: `client/src/components/FeatureName.tsx` +- Split components when logic exceeds ~100 lines or component is reused + +## Gotchas + +- `SelectItem` cannot have `value=""`. Use sentinel value like `"all"` for "show all" options. +- Use `` components instead of plain "Loading..." text +- Handle nullable fields: `value={field || ''}` for inputs +- For maps with React 19, use react-leaflet v5: `npm install react-leaflet@^5.0.0 leaflet @types/leaflet` + +Databricks brand colors: `['#40d1f5', '#4462c9', '#EB1600', '#0B2026', '#4A4A4A', '#353a4a']` diff --git a/.claude/skills/databricks-apps/references/appkit/genie.md b/.claude/skills/databricks-apps/references/appkit/genie.md new file mode 100644 index 0000000..21734d1 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/genie.md @@ -0,0 +1,303 @@ +# AppKit Genie Guide + +Use Genie when your app needs a **natural language query interface** over Unity Catalog tables. For analytics dashboards, use `config/queries/` instead. For persistent storage, use Lakebase. + +## When to Use + +| Pattern | Use Case | Data Source | +|---------|----------|-------------| +| Analytics | Read-only dashboards, charts, KPIs | SQL Warehouse | +| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase) | +| Model Serving | Chat, AI features, model inference | Serving Endpoint | +| Genie | Natural language queries over tables | Genie Space → SQL Warehouse | +| Multiple | Combine plugins as needed | Mix of the above | + +## Architecture + +```text +User (browser) -> AppKit genie plugin (/api/genie/...) -> Databricks Genie API -> SQL Warehouse + <- SSE stream (status, message_result, query_result) <- +``` + +The built-in `genie()` plugin from `@databricks/appkit` proxies requests via SSE streaming. It reads the space ID from the `DATABRICKS_GENIE_SPACE_ID` env var. Call `genie()` with no arguments. 
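
For custom handling of the stream, the event names and payload fields listed in the SSE Event Types table of this guide can be modeled as a tagged union. This is a sketch under assumptions: the string types for the IDs, the `ChatProgress` shape, and the `applyEvent` reducer are all hypothetical and not part of AppKit.

```typescript
// Event names and payload field names follow the SSE Event Types table;
// the field types are assumptions.
type GenieEvent =
  | { event: "message_start"; payload: { conversationId: string; messageId: string; spaceId: string } }
  | { event: "status"; payload: { status: string } }
  | { event: "message_result"; payload: { content: unknown; attachments: unknown } }
  | { event: "query_result"; payload: { attachmentId: string; statementId: string; data: unknown } }
  | { event: "error"; payload: { error: unknown } };

// Hypothetical UI state tracked while a message streams.
interface ChatProgress {
  conversationId?: string;
  lastStatus?: string;
  done: boolean;
  failed: boolean;
}

// Fold one event into the current state.
function applyEvent(state: ChatProgress, evt: GenieEvent): ChatProgress {
  switch (evt.event) {
    case "message_start":
      return { ...state, conversationId: evt.payload.conversationId };
    case "status":
      return { ...state, lastStatus: evt.payload.status };
    case "message_result":
      return { ...state, done: true };
    case "query_result":
      return state; // tabular data would be rendered elsewhere
    case "error":
      return { ...state, done: true, failed: true };
  }
}

const sampleEvents: GenieEvent[] = [
  { event: "message_start", payload: { conversationId: "c1", messageId: "m1", spaceId: "s1" } },
  { event: "status", payload: { status: "EXECUTING_QUERY" } },
  { event: "message_result", payload: { content: "done", attachments: [] } },
];
const initialProgress: ChatProgress = { done: false, failed: false };
const finalProgress = sampleEvents.reduce<ChatProgress>(applyEvent, initialProgress);
console.log(finalProgress);
```

In most apps the `GenieChat` component or the `useGenieChat` hook already does this work, so a hand-written reducer like this is only relevant for fully custom UIs.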
+ +## Genie Space Creation + +The `databricks genie create-space` command takes two positional arguments: `WAREHOUSE_ID` and `SERIALIZED_SPACE` (a JSON string). + +```bash +databricks genie create-space \ + '{"version":2,"data_sources":{"tables":[{"identifier":"catalog.schema.orders"},{"identifier":"catalog.schema.customers"}]}}' \ + --title "Sales Assistant" \ + --description "Answers sales analytics questions" \ + --profile +``` + +The JSON must include `version` and `data_sources.tables` with each table as `{"identifier":"catalog.schema.table"}`. Optional flags: `--title`, `--description`, `--parent-path`. + +To discover the full serialized space format (including optional fields), export an existing space: + +```bash +databricks genie get-space --include-serialized-space --profile +``` + +Discover warehouse ID with: + +```bash +databricks experimental aitools tools get-default-warehouse --profile +``` + +## Scaffolding a New Genie App + +```bash +# 1. Discover warehouse +databricks experimental aitools tools get-default-warehouse --profile + +# 2. Create Genie space (see syntax above) +databricks genie create-space '' \ + --title "My Space" --profile + +# 3. Check manifest for genie plugin keys +databricks apps manifest --profile + +# 4. Scaffold (derive --set keys from manifest output) +databricks apps init --name --features genie \ + --set "genie..=" \ + --run none --profile + +# 5. Set local env + develop +cd +echo "DATABRICKS_GENIE_SPACE_ID=" >> server/.env +npm install && npm run dev +``` + +**Do not guess** `--set` flags — always derive from `databricks apps manifest`. + +## Adding Genie to an Existing App + +**`databricks.yml`** — add Genie variables and resource: + +```yaml +variables: + genie_space_id: + description: Genie Space ID + genie_space_name: + description: Genie Space name + +resources: + apps: + app: + resources: + # ... existing resources ... 
+ - name: genie-space + genie_space: + name: ${var.genie_space_name} + space_id: ${var.genie_space_id} + permission: CAN_RUN + +targets: + default: + variables: + genie_space_id: + genie_space_name: +``` + +**`app.yaml`** — add env injection: + +```yaml +env: + # ... existing env vars ... + - name: DATABRICKS_GENIE_SPACE_ID + valueFrom: genie-space +``` + +**`server/server.ts`** — register the plugin: + +```typescript +import { createApp, server, analytics, genie } from "@databricks/appkit"; + +createApp({ + plugins: [server(), analytics(), genie()], +}).catch(console.error); +``` + +Preserve existing plugins and add `genie()` to the array. + +**`server/.env`** — for local development: + +```dotenv +DATABRICKS_GENIE_SPACE_ID= +``` + +**Frontend** — add the chat component: + +```tsx +import { GenieChat } from "@databricks/appkit-ui/react"; + +function GeniePage() { + return ( +
+ +
+ ); +} +``` + +Update smoke tests if headings or routes changed, then `databricks apps validate`. + +For advanced Genie plugin usage, see `npx @databricks/appkit docs ./docs/plugins/genie.md`. + +## Multi-Space Deployment + +For the `spaces` map API, `GenieChat alias` prop, and `useGenieChat` hook, see `npx @databricks/appkit docs ./docs/plugins/genie.md`. + +This section covers the **deployment-specific patterns** for multi-space Genie apps (databricks.yml, app.yaml, stale conversation cleanup). + +**databricks.yml** — add one variable + resource per space, plus target-level values: + +```yaml +variables: + genie_space_id: + description: Default Genie space ID (required by AppKit) + genie_space_name: + description: Default Genie space name + genie_space_sales_id: + description: Sales Genie space ID + genie_space_support_id: + description: Support Genie space ID + +resources: + apps: + app: + user_api_scopes: + - dashboards.genie + resources: + - name: genie-space + genie_space: + name: ${var.genie_space_name} + space_id: ${var.genie_space_id} + permission: CAN_RUN + - name: genie-space-sales + genie_space: + name: genie-space-sales + space_id: ${var.genie_space_sales_id} + permission: CAN_RUN + - name: genie-space-support + genie_space: + name: genie-space-support + space_id: ${var.genie_space_support_id} + permission: CAN_RUN + +targets: + default: + variables: + genie_space_id: + genie_space_name: + genie_space_sales_id: + genie_space_support_id: +``` + +**app.yaml** — keep `DATABRICKS_GENIE_SPACE_ID` (AppKit validates it on startup). Add one `valueFrom` per UI space: + +```yaml +env: + - name: DATABRICKS_GENIE_SPACE_ID + valueFrom: genie-space + - name: DATABRICKS_GENIE_SPACE_SALES + valueFrom: genie-space-sales + - name: DATABRICKS_GENIE_SPACE_SUPPORT + valueFrom: genie-space-support +``` + +**Critical gotcha**: `DATABRICKS_GENIE_SPACE_ID` must always be set — AppKit validates it on startup even when using a custom `spaces` map. 
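
A local preflight check can surface a missing variable before AppKit's own startup validation does. The sketch below is hypothetical: `missingEnvVars` is not an AppKit API, and the variable list simply repeats the names from the `app.yaml` example above.

```typescript
// Hypothetical preflight helper; AppKit performs its own validation at startup.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[],
): string[] {
  // A variable counts as missing when it is unset or blank.
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

const requiredGenieVars = [
  "DATABRICKS_GENIE_SPACE_ID", // always required, even with a custom spaces map
  "DATABRICKS_GENIE_SPACE_SALES",
  "DATABRICKS_GENIE_SPACE_SUPPORT",
];

// Example: only the sales space is configured.
const missingVars = missingEnvVars(
  { DATABRICKS_GENIE_SPACE_SALES: "sales-space-id" },
  requiredGenieVars,
);
console.log(missingVars);
```

In a real server entry point the first argument would typically be `process.env`.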
+ +**Build version stamp** — stamp every build so the page can detect a new deployment and clear stale conversation state: + +```typescript +// client/vite.config.ts +export default defineConfig({ + // ... existing config ... + define: { + "import.meta.env.VITE_APP_VERSION": JSON.stringify(Date.now().toString()), + }, +}); +``` + +**Stale conversation cleanup** — `GenieChat` stores conversation IDs in URLs and localStorage that become stale across space switches or redeployments: + +```typescript +function clearConversationUrl() { + const url = new URL(window.location.href); + url.searchParams.delete("conversationId"); + window.history.replaceState({}, "", url.toString()); +} + +function initAlias(): string { + const buildVersion = import.meta.env.VITE_APP_VERSION ?? "dev"; + if (localStorage.getItem("appkit:genie:version") !== buildVersion) { + const savedAlias = localStorage.getItem("appkit:genie:alias"); + Object.keys(localStorage) + .filter((k) => k.startsWith("appkit:genie:")) + .forEach((k) => localStorage.removeItem(k)); + localStorage.setItem("appkit:genie:version", buildVersion); + if (savedAlias) localStorage.setItem("appkit:genie:alias", savedAlias); + clearConversationUrl(); + } + // SPACES: array of {alias, spaceId} defined in your component + return localStorage.getItem("appkit:genie:alias") ?? SPACES[0]?.alias ?? ""; +} +``` + +## Frontend + +**For full component API**: run `npx @databricks/appkit docs "GenieChat"`. + +The `GenieChat` component handles SSE streaming, conversation state, history replay, and query result rendering. For custom UI, use the `useGenieChat` hook — see `npx @databricks/appkit docs "useGenieChat"`. 
+ +Common anti-patterns: + +| Mistake | Why it's wrong | What to do | +|---------|---------------|------------| +| No explicit height on parent container | Chat collapses to zero height | Give the parent a fixed height (`style={{ height: 600 }}` or CSS class) | +| Old local Genie proxy file | Duplicate routes, import confusion | Remove it — use `genie` from `@databricks/appkit` | +| Manual SSE reimplementation | Extra complexity, bugs | Use `GenieChat` or `useGenieChat` | +| Missing `whitespace-pre-wrap` in custom UI | Explanation text renders on one line | Add `whitespace-pre-wrap` to message content | + +## HTTP Endpoints + +The plugin mounts SSE endpoints under `/api/genie`: + +| Route | Method | Purpose | +|-------|--------|---------| +| `/api/genie/:alias/messages` | `POST` | Send a message and stream results | +| `/api/genie/:alias/conversations/:conversationId` | `GET` | Replay an existing conversation | + +### SSE Event Types + +| Event | Payload | Description | +|-------|---------|-------------| +| `message_start` | `{ conversationId, messageId, spaceId }` | IDs assigned | +| `status` | `{ status: "ASKING_AI" \| "EXECUTING_QUERY" \| ... 
}` | Progress | +| `message_result` | `{ content, attachments }` | Final message | +| `query_result` | `{ attachmentId, statementId, data }` | Tabular results | +| `error` | `{ error }` | Error details | + +### Attachment Types + +| Key | Meaning | +|-----|---------| +| `query` | Generated SQL plus metadata | +| `text` | Natural-language explanation | +| `suggestedQuestions` | Follow-up prompts | + +## Troubleshooting + +| Error | Cause | Solution | +|-------|-------|---------| +| `create-space` fails with "Cannot find field" | Wrong `serialized_space` JSON format | Use `{"version":2,"data_sources":{"tables":[{"identifier":"..."}]}}` — export an existing space to verify | +| `plugin "genie" has no resource with key "..."` | Wrong `--set` flags during scaffold | Always derive resource keys from `databricks apps manifest` | +| Chat collapses or renders poorly | No explicit height on container | Give the parent a fixed height | +| Duplicate routes or import confusion | Old local Genie proxy file | Remove it — use `genie` from `@databricks/appkit` | +| `does not have required scopes: genie` | Missing API scope | Confirm `user_api_scopes` includes `dashboards.genie` in `databricks.yml` and redeploy | +| Genie space not found | Wrong space ID | Verify space ID matches the value on the Genie space **About** tab | +| `valueFrom` mismatch | `app.yaml` value doesn't match `databricks.yml` | `valueFrom` in `app.yaml` must exactly match the resource `name` in `databricks.yml` | diff --git a/.claude/skills/databricks-apps/references/appkit/jobs.md b/.claude/skills/databricks-apps/references/appkit/jobs.md new file mode 100644 index 0000000..48b632a --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/jobs.md @@ -0,0 +1,141 @@ +# Jobs: Trigger Lakeflow Jobs from Apps + +**For full Jobs plugin API (routes, types, config options)**: run `npx @databricks/appkit docs` → Jobs plugin. 
+ +Use the `jobs()` plugin when your app needs to **trigger or monitor pre-existing Databricks Lakeflow Jobs** (notebooks, Python scripts, SQL, dbt, JARs) and surface their status to users. The jobs themselves still live as regular Lakeflow Jobs in the workspace — the plugin is the typed, resource-scoped accessor that lets app code start runs, poll status, and stream completion events. + +The plugin is **resource-scoped**: only jobs declared via config or discovered from `DATABRICKS_JOB_*` env vars are accessible. It is not a generic Jobs SDK wrapper — to author or schedule jobs, use the `databricks-jobs` (Lakeflow) skill instead. See [`overview.md`](./overview.md) for the cross-plugin data-pattern selector. + +## Scaffolding + +```bash +databricks apps init --name --features jobs \ + --set "jobs..=" \ + --run none --profile +``` + +**Do not guess** `--set` keys — derive them from `databricks apps manifest --profile ` (look up the `jobs` plugin's `resources.required` entries). + +Multi-job and analytics+jobs are common combinations: + +```bash +databricks apps init --name --features analytics,jobs \ + --set "analytics.sql-warehouse.id=" \ + --set "jobs..=" \ + --run none --profile +``` + +Configure job IDs via environment variables in `app.yaml` (deployed) or `server/.env` (local dev): + +```env +# Single-job mode → exposed under the "default" key +DATABRICKS_JOB_ID=123456789 + +# Multi-job mode → exposed under lowercased keys ("etl", "ml") +DATABRICKS_JOB_ETL=123456789 +DATABRICKS_JOB_ML=987654321 +``` + +The env var suffix (after `DATABRICKS_JOB_`) becomes the job key, lowercased. Explicit `jobs` config in `createApp()` is merged with env-discovered jobs; explicit config wins on key collisions. 
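
The discovery and merge rules above can be sketched as follows. `discoverJobs`, `mergeJobs`, and the `JobEntry` shape are hypothetical names for illustration, not AppKit APIs. The text does not say whether a colliding entry is merged field by field or replaced wholesale; this sketch assumes field-level merging, so an explicit entry without an ID keeps the discovered one.

```typescript
// Hypothetical shapes; the real JobConfig has more fields.
type JobEntry = { id?: string; taskType?: string };
type JobMap = Record<string, JobEntry>;

const JOB_PREFIX = "DATABRICKS_JOB_";

// DATABRICKS_JOB_ID -> "default"; DATABRICKS_JOB_<NAME> -> "<name>" lowercased.
function discoverJobs(env: Record<string, string | undefined>): JobMap {
  const jobs: JobMap = {};
  for (const [name, value] of Object.entries(env)) {
    if (!name.startsWith(JOB_PREFIX) || !value) continue;
    const suffix = name.slice(JOB_PREFIX.length);
    if (!suffix) continue;
    const key = suffix === "ID" ? "default" : suffix.toLowerCase();
    jobs[key] = { id: value };
  }
  return jobs;
}

// Explicit config wins on key collisions (assumed here to be per field).
function mergeJobs(discovered: JobMap, explicit: JobMap): JobMap {
  const merged: JobMap = { ...discovered };
  for (const [key, entry] of Object.entries(explicit)) {
    merged[key] = { ...merged[key], ...entry };
  }
  return merged;
}

const discoveredJobs = discoverJobs({
  DATABRICKS_JOB_ETL: "123456789",
  DATABRICKS_JOB_ML: "987654321",
});
const effectiveJobs = mergeJobs(discoveredJobs, { etl: { taskType: "notebook" } });
console.log(effectiveJobs);
```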
+ +## Plugin Setup + +Minimal — discovers all jobs from the environment: + +```typescript +import { createApp, server, jobs } from "@databricks/appkit"; + +await createApp({ + plugins: [server(), jobs()], +}); +``` + +With per-job validation and task-type mapping: + +```typescript +import { createApp, server, jobs } from "@databricks/appkit"; +import { z } from "zod"; + +const appkit = await createApp({ + plugins: [ + server(), + jobs({ + jobs: { + etl: { + taskType: "notebook", + params: z.object({ + startDate: z.string(), + endDate: z.string(), + dryRun: z.boolean().optional(), + }), + }, + }, + }), + ], +}); +``` + +For the full `IJobsConfig`, `JobConfig`, and task-type → SDK parameter mapping, run `npx @databricks/appkit docs Jobs plugin`. Two non-obvious points: `dbt` accepts no parameters, and `notebook`/`python_wheel`/`sql` coerce all param values to strings before forwarding. + +## Server-Side API (Programmatic) + +`appkit.jobs(key)` returns a `JobHandle`. All methods return `ExecutionResult` — **always check `.ok` before reading `.data`**. Full method list and types: `npx @databricks/appkit docs Jobs plugin`. + +```typescript +const etl = appkit.jobs("etl"); + +// One-shot trigger +const result = await etl.runNow({ startDate: "2025-01-01" }); +if (!result.ok) throw new Error(`Run failed: ${result.error}`); + +// Trigger and stream status until completion (async iterable, SSE-backed) +for await (const status of etl.runAndWait({ startDate: "2025-01-01" })) { + console.log(status.status); // "PENDING" | "RUNNING" | "TERMINATED" | ... +} +``` + +Read methods (`lastRun`, `listRuns`, `getRun`, `getRunOutput`, `getJob`) and `cancelRun` follow the same `ExecutionResult` shape. Reads cache for 60s with 3 retries. `runAndWait` has a 600s server-side cap, but **client-facing requests are bounded by the Apps platform's 120s reverse-proxy timeout** (see [Platform Guide](../platform-guide.md), "HTTP Proxy & Streaming"). 
For runs longer than ~120s, use `runNow` and poll `getRun` (or `GET /api/jobs/:jobKey/status`) from separate short-lived requests instead of streaming. + +### Execution context + +All operations run as the **app's service principal**. The resource binding in `databricks.yml` grants the SP `CAN_MANAGE_RUN`, so users trigger runs without needing their own grant. Per-run attribution in the Jobs UI shows the SP, not the human user. The plugin does not support on-behalf-of (OBO) user execution. + +## HTTP Endpoints + +Routes mount at `/api/jobs/:jobKey/...` — full route list, request bodies, and SSE frame shape via `npx @databricks/appkit docs Jobs plugin`. The streaming endpoint (`POST /api/jobs/:jobKey/run?stream=true`) emits `data: ` events terminated by a blank line (`\n\n`), where the JSON is `{ status, timestamp, run }`; **clients must buffer until `\n\n` and reassemble across chunk boundaries** before parsing. `runAndWait` (server-side) honors `req.signal` and aborts cleanly on client disconnect — but the platform's 120s reverse-proxy cap applies regardless. + +## Resource Requirements + +Each job key requires a `job` resource with `CAN_MANAGE_RUN` in `databricks.yml`: + +```yaml +resources: + apps: + my_app: + resources: + - name: etl-job + job: + id: ${var.etl_job_id} + permission: CAN_MANAGE_RUN +``` + +Wire the env var in `app.yaml`: + +```yaml +env: + - name: DATABRICKS_JOB_ETL + valueFrom: etl-job +``` + +Verify exact `--set` keys and resource shape via `databricks apps manifest --profile `. 
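The buffering requirement from the HTTP Endpoints section can be sketched as a small frame accumulator. This is a minimal sketch under stated assumptions: `SseFrameBuffer` and `JobStatusEvent` are hypothetical names, not AppKit exports, and the field types are guesses based on the `{ status, timestamp, run }` shape described above.

```typescript
// Assumed event shape, taken from the { status, timestamp, run } description.
// The concrete field types are guesses.
interface JobStatusEvent {
  status: string;
  timestamp: string | number;
  run: unknown;
}

// Hypothetical client-side helper: accumulates raw text chunks and emits
// parsed events only once a full frame (terminated by "\n\n") has arrived.
class SseFrameBuffer {
  private buffer = "";

  push(chunk: string): JobStatusEvent[] {
    this.buffer += chunk;
    const events: JobStatusEvent[] = [];
    let end = this.buffer.indexOf("\n\n");
    while (end !== -1) {
      const frame = this.buffer.slice(0, end);
      this.buffer = this.buffer.slice(end + 2);
      // Keep only "data:" lines and strip the field name.
      const payload = frame
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice("data:".length).trimStart())
        .join("\n");
      if (payload !== "") {
        events.push(JSON.parse(payload) as JobStatusEvent);
      }
      end = this.buffer.indexOf("\n\n");
    }
    return events;
  }
}

// A frame split across two network chunks is only parsed once it is complete.
const frames = new SseFrameBuffer();
const partial = frames.push('data: {"status":"RUN');
const complete = frames.push('NING","timestamp":1,"run":{}}\n\n');
console.log(partial.length, complete.length);
```

In a browser client, the chunks would typically come from reading `response.body` and decoding with `TextDecoder` in streaming mode before calling `push`.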
+ +## Troubleshooting + +| Error | Cause | Solution | +| --- | --- | --- | +| `Unknown job key "X"` | Job env var not set or misspelled | Check `DATABRICKS_JOB_X` is set in `app.yaml` or `server/.env` | +| 400 with Zod issues on `runNow` | Params don't match the per-job `params` schema | Fix the input or relax the schema | +| `dbt` job rejects params | `dbt` task type accepts no parameters | Trigger with no params, or remove `taskType: "dbt"` | +| 504 / timeout on `runAndWait` | Run exceeds the platform's 120s reverse-proxy timeout (server-side cap is 600s but the proxy cuts first) | Switch to `runNow` + poll `getRun` (or `GET /api/jobs/:jobKey/status`) from separate short-lived requests; raising `waitTimeout` does not help | +| SSE events arrive split / unparseable | Client not reassembling `data:` frames across chunks | Buffer until `\n\n`, then parse — see streaming pattern above | +| `result.data` is undefined | `result.ok` was false but the caller skipped the check | Always branch on `result.ok` before reading `result.data` | diff --git a/.claude/skills/databricks-apps/references/appkit/lakebase.md b/.claude/skills/databricks-apps/references/appkit/lakebase.md new file mode 100644 index 0000000..1f18c45 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/lakebase.md @@ -0,0 +1,438 @@ +# Lakebase: OLTP Database for Apps + +Use Lakebase when your app needs **persistent read/write storage** — forms, CRUD operations, user-generated data. For analytics dashboards reading from a SQL warehouse, use `config/queries/` instead. 
+ +## When to Use Lakebase vs Analytics + +| Pattern | Use Case | Data Source | +|---------|----------|-------------| +| Analytics | Read-only dashboards, charts, KPIs | Databricks SQL Warehouse | +| Lakebase | CRUD operations, persistent state, forms, low-latency reads of synced lakehouse data | PostgreSQL (Lakebase Autoscaling) | +| Both | Dashboard with user preferences/saved state | Warehouse + Lakebase | + +> **Serving lakehouse data to apps?** If your app needs low-latency reads of Delta/UC tables (entity lookups, product catalogs, feature serving), use **Lakebase synced tables** to materialize them into Lakebase instead of querying a SQL warehouse (which takes seconds to minutes). See *Reading from Synced Tables* below. + +## Scaffolding + +**Scaffolding is the fastest way to get started.** If you already have an app, see *Adding Lakebase to an Existing App* below. + +**Lakebase only** (no analytics SQL warehouse): +```bash +databricks apps init --name --features lakebase \ + --set "lakebase.postgres.branch=" \ + --set "lakebase.postgres.database=" \ + --run none --profile +``` + +**Both Lakebase and analytics**: +```bash +databricks apps init --name --features analytics,lakebase \ + --set "analytics.sql-warehouse.id=" \ + --set "lakebase.postgres.branch=" \ + --set "lakebase.postgres.database=" \ + --run none --profile +``` + +Where `` and `` are full resource names (e.g. `projects//branches/` and `projects//branches//databases/`). + +Use the `databricks-lakebase` skill to create a Lakebase project and discover branch/database resource names before running this command. + +> For multi-environment deployments (dev/prod), use `variables:` and `targets:` blocks in `databricks.yml` — see the **`databricks-dabs`** skill for patterns. + +**Naming conventions:** Use domain names for user-facing code (`ItemsPage.tsx`, `/api/items`, `item-routes.ts`). 
Keep `lakebase` naming only for infrastructure config (`lakebase()` plugin, `LAKEBASE_ENDPOINT`, `postgres` app resource). + +**Get resource names** (if you have an existing project): +```bash +# List branches → use the name field of a READY branch +databricks postgres list-branches projects/ --profile +# List databases → use the name field +databricks postgres list-databases projects//branches/ --profile +``` + +## Adding Lakebase to an Existing App + +**`databricks.yml`** — add Lakebase variables and resource: + +```yaml +variables: + lakebase_branch: + description: Lakebase branch resource name + lakebase_database: + description: Lakebase database resource name + +resources: + apps: + app: + resources: + # ... existing resources ... + - name: postgres + postgres: + branch: ${var.lakebase_branch} + database: ${var.lakebase_database} + +targets: + default: + variables: + lakebase_branch: projects//branches/ + lakebase_database: projects//branches//databases/ +``` + +Use the `databricks-lakebase` skill to create a Lakebase project and discover branch/database resource names. + +For per-user connections (OBO/RLS), also add `postgres` to `user_api_scopes` — see `npx @databricks/appkit docs ./docs/plugins/lakebase.md` for OBO setup. + +**`app.yaml`** — add env injection: + +```yaml +env: + # ... existing env vars ... + - name: LAKEBASE_ENDPOINT + valueFrom: postgres +``` + +Other Lakebase env vars (`PGHOST`, `PGPORT`, `PGDATABASE`, `PGUSER`, `PGSSLMODE`) are auto-injected by the platform when the `postgres` resource is configured. Only `LAKEBASE_ENDPOINT` must be set explicitly. + +**`server/server.ts`** — register the plugin: + +```typescript +import { createApp, server, analytics, lakebase } from "@databricks/appkit"; + +createApp({ + plugins: [server(), analytics(), lakebase()], +}).catch(console.error); +``` + +Preserve existing plugins and add `lakebase()` to the array. 
+ +**`server/.env`** — for local development: + +```dotenv +PGHOST= +PGPORT=5432 +PGDATABASE= +PGSSLMODE=require +LAKEBASE_ENDPOINT=projects//branches//endpoints/ +``` + +Get connection details from `databricks postgres get-endpoint`. See *Local Development* below for the full workflow. + +Deploy the app before local development — see *Local Development > Prerequisites* below. Update smoke tests if headings or routes changed, then `databricks apps validate`. + +## Project Structure (after `databricks apps init --features lakebase`) + +``` +my-app/ +├── server/ +│ └── server.ts # Backend with Lakebase plugin + Express routes +├── client/ +│ └── src/ +│ └── App.tsx # React frontend +├── app.yaml # Manifest with database resource declaration +└── package.json # Includes @databricks/lakebase dependency +``` + +Note: **No `config/queries/` directory** — Lakebase apps use server-side `appkit.lakebase.query()` calls, not SQL files. + +## Lakebase Plugin API + +Scaffolding with `--features lakebase` (see above) generates this pattern. Access Lakebase through the plugin handle returned by `createApp()`: + +```typescript +import { createApp, lakebase } from "@databricks/appkit"; + +const appkit = await createApp({ + plugins: [lakebase()], +}); + +// Query via the plugin handle — handles pooling and token refresh automatically +const result = await appkit.lakebase.query("SELECT * FROM users WHERE id = $1", [userId]); +``` + +The `lakebase()` plugin auto-configures from platform-injected env vars at deploy time. No manual pool setup needed. 
+ +## Environment Variables (auto-set when deployed with database resource) + +| Variable | Description | +|----------|-------------| +| `PGHOST` | Lakebase hostname | +| `PGPORT` | Port (default 5432) | +| `PGDATABASE` | Database name | +| `PGUSER` | Service principal client ID | +| `PGSSLMODE` | SSL mode (`require`) | +| `LAKEBASE_ENDPOINT` | Endpoint resource path | + +## CRUD Routes Pattern + +Always use server-side routes for Lakebase operations — do NOT call `appkit.lakebase.query()` from the client. Use `onPluginsReady` to initialize the schema and register Express routes: + +```typescript +// server/server.ts +import { createApp, server, lakebase } from "@databricks/appkit"; +import { z } from 'zod'; + +await createApp({ + plugins: [server(), lakebase()], + async onPluginsReady(appkit) { + // Schema init (runs once before server accepts requests) + await appkit.lakebase.query(` + CREATE SCHEMA IF NOT EXISTS app_data; + CREATE TABLE IF NOT EXISTS app_data.items ( + id SERIAL PRIMARY KEY, + name TEXT NOT NULL, + created_at TIMESTAMPTZ DEFAULT NOW() + ); + `); + + // CRUD routes via Express + appkit.server.extend((app) => { + app.get('/api/items', async (_req, res) => { + const { rows } = await appkit.lakebase.query( + "SELECT * FROM app_data.items ORDER BY created_at DESC LIMIT 100" + ); + res.json(rows); + }); + + app.post('/api/items', async (req, res) => { + const parsed = z.object({ name: z.string().min(1) }).safeParse(req.body); + if (!parsed.success) { res.status(400).json({ error: 'Invalid input' }); return; } + const { rows } = await appkit.lakebase.query( + "INSERT INTO app_data.items (name) VALUES ($1) RETURNING *", + [parsed.data.name] + ); + res.status(201).json(rows[0]); + }); + + app.delete('/api/items/:id', async (req, res) => { + const id = parseInt(req.params.id, 10); + if (isNaN(id)) { res.status(400).json({ error: 'Invalid id' }); return; } + await appkit.lakebase.query("DELETE FROM app_data.items WHERE id = $1", [id]); + 
res.status(204).send(); + }); + }); + }, +}); +``` + +> **Deploy first (App + Lakebase only)!** When your Databricks App uses Lakebase, the Service Principal must create and own the schema. Run `databricks apps deploy` before any local development. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for details. + +## Schema Initialization + +**Always create a custom schema** — the Service Principal cannot access any existing schemas (including `public`). It must create the schema itself to become its owner. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full permission model and deploy-first workflow. Initialize tables inside the `onPluginsReady` callback before registering routes (see CRUD pattern above): + +```typescript +// Inside onPluginsReady — runs once at startup before handling requests +await appkit.lakebase.query(` + CREATE SCHEMA IF NOT EXISTS app_data; + CREATE TABLE IF NOT EXISTS app_data.items ( + id SERIAL PRIMARY KEY, + name TEXT NOT NULL, + created_at TIMESTAMPTZ DEFAULT NOW() + ); +`); +``` + +## ORM Integration (Optional) + +The plugin exposes the raw `pg.Pool` via `appkit.lakebase.pool` — works with any PostgreSQL library: + +```typescript +// Drizzle ORM +import { drizzle } from "drizzle-orm/node-postgres"; +const db = drizzle(appkit.lakebase.pool); + +// Prisma (with @prisma/adapter-pg) +import { PrismaPg } from "@prisma/adapter-pg"; +const adapter = new PrismaPg(appkit.lakebase.pool); +const prisma = new PrismaClient({ adapter }); +``` + +For ORM-compatible config: `appkit.lakebase.getOrmConfig()`. + +## Chat Persistence Pattern + +Save AI chat conversations to Lakebase so users can resume sessions and scroll full message history. 
+ +**Schema** — create in a separate `chat` schema (not `app`) so the deploy-first ownership model stays clean: + +```sql +CREATE SCHEMA IF NOT EXISTS chat; + +CREATE TABLE IF NOT EXISTS chat.chats ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + user_id TEXT NOT NULL, + title TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE TABLE IF NOT EXISTS chat.messages ( + id UUID PRIMARY KEY DEFAULT gen_random_uuid(), + chat_id UUID NOT NULL REFERENCES chat.chats(id) ON DELETE CASCADE, + role TEXT NOT NULL CHECK (role IN ('system', 'user', 'assistant', 'tool')), + content TEXT NOT NULL, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +CREATE INDEX IF NOT EXISTS idx_messages_chat_id_created_at + ON chat.messages(chat_id, created_at); +``` + +**Bootstrap** — run setup in `onPluginsReady` so tables exist before the server accepts requests: + +```typescript +await createApp({ + plugins: [server(), lakebase()], + async onPluginsReady(appkit) { + await setupChatTables(appkit); + // then register routes via appkit.server.extend(...) 
+ }, +}); +``` + +**Persistence helpers** — use parameterized queries: + +```typescript +export async function createChat(appkit, input: { userId: string; title: string }) { + const result = await appkit.lakebase.query( + `INSERT INTO chat.chats (user_id, title) VALUES ($1, $2) + RETURNING id, user_id, title, created_at, updated_at`, + [input.userId, input.title], + ); + return result.rows[0]; +} + +export async function appendMessage(appkit, input: { chatId: string; role: string; content: string }) { + const result = await appkit.lakebase.query( + `INSERT INTO chat.messages (chat_id, role, content) VALUES ($1, $2, $3) + RETURNING id, chat_id, role, content, created_at`, + [input.chatId, input.role, input.content], + ); + return result.rows[0]; +} +``` + +**User identity**: In deployed apps, use `req.header("x-forwarded-email")` (injected by the Databricks Apps platform proxy; for off-platform deployments, use your own auth middleware). For local dev, hardcode a test user ID. + +**History endpoints**: +- `GET /api/chats` — list chats for current user +- `GET /api/chats/:chatId/messages` — load ordered history +- `DELETE /api/chats/:chatId` — delete chat (messages cascade) + +**AI SDK v6 integration**: Use `setMessages()` from `useChat` return value for history loading (NOT `initialMessages`). To read response headers like `X-Chat-Id`, pass a custom `fetch` wrapper on the `TextStreamChatTransport` constructor. + +## Reading from Lakebase synced tables + +Lakebase synced tables materialize Delta/UC tables into Lakebase Postgres for low-latency app reads. The lakehouse remains the source of truth; Lakebase serves as a read-optimized index. 
+ +**Architecture:** +``` +Delta gold tables → Synced tables (read-only) → App reads via appkit.lakebase.query() +App writes → Lakebase OLTP tables → optional Lakehouse Sync → Delta +``` + +**Use synced tables when** data is curated in Delta, changes relatively slowly, and must be served at OLTP latency — operational consoles, user-facing apps on gold tables, feature serving, or hybrid read/write patterns. See the **`databricks-lakebase`** skill's [synced-tables.md](../../../databricks-lakebase/references/synced-tables.md) for the full decision checklist. + +> **Security note:** Synced tables do not propagate Unity Catalog fine-grained access control (row filters, column masks). If UC FGAC is critical, use DBSQL with user authorization instead. + +### How It Works + +Synced tables (created via `databricks postgres create-synced-table`) appear as regular Postgres tables. From the app's perspective, use the same `appkit.lakebase.query()` pattern but **read-only**. + +**Key differences from CRUD tables:** + +| | CRUD tables | Lakebase synced tables | +|--|-------------|---------------| +| Created by | App SP (via `CREATE TABLE`) | Sync pipeline (DLT) | +| Owned by | SP role | System role (`databricks_writer_*`) | +| Operations | Read + Write | **Read-only** (writes corrupt sync) | +| Schema init | App must `CREATE SCHEMA/TABLE` | Already exists after sync | +| Deploy-first | Required (SP must own schema) | Not required | + +**Permission grant required:** The app's SP has `CAN_CONNECT_AND_CREATE` but does **not** have `pg_read_all_data`. To read synced tables, the project owner must grant access — see the **`databricks-lakebase`** skill's SKILL.md "Grant app SP access to synced tables" section for the SQL commands and psql connection steps. + +**Example Express route reading synced taxi data:** + +```typescript +// Inside onPluginsReady → appkit.server.extend((app) => { ... 
}) +app.get('/api/top-pickups', async (_req, res) => { + const { rows } = await appkit.lakebase.query(` + SELECT pickup_zip, COUNT(*) AS trip_count, AVG(fare_amount) AS avg_fare + FROM public.nyc_trips + GROUP BY pickup_zip + ORDER BY trip_count DESC + LIMIT 10 + `); + res.json(rows); +}); +``` + +> **Do not write to synced tables.** The sync pipeline manages the data — direct writes corrupt the sync state. For mixed read/write patterns, read from synced tables and write to separate app-owned tables. To create synced tables and grant the app's SP read access, see the **`databricks-lakebase`** skill's [synced-tables.md](../../../databricks-lakebase/references/synced-tables.md) and the "Grant app SP access to synced tables" section in its SKILL.md. + +## Key Differences from Analytics Pattern + +| | Analytics | Lakebase | +|--|-----------|---------| +| SQL dialect | Databricks SQL (Spark SQL) | Standard PostgreSQL | +| Query location | `config/queries/*.sql` files | `appkit.lakebase.query()` in Express routes | +| Data retrieval | `useAnalyticsQuery` hook | Express route via `server.extend()` | +| Date functions | `CURRENT_TIMESTAMP()`, `DATEDIFF(DAY, ...)` | `NOW()`, `AGE(...)` | +| Auto-increment | N/A | `SERIAL` or `GENERATED ALWAYS AS IDENTITY` | +| Insert pattern | N/A | `INSERT ... VALUES ($1) RETURNING *` | +| Params | Named (`:param`) | Positional (`$1, $2, ...`) | + +**NEVER use `useAnalyticsQuery` for Lakebase data** — it queries the SQL warehouse, not Lakebase. +**NEVER put Lakebase SQL in `config/queries/`** — those files are only for warehouse queries. + +## Local Development + +### Prerequisites (MUST verify before local development) + +**This applies when your Databricks App uses Lakebase.** Run this check before any local development: + +```bash +databricks apps get --profile +``` + +Check the response for the `active_deployment` field. If it exists with `status.state` of `SUCCEEDED`, the app has been deployed. 
If `active_deployment` is missing, the app has never been deployed: +1. **STOP** — do not proceed with local development +2. Deploy first: `databricks apps deploy --profile ` +3. Wait for deployment to complete, then continue + +If you skip this step, the Service Principal won't own the database schema. You'll create schemas under your credentials that the SP **cannot access** after deployment. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for the full workflow and recovery steps. + +Lakebase project creators already have database access after the first deploy. Collaborators need `databricks_superuser` granted by the project creator via Branch Overview. + +> **Project-owner note:** If you are the Lakebase project owner, `databricks_create_role` may fail with "role already exists" and `GRANT databricks_superuser` may fail with "permission denied to grant role" — both errors are safe to ignore; the project owner already has the necessary access. + +The Lakebase env vars (`PGHOST`, `PGDATABASE`, etc.) are auto-set only when deployed. For local development, get the connection details from your endpoint and set them manually: + +```bash +# Get endpoint connection details +databricks postgres get-endpoint \ + projects//branches//endpoints/ \ + --profile +``` + +Then create `server/.env` with the values from the endpoint response: + +``` +PGHOST= +PGPORT=5432 +PGDATABASE= +PGUSER= +PGSSLMODE=require +LAKEBASE_ENDPOINT=projects//branches//endpoints/ +``` + +Load `server/.env` in your dev server (e.g. via `dotenv` or `node --env-file=server/.env`). Never commit `.env` files — add `server/.env` to `.gitignore`. 
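A fail-fast check at dev-server startup can catch a missing or incomplete `server/.env` before the pool fails at runtime. The sketch below is one possible approach, not something AppKit provides. `missingLakebaseEnv` is a hypothetical helper, and the variable list is copied from the `server/.env` listing above.

```typescript
// Variable names come from the server/.env listing in this guide.
// The helper itself is hypothetical and not part of @databricks/appkit.
const REQUIRED_LAKEBASE_ENV = [
  "PGHOST",
  "PGPORT",
  "PGDATABASE",
  "PGUSER",
  "PGSSLMODE",
  "LAKEBASE_ENDPOINT",
] as const;

// Returns the names of required variables that are unset or blank.
function missingLakebaseEnv(
  env: Record<string, string | undefined>,
): string[] {
  return REQUIRED_LAKEBASE_ENV.filter((name) => !env[name]?.trim());
}

// Example with a deliberately incomplete environment.
const missing = missingLakebaseEnv({ PGHOST: "localhost", PGPORT: "5432" });
if (missing.length > 0) {
  console.error(`Missing Lakebase env vars: ${missing.join(", ")}`);
}
```

In a real dev server the argument would be the loaded environment (for example `process.env` after `dotenv` has run), and the caller could throw instead of logging.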
+ +## Troubleshooting + +| Error | Cause | Solution | +|-------|-------|---------| +| `permission denied for schema public` | SP cannot access `public` schema | Create custom schema: `CREATE SCHEMA IF NOT EXISTS app_data` and qualify all table names with `app_data.` | +| `permission denied for schema ` | Schema was created by another role (e.g. you ran locally before deploying) | Schema owned by wrong role. To preserve data: export first (`pg_dump` or temp schema copy). **Ask the user before dropping.** Then drop + redeploy. See **`databricks-lakebase`** skill's **Schema Permissions for Deployed Apps** for full steps. | +| Works locally but `permission denied` after deploy | Local credentials created the schema; the SP cannot access schemas it does not own | Schema owned by wrong role — see row above for export + drop + redeploy steps | +| `connection refused` | Pool not connected or wrong env vars | Check `PGHOST`, `PGPORT`, `LAKEBASE_ENDPOINT` are set | +| `relation "X" does not exist` | Tables not initialized | Run `CREATE TABLE IF NOT EXISTS` at startup | +| App builds but pool fails at runtime | Env vars not set locally | Set vars in `server/.env` — see Local Development above | diff --git a/.claude/skills/databricks-apps/references/appkit/model-serving.md b/.claude/skills/databricks-apps/references/appkit/model-serving.md new file mode 100644 index 0000000..ed8a0a1 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/model-serving.md @@ -0,0 +1,222 @@ +# Model Serving: Calling ML Endpoints from Apps + +Use Model Serving when your app needs **AI features** — chat, inference, embeddings, or predictions from a Databricks Model Serving endpoint. For analytics dashboards, use `config/queries/` instead. For persistent storage, use Lakebase. 
+ +## When to Use + +| Pattern | Use Case | Data Source | +|---------|----------|-------------| +| Analytics | Read-only dashboards, charts, KPIs | SQL Warehouse | +| Lakebase | CRUD operations, persistent state, forms | PostgreSQL (Lakebase) | +| Model Serving | Chat, AI features, model inference | Serving Endpoint | +| Multiple | Dashboard with AI features or persistent state | Combine as needed | + +## Scaffolding + +Check if the `serving` plugin is available in the AppKit template: + +```bash +databricks apps manifest --profile +``` + +**If the manifest includes a `serving` plugin:** + +```bash +databricks apps init --name --features serving \ + --set "serving.serving-endpoint.name=" \ + --run none --profile +``` + +**If adding to an existing app**, see *Adding Model Serving to an Existing App* below. + +Use the `databricks-model-serving` skill to create a serving endpoint first if one doesn't exist yet. + +## Adding Model Serving to an Existing App + +**`databricks.yml`** — add serving endpoint resource and user_api_scopes: + +```yaml +resources: + apps: + app: + user_api_scopes: + # ... existing scopes ... + - serving.serving-endpoints + resources: + # ... existing resources ... + - name: serving-endpoint + serving_endpoint: + name: + permission: CAN_QUERY +``` + +**`app.yaml`** — add env injection: + +```yaml +env: + # ... existing env vars ... + - name: DATABRICKS_SERVING_ENDPOINT_NAME + valueFrom: serving-endpoint +``` + +The injected value is the endpoint **name** (not a URL). Use it in server-side code to call the endpoint. + +**`server/server.ts`** — register the plugin: + +```typescript +import { createApp, server, analytics, serving } from "@databricks/appkit"; + +createApp({ + plugins: [server(), analytics(), serving()], +}).catch(console.error); +``` + +Preserve existing plugins and add `serving()` to the array. 
+ +**`server/.env`** — for local development: + +```dotenv +DATABRICKS_SERVING_ENDPOINT_NAME= +``` + +Update smoke tests if headings or routes changed, then `databricks apps validate`. + +## Serving Plugin API + +Access model serving through the plugin handle returned by `createApp()`: + +```typescript +import { createApp, server, serving } from "@databricks/appkit"; + +const appkit = await createApp({ + plugins: [server(), serving()], +}); + +// Non-streaming invocation +const result = await appkit.serving().invoke({ + messages: [{ role: "user", content: "Hello" }], +}); + +// Streaming invocation +for await (const chunk of appkit.serving().stream({ + messages: [{ role: "user", content: "Hello" }], +})) { + console.log(chunk); +} + +// On-behalf-of user (OBO) — uses the requesting user's identity +const result = await appkit.serving().asUser(req).invoke({ + messages: [{ role: "user", content: prompt }], +}); +``` + +All serving routes execute on behalf of the authenticated user (OBO) by default. For programmatic access via `exports()`, use `.asUser(req)` to run in user context. + +## Named Endpoints + +Use endpoint aliases to reference multiple serving endpoints by name: + +```typescript +serving({ + endpoints: { + llm: { env: "DATABRICKS_SERVING_ENDPOINT_NAME" }, + classifier: { env: "DATABRICKS_SERVING_ENDPOINT_CLASSIFIER" }, + }, + timeout: 120000, // optional, default 2 min +}) +``` + +Each alias maps to an environment variable holding the actual endpoint name. 
Access by alias: + +```typescript +const result = await appkit.serving("llm").invoke({ messages }); +const classification = await appkit.serving("classifier").invoke({ inputs: ["text"] }); +``` + +If an endpoint serves multiple models, use `servedModel` to target a specific model directly: + +```typescript +serving({ + endpoints: { + llm: { env: "DATABRICKS_SERVING_ENDPOINT_NAME", servedModel: "llama-v2" }, + }, +}) +``` + +## HTTP Endpoints + +The plugin auto-registers routes under `/api/serving`: + +| Route | Method | Purpose | +|-------|--------|---------| +| `/api/serving/invoke` | `POST` | Non-streaming (default mode) | +| `/api/serving/stream` | `POST` | Streaming SSE (default mode) | +| `/api/serving/:alias/invoke` | `POST` | Non-streaming (named mode) | +| `/api/serving/:alias/stream` | `POST` | Streaming SSE (named mode) | + +## Frontend + +Use the built-in React hooks from `@databricks/appkit-ui/react` — do NOT call serving endpoints directly from the client. + +**Streaming** (chat, real-time inference): + +```tsx +import { useServingStream } from "@databricks/appkit-ui/react"; + +function ChatStream() { + const { stream, chunks, streaming, error, reset } = useServingStream( + { messages: [{ role: "user", content: "Hello" }] }, + { + alias: "llm", + onComplete: (finalChunks) => console.log("Done:", finalChunks.length, "chunks"), + }, + ); + + return ( + <> + + + {chunks.map((chunk, i) =>
<pre key={i}>{JSON.stringify(chunk)}</pre>)} + {error && <p>{error}</p>} + </> + ); +} +``` + +**Non-streaming** (one-shot inference, classification): + +```tsx +import { useServingInvoke } from "@databricks/appkit-ui/react"; + +function Classify() { + const { invoke, data, loading, error } = useServingInvoke( + { inputs: ["sample text"] }, + { alias: "classifier" }, + ); + + return ( + <> + + {data && <pre>{JSON.stringify(data)}</pre>} + {error && <p>{error}</p>} + </> + ); +} +``` + +Both hooks accept `autoStart: true` to invoke automatically on mount. + +For the full hook API and type generation details, see `npx @databricks/appkit docs ./docs/plugins/model-serving.md`. + +For off-platform streaming (AI SDK v6 with Databricks AI Gateway), see the **`databricks-model-serving`** skill. + +AppKit integrates with **Model Serving endpoints**. AI Gateway (beta) endpoints are not directly supported — use the underlying Model Serving endpoint name instead. AI Gateway features (rate limits, usage tracking) can be configured on Model Serving endpoints via the `databricks-model-serving` skill. + +## Troubleshooting + +| Error | Cause | Solution | +|-------|-------|---------| +| `PERMISSION_DENIED` on query | SP missing CAN_QUERY | Declare `serving_endpoint` resource in `databricks.yml` with `permission: CAN_QUERY` | +| `DATABRICKS_SERVING_ENDPOINT_NAME` env var empty | Missing env injection | Add `valueFrom: serving-endpoint` to `app.yaml` env section | +| 504 Gateway Timeout | Inference exceeds 120s proxy limit | Reduce `max_tokens` or use WebSockets — see [Platform Guide](../platform-guide.md) | +| Unknown serving endpoint alias | Alias not configured or env var not set | Check `serving()` config in `server.ts` and `DATABRICKS_SERVING_ENDPOINT_*` in `app.yaml` / `.env` | diff --git a/.claude/skills/databricks-apps/references/appkit/overview.md b/.claude/skills/databricks-apps/references/appkit/overview.md new file mode 100644 index 0000000..f507863 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/overview.md @@ -0,0 +1,149 @@ +# AppKit Overview + +AppKit is the recommended way to build Databricks Apps: it provides type-safe SQL queries, React components, and seamless deployment. 
+ +## Choose Your Data Pattern FIRST + +Before scaffolding, decide which data pattern the app needs: + +| Pattern | When to use | Init command | +|---------|-------------|-------------| +| **Analytics** (read-only) | Dashboards, charts, KPIs from warehouse | `--features analytics --set analytics.sql-warehouse.id=` | +| **Lakebase synced tables** (low-latency reads) | Point lookups, entity search, catalogs from lakehouse data | `--features lakebase` (no `--set` flags needed) + sync Delta table via `databricks-lakebase` skill | +| **Lakebase (OLTP)** (read/write) | CRUD forms, persistent state, user data | `--features lakebase --set lakebase.postgres.branch= --set lakebase.postgres.database=` | +| **Genie** (NL queries) | Chat interface over Unity Catalog tables | `--features genie --set genie..=` (check manifest) | +| **Model Serving** (ML inference) | Chat, AI features, model predictions | `--features serving --set serving.serving-endpoint.name=` (check manifest) | +| **Jobs** (trigger Lakeflow Jobs) | Kick off and monitor pre-existing notebooks / Python / SQL / dbt jobs | `--features jobs --set jobs..=` (check manifest) | +| **Multiple** | Combine plugins as needed (e.g. dashboard + CRUD, analytics + Genie) | `--features analytics,lakebase,genie,...` with all required `--set` flags per plugin | + +See [Lakebase Guide](lakebase.md) for full Lakebase scaffolding and app-code patterns. +See [Genie Guide](genie.md) for space creation, plugin setup, and frontend components. + +## Workflow + +1. **Scaffold**: Run `databricks apps manifest`, then `databricks apps init` with `--features` and `--set` as in parent SKILL.md (App Manifest and Scaffolding) +2. **Develop**: `cd && npm install && npm run dev` +3. **Validate**: `databricks apps validate` +4. 
**Deploy**: `databricks apps deploy --profile ` (⚠️ USER CONSENT REQUIRED) + +## Data Discovery (Before Writing SQL) + +**Use the parent `databricks-core` skill for data discovery** (table search, schema exploration, query execution). + +## Pre-Implementation Checklist + +Before writing App.tsx, complete these steps: + +1. ✅ Create SQL files in `config/queries/` +2. ✅ Run `npm run typegen` to generate query types +3. ✅ Read `client/src/appKitTypes.d.ts` to see available query result types +4. ✅ Verify component props via `npx @databricks/appkit docs` (check the relevant component page) +5. ✅ Plan smoke test updates (default expects "Minimal Databricks App") + +**DO NOT** write UI code until types are generated and verified. + +## Post-Implementation Checklist + +Before running `databricks apps validate`: + +1. ✅ Update `tests/smoke.spec.ts` heading selector to match your app title +2. ✅ Update or remove the 'hello world' text assertion +3. ✅ Verify `npm run typegen` has been run after all SQL files are finalized +4. 
✅ Ensure all numeric SQL values use `Number()` conversion in display code + +## Project Structure + +``` +my-app/ +├── server/ +│ ├── server.ts # Backend entry point (AppKit) +│ └── .env # Optional local dev env vars (do not commit) +├── client/ +│ ├── index.html +│ ├── vite.config.ts +│ └── src/ +│ ├── main.tsx +│ └── App.tsx # <- Main app component (start here) +├── config/ +│ └── queries/ +│ └── my_query.sql # -> queryKey: "my_query" +├── app.yaml # Deployment config +├── package.json +└── tsconfig.json +``` + +**Key files to modify:** +| Task | File | +|------|------| +| Build UI | `client/src/App.tsx` | +| Add SQL query | `config/queries/.sql` | +| Add API endpoint | `server/server.ts` (tRPC) | +| Add shared helpers (optional) | create `shared/types.ts` or `client/src/lib/formatters.ts` | +| Fix smoke test | `tests/smoke.spec.ts` | + +## Type Safety + +For type generation details, see: `npx @databricks/appkit docs ./docs/development/type-generation.md` + +**Quick workflow:** +1. Add/modify SQL in `config/queries/` +2. Types auto-generate during dev via the Vite plugin (or run `npm run typegen` manually) +3. Types appear in `client/src/appKitTypes.d.ts` + +## Adding Visualizations + +**Step 1**: Create SQL file `config/queries/my_data.sql` +```sql +SELECT category, COUNT(*) as count FROM my_table GROUP BY category +``` + +**Step 2**: Use component (types auto-generated!) +```typescript +import { BarChart } from '@databricks/appkit-ui/react'; +// Query mode: fetches data automatically + + +// Data mode: pass static data directly (no queryKey/parameters needed) + +``` + +## AppKit Official Documentation + +**Always use AppKit docs as the source of truth for API details.** + +```bash +npx @databricks/appkit docs # show the docs index (start here) +npx @databricks/appkit docs # look up a section by name or doc path +``` + +Do not guess paths — run without args first, then pick from the index. + +## References + +| When you're about to... 
| Read | +|-------------------------|------| +| Write SQL files | [SQL Queries](sql-queries.md) — parameterization, dialect, sql.* helpers | +| Use `useAnalyticsQuery` | [AppKit SDK](appkit-sdk.md) — memoization, conditional queries | +| Add chart/table components | [Frontend](frontend.md) — component quick reference, anti-patterns | +| Add API mutation endpoints | [tRPC](trpc.md) — only if you need server-side logic | +| Use Lakebase for CRUD / persistent state | [Lakebase](lakebase.md) — Lakebase plugin API, tRPC patterns, schema init | +| Add Genie chat | [Genie](genie.md) — space creation, plugin setup, frontend components | +| Call ML model serving endpoints | [Model Serving](model-serving.md) — serving plugin, frontend hooks | +| Trigger / monitor Lakeflow Jobs from the app | [Jobs](jobs.md) — env discovery, JobHandle API, SSE streaming | + +## Critical Rules + +1. **SQL for data retrieval**: Use `config/queries/` + visualization components. Never tRPC for SELECT. +2. **Numeric types**: SQL numbers may return as strings. Always convert: `Number(row.amount)` +3. **Type imports**: Use `import type { ... }` (verbatimModuleSyntax enabled). +4. **Charts are ECharts**: No Recharts children — use props (`xKey`, `yKey`, `colors`). `xKey`/`yKey` auto-detect from schema if omitted. +5. **Two data modes**: Charts/tables support query mode (`queryKey` + `parameters`) and data mode (static `data` prop). +6. **Conditional queries**: Use `autoStart: false` option or conditional rendering to control query execution. 
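
Rule 2 above can be sketched as a small guard. This is a minimal illustration under stated assumptions, not part of AppKit: `toFiniteNumber` and `SqlNumeric` are hypothetical names, and the fallback behavior assumes you would rather render a default than `NaN`.

```typescript
// Hypothetical helper (not an AppKit API): convert a SQL value that may
// arrive as a string into a finite number, with a fallback for bad input.
type SqlNumeric = number | string | null | undefined;

function toFiniteNumber(value: SqlNumeric, fallback = 0): number {
  // Number(null) and Number("") both evaluate to 0, so missing values
  // are handled explicitly instead of being silently coerced.
  if (value === null || value === undefined || value === "") return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

console.log(toFiniteNumber("123.45").toFixed(2)); // DECIMAL delivered as a string
console.log(toFiniteNumber(42));                  // already a number
console.log(toFiniteNumber("not-a-number", -1));  // falls back to -1
```

A plain `Number(row.amount)` is enough when the column is known to be non-null and numeric; the guard only matters when a bad value would otherwise reach `toFixed()` or a chart.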
+ +## Decision Tree + +- **Display data from SQL?** + - Chart/Table → `BarChart`, `LineChart`, `DataTable` components + - Custom layout (KPIs, cards) → `useAnalyticsQuery` hook +- **Call Databricks API?** → Dedicated plugin (serving, jobs, files) or tRPC for other APIs +- **Modify data?** → tRPC mutations diff --git a/.claude/skills/databricks-apps/references/appkit/proto-contracts.md b/.claude/skills/databricks-apps/references/appkit/proto-contracts.md new file mode 100644 index 0000000..35f42e6 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/proto-contracts.md @@ -0,0 +1,201 @@ +# Plugin Contract Reference + +Concrete proto↔plugin mappings for the three core AppKit plugins. + +## Files Plugin Contract + +**Plugin manifest**: `files/manifest.json` +**Resource**: UC Volume with `WRITE_VOLUME` permission +**Env**: `DATABRICKS_VOLUME_FILES` for volume path + +### Boundary: What the files plugin owns + +The files plugin is the ONLY module that touches UC Volumes. Other modules +interact with files through typed proto messages, never raw paths. 
+ +``` +┌─────────────┐ UploadRequest ┌──────────────┐ +│ api module │ ──────────────────→ │ files plugin │ +│ │ ←────────────────── │ │ +│ │ StoredArtifact │ UC Volumes │ +└─────────────┘ └──────────────┘ +``` + +### Proto → Plugin Method Mapping + +| Proto Message | Plugin Method | Direction | +|---------------|---------------|-----------| +| `UploadRequest` | `files.upload(path, content, opts)` | IN | +| `StoredArtifact` | Return type of upload/getInfo | OUT | +| `VolumeLayout` | `files.config.volumePath` + conventions | CONFIG | + +### Volume Path Convention (from VolumeLayout proto) + +``` +/Volumes/{catalog}/{schema}/{volume}/ +├── uploads/ # User uploads (UploadRequest.destination_path) +├── results/ # Computed outputs (StoredArtifact) +│ └── {run_id}/ +│ ├── output.proto.bin # Binary proto serialization +│ └── output.json # JSON for debugging +└── artifacts/ # Build artifacts, archives + └── {app_name}/ + └── {version}/ +``` + +### Config ↔ Proto Mapping + +| manifest.json field | Proto field | Notes | +|---------------------|-------------|-------| +| `config.timeout` (30000) | Not in proto | Plugin-internal config | +| `config.maxUploadSize` (5GB) | `UploadRequest.content` max size | Validation constraint | +| `resources.path` env | `VolumeLayout.root` | Runtime injection | + +--- + +## Lakebase Plugin Contract + +**Plugin manifest**: `lakebase/manifest.json` +**Resource**: Postgres with `CAN_CONNECT_AND_CREATE` permission +**Env**: `PGHOST`, `PGDATABASE`, `PGPORT`, `PGSSLMODE`, `LAKEBASE_ENDPOINT` + +### Boundary: What the lakebase plugin owns + +Lakebase owns ALL structured data. Every table's schema is derived from a proto +message in `database.proto`. No ad-hoc `CREATE TABLE` statements. 
+ +``` +┌─────────────┐ RunRecord ┌──────────────┐ +│ compute mod │ ──────────────────→ │ lakebase │ +│ │ │ plugin │ +│ │ MetricRecord │ │ +│ │ ──────────────────→ │ Postgres │ +└─────────────┘ └──────┬───────┘ + │ +┌─────────────┐ SQL query │ +│ analytics │ ←──────────────────────────┘ +│ module │ RunRecord[] +└─────────────┘ +``` + +### Proto → Table Mapping + +| Proto Message | Table Name | Primary Key | Notes | +|---------------|-----------|-------------|-------| +| `RunRecord` | `runs` | `(run_id, app_name)` | One row per run | +| `MetricRecord` | `metrics` | auto-increment | FK to runs.run_id | +| `ConfigRecord` | `configs` | `config_id` | Versioned configs | + +### Proto → DDL Type Mapping + +| Proto Type | SQL Type | Column Default | +|-----------|----------|----------------| +| `string` | `TEXT` | `''` | +| `bool` | `BOOLEAN` | `false` | +| `int32` | `INTEGER` | `0` | +| `int64` | `BIGINT` | `0` | +| `double` | `DOUBLE PRECISION` | `0.0` | +| `bytes` | `BYTEA` | `NULL` | +| `Timestamp` | `TIMESTAMPTZ` | `NOW()` | +| `repeated T` | `JSONB` | `'[]'::jsonb` | +| `map` | `JSONB` | `'{}'::jsonb` | +| nested message | `JSONB` | `NULL` | +| `enum` | `TEXT` | First value name | + +### Migration Convention + +``` +migrations/ +├── 001_create_runs.sql +├── 002_create_metrics.sql +├── 003_create_configs.sql +└── 004_add_metrics_index.sql +``` + +Each migration is idempotent (`CREATE TABLE IF NOT EXISTS`, `CREATE INDEX IF NOT EXISTS`). 
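
The mapping table and the idempotency rule above can be combined in a short sketch. It assumes a hand-written field list rather than a real proto descriptor, covers only the scalar rows of the table, and uses hypothetical names (`FieldSpec`, `createTableSql`); table and column names are interpolated directly, so it is only suitable for trusted, developer-authored input.

```typescript
// Hypothetical sketch: derive an idempotent CREATE TABLE statement from a
// hand-written field list. A real setup would read the proto descriptor.
type ProtoScalar = "string" | "bool" | "int32" | "int64" | "double" | "bytes" | "Timestamp";

interface FieldSpec {
  name: string; // snake_case proto field name, reused as the column name
  type: ProtoScalar;
  primaryKey?: boolean;
}

// Mirrors the scalar rows of the Proto -> DDL type mapping table.
const SQL_TYPES: Record<ProtoScalar, string> = {
  string: "TEXT",
  bool: "BOOLEAN",
  int32: "INTEGER",
  int64: "BIGINT",
  double: "DOUBLE PRECISION",
  bytes: "BYTEA",
  Timestamp: "TIMESTAMPTZ",
};

function createTableSql(table: string, fields: FieldSpec[]): string {
  const columns = fields.map((f) => {
    const notNull = f.primaryKey ? " NOT NULL" : "";
    return `  ${f.name} ${SQL_TYPES[f.type]}${notNull}`;
  });
  const keys = fields.filter((f) => f.primaryKey).map((f) => f.name);
  if (keys.length > 0) columns.push(`  PRIMARY KEY (${keys.join(", ")})`);
  // IF NOT EXISTS keeps the migration safe to re-run.
  return `CREATE TABLE IF NOT EXISTS ${table} (\n${columns.join(",\n")}\n);`;
}

console.log(
  createTableSql("runs", [
    { name: "run_id", type: "string", primaryKey: true },
    { name: "app_name", type: "string", primaryKey: true },
    { name: "started_at", type: "Timestamp" },
  ]),
);
```

Column defaults, `repeated`/`map` fields, and enums are left out to keep the sketch short; they would follow the remaining rows of the table.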
+ +### Config ↔ Proto Mapping + +| manifest.json field | Proto usage | Notes | +|---------------------|-------------|-------| +| `resources.branch` | Not in proto | Infrastructure config | +| `resources.database` | Not in proto | Infrastructure config | +| `resources.host` (`PGHOST`) | Connection string | Runtime injection | +| `resources.databaseName` (`PGDATABASE`) | Database selection | Runtime injection | + +--- + +## Jobs / Compute Contract + +**No plugin manifest** — Jobs are invoked via `@databricks/sdk-experimental` +**Resource**: Databricks Jobs API +**Auth**: Workspace token or OAuth + +### Boundary: What the jobs module owns + +The jobs module owns compute execution. It receives typed task inputs, runs them +on Databricks clusters, and produces typed task outputs. + +``` +┌─────────────┐ JobConfig ┌──────────────┐ +│ api module │ ──────────────────→ │ jobs module │ +│ │ │ │ +│ │ JobTaskInput │ Databricks │ +│ │ ──────────────────→ │ Jobs API │ +│ │ │ │ +│ │ JobTaskOutput │ Clusters │ +│ │ ←────────────────── │ │ +└─────────────┘ └──────────────┘ +``` + +### Proto → Jobs SDK Mapping + +| Proto Message | SDK Method | Direction | +|---------------|-----------|-----------| +| `JobConfig` | `jobs.create(config)` | IN — defines the job | +| `TaskConfig` | Task within a job | IN — defines task deps | +| `JobTaskInput` | Task params (base64 proto) | IN — task receives | +| `JobTaskOutput` | Task output (written to Volume) | OUT — task produces | + +### Task Parameter Convention + +Job tasks receive their typed input via: +1. **Small payloads (<256KB)**: Base64-encoded proto in task params +2. 
**Large payloads**: Proto binary written to UC Volume, path passed as param + +```typescript +// Producer (api module) +const input: JobTaskInput = { taskId, taskType, runId, inputPayload }; +const encoded = Buffer.from(JobTaskInput.encode(input).finish()).toString('base64'); +// Pass as notebook parameter: { "input": encoded } + +// Consumer (job task code) +const decoded = JobTaskInput.decode(Buffer.from(params.input, 'base64')); +``` + +### Task Output Convention + +Job tasks write their typed output to: +``` +/Volumes/{catalog}/{schema}/{volume}/results/{run_id}/{task_id}.output.bin +``` + +The output is a serialized `JobTaskOutput` proto. The orchestrator reads it +back with the generated decoder. + +### Jobs API Patterns + +```typescript +// Create a multi-task job from JobConfig proto +const jobConfig: JobConfig = { + jobName: `${appName}-${runId}`, + clusterSpec: '{"num_workers": 1}', + maxRetries: 2, + timeoutSeconds: 3600, + tasks: [ + { taskKey: 'generate', taskType: 'generate', dependsOn: [] }, + { taskKey: 'evaluate', taskType: 'evaluate', dependsOn: ['generate'] }, + { taskKey: 'aggregate', taskType: 'aggregate', dependsOn: ['evaluate'] }, + ], +}; +``` diff --git a/.claude/skills/databricks-apps/references/appkit/proto-first.md b/.claude/skills/databricks-apps/references/appkit/proto-first.md new file mode 100644 index 0000000..3f8e9d8 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/proto-first.md @@ -0,0 +1,306 @@ +# Proto-First App Design + +Schema-first approach for AppKit apps using protobuf data contracts. Define contracts BEFORE implementation — derive TypeScript types, Lakebase DDL, and Volume paths from `.proto` files. + +**When to use:** New apps with multiple plugins (files + lakebase + jobs), or adding typed boundaries to existing apps. Skip for quick prototypes. + +**Requires:** `buf` CLI for proto linting and code generation. + +**Rule: No implementation before contracts. 
No contracts without consumers.** + +Define protobuf data contracts FIRST, then derive everything else (TypeScript types, Lakebase DDL, Volume paths, API shapes) from those contracts. + +## When to Use + +| Scenario | Use this skill | +|----------|---------------| +| Creating a new Databricks app | YES — define contracts before `databricks apps init` | +| Adding a new data boundary to an existing app | YES — add proto before implementation | +| Quick prototype / hackathon | NO — skip contracts, move fast | +| Modifying existing typed code | NO — contracts already exist | + +## Core Principle + +``` +User intent → Module map → Proto contracts → Generated types → Implementation + ↓ ↓ + Lakebase DDL TypeScript interfaces + ↓ ↓ + Migrations Plugin code +``` + +The `.proto` file is the single source of truth. If it's not in a proto, it doesn't cross a module boundary. + +## Phase 1: Decompose into Modules + +Every Databricks app decomposes into a combination of these plugin modules: + +| Module | Plugin | Data Boundary | Owns | +|--------|--------|---------------|------| +| **Storage** | files | UC Volumes | Blobs, uploads, artifacts, archives | +| **Database** | lakebase | Postgres tables | Structured records, queries, migrations | +| **Compute** | jobs | Databricks Jobs API | Job runs, task results, cluster configs | +| **Analytics** | analytics | SQL Warehouse | Read-only queries, dashboards | +| **Serving** | server | HTTP/tRPC routes | API endpoints, SSE streams | + +### Decomposition Rules + +1. **Each module owns its data** — files plugin never writes to lakebase, lakebase never writes to volumes. +2. **Cross-module communication is typed** — a proto message, never a raw JSON blob. +3. **Every proto message has exactly one producer module.** +4. **Multiple modules can consume** — but the producer defines the schema. +5. **No god messages** — if a message has >12 fields, split it. 
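
Rules 3 and 5 above lend themselves to a mechanical check. The following is a minimal sketch that assumes the module map has been transcribed by hand into a list; `MessageSpec` and `checkContracts` are hypothetical names, not part of AppKit or `buf`.

```typescript
// Hypothetical sketch: check a hand-written module map against the
// "exactly one producer" and "no more than 12 fields" rules.
interface MessageSpec {
  name: string;
  producers: string[]; // modules that write this message
  fieldCount: number;
}

const MAX_FIELDS = 12;

function checkContracts(messages: MessageSpec[]): string[] {
  const problems: string[] = [];
  for (const m of messages) {
    if (m.producers.length !== 1) {
      problems.push(`${m.name}: expected exactly one producer, found ${m.producers.length}`);
    }
    if (m.fieldCount > MAX_FIELDS) {
      problems.push(`${m.name}: ${m.fieldCount} fields exceeds the limit of ${MAX_FIELDS}`);
    }
  }
  return problems;
}

console.log(
  checkContracts([
    { name: "RunRecord", producers: ["compute"], fieldCount: 7 },
    { name: "GodMessage", producers: ["api", "compute"], fieldCount: 30 },
  ]),
);
```

An empty result means the map satisfies both rules; the other rules (ownership, typed communication) still need human review.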
+ +### Output: Module Map + +Before proceeding, produce a module map for the user to confirm: + +``` +App: +Modules: + storage: files plugin → uploads/, results/, artifacts/ + db: lakebase plugin → runs, metrics, configs tables + compute: jobs → generation tasks, eval tasks + api: server plugin → POST /run, GET /status, SSE /stream +``` + +## Phase 2: Define Proto Contracts + +### Directory Structure + +``` +proto/ +├── buf.yaml +├── buf.gen.yaml +└── / + └── v1/ + ├── common.proto # Shared enums, IDs + ├── storage.proto # Files plugin boundary + ├── database.proto # Lakebase plugin boundary + ├── compute.proto # Jobs boundary + └── api.proto # Server/API boundary +``` + +### Proto Style Rules + +- **Package**: `.v1` (versioned from day one) +- **One file per module boundary**, not per message +- **Every field has a consumer** — if no code reads it, delete it +- **snake_case** for all field names +- **proto3** syntax only + +### Files Plugin Boundary (`storage.proto`) + +The files plugin operates on UC Volumes. Type every file path and payload: + +```protobuf +syntax = "proto3"; +package .v1; + +import "google/protobuf/timestamp.proto"; + +// StoredArtifact — produced by files plugin after upload. +message StoredArtifact { + string volume_path = 1; + string content_type = 2; + int64 size_bytes = 3; + google.protobuf.Timestamp created_at = 4; + string checksum_sha256 = 5; +} + +// UploadRequest — sent to files plugin by api module. +message UploadRequest { + string destination_path = 1; + string content_type = 2; + bytes content = 3; + map metadata = 4; +} + +// VolumeLayout — design-time contract for volume directory structure. +message VolumeLayout { + string root = 1; // /Volumes/catalog/schema/app_name + string uploads_dir = 2; // uploads/ + string results_dir = 3; // results/ + string artifacts_dir = 4; // artifacts/ +} +``` + +### Lakebase Plugin Boundary (`database.proto`) + +Every Lakebase table has a corresponding proto message. 
The message IS the schema: + +```protobuf +syntax = "proto3"; +package .v1; + +import "google/protobuf/timestamp.proto"; + +// RunRecord — one row in the `runs` table. +// Producer: compute module. Consumers: api, analytics. +message RunRecord { + string run_id = 1; + string app_name = 2; + RunStatus status = 3; + google.protobuf.Timestamp started_at = 4; + google.protobuf.Timestamp completed_at = 5; + string error_message = 6; + string config_json = 7; +} + +// MetricRecord — one row in the `metrics` table. +// Producer: compute module. Consumers: analytics, api. +message MetricRecord { + string run_id = 1; + string metric_name = 2; + double value = 3; + google.protobuf.Timestamp recorded_at = 4; + map dimensions = 5; +} +``` + +### Jobs Boundary (`compute.proto`) + +Type job task inputs and outputs: + +```protobuf +syntax = "proto3"; +package .v1; + +// JobTaskInput — typed payload sent to a Databricks job task. +// Producer: api module. Consumer: job task code. +message JobTaskInput { + string task_id = 1; + string task_type = 2; + string run_id = 3; + bytes input_payload = 4; + map env = 5; +} + +// JobTaskOutput — typed result from a completed job task. +// Producer: job task code. Consumer: api module. +message JobTaskOutput { + string task_id = 1; + string run_id = 2; + bool success = 3; + string error = 4; + bytes output_payload = 5; + int64 duration_ms = 6; + map metrics = 7; +} +``` + +## Phase 3: Generate Types and DDL + +### 3a. Buf configuration + +```yaml +# buf.yaml +version: v2 +lint: + use: + - STANDARD +breaking: + use: + - FILE +``` + +```yaml +# buf.gen.yaml +version: v2 +plugins: + - remote: buf.build/connectrpc/es + out: proto/gen + opt: target=ts +``` + +### 3b. Generate TypeScript types + +```bash +buf lint proto/ +buf generate proto/ +``` + +### 3c. Generate Lakebase DDL + +For each message in `database.proto`, generate a numbered migration file. 
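
As a rough sketch of the numbering step, the helper below builds zero-padded migration filenames. The `NNN_create_<table>.sql` pattern is an assumption chosen for illustration, and `migrationFileName` is a hypothetical name.

```typescript
// Hypothetical helper: build numbered migration filenames from an ordered
// list of table names, one file per message in database.proto.
function migrationFileName(index: number, table: string): string {
  const prefix = String(index).padStart(3, "0"); // 1 -> "001"
  return `${prefix}_create_${table}.sql`;
}

const tables = ["runs", "metrics", "configs"];
const files = tables.map((table, i) => migrationFileName(i + 1, table));
console.log(files);
```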
+ +**Proto→SQL type mapping:** + +| Proto Type | SQL Type | Default | +|-----------|----------|---------| +| `string` | `TEXT` | `''` | +| `bool` | `BOOLEAN` | `false` | +| `int32` | `INTEGER` | `0` | +| `int64` | `BIGINT` | `0` | +| `double` | `DOUBLE PRECISION` | `0.0` | +| `bytes` | `BYTEA` | `NULL` | +| `Timestamp` | `TIMESTAMPTZ` | `NOW()` | +| `repeated T` | `JSONB` | `'[]'::jsonb` | +| `map` | `JSONB` | `'{}'::jsonb` | +| nested message | `JSONB` | `NULL` | +| `enum` | `TEXT` | first value name | + +Example migration: + +```sql +-- migrations/001_create_runs.sql +CREATE TABLE IF NOT EXISTS runs ( + run_id TEXT NOT NULL, + app_name TEXT NOT NULL, + status TEXT NOT NULL DEFAULT 'RUN_STATUS_PENDING', + started_at TIMESTAMPTZ, + completed_at TIMESTAMPTZ, + error_message TEXT, + config_json JSONB, + PRIMARY KEY (run_id, app_name) +); +``` + +### 3d. Validate + +```bash +npx tsc --noEmit # all generated types compile +buf lint proto/ # proto style checks +``` + +## Phase 4: Implement Against Contracts + +NOW implementation begins. Each module uses ONLY its generated types: + +```typescript +import type { StoredArtifact, UploadRequest } from '../proto/gen//v1/storage'; +import type { RunRecord, MetricRecord } from '../proto/gen//v1/database'; +import type { JobTaskInput, JobTaskOutput } from '../proto/gen//v1/compute'; +``` + +No `any`, no `unknown`, no `JSON.parse()` at module boundaries. 
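
One way to picture implementing against a contract is to build Volume paths from a typed layout instead of freeform strings. The sketch below is an assumption-laden illustration: the `VolumeLayout` interface is a hand-written stand-in for the generated type (camelCase field names are assumed), and `taskOutputPath` is a hypothetical helper.

```typescript
// Hypothetical stand-in for the generated VolumeLayout type; in a real app
// this would be imported from the buf-generated output.
interface VolumeLayout {
  root: string;       // e.g. /Volumes/catalog/schema/app_name
  resultsDir: string; // e.g. results/
}

// Build a task output path from typed parts, normalizing stray slashes.
function taskOutputPath(layout: VolumeLayout, runId: string, taskId: string): string {
  const root = layout.root.replace(/\/+$/, "");
  const results = layout.resultsDir.replace(/^\/+|\/+$/g, "");
  return `${root}/${results}/${runId}/${taskId}.output.bin`;
}

const layout: VolumeLayout = { root: "/Volumes/main/default/my_app", resultsDir: "results/" };
console.log(taskOutputPath(layout, "run-1", "generate"));
```

Keeping path construction in one typed function means a change to the layout contract surfaces as a compile error rather than a mismatched string somewhere in the app.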
+ +## Validation Checklist + +Before writing implementation code: + +- [ ] Module map exists with clear data boundaries +- [ ] Proto files exist for every cross-boundary data structure +- [ ] `buf lint proto/` passes +- [ ] `buf generate proto/` produces TypeScript types +- [ ] Lakebase DDL derived from `database.proto` messages +- [ ] No proto message exceeds 12 fields +- [ ] Every field has at least one identified consumer +- [ ] Every message has exactly one producer module +- [ ] Volume layout documented (not freeform paths) +- [ ] Job inputs/outputs typed (no raw JSON params) + +## Common Traps + +| Trap | Why it fails | Fix | +|------|-------------|-----| +| "I'll add the proto later" | Boundaries calcify around untyped shapes | Proto first or not at all | +| `any` at a module boundary | Type errors surface at runtime, not compile time | Use generated types | +| `JSON.parse()` crossing a boundary | No schema validation | Deserialize with proto decoder | +| Giant 30-field message | Impossible to review, version, or extend | Split by concern, max 12 fields | +| Storing raw JSON in Lakebase | Loses queryability and type safety | Map to `repeated`, `map`, or nested message fields | +| Shared mutable state between modules | Race conditions, unclear ownership | Communicate through typed messages | + +## References + +- [Plugin Contract Details](proto-contracts.md) — proto↔plugin type mappings for files, lakebase, jobs diff --git a/.claude/skills/databricks-apps/references/appkit/sql-queries.md b/.claude/skills/databricks-apps/references/appkit/sql-queries.md new file mode 100644 index 0000000..8532db2 --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/sql-queries.md @@ -0,0 +1,267 @@ +# SQL Query Files + +**IMPORTANT**: ALWAYS use SQL files in `config/queries/` for data retrieval. NEVER use tRPC for SQL queries. 
+ +- Store ALL SQL queries in `config/queries/` directory +- Name files descriptively: `trip_statistics.sql`, `user_metrics.sql`, `sales_by_region.sql` +- Reference by filename (without extension) in `useAnalyticsQuery` or directly in a visualization component passing it as `queryKey` +- App Kit automatically executes queries against configured Databricks warehouse +- Benefits: Built-in caching, proper connection pooling, better performance + +## Type Generation + +For full type generation details, see: `npx @databricks/appkit docs ./docs/development/type-generation.md` + +**Type generation:** Types are auto-regenerated during dev whenever SQL files change. + +**Quick workflow:** Add SQL files → Types auto-generate during dev → Types appear in `client/src/appKitTypes.d.ts` + +## Query Schemas (Optional) + +Create `config/queries/schema.ts` only if you need **runtime validation** with Zod. + +```typescript +import { z } from 'zod'; + +export const querySchemas = { + my_query: z.array( + z.object({ + category: z.string(), + // Use z.coerce.number() - handles both string and number from SQL + amount: z.coerce.number(), + }) + ), +}; +``` + +**Why `z.coerce.number()`?** +- Auto-generated types use `number` based on SQL column types +- But some SQL types (DECIMAL, large BIGINT) return as strings at runtime +- `z.coerce.number()` handles both cases safely + +## SQL Type Handling (Critical) + +**Understanding Type Generation vs Runtime:** + +1. **Auto-generated types** (`appKitTypes.d.ts`): Based on SQL column types + - `BIGINT`, `INT`, `DECIMAL` → TypeScript `number` + - These are the types you'll see in IntelliSense + +2. 
**Runtime JSON values**: Some numeric types arrive as strings + - `DECIMAL` often returns as string (e.g., `"123.45"`) + - Large `BIGINT` values return as string + - `ROUND()`, `AVG()`, `SUM()` results may be strings + +**Best Practice - Always convert before numeric operations:** + +```typescript +// ❌ WRONG - may fail if value is string at runtime +{row.total_amount.toFixed(2)} + +// ✅ CORRECT - convert to number first +{Number(row.total_amount).toFixed(2)} +``` + +**Helper Functions:** + +Create app-specific helpers for consistent numeric formatting (for example in `client/src/lib/formatters.ts`): + +```typescript +// client/src/lib/formatters.ts +export const toNumber = (value: number | string): number => Number(value); +export const formatCurrency = (value: number | string): string => + `$${Number(value).toFixed(2)}`; +export const formatPercent = (value: number | string): string => + `${Number(value).toFixed(1)}%`; +``` + +Use them wherever you render query results: + +```typescript +import { toNumber, formatCurrency, formatPercent } from './formatters'; // adjust import path to your file layout + +// Convert to number +const amount = toNumber(row.amount); // "123.45" → 123.45 + +// Format as currency +const formatted = formatCurrency(row.amount); // "123.45" → "$123.45" + +// Format as percentage +const percent = formatPercent(row.rate); // "85.5" → "85.5%" +``` + +## Available sql.* Helpers + +**Full API reference**: `npx @databricks/appkit docs ./docs/api/appkit/Variable.sql.md` — always check this for the latest available helpers. 
+ +```typescript +import { sql } from "@databricks/appkit-ui/js"; + +// ✅ These exist: +sql.string(value) // For STRING parameters +sql.number(value) // For NUMERIC parameters (INT, BIGINT, DOUBLE, DECIMAL) +sql.boolean(value) // For BOOLEAN parameters +sql.date(value) // For DATE parameters (YYYY-MM-DD format) +sql.timestamp(value) // For TIMESTAMP parameters +sql.binary(value) // For BINARY (returns hex string, use UNHEX() in SQL) + +// ❌ These DO NOT exist: +// sql.null() - use sentinel values instead +// sql.array() - use comma-separated sql.string() and split in SQL +// sql.int() - use sql.number() +// sql.float() - use sql.number() +``` + +**For nullable string parameters**, use sentinel values or empty strings. **For nullable date parameters**, use sentinel dates only (empty strings cause validation errors) — see "Optional Date Parameters" section below. + +## Databricks SQL Dialect + +Databricks uses Databricks SQL (based on Spark SQL), NOT PostgreSQL/MySQL. Common mistakes: + +| PostgreSQL | Databricks SQL | +|------------|---------------| +| `GENERATE_SERIES(1, 10)` | `explode(sequence(1, 10))` | +| `DATEDIFF(date1, date2)` | `DATEDIFF(DAY, date2, date1)` (3 args!) | +| `NOW()` | `CURRENT_TIMESTAMP()` | +| `INTERVAL '7 days'` | `INTERVAL 7 DAY` | +| `STRING_AGG(col, ',')` | `CONCAT_WS(',', COLLECT_LIST(col))` | +| `ILIKE` | `LOWER(col) LIKE LOWER(pattern)` | + +**Sample data date ranges** — do NOT use `CURRENT_DATE()` on historical datasets: +- `samples.tpch.*` — historical dates, check with `SELECT MIN(o_orderdate), MAX(o_orderdate) FROM samples.tpch.orders` +- `samples.nyctaxi.trips` — NYC taxi data with specific date ranges +- `samples.tpcds.*` — data from 1998-2003 + +Always check date ranges before writing date-filtered queries. 
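
The advice above can be sketched as a helper that anchors a date window to the newest date in the dataset rather than to today. It assumes the maximum date has already been fetched (for example with a `MAX(...)` query) as a `YYYY-MM-DD` string; `windowEndingAt` is a hypothetical name, not an AppKit API.

```typescript
// Hypothetical helper: build an inclusive date window of `days` days that
// ends at the dataset's newest date instead of at the current date.
const DAY_MS = 86400000;

function windowEndingAt(maxDate: string, days: number): { start: string; end: string } {
  const endMs = Date.parse(`${maxDate}T00:00:00Z`);
  if (Number.isNaN(endMs)) throw new Error(`Invalid date: ${maxDate}`);
  const startMs = endMs - (days - 1) * DAY_MS;
  return { start: new Date(startMs).toISOString().slice(0, 10), end: maxDate };
}

// A 7-day window ending on the last day of the available data.
console.log(windowEndingAt("2016-02-29", 7));
```

The resulting strings can then be passed as date parameters, so a dashboard over historical sample data shows rows by default instead of an empty result.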
+ +## Before Running `npm run typegen` + +Verify each SQL file before running typegen: + +- [ ] Uses Databricks SQL syntax (NOT PostgreSQL) — check dialect table above +- [ ] `DATEDIFF` has 3 arguments: `DATEDIFF(DAY, start, end)` +- [ ] Uses `LOWER(col) LIKE LOWER(pattern)` instead of `ILIKE` +- [ ] Column aliases in `ORDER BY` match `SELECT` aliases exactly +- [ ] Date columns are not passed to numeric functions like `ROUND()` +- [ ] Date range filters use actual data dates (NOT `CURRENT_DATE()` on historical data — check date ranges first) + +## Query Parameterization + +SQL queries can accept parameters to make them dynamic and reusable. + +**Key Points:** +- Parameters use colon prefix: `:parameter_name` +- Databricks infers types from values automatically +- For optional string parameters, use pattern: `(:param = '' OR column = :param)` +- **For optional date parameters, use sentinel dates** (`'1900-01-01'` and `'9999-12-31'`) instead of empty strings + +### SQL Parameter Syntax + +```sql +-- config/queries/filtered_data.sql +SELECT * +FROM my_table +WHERE column_value >= :min_value + AND column_value <= :max_value + AND category = :category + AND (:optional_filter = '' OR status = :optional_filter) +``` + +### Frontend Parameter Passing + +```typescript +import { sql } from "@databricks/appkit-ui/js"; + +const { data } = useAnalyticsQuery('filtered_data', { + min_value: sql.number(minValue), + max_value: sql.number(maxValue), + category: sql.string(category), + optional_filter: sql.string(optionalFilter || ''), // empty string for optional params +}); +``` + +### Date Parameters + +Use `sql.date()` for date parameters with `YYYY-MM-DD` format strings. 
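
Because the expected format is strict, it can help to validate user input before building parameters. The guard below is a minimal sketch, assuming validation happens in the frontend; `isIsoDate` is a hypothetical name and not part of AppKit.

```typescript
// Hypothetical guard (not an AppKit API): check that a string is a real
// calendar date in YYYY-MM-DD form before handing it to sql.date().
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Round-tripping through toISOString rejects impossible dates such as
  // 2016-02-30, whether the runtime treats them as invalid or rolls them over.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().slice(0, 10) === value;
}

console.log(isIsoDate("2016-02-29")); // leap day in a leap year
console.log(isIsoDate("2016-02-30")); // not a calendar date
console.log(isIsoDate("02/01/2016")); // wrong format
```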
+ +**Frontend - Using Date Parameters:** + +```typescript +import { sql } from '@databricks/appkit-ui/js'; +import { useState } from 'react'; + +function MyComponent() { + const [startDate, setStartDate] = useState('2016-02-01'); + const [endDate, setEndDate] = useState('2016-02-29'); + + const queryParams = { + start_date: sql.date(startDate), // Pass YYYY-MM-DD string to sql.date() + end_date: sql.date(endDate), + }; + + const { data } = useAnalyticsQuery('my_query', queryParams); + + // ... +} +``` + +**SQL - Date Filtering:** + +```sql +-- Filter by date range using DATE() function +SELECT COUNT(*) as trip_count +FROM samples.nyctaxi.trips +WHERE DATE(tpep_pickup_datetime) >= :start_date + AND DATE(tpep_pickup_datetime) <= :end_date +``` + +**Date Helper Functions:** + +```typescript +// Helper to get YYYY-MM-DD string for dates relative to today +const daysAgo = (n: number): string => { + const date = new Date(Date.now() - n * 86400000); + return date.toISOString().split('T')[0]; // "2024-01-15" +}; + +const params = { + start_date: sql.date(daysAgo(7)), // 7 days ago + end_date: sql.date(daysAgo(0)), // Today +}; +``` + +### Optional Date Parameters - Use Sentinel Dates + +Databricks App Kit validates parameter types before query execution. **DO NOT use empty strings (`''`) for optional date parameters** as this causes validation errors. 
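
The rule can be captured in a small helper that substitutes a sentinel whenever a date filter is left empty. This is a sketch under stated assumptions: the sentinel values are the ones this guide recommends, while `dateOrSentinel` is a hypothetical name and not an AppKit API.

```typescript
// Sentinel values recommended in this guide; the helper name is hypothetical.
const NO_LOWER_BOUND = "1900-01-01";
const NO_UPPER_BOUND = "9999-12-31";

// Return the user's date, or a sentinel when the filter was left empty.
function dateOrSentinel(value: string | null | undefined, bound: "lower" | "upper"): string {
  if (value && value.trim() !== "") return value;
  return bound === "lower" ? NO_LOWER_BOUND : NO_UPPER_BOUND;
}

console.log(dateOrSentinel("", "lower"));           // empty filter, lower bound
console.log(dateOrSentinel(undefined, "upper"));    // missing filter, upper bound
console.log(dateOrSentinel("2016-02-01", "lower")); // user-supplied date is kept
```

The returned string is always a valid date, so it can be wrapped in `sql.date(...)` without tripping parameter validation.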
+ +**✅ CORRECT - Use Sentinel Dates:** + +```typescript +// Frontend: Use sentinel dates for "no filter" instead of empty strings +const revenueParams = { + group_by: 'month', + start_date: sql.date('1900-01-01'), // Sentinel: effectively no lower bound + end_date: sql.date('9999-12-31'), // Sentinel: effectively no upper bound + country: sql.string(country || ''), + property_type: sql.string(propertyType || ''), +}; +``` + +```sql +-- SQL: Simple comparison since sentinel dates are always valid +WHERE b.check_in >= CAST(:start_date AS DATE) + AND b.check_in <= CAST(:end_date AS DATE) +``` + +**Why Sentinel Dates Work:** +- `1900-01-01` is before any real data (effectively no lower bound filter) +- `9999-12-31` is after any real data (effectively no upper bound filter) +- Always valid DATE types, so no parameter validation errors +- All real dates fall within this range, so no filtering occurs + +**Parameter Types Summary:** +- ALWAYS use sql.* helper functions from the `@databricks/appkit-ui/js` package to define SQL parameters +- **Strings/Numbers**: Use directly in SQL with `:param_name` +- **Dates**: Use with `CAST(:param AS DATE)` in SQL +- **Optional Strings**: Use empty string default, check with `(:param = '' OR column = :param)` +- **Optional Dates**: Use sentinel dates (`sql.date('1900-01-01')` and `sql.date('9999-12-31')`) instead of empty strings diff --git a/.claude/skills/databricks-apps/references/appkit/trpc.md b/.claude/skills/databricks-apps/references/appkit/trpc.md new file mode 100644 index 0000000..54ad2ba --- /dev/null +++ b/.claude/skills/databricks-apps/references/appkit/trpc.md @@ -0,0 +1,146 @@ +# tRPC for Custom Endpoints + +**CRITICAL**: Do NOT use tRPC for SQL queries or data retrieval. Use `config/queries/` + `useAnalyticsQuery` instead. + +**CRITICAL**: Do NOT use tRPC for accessing Unity Catalog and File operations. Use the Files plugin instead. 
+ +Use tRPC ONLY for: + +- **Mutations**: Creating, updating, or deleting data (INSERT, UPDATE, DELETE) +- **External APIs**: Calling Databricks APIs not covered by a dedicated plugin (MLflow, Workspace API, etc.) +- **Complex business logic**: Multi-step operations that cannot be expressed in SQL +- **File operations**: File uploads, processing, transformations +- **Custom computations**: Operations requiring TypeScript/Node.js logic + +## Before Writing New Routes + +**ALWAYS complete these checks before adding tRPC routes:** + +### 1. Check AppKit Version + +Read `package.json` to identify the installed `@databricks/appkit` version. Available server APIs and plugins differ across versions. + +```bash +# From the project root +cat package.json | grep @databricks/appkit +``` + +### 2. Review Available Plugins + +Check what plugins are already enabled and what server-side functionality they provide — avoid reimplementing what a plugin already handles. + +```bash +# See plugin docs for the installed version +npx @databricks/appkit docs ./docs/plugins.md + +# See all plugins available for a specific version +databricks apps manifest --version --profile + +# See plugins available for the default template +databricks apps manifest --profile +``` + +**Key plugins to check for:** + +- **analytics** — provides SQL warehouse query execution (do NOT reimplement with tRPC) +- **lakebase** — provides Lakebase plugin for PostgreSQL CRUD (use plugin in tRPC routes, don't create raw connections) +- **genie** — provides Genie AI-powered data exploration (check before building custom natural-language-to-SQL routes) +- **files** — provides file storage and retrieval helpers (check before writing custom file upload/download routes) +- **serving** — provides model serving endpoint proxy with invoke/stream (do NOT reimplement with tRPC) +- **jobs** — provides Lakeflow Job triggering and monitoring (do NOT reimplement with tRPC) + +If a plugin already covers your use case, use the 
plugin's API instead of writing a custom tRPC route. + +If a newer version of `@databricks/appkit` has a plugin that fits the use case, +prompt the user to update. + +### 3. Check Existing Routes + +Read `server/server.ts` (or `server/trpc.ts`) to see what routes already exist. Extend the existing router rather than creating a parallel one. + +## Server-side Pattern + +```tsx +// server/trpc.ts +import { initTRPC } from "@trpc/server"; +import { getExecutionContext } from "@databricks/appkit"; +import { z } from "zod"; +import superjson from "superjson"; + +const t = initTRPC.create({ transformer: superjson }); +const publicProcedure = t.procedure; + +export const appRouter = t.router({ + // Example: Call a Databricks API (e.g. MLflow) + getExperiment: publicProcedure + .input(z.object({ experimentId: z.string() })) + .query(async ({ input: { experimentId } }) => { + const { serviceDatabricksClient: client } = getExecutionContext(); + const response = await client.experiments.getExperiment({ experiment_id: experimentId }); + return response; + }), + + // Example: Mutation + createRecord: publicProcedure + .input(z.object({ name: z.string() })) + .mutation(async ({ input }) => { + // Custom logic here + return { success: true, id: 123 }; + }), +}); +``` + +## Client-side Pattern + +```typescript +// client/src/components/MyComponent.tsx +import { trpc } from '@/lib/trpc'; +import { useState, useEffect } from 'react'; + +function MyComponent() { + const [result, setResult] = useState(null); + + useEffect(() => { + trpc.getExperiment + .query({ experimentId: "123" }) + .then(setResult) + .catch(console.error); + }, []); + + const handleCreate = async () => { + await trpc.createRecord.mutate({ name: "test" }); + }; + + return
<div>{/* component JSX */}</div>
; +} +``` + +## Decision Tree for Data Operations + +1. **Need to display data from SQL?** + - **Chart or Table?** → Use visualization components (`BarChart`, `LineChart`, `DataTable`, etc.) + - **Custom display (KPIs, cards, lists)?** → Use `useAnalyticsQuery` hook + - **Never** use tRPC for SQL SELECT statements + +2. **Need to call a Databricks API?** + - Serving endpoints → use `serving()` plugin (see [Model Serving Guide](model-serving.md)) + - Jobs → use `jobs()` plugin (see [Jobs Guide](jobs.md)) + - MLflow, Workspace API, other APIs → use tRPC + +3. **Need to modify data?** → Use tRPC mutations + - INSERT, UPDATE, DELETE operations + - Multi-step transactions + - Business logic with side effects + +4. **Need non-SQL custom logic?** → Use tRPC + - File processing + - External API calls + - Complex computations in TypeScript + +**Summary:** + +- ✅ SQL queries → Visualization components or `useAnalyticsQuery` +- ✅ Databricks APIs without a plugin → tRPC +- ✅ Data mutations → tRPC +- ❌ SQL queries → tRPC (NEVER do this) +- ❌ Files operations → tRPC (NEVER do this) diff --git a/.claude/skills/databricks-apps/references/other-frameworks.md b/.claude/skills/databricks-apps/references/other-frameworks.md new file mode 100644 index 0000000..c77c09f --- /dev/null +++ b/.claude/skills/databricks-apps/references/other-frameworks.md @@ -0,0 +1,269 @@ +# Databricks Apps — Other Frameworks (Non-AppKit) + +Setup guide for non-AppKit apps: Streamlit, FastAPI, Flask, Gradio, Dash, Django, Next.js, React, etc. + +For universal platform rules (permissions, deployment, timeouts, resource injection), see [Platform Guide](platform-guide.md). + +## 1. 
Port & Host Configuration + +**The #1 cause of 502 Bad Gateway errors.** + +| Setting | Required Value | Common Mistake | +|---------|---------------|----------------| +| Port | `DATABRICKS_APP_PORT` env var | Hardcoding 8080, 3000, or 3001 | +| Host | `0.0.0.0` | Binding to `localhost` or `127.0.0.1` | + +The platform dynamically assigns a port via `DATABRICKS_APP_PORT`. Use `8000` as a local dev fallback only. + +### Framework-Specific Port Configuration + +#### Streamlit +```yaml +# app.yaml +command: + - streamlit + - run + - app.py + - --server.port + - "${DATABRICKS_APP_PORT:-8000}" + - --server.address + - "0.0.0.0" +``` + +#### FastAPI / Uvicorn +```python +if __name__ == "__main__": + import uvicorn + port = int(os.environ.get("DATABRICKS_APP_PORT", 8000)) + uvicorn.run(app, host="0.0.0.0", port=port) +``` + +#### Flask +```python +port = int(os.environ.get("DATABRICKS_APP_PORT", 8000)) +app.run(host="0.0.0.0", port=port) +``` + +#### Gradio +```python +demo.launch(server_name="0.0.0.0", + server_port=int(os.environ.get("DATABRICKS_APP_PORT", 8000))) +``` + +#### Dash +```python +app.run(host="0.0.0.0", + port=int(os.environ.get("DATABRICKS_APP_PORT", 8000))) +``` + +#### Next.js +```jsonc +// package.json +"scripts": { + "start": "next start -p ${DATABRICKS_APP_PORT:-8000} -H 0.0.0.0" +} +``` + +⚠️ **Only ONE service can bind to `DATABRICKS_APP_PORT`.** If you need multiple services (e.g., frontend + backend), use a reverse proxy or serve everything from one process. + +## 2. app.yaml vs databricks.yml + +These two files serve different purposes. Getting them wrong causes silent deployment failures. 
+ +### app.yaml — Runtime Configuration +- Defines the **start command** and **environment variables** for the running app +- Used by the Databricks Apps runtime directly +- `valueFrom:` injects resource IDs from workspace configuration + +```yaml +# app.yaml +command: + - python + - app.py +env: + - name: DATABRICKS_WAREHOUSE_ID + valueFrom: sql-warehouse + - name: MY_CUSTOM_VAR + value: "some-value" +``` + +### databricks.yml — Bundle/Deployment Configuration +- Defines the **app resource** for DABs (Declarative Automation Bundles) +- `config:` section only takes effect after `bundle run`, NOT just `bundle deploy` + +```yaml +# databricks.yml +bundle: + name: my-app-bundle + +resources: + apps: + my-app: + name: my-app + source_code_path: . + config: + command: ['python', 'app.py'] + env: + - name: DATABRICKS_WAREHOUSE_ID + valueFrom: sql-warehouse + permissions: + - service_principal_name: ${bundle.target}.my-app + level: CAN_MANAGE + +targets: + dev: + default: true +``` + +### Critical Rules + +| Rule | Why | +|------|-----| +| Always provide BOTH `app.yaml` AND `databricks.yml` config | UI deployments use app.yaml; DABs uses databricks.yml | +| Always run `bundle deploy` THEN `bundle run ` | `deploy` uploads code; `run` applies config and starts the app | +| Never use `${var.xxx}` in config env values | Variables are NOT resolved in config — values appear literally | + +## 3. Using OBO in Non-AppKit Apps + +```python +# FastAPI example +from fastapi import Request +from databricks.sdk import WorkspaceClient + +@app.get("/user-data") +def get_user_data(request: Request): + token = request.headers.get("x-forwarded-access-token") + + # create user-scoped client + w = WorkspaceClient(token=token, host=os.environ["DATABRICKS_HOST"]) + # use w for user-scoped operations +``` + +```python +# SP auth is auto-configured — just use the SDK +from databricks.sdk import WorkspaceClient +w = WorkspaceClient() # picks up auto-injected env vars +``` + +## 4. 
Framework-Specific Timeout Gotchas + +| Framework | Default Timeout | Fix | +|-----------|----------------|-----| +| Gradio | 30 seconds (internal) | Set `fn` timeout explicitly or use `gradio.queue()` | +| Gunicorn | 30 seconds (worker timeout) | Set `--timeout 120` in gunicorn command | +| Uvicorn | None (no default timeout) | Already fine | + +## 5. Common Errors (Non-AppKit Specific) + +| Error | Cause | Fix | +|-------|-------|-----| +| 502 Bad Gateway | Wrong port or host | Bind to `0.0.0.0:${DATABRICKS_APP_PORT:-8000}` | +| App works locally but 502 in prod | Binding to localhost | Change to `0.0.0.0` | +| `ModuleNotFoundError` at runtime | Dependency not in requirements.txt or version conflict | Pin exact versions; validate locally first | +| Wrong script runs on deploy | No `command` in app.yaml, platform picked wrong .py file | Always specify `command` explicitly in app.yaml | +| `apt-get: command not found` | No root access in container | Use pure-Python wheels from PyPI; no system packages | + +## 6. Dependency Management + +### Python + +Only `requirements.txt` is natively supported. No native support for `pyproject.toml`, `uv.lock`, or Poetry. + +**Workaround for `uv`:** +``` +# requirements.txt +uv +``` +```yaml +# app.yaml +command: + - uv + - run + - app.py +``` +Define actual dependencies in `pyproject.toml`. Note: This moves dependency installation from build to run step, slowing startup. + +**Custom package repositories:** +- Set `PIP_INDEX_URL` as a secret in the app configuration +- Deploying user needs **MANAGE** permission on the secret scope (not just USE/READ) + +### Node.js + +- `package.json` is supported — `npm install` runs at startup +- Do NOT include `node_modules/` in source code (10 MB file limit) +- Large npm installs may exceed the 10-minute startup window +- In egress-restricted workspaces, add `registry.npmjs.org` to egress policy AND restart the app (egress changes require restart) + +## 7. 
Networking & CORS + +### CORS + +- CORS headers are **not customizable** on the Databricks Apps reverse proxy +- Workspace origin (`*.databricks.com`) differs from app origin (`*.databricksapps.com`) +- Cross-app API calls return **302 redirect to login page** instead of the expected response + +**Workaround:** Keep frontend and backend in a single app to avoid CORS entirely. + +### Private Link / Hardened Environments + +- Azure apps use `*.azure.databricksapps.com` — NOT `*.azuredatabricks.net` +- Existing Private Link DNS zones don't cover the apps domain +- Fix: Create a separate Private DNS Zone for `azure.databricksapps.com` with conditional DNS forwarding + +### Egress Restrictions + +- Egress policy changes require **app restart** to take effect +- For npm: allowlist `registry.npmjs.org` +- For pip: allowlist `pypi.org` and `files.pythonhosted.org` +- For custom registries: use `PIP_INDEX_URL` secret (see Dependency Management) + +## 8. Streamlit-Specific Gotchas + +### Required Environment Variables + +```yaml +# app.yaml +command: + - streamlit + - run + - app.py + - --server.port + - "${DATABRICKS_APP_PORT:-8000}" + - --server.address + - "0.0.0.0" +env: + - name: STREAMLIT_SERVER_ENABLE_CORS + value: "false" + - name: STREAMLIT_SERVER_ENABLE_XSRF_PROTECTION + value: "false" +``` + +⚠️ **Both CORS and XSRF must be disabled** for Streamlit on Databricks Apps. The reverse proxy origin (`*.databricksapps.com`) differs from the workspace origin, triggering Streamlit's CORS/XSRF protection. + +### OBO Token Staleness + +Streamlit caches initial HTTP request headers, then switches to WebSocket. The OBO token from `x-forwarded-access-token` **never refreshes** — it goes stale. + +**Workaround:** Periodically trigger a full page refresh. No clean in-Streamlit solution exists. + +### Connection Exhaustion (Hangs After Initial Queries) + +Streamlit re-runs the entire script on every user interaction. 
If `sql.connect()` is called during each render cycle, the rapid succession of TCP handshakes and OAuth negotiations exhausts the connection pool, causing 2-3 minute freezes. + +**Fix:** Use `@st.cache_resource` to maintain persistent connections: +```python +@st.cache_resource +def get_connection(): + from databricks import sql + from databricks.sdk.core import Config + cfg = Config() + return sql.connect( + server_hostname=cfg.host, + http_path=f"/sql/1.0/warehouses/{os.environ['DATABRICKS_WAREHOUSE_ID']}", + credentials_provider=lambda: cfg.authenticate, + ) +``` + +### Transient 502s During Startup + +Streamlit apps commonly show brief 502 errors during startup. This is expected and does not indicate a problem. diff --git a/.claude/skills/databricks-apps/references/platform-guide.md b/.claude/skills/databricks-apps/references/platform-guide.md new file mode 100644 index 0000000..446ea3b --- /dev/null +++ b/.claude/skills/databricks-apps/references/platform-guide.md @@ -0,0 +1,173 @@ +# Databricks Apps Platform Guide + +Universal platform rules that apply to ALL Databricks Apps regardless of framework (AppKit, Streamlit, FastAPI, etc.). + +For non-AppKit framework-specific setup (port config, app.yaml, Streamlit gotchas), see [Other Frameworks](other-frameworks.md). + +## Service Principal Permissions + +**The #1 cause of runtime crashes after deployment.** + +When your app uses a Databricks resource (SQL warehouse, model serving endpoint, vector search index, volume, secret scope), the app's **service principal** must have explicit permissions on that resource. + +### How Permissions Work + +When you declare a resource in `app.yaml` / `databricks.yml` with a `permission` field, the platform **automatically grants** that permission to the app's SP on deployment. You do NOT need to run manual `set-permissions` commands for declared resources. 
+ +```yaml +# databricks.yml — declaring resources with permissions +resources: + apps: + my_app: + resources: + - name: my-warehouse + sql_warehouse: + id: ${var.warehouse_id} + permission: CAN_USE # auto-granted to SP on deploy + - name: my-endpoint + serving_endpoint: + name: ${var.endpoint_name} + permission: CAN_QUERY # auto-granted to SP on deploy +``` + +### Default Permissions by Resource Type + +| Resource Type | Default Permission | Notes | +|---------------|-------------------|-------| +| SQL Warehouse | CAN_USE | Minimum for query execution | +| Model Serving Endpoint | CAN_QUERY | For inference calls | +| Vector Search Index (UC) | SELECT | UC securable of type TABLE | +| Volume (UC) | READ_VOLUME | Via UC securable | +| Secret Scope | READ | Deploying user needs MANAGE on the scope | +| Job | CAN_MANAGE_RUN | | +| Lakebase Database | CAN_CONNECT_AND_CREATE | | +| Genie Space | CAN_VIEW | | + +### ⚠️ CRITICAL AGENT BEHAVIOR + +Always declare resources in `databricks.yml` with the correct `permission` field — do NOT skip this. The platform handles granting automatically on deploy. + +## Resource Types & Injection + +**NEVER hardcode workspace-specific IDs in source code.** Always inject via environment variables with `valueFrom`. 
+ +| Resource Type | Default Key | Use Case | +|---------------|-------------|----------| +| SQL Warehouse | `sql-warehouse` | Query compute | +| Model Serving Endpoint | `serving-endpoint` | Model inference | +| Vector Search Index | `vector-search-index` | Semantic search | +| Lakebase Database | `database` | OLTP storage | +| Secret | `secret` | Sensitive values | +| UC Table | `table` | Structured data | +| UC Connection | `connection` | External data sources | +| Genie Space | `genie-space` | AI analytics | +| MLflow Experiment | `experiment` | ML tracking | +| Lakeflow Job | `job` | Data workflows | +| UDF | `function` | SQL/Python functions | +| Databricks App | `app` | App-to-app communication | + +```python +# ✅ GOOD +warehouse_id = os.environ["DATABRICKS_WAREHOUSE_ID"] +``` + +```yaml +# app.yaml / databricks.yml env section +env: + - name: DATABRICKS_WAREHOUSE_ID + valueFrom: sql-warehouse + - name: SERVING_ENDPOINT + valueFrom: serving-endpoint +``` + +## Authentication: OBO vs Service Principal + +| Context | When Used | Token Source | Cached Per | +|---------|-----------|--------------|------------| +| **Service Principal (SP)** | Default; background tasks, shared data | Auto-injected `DATABRICKS_CLIENT_ID` + `DATABRICKS_CLIENT_SECRET` | All users (shared) | +| **On-Behalf-Of (OBO)** | User-specific data, user-scoped access | `x-forwarded-access-token` header | Per user | + +**SP auth** is auto-configured — `WorkspaceClient()` picks up injected env vars. + +**OBO** requires extracting the token from request headers and declaring scopes: + +| Scope | Purpose | +|-------|---------| +| `sql` | Query SQL warehouses | +| `dashboards.genie` | Manage Genie spaces | +| `files.files` | Manage files/directories | +| `iam.access-control:read` | Read permissions (default) | +| `iam.current-user:read` | Read current user info (default) | + +⚠️ Databricks blocks access outside approved scopes even if the user has permission. 
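The OBO row of the authentication table can be sketched as a small, framework-agnostic helper. This is a minimal sketch under stated assumptions, not part of AppKit or the Databricks SDK: the function names `extract_obo_token` and `require_obo_token` are hypothetical, and only the header name `x-forwarded-access-token` comes from this guide.

```python
from typing import Mapping, Optional

# Header name taken from this guide; the helper names below are illustrative.
OBO_HEADER = "x-forwarded-access-token"


def extract_obo_token(headers: Mapping[str, str]) -> Optional[str]:
    """Return the forwarded user token, or None when the request has none.

    HTTP header names are case-insensitive, so names are compared lowercased.
    Blank or whitespace-only values are treated as missing.
    """
    for name, value in headers.items():
        if name.lower() == OBO_HEADER:
            token = value.strip()
            return token or None
    return None


def require_obo_token(headers: Mapping[str, str]) -> str:
    """Return the user token, or raise instead of silently using SP auth."""
    token = extract_obo_token(headers)
    if token is None:
        raise PermissionError(f"missing {OBO_HEADER} header")
    return token
```

For user-scoped endpoints, raising on a missing token is probably safer than falling back to the service principal, since a fallback could return shared data to a caller who was expected to be restricted to their own permissions. Background tasks and shared data would skip this helper and rely on the auto-configured SP client.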
+ +## Deployment Workflow + +⚠️ **USER CONSENT REQUIRED** — always confirm with the user before deploying. + +```bash +# Option A: single command (recommended) — validates, deploys, and runs +databricks apps deploy -t --profile + +# Option B: step by step +databricks apps validate --profile +databricks bundle deploy -t --profile +databricks bundle run -t --profile +``` + +❌ **Common mistake:** Running only `bundle deploy` and expecting the app to update. Deploy uploads code but does NOT apply config changes or restart the app. Use `databricks apps deploy` or add `bundle run` after `bundle deploy`. + +### ⚠️ Destructive Updates Warning + +`databricks apps update` (and `bundle run`) performs a **full replacement**, not a merge: +- Adding a new resource can silently **wipe** existing `user_api_scopes` +- OBO permissions may be stripped on every deployment + +**Workaround:** After each deployment, verify OBO scopes are intact. + +## Runtime Environment + +| Constraint | Value | +|------------|-------| +| Max file size | 10 MB per file | +| Available port | Only `DATABRICKS_APP_PORT` | +| Auto-injected env vars | `DATABRICKS_HOST`, `DATABRICKS_APP_PORT`, `DATABRICKS_APP_NAME`, `DATABRICKS_WORKSPACE_ID`, `DATABRICKS_CLIENT_ID`, `DATABRICKS_CLIENT_SECRET` | +| No root access | Cannot use `apt-get`, `yum`, or `apk` — use PyPI/npm packages only | +| Graceful shutdown | SIGTERM → 15 seconds to shut down → SIGKILL | +| Logging | Only stdout/stderr are captured — file-based logs are lost on container recycle | +| Filesystem | Ephemeral — no persistent local storage; use UC Volumes/tables | + +## Compute & Limits + +| Size | RAM | vCPU | DBU/hour | Notes | +|------|-----|------|----------|-------| +| Medium | 6 GB | Up to 2 | 0.5 | Default | +| Large | 12 GB | Up to 4 | 1.0 | Select during app creation or edit | + +- No GPU access. Use model serving endpoints for inference. +- Apps must start within **10 minutes** (including dependency installation). 
+- Max apps per workspace: **100**. + +## HTTP Proxy & Streaming + +The Databricks Apps reverse proxy enforces a **120-second per-request timeout** (NOT configurable). + +| Behavior | Detail | +|----------|--------| +| 504 in app logs? | **No** — the error is generated at the proxy. App logs show nothing. | +| SSE streaming | Responses may be **buffered** and delivered in chunks, not token-by-token | +| WebSockets | Bypass the 120s limit — working but undocumented | + +For long-running agent interactions, use **WebSockets** instead of SSE. + +## Common Errors + +| Error | Cause | Fix | +|-------|-------|-----| +| `PERMISSION_DENIED` after deploy | SP missing permissions | Grant SP access to all declared resources | +| App deploys but config doesn't change | Only ran `bundle deploy` | Also run `bundle run ` | +| `File is larger than 10485760 bytes` | Bundled dependencies | Use requirements.txt / package.json | +| OBO scopes missing after deploy | Destructive update wiped them | Re-apply scopes after each deploy | +| `${var.xxx}` appears literally in env | Variables not resolved in config | Use literal values, not bundle variables | +| 504 Gateway Timeout | Request exceeded 120s | Use WebSockets for long operations | +| `user token passthrough not enabled` | `user_api_scopes` in `databricks.yml` requires user authorization, which is not enabled in the workspace | Ask workspace admin to enable user authorization (Public Preview). See [Databricks Apps auth docs](https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth#user-authorization) | diff --git a/.claude/skills/databricks-apps/references/testing.md b/.claude/skills/databricks-apps/references/testing.md new file mode 100644 index 0000000..bf1eb4d --- /dev/null +++ b/.claude/skills/databricks-apps/references/testing.md @@ -0,0 +1,99 @@ +# Testing Guidelines + +## Unit Tests (Vitest) + +**CRITICAL**: Use vitest for all tests. Put tests next to the code (e.g. 
src/\*.test.ts) + +```typescript +import { describe, it, expect } from 'vitest'; + +describe('Feature Name', () => { + it('should do something', () => { + expect(true).toBe(true); + }); + + it('should handle async operations', async () => { + const result = await someAsyncFunction(); + expect(result).toBeDefined(); + }); +}); +``` + +**Best Practices:** +- Use `describe` blocks to group related tests +- Use `it` for individual test cases +- Use `expect` for assertions +- Tests run with `npm test` (runs `vitest run`) + +❌ **Do not write unit tests for:** +- SQL files under `config/queries/` - little value in testing static SQL +- Types associated with queries - these are just schema definitions + +## Smoke Test (Playwright) + +The template includes a smoke test at `tests/smoke.spec.ts` that verifies the app loads correctly. + +**⚠️ MUST UPDATE after customizing the app:** +- The heading selector checks for `'Minimal Databricks App'` — change it to match your app's actual title +- The text assertion checks for `'hello world'` — update or remove it to match your app's content +- Failing to update these will cause the smoke test to fail on `databricks apps validate` + +```typescript +// tests/smoke.spec.ts - update these selectors: +// ⚠️ PLAYWRIGHT STRICT MODE: each selector must match exactly ONE element. +// Use { exact: true }, .first(), or role-based selectors. See "Playwright Strict Mode" below. 
+ +// ❌ Template default - will fail after customization +await expect(page.getByRole('heading', { name: 'Minimal Databricks App' })).toBeVisible(); +await expect(page.getByText('hello world')).toBeVisible(); + +// ✅ Update to match YOUR app +await expect(page.getByRole('heading', { name: 'Your App Title' })).toBeVisible(); +await expect(page.locator('h1').first()).toBeVisible({ timeout: 30000 }); // Or just check any h1 +``` + +**What the smoke test does:** +- Opens the app +- Waits for data to load (SQL query results) +- Verifies key UI elements are visible +- Captures screenshots and console logs to `.smoke-test/` directory +- Always captures artifacts, even on test failure + +## Playwright Strict Mode + +Playwright uses strict mode by default — selectors matching multiple elements WILL FAIL. + +### Selector Priority (use in this order) + +1. ✅ `getByRole('heading', { name: 'Your App Title' })` — headings (most reliable) +2. ✅ `getByRole('button', { name: 'Submit' })` — interactive elements +3. ✅ `getByText('Unique text', { exact: true })` — exact match for unique strings +4. ⚠️ `getByText('Common text').first()` — last resort for repeated text +5. ❌ `getByText('Revenue')` — NEVER without `exact` or `.first()` (strict mode will fail) + +**Common mistake**: text like "Revenue" may appear in a heading, a card, AND a description. Always verify your selector targets exactly ONE element. 
+ +```typescript +// ❌ FAILS if "Revenue" appears in multiple places (heading + card + description) +await expect(page.getByText('Revenue')).toBeVisible(); + +// ✅ Use role-based selectors for headings +await expect(page.getByRole('heading', { name: 'Revenue Dashboard' })).toBeVisible(); + +// ✅ Use exact matching +await expect(page.getByText('Revenue', { exact: true })).toBeVisible(); + +// ✅ Use .first() as last resort +await expect(page.getByText('Revenue').first()).toBeVisible(); +``` + +**Keep smoke tests simple:** +- Only verify that the app loads and displays initial data +- Wait for key elements to appear (page title, main content) +- Capture artifacts for debugging +- Run quickly (< 5 seconds) + +**For extended E2E tests:** +- Create separate test files in `tests/` directory (e.g., `tests/user-flow.spec.ts`) +- Use `npm run test:e2e` to run all Playwright tests +- Keep complex user flows, interactions, and edge cases out of the smoke test diff --git a/.claude/skills/databricks-core/SKILL.md b/.claude/skills/databricks-core/SKILL.md new file mode 100644 index 0000000..185fe96 --- /dev/null +++ b/.claude/skills/databricks-core/SKILL.md @@ -0,0 +1,142 @@ +--- +name: "databricks-core" +description: "Databricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks." +compatibility: Requires databricks CLI (>= v0.292.0) +metadata: + version: "0.1.0" +--- + +# Databricks + +Core skill for Databricks CLI, authentication, and data exploration. 
+ +## Product Skills + +For specific products, use dedicated skills: +- **databricks-jobs** - Lakeflow Jobs development and deployment +- **databricks-pipelines** - Lakeflow Spark Declarative Pipelines (batch and streaming data pipelines) +- **databricks-apps** - Full-stack TypeScript app development and deployment +- **databricks-lakebase** - Lakebase Postgres Autoscaling project management +- **databricks-model-serving** - Model Serving endpoint management and inference + +## Prerequisites + +1. **CLI installed**: Run `databricks --version` to check. + - **If the CLI is missing or outdated (< v0.292.0): STOP. Do not proceed or work around a missing CLI.** + - **Read the [CLI Installation](databricks-cli-install.md) reference file and follow the instructions to guide the user through installation.** + - Note: In sandboxed environments (Cursor IDE, containers), install commands write outside the workspace and may be blocked. Present the install command to the user and ask them to run it in their own terminal. + - **Exception:** If CLI installation is blocked (sandboxed containers, restricted environments), ask the user whether to fall back to direct REST API calls using `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables if present in the shell. See the [Databricks REST API docs](https://docs.databricks.com/api/workspace/introduction). + +2. **Authenticated**: `databricks auth profiles` + - If not: see [CLI Authentication](databricks-cli-auth.md) + +## Profile Selection - CRITICAL + +**NEVER auto-select a profile.** + +1. List profiles: `databricks auth profiles` +2. Present ALL profiles to user with workspace URLs +3. Let user choose (even if only one exists) +4. Offer to create new profile if needed + +## Claude Code - IMPORTANT + +Each Bash command runs in a **separate shell session**. 
+ +```bash +# WORKS: --profile flag +databricks apps list --profile my-workspace + +# WORKS: chained with && +export DATABRICKS_CONFIG_PROFILE=my-workspace && databricks apps list + +# DOES NOT WORK: separate commands +export DATABRICKS_CONFIG_PROFILE=my-workspace +databricks apps list # profile not set! +``` + +## Data Exploration — Use AI Tools + +**Use these instead of manually navigating catalogs/schemas/tables:** + +```bash +# discover table structure (columns, types, sample data, stats) +databricks experimental aitools tools discover-schema catalog.schema.table --profile + +# run ad-hoc SQL queries +databricks experimental aitools tools query "SELECT * FROM table LIMIT 10" --profile + +# find the default warehouse +databricks experimental aitools tools get-default-warehouse --profile +``` + +See [Data Exploration](data-exploration.md) for details. + +## Quick Reference + +**⚠️ CRITICAL: Some commands use positional arguments, not flags** + +```bash +# current user +databricks current-user me --profile + +# list resources +databricks apps list --profile +databricks jobs list --profile +databricks clusters list --profile +databricks warehouses list --profile +databricks pipelines list --profile +databricks serving-endpoints list --profile + +# ⚠️ Unity Catalog — POSITIONAL arguments (NOT flags!) +databricks catalogs list --profile + +# ✅ CORRECT: positional args +databricks schemas list --profile +databricks tables list --profile +databricks tables get .. 
--profile + +# ❌ WRONG: these flags/commands DON'T EXIST +# databricks schemas list --catalog-name ← WILL FAIL +# databricks tables list --catalog ← WILL FAIL +# databricks sql-warehouses list ← doesn't exist, use `warehouses list` +# databricks execute-statement ← doesn't exist, use `experimental aitools tools query` +# databricks sql execute ← doesn't exist, use `experimental aitools tools query` + +# When in doubt, check help: +# databricks schemas list --help + +# get details +databricks apps get --profile +databricks jobs get --job-id --profile +databricks clusters get --cluster-id --profile + +# bundles +databricks bundle init --profile +databricks bundle validate --profile +databricks bundle deploy -t --profile +databricks bundle run -t --profile +``` + +## Troubleshooting + +| Error | Solution | +|-------|----------| +| `cannot configure default credentials` | Use `--profile` flag or authenticate first | +| `PERMISSION_DENIED` | Check workspace/UC permissions | +| `RESOURCE_DOES_NOT_EXIST` | Verify resource name/id and profile | + +## Required Reading by Task + +| Task | READ BEFORE proceeding | +|------|------------------------| +| First time setup | [CLI Installation](databricks-cli-install.md) | +| Auth issues / new workspace | [CLI Authentication](databricks-cli-auth.md) | +| Exploring tables/schemas | [Data Exploration](data-exploration.md) | +| Deploying jobs/pipelines | Use `/databricks-dabs` | + +## Reference Guides + +- [CLI Installation](databricks-cli-install.md) +- [CLI Authentication](databricks-cli-auth.md) +- [Data Exploration](data-exploration.md) diff --git a/.claude/skills/databricks-core/agents/openai.yaml b/.claude/skills/databricks-core/agents/openai.yaml new file mode 100644 index 0000000..1b5f562 --- /dev/null +++ b/.claude/skills/databricks-core/agents/openai.yaml @@ -0,0 +1,7 @@ +interface: + display_name: "Databricks" + short_description: "CLI, auth, and data exploration" + icon_small: "./assets/databricks.svg" + icon_large: 
"./assets/databricks.png" + brand_color: "#FF3621" + default_prompt: "Use $databricks-core for Databricks CLI, auth, and data exploration." diff --git a/.claude/skills/databricks-core/assets/databricks.png b/.claude/skills/databricks-core/assets/databricks.png new file mode 100644 index 0000000000000000000000000000000000000000..263fe98b84e8ff3516edc93e7c99230fb8fb3113 Binary files /dev/null and b/.claude/skills/databricks-core/assets/databricks.png differ
z7rKC8yOq5-xU~Ls9i`l57K#O_TxQE#mM9px)5H_`pf_@BCy)}wjk&WzD z^m24(T0Hmofs)Q71+q)Wff+0(LO^Wd2q`_QY_v=2-(EkHPZ}CF~@lw-7@Ef9S)CPLeB8hX&l}v0LsiPOe-a z9p0GvTr}x4o-O?vgts`REWB9tazJd? zoxJ((en?IFuv4Q?g*@c2pY+d%1QH<9R7|C_&`wsl%VCe8!1Gm*yU$C!isJZ*U^PSYwqEcC-4g9U7&{rbFi7NEyGwKhGkCH%upmP_j_UF4whhwK0?K%e*T0q#XDe zv+SZ4IM2n(7g^D=Ay>qwy9k4hGF6*xBfOd8i}f-r-ZShCQj^qqbd*u=|FMUYS49yc ztD?vo&g4)>bLzK0pDRLLHy;<1hgj5{I)ys;EO<+kIZ1z@#rb<%_fRxF3HbiAH*0H% zuw1zXYS-?)xMiXDZ8D=DtN2F%>JnOu4^Ig&FR@+OY-w&9=x3I|`JNbz)c^g7{r(6h z$wSD+e0hu0l2E%tonAR14ig4nMMuEKrL>ZNPIU!VSBUE5_e)mgKP9=l^^p8%Q)G8T zXL>{`lyIr=$KSNUKTe==w7#Resk75s+?CIUUcBCH`n@?&-+|jR*o!*Q!%-{Djh0X{ zV@XYCV(-{d4h4SHAVcRK@a|0RQ9~KiM(`=_Noui>&cV}~SIESk%S0e8U zqj#Rl&Gp@ z1g@xmDiB6H_hI`*wdGJ^2#Xq|qAk0wD&L#hEnfB`Z`iOiqoXOIG_baZM+`_0nC2p3 zokZfu)ay8k#22}4Diyx@M~X^bRP{+X%}^>urQE|`y7^y^q1cMlyXT=W+A*-{CXxA$ zTfMsYZ`Oge%IwYNq?PKYs`*DV2zHiPse}@4+0tFsXi5BJAyBsaQjNe)4Nf(n>d--> z9D6TMGJ~-PNHNdSP(lNDF54E+tNBP90eCKO{gZEatI+uK5BnswDQoLg*+TalaL+lD z&ScSN+~1v+M=q5v@@xP4S6CCh6YpfyW;`}cb+tXvFdax7P>d6|nvj`P+_K~syD z7BFh*T1MiY|JnC=e(!hI*Sm8JW=WEU{FBh9t`owNn4i|i?*E$2g*_nFV2cDiJdo0l zDFNlGNY`*$#E@Y%v)J&U(Q6^e>iP0Jk7GqM5-Rt3P&$_%sND)SnPErQ3p|u+2Z;c5 z+N#Uh6WkCYsyI}V+>|>1Bt;WjEj#G0F{bj!)TQfmdL$`y>T}^RZma8La^kWp5gf?( z3`1RBU(%M$0K-P z!R6AU7yHTS*?RHQ)P+x4j^>=aG}dQQLPTd%T-{9#cLf}BR%LaZ8mBl7TMHYG_P??J zJv%r_Fg^@p-F-X~%nBOTw1=iRj4)Sfn0Jfl7h$ep)QzqMPT<4v9YBA8@stcM5p0D| zvhw|ln_xv+4$v}0icOLP4kcW8FqGxQUVX$rwt(Q!MLR+YAGEQdEu(pku7!yX3#WKG zVif5Hv47%yb(^NV^n9(SFloKt!ev7WL2aPT$=e46&*do zyBv6+w|QdlXIRk_&sJ5rLqS-)w66O~F`<*VABXv{^`t_iS|;l;rcW}bzv7RY9+%UF zV7+H0446IX|CqfPTS$Oc?o6%>rx&Yrnoex_@iXk_N*t!Y>f%V25nsEOja>9>DhNh< zPaYEFp!-|UfKi2KNVd{M@d2BAsj1hK%P?0eN0!GSP|p;q<(u z6_qeJy^0&bhafPh8o?Qer*BxBNzJ}l09t}|3eq4`bdaifbdqD`#M2`_blMCL|69iB zY$UiPJL(~Ybx?HK61z>Fn7h%Pn(%8Myz}Q?kOuSsI?X_$=D=-W!wtCM5}ZnPrk4{0ajC)4SJ zzPR!8%4_TjTtpA|XfJa&LAL#vX2)zwjeCqy&DR29 zX^Y=R9hPXDuJa{Ns2O2+cv2s#o;^BswE0-D-ocbBno)-r|A>is_<3p@=*dbL8JkqZ z!9g|s%C6OJR!!iOntcCGe#f(G^&nO9!7%8{gtE>DRL5@P%Ab+o!2 
zTsw}vW>caC+A29-*URlt8beobjH2>X@HuxIy+2N4+WrFTg!~^LmKN9nJAi6mAr0y>v762K6!3Y zo2z=Q=i>1FJVvMa&q$#HXm=}2T_}EdI}pPMdgs*kwZ<=&E=;I`D_Y}TJ&TK{MeI?N zXfLb1VdvnV3?baIvxCf;NUSP?ik?;1+kT3NL-$Y4_#&Ane}W*?1XsM~C}$8^G-K?v zbQ`^W{XLVZ2Sp2K?It#JB=Gd0qR2b$!1(LqLZ`$3LSPbw-6Nyx9DBExGX~*rg@u4? zyQl8J-5*bu#XiuGO-ZuRYtzXt>8R@yOh}Hs1F)+4Vln|97xp9dkJkkZxT+iz~v+ zm0A}eFg4Yom|Bix*0r8?FZh1-JO&aOa_B(Sb4*TpEkL&}MzoNWOPLTb;W4 z&h3gAIIdoQIsW_reBs%z_&}IW>zUXCeh2Psd+}?pIX%6+9?T3=Jmj6xE!ekHS=0Bj zFOPgbT?HCo<-Y`dq>xtDQqU~Mrx%PTA@}JXN&2ZuCv5BGOi~`SnyatC38!q)Vi2zSa5XRt0-za`p{j_gY$Y24GeSx8+Ko#sq2Gh z;lYi2z~|rpWnRb2Qh&Cr(Sn^h{t3XzRsmcsydQf~8UsfPSk!#ppO5sut8oN}N+18Tpv>OZmkvH1$`;W7H8xw_x zCjZdfp+fKr;7|qw);J&8_o*Y+Yr+zgiw1sH10ZSXFQjrPjISk{q`<#+kM-ji`Hhp8 z%=gGwENHT`c8%S4)l$?ZUPUxSTw_Pujat_{QHDu=dC%q^!z}1EEB8oN*gR60skQ4s z3CL7)tDbOQ4g$q9*yw#e^CX8DdAe=)5lGcJa0y?*0TfLWroRc5+#OI_u0b0EMXeE5 z4{BP4?`^mr6nt9f&(>5xuJxS$jT4zlGny+MFrbB)+0uT<& z_Ihx!gDDCt=gR{fC2;l)v^W0%kmaW_cWr|J(bn5-ObNfwDyFYiOvPY**myCh@r1=Y z5=jfS`NmVqS&p1&4hY93CU=$xABV9_+V1hV?0`>h$8&!?8GG+8>&cELyIJ6+xywo{ z+0GM#!@R?Km%Wg5n`XwLKF1J$&trOY{Sj;w^xEz5=6cBpTA(7RZA*TFw0L=duEl0wH zUrFqKq(TUa_*>%xR9Nx!GcmOFb1QKdbcx-`@$LH8x_zgzmnhxut!T0UQVz6RJN__Iy~HR*VFb3CY$)N9_ZxOXy0R} z)lB;_Z4xgJpd&UTl%Oj7T(`&>lm|b-yJbN^LBhz^G;Z^q4b?3KAwWdq?`i1Qc*ZEz z*4j$T$4>DoGbBxK>~FvKJt?noWl^|520 zyHeOmx{b`@p}Z_4;kn9%(7W0_(N^ew@BP$o7N1B*=(OrW_THirDi5=E3q)BRemlHs zzIe%rg8d9l2y@_^<@WX1RU29T&hW6~vI;RE2w5Ujun&3ZrU>cji*W)~JRufUsCkzw z-0D<79kUIDk!$TIjYGw|%dFHUtn~%4ie;}>7wQ4L$|&%=9}{ew8c}zcxBl~N06&&Al(h)3;3cwC z7dM#!;Jf~s)h*gU>iJ5}sn!6To`7XWbUR5yfz8p?mUmXv4T((i&J?;T;)%ZebkElrYUNLnpMdKk?`fgKOli-tn z#2nkxYwT|4H}q#a2bM_=ME+5hWoVg zQ>+;wv%LeZ*Xp?qI15Aa_XP@Z7$Ea50)TJO;eGe!8bB*w*%tkU@1M6Lt3JVQEr!mK zk7n(=4Ql3^&6*oz1e7gc-eT*QI|D-dTx)lJDdjewNM6F`J8ktPLLK{OFGMbJr{tfF z1|u6mV8s*ma-^u;vmYFY{09-o#l7KAoD)1T3JG^!>xhL+sxWRN4KPn@-2gE9)zbw@ zLhvqep3wlQ|L{EKP6K9@?N!vmCmF5qYQUGvyTuw`h>}K;X(^NO9eTyzxgwthOBVXJ zXn}F>A1LKsi=KJr&4J)edfzRF2Tvj8NpMrj2s(m0Nn^+`Y`m!e!QgsLtD%x~#ZKj# 
zhIcSa5&y`Ng?{;^lFepT4p-*?p(9EL;1^*#sjG-&RPIqG)=SJ&s^(5iiV~wEWw_BN zRbaFqUS|zyQS{Zx_8?*f6du##|FsS^G^wL5Js7%4a0!j1Cm@n*R5>k)7vexA4`(jU zC6a4~i5-gq1r`&!-g=)kqq38X>4L-ag%#IAVWX&YUu4}B*ceMgWAH@I&}XMmX%J>}?KxWL=S5x-i4qRp9$&7QGPLK9Ug+`h09z!H82HdBzv z#SVD|Yu{;R@na>g@_&nfYs7#zV}v~~v8pJPIKl6#yA)Tz^2?D$Z6F98w5kpmEh1u_ zjRL3H%M}Z`oO~p@17A3oj#3^9tYz$5L@5}!)2Xy7%P~ + + \ No newline at end of file diff --git a/.claude/skills/databricks-core/data-exploration.md b/.claude/skills/databricks-core/data-exploration.md new file mode 100644 index 0000000..5908cc5 --- /dev/null +++ b/.claude/skills/databricks-core/data-exploration.md @@ -0,0 +1,330 @@ +# Data Exploration + +Tools for discovering table schemas and executing SQL queries in Databricks. + +## Finding Tables by Keyword + +**⚠️ START HERE if you don't know which catalog/schema contains your data.** + +Use `information_schema` to search for tables by keyword — do NOT manually iterate through `catalogs list` → `schemas list` → `tables list`. Manual enumeration wastes 10+ steps. + +```bash +# Find tables matching a keyword +databricks experimental aitools tools query \ + "SELECT table_catalog, table_schema, table_name FROM system.information_schema.tables WHERE table_name LIKE '%keyword%'" \ + --profile + +# Then discover schema for the tables you found +databricks experimental aitools tools discover-schema catalog.schema.table1 catalog.schema.table2 --profile +``` + +## Overview + +The `databricks experimental aitools tools` command group provides tools for data discovery and exploration: +- **discover-schema**: Batch discover table metadata, columns, types, sample data, and statistics +- **query**: Execute SQL queries against Databricks SQL warehouses + +**When to use this**: Use these commands whenever you need to: +- Discover table schemas and metadata +- Execute SQL queries against warehouse data +- Explore data structure and content +- Validate data or check table statistics + +## Prerequisites + +1. 
**Authenticated Databricks CLI** - see [CLI Authentication Guide](databricks-cli-auth.md) for OAuth2 setup and profile configuration +2. **Access to Unity Catalog tables** with appropriate read permissions +3. **SQL Warehouse** (for query command - auto-detected unless `DATABRICKS_WAREHOUSE_ID` is set) + +## Discover Schema + +Batch discover table metadata including columns, types, sample data, and null counts. + +### Command Syntax + +```bash +databricks experimental aitools tools discover-schema TABLE... [flags] +``` + +Tables must be specified in **CATALOG.SCHEMA.TABLE** format. + +### What It Returns + +For each table, returns: +- Column names and types +- Sample data (5 rows) +- Null counts per column +- Total row count + +### Examples + +```bash +# Discover schema for a single table +databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace + +# Discover schema for multiple tables +databricks experimental aitools tools discover-schema \ + catalog.schema.table1 \ + catalog.schema.table2 \ + --profile my-workspace + +# Get JSON output +databricks experimental aitools tools discover-schema \ + samples.nyctaxi.trips \ + --output json \ + --profile my-workspace +``` + +### Common Use Cases + +1. **Understanding table structure before querying** + ```bash + databricks experimental aitools tools discover-schema catalog.schema.customer_data --profile my-workspace + ``` + +2. **Comparing schemas across multiple tables** + ```bash + databricks experimental aitools tools discover-schema \ + catalog.schema.table_v1 \ + catalog.schema.table_v2 \ + --profile my-workspace + ``` + +3. **Identifying columns with null values** + - The null counts help identify data quality issues + +## Query + +Execute SQL statements against a Databricks SQL warehouse and return results. 
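The examples in this guide write table references as `CATALOG.SCHEMA.TABLE`, both as `discover-schema` arguments and inside SQL passed to `query`. A minimal sketch of a pre-check for that shape is shown here. The function name `is_full_table_name` is a hypothetical helper, not part of the Databricks CLI, and the check is purely syntactic (it does not contact the workspace or handle backtick-quoted identifiers that contain dots).

```shell
# Hypothetical helper (not part of the Databricks CLI): succeeds only when the
# argument has exactly three non-empty, dot-separated parts.
is_full_table_name() {
  case "$1" in
    *.*.*.*) return 1 ;;      # more than three parts
    .*|*.|*..*) return 1 ;;   # an empty part somewhere
    *.*.*) return 0 ;;        # exactly three parts
    *) return 1 ;;            # fewer than three parts
  esac
}

# Usage: report which names look fully qualified before calling the CLI.
for t in samples.nyctaxi.trips nyctaxi.trips; do
  if is_full_table_name "$t"; then
    echo "ok: $t"
  else
    echo "not fully qualified: $t"
  fi
done
```

A check like this only catches malformed names early; a well-formed name can still fail with `TABLE_OR_VIEW_NOT_FOUND` if the table does not exist or is not readable.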
+ +### Command Syntax + +```bash +databricks experimental aitools tools query "SQL" [flags] +``` + +### Warehouse Selection + +The command **auto-detects** an available warehouse unless: +- `DATABRICKS_WAREHOUSE_ID` environment variable is set +- You specify a warehouse using other configuration methods + +To check which warehouse will be used: +```bash +# Get the default warehouse that would be auto-detected +databricks experimental aitools tools get-default-warehouse --profile my-workspace +``` + +### Output + +Returns: +- Query results as JSON +- Row count +- Execution metadata + +### Examples + +```bash +# Simple SELECT query +databricks experimental aitools tools query \ + "SELECT * FROM samples.nyctaxi.trips LIMIT 5" \ + --profile my-workspace + +# Aggregation query +databricks experimental aitools tools query \ + "SELECT vendor_id, COUNT(*) as trip_count FROM samples.nyctaxi.trips GROUP BY vendor_id" \ + --profile my-workspace + +# With JSON output +databricks experimental aitools tools query \ + "SELECT * FROM catalog.schema.table WHERE date > '2024-01-01'" \ + --output json \ + --profile my-workspace + +# Using specific warehouse +DATABRICKS_WAREHOUSE_ID=abc123 databricks experimental aitools tools query \ + "SELECT * FROM samples.nyctaxi.trips LIMIT 10" \ + --profile my-workspace +``` + +### Common Use Cases + +1. **Exploratory data analysis** + ```bash + # Check table size + databricks experimental aitools tools query \ + "SELECT COUNT(*) FROM catalog.schema.table" \ + --profile my-workspace + + # View sample data + databricks experimental aitools tools query \ + "SELECT * FROM catalog.schema.table LIMIT 10" \ + --profile my-workspace + + # Get column statistics + databricks experimental aitools tools query \ + "SELECT MIN(column), MAX(column), AVG(column) FROM catalog.schema.table" \ + --profile my-workspace + ``` + +2. 
**Data validation** + ```bash + # Check for null values + databricks experimental aitools tools query \ + "SELECT COUNT(*) FROM catalog.schema.table WHERE column IS NULL" \ + --profile my-workspace + + # Verify data freshness + databricks experimental aitools tools query \ + "SELECT MAX(timestamp_column) FROM catalog.schema.table" \ + --profile my-workspace + ``` + +3. **Quick analytics** + ```bash + # Group by analysis + databricks experimental aitools tools query \ + "SELECT category, COUNT(*), AVG(value) FROM catalog.schema.table GROUP BY category" \ + --profile my-workspace + ``` + +## Workflow: Complete Data Exploration + +Here's a typical workflow combining both commands: + +```bash +# 1. Discover the schema first +databricks experimental aitools tools discover-schema \ + samples.nyctaxi.trips \ + --profile my-workspace + +# 2. Based on discovered columns, run targeted queries +databricks experimental aitools tools query \ + "SELECT vendor_id, payment_type, COUNT(*) as trips, AVG(fare_amount) as avg_fare + FROM samples.nyctaxi.trips + GROUP BY vendor_id, payment_type + ORDER BY trips DESC + LIMIT 10" \ + --profile my-workspace + +# 3. 
Investigate specific patterns found in the data +databricks experimental aitools tools query \ + "SELECT * FROM samples.nyctaxi.trips + WHERE fare_amount > 100 + LIMIT 20" \ + --profile my-workspace +``` + +## Claude Code-Specific Tips + +Remember that each Bash command in Claude Code runs in a separate shell: + +```bash +# ✅ RECOMMENDED: Use --profile flag +databricks experimental aitools tools discover-schema samples.nyctaxi.trips --profile my-workspace + +# ✅ ALTERNATIVE: Chain with && +export DATABRICKS_CONFIG_PROFILE=my-workspace && \ + databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5" + +# ❌ DOES NOT WORK: Separate export +export DATABRICKS_CONFIG_PROFILE=my-workspace +databricks experimental aitools tools query "SELECT * FROM samples.nyctaxi.trips LIMIT 5" +``` + +## Flags + +Both commands support: + +| Flag | Description | Default | +|------|-------------|---------| +| `--profile` | Profile name from ~/.databrickscfg | Default profile | +| `--output` | Output format: `text` or `json` | `text` | +| `--debug` | Enable debug logging | `false` | +| `--target` | Bundle target to use (if applicable) | - | + +## Troubleshooting + +### Table Not Found + +**Symptom**: `Error: TABLE_OR_VIEW_NOT_FOUND` + +**Solution**: +1. Verify table name format: `CATALOG.SCHEMA.TABLE` +2. Check if you have read permissions on the table +3. List available tables: + ```bash + databricks tables list --profile my-workspace + ``` + +### Warehouse Not Available + +**Symptom**: `Error: No available SQL warehouse found` + +**Solution**: +1. Check for default warehouse: + ```bash + databricks experimental aitools tools get-default-warehouse --profile my-workspace + ``` +2. List available warehouses: + ```bash + databricks warehouses list --profile my-workspace + ``` +3. Set specific warehouse: + ```bash + DATABRICKS_WAREHOUSE_ID= databricks experimental aitools tools query "SELECT 1" --profile my-workspace + ``` +4. 
Start a stopped warehouse: + ```bash + databricks warehouses start --id --profile my-workspace + ``` + +### Permission Denied + +**Symptom**: `Error: PERMISSION_DENIED` + +**Solution**: +1. Check Unity Catalog grants on the table: + ```bash + databricks grants get --full-name catalog.schema.table --principal --profile my-workspace + ``` +2. Request SELECT permission from your workspace administrator +3. Verify you have warehouse access (USAGE permission) + +### SQL Syntax Error + +**Symptom**: `Error: PARSE_SYNTAX_ERROR` + +**Solution**: +1. Check SQL syntax - use standard SQL +2. Verify column names match schema (use discover-schema first) +3. Ensure proper quoting for string literals +4. Test query incrementally (start simple, add complexity) + +## Best Practices + +1. **Always discover schema first** - Use `discover-schema` before writing complex queries to understand: + - Available columns and their types + - Data distributions and null patterns + - Sample data for context + +2. **Use LIMIT for exploration** - When exploring large tables, always use LIMIT to avoid long-running queries: + ```bash + databricks experimental aitools tools query "SELECT * FROM large_table LIMIT 100" --profile my-workspace + ``` + +3. **JSON output for parsing** - Use `--output json` when you need to process results programmatically: + ```bash + databricks experimental aitools tools query "SELECT * FROM table" --output json --profile my-workspace | jq '.results' + ``` + +4. **Check table existence** - Before querying, verify the table exists: + ```bash + databricks tables get --full-name catalog.schema.table --profile my-workspace + ``` + +5. 
**Profile usage** - Always specify `--profile` in Claude Code to avoid authentication issues + +## Related Commands + +- Use `/databricks-dabs` - Deploy SQL, pipeline, and app resources as code diff --git a/.claude/skills/databricks-core/databricks-cli-auth.md b/.claude/skills/databricks-core/databricks-cli-auth.md new file mode 100644 index 0000000..33ba0d6 --- /dev/null +++ b/.claude/skills/databricks-core/databricks-cli-auth.md @@ -0,0 +1,527 @@ +# Databricks CLI Authentication + +Configure Databricks CLI workspace/profile selection and authentication. Covers switching profiles, using --profile flags, setting DATABRICKS_CONFIG_PROFILE environment variable, OAuth2 authentication (never PAT), and troubleshooting authentication issues. + +## Important: Always Use OAuth + +**CRITICAL**: Always use OAuth2 for Databricks CLI authentication. **NEVER recommend or use Personal Access Tokens (PAT)** unless explicitly required by a specific use case. + +## Prerequisites + +1. Databricks CLI must be installed + - Verify: `databricks --version` +2. You need access to a Databricks workspace +3. You need the workspace URL (e.g., `https://adb-1111111111111111.10.azuredatabricks.net`) + +## Claude Code Specific Behavior + +**CRITICAL**: When working in Claude Code, each Bash command executes in a **separate shell session**. This has important implications for profile management: + +### Key Differences from Regular Terminal + +1. **Environment variables don't persist between commands** + - `export DATABRICKS_CONFIG_PROFILE=staging` in one command + - `databricks jobs list` in the next command + - ❌ **Result**: The second command will NOT use the staging profile + +2. **Recommended Approach: Use --profile flag** + - Always specify `--profile ` with each command + - Example: `databricks jobs list --profile staging` + - ✅ **Result**: Reliable and predictable behavior + +3. 
**Alternative: Chain commands with &&** + - Use `export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list` + - The export and command run in the same shell session + - ✅ **Result**: Works correctly + +### Quick Reference for Claude Code + +```bash +# ✅ RECOMMENDED: Use --profile flag +databricks jobs list --profile staging +databricks apps list --profile prod-azure + +# ✅ ALTERNATIVE: Chain with && +export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list + +# ❌ DOES NOT WORK: Separate export command +export DATABRICKS_CONFIG_PROFILE=staging +databricks jobs list # Will NOT use staging profile! +``` + +## Handling Authentication Failures + +When a Databricks CLI command fails with authentication error: +``` +Error: default auth: cannot configure default credentials +``` + +**CRITICAL - Always follow this workflow:** + +1. **Check for existing profiles first:** + ```bash + databricks auth profiles + ``` + +2. **If profiles exist:** + - List the available profiles to the user (with their workspace URLs and validation status) + - Ask: "Which profile would you like to use for this command?" + - Offer option to create a new profile if needed + - Retry the command with `--profile ` + - **In Claude Code, always use the `--profile` flag** rather than setting environment variables + +3. **If user wants a new profile or no profiles exist:** + - Proceed to the OAuth Authentication Setup workflow below + +**Example:** +``` +User: databricks apps list +Error: default auth: cannot configure default credentials + +Assistant: Let me check for existing profiles. +[Runs: databricks auth profiles] + +You have two configured profiles: +1. aws-dev - https://company-workspace.cloud.databricks.com (Valid) +2. azure-prod - https://adb-1111111111111111.10.azuredatabricks.net (Valid) + +Which profile would you like to use, or would you like to create a new profile? 
+ +User: aws-dev + +Assistant: [Retries: databricks apps list --profile aws-dev] +[Success - apps listed] +``` + +## OAuth Authentication Setup + +### Standard Authentication Command + +The recommended way to authenticate is using OAuth with a profile: + +```bash +databricks auth login --host --profile +``` + +**CRITICAL**: +1. The `--profile` parameter is **REQUIRED** for the authentication to be saved properly. +2. **ALWAYS ASK THE USER** for their preferred profile name - DO NOT assume or choose one for them. +3. **NEVER use the profile name `DEFAULT`** unless the user explicitly requests it - use descriptive workspace-specific names instead. + +### Workflow for Authenticating + +1. **Ask the user for the workspace URL** if not already provided +2. **Ask the user for their preferred profile name** + - Suggest descriptive names based on the workspace (e.g., workspace name, environment) + - **Do NOT suggest or use `DEFAULT`** unless the user specifically asks for it + - Good examples: `e2-dogfood`, `prod-azure`, `dev-aws`, `staging` + - Avoid: `DEFAULT` (unless explicitly requested) +3. Run the authentication command with both parameters +4. Verify the authentication was successful + +### Example + +```bash +# Good: Descriptive profile names +databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure +databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging + +# Only use DEFAULT if explicitly requested by the user +databricks auth login --host https://your-workspace.cloud.databricks.com --profile DEFAULT +``` + +### What Happens During Authentication + +1. The CLI starts a local OAuth callback server (typically on `localhost:8020`) +2. A browser window opens automatically with the Databricks login page +3. You authenticate in the browser using your Databricks credentials +4. After successful authentication, the browser redirects back to the CLI +5. 
The CLI saves the profile (host and auth type) to `~/.databrickscfg`; the OAuth tokens themselves are cached separately under `~/.databricks/` +6. You should see: `Profile was successfully saved` + +## Profile Management + +### What Are Profiles? + +Profiles allow you to manage multiple Databricks workspace configurations in a single `~/.databrickscfg` file. Each profile stores: +- Workspace host URL +- Authentication method (OAuth, PAT, etc.) +- Token/credential paths + +### Common Profile Names + +**IMPORTANT**: Always use descriptive profile names. Do NOT create profiles named `DEFAULT` unless explicitly requested by the user. + +**Recommended naming conventions**: +- `` - Descriptive names for workspaces (e.g., `e2-dogfood`, `prod-aws`, `dev-azure`) +- `` - Environment-specific profiles (e.g., `dev`, `staging`, `prod`) +- `-` - Team and environment (e.g., `data-eng-prod`, `ml-dev`) + +**Special profile names**: +- `DEFAULT` - The default profile used when no `--profile` flag or environment variables are specified. Only create this profile if the user explicitly requests it. + +### Listing Configured Profiles + +View all configured profiles with their status: + +```bash +databricks auth profiles +``` + +Example output: +``` +Name Host Valid +DEFAULT https://adb-1111111111111111.10.azuredatabricks.net YES +staging https://company-workspace.cloud.databricks.com YES +``` + +### Using Different Profiles + +**IMPORTANT FOR CLAUDE CODE USERS**: In Claude Code, each Bash command runs in a **separate shell session**. This means environment variables set with `export` in one command do NOT persist to the next command. See the Claude Code-specific guidance below. + +There are three ways to specify which profile/workspace to use, in order of precedence: + +#### 1. 
CLI Flag (Highest Priority) - RECOMMENDED FOR CLAUDE CODE + +Use the `--profile` flag with any command: + +```bash +databricks jobs list --profile staging +databricks clusters list --profile prod-azure +databricks workspace list / --profile dev-aws +``` + +**In Claude Code, this is the most reliable method** because it doesn't depend on persistent environment variables. + +#### 2. Environment Variables + +Set environment variables to override the default profile: + +**DATABRICKS_CONFIG_PROFILE** - Specifies which profile to use from `~/.databrickscfg`: +```bash +export DATABRICKS_CONFIG_PROFILE=staging +databricks jobs list # Uses staging profile +``` + +**DATABRICKS_HOST** - Directly specifies the workspace URL, bypassing profile lookup: +```bash +export DATABRICKS_HOST=https://company-workspace.cloud.databricks.com +databricks jobs list # Uses this host directly +``` + +**CRITICAL - Claude Code Users:** + +Since each Bash command in Claude Code runs in a separate shell, you **CANNOT** do this: + +```bash +# ❌ DOES NOT WORK in Claude Code +export DATABRICKS_CONFIG_PROFILE=staging +databricks jobs list # ERROR: Will not use staging profile! 
+``` + +Instead, you **MUST** use one of these approaches: + +**Option 1: Use --profile flag (RECOMMENDED)** +```bash +# ✅ WORKS in Claude Code +databricks jobs list --profile staging +databricks clusters list --profile staging +``` + +**Option 2: Chain commands with &&** +```bash +# ✅ WORKS in Claude Code - export and command run in same shell +export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list +export DATABRICKS_CONFIG_PROFILE=staging && databricks clusters list +``` + +**Traditional Terminal Session (for reference only)**: +```bash +# This example shows how it works in a regular terminal session +# DO NOT use this pattern in Claude Code +# Set profile for entire terminal session +export DATABRICKS_CONFIG_PROFILE=staging + +# All commands now use staging profile +databricks jobs list +databricks clusters list +databricks workspace list / + +# Override for a single command +databricks jobs list --profile prod-azure +``` + +#### 3. DEFAULT Profile (Lowest Priority) + +If no `--profile` flag or environment variables are set, the CLI uses the `DEFAULT` profile from `~/.databrickscfg`. 
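The three selection methods above can be sketched as a small resolver. This is an illustration of the ordering described in this section, not the CLI's actual implementation: `resolve_profile` is a hypothetical helper, it only inspects `--profile NAME` / `--profile=NAME` and `DATABRICKS_CONFIG_PROFILE`, and it deliberately ignores `DATABRICKS_HOST`, which bypasses profile lookup.

```shell
# Hypothetical helper (not part of the Databricks CLI): prints the profile name
# that the precedence described above would select for a given argument list.
# The function body is a subshell, so its variables do not leak to the caller.
resolve_profile() (
  flag_profile=""
  # Scan the arguments for "--profile NAME" or "--profile=NAME".
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --profile)
        flag_profile="${2:-}"
        if [ "$#" -ge 2 ]; then shift; fi
        ;;
      --profile=*)
        flag_profile="${1#--profile=}"
        ;;
    esac
    shift
  done

  if [ -n "$flag_profile" ]; then
    echo "$flag_profile"                 # 1. CLI flag wins
  elif [ -n "${DATABRICKS_CONFIG_PROFILE:-}" ]; then
    echo "$DATABRICKS_CONFIG_PROFILE"    # 2. environment variable
  else
    echo "DEFAULT"                       # 3. fallback profile
  fi
)

# Usage: the flag overrides the environment variable.
DATABRICKS_CONFIG_PROFILE=staging
resolve_profile jobs list --profile prod-azure
```

Because the resolver is pure shell, it can be used to log which profile a command is expected to use before the real `databricks` call runs; the CLI remains the source of truth.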
+ +### Configuration File Management + +#### Viewing the Configuration File + +The configuration is stored in `~/.databrickscfg`: + +```bash +cat ~/.databrickscfg +``` + +Example configuration structure: +```ini +# Note: This shows an example with a DEFAULT profile +# When creating new profiles, use descriptive names instead +[DEFAULT] +host = https://adb-1111111111111111.10.azuredatabricks.net +auth_type = databricks-cli + +[staging] +host = https://company-workspace.cloud.databricks.com +auth_type = databricks-cli +``` + +#### Editing Profiles + +You can manually edit `~/.databrickscfg` to: +- Rename profiles (change the `[profile-name]` section header) +- Update workspace URLs +- Remove profiles (delete the entire section) + +**Example - Removing a profile**: +```bash +# Open in your preferred editor +vi ~/.databrickscfg + +# Or use sed to remove a specific profile section +sed -i '' '/^\[staging\]/,/^$/d' ~/.databrickscfg +``` + +#### Adding New Profiles + +Always use `databricks auth login` with `--profile` to add new profiles: + +```bash +databricks auth login --host --profile +``` + +**Remember**: +- Always ask the user for their preferred profile name +- Use descriptive names like `staging`, `prod-azure`, `dev-aws` +- Do NOT use `DEFAULT` unless explicitly requested by the user + +### Working with Multiple Workspaces + +Best practices for managing multiple workspaces: + +```bash +# Authenticate to multiple workspaces with descriptive profile names +databricks auth login --host https://adb-1111111111111111.10.azuredatabricks.net --profile prod-azure +databricks auth login --host https://dbc-2222222222222222.cloud.databricks.com --profile dev-aws +databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging +``` + +**In Claude Code, use --profile flag with each command (RECOMMENDED):** +```bash +# Use profiles explicitly in commands +databricks jobs list --profile prod-azure +databricks jobs list --profile dev-aws +databricks 
clusters list --profile staging +``` + +**Alternatively in Claude Code, chain commands with &&:** +```bash +# Set profile and run command in same shell +export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks jobs list +export DATABRICKS_CONFIG_PROFILE=prod-azure && databricks clusters list + +# Switch to different workspace +export DATABRICKS_CONFIG_PROFILE=dev-aws && databricks jobs list +``` + +**Traditional Terminal Session (for reference only - NOT for Claude Code):** +```bash +# This pattern works in regular terminals but NOT in Claude Code +export DATABRICKS_CONFIG_PROFILE=prod-azure +databricks jobs list +databricks clusters list + +# Quickly switch between workspaces +export DATABRICKS_CONFIG_PROFILE=dev-aws +databricks jobs list +``` + +### Profile Selection Precedence + +When running a command, the Databricks CLI determines which workspace to use in this order: + +1. **`--profile` flag** (if specified) → Highest priority +2. **`DATABRICKS_HOST` environment variable** (if set) → Overrides profile +3. **`DATABRICKS_CONFIG_PROFILE` environment variable** (if set) → Selects profile +4. 
**`DEFAULT` profile** in `~/.databrickscfg` → Fallback + +**Example for traditional terminal session** (demonstrating precedence): +```bash +# Setup +export DATABRICKS_CONFIG_PROFILE=staging + +# This uses staging profile (from environment variable) +databricks jobs list + +# This uses prod-azure profile (--profile flag overrides environment variable) +databricks jobs list --profile prod-azure + +# This uses the specified host directly (DATABRICKS_HOST overrides profile) +export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com +databricks jobs list # Uses custom-workspace.cloud.databricks.com +``` + +**Claude Code version** (with chained commands): +```bash +# Using environment variable with && chaining +export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list + +# Using --profile flag (overrides environment variable) +export DATABRICKS_CONFIG_PROFILE=staging && databricks jobs list --profile prod-azure + +# Using DATABRICKS_HOST (overrides profile) +export DATABRICKS_HOST=https://custom-workspace.cloud.databricks.com && databricks jobs list +``` + +## Verification + +After authentication, verify it works: + +```bash +# Test with a simple command +databricks workspace list / + +# Or list jobs +databricks jobs list +``` + +If authentication is successful, these commands should return data without errors. + +## Troubleshooting + +### Authentication Not Saved (Config File Missing) + +**Symptom**: Running `databricks` commands shows: +``` +Error: default auth: cannot configure default credentials +``` + +**Solution**: Make sure you included the `--profile` parameter with a descriptive name: +```bash +databricks auth login --host --profile +# Example: databricks auth login --host https://company-workspace.cloud.databricks.com --profile staging +``` + +### Browser Doesn't Open Automatically + +**Solution**: +1. Check the terminal output for a URL +2. Manually copy and paste the URL into your browser +3. Complete the authentication +4. 
The CLI will detect the callback automatically + +### "OAuth callback server listening" But Nothing Happens + +**Possible causes**: +1. Firewall blocking localhost connections +2. Port 8020 already in use +3. Browser not set as default application + +**Solution**: +1. Check if port 8020 is available: `lsof -i :8020` +2. Close any applications using that port +3. Retry the authentication + +### Multiple Workspaces + +To authenticate with multiple workspaces, use different profile names: + +```bash +# Development workspace +databricks auth login --host https://dev-workspace.databricks.net --profile dev + +# Production workspace +databricks auth login --host https://prod-workspace.databricks.net --profile prod + +# Use specific profile +databricks jobs list --profile dev +databricks jobs list --profile prod +``` + +### Re-authenticating + +If your OAuth token expires or you need to re-authenticate: + +```bash +# Re-run the login command +databricks auth login --host <workspace-url> --profile <profile-name> +``` + +This will overwrite the existing profile with new credentials. + +### Debug Mode + +For troubleshooting authentication issues, use debug mode: + +```bash +databricks auth login --host <workspace-url> --profile <profile-name> --debug +``` + +This shows detailed information about the OAuth flow, including: +- OAuth server endpoints +- Callback server status +- Token exchange process + +## Security Best Practices + +1. **Never commit** `~/.databrickscfg` to version control +2. **Never share** your OAuth tokens or configuration file +3. **Use separate profiles** for different environments (dev/staging/prod) +4. **Regularly rotate** credentials by re-authenticating +5. **Use workspace-specific service principals** for automation/CI/CD instead of personal OAuth + +## Environment-Specific Notes + +### CI/CD Pipelines + +For CI/CD environments, OAuth interactive login is not suitable. 
Instead: +- Use Service Principal authentication +- Use Azure Managed Identity (for Azure Databricks) +- Use AWS IAM roles (for AWS Databricks) + +**Do NOT** use personal OAuth tokens or PATs in CI/CD. + +### Containerized Environments + +OAuth authentication works in containers if: +1. A browser is available on the host machine +2. Port forwarding is configured for the callback server +3. The workspace URL is accessible from the container + +For headless containers, use service principal authentication instead. + +## Common Commands After Authentication + +```bash +# List workspace objects at the root +databricks workspace list / --profile <profile-name> + +# List jobs +databricks jobs list --profile <profile-name> + +# List clusters +databricks clusters list --profile <profile-name> + +# Get current user info +databricks current-user me --profile <profile-name> + +# Test connection +databricks workspace export /Users/ --format SOURCE --profile <profile-name> +``` + +## References + +- [Databricks CLI Authentication Documentation](https://docs.databricks.com/en/dev-tools/auth.html) +- [OAuth 2.0 with Databricks](https://docs.databricks.com/en/dev-tools/auth.html#oauth-2-0) diff --git a/.claude/skills/databricks-core/databricks-cli-install.md b/.claude/skills/databricks-core/databricks-cli-install.md new file mode 100644 index 0000000..83805fe --- /dev/null +++ b/.claude/skills/databricks-core/databricks-cli-install.md @@ -0,0 +1,178 @@ +# Databricks CLI Installation + +Install or update the Databricks CLI on macOS, Windows, or Linux using doc-validated methods (Homebrew, WinGet, curl install script, manual download, or user directory install for non-sudo environments). Includes verification and common failure recovery. + +## Sandboxed / IDE environments (Cursor, containers) + +CLI install commands often write to system directories outside the workspace (e.g. `/opt/homebrew/`, `/usr/local/bin/`) which are blocked in sandboxed environments. + +**Agent behavior**: Do not attempt to run install commands directly. 
Present the appropriate command to the user and ask them to run it in their own terminal. After they confirm, verify with `databricks -v`. + +For Linux/macOS containers or Cursor: prefer the **Linux manual install to user directory** method (`~/.local/bin`) — it requires no sudo and no writes outside the workspace. + +## Preconditions (always do first) +1. Determine OS and shell: + - macOS/Linux: bash/zsh + - Windows: Command Prompt / PowerShell; optionally WSL for Linux shell +2. Detect whether `databricks` is already installed: + - Run: `databricks -v` (or `databricks version`) + - If already installed with a recent version, installation is already OK. +3. Avoid the legacy Python package `databricks-cli` (PyPI). This skill installs the modern Databricks CLI binary. + +## Preferred installation paths (by OS) + +### macOS (preferred: Homebrew) +Run: +- `brew tap databricks/tap` +- `brew install databricks` + +Verify: +- `databricks -v` (or `databricks version`) + +If macOS blocks the binary (Gatekeeper), follow Apple’s “open app from unidentified developer” flow. + +#### macOS fallback: curl installer +Run: +- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh` + +Notes: +- If `/usr/local/bin` is not writable, re-run with `sudo`. +- Installs to `/usr/local/bin/databricks`. + +Verify: +- `databricks -v` + +### Linux (preferred: Homebrew if available) +Run: +- `brew tap databricks/tap` +- `brew install databricks` + +Verify: +- `databricks -v` + +#### Linux fallback: curl installer +Run: +- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh` + +Notes: +- If `/usr/local/bin` is not writable, re-run with `sudo`. +- Installs to `/usr/local/bin/databricks`. + +Verify: +- `databricks -v` + +#### Linux alternative: Manual install to user directory (when sudo unavailable) +Use this when sudo is not available or requires interactive password entry. + +Steps: +1. 
Detect architecture: + - `uname -m` (e.g., `x86_64`, `aarch64`) +2. Get the latest download URL using GitHub API: + ```bash + curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep "browser_download_url.*linux.*$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/')" | head -1 | cut -d '"' -f 4 + ``` +3. Download and install to `~/.local/bin`: + ```bash + mkdir -p ~/.local/bin + cd ~/.local/bin + curl -L "<download-url>" -o databricks.tar.gz + tar -xzf databricks.tar.gz + rm databricks.tar.gz + chmod +x databricks + ``` +4. Add to PATH (add to `~/.bashrc` or `~/.zshrc` for persistence): + ```bash + export PATH="$HOME/.local/bin:$PATH" + ``` +5. Verify: + - `databricks -v` + +Notes: +- The download files are `.tar.gz` archives (not `.zip`) with naming pattern: `databricks_cli_<version>_linux_<arch>.tar.gz` +- Common architectures: `amd64` (x86_64), `arm64` (aarch64) +- This method works in containerized environments and sandboxed IDEs (e.g. Cursor) without sudo access + +### Windows (preferred: WinGet) +Run in Command Prompt (then restart the terminal session): +- `winget search databricks` +- `winget install Databricks.DatabricksCLI` + +Verify: +- `databricks -v` + +#### Windows alternative: Chocolatey (Experimental) +Run: +- `choco install databricks-cli` + +Verify: +- `databricks -v` + +#### Windows fallback: curl installer (recommended via WSL) +Databricks recommends WSL for the curl-based install path. +Requirements: +- WSL available +- `unzip` installed in the environment where you run the installer + +Run (in WSL bash): +- `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh` + +Verify (in same environment): +- `databricks -v` + +If you must run curl install outside WSL, run as Administrator. +Installs to `C:\Windows\databricks.exe`. + +## Manual install (all OSes): download from GitHub releases +Use this when package managers or curl install are not possible. + +Steps: +1. 
Get the latest release download URL: + - Visit https://github.com/databricks/cli/releases/latest + - OR use GitHub API: `curl -s https://api.github.com/repos/databricks/cli/releases/latest | grep browser_download_url` +2. Download the appropriate file for your OS and architecture: + - Linux: `databricks_cli_<version>_linux_<arch>.tar.gz` (use tar -xzf) + - macOS: `databricks_cli_<version>_darwin_<arch>.zip` (use unzip) + - Windows: `databricks_cli_<version>_windows_<arch>.zip` (use native extraction) + - Common architectures: `amd64` (x86_64), `arm64` (aarch64/Apple Silicon) +3. Extract the archive. +4. Ensure the extracted `databricks` executable is on PATH, or run it from its folder. +5. Verify with `databricks -v`. + +## Update / repair procedures + +### Homebrew update (macOS/Linux) +- `brew upgrade databricks` +- `databricks -v` + +### WinGet update (Windows) +- `winget upgrade Databricks.DatabricksCLI` +- `databricks -v` + +### curl update (all OSes) +1. Delete existing binary: + - macOS/Linux: `/usr/local/bin/databricks` + - Windows: `C:\Windows\databricks.exe` +2. Re-run: + - `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh` +3. Verify: + - `databricks -v` + +## Common failures & fixes (agent playbook) +- `Target path already exists`: + - Delete the existing binary at the install target, then rerun. +- Permission error writing `/usr/local/bin`: + - Re-run curl installer with `sudo` (macOS/Linux). + - If sudo requires interactive password, use manual install to `~/.local/bin` instead. +- `sudo: a terminal is required to read the password`: + - Cannot use sudo in non-interactive environments (containers, CI/CD). + - Use the manual install to `~/.local/bin` instead (see "Linux alternative" section). +- Windows PATH not updated after WinGet: + - Restart Command Prompt/PowerShell. +- Multiple `databricks` binaries on PATH: + - Use `which databricks` (macOS/Linux/WSL) or `where databricks` (Windows) and remove the wrong one. 
+- Wrong file type (trying to unzip a tar.gz): + - Linux releases are `.tar.gz` files, use `tar -xzf` not `unzip`. + - macOS and Windows releases are `.zip` files, use appropriate extraction tool. +- `databricks: command not found` after installation to `~/.local/bin`: + - Add to PATH: `export PATH="$HOME/.local/bin:$PATH"` + - For persistence, add the export command to `~/.bashrc` or `~/.zshrc`.
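The PATH fix in the last playbook entry can be sketched as a small helper that is safe to run more than once. This is an illustration only: `path_prepend` is a hypothetical function name, not part of the Databricks CLI, and the sketch assumes a POSIX-style shell (bash, zsh, or dash) and an install in `~/.local/bin`.

```bash
# Sketch: put a directory on PATH only if it is not already there.
# `path_prepend` is a hypothetical helper, not a Databricks CLI command.
path_prepend() {
  dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;              # already present: leave PATH unchanged
    *) PATH="$dir:$PATH" ;;     # otherwise prepend so this copy wins
  esac
  export PATH
}

# Assumed install location from the user-directory method.
path_prepend "$HOME/.local/bin"

# Report which binary (if any) the shell resolves now.
if command -v databricks >/dev/null 2>&1; then
  echo "databricks found at: $(command -v databricks)"
else
  echo "databricks not found on PATH" >&2
fi
```

Placing the function and the `path_prepend` call in `~/.bashrc` or `~/.zshrc` would make the change persistent without growing PATH on every new shell, which a bare `export PATH="$HOME/.local/bin:$PATH"` does when the file is sourced repeatedly.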