diff --git a/.github/workflows/ci-cli.yml b/.github/workflows/ci-cli.yml index b8a5c9b..5c92a55 100644 --- a/.github/workflows/ci-cli.yml +++ b/.github/workflows/ci-cli.yml @@ -2,12 +2,12 @@ name: "CI: CLI" on: push: - branches: [main] + branches: [main, develop] paths: - "cli/**" - ".github/workflows/ci-cli.yml" pull_request: - branches: [main] + branches: [main, develop] paths: - "cli/**" - ".github/workflows/ci-cli.yml" diff --git a/.github/workflows/ci-server.yml b/.github/workflows/ci-server.yml index 10c1466..0e16f49 100644 --- a/.github/workflows/ci-server.yml +++ b/.github/workflows/ci-server.yml @@ -2,12 +2,12 @@ name: "CI: Server" on: push: - branches: [main] + branches: [main, develop] paths: - "server/**" - ".github/workflows/ci-server.yml" pull_request: - branches: [main] + branches: [main, develop] paths: - "server/**" - ".github/workflows/ci-server.yml" diff --git a/.github/workflows/prerelease-server.yml b/.github/workflows/prerelease-server.yml new file mode 100644 index 0000000..5a7553b --- /dev/null +++ b/.github/workflows/prerelease-server.yml @@ -0,0 +1,54 @@ +name: "Pre-release: Server CUDA (develop)" + +# Triggered on push to `develop` (i.e. PR merges; direct pushes are +# blocked by branch protection). Builds the CUDA-only image and pushes +# it to Docker Hub as the floating tag `:develop-cu128`, so the prod +# RTX 3090 box can stage a pre-release without waiting for a release +# tag. +# +# CPU image is intentionally skipped here — it's only built on real +# `server/v*` release tags. 
+on: + push: + branches: [develop] + paths: + - "server/**" + - "doc/openapi.yaml" + - ".github/workflows/prerelease-server.yml" + +permissions: + contents: read + +jobs: + docker-cuda: + name: Build CUDA image (amd64, develop) + runs-on: ubuntu-latest + steps: + - name: Checkout + uses: actions/checkout@v4 + + - name: Set up Buildx + uses: docker/setup-buildx-action@v3 + + - name: Login to Docker Hub + uses: docker/login-action@v3 + with: + username: ${{ secrets.DOCKER_USERNAME }} + password: ${{ secrets.DOCKER_PASSWORD }} + + - name: Build and push + uses: docker/build-push-action@v6 + with: + context: server + file: server/Dockerfile.cuda + platforms: linux/amd64 + push: true + provenance: mode=max + sbom: true + build-args: VERSION=develop-${{ github.sha }} + # `openapi=doc` mounts the repo-root doc/ folder so the dashboard + # build stage can `COPY --from=openapi openapi.yaml` without us + # widening the primary build context (which is `server/`). + build-contexts: | + openapi=doc + tags: dvcdsys/code-index:develop-cu128 diff --git a/README.md b/README.md index ea07357..2c7c7a2 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,7 @@ Grep and fuzzy file search work fine for small projects. At scale they break dow - **Web dashboard** at `/dashboard` — projects, semantic search, user + API-key management, runtime sidecar control, drift indicator. Embedded directly into the server binary. - **`cix` CLI** — drop-in `cix search`/`cix symbols`/`cix files` commands for terminal + agent use. - **File watcher** — `cix watch` keeps the index fresh as you edit, no manual reindex. +- **Workspaces** *(experimental)* — group multiple repositories into a single named workspace; clone GitHub repos server-side via stored PAT, then run hybrid BM25 + dense search across all of them. See [`workspaces.md`](workspaces.md). - **OpenAPI as source of truth** — Go server interface + TypeScript dashboard types are generated from `doc/openapi.yaml`. Swagger UI at `/docs`. 
--- @@ -248,8 +249,10 @@ The dashboard ships embedded in the server binary at `/dashboard`. No extra serv |------|----------|--------------| | **Home** | everyone | Live status strip (server version, current embedding model, sidecar Ready/Loading) + module shortcuts | | **Projects** | everyone | List indexed projects, view stats (file count, languages, symbols, vector count, sqlite/chroma sizes), copy reindex commands. Cards turn **red with "Stale model"** badge when the runtime embedding model differs from the model the project was indexed with — see [Drift indicator](#drift-indicator). | +| **Workspaces** *(experimental)* | everyone | Group multiple repositories into a named workspace and search them as one corpus. Add GitHub repos by URL + branch — the server clones them under its data dir, indexes them with the same pipeline as local projects, and tracks status (`cloning` / `indexing` / `indexed` / `failed`). Run hybrid BM25 + dense search across the whole group from a two-stage search dialog. See [`workspaces.md`](workspaces.md). | | **Search** | everyone | Five modes: semantic, symbols, references, definitions, files. Same engine the CLI uses. | | **API Keys** | everyone | Mint long-lived `cix_*` keys (256-bit entropy, GitHub-class), copy them once, revoke at any time. | +| **GitHub Tokens** *(experimental)* | everyone | Store personal access tokens used by workspaces to clone private repositories and (optionally) auto-register push webhooks for incremental reindexing. Tokens are AES-256-GCM encrypted at rest; the plaintext is returned exactly once on creation and never again. Pair this with the **Workspaces** page to onboard private repos without pasting the PAT every time. | | **Users** | admin | Invite teammates, set role (admin/viewer), reset password (forces change on next login), disable account. | | **Settings** | everyone | Theme, default editor, change own password. 
| | **Server** | admin | Runtime config — embedding model, `n_ctx`, `n_gpu_layers`, `n_threads`, batch size, queue concurrency. **Save & Restart** drains in-flight embeddings, restarts the sidecar, polls until ready. Source pill on each field shows whether the live value comes from the DB override, env bootstrap, or the recommended fallback. | @@ -274,6 +277,36 @@ After running the reindex, the drift signal clears automatically. Set `CIX_EMBEDDINGS_ENABLED=false` to bring the server up without the llama-server sidecar — auth, dashboard, project metadata, and symbol/file searches all keep working; only semantic search and indexing are disabled. The Server page renders a warning banner and disables the relevant inputs. +### Workspaces and external repositories *(experimental)* + +The **Workspaces** page lets you group several repositories into one +named workspace and search them as a single corpus — useful for tasks +that span microservices, infra-as-code, API specs, and the like. Unlike +`cix init` (which indexes the project you're `cd`'d into), workspaces +track **external repositories that the server itself clones**. + +You add a repo by GitHub URL + branch; the server clones it under its +data directory (default `/repos//`), indexes it with +the standard pipeline (tree-sitter chunking → CodeRankEmbed embeddings +→ chromem + FTS5), and tracks the lifecycle via a per-repo `status` +field (`pending` → `cloning` → `indexing` → `indexed` / `failed`). +Existing local projects can also be **linked** into a workspace without +re-cloning. + +Private repos and webhook auto-registration go through the **GitHub +Tokens** page. Tokens are AES-256-GCM encrypted at rest, scoped per +entry, and never exposed back to clients after creation. With +`webhook_mode=auto` the server registers a push webhook on the +upstream repo and re-indexes automatically on every push to the +tracked branch. 
+ +Workspaces are gated by `CIX_WORKSPACES_ENABLED=true` and are still +experimental — defaults, search-algorithm tuning, and the UI shape are +all evolving. See [`workspaces.md`](workspaces.md) for: enabling the +feature, end-to-end setup, the search algorithm and its tunables, +webhook modes, REST API reference, and a candid strengths/weaknesses +section based on the calibration eval. + --- ## CLI Reference @@ -356,6 +389,19 @@ Install the bundled skill so Claude knows to use `cix` automatically: cp -r skills/cix ~/.claude/skills/cix ``` +For multi-repo work via the experimental **workspaces** feature, the +`cix-workspace` skill teaches the agent the cross-project workflow and +ships a dedicated `cix-workspace-investigator` sub-agent for parallel +per-repo fan-out: + +```bash +cp -r skills/cix-workspace ~/.claude/skills/cix-workspace +mkdir -p ~/.claude/agents +cp skills/cix-workspace/agents/cix-workspace-investigator.md ~/.claude/agents/ +``` + +Invoke with `/cix-workspace `. See [`workspaces.md`](workspaces.md#agent-integration) for the agent contract and behavior rules. + Then in any Claude Code session, invoke the skill **paired with the actual engineering task** — not a search query. The pattern is `/cix `: ``` @@ -488,7 +534,7 @@ Use `.cixignore` when you want to exclude files from the index that are **not** ```gitignore # .cixignore api/generated/ -generated/ +vendor/ *.pb.go testdata/fixtures/ ``` diff --git a/cli/cmd/workspace.go b/cli/cmd/workspace.go new file mode 100644 index 0000000..bacde5d --- /dev/null +++ b/cli/cmd/workspace.go @@ -0,0 +1,363 @@ +package cmd + +import ( + "encoding/json" + "errors" + "fmt" + "os" + "strings" + + "github.com/anthropics/code-index/cli/internal/client" + "github.com/spf13/cobra" +) + +// workspaceCmd routes every workspace-scoped CLI verb. 
The user-facing +// argument grammar is name-first: +// +// cix ws → list workspaces (default) +// cix ws list → list workspaces (alternate) +// cix ws → describe workspace (list repos + status) +// cix ws list → list repos in the workspace +// cix ws repos → list repos (alias) +// cix ws describe → describe (same as `cix ws `) +// cix ws search → two-stage workspace search +// +// We deliberately roll the dispatch by hand instead of using cobra +// subcommands so the workspace NAME can sit in the first positional +// slot — cobra can't recognise a dynamic value (workspace name) as a +// command name. The trade-off is no auto-completion on ``; in +// exchange the surface reads the way operators think about workspaces. +var workspaceCmd = &cobra.Command{ + Use: "workspace [name] [verb] [args...]", + Aliases: []string{"ws"}, + Short: "Cross-project semantic search via workspaces", + Long: `Workspaces group GitHub repositories for cross-project semantic search. + +Argument grammar — name-first: + + cix ws list workspaces visible to me + cix ws list list workspaces (alternate form) + cix ws describe a workspace (repos + status) + cix ws list list repos in + cix ws repos same as list + cix ws search two-stage semantic search in + +Examples: + cix ws + cix ws platform + cix ws platform list + cix ws platform search "JWT validation" + cix ws platform search "rate limiting" --top-projects 8 --top-chunks 30 --json + +Workspace identifiers accept the opaque id OR the (case-insensitive) +name. Repository attachment, GitHub token management, and the +detailed dashboard view all live at /dashboard on the cix-server.`, + Args: cobra.ArbitraryArgs, + RunE: runWorkspace, +} + +var ( + wsJSON bool + wsVerbose bool + wsSearchTopProjects int + wsSearchTopChunks int +) + +func init() { + rootCmd.AddCommand(workspaceCmd) + // Flags live on the parent — applies to every verb. `cobra` parses + // flags before our manual routing runs, so `cix ws platform search + // "..." 
--json` works regardless of where the user puts the flag. + workspaceCmd.Flags().BoolVar(&wsJSON, "json", false, "Emit raw JSON instead of formatted output") + workspaceCmd.Flags().BoolVarP(&wsVerbose, "verbose", "v", false, "Show extra columns on list / describe") + workspaceCmd.Flags().IntVar(&wsSearchTopProjects, "top-projects", 10, "Search: top-N projects in the projects panel (1-50)") + workspaceCmd.Flags().IntVar(&wsSearchTopChunks, "top-chunks", 20, "Search: top-K chunks returned overall (1-200)") +} + +func runWorkspace(cmd *cobra.Command, args []string) error { + cli, err := getClient() + if err != nil { + return err + } + + switch { + case len(args) == 0: + return cmdListWorkspaces(cli) + case len(args) == 1 && strings.EqualFold(args[0], "list"): + return cmdListWorkspaces(cli) + case len(args) == 1: + // `cix ws ` — describe. + return cmdDescribeWorkspace(cli, args[0]) + } + + // 2+ args. First is the workspace name, second the verb. + name := args[0] + verb := strings.ToLower(args[1]) + rest := args[2:] + + switch verb { + case "list", "repos": + if len(rest) > 0 { + return fmt.Errorf("%q takes no extra arguments", verb) + } + return cmdListRepos(cli, name) + case "describe": + if len(rest) > 0 { + return fmt.Errorf("describe takes no extra arguments") + } + return cmdDescribeWorkspace(cli, name) + case "search": + if len(rest) == 0 { + return errors.New("search needs a query string (cix ws search \"\")") + } + query := strings.Join(rest, " ") + return cmdWorkspaceSearch(cli, name, query) + default: + return fmt.Errorf("unknown verb %q — use one of: list, repos, describe, search", verb) + } +} + +// --------------------------------------------------------------------------- +// `cix ws list` +// --------------------------------------------------------------------------- + +func cmdListWorkspaces(cli *client.Client) error { + resp, err := cli.ListWorkspaces() + if err != nil { + return err + } + if wsJSON { + return emitJSON(resp) + } + if resp.Total == 
0 { + fmt.Fprintln(os.Stderr, "no workspaces — create one at /dashboard/workspaces") + return nil + } + for _, w := range resp.Workspaces { + line := w.ID + " " + w.Name + if w.Description != "" { + line += " — " + w.Description + } + fmt.Println(line) + if wsVerbose { + // In verbose mode we follow each workspace with its repo + // count + indexed status. One extra HTTP call per + // workspace; acceptable at typical scale (<10 workspaces). + if reposResp, rerr := cli.ListWorkspaceRepos(w.ID); rerr == nil { + indexed := 0 + for _, r := range reposResp.Repos { + if r.Status == "indexed" { + indexed++ + } + } + fmt.Printf(" %d repos (%d indexed)\n", reposResp.Total, indexed) + } + } + } + return nil +} + +// --------------------------------------------------------------------------- +// `cix ws list` / ` repos` +// --------------------------------------------------------------------------- + +func cmdListRepos(cli *client.Client, identifier string) error { + id, err := resolveWorkspaceID(cli, identifier) + if err != nil { + return err + } + resp, err := cli.ListWorkspaceRepos(id) + if err != nil { + return err + } + if wsJSON { + return emitJSON(resp) + } + if resp.Total == 0 { + fmt.Fprintln(os.Stderr, "no repos attached — add one at /dashboard/workspaces") + return nil + } + for _, r := range resp.Repos { + statusBadge := r.Status + switch r.Status { + case "indexed": + statusBadge = "✓ indexed" + case "failed": + statusBadge = "✗ failed" + case "cloning", "indexing", "pending": + statusBadge = "… " + r.Status + } + fmt.Printf("%s %s@%s\n", statusBadge, r.GitHubURL, r.Branch) + if wsVerbose { + fmt.Printf(" project: %s\n", r.ProjectPath) + if r.LastIndexedAt != nil { + fmt.Printf(" last indexed: %s\n", *r.LastIndexedAt) + } + if r.LastError != nil && *r.LastError != "" { + fmt.Printf(" last error: %s\n", *r.LastError) + } + } + } + return nil +} + +// --------------------------------------------------------------------------- +// `cix ws ` / ` describe` +// 
--------------------------------------------------------------------------- + +func cmdDescribeWorkspace(cli *client.Client, identifier string) error { + list, err := cli.ListWorkspaces() + if err != nil { + return err + } + var ws *client.Workspace + for i := range list.Workspaces { + w := &list.Workspaces[i] + if w.ID == identifier || strings.EqualFold(w.Name, identifier) { + ws = w + break + } + } + if ws == nil { + return fmt.Errorf("workspace %q not found (run `cix ws list`)", identifier) + } + reposResp, err := cli.ListWorkspaceRepos(ws.ID) + if err != nil { + return err + } + + if wsJSON { + return emitJSON(map[string]any{ + "workspace": ws, + "repos": reposResp.Repos, + "total": reposResp.Total, + }) + } + + fmt.Printf("Workspace: %s\n", ws.Name) + fmt.Printf(" id: %s\n", ws.ID) + if ws.Description != "" { + fmt.Printf(" description: %s\n", ws.Description) + } + indexed := 0 + for _, r := range reposResp.Repos { + if r.Status == "indexed" { + indexed++ + } + } + fmt.Printf(" repos: %d (%d indexed)\n", reposResp.Total, indexed) + if reposResp.Total == 0 { + fmt.Fprintln(os.Stderr, "\n (no repos attached — add at /dashboard/workspaces)") + return nil + } + fmt.Println() + for _, r := range reposResp.Repos { + statusBadge := r.Status + switch r.Status { + case "indexed": + statusBadge = "✓" + case "failed": + statusBadge = "✗" + default: + statusBadge = "…" + } + fmt.Printf(" %s %s@%s\n", statusBadge, r.GitHubURL, r.Branch) + fmt.Printf(" project: %s\n", r.ProjectPath) + if r.LastIndexedAt != nil { + fmt.Printf(" last indexed: %s\n", *r.LastIndexedAt) + } + if r.LastError != nil && *r.LastError != "" { + fmt.Printf(" last error: %s\n", *r.LastError) + } + } + return nil +} + +// --------------------------------------------------------------------------- +// `cix ws search ` +// --------------------------------------------------------------------------- + +func cmdWorkspaceSearch(cli *client.Client, identifier, query string) error { + id, err := 
resolveWorkspaceID(cli, identifier) + if err != nil { + return err + } + resp, err := cli.WorkspaceSearch(id, query, wsSearchTopProjects, wsSearchTopChunks) + if err != nil { + return err + } + if wsJSON { + return emitJSON(resp) + } + return renderSearch(resp) +} + +// resolveWorkspaceID maps a user-typed identifier (id or name) to the +// canonical opaque id used by the API. One ListWorkspaces call regardless +// — keeps the surface uniform across `list`, `describe`, `search`. +func resolveWorkspaceID(cli *client.Client, identifier string) (string, error) { + list, err := cli.ListWorkspaces() + if err != nil { + return "", err + } + for i := range list.Workspaces { + w := &list.Workspaces[i] + if w.ID == identifier || strings.EqualFold(w.Name, identifier) { + return w.ID, nil + } + } + return "", fmt.Errorf("workspace %q not found (run `cix ws list`)", identifier) +} + +func renderSearch(resp *client.WorkspaceSearchResponse) error { + switch resp.Status { + case "empty": + fmt.Fprintln(os.Stderr, "no chunks matched the query") + return nil + case "partial_failure": + fmt.Fprintln(os.Stderr, "at least one repo errored — results below are incomplete; check server logs") + } + + if len(resp.StaleFTSRepos) > 0 { + fmt.Fprintf(os.Stderr, + "warning: %d repo(s) were indexed before BM25 was enabled; hybrid degrades to dense-only for them.\n"+ + " reindex to fix: ", len(resp.StaleFTSRepos)) + paths := make([]string, len(resp.StaleFTSRepos)) + for i, s := range resp.StaleFTSRepos { + paths[i] = s.ProjectPath + } + fmt.Fprintln(os.Stderr, strings.Join(paths, ", ")) + fmt.Fprintln(os.Stderr) + } + + if len(resp.Projects) > 0 { + fmt.Println("Top projects:") + for _, p := range resp.Projects { + label := p.Label + if label == "" { + label = p.ProjectPath + } + fmt.Printf(" [%.3f] %s — %d hits · bm25 %.3f · dense %.3f · %s\n", + p.ProjectScore, label, p.NumHits, p.BM25Score, p.DenseScore, p.ProjectPath) + } + fmt.Println() + } + fmt.Println("Top chunks:") + for _, c := range 
resp.Chunks { + head := fmt.Sprintf("%s:%d-%d", c.FilePath, c.StartLine, c.EndLine) + fmt.Printf(" [%.3f] %s\n", c.Score, head) + fmt.Printf(" project: %s\n", c.ProjectPath) + if c.SymbolName != "" { + fmt.Printf(" symbol: %s\n", c.SymbolName) + } + fmt.Println() + } + return nil +} + +// emitJSON writes a Go value as indented JSON to stdout. +func emitJSON(v any) error { + enc := json.NewEncoder(os.Stdout) + enc.SetIndent("", " ") + return enc.Encode(v) +} diff --git a/cli/internal/client/workspace.go b/cli/internal/client/workspace.go new file mode 100644 index 0000000..a3be475 --- /dev/null +++ b/cli/internal/client/workspace.go @@ -0,0 +1,136 @@ +package client + +import ( + "fmt" + "net/url" +) + +// WorkspaceSearchProject mirrors the OpenAPI WorkspaceSearchProject +// schema — one entry per surviving project in the hybrid candidacy +// ranking. +type WorkspaceSearchProject struct { + ProjectPath string `json:"project_path"` + Label string `json:"label"` + ProjectScore float32 `json:"project_score"` + NumHits int `json:"num_hits"` + BM25Score float32 `json:"bm25_score"` + DenseScore float32 `json:"dense_score"` +} + +// WorkspaceSearchChunk mirrors WorkspaceSearchChunk. +type WorkspaceSearchChunk struct { + ProjectPath string `json:"project_path"` + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + SymbolName string `json:"symbol_name,omitempty"` + Language string `json:"language,omitempty"` + Score float32 `json:"score"` + Content string `json:"content"` +} + +// WorkspaceSearchStaleFTSRepo names a repo whose BM25 index hasn't +// been backfilled yet (indexed before chunks_fts existed); hybrid +// degrades to dense-only for that entry until reindex. +type WorkspaceSearchStaleFTSRepo struct { + ProjectPath string `json:"project_path"` +} + +// WorkspaceSearchResponse mirrors WorkspaceSearchResponse. 
+type WorkspaceSearchResponse struct { + Status string `json:"status"` + Projects []WorkspaceSearchProject `json:"projects"` + Chunks []WorkspaceSearchChunk `json:"chunks"` + StaleFTSRepos []WorkspaceSearchStaleFTSRepo `json:"stale_fts_repos,omitempty"` +} + +// Workspace is the metadata projection of a workspace row. +type Workspace struct { + ID string `json:"id"` + Name string `json:"name"` + Description string `json:"description"` +} + +// WorkspaceListResponse is the GET /workspaces shape. +type WorkspaceListResponse struct { + Workspaces []Workspace `json:"workspaces"` + Total int `json:"total"` +} + +// WorkspaceRepo mirrors the server's WorkspaceRepo payload — every +// field the dashboard or `cix ws list` would display. +type WorkspaceRepo struct { + ID string `json:"id"` + WorkspaceID string `json:"workspace_id"` + GitHubURL string `json:"github_url"` + Branch string `json:"branch"` + ProjectPath string `json:"project_path"` + TokenID *string `json:"token_id,omitempty"` + AutoWebhook bool `json:"auto_webhook"` + Status string `json:"status"` + LastSHA *string `json:"last_sha,omitempty"` + LastError *string `json:"last_error,omitempty"` + LastIndexedAt *string `json:"last_indexed_at,omitempty"` + CreatedAt string `json:"created_at"` + UpdatedAt string `json:"updated_at"` +} + +// WorkspaceRepoListResponse is the GET /workspaces/{id}/repos shape. +type WorkspaceRepoListResponse struct { + Repos []WorkspaceRepo `json:"repos"` + Total int `json:"total"` +} + +// ListWorkspaces — GET /api/v1/workspaces. Returns +// ServiceUnavailable as a typed error so callers can render a hint when +// CIX_WORKSPACES_ENABLED is off on the server side. 
+func (c *Client) ListWorkspaces() (*WorkspaceListResponse, error) { + resp, err := c.do("GET", "/api/v1/workspaces", nil) + if err != nil { + return nil, err + } + var out WorkspaceListResponse + if err := parseResponse(resp, &out); err != nil { + return nil, err + } + return &out, nil +} + +// ListWorkspaceRepos — GET /api/v1/workspaces/{id}/repos. Returns +// every attached repo with its current status (pending / cloning / +// indexing / indexed / failed) so the CLI can render a readable +// per-repo summary. +func (c *Client) ListWorkspaceRepos(workspaceID string) (*WorkspaceRepoListResponse, error) { + resp, err := c.do("GET", "/api/v1/workspaces/"+url.PathEscape(workspaceID)+"/repos", nil) + if err != nil { + return nil, err + } + var out WorkspaceRepoListResponse + if err := parseResponse(resp, &out); err != nil { + return nil, err + } + return &out, nil +} + +// WorkspaceSearch — GET /api/v1/workspaces/{id}/search. id is the +// workspace's opaque ULID/UUID returned by ListWorkspaces. +func (c *Client) WorkspaceSearch(id, query string, topProjects, topChunks int) (*WorkspaceSearchResponse, error) { + v := url.Values{} + v.Set("q", query) + if topProjects > 0 { + v.Set("top_projects", fmt.Sprintf("%d", topProjects)) + } + if topChunks > 0 { + v.Set("top_chunks", fmt.Sprintf("%d", topChunks)) + } + path := "/api/v1/workspaces/" + url.PathEscape(id) + "/search?" 
+ v.Encode() + resp, err := c.do("GET", path, nil) + if err != nil { + return nil, err + } + var out WorkspaceSearchResponse + if err := parseResponse(resp, &out); err != nil { + return nil, err + } + return &out, nil +} diff --git a/cli/internal/projectconfig/projectconfig_test.go b/cli/internal/projectconfig/projectconfig_test.go index 1769681..1705afd 100644 --- a/cli/internal/projectconfig/projectconfig_test.go +++ b/cli/internal/projectconfig/projectconfig_test.go @@ -91,8 +91,8 @@ func TestSubmodulePaths_Standard(t *testing.T) { writeFile(t, filepath.Join(root, ".gitmodules"), `[submodule "api/schema/acme-shared"] path = api/schema/acme-shared url = https://github.com/Example/acme-shared.git -[submodule "api/generated/acme-models"] - path = api/generated/acme-models +[submodule "api/models/acme-models"] + path = api/models/acme-models url = https://github.com/Example/acme-models.git `) @@ -102,7 +102,7 @@ func TestSubmodulePaths_Standard(t *testing.T) { } sort.Strings(paths) - expected := []string{"api/schema/acme-shared", "api/generated/acme-models"} + expected := []string{"api/models/acme-models", "api/schema/acme-shared"} if len(paths) != len(expected) { t.Fatalf("expected %v, got %v", expected, paths) diff --git a/doc/WORKSPACES.md b/doc/WORKSPACES.md new file mode 100644 index 0000000..1ae0131 --- /dev/null +++ b/doc/WORKSPACES.md @@ -0,0 +1,184 @@ +# Workspaces — operator guide + +The workspaces feature lets cix index a group of GitHub repositories +together and serve cross-project semantic search against the union. +This document covers everything an operator needs to enable, configure, +and troubleshoot the feature in production. + +> **Status (PR1–PR3).** The skeleton, clone/index pipeline, and webhook +> receiver are all in. Two-stage cross-project search is the deliverable +> of PR4–PR6 — until those merge, `workspaces` behaves like a tag over +> per-project indexes. + +## Quick start + +1. 
**Enable the feature flag.** Add to the cix-server environment: + ``` + CIX_WORKSPACES_ENABLED=true + CIX_SECRET_KEY= # see "Encryption" + ``` + Restart the server. Without the flag every workspaces endpoint + returns `503 service unavailable`. +2. **Open the dashboard** at `https:///dashboard` and sign in. +3. **Add a GitHub PAT** under **GitHub Tokens → Add token** if you need + to clone private repos. The plaintext value is encrypted before it + hits SQLite and is never returned in any subsequent response. +4. **Create a workspace** under **Workspaces → New workspace**. +5. **Attach a repository:** workspace detail → Add repo. Fill in URL, + branch, optional token, and choose **Auto-register webhook** if + your PAT carries `admin:repo_hook`. Otherwise check **I'll set it + up myself** and copy the displayed URL + secret into GitHub. +6. The server clones the repo into `//` + and runs the existing indexer pipeline against it. Status transitions are + visible on the workspace detail page: `pending → cloning → indexing → indexed`. + +## Environment variables + +| Variable | Default | Purpose | +|---|---|---| +| `CIX_WORKSPACES_ENABLED` | `false` | Master switch for the feature. | +| `CIX_SECRET_KEY` | (auto-generate) | 32-byte AES key that encrypts GitHub tokens. Hex or base64. | +| `CIX_SECRET_KEYFILE` | unset | Alternative — path to a 0600-perm key file. | +| `CIX_SECRETS_DATA_DIR` | `dirname(CIX_SQLITE_PATH)` | Where the auto-generated keyfile lives. | +| `CIX_WORKSPACES_DATA_DIR` | `/repos` | Where cloned repos live. | +| `CIX_WORKER_CONCURRENCY` | `2` | Parallel job workers. Clone+index is mostly IO-bound. | +| `CIX_PUBLIC_URL` | unset | Externally-reachable URL used to build webhook delivery URLs. | + +### Encryption key resolution + +Resolution order: + +1. `CIX_SECRET_KEY` (hex or base64 32-byte value) +2. `CIX_SECRET_KEYFILE` (path; file must be `0600`) +3. `/.secret_key` — auto-generated on first run + with `CIX_WORKSPACES_ENABLED=true`. 
The server **refuses to start** + if `github_tokens` is non-empty and the resolved key cannot decrypt + the first row — protects against accidental key rotation that would + silently brick all tokens. + +For production, supply `CIX_SECRET_KEY` explicitly or mount a keyfile +via `CIX_SECRET_KEYFILE`. The auto-generated keyfile is a single-host +convenience for dev. + +## Webhooks + +GitHub deliveries hit `POST /api/v1/webhooks/github/`. +The endpoint is **public** in the auth sense (no Bearer/session check) +but every delivery is HMAC-SHA256-validated against the per-row +`webhook_secret`. The secret is shown exactly once on add-repo and on +**Workspaces → Repo → Webhook info**. + +Supported events: + +| Event | Behaviour | +|---|---| +| `push` (tracked branch) | Enqueues `clone_repo` job — dedupe collapses bursts. | +| `push` (other branch / delete) | 200 `{"status":"ignored"}`. | +| `ping` | 200 `{"status":"ping"}`. Use to confirm setup. | +| anything else | 200 `{"status":"ignored"}`, logged for audit. | + +### Cloudflare tunnel (recommended for self-hosted) + +Webhooks require a public URL. The simplest no-cost option is a +[Cloudflare Tunnel](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/). +On the cix-server host: + +```bash +# One-time: install + log in +brew install cloudflared +cloudflared tunnel login + +# Create a named tunnel +cloudflared tunnel create cix + +# Route a hostname to the tunnel (replace cix.example.com with yours) +cloudflared tunnel route dns cix cix.example.com + +# Run the tunnel — replace 21847 with your CIX_PORT +cloudflared tunnel --url http://localhost:21847 run cix +``` + +Then set `CIX_PUBLIC_URL=https://cix.example.com` and restart the server. +The dashboard's add-repo dialog and the webhook-info endpoint will +generate fully-qualified URLs that GitHub can reach. 
+ +For ad-hoc testing without DNS: + +```bash +cloudflared tunnel --url http://localhost:21847 +# prints a one-shot https://*.trycloudflare.com URL +``` + +Set `CIX_PUBLIC_URL` to whatever cloudflared prints and restart. +Single-process tunnels are torn down with the parent — not suitable for +production but perfect for the first end-to-end smoke test. + +### Manual webhook setup + +If `auto_webhook=false` (default) the dashboard surfaces the URL + secret +after add-repo. Paste them into GitHub: + +1. Repo → **Settings → Webhooks → Add webhook** +2. **Payload URL** = the value from the dashboard +3. **Content type** = `application/json` +4. **Secret** = the value from the dashboard +5. **Which events?** → **Just the push event** +6. **Active** ✓ + +GitHub will send a `ping` immediately — the cix server returns 200, and +GitHub's webhook page will mark the delivery green. + +### Auto-register + +When the PAT carries `admin:repo_hook` scope and `auto_webhook=true`, +the server calls `POST /repos/{owner}/{repo}/hooks` on your behalf +during add-repo and persists the resulting hook id (used to +de-register on delete). Failure is non-fatal — the response includes +`auto_registered: false` and an operator-facing note explaining the +specific reason (missing scope, network error, etc.). + +## Background workers + +A single in-process worker pool drains a SQLite-backed queue (`jobs` +table). Concurrency is `CIX_WORKER_CONCURRENCY` (default 2). Job types +in PR2–PR3: + +- `clone_repo` — clones (or fetches+resets on reuse) via go-git; + registers `projects` row; chains `index_repo`. +- `index_repo` — runs the existing 3-phase indexer in-process against + the clone directory; flips repo status to `indexed`. + +Future PRs add `build_call_graph` and `compute_workspace_communities`. + +### Inspecting the queue + +`GET /api/v1/jobs` lists recent jobs with optional `status=` / `type=` / +`limit=` filters. Useful for diagnosing stuck repos. 
+ +## Troubleshooting + +- **Status stuck at `cloning`** — check `GET /api/v1/jobs?status=running` and + the cix-server logs. Most common cause: PAT missing `repo` scope on + a private repo, or network not reaching github.com. +- **Status stuck at `failed`** with `last_error` set — the message + comes directly from go-git or the indexer. Common fixes: rotate the + PAT, confirm the branch name, verify the runtime model is loaded + (`GET /api/v1/admin/sidecar/status`). +- **Webhook deliveries returning 401** — the secret in GitHub doesn't + match what cix stored. Click **Webhook info** in the dashboard to + see the canonical value, paste again. Secrets rotate when the + workspace_repo is recreated. +- **Encryption key mismatch on startup** — operator-readable error in + the boot log. Recover the prior `CIX_SECRET_KEY` from your secrets + manager or wipe `github_tokens` manually before retrying. + +## What's coming (PR4 – PR7) + +- **PR4** — Intra-project call-graph extraction (`call_edges` table) + + eval harness so the rest of the pipeline can lean on it. +- **PR5** — Louvain community detection per workspace; centroid + embeddings stored in a dedicated chromem collection. +- **PR6** — Two-stage workspace search endpoint + (`GET /workspaces/{id}/search`). +- **PR7** — CLI subcommand + `cix:workspace` Claude Code skill + + dashboard polish (per-repo status panels, search UI, graph viz). diff --git a/doc/openapi.yaml b/doc/openapi.yaml index 32799c9..4990567 100644 --- a/doc/openapi.yaml +++ b/doc/openapi.yaml @@ -80,6 +80,19 @@ tags: description: Admin-only user management - name: api-keys description: Issue and revoke owner-scoped API keys for CLI/SDK use + - name: workspaces + description: | + Workspaces group GitHub repositories for cross-project semantic search. + Server-wide shared — every authenticated user can list, create, and + modify any workspace. 
PR1 ships CRUD only; repository attachment, + webhooks, and the two-stage search endpoint land in subsequent + releases of the workspaces feature branch. + - name: github-tokens + description: | + GitHub Personal Access Tokens used by the workspaces feature for + cloning private repos and (optionally) registering webhooks. Stored + encrypted-at-rest via AES-GCM; the plaintext is surfaced exactly + once on POST and never returned thereafter. paths: /health: @@ -707,6 +720,35 @@ paths: "500": $ref: "#/components/responses/InternalError" + /api/v1/projects/{path}/workspaces: + parameters: + - $ref: "#/components/parameters/ProjectHash" + get: + operationId: listProjectWorkspaces + tags: [projects] + summary: List workspaces that contain this project + description: | + Returns every workspace that has this project attached, owned or + linked. The project page uses this to show "Workspaces" chips + the user can click to jump to the workspace detail page. + + Empty list when the project isn't part of any workspace yet — + either it was indexed directly via /projects rather than via a + workspace_repo, or all its memberships have been detached. + responses: + "200": + description: Memberships + content: + application/json: + schema: + $ref: "#/components/schemas/ProjectWorkspaceList" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "500": + $ref: "#/components/responses/InternalError" + /api/v1/projects/{path}/search: parameters: - $ref: "#/components/parameters/ProjectHash" @@ -722,9 +764,12 @@ paths: then merged into per-file groups and ranked by best match score. 
`min_score` semantics: - - omitted → server default `0.4` (calibrated for CodeRankEmbed-Q8) + - omitted → server default `0.2` (light relevance floor that + doesn't silently drop abstract natural-language queries + whose best chunks score in [0.25, 0.35]) - explicit `0` → return everything above HNSW floor - - explicit positive → that floor + - explicit positive → that floor (use `0.4+` for strict + code-symbol searches calibrated for CodeRankEmbed-Q8) `query_time_ms` is rounded to 1 decimal place. requestBody: @@ -1090,1205 +1135,2574 @@ paths: "404": $ref: "#/components/responses/NotFound" -components: - securitySchemes: - bearerAuth: - type: http - scheme: bearer - description: "API key passed as `Authorization: Bearer `" - - parameters: - ProjectHash: - name: path - in: path - required: true + /api/v1/workspaces: + get: + operationId: listWorkspaces + tags: [workspaces] + summary: List all workspaces description: | - First 16 hex chars of `SHA1(host_path)`. See - `internal/projects.HashPath`. 
- schema: - type: string - pattern: "^[a-f0-9]{16}$" - example: "5b7d2c9e1a3f8042" - - responses: - Unauthorized: - description: Missing or invalid API key - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - NotFound: - description: Resource not found - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - Unprocessable: - description: Malformed request body or missing required fields - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - InternalError: - description: Unhandled server error - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - IndexerUnavailable: - description: Indexing service not configured - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - Forbidden: - description: Authenticated, but lacks the required role - content: - application/json: - schema: - $ref: "#/components/schemas/Error" - Conflict: - description: Resource already exists (e.g. email taken) - content: - application/json: - schema: - $ref: "#/components/schemas/Error" + Returns every workspace in the system, newest first. Server-wide + shared visibility — the caller sees workspaces created by any user. 
+ responses: + "200": + description: Workspace list + content: + application/json: + schema: + $ref: "#/components/schemas/WorkspaceListResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + post: + operationId: createWorkspace + tags: [workspaces] + summary: Create a new workspace + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/CreateWorkspaceRequest" + responses: + "201": + description: Workspace created + content: + application/json: + schema: + $ref: "#/components/schemas/Workspace" + "401": + $ref: "#/components/responses/Unauthorized" + "409": + $ref: "#/components/responses/Conflict" + "422": + $ref: "#/components/responses/Unprocessable" + "503": + $ref: "#/components/responses/WorkspacesDisabled" - schemas: - Error: - type: object - required: [detail] - properties: - detail: + /api/v1/workspaces/{id}: + parameters: + - name: id + in: path + required: true + schema: type: string + description: Workspace ID (ULID-like string returned by createWorkspace). 
+ get: + operationId: getWorkspace + tags: [workspaces] + summary: Get a single workspace + responses: + "200": + description: Workspace + content: + application/json: + schema: + $ref: "#/components/schemas/Workspace" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + patch: + operationId: updateWorkspace + tags: [workspaces] + summary: Update workspace metadata + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/UpdateWorkspaceRequest" + responses: + "200": + description: Updated workspace + content: + application/json: + schema: + $ref: "#/components/schemas/Workspace" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "409": + $ref: "#/components/responses/Conflict" + "422": + $ref: "#/components/responses/Unprocessable" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + delete: + operationId: deleteWorkspace + tags: [workspaces] + summary: Delete a workspace + description: | + Removes the workspace row. PR1 has nothing else to cascade + (workspace_repos lands in PR2); future PRs will cascade repos, + communities, and the centroid Chroma collection. + responses: + "204": + description: Deleted + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" - BootstrapStatusResponse: - type: object - required: [needs_bootstrap] - properties: - needs_bootstrap: - type: boolean - description: True when the users table is empty. 
- - User: - type: object - required: [id, email, role, must_change_password, created_at, updated_at, disabled] - properties: - id: - type: string - email: + /api/v1/workspaces/{id}/repos: + parameters: + - name: id + in: path + required: true + schema: type: string - format: email - role: + description: Workspace ID. + get: + operationId: listWorkspaceRepos + tags: [workspaces] + summary: List repositories attached to a workspace + responses: + "200": + description: Repo list + content: + application/json: + schema: + $ref: "#/components/schemas/WorkspaceRepoListResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + post: + operationId: addWorkspaceRepo + tags: [workspaces] + summary: Attach a GitHub repository to a workspace + description: | + Inserts a workspace_repos row in status `pending` and enqueues a + `clone_repo` background job. The clone job is followed by an + `index_repo` job on success; the dashboard polls + `/api/v1/workspaces/{id}/repos` to surface status transitions. + + Provide `token_id` to clone a private repository. The + `auto_webhook` flag is accepted in PR2 but not yet acted upon — + PR3 wires the auto-register path against the GitHub API. + + The response includes a one-shot `webhook_url` + `webhook_secret` + so an operator can manually register the webhook in GitHub if + `auto_webhook` is false. The secret is also returned by the + webhook-info endpoint added in PR3. 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/AddWorkspaceRepoRequest" + responses: + "201": + description: Repo attached + clone enqueued + content: + application/json: + schema: + $ref: "#/components/schemas/WorkspaceRepoCreated" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "409": + $ref: "#/components/responses/Conflict" + "422": + $ref: "#/components/responses/Unprocessable" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/workspaces/{id}/repos/link: + parameters: + - name: id + in: path + required: true + schema: type: string - enum: [admin, viewer] - must_change_password: - type: boolean - created_at: + post: + operationId: linkExistingProject + tags: [workspaces] + summary: Attach an already-indexed project to a workspace + description: | + Inserts a workspace_repos row marked `is_linked=true` pointing at + an existing indexed project. No clone happens, no index job is + enqueued, no GitHub webhook is registered — the row is a + lightweight membership pointer so workspace-level features + (search, communities, the repo list) include the project. + + The project's `host_path` must be of the form + "github.com/owner/repo@branch" (i.e. created from a GitHub + source) and the project must be in `status='indexed'`. Per- + workspace uniqueness is enforced via the same composite UNIQUE + as the regular Add Repo flow — a project already in this + workspace (owned or linked) returns 409. + + Use this when the user wants to bring an existing repo from + workspace A into workspace B without paying the clone+index + cost twice. 
+ requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/LinkExistingProjectRequest" + responses: + "201": + description: Linked + content: + application/json: + schema: + $ref: "#/components/schemas/WorkspaceRepoCreated" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "409": + $ref: "#/components/responses/Conflict" + "422": + $ref: "#/components/responses/Unprocessable" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/workspaces/{id}/repos/{repo_id}: + parameters: + - name: id + in: path + required: true + schema: type: string - format: date-time - updated_at: + - name: repo_id + in: path + required: true + schema: type: string - format: date-time + delete: + operationId: deleteWorkspaceRepo + tags: [workspaces] + summary: Detach a repository from a workspace + description: | + Removes the workspace_repos row. The cloned directory on disk and + the indexed project rows remain — a follow-up cleanup job lands + in a later release. PR3 will also de-register the GitHub webhook + when auto_webhook=true. + responses: + "204": + description: Detached + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/workspaces/{id}/search: + parameters: + - name: id + in: path + required: true + schema: + type: string + get: + operationId: workspaceSearch + tags: [workspaces] + summary: Hybrid BM25+dense search across all repos in a workspace + description: | + Embeds the query, then fans out two parallel sub-queries per + project: dense (chromem cosine) and sparse (SQLite FTS5 BM25 + over chunks_fts). Per-project the two ranked lists are fused + via Reciprocal Rank Fusion (k=60). 
+ + Across projects an α-blended candidacy score (`α × bm25_norm + + (1-α) × dense_norm` with α=0.5, both signals min-max + normalized per query) plus a relative threshold + (`candidacy ≥ best × 0.4`) drops projects that share no + semantic and no lexical overlap with the query. Pure-dense + fan-out returned the N nearest vectors regardless of + absolute distance, so workspaces routinely surfaced + irrelevant repos at noise-level cosine similarity; the BM25 + gate fixes that by requiring at least one of the two + signals to be a meaningful fraction of the best. + + The chunks list is then built by round-robin interleaving: + rank-1 from every surviving project before any rank-2, etc., + capped per-project so one dominant repo can't take every + slot. Always live — no background rebuild job, no debounce. + parameters: + - name: q + in: query + required: true + schema: + type: string + minLength: 1 + - name: top_projects + in: query + required: false + schema: + type: integer + minimum: 1 + maximum: 50 + default: 10 + - name: top_chunks + in: query + required: false + schema: + type: integer + minimum: 1 + maximum: 200 + default: 20 + - name: min_score + in: query + required: false + description: | + Floor on raw cosine similarity. Chunks below this are + dropped before aggregation. Default 0.4 — symmetric with + per-project search default so an unfiltered workspace + query doesn't return cross-repo noise that a single-repo + query would have rejected. Pass 0 explicitly for + intentional cross-project sweeps that need long-tail + recall (e.g. "authentication and authorization" across a + mixed-domain workspace). 
+ schema: + type: number + format: float + minimum: 0 + maximum: 1 + default: 0.4 + responses: + "200": + description: Search results + content: + application/json: + schema: + $ref: "#/components/schemas/WorkspaceSearchResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info: + parameters: + - name: id + in: path + required: true + schema: + type: string + - name: repo_id + in: path + required: true + schema: + type: string + get: + operationId: getWorkspaceRepoWebhookInfo + tags: [workspaces] + summary: Get the webhook URL + secret for manual GitHub setup + description: | + Returns the publicly-reachable webhook URL and the HMAC secret. + Pair this with GitHub Settings → Webhooks when `auto_webhook` is + false. The secret rotates if the workspace_repo is deleted and + re-attached. + responses: + "200": + description: Webhook coordinates + content: + application/json: + schema: + $ref: "#/components/schemas/WebhookInfoResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/webhooks/github/{repo_id}: + parameters: + - name: repo_id + in: path + required: true + schema: + type: string + post: + operationId: receiveGithubWebhook + tags: [workspaces] + summary: Receive a GitHub webhook delivery (public, HMAC-authenticated) + description: | + Public endpoint — `requireAuth` is bypassed. Authentication is + per-row via the `X-Hub-Signature-256` header which must be + HMAC-SHA256 of the request body keyed by the workspace_repo's + `webhook_secret`. Mismatched signatures return 401; unknown + `repo_id` returns 404. On a valid `push` for the tracked branch + the server enqueues a `fetch_repo` job (dedupe collapses burst + deliveries). 
+ + GitHub `ping` deliveries return 200 with no side effects so the + setup confirmation flow works. + security: [] + parameters: + - name: X-Hub-Signature-256 + in: header + required: false + schema: + type: string + description: HMAC-SHA256 over the body, hex-encoded with sha256= prefix. + - name: X-GitHub-Event + in: header + required: false + schema: + type: string + requestBody: + required: true + content: + application/json: + schema: + type: object + additionalProperties: true + responses: + "202": + description: Delivery accepted (enqueued or already in flight) + content: + application/json: + schema: + $ref: "#/components/schemas/WebhookAccepted" + "200": + description: Ping or no-op delivery + content: + application/json: + schema: + $ref: "#/components/schemas/WebhookAccepted" + "401": + description: HMAC signature mismatch + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "404": + description: Unknown workspace_repo id + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/workspaces/{id}/repos/{repo_id}/reindex: + parameters: + - name: id + in: path + required: true + schema: + type: string + - name: repo_id + in: path + required: true + schema: + type: string + post: + operationId: reindexWorkspaceRepo + tags: [workspaces] + summary: Manually re-trigger the clone + index pipeline + description: | + Enqueues a fresh `clone_repo` job for the repo. Dedupe collapses + repeated triggers into the existing in-flight job — only one + clone is ever active per repo at a time. 
+ responses: + "202": + description: Reindex enqueued (or already running — dedupe) + content: + application/json: + schema: + $ref: "#/components/schemas/ReindexEnqueuedResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/jobs: + get: + operationId: listJobs + tags: [workspaces] + summary: List background jobs (status / type filter) + parameters: + - name: status + in: query + required: false + schema: + type: string + enum: [pending, running, completed, failed] + - name: type + in: query + required: false + schema: + type: string + - name: limit + in: query + required: false + schema: + type: integer + minimum: 1 + maximum: 500 + default: 100 + responses: + "200": + description: Job list + content: + application/json: + schema: + $ref: "#/components/schemas/JobListResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/github-tokens: + get: + operationId: listGithubTokens + tags: [github-tokens] + summary: List stored GitHub PATs (metadata only) + description: | + Returns metadata for every stored token — name, scopes, timestamps. + Plaintext values are NEVER returned by this endpoint; the only time + plaintext is surfaced is the response to POST /api/v1/github-tokens. + responses: + "200": + description: Token list + content: + application/json: + schema: + $ref: "#/components/schemas/GithubTokenListResponse" + "401": + $ref: "#/components/responses/Unauthorized" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + post: + operationId: createGithubToken + tags: [github-tokens] + summary: Store a new GitHub PAT (encrypted-at-rest) + description: | + Accepts a plaintext token in the request body. The server encrypts + it with AES-GCM via internal/secrets and persists only the + ciphertext. 
The response carries metadata only — the plaintext is + already in the caller's hands. Scope validation against the GitHub + API is deferred to a later release; PR1 stores the supplied scopes + verbatim if any. + requestBody: + required: true + content: + application/json: + schema: + $ref: "#/components/schemas/CreateGithubTokenRequest" + responses: + "201": + description: Token stored + content: + application/json: + schema: + $ref: "#/components/schemas/GithubToken" + "401": + $ref: "#/components/responses/Unauthorized" + "409": + $ref: "#/components/responses/Conflict" + "422": + $ref: "#/components/responses/Unprocessable" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/github-tokens/{id}: + parameters: + - name: id + in: path + required: true + schema: + type: string + delete: + operationId: deleteGithubToken + tags: [github-tokens] + summary: Delete a stored GitHub PAT + description: | + Permanently removes the encrypted blob. Subsequent workspaces + operations that reference this token id will fail. PR1 does not + block deletion when the token is referenced — workspace_repos + landing in PR2 will introduce that FK. + responses: + "204": + description: Deleted + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/github-tokens/{id}/accounts: + parameters: + - name: id + in: path + required: true + schema: + type: string + get: + operationId: listTokenAccounts + tags: [github-tokens] + summary: List the GitHub accounts visible to a stored PAT + description: | + Returns the PAT owner's personal account plus every organisation + the PAT can see. The dashboard renders this as the first step of + the add-repo flow so the operator can drill into a specific + account before picking a repository — useful when /user/repos + doesn't surface every org repo (e.g. 
SAML-protected orgs only + appear under /orgs/{login}/repos). + + The PAT plaintext never leaves the server; the dashboard only + addresses the token by id. + responses: + "200": + description: A list of accounts (1 user + 0..N orgs) + content: + application/json: + schema: + type: object + required: [accounts, total] + properties: + accounts: + type: array + items: + $ref: "#/components/schemas/GithubAccount" + total: + type: integer + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "422": + description: GitHub rejected the token + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "502": + description: Could not reach GitHub + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + + /api/v1/github-tokens/{id}/repos: + parameters: + - name: id + in: path + required: true + schema: + type: string + get: + operationId: listTokenRepos + tags: [github-tokens] + summary: List GitHub repositories visible to a stored PAT + description: | + Returns the repos the PAT can see, ordered by most recently + pushed. Used by the dashboard's add-repo flow to populate the + repo picker. + + When `account` is omitted the response is the affiliations- + aggregated view (GET /user/repos) — every repo the PAT can + see as owner, collaborator, or organization member. When + `account` is set the server hits `/users/{login}/repos` for + a user account or `/orgs/{login}/repos` for an org, depending + on `account_type`. Use the account-scoped call when /user/repos + misses an org repo (typical for SAML-protected orgs). + + The PAT plaintext never leaves the server; the dashboard only + addresses the token by id. Up to 500 repos are returned (5 pages + of 100). Larger affiliations should rely on client-side text + filtering or pick a more specific account. 
+ parameters: + - name: q + in: query + required: false + description: Optional case-insensitive substring filter on full_name. + schema: + type: string + - name: account + in: query + required: false + description: | + Optional account login to scope the listing to. When set, + `account_type` must also be set. + schema: + type: string + - name: account_type + in: query + required: false + description: | + Required when `account` is set; ignored otherwise. + schema: + type: string + enum: [user, org] + responses: + "200": + description: A list of repositories + content: + application/json: + schema: + type: object + required: [repos, total] + properties: + repos: + type: array + items: + $ref: "#/components/schemas/GithubRepo" + total: + type: integer + "401": + $ref: "#/components/responses/Unauthorized" + "404": + $ref: "#/components/responses/NotFound" + "422": + description: GitHub rejected the token + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "502": + description: Could not reach GitHub + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + "503": + $ref: "#/components/responses/WorkspacesDisabled" + +components: + securitySchemes: + bearerAuth: + type: http + scheme: bearer + description: "API key passed as `Authorization: Bearer `" + + parameters: + ProjectHash: + name: path + in: path + required: true + description: | + First 16 hex chars of `SHA1(host_path)`. See + `internal/projects.HashPath`. 
+ schema: + type: string + pattern: "^[a-f0-9]{16}$" + example: "5b7d2c9e1a3f8042" + + responses: + Unauthorized: + description: Missing or invalid API key + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + NotFound: + description: Resource not found + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + Unprocessable: + description: Malformed request body or missing required fields + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + InternalError: + description: Unhandled server error + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + IndexerUnavailable: + description: Indexing service not configured + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + Forbidden: + description: Authenticated, but lacks the required role + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + Conflict: + description: Resource already exists (e.g. email taken) + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + WorkspacesDisabled: + description: | + Workspaces feature is disabled on this server. Set + `CIX_WORKSPACES_ENABLED=true` and restart to enable. + content: + application/json: + schema: + $ref: "#/components/schemas/Error" + + schemas: + Error: + type: object + required: [detail] + properties: + detail: + type: string + + BootstrapStatusResponse: + type: object + required: [needs_bootstrap] + properties: + needs_bootstrap: + type: boolean + description: True when the users table is empty. 
+ + User: + type: object + required: [id, email, role, must_change_password, created_at, updated_at, disabled] + properties: + id: + type: string + email: + type: string + format: email + role: + type: string + enum: [admin, viewer] + must_change_password: + type: boolean + created_at: + type: string + format: date-time + updated_at: + type: string + format: date-time + disabled: + type: boolean + description: | + True when `disabled_at` is set. Disabled users cannot + authenticate via password OR API key. + disabled_at: + type: string + format: date-time + nullable: true + + UserWithStats: + allOf: + - $ref: "#/components/schemas/User" + - type: object + required: [active_sessions_count, api_keys_count] + properties: + last_login_at: + type: string + format: date-time + nullable: true + description: | + Most recent session creation timestamp (RFC3339). + Null if the user has never logged in. + active_sessions_count: + type: integer + minimum: 0 + description: Count of non-expired sessions for this user. + api_keys_count: + type: integer + minimum: 0 + description: Count of non-revoked API keys owned by this user. + + UserListResponse: + type: object + required: [users, total] + properties: + users: + type: array + items: + $ref: "#/components/schemas/UserWithStats" + total: + type: integer + + LoginRequest: + type: object + required: [email, password] + properties: + email: + type: string + format: email + password: + type: string + minLength: 1 + + LoginResponse: + type: object + required: [user] + properties: + user: + $ref: "#/components/schemas/User" + + MeResponse: + type: object + required: [user, auth_method] + properties: + user: + $ref: "#/components/schemas/User" + auth_method: + type: string + enum: [session, api_key] + description: | + Tells the dashboard whether to surface "logout" (session) or + hide it (api_key access — there's nothing to log out of). 
+ + ChangePasswordRequest: + type: object + required: [current_password, new_password] + properties: + current_password: + type: string + minLength: 1 + new_password: + type: string + minLength: 8 + description: Minimum 8 characters. No upper bound. + + CreateUserRequest: + type: object + required: [email, initial_password, role] + properties: + email: + type: string + format: email + initial_password: + type: string + minLength: 8 + description: | + One-time password the new user must change on first login. + The admin shares this out-of-band. + role: + type: string + enum: [admin, viewer] + + UpdateUserRequest: + type: object + properties: + role: + type: string + enum: [admin, viewer] + description: | + New role for the user. Refused for the last enabled admin + when set to `viewer`. disabled: type: boolean description: | - True when `disabled_at` is set. Disabled users cannot - authenticate via password OR API key. - disabled_at: + When true, the user can no longer authenticate. Refused for + the last enabled admin when set to true. + + RuntimeConfig: + type: object + required: + - embedding_model + - llama_ctx_size + - llama_n_gpu_layers + - llama_n_threads + - max_embedding_concurrency + - llama_batch_size + - source + properties: + embedding_model: + type: string + description: HF repo ID or absolute filesystem path to a .gguf file. + llama_ctx_size: + type: integer + minimum: 1 + llama_n_gpu_layers: + type: integer + description: -1 = all layers (Metal/CUDA), 0 = CPU only. + llama_n_threads: + type: integer + minimum: 0 + description: 0 = let llama-server auto-detect. + max_embedding_concurrency: + type: integer + minimum: 1 + llama_batch_size: + type: integer + minimum: 1 + source: + type: object + additionalProperties: + type: string + enum: [db, env, recommended] + description: | + Per-field origin label so the dashboard can render a "DB" / + "Env" / "Recommended" pill next to each value. 
Keys match the + other field names: `embedding_model`, `llama_ctx_size`, ... + recommended: + $ref: "#/components/schemas/RuntimeConfigRecommended" + updated_at: type: string format: date-time nullable: true + description: When the runtime_settings row was last written, or null when only env/recommended are in effect. + updated_by: + type: string + nullable: true + description: Who issued the last PUT, captured from the active session. - UserWithStats: - allOf: - - $ref: "#/components/schemas/User" - - type: object - required: [active_sessions_count, api_keys_count] - properties: - last_login_at: - type: string - format: date-time - nullable: true - description: | - Most recent session creation timestamp (RFC3339). - Null if the user has never logged in. - active_sessions_count: - type: integer - minimum: 0 - description: Count of non-expired sessions for this user. - api_keys_count: - type: integer - minimum: 0 - description: Count of non-revoked API keys owned by this user. + RuntimeConfigRecommended: + type: object + required: + - embedding_model + - llama_ctx_size + - llama_n_gpu_layers + - llama_n_threads + - max_embedding_concurrency + - llama_batch_size + properties: + embedding_model: { type: string } + llama_ctx_size: { type: integer } + llama_n_gpu_layers: { type: integer } + llama_n_threads: { type: integer } + max_embedding_concurrency: { type: integer } + llama_batch_size: { type: integer } + + RuntimeConfigUpdate: + type: object + description: | + All fields optional. Send a value to set/replace the override for + that field, send `""` (string fields) or `0` (numeric fields) to + CLEAR the override (next read falls back to env / recommended). + Omitted fields keep their current value. 
+ properties: + embedding_model: + type: string + nullable: true + llama_ctx_size: + type: integer + nullable: true + llama_n_gpu_layers: + type: integer + nullable: true + llama_n_threads: + type: integer + nullable: true + max_embedding_concurrency: + type: integer + nullable: true + llama_batch_size: + type: integer + nullable: true + + SidecarStatus: + type: object + required: [state, ready, in_flight] + properties: + state: + type: string + enum: [running, starting, restarting, failed, disabled] + pid: + type: integer + minimum: 0 + description: 0 when no child process is alive (failed / disabled). + uptime_seconds: + type: integer + minimum: 0 + model: + type: string + ready: + type: boolean + last_error: + type: string + in_flight: + type: integer + minimum: 0 + description: Embedding queue depth at the moment of sampling. + restart_in_flight: + type: boolean + description: True between accept of POST /sidecar/restart and respawn completion. + + RestartAccepted: + type: object + required: [restart_id] + properties: + restart_id: + type: string + description: Opaque ID; future versions may expose per-restart progress under this id. + + ModelEntry: + type: object + required: [id, path, size_bytes] + properties: + id: + type: string + description: HF repo ID derived from the cache directory name (e.g. owner/model). + path: + type: string + description: Absolute path to the .gguf file on disk. + size_bytes: + type: integer + format: int64 + minimum: 0 + + ModelList: + type: object + required: [models, cache_dir] + properties: + models: + type: array + items: + $ref: "#/components/schemas/ModelEntry" + cache_dir: + type: string + description: The CIX_GGUF_CACHE_DIR that was scanned. Empty list with non-empty cache_dir = no .gguf files found. 
+ + Session: + type: object + required: [id, created_at, expires_at, last_seen_at, is_current] + properties: + id: + type: string + created_at: + type: string + format: date-time + expires_at: + type: string + format: date-time + last_seen_at: + type: string + format: date-time + last_seen_ip: + type: string + nullable: true + last_seen_ua: + type: string + nullable: true + is_current: + type: boolean + description: True for the session carrying this request. + + SessionListResponse: + type: object + required: [sessions, total] + properties: + sessions: + type: array + items: + $ref: "#/components/schemas/Session" + total: + type: integer + + ApiKey: + type: object + required: [id, owner_user_id, name, prefix, created_at, revoked] + properties: + id: + type: string + owner_user_id: + type: string + name: + type: string + prefix: + type: string + description: | + Display-only prefix of the full key (e.g. `cix_a1b2c3d4`). + Long enough to recognise in lists, short enough that it + cannot reconstruct the original. + created_at: + type: string + format: date-time + last_used_at: + type: string + format: date-time + nullable: true + last_used_ip: + type: string + nullable: true + last_used_ua: + type: string + nullable: true + revoked: + type: boolean + revoked_at: + type: string + format: date-time + nullable: true - UserListResponse: + ApiKeyListResponse: type: object - required: [users, total] + required: [api_keys, total] properties: - users: + api_keys: type: array items: - $ref: "#/components/schemas/UserWithStats" + $ref: "#/components/schemas/ApiKey" total: type: integer - LoginRequest: + CreateApiKeyRequest: type: object - required: [email, password] + required: [name] properties: - email: - type: string - format: email - password: + name: type: string minLength: 1 + description: | + Human-friendly label shown in the dashboard. The full key + value is generated server-side and returned exactly once. 
- LoginResponse: - type: object - required: [user] - properties: - user: - $ref: "#/components/schemas/User" - - MeResponse: + ApiKeyCreated: type: object - required: [user, auth_method] + required: [api_key, full_key] properties: - user: - $ref: "#/components/schemas/User" - auth_method: + api_key: + $ref: "#/components/schemas/ApiKey" + full_key: type: string - enum: [session, api_key] description: | - Tells the dashboard whether to surface "logout" (session) or - hide it (api_key access — there's nothing to log out of). + The plaintext key value. **Returned exactly once.** Store it + securely — there is no way to retrieve it later. - ChangePasswordRequest: + HealthResponse: type: object - required: [current_password, new_password] + required: [status] properties: - current_password: + status: type: string - minLength: 1 - new_password: + enum: [ok, unhealthy] + reason: type: string - minLength: 8 - description: Minimum 8 characters. No upper bound. + description: Set only when `status` is `unhealthy`. - CreateUserRequest: + StatusResponse: type: object - required: [email, initial_password, role] + required: + - status + - backend + - server_version + - api_version + - model_loaded + - embedding_model + - projects + - active_indexing_jobs properties: - email: + status: type: string - format: email - initial_password: + enum: [ok] + backend: type: string - minLength: 8 - description: | - One-time password the new user must change on first login. - The admin shares this out-of-band. - role: + description: Backend identifier (e.g. `go`). + server_version: type: string - enum: [admin, viewer] - - UpdateUserRequest: - type: object - properties: - role: + api_version: type: string - enum: [admin, viewer] + example: v1 + model_loaded: + type: boolean description: | - New role for the user. Refused for the last enabled admin - when set to `viewer`. - disabled: + Whether the llama-server sidecar reports ready within 500 ms. 
+ False when the sidecar is starting or has crashed. + embedding_model: + type: string + description: Hugging Face model id (e.g. `awhiteside/CodeRankEmbed-Q8_0-GGUF`). + projects: + type: integer + minimum: 0 + description: Total registered projects. + active_indexing_jobs: + type: integer + minimum: 0 + description: Currently-running `index_runs` rows. + update_available: type: boolean description: | - When true, the user can no longer authenticate. Refused for - the last enabled admin when set to true. + True when the version-check service has found a `server/v*` + release on GitHub strictly newer than the running server. + Field is omitted entirely when version-check is not wired + (set `CIX_VERSION_CHECK_ENABLED=false` to disable polling). + latest_version: + type: string + nullable: true + description: | + Latest released server version (without the `server/v` prefix, + e.g. `0.5.1`). Null until the first successful poll completes. + release_url: + type: string + nullable: true + description: GitHub release page URL for `latest_version`. Null when unknown. + version_check: + $ref: "#/components/schemas/VersionCheckStatus" - RuntimeConfig: + VersionCheckStatus: type: object - required: - - embedding_model - - llama_ctx_size - - llama_n_gpu_layers - - llama_n_threads - - max_embedding_concurrency - - llama_batch_size - - source + required: [enabled] properties: - embedding_model: + enabled: + type: boolean + description: Whether the periodic GitHub poll is running. + checked_at: type: string - description: HF repo ID or absolute filesystem path to a .gguf file. - llama_ctx_size: + format: date-time + nullable: true + description: Last poll timestamp (UTC, RFC 3339). Null before the first poll. + error: + type: string + nullable: true + description: Last error message, if the most recent poll failed. Null on success. 
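Reviewer sketch (not part of the patch): the version-related fields of `StatusResponse` have three distinct "no information" states: `update_available` omitted (version check not wired), `latest_version` null (no successful poll yet), and `version_check.error` set (last poll failed). The Python function below is one way a client might fold those into a single message. The name `describe_update`, the message strings, and the order in which the states are checked are assumptions for illustration only.

```python
def describe_update(status):
    """Summarise the update-related fields of a StatusResponse-shaped dict."""
    # Field omitted entirely: the schema says version-check is not wired.
    if "update_available" not in status:
        return "version check not wired"

    # version_check is optional; treat a missing or null object as empty.
    check = status.get("version_check") or {}
    if check.get("error"):
        return f"last version check failed: {check['error']}"

    latest = status.get("latest_version")
    if latest is None:
        # Null until the first successful poll completes.
        return "no successful poll yet"

    if status["update_available"]:
        url = status.get("release_url") or "unknown release URL"
        return f"update available: {latest} at {url}"
    return f"up to date (latest release: {latest})"
```

Checking for the key's presence, rather than its truthiness, is what separates "not wired" from "wired, and no update found".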
+ + ProjectSettings: + type: object + required: [exclude_patterns, max_file_size] + properties: + exclude_patterns: + type: array + items: { type: string } + max_file_size: type: integer - minimum: 1 - llama_n_gpu_layers: + minimum: 0 + + ProjectStats: + type: object + required: [total_files, indexed_files, total_chunks, total_symbols] + properties: + total_files: type: integer - description: -1 = all layers (Metal/CUDA), 0 = CPU only. - llama_n_threads: + minimum: 0 + indexed_files: type: integer minimum: 0 - description: 0 = let llama-server auto-detect. - max_embedding_concurrency: + total_chunks: type: integer - minimum: 1 - llama_batch_size: + minimum: 0 + total_symbols: type: integer - minimum: 1 - source: - type: object - additionalProperties: - type: string - enum: [db, env, recommended] - description: | - Per-field origin label so the dashboard can render a "DB" / - "Env" / "Recommended" pill next to each value. Keys match the - other field names: `embedding_model`, `llama_ctx_size`, ... - recommended: - $ref: "#/components/schemas/RuntimeConfigRecommended" + minimum: 0 + + Project: + type: object + required: + - path_hash + - host_path + - container_path + - languages + - settings + - stats + - status + - created_at + - updated_at + - last_indexed_at + properties: + path_hash: + type: string + pattern: "^[a-f0-9]{16}$" + description: First 16 hex chars of SHA1(host_path) — stable URL identifier. + host_path: + type: string + description: Absolute filesystem path on the operator's machine. + container_path: + type: string + description: Path inside the container (often equal to host_path). 
+ languages: + type: array + items: { type: string } + settings: + $ref: "#/components/schemas/ProjectSettings" + stats: + $ref: "#/components/schemas/ProjectStats" + status: + type: string + enum: [created, indexing, indexed, error] + created_at: + type: string + format: date-time updated_at: type: string - format: date-time + format: date-time + last_indexed_at: + type: string + format: date-time + nullable: true + indexed_with_model: + type: string + nullable: true + description: | + Embedding model identifier active when this project was last + (re)indexed. NULL on rows that pre-date drift tracking — the + dashboard treats NULL as "Unknown" rather than as drift. + sqlite_path: + type: string nullable: true - description: When the runtime_settings row was last written, or null when only env/recommended are in effect. - updated_by: + description: Resolved SQLite database path for the active model. NULL on dashboards that don't expose storage info. + chroma_path: type: string nullable: true - description: Who issued the last PUT, captured from the active session. + description: Resolved chromem-go collection directory for this project. NULL when not computed. 
+ sqlite_size_bytes: + type: integer + format: int64 + nullable: true + minimum: 0 + chroma_size_bytes: + type: integer + format: int64 + nullable: true + minimum: 0 - RuntimeConfigRecommended: + ProjectListResponse: type: object - required: - - embedding_model - - llama_ctx_size - - llama_n_gpu_layers - - llama_n_threads - - max_embedding_concurrency - - llama_batch_size + required: [projects, total] properties: - embedding_model: { type: string } - llama_ctx_size: { type: integer } - llama_n_gpu_layers: { type: integer } - llama_n_threads: { type: integer } - max_embedding_concurrency: { type: integer } - llama_batch_size: { type: integer } + projects: + type: array + items: + $ref: "#/components/schemas/Project" + total: + type: integer + minimum: 0 - RuntimeConfigUpdate: + CreateProjectRequest: type: object - description: | - All fields optional. Send a value to set/replace the override for - that field, send `""` (string fields) or `0` (numeric fields) to - CLEAR the override (next read falls back to env / recommended). - Omitted fields keep their current value. 
+ required: [host_path] properties: - embedding_model: + host_path: type: string - nullable: true - llama_ctx_size: - type: integer - nullable: true - llama_n_gpu_layers: - type: integer - nullable: true - llama_n_threads: - type: integer - nullable: true - max_embedding_concurrency: - type: integer - nullable: true - llama_batch_size: + + UpdateProjectRequest: + type: object + properties: + settings: + $ref: "#/components/schemas/ProjectSettings" + + DirEntry: + type: object + required: [path, file_count] + properties: + path: + type: string + file_count: type: integer - nullable: true + minimum: 0 - SidecarStatus: + SymbolEntry: type: object - required: [state, ready, in_flight] + required: [name, kind, file_path, language] properties: - state: + name: type: string - enum: [running, starting, restarting, failed, disabled] - pid: + kind: + type: string + file_path: + type: string + language: + type: string + + ProjectSummary: + type: object + required: + - path_hash + - host_path + - status + - languages + - total_files + - total_chunks + - total_symbols + - top_directories + - recent_symbols + properties: + path_hash: + type: string + pattern: "^[a-f0-9]{16}$" + description: First 16 hex chars of SHA1(host_path) — stable URL identifier. + host_path: + type: string + status: + type: string + languages: + type: array + items: { type: string } + total_files: type: integer minimum: 0 - description: 0 when no child process is alive (failed / disabled). - uptime_seconds: + total_chunks: type: integer minimum: 0 - model: - type: string - ready: - type: boolean - last_error: - type: string - in_flight: + total_symbols: type: integer minimum: 0 - description: Embedding queue depth at the moment of sampling. - restart_in_flight: - type: boolean - description: True between accept of POST /sidecar/restart and respawn completion. 
+ top_directories: + type: array + items: + $ref: "#/components/schemas/DirEntry" + recent_symbols: + type: array + items: + $ref: "#/components/schemas/SymbolEntry" - RestartAccepted: + SymbolSearchRequest: type: object - required: [restart_id] + required: [query] properties: - restart_id: + query: type: string - description: Opaque ID; future versions may expose per-restart progress under this id. + minLength: 1 + kinds: + type: array + items: { type: string } + limit: + type: integer + minimum: 0 + default: 20 - ModelEntry: + SymbolResultItem: type: object - required: [id, path, size_bytes] + required: [name, kind, file_path, line, end_line, language] properties: - id: + name: { type: string } + kind: { type: string } + file_path: { type: string } + line: { type: integer } + end_line: { type: integer } + language: { type: string } + signature: { type: string } + parent_name: { type: string } + + SymbolSearchResponse: + type: object + required: [results, total] + properties: + results: + type: array + items: + $ref: "#/components/schemas/SymbolResultItem" + total: + type: integer + minimum: 0 + + DefinitionRequest: + type: object + required: [symbol] + properties: + symbol: type: string - description: HF repo ID derived from the cache directory name (e.g. owner/model). - path: + minLength: 1 + kind: type: string - description: Absolute path to the .gguf file on disk. 
- size_bytes: + file_path: + type: string + limit: type: integer - format: int64 minimum: 0 + default: 10 - ModelList: + DefinitionItem: type: object - required: [models, cache_dir] + required: [name, kind, file_path, line, end_line, language] properties: - models: + name: { type: string } + kind: { type: string } + file_path: { type: string } + line: { type: integer } + end_line: { type: integer } + language: { type: string } + signature: { type: string } + parent_name: { type: string } + + DefinitionResponse: + type: object + required: [results, total] + properties: + results: type: array items: - $ref: "#/components/schemas/ModelEntry" - cache_dir: - type: string - description: The CIX_GGUF_CACHE_DIR that was scanned. Empty list with non-empty cache_dir = no .gguf files found. + $ref: "#/components/schemas/DefinitionItem" + total: + type: integer + minimum: 0 - Session: + ReferenceRequest: type: object - required: [id, created_at, expires_at, last_seen_at, is_current] + required: [symbol] properties: - id: - type: string - created_at: - type: string - format: date-time - expires_at: + symbol: type: string - format: date-time - last_seen_at: + minLength: 1 + limit: + type: integer + minimum: 0 + default: 50 + file_path: type: string - format: date-time - last_seen_ip: + + ReferenceItem: + type: object + required: + - file_path + - start_line + - end_line + - content + - chunk_type + - symbol_name + - language + properties: + file_path: { type: string } + start_line: { type: integer } + end_line: + type: integer + description: Always equal to `start_line` (refs table stores tokens, not ranges). + content: type: string - nullable: true - last_seen_ua: + description: Always empty — see endpoint description. + chunk_type: type: string - nullable: true - is_current: - type: boolean - description: True for the session carrying this request. 
+ enum: [reference] + symbol_name: { type: string } + language: { type: string } - SessionListResponse: + ReferenceResponse: type: object - required: [sessions, total] + required: [results, total] properties: - sessions: + results: type: array items: - $ref: "#/components/schemas/Session" + $ref: "#/components/schemas/ReferenceItem" total: type: integer + minimum: 0 - ApiKey: + FileSearchRequest: type: object - required: [id, owner_user_id, name, prefix, created_at, revoked] + required: [query] properties: - id: - type: string - owner_user_id: - type: string - name: - type: string - prefix: - type: string - description: | - Display-only prefix of the full key (e.g. `cix_a1b2c3d4`). - Long enough to recognise in lists, short enough that it - cannot reconstruct the original. - created_at: - type: string - format: date-time - last_used_at: - type: string - format: date-time - nullable: true - last_used_ip: - type: string - nullable: true - last_used_ua: + query: type: string - nullable: true - revoked: - type: boolean - revoked_at: + minLength: 1 + description: Substring matched against `file_path`. + limit: + type: integer + minimum: 0 + default: 20 + + FileResultItem: + type: object + required: [file_path, language] + properties: + file_path: { type: string } + language: type: string - format: date-time nullable: true + description: Detected language, or null if undetected. - ApiKeyListResponse: + FileSearchResponse: type: object - required: [api_keys, total] + required: [results, total] properties: - api_keys: + results: type: array items: - $ref: "#/components/schemas/ApiKey" + $ref: "#/components/schemas/FileResultItem" total: type: integer + minimum: 0 - CreateApiKeyRequest: + SemanticSearchRequest: type: object - required: [name] + required: [query] properties: - name: + query: type: string minLength: 1 + limit: + type: integer + minimum: 0 + default: 10 + description: Maximum number of FILE groups (not chunks) to return. 
+ languages: + type: array + items: { type: string } + paths: + type: array + items: { type: string } + description: Whitelist — keep only results whose path matches any prefix or substring. + excludes: + type: array + items: { type: string } + description: Blacklist — drop results whose path matches any prefix or substring. + min_score: + type: number + format: float description: | - Human-friendly label shown in the dashboard. The full key - value is generated server-side and returned exactly once. + Minimum cosine similarity. Omit for server default (0.2 — + light floor that keeps abstract NL queries non-empty). Send + `0` to disable; pass `0.4+` for strict code-symbol searches + calibrated for CodeRankEmbed-Q8. - ApiKeyCreated: + NestedHit: type: object - required: [api_key, full_key] + required: [start_line, end_line, chunk_type, score] properties: - api_key: - $ref: "#/components/schemas/ApiKey" - full_key: - type: string - description: | - The plaintext key value. **Returned exactly once.** Store it - securely — there is no way to retrieve it later. + start_line: { type: integer } + end_line: { type: integer } + symbol_name: { type: string } + chunk_type: { type: string } + score: + type: number + format: float - HealthResponse: + FileMatch: type: object - required: [status] + required: [start_line, end_line, content, score, chunk_type] properties: - status: - type: string - enum: [ok, unhealthy] - reason: - type: string - description: Set only when `status` is `unhealthy`. 
+ start_line: { type: integer } + end_line: { type: integer } + content: { type: string } + score: + type: number + format: float + chunk_type: { type: string } + symbol_name: { type: string } + nested_hits: + type: array + items: + $ref: "#/components/schemas/NestedHit" - StatusResponse: + FileGroupResult: type: object - required: - - status - - backend - - server_version - - api_version - - model_loaded - - embedding_model - - projects - - active_indexing_jobs + required: [file_path, best_score, matches] properties: - status: - type: string - enum: [ok] - backend: - type: string - description: Backend identifier (e.g. `go`). - server_version: - type: string - api_version: - type: string - example: v1 - model_loaded: - type: boolean - description: | - Whether the llama-server sidecar reports ready within 500 ms. - False when the sidecar is starting or has crashed. - embedding_model: - type: string - description: Hugging Face model id (e.g. `awhiteside/CodeRankEmbed-Q8_0-GGUF`). - projects: - type: integer - minimum: 0 - description: Total registered projects. - active_indexing_jobs: + file_path: { type: string } + language: { type: string } + best_score: + type: number + format: float + matches: + type: array + items: + $ref: "#/components/schemas/FileMatch" + + SemanticSearchResponse: + type: object + required: [results, total, query_time_ms] + properties: + results: + type: array + items: + $ref: "#/components/schemas/FileGroupResult" + total: type: integer minimum: 0 - description: Currently-running `index_runs` rows. - update_available: + query_time_ms: + type: number + format: double + description: Wall-clock query latency, rounded to 1 decimal place. + + IndexBeginRequest: + type: object + properties: + full: type: boolean - description: | - True when the version-check service has found a `server/v*` - release on GitHub strictly newer than the running server. 
- Field is omitted entirely when version-check is not wired - (set `CIX_VERSION_CHECK_ENABLED=false` to disable polling). - latest_version: + default: false + description: When true, wipes existing project state before opening the session. + + IndexBeginResponse: + type: object + required: [run_id, stored_hashes] + properties: + run_id: type: string - nullable: true + stored_hashes: + type: object + additionalProperties: + type: string description: | - Latest released server version (without the `server/v` prefix, - e.g. `0.5.1`). Null until the first successful poll completes. - release_url: - type: string - nullable: true - description: GitHub release page URL for `latest_version`. Null when unknown. - version_check: - $ref: "#/components/schemas/VersionCheckStatus" + Map from file path → SHA-256 of currently-stored content. Empty + when the project has never been indexed (or `full:true` was passed). - VersionCheckStatus: + FilePayload: type: object - required: [enabled] + required: [path, content, content_hash, size] properties: - enabled: - type: boolean - description: Whether the periodic GitHub poll is running. - checked_at: + path: type: string - format: date-time - nullable: true - description: Last poll timestamp (UTC, RFC 3339). Null before the first poll. - error: + content: type: string - nullable: true - description: Last error message, if the most recent poll failed. Null on success. + description: UTF-8 text. Binary files should not be submitted. + content_hash: + type: string + description: SHA-256 hex digest of `content`. 
+ language: + type: string + size: + type: integer + minimum: 0 - ProjectSettings: + IndexFilesRequest: type: object - required: [exclude_patterns, max_file_size] + required: [run_id, files] properties: - exclude_patterns: + run_id: + type: string + minLength: 1 + files: type: array - items: { type: string } - max_file_size: - type: integer - minimum: 0 + maxItems: 50 + items: + $ref: "#/components/schemas/FilePayload" - ProjectStats: + IndexFilesResponse: type: object - required: [total_files, indexed_files, total_chunks, total_symbols] + required: [files_accepted, chunks_created, files_processed_total] properties: - total_files: - type: integer - minimum: 0 - indexed_files: + files_accepted: type: integer minimum: 0 - total_chunks: + chunks_created: type: integer minimum: 0 - total_symbols: + files_processed_total: type: integer minimum: 0 - Project: + IndexFinishRequest: type: object - required: - - path_hash - - host_path - - container_path - - languages - - settings - - stats - - status - - created_at - - updated_at - - last_indexed_at + required: [run_id] properties: - path_hash: - type: string - pattern: "^[a-f0-9]{16}$" - description: First 16 hex chars of SHA1(host_path) — stable URL identifier. - host_path: - type: string - description: Absolute filesystem path on the operator's machine. - container_path: + run_id: type: string - description: Path inside the container (often equal to host_path). 
- languages: + minLength: 1 + deleted_paths: type: array items: { type: string } - settings: - $ref: "#/components/schemas/ProjectSettings" - stats: - $ref: "#/components/schemas/ProjectStats" - status: - type: string - enum: [created, indexing, indexed, error] - created_at: - type: string - format: date-time - updated_at: - type: string - format: date-time - last_indexed_at: - type: string - format: date-time - nullable: true - indexed_with_model: - type: string - nullable: true - description: | - Embedding model identifier active when this project was last - (re)indexed. NULL on rows that pre-date drift tracking — the - dashboard treats NULL as "Unknown" rather than as drift. - sqlite_path: - type: string - nullable: true - description: Resolved SQLite database path for the active model. NULL on dashboards that don't expose storage info. - chroma_path: - type: string - nullable: true - description: Resolved chromem-go collection directory for this project. NULL when not computed. - sqlite_size_bytes: - type: integer - format: int64 - nullable: true - minimum: 0 - chroma_size_bytes: + total_files_discovered: type: integer - format: int64 - nullable: true minimum: 0 - ProjectListResponse: + IndexFinishResponse: type: object - required: [projects, total] + required: [status, files_processed, chunks_created] properties: - projects: - type: array - items: - $ref: "#/components/schemas/Project" - total: + status: + type: string + enum: [completed] + files_processed: + type: integer + minimum: 0 + chunks_created: type: integer minimum: 0 - CreateProjectRequest: - type: object - required: [host_path] - properties: - host_path: - type: string - - UpdateProjectRequest: + IndexCancelResponse: type: object + required: [cancelled] properties: - settings: - $ref: "#/components/schemas/ProjectSettings" + cancelled: + type: boolean - DirEntry: + IndexProgressInfo: type: object - required: [path, file_count] + description: | + Progress payload. 
The active-session variant carries every field; + the historical-fallback variant only carries `files_processed`, + `files_total`, and `chunks_created`. properties: - path: + phase: type: string - file_count: + enum: [receiving, completed] + files_discovered: type: integer minimum: 0 + files_processed: + type: integer + minimum: 0 + files_total: + type: integer + minimum: 0 + chunks_created: + type: integer + minimum: 0 + elapsed_seconds: + type: number + format: double + run_id: + type: string - SymbolEntry: + IndexProgressResponse: type: object - required: [name, kind, file_path, language] + required: [status] properties: - name: - type: string - kind: - type: string - file_path: - type: string - language: + status: type: string + enum: [idle, indexing, completed, cancelled, failed, running] + description: | + `idle` — no session ever / fallback unavailable. + `indexing` — session active. + `completed`/`cancelled`/`failed`/`running` — last-run status from `index_runs`. + progress: + $ref: "#/components/schemas/IndexProgressInfo" - ProjectSummary: + IndexProgressEvent: type: object - required: - - path_hash - - host_path - - status - - languages - - total_files - - total_chunks - - total_symbols - - top_directories - - recent_symbols + description: | + One event in the NDJSON stream emitted by `POST /index/files` when + the client sends `Accept: application/x-ndjson`. The `event` field + discriminates the variant; other fields are populated as relevant. + required: [event] properties: - path_hash: + event: type: string - pattern: "^[a-f0-9]{16}$" - description: First 16 hex chars of SHA1(host_path) — stable URL identifier. 
- host_path: + enum: + - file_started + - file_chunked + - file_embedded + - file_done + - file_error + - heartbeat + - batch_done + - error + run_id: type: string - status: + path: type: string - languages: - type: array - items: { type: string } - total_files: + file_index: type: integer - minimum: 0 - total_chunks: + batch_size: type: integer - minimum: 0 - total_symbols: + chunks: type: integer - minimum: 0 - top_directories: - type: array - items: - $ref: "#/components/schemas/DirEntry" - recent_symbols: - type: array - items: - $ref: "#/components/schemas/SymbolEntry" - - SymbolSearchRequest: - type: object - required: [query] - properties: - query: + embed_ms: + type: integer + format: int64 + ts: type: string - minLength: 1 - kinds: - type: array - items: { type: string } - limit: + format: date-time + message: + type: string + fatal: + type: boolean + files_accepted: + type: integer + chunks_created: + type: integer + files_processed_total: type: integer - minimum: 0 - default: 20 - SymbolResultItem: + Workspace: type: object - required: [name, kind, file_path, line, end_line, language] + required: [id, name, description, created_at, updated_at] properties: - name: { type: string } - kind: { type: string } - file_path: { type: string } - line: { type: integer } - end_line: { type: integer } - language: { type: string } - signature: { type: string } - parent_name: { type: string } + id: + type: string + description: ULID-like opaque identifier. + name: + type: string + description: Unique workspace name. + description: + type: string + description: Free-form description. Empty string when absent. 
+ created_at: + type: string + format: date-time + updated_at: + type: string + format: date-time - SymbolSearchResponse: + WorkspaceListResponse: type: object - required: [results, total] + required: [workspaces, total] properties: - results: + workspaces: type: array items: - $ref: "#/components/schemas/SymbolResultItem" + $ref: "#/components/schemas/Workspace" total: type: integer - minimum: 0 - DefinitionRequest: + CreateWorkspaceRequest: type: object - required: [symbol] + required: [name] properties: - symbol: + name: type: string minLength: 1 - kind: - type: string - file_path: + description: type: string - limit: - type: integer - minimum: 0 - default: 10 - - DefinitionItem: - type: object - required: [name, kind, file_path, line, end_line, language] - properties: - name: { type: string } - kind: { type: string } - file_path: { type: string } - line: { type: integer } - end_line: { type: integer } - language: { type: string } - signature: { type: string } - parent_name: { type: string } - - DefinitionResponse: - type: object - required: [results, total] - properties: - results: - type: array - items: - $ref: "#/components/schemas/DefinitionItem" - total: - type: integer - minimum: 0 + description: Optional free-form description. - ReferenceRequest: + UpdateWorkspaceRequest: type: object - required: [symbol] + description: | + Both fields are optional — omitting a field leaves the existing + value unchanged. Passing an empty string for `description` clears + it. `name` must be non-empty when provided. 
properties: - symbol: + name: type: string minLength: 1 - limit: - type: integer - minimum: 0 - default: 50 - file_path: + description: type: string - ReferenceItem: + GithubToken: type: object - required: - - file_path - - start_line - - end_line - - content - - chunk_type - - symbol_name - - language + required: [id, name, scopes, created_at] properties: - file_path: { type: string } - start_line: { type: integer } - end_line: - type: integer - description: Always equal to `start_line` (refs table stores tokens, not ranges). - content: + id: type: string - description: Always empty — see endpoint description. - chunk_type: + name: type: string - enum: [reference] - symbol_name: { type: string } - language: { type: string } + scopes: + type: array + items: + type: string + description: | + Best-effort scope list. PR1 stores whatever the client supplies; + later releases populate this by calling GitHub's /user endpoint + with the plaintext token. + created_at: + type: string + format: date-time + last_used_at: + type: string + format: date-time + nullable: true - ReferenceResponse: + GithubTokenListResponse: type: object - required: [results, total] + required: [tokens, total] properties: - results: + tokens: type: array items: - $ref: "#/components/schemas/ReferenceItem" + $ref: "#/components/schemas/GithubToken" total: type: integer - minimum: 0 - FileSearchRequest: + CreateGithubTokenRequest: type: object - required: [query] + required: [name, token] properties: - query: + name: type: string minLength: 1 - description: Substring matched against `file_path`. - limit: - type: integer - minimum: 0 - default: 20 + description: Human-friendly label shown in the dashboard. + token: + type: string + minLength: 1 + description: | + The plaintext PAT. The server encrypts it with AES-GCM before + persisting; this is the only request body that ever carries + the plaintext value. + scopes: + type: array + items: + type: string + deprecated: true + description: | + Ignored. 
The server now derives real scopes from GitHub's + X-OAuth-Scopes response header by calling GET /user with the + supplied token, so user-supplied scope hints are no longer + consulted. Kept for backwards compatibility with older + clients that still send it. - FileResultItem: + WorkspaceRepo: type: object - required: [file_path, language] + required: + - id + - workspace_id + - github_url + - branch + - project_path + - status + - auto_webhook + - webhook_mode + - is_linked + - created_at + - updated_at properties: - file_path: { type: string } - language: + id: + type: string + workspace_id: + type: string + github_url: + type: string + description: Canonical https://github.com/owner/repo URL. + branch: + type: string + project_path: + type: string + description: | + Indexed project's host_path — "github.com/owner/repo@branch". + Use this with the existing /api/v1/projects/{path}/* endpoints + (path = first 16 hex chars of SHA1). + token_id: type: string nullable: true - description: Detected language, or null if undetected. + description: | + GitHub token used for clone+webhook calls. Null when the + repo is public. + auto_webhook: + type: boolean + description: | + Legacy alias for `webhook_mode == "auto"`. Always present so + old clients keep working; new clients should consult + `webhook_mode` instead. + webhook_mode: + type: string + enum: [manual, auto, disabled] + description: | + Operator's intent for how this repo gets kept fresh. `auto` + asks the server to register the GitHub webhook; `manual` + means the operator pastes the URL+secret into GitHub + themselves; `disabled` skips auto-sync entirely — reindex + via the dashboard button only. + status: + type: string + enum: [pending, cloning, indexing, indexed, failed] + last_sha: + type: string + nullable: true + description: HEAD SHA at last successful clone. 
+ last_error: + type: string + nullable: true + last_indexed_at: + type: string + format: date-time + nullable: true + is_linked: + type: boolean + description: | + True when this row is a lightweight pointer to a project + already owned by another workspace_repo — added via the + "Add Existing Project" flow. Linked rows have no clone on + disk, no webhook, and no token; reindex is a no-op (must + be triggered from the canonical owning row). + created_at: + type: string + format: date-time + updated_at: + type: string + format: date-time - FileSearchResponse: + WorkspaceRepoListResponse: type: object - required: [results, total] + required: [repos, total] properties: - results: + repos: type: array items: - $ref: "#/components/schemas/FileResultItem" + $ref: "#/components/schemas/WorkspaceRepo" total: type: integer - minimum: 0 - SemanticSearchRequest: + AddWorkspaceRepoRequest: type: object - required: [query] + required: [github_url, branch] properties: - query: + github_url: + type: string + description: https://github.com/owner/repo URL. + branch: type: string minLength: 1 - limit: - type: integer - minimum: 0 - default: 10 - description: Maximum number of FILE groups (not chunks) to return. - languages: - type: array - items: { type: string } - paths: - type: array - items: { type: string } - description: Whitelist — keep only results whose path matches any prefix or substring. - excludes: - type: array - items: { type: string } - description: Blacklist — drop results whose path matches any prefix or substring. - min_score: - type: number - format: float + token_id: + type: string + description: | + Optional id of a stored GitHub PAT. Required for private repos. + auto_webhook: + type: boolean + default: false + deprecated: true + description: | + Legacy field. New clients should send `webhook_mode` instead. + When both are provided, `webhook_mode` wins; when only the + bool is set, `true` is mapped to `webhook_mode = "auto"`. 
+ webhook_mode: + type: string + enum: [manual, auto, disabled] + default: manual + description: | + How the server should keep this repo fresh: + - `auto` — server registers the webhook in GitHub on your + behalf (requires admin:repo_hook on the PAT). + - `manual` — server stores a webhook_secret and returns it + once; you paste the URL + secret into GitHub yourself. + - `disabled` — no auto-sync at all; reindex via the + dashboard button only. + + GithubRepo: + type: object + required: [full_name, default_branch, private, html_url] + description: A repository visible to a stored PAT. + properties: + full_name: + type: string + description: "owner/name" + default_branch: + type: string description: | - Minimum cosine similarity. Omit for server default (0.4 for - CodeRankEmbed-Q8). Send `0` explicitly to disable the floor. + The repo's default branch; the dashboard pre-fills the branch + input with this when the user picks a repo from the list. + private: + type: boolean + html_url: + type: string + description: + type: string - NestedHit: + GithubAccount: type: object - required: [start_line, end_line, chunk_type, score] + required: [login, type] + description: | + A GitHub account the PAT can see. The user owning the PAT is + returned first, followed by every org accessible via /user/orgs. + The dashboard's add-repo flow shows these in a Select before + the repository picker so the operator can drill into a specific + org instead of relying on the affiliations-aggregated view. properties: - start_line: { type: integer } - end_line: { type: integer } - symbol_name: { type: string } - chunk_type: { type: string } - score: - type: number - format: float + login: + type: string + description: GitHub login (user name or org slug). + type: + type: string + enum: [user, org] + description: | + "user" for the PAT owner; "org" for organisations. 
+ avatar_url: + type: string - FileMatch: + WorkspaceRepoCreated: type: object - required: [start_line, end_line, content, score, chunk_type] + required: [repo, webhook_url, webhook_secret] properties: - start_line: { type: integer } - end_line: { type: integer } - content: { type: string } - score: - type: number - format: float - chunk_type: { type: string } - symbol_name: { type: string } - nested_hits: - type: array - items: - $ref: "#/components/schemas/NestedHit" + repo: + $ref: "#/components/schemas/WorkspaceRepo" + webhook_url: + type: string + description: | + Publicly-reachable POST endpoint to register in GitHub when + doing the webhook setup manually. Includes the workspace_repo + id segment. Empty string for linked rows (no webhook). + webhook_secret: + type: string + description: | + HMAC secret. **Returned once on create + once via + webhook-info.** Use as the "Secret" field in GitHub's webhook + UI; deliveries are validated by HMAC-SHA256 over the body. + Empty string for linked rows (no webhook). - FileGroupResult: + LinkExistingProjectRequest: type: object - required: [file_path, best_score, matches] + required: [project_hash] properties: - file_path: { type: string } - language: { type: string } - best_score: - type: number - format: float - matches: - type: array - items: - $ref: "#/components/schemas/FileMatch" + project_hash: + type: string + minLength: 16 + maxLength: 16 + description: | + The 16-hex `path_hash` of an indexed project — the same value + used in /api/v1/projects/{path}. The server resolves it to + the canonical `host_path` and inserts a linked workspace_repo + row. The project must already be in status='indexed' and have + a host_path of the form "github.com/owner/repo@branch". 
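Reviewer note: the `auto_webhook` / `webhook_mode` precedence described in `AddWorkspaceRepoRequest` can be sketched as below. This is only an illustration of the documented rule, not the server's code. The helper name `resolveWebhookMode` and the `AddRepoFields` type are hypothetical, and the fallback to `manual` when neither field selects a mode is an assumption based on the schema's `default: manual`.

```typescript
// Hypothetical helper, not part of this PR. It mirrors the precedence rule
// from the AddWorkspaceRepoRequest description in doc/openapi.yaml.
type WebhookMode = 'manual' | 'auto' | 'disabled';

interface AddRepoFields {
  webhook_mode?: WebhookMode;
  auto_webhook?: boolean; // legacy, deprecated field
}

function resolveWebhookMode(req: AddRepoFields): WebhookMode {
  // When both fields are provided, webhook_mode wins.
  if (req.webhook_mode !== undefined) return req.webhook_mode;
  // When only the legacy bool is set, true maps to "auto".
  if (req.auto_webhook === true) return 'auto';
  // Assumption: everything else falls back to the schema default.
  return 'manual';
}
```

Under this reading, an old client that sends only `auto_webhook: true` and a new client that sends `webhook_mode: "auto"` would end up with the same stored mode.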
- SemanticSearchResponse: + ProjectWorkspaceList: type: object - required: [results, total, query_time_ms] + required: [workspaces] properties: - results: + workspaces: type: array items: - $ref: "#/components/schemas/FileGroupResult" - total: - type: integer - minimum: 0 - query_time_ms: - type: number - format: double - description: Wall-clock query latency, rounded to 1 decimal place. + $ref: "#/components/schemas/ProjectWorkspaceEntry" - IndexBeginRequest: + ProjectWorkspaceEntry: type: object + required: [workspace_id, workspace_name, repo_id, branch, status, is_linked] properties: - full: + workspace_id: + type: string + workspace_name: + type: string + repo_id: + type: string + description: workspace_repos.id — same value used in /repos endpoints. + branch: + type: string + status: + type: string + enum: [pending, cloning, indexing, indexed, failed] + is_linked: type: boolean - default: false - description: When true, wipes existing project state before opening the session. - IndexBeginResponse: + ReindexEnqueuedResponse: type: object - required: [run_id, stored_hashes] + required: [status] properties: - run_id: + status: type: string - stored_hashes: - type: object - additionalProperties: - type: string - description: | - Map from file path → SHA-256 of currently-stored content. Empty - when the project has never been indexed (or `full:true` was passed). + enum: [enqueued, already_running] + repo: + $ref: "#/components/schemas/WorkspaceRepo" - FilePayload: + Job: type: object - required: [path, content, content_hash, size] + required: + - id + - type + - status + - attempts + - max_attempts + - scheduled_at + - created_at properties: - path: + id: type: string - content: + type: type: string - description: UTF-8 text. Binary files should not be submitted. - content_hash: + status: type: string - description: SHA-256 hex digest of `content`. 
- language: + enum: [pending, running, completed, failed] + dedupe_key: type: string - size: + nullable: true + payload: + type: object + additionalProperties: true + description: | + Raw JSON payload — shape depends on `type`. Render as-is in + the dashboard; don't assume structure. + attempts: + type: integer + minimum: 0 + max_attempts: type: integer minimum: 0 + last_error: + type: string + nullable: true + scheduled_at: + type: string + format: date-time + started_at: + type: string + format: date-time + nullable: true + completed_at: + type: string + format: date-time + nullable: true + created_at: + type: string + format: date-time - IndexFilesRequest: + JobListResponse: type: object - required: [run_id, files] + required: [jobs, total] properties: - run_id: - type: string - minLength: 1 - files: + jobs: type: array - maxItems: 50 items: - $ref: "#/components/schemas/FilePayload" - - IndexFilesResponse: - type: object - required: [files_accepted, chunks_created, files_processed_total] - properties: - files_accepted: - type: integer - minimum: 0 - chunks_created: - type: integer - minimum: 0 - files_processed_total: + $ref: "#/components/schemas/Job" + total: type: integer - minimum: 0 - IndexFinishRequest: + WebhookInfoResponse: type: object - required: [run_id] + required: [webhook_url, webhook_secret, auto_registered] properties: - run_id: + webhook_url: type: string - minLength: 1 - deleted_paths: - type: array - items: { type: string } - total_files_discovered: - type: integer - minimum: 0 + description: | + Full URL to paste into GitHub's webhook config. Empty path-only + value when CIX_PUBLIC_URL is unset — prepend your tunnel origin. + webhook_secret: + type: string + description: HMAC secret. Treat as sensitive — rotates on repo recreate. + auto_registered: + type: boolean + description: | + True when the server successfully auto-registered the webhook + against the GitHub API (auto_webhook=true on create + PAT had + admin:repo_hook). 
When false, the operator must register manually. - IndexFinishResponse: + WebhookAccepted: type: object - required: [status, files_processed, chunks_created] + required: [status] properties: status: type: string - enum: [completed] - files_processed: - type: integer - minimum: 0 - chunks_created: - type: integer - minimum: 0 + enum: [enqueued, already_running, ignored, ping] + repo_id: + type: string - IndexCancelResponse: + WorkspaceSearchResponse: type: object - required: [cancelled] + required: [status, projects, chunks] properties: - cancelled: - type: boolean + status: + type: string + enum: [ok, empty, partial_failure] + description: | + `ok` — results follow. `empty` — workspace queried fine but + nothing cleared the `min_score` floor. `partial_failure` — + no chunks returned but at least one repo errored out during + the fan-out (see `failed_repos`). + projects: + type: array + description: | + Top projects ranked by `project_score`. Surfaces which repos + in the workspace the query is most relevant to, independent + of which individual chunks rank highest globally. + items: + $ref: "#/components/schemas/WorkspaceSearchProject" + chunks: + type: array + items: + $ref: "#/components/schemas/WorkspaceSearchChunk" + pending_repos: + type: array + description: | + Repos that belong to the workspace but weren't queryable + yet — clone or index hasn't completed (or the last attempt + failed). Their matches will appear once they reach + `status=indexed`. Empty if every repo is ready. + items: + $ref: "#/components/schemas/WorkspaceSearchPendingRepo" + failed_repos: + type: array + description: | + Repos whose per-project vector search returned an error + during this request (e.g. corrupt collection on disk). The + rest of the workspace is still aggregated; surface this so + the operator knows the result set is incomplete. 
+ items: + $ref: "#/components/schemas/WorkspaceSearchFailedRepo" + stale_fts_repos: + type: array + description: | + Repos that were indexed before the BM25 mirror + (`chunks_fts`) was added: they're queryable via dense + search but the sparse half of the hybrid is empty, which + collapses the algorithm to pure-dense fan-out for these + entries. Trigger a reindex on each to backfill the FTS + side. Empty once every workspace repo has been reindexed + under the new schema. + items: + $ref: "#/components/schemas/WorkspaceSearchStaleFTSRepo" - IndexProgressInfo: + WorkspaceSearchPendingRepo: type: object - description: | - Progress payload. The active-session variant carries every field; - the historical-fallback variant only carries `files_processed`, - `files_total`, and `chunks_created`. + required: [project_path, status] properties: - phase: + project_path: type: string - enum: [receiving, completed] - files_discovered: - type: integer - minimum: 0 - files_processed: - type: integer - minimum: 0 - files_total: - type: integer - minimum: 0 - chunks_created: - type: integer - minimum: 0 - elapsed_seconds: - type: number - format: double - run_id: + status: type: string + enum: [pending, cloning, indexing, failed] + description: | + Current row state in `workspace_repos.status`. Anything + other than `indexed` means the repo hasn't contributed to + this response. - IndexProgressResponse: + WorkspaceSearchFailedRepo: type: object - required: [status] + required: [project_path, reason] properties: - status: + project_path: + type: string + reason: type: string - enum: [idle, indexing, completed, cancelled, failed, running] description: | - `idle` — no session ever / fallback unavailable. - `indexing` — session active. - `completed`/`cancelled`/`failed`/`running` — last-run status from `index_runs`. - progress: - $ref: "#/components/schemas/IndexProgressInfo" + Short category for the failure — `vectorstore_error`, + `timeout`, etc. 
Intentionally not the raw error message so + internal details don't leak; check the server logs by + `workspace_id` for the full error. - IndexProgressEvent: + WorkspaceSearchStaleFTSRepo: type: object - description: | - One event in the NDJSON stream emitted by `POST /index/files` when - the client sends `Accept: application/x-ndjson`. The `event` field - discriminates the variant; other fields are populated as relevant. - required: [event] + required: [project_path] properties: - event: + project_path: type: string - enum: - - file_started - - file_chunked - - file_embedded - - file_done - - file_error - - heartbeat - - batch_done - - error - run_id: + + WorkspaceSearchProject: + type: object + required: [project_path, label, project_score, num_hits, bm25_score, dense_score] + properties: + project_path: type: string - path: + label: type: string - file_index: - type: integer - batch_size: - type: integer - chunks: - type: integer - embed_ms: + description: | + Short human-readable label derived from the project_path's + last segment (e.g. "owner/repo@main" → "repo@main"). + project_score: + type: number + format: float + description: | + Hybrid candidacy in [0,1] — the α-blend of per-query + min-max normalized BM25 and dense signals (α=0.5) the + project-relevance gate ranks by. The "Top projects" + panel sorts by this value. + num_hits: type: integer - format: int64 - ts: + description: | + Chunks from this project that survived the per-project + chunk cap and made it into the global chunks list. + bm25_score: + type: number + format: float + description: | + Mean of the top-N raw BM25 scores in this project (sign + flipped from SQLite's bm25() so positive = better). + Surfaced so the dashboard can show "this repo surfaced + on literal token overlap" vs. "pure semantic similarity". + dense_score: + type: number + format: float + description: | + Mean of the top-N raw cosine similarities in this + project. 
Together with `bm25_score`, the two raw signals + that feed into `project_score`. + + WorkspaceSearchChunk: + type: object + required: + - project_path + - file_path + - start_line + - end_line + - score + - content + properties: + project_path: type: string - format: date-time - message: + file_path: type: string - fatal: - type: boolean - files_accepted: - type: integer - chunks_created: + start_line: type: integer - files_processed_total: + end_line: type: integer + symbol_name: + type: string + language: + type: string + score: + type: number + format: float + description: | + Raw cosine similarity between the query and this chunk — + the value chunks are sorted by. No per-project boost is + applied (a previous revision multiplied this by + project_score, which let one repo dominate every result + for short queries like product-name acronyms). + content: + type: string diff --git a/server/cmd/cix-server/main.go b/server/cmd/cix-server/main.go index af97945..7fe85fd 100644 --- a/server/cmd/cix-server/main.go +++ b/server/cmd/cix-server/main.go @@ -20,13 +20,19 @@ import ( "github.com/dvcdsys/code-index/server/internal/config" "github.com/dvcdsys/code-index/server/internal/db" "github.com/dvcdsys/code-index/server/internal/embeddings" + "github.com/dvcdsys/code-index/server/internal/githubtokens" "github.com/dvcdsys/code-index/server/internal/httpapi" "github.com/dvcdsys/code-index/server/internal/indexer" + "github.com/dvcdsys/code-index/server/internal/jobs" "github.com/dvcdsys/code-index/server/internal/runtimecfg" + "github.com/dvcdsys/code-index/server/internal/secrets" "github.com/dvcdsys/code-index/server/internal/sessions" "github.com/dvcdsys/code-index/server/internal/users" "github.com/dvcdsys/code-index/server/internal/vectorstore" "github.com/dvcdsys/code-index/server/internal/versioncheck" + "github.com/dvcdsys/code-index/server/internal/workspacejobs" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + 
"github.com/dvcdsys/code-index/server/internal/workspaces" ) func runHealthcheck() { @@ -191,6 +197,93 @@ func run() error { } } + // Workspaces feature wiring. The whole subsystem is gated by + // CIX_WORKSPACES_ENABLED so existing deployments don't surface + // half-wired endpoints in /docs. When the flag is off we skip the + // secrets boot AND the job worker pool entirely so operators don't + // trip on encryption-key or polling-overhead concerns they never + // opted into. + var ( + wsSvc *workspaces.Service + ghSvc *githubtokens.Service + wrSvc *workspacerepos.Service + jobsSvc *jobs.Service + ) + if cfg.WorkspacesEnabled { + secSvc, err := secrets.Open(secrets.OpenOptions{ + DataDir: cfg.SecretsDataDir, + Logger: logger, + AllowGenerate: true, + }) + if err != nil { + // Hard error — github_tokens are unreadable without a key, and + // silently disabling encryption would be worse than refusing to + // start. The dashboard surfaces the error via /api/v1/status if + // the operator misses the boot log. + return fmt.Errorf("workspaces secrets: %w", err) + } + ghSvc = githubtokens.New(database, secSvc) + + // Sanity gate: if encrypted rows exist but the resolved key cannot + // decrypt them, refuse to start so the operator doesn't accidentally + // nuke the data. Probing one row is enough — Decrypt fails uniformly + // on a wrong key. + n, err := ghSvc.CountWithEncryption(context.Background()) + if err != nil { + return fmt.Errorf("workspaces secrets sanity: %w", err) + } + if n > 0 { + // List one and try Reveal. We don't care about the value — only + // whether the decryption succeeds. 
+ toks, _ := ghSvc.List(context.Background()) + if len(toks) > 0 { + if _, err := ghSvc.Reveal(context.Background(), toks[0].ID); err != nil { + return fmt.Errorf("encryption key does not match existing github_tokens — refusing to start (recover the prior CIX_SECRET_KEY or wipe github_tokens manually): %w", err) + } + } + } + if secSvc.Autogenerated() { + logger.Warn("workspaces: a fresh encryption keyfile was generated this boot — back it up before redeploying", + "source", secSvc.Source()) + } else { + logger.Info("workspaces: encryption key loaded", "source", secSvc.Source()) + } + wsSvc = workspaces.New(database) + wrSvc = workspacerepos.New(database) + + // Persistent job queue + worker pool. Worker concurrency comes + // from CIX_WORKER_CONCURRENCY (default 2). Handlers are registered + // before Start so racing inserts get picked up immediately. + jobsSvc = jobs.New(database, jobs.Options{ + Concurrency: cfg.WorkerConcurrency, + Logger: logger, + }) + workspacejobs.Register(workspacejobs.Deps{ + DB: database, + Jobs: jobsSvc, + WorkspaceRepos: wrSvc, + GithubTokens: ghSvc, + Indexer: idx, + VectorStore: vs, + DataDir: cfg.WorkspacesDataDir, + Logger: logger, + }) + jobsSvc.Start(context.Background()) + // Defer shutdown — stop new claims, drain in-flight work. + defer func() { + stopCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + if err := jobsSvc.Stop(stopCtx); err != nil { + logger.Warn("workspaces jobs: stop", "err", err) + } + }() + logger.Info("workspaces: jobs worker pool started", + "concurrency", cfg.WorkerConcurrency, + "data_dir", cfg.WorkspacesDataDir) + } else { + logger.Info("workspaces feature disabled (CIX_WORKSPACES_ENABLED=false)") + } + // Background version-check poller. The 60s initial delay keeps GitHub // off the boot path; the goroutine exits cleanly when bgCtx is canceled // in the shutdown branch below. 
@@ -206,21 +299,27 @@ func run() error { go vcSvc.Run(bgCtx) handler := httpapi.NewRouter(httpapi.Deps{ - DB: database, - ServerVersion: version, - APIVersion: apiVersion, - Backend: backend, - EmbeddingModel: cfg.EmbeddingModel, - Logger: logger, - AuthDisabled: cfg.AuthDisabled, - Users: usrSvc, - Sessions: sessSvc, - APIKeys: akSvc, - EmbeddingSvc: embedSvc, - VectorStore: vs, - Indexer: idx, - RuntimeCfg: rcfg, - VersionCheck: vcSvc, + DB: database, + ServerVersion: version, + APIVersion: apiVersion, + Backend: backend, + EmbeddingModel: cfg.EmbeddingModel, + Logger: logger, + AuthDisabled: cfg.AuthDisabled, + Users: usrSvc, + Sessions: sessSvc, + APIKeys: akSvc, + EmbeddingSvc: embedSvc, + VectorStore: vs, + Indexer: idx, + RuntimeCfg: rcfg, + VersionCheck: vcSvc, + WorkspacesEnabled: cfg.WorkspacesEnabled, + Workspaces: wsSvc, + GithubTokens: ghSvc, + WorkspaceRepos: wrSvc, + Jobs: jobsSvc, + PublicBaseURL: cfg.PublicBaseURL, }) srv := &http.Server{ diff --git a/server/dashboard/src/modules/github-tokens/GithubTokensPage.tsx b/server/dashboard/src/modules/github-tokens/GithubTokensPage.tsx new file mode 100644 index 0000000..c3e1626 --- /dev/null +++ b/server/dashboard/src/modules/github-tokens/GithubTokensPage.tsx @@ -0,0 +1,257 @@ +import { useEffect, useState } from 'react'; +import { AlertCircle, Github, Plus, Trash2 } from 'lucide-react'; +import { ApiError, api } from '@/api/client'; +import { Alert, AlertDescription, AlertTitle } from '@/ui/alert'; +import { Button } from '@/ui/button'; +import { Skeleton } from '@/ui/skeleton'; +import { + Dialog, + DialogContent, + DialogDescription, + DialogFooter, + DialogHeader, + DialogTitle, + DialogTrigger, +} from '@/ui/dialog'; +import { Input } from '@/ui/input'; +import { Label } from '@/ui/label'; + +type GithubToken = { + id: string; + name: string; + scopes: string[]; + created_at: string; + last_used_at?: string | null; +}; + +type GithubTokenListResponse = { + tokens: GithubToken[]; + total: number; 
+}; + +// GithubTokensPage manages encrypted-at-rest GitHub PATs used by the +// workspaces feature for cloning private repos and (optionally) registering +// webhooks. The plaintext value is sent on POST and never returned — +// subsequent operations identify tokens by id. +export default function GithubTokensPage() { + const [list, setList] = useState<GithubToken[] | null>(null); + const [error, setError] = useState<string | null>(null); + const [featureOff, setFeatureOff] = useState(false); + + async function reload() { + try { + const resp = await api.get<GithubTokenListResponse>('/github-tokens'); + setList(resp.tokens); + setError(null); + setFeatureOff(false); + } catch (e) { + if (e instanceof ApiError && e.status === 503) { + setFeatureOff(true); + setList([]); + return; + } + setError(e instanceof Error ? e.message : String(e)); + } + } + + useEffect(() => { + void reload(); + }, []); + + if (featureOff) { + return ( +
+
+ + + Workspaces feature is disabled + + GitHub tokens are part of the workspaces feature. Set{' '} + CIX_WORKSPACES_ENABLED=true and restart the server + to enable. + + +
+ ); + } + + return ( +
+
+ {error && ( + + + Could not load tokens + {error} + + )} + {list === null ? ( +
+ + +
+ ) : list.length === 0 ? ( + + ) : ( +
    + {list.map((t) => ( + + ))} +
+ )} +
+ ); +} + +function Header({ onCreated }: { onCreated?: () => void }) { + return ( +
+
+

GitHub Tokens

+

+ Personal Access Tokens for cloning private repositories. Stored + encrypted; the plaintext value is never returned after creation. +

+
+ {onCreated && } +
+ ); +} + +function EmptyState() { + return ( +
+ +

No GitHub tokens yet

+

+ Tokens are required when adding private repositories to a workspace. +

+
+ ); +} + +function TokenRow({ token, onDeleted }: { token: GithubToken; onDeleted: () => void }) { + const [busy, setBusy] = useState(false); + + async function handleDelete() { + if (!confirm(`Delete token "${token.name}"? This cannot be undone.`)) return; + setBusy(true); + try { + await api.delete(`/github-tokens/${token.id}`); + onDeleted(); + } catch (e) { + alert(e instanceof Error ? e.message : String(e)); + } finally { + setBusy(false); + } + } + + return ( +
  • +
    +
    {token.name}
    +
    + scopes:{' '} + {token.scopes.length + ? token.scopes.join(', ') + : '(fine-grained or none)'} + {token.last_used_at && ( + <> · last used {new Date(token.last_used_at).toLocaleString()} + )} +
    +
    + +
  • + ); +} + +function CreateTokenDialog({ onCreated }: { onCreated: () => void }) { + const [open, setOpen] = useState(false); + const [name, setName] = useState(''); + const [token, setToken] = useState(''); + const [busy, setBusy] = useState(false); + const [err, setErr] = useState(null); + + // Scopes are intentionally not asked for — the server validates the + // token against GET /user and reads the real X-OAuth-Scopes header, + // which is the only thing GitHub will actually enforce. Asking the + // user just invites drift between displayed and effective scopes. + async function submit() { + setBusy(true); + setErr(null); + try { + await api.post('/github-tokens', { name, token }); + setName(''); + setToken(''); + setOpen(false); + onCreated(); + } catch (e) { + setErr(e instanceof Error ? e.message : String(e)); + } finally { + setBusy(false); + } + } + + return ( + + + + + + + Add GitHub token + + Stored encrypted-at-rest with AES-256-GCM. The plaintext value + never leaves this request — there is no way to retrieve it after + saving. Scopes are read from GitHub on save (no need to enter + them here). + + +
    +
    + + setName(e.target.value)} + placeholder="personal" + /> +
    +
    + + setToken(e.target.value)} + placeholder="ghp_... or github_pat_..." + className="font-mono" + /> +
    + {err && ( + + {err} + + )} +
    + + + + +
    +
    + ); +} diff --git a/server/dashboard/src/modules/github-tokens/index.ts b/server/dashboard/src/modules/github-tokens/index.ts new file mode 100644 index 0000000..112d1b7 --- /dev/null +++ b/server/dashboard/src/modules/github-tokens/index.ts @@ -0,0 +1,12 @@ +import { Github } from 'lucide-react'; +import type { Module } from '../types'; +import GithubTokensPage from './GithubTokensPage'; + +export const GithubTokensModule: Module = { + id: 'github-tokens', + label: 'GitHub Tokens', + icon: Github, + path: '/github-tokens', + element: GithubTokensPage, + weight: 35, +}; diff --git a/server/dashboard/src/modules/projects/ProjectDetailPage.tsx b/server/dashboard/src/modules/projects/ProjectDetailPage.tsx index 6a475f1..4b2a604 100644 --- a/server/dashboard/src/modules/projects/ProjectDetailPage.tsx +++ b/server/dashboard/src/modules/projects/ProjectDetailPage.tsx @@ -12,7 +12,7 @@ import { formatDateTime, formatRelative } from '@/lib/formatDate'; import { useRuntimeModel } from '@/lib/useServerStatus'; import { DeleteProjectDialog } from './components/DeleteProjectDialog'; import { ProjectInfoCard } from './components/ProjectInfoCard'; -import { useProject, useProjectSummary } from './hooks'; +import { useProject, useProjectSummary, useProjectWorkspaces } from './hooks'; const STATUS_VARIANT: Record = { created: 'outline', @@ -27,6 +27,7 @@ export function ProjectDetailPage() { const isAdmin = user?.role === 'admin'; const project = useProject(id); const summary = useProjectSummary(id); + const workspaces = useProjectWorkspaces(id); const currentModel = useRuntimeModel(); if (project.isLoading) return ; @@ -130,6 +131,46 @@ export function ProjectDetailPage() { +
    +

    Workspaces

    + {workspaces.isLoading ? ( + + ) : !workspaces.data || workspaces.data.workspaces.length === 0 ? ( +

    + Not part of any workspace yet. +

    + ) : ( +
    + {workspaces.data.workspaces.map((w) => ( + + ))} +
    + )} +
    +

    Top directories

    diff --git a/server/dashboard/src/modules/projects/hooks.ts b/server/dashboard/src/modules/projects/hooks.ts index 10a0ca1..38d19be 100644 --- a/server/dashboard/src/modules/projects/hooks.ts +++ b/server/dashboard/src/modules/projects/hooks.ts @@ -10,6 +10,24 @@ export const projectKeys = { all: ['projects'] as const, detail: (hash: string) => ['projects', hash] as const, summary: (hash: string) => ['projects', hash, 'summary'] as const, + workspaces: (hash: string) => ['projects', hash, 'workspaces'] as const, +}; + +// ProjectWorkspaceEntry mirrors the Go response shape from +// /api/v1/projects/{hash}/workspaces — one row per workspace_repo +// pointing at this project. Defined locally so the hook doesn't +// depend on a regen of generated.ts every time the page renders. +export type ProjectWorkspaceEntry = { + workspace_id: string; + workspace_name: string; + repo_id: string; + branch: string; + status: 'pending' | 'cloning' | 'indexing' | 'indexed' | 'failed'; + is_linked: boolean; +}; + +export type ProjectWorkspaceList = { + workspaces: ProjectWorkspaceEntry[]; }; export function useProjects() { @@ -35,6 +53,19 @@ export function useProjectSummary(hash: string | undefined) { }); } +// useProjectWorkspaces returns every workspace this project participates +// in. Used by the project detail page to render "Workspaces" chips. The +// endpoint is cheap and the membership is rarely stale, so we don't +// poll — refetch happens on window focus via react-query defaults. +export function useProjectWorkspaces(hash: string | undefined) { + return useQuery({ + queryKey: hash ? 
projectKeys.workspaces(hash) : ['projects', 'unknown', 'workspaces'], + queryFn: ({ signal }) => + api.get<ProjectWorkspaceList>(`/projects/${hash}/workspaces`, { signal }), + enabled: Boolean(hash), + }); +} + export function useDeleteProject() { const qc = useQueryClient(); return useMutation({ diff --git a/server/dashboard/src/modules/registry.ts b/server/dashboard/src/modules/registry.ts index 0f6b6fe..2f775c3 100644 --- a/server/dashboard/src/modules/registry.ts +++ b/server/dashboard/src/modules/registry.ts @@ -1,20 +1,27 @@ import { ApiKeysModule } from './api-keys'; +import { GithubTokensModule } from './github-tokens'; import { HomeModule } from './home'; import { ProjectsModule } from './projects'; import { SearchModule } from './search'; import { ServerModule } from './server'; import { SettingsModule } from './settings'; import { UsersModule } from './users'; +import { WorkspacesModule } from './workspaces'; import type { Module } from './types'; // Static registry of every dashboard feature. Order in the sidebar is // determined by `weight` (default 100). PR-D adds API Keys, Users, Settings. // PR-E adds Server (admin-only runtime config + sidecar lifecycle). +// Workspaces feature PR1 adds Workspaces + GitHub Tokens — these self-disable +// when CIX_WORKSPACES_ENABLED is false (the pages render a "feature off" +// alert; the sidebar still shows the modules so operators can discover them).
export const MODULES: Module[] = [ HomeModule, ProjectsModule, + WorkspacesModule, SearchModule, ApiKeysModule, + GithubTokensModule, UsersModule, SettingsModule, ServerModule, diff --git a/server/dashboard/src/modules/workspaces/WorkspaceDetailPage.tsx b/server/dashboard/src/modules/workspaces/WorkspaceDetailPage.tsx new file mode 100644 index 0000000..0efd737 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/WorkspaceDetailPage.tsx @@ -0,0 +1,256 @@ +import { useCallback, useEffect, useRef, useState } from 'react'; +import { Link, useNavigate, useParams } from 'react-router-dom'; +import { AlertCircle, ChevronLeft, Trash2 } from 'lucide-react'; +import { ApiError, api } from '@/api/client'; +import { Alert, AlertDescription, AlertTitle } from '@/ui/alert'; +import { Button } from '@/ui/button'; +import { Skeleton } from '@/ui/skeleton'; +import { AddExistingProjectDialog } from './components/AddExistingProjectDialog'; +import { AddRepoDialog } from './components/AddRepoDialog'; +import { RepoCard } from './components/RepoCard'; +import { WorkspaceSearchDialog } from './components/WorkspaceSearchDialog'; +import { isInFlight } from './types'; +import type { + Workspace, + WorkspaceRepo, + WorkspaceRepoListResponse, +} from './types'; + +// Auto-dismiss the "indexing finished" toast after this many ms. Long +// enough to read, short enough not to linger past when the user has +// likely moved on. +const INDEX_DONE_TOAST_MS = 5000; + +// Background polling cadence. Three seconds is short enough that the +// "indexing" → "indexed" transition is visible while you watch the +// dashboard, long enough that the cost of polling for a workspace +// with many repos stays modest. Only runs while at least one repo is +// in flight. 
+const POLL_MS = 3000; + +export function WorkspaceDetailPage() { + const { id = '' } = useParams<{ id: string }>(); + const navigate = useNavigate(); + const [workspace, setWorkspace] = useState(null); + const [repos, setRepos] = useState(null); + const [error, setError] = useState(null); + const [notFound, setNotFound] = useState(false); + const [indexDoneMsg, setIndexDoneMsg] = useState(null); + + const loadRepos = useCallback(async () => { + try { + const r = await api.get(`/workspaces/${id}/repos`); + setRepos(r.repos); + } catch (e) { + const msg = e instanceof Error ? e.message : String(e); + setError(msg); + } + }, [id]); + + // Initial workspace + repo fetch. + useEffect(() => { + let cancelled = false; + api + .get(`/workspaces/${id}`) + .then((ws) => { + if (!cancelled) setWorkspace(ws); + }) + .catch((e) => { + if (cancelled) return; + if (e instanceof ApiError && e.status === 404) { + setNotFound(true); + return; + } + setError(e instanceof Error ? e.message : String(e)); + }); + void loadRepos(); + return () => { + cancelled = true; + }; + }, [id, loadRepos]); + + // Live progress polling. Active only while at least one repo is + // in pending/cloning/indexing — terminal states stop the tick so we + // don't burn CPU on an idle workspace. + useEffect(() => { + if (!repos || repos.length === 0) return; + const anyBusy = repos.some((r) => isInFlight(r.status)); + if (!anyBusy) return; + const handle = setInterval(() => { + void loadRepos(); + }, POLL_MS); + return () => clearInterval(handle); + }, [repos, loadRepos]); + + // Detect the "last in-flight repo just finished" transition. Workspace + // search is live (no centroid rebuild step) so we just confirm to + // the user that the new repo is now searchable. + // + // wasInflightRef is the gate: we only fire the toast on a + // true → false transition, not on the initial page load where + // everything was already indexed. 
Reset back to false after firing + // so a second indexing wave (add another repo later) re-arms it. + const wasInflightRef = useRef(false); + useEffect(() => { + if (!repos) return; + const anyBusy = repos.some((r) => isInFlight(r.status)); + if (anyBusy) { + wasInflightRef.current = true; + return; + } + if (wasInflightRef.current) { + wasInflightRef.current = false; + setIndexDoneMsg('Indexing finished — workspace search is ready.'); + } + }, [repos]); + + // Auto-dismiss the toast so it doesn't linger after the user moves on. + useEffect(() => { + if (!indexDoneMsg) return; + const handle = setTimeout(() => setIndexDoneMsg(null), INDEX_DONE_TOAST_MS); + return () => clearTimeout(handle); + }, [indexDoneMsg]); + + async function handleDeleteWorkspace() { + if (!workspace) return; + if ( + !confirm( + `Delete workspace "${workspace.name}"?\n\nThis removes all attached repos and the indexed projects.`, + ) + ) { + return; + } + try { + await api.delete(`/workspaces/${workspace.id}`); + navigate('/workspaces'); + } catch (e) { + alert(e instanceof Error ? e.message : String(e)); + } + } + + if (notFound) { + return ( +
    + + + + Workspace not found + + It may have been deleted. Return to the list and try another. + + +
    + ); + } + + if (workspace === null) { + return ( +
    + + + +
    + ); + } + + return ( +
    + + +
    +
    +

    {workspace.name}

    + {workspace.description && ( +

    + {workspace.description} +

    + )} +
    +
    + + + r.project_path)} + onAdded={loadRepos} + /> + +
    +
    + + {indexDoneMsg && ( + + Workspace search ready + {indexDoneMsg} + + )} + + {error && ( + + + Could not load repositories + {error} + + )} + +
    +
    +

    Repositories

    + {repos && ( + + {repos.length === 0 + ? 'none' + : `${repos.filter((r) => r.status === 'indexed').length} of ${ + repos.length + } indexed`} + + )} +
    + + {repos === null ? ( +
    + + +
    + ) : repos.length === 0 ? ( + + ) : ( +
    + {repos.map((repo) => ( + + ))} +
    + )} +
    +
    + ); +} + +function BackLink() { + return ( + + All workspaces + + ); +} + +function ReposEmptyState() { + return ( +
    +

    No repositories yet

    +

    + Click Add repo above to attach the first one. +

    +
    + ); +} diff --git a/server/dashboard/src/modules/workspaces/WorkspacesListPage.tsx b/server/dashboard/src/modules/workspaces/WorkspacesListPage.tsx new file mode 100644 index 0000000..9af928f --- /dev/null +++ b/server/dashboard/src/modules/workspaces/WorkspacesListPage.tsx @@ -0,0 +1,110 @@ +import { useEffect, useState } from 'react'; +import { AlertCircle, Boxes } from 'lucide-react'; +import { ApiError, api } from '@/api/client'; +import { Alert, AlertDescription, AlertTitle } from '@/ui/alert'; +import { Skeleton } from '@/ui/skeleton'; +import { WorkspaceCard } from './components/WorkspaceCard'; +import { CreateWorkspaceDialog } from './components/CreateWorkspaceDialog'; +import type { Workspace, WorkspaceListResponse } from './types'; + +export function WorkspacesListPage() { + const [list, setList] = useState(null); + const [error, setError] = useState(null); + const [featureOff, setFeatureOff] = useState(false); + + async function reload() { + try { + const r = await api.get('/workspaces'); + setList(r.workspaces); + setError(null); + setFeatureOff(false); + } catch (e) { + if (e instanceof ApiError && e.status === 503) { + setFeatureOff(true); + setList([]); + return; + } + setError(e instanceof Error ? e.message : String(e)); + } + } + + useEffect(() => { + void reload(); + }, []); + + if (featureOff) { + return ( +
    +
    + + + Workspaces feature is disabled + + Set CIX_WORKSPACES_ENABLED=true on the server and restart + to enable workspaces. + + +
    + ); + } + + return ( +
    +
    + + {error && ( + + + Failed to load workspaces + {error} + + )} + + {list === null ? ( +
    + {Array.from({ length: 3 }).map((_, i) => ( + + ))} +
    + ) : list.length === 0 ? ( + + ) : ( +
    + {list.map((ws) => ( + + ))} +
    + )} +
    + ); +} + +function Header({ onCreated }: { onCreated?: () => void }) { + return ( +
    +
    +

    Workspaces

    +

    + Group GitHub repositories for cross-project semantic search. +

    +
    + {onCreated && } +
    + ); +} + +function EmptyState() { + return ( +
    + +
    +

    No workspaces yet

    +

    + Click New workspace to create one. Inside the + workspace you'll be able to attach GitHub repositories and use + cross-project semantic search. +

    +
    +
    + ); +} diff --git a/server/dashboard/src/modules/workspaces/WorkspacesPage.tsx b/server/dashboard/src/modules/workspaces/WorkspacesPage.tsx new file mode 100644 index 0000000..a04f3c5 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/WorkspacesPage.tsx @@ -0,0 +1,15 @@ +import { Route, Routes } from 'react-router-dom'; +import { WorkspacesListPage } from './WorkspacesListPage'; +import { WorkspaceDetailPage } from './WorkspaceDetailPage'; + +// Router shell for the workspaces module — list at the index, detail +// page keyed on :id. Matches the projects module's two-level layout so +// the dashboard reads with one navigation pattern across features. +export default function WorkspacesPage() { + return ( + + } /> + } /> + + ); +} diff --git a/server/dashboard/src/modules/workspaces/components/AddExistingProjectDialog.tsx b/server/dashboard/src/modules/workspaces/components/AddExistingProjectDialog.tsx new file mode 100644 index 0000000..9f8eddd --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/AddExistingProjectDialog.tsx @@ -0,0 +1,336 @@ +import { useEffect, useMemo, useState } from 'react'; +import { Link2, Loader2 } from 'lucide-react'; +import { api, ApiError } from '@/api/client'; +import { Alert, AlertDescription, AlertTitle } from '@/ui/alert'; +import { Badge } from '@/ui/badge'; +import { Button } from '@/ui/button'; +import { + Dialog, + DialogContent, + DialogDescription, + DialogFooter, + DialogHeader, + DialogTitle, + DialogTrigger, +} from '@/ui/dialog'; +import { Input } from '@/ui/input'; +import { Label } from '@/ui/label'; +import type { Project, ProjectListResponse } from '@/api/types'; +import type { WorkspaceRepo, WorkspaceRepoCreated } from '../types'; + +// Per-row disabled reason. null means the row is selectable. 
+// "already-in-workspace" is NOT included here — those projects are +// filtered out of the list entirely rather than rendered as disabled, +// per the operator's request: a workspace's own projects shouldn't be +// noise in the "add more" picker. +type LinkDisabledReason = 'not-indexed' | 'not-github'; + +function disabledReasonFor(p: Project): LinkDisabledReason | null { + if (p.status !== 'indexed') return 'not-indexed'; + if (!p.host_path.startsWith('github.com/') || !p.host_path.includes('@')) return 'not-github'; + return null; +} + +function disabledLabel(r: LinkDisabledReason): string { + switch (r) { + case 'not-indexed': + return 'not indexed yet'; + case 'not-github': + return 'local path (cannot link)'; + } +} + +// AddExistingProjectDialog lets the operator pick one or many already- +// indexed projects and link them into this workspace in a single +// submission. The list shows every project on the server — unselectable +// rows are rendered as disabled with a short reason so the operator +// understands why they can't be picked. +// +// Submit fans out N POSTs to /workspaces/{id}/repos/link sequentially. +// We chose sequential over parallel because: (a) it makes per-project +// error reporting trivial, (b) the per-call cost is tiny (no clone, no +// index), (c) a backend hiccup mid-batch leaves the workspace in a +// predictable partial state instead of a thundering-herd race. +export function AddExistingProjectDialog({ + workspaceID, + existingProjectPaths, + onAdded, +}: { + workspaceID: string; + existingProjectPaths: string[]; + onAdded: () => void; +}) { + const [open, setOpen] = useState(false); + const [projects, setProjects] = useState(null); + const [loadErr, setLoadErr] = useState(null); + const [query, setQuery] = useState(''); + const [selected, setSelected] = useState>(new Set()); + const [submitting, setSubmitting] = useState(false); + // Per-project failure messages collected during the batch submit. 
+ // Keyed by path_hash so we can render them next to the row. + const [errs, setErrs] = useState>({}); + + // Fetch projects when the dialog opens. Resetting state on each open + // means the user gets a fresh picker every time — selections from a + // previous open don't leak through. + useEffect(() => { + if (!open) return; + setProjects(null); + setLoadErr(null); + setQuery(''); + setSelected(new Set()); + setErrs({}); + api + .get('/projects') + .then((r) => setProjects(r.projects)) + .catch((e: unknown) => { + const msg = + e instanceof ApiError ? e.detail : e instanceof Error ? e.message : String(e); + setLoadErr(msg); + setProjects([]); + }); + }, [open]); + + const inWorkspace = useMemo(() => new Set(existingProjectPaths), [existingProjectPaths]); + + // Annotate each project with its disabled reason once, so render + + // filter + count all share one source of truth. Projects already in + // this workspace are dropped entirely — they're not useful targets + // for the "Add Existing Project" flow. 
+ const annotated = useMemo(() => { + if (!projects) return []; + return projects + .filter((p) => !inWorkspace.has(p.host_path)) + .map((p) => ({ p, reason: disabledReasonFor(p) })); + }, [projects, inWorkspace]); + + const filtered = useMemo(() => { + const q = query.trim().toLowerCase(); + if (!q) return annotated; + return annotated.filter((row) => row.p.host_path.toLowerCase().includes(q)); + }, [annotated, query]); + + const selectableInView = useMemo( + () => filtered.filter((row) => row.reason === null), + [filtered], + ); + const allInViewSelected = + selectableInView.length > 0 && + selectableInView.every((row) => selected.has(row.p.path_hash)); + + function toggle(hash: string) { + setSelected((prev) => { + const next = new Set(prev); + if (next.has(hash)) next.delete(hash); + else next.add(hash); + return next; + }); + } + + function toggleAllInView() { + setSelected((prev) => { + const next = new Set(prev); + if (allInViewSelected) { + for (const row of selectableInView) next.delete(row.p.path_hash); + } else { + for (const row of selectableInView) next.add(row.p.path_hash); + } + return next; + }); + } + + async function handleSubmit() { + if (selected.size === 0 || !projects) return; + setSubmitting(true); + const collected: Record = {}; + const succeeded = new Set(); + const toLink = projects.filter((p) => selected.has(p.path_hash)); + for (const p of toLink) { + try { + await api.post( + `/workspaces/${workspaceID}/repos/link`, + { project_hash: p.path_hash }, + ); + succeeded.add(p.path_hash); + } catch (e: unknown) { + const msg = + e instanceof ApiError ? e.detail : e instanceof Error ? e.message : String(e); + collected[p.path_hash] = msg; + } + } + setErrs(collected); + setSubmitting(false); + + // Drop the successfully-linked projects from the local list so + // they disappear immediately, without waiting for the parent's + // repos refetch to round-trip and reflow our existingProjectPaths + // prop. 
Also clear them from the selected set so the count drops + // back to "0 selected" (or to the count of still-failing rows). + if (succeeded.size > 0) { + setProjects((prev) => (prev ? prev.filter((p) => !succeeded.has(p.path_hash)) : prev)); + setSelected((prev) => { + const next = new Set(prev); + for (const h of succeeded) next.delete(h); + return next; + }); + } + + if (Object.keys(collected).length === 0) { + // All succeeded — close + refresh the parent list. + setOpen(false); + onAdded(); + } else if (succeeded.size > 0) { + // Partial success — refresh the parent so the successes show up, + // keep the dialog open with the per-row error annotations so the + // operator can see what failed without losing context. + onAdded(); + } + } + + return ( + + + + + + + Link existing projects + + Select one or more indexed projects to link into this workspace. + Linked projects show up in workspace search without re-cloning or + re-indexing. + + + +
    + {loadErr && ( + + Could not load projects + {loadErr} + + )} + +
    + + setQuery(e.target.value)} + disabled={projects === null} + /> +
    + +
    + + {projects === null + ? 'Loading…' + : `${selected.size} selected · ${selectableInView.length} selectable in view · ${annotated.length} total`} + + {selectableInView.length > 0 && ( + + )} +
    + +
    + {projects === null ? ( +
    + Loading projects… +
    + ) : filtered.length === 0 ? ( +
    + {annotated.length === 0 + ? 'No projects on this server yet.' + : 'No projects match the filter.'} +
    + ) : ( +
      + {filtered.map(({ p, reason }) => { + const isChecked = selected.has(p.path_hash); + const disabled = reason !== null || submitting; + const rowErr = errs[p.path_hash]; + return ( +
    • + +
    • + ); + })} +
    + )} +
    +
    + + + + + +
    +
    + ); +} + +// Re-export the row shape so consumers don't need to dig into types.ts. +export type { WorkspaceRepo }; diff --git a/server/dashboard/src/modules/workspaces/components/AddRepoDialog.tsx b/server/dashboard/src/modules/workspaces/components/AddRepoDialog.tsx new file mode 100644 index 0000000..88a3ce1 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/AddRepoDialog.tsx @@ -0,0 +1,668 @@ +import { useEffect, useMemo, useState } from 'react'; +import { Copy, Loader2, Lock, Plus, Unlock } from 'lucide-react'; +import { api, ApiError } from '@/api/client'; +import { Alert, AlertDescription, AlertTitle } from '@/ui/alert'; +import { Button } from '@/ui/button'; +import { + Dialog, + DialogContent, + DialogDescription, + DialogFooter, + DialogHeader, + DialogTitle, + DialogTrigger, +} from '@/ui/dialog'; +import { Input } from '@/ui/input'; +import { Label } from '@/ui/label'; +import { RadioGroup, RadioGroupItem } from '@/ui/radio-group'; +import { + Select, + SelectContent, + SelectItem, + SelectTrigger, + SelectValue, +} from '@/ui/select'; +import type { + GithubAccount, + GithubAccountListResponse, + GithubRepo, + GithubRepoListResponse, + GithubToken, + GithubTokenListResponse, + WebhookMode, + WorkspaceRepoCreated, +} from '../types'; + +// Sentinel value for the "(public repo, no token)" Select option. Radix +// Select forbids an empty-string item value, so we encode the no-token +// choice as a distinct string and translate at the request boundary. +const NO_TOKEN = '__none__'; + +// AddRepoDialog is a staged form: each step gates the next so the user +// can't pick a repository before choosing a token, and can't submit +// before pinning down a branch + webhook mode. The shape mirrors how +// people actually fill it in: PAT → repo → branch → webhook policy. 
+export function AddRepoDialog({ + workspaceID, + onAdded, +}: { + workspaceID: string; + onAdded: () => void; +}) { + const [open, setOpen] = useState(false); + const [tokens, setTokens] = useState(null); + const [tokenID, setTokenID] = useState(''); + + // Account step — loaded after a token is picked. The list contains + // the PAT owner (user) plus every org from /user/orgs; the dashboard + // requires the operator to pick one specifically so we always know + // which slice of GitHub to query for the repo picker. Default is + // the first account returned (the user themselves). + const [accounts, setAccounts] = useState(null); + const [accountsErr, setAccountsErr] = useState(null); + const [accountsLoading, setAccountsLoading] = useState(false); + const [accountKey, setAccountKey] = useState(''); + + // The repo step. `repos` is the unfiltered fetch result; the visible + // dropdown is filtered client-side by `repoQuery` so typing is + // instant and we only hit GitHub once per account selection. + const [repos, setRepos] = useState(null); + const [reposErr, setReposErr] = useState(null); + const [reposLoading, setReposLoading] = useState(false); + const [repoQuery, setRepoQuery] = useState(''); + const [selectedRepo, setSelectedRepo] = useState(null); + const [manualUrl, setManualUrl] = useState(''); // used when no token + + const [branch, setBranch] = useState('main'); + const [webhookMode, setWebhookMode] = useState('manual'); + + const [submitting, setSubmitting] = useState(false); + const [submitErr, setSubmitErr] = useState(null); + const [created, setCreated] = useState(null); + + // Load tokens when the dialog opens — keep the request out of the + // page mount path so users who never open the dialog don't pay for + // the network call. + useEffect(() => { + if (!open) return; + api + .get('/github-tokens') + .then((r) => setTokens(r.tokens)) + .catch(() => setTokens([])); + }, [open]); + + // When a token is picked, fetch the accounts visible to it. 
The + // first account in the response (always the PAT owner) is auto- + // selected so the repo picker can populate immediately without an + // extra user click. + useEffect(() => { + if (!tokenID || tokenID === NO_TOKEN) { + setAccounts(null); + setAccountKey(''); + setRepos(null); + setSelectedRepo(null); + setAccountsLoading(false); + return; + } + let cancelled = false; + setAccountsErr(null); + setAccountKey(''); + setRepos(null); + setSelectedRepo(null); + setAccountsLoading(true); + api + .get(`/github-tokens/${tokenID}/accounts`) + .then((r) => { + if (cancelled) return; + setAccounts(r.accounts); + setAccountsLoading(false); + if (r.accounts.length > 0) { + setAccountKey(`${r.accounts[0].type}:${r.accounts[0].login}`); + } + }) + .catch((e) => { + if (cancelled) return; + const msg = + e instanceof ApiError ? e.detail : e instanceof Error ? e.message : String(e); + setAccountsErr(msg); + setAccountsLoading(false); + }); + return () => { + cancelled = true; + }; + }, [tokenID]); + + // When the account selection changes, load that account's repos. + // /users/{login}/repos for user, /orgs/{login}/repos for org — we + // always scope the listing so the operator gets a predictable, + // bounded result instead of the aggregated view. + useEffect(() => { + if (!tokenID || tokenID === NO_TOKEN || !accountKey || !accounts) return; + const acc = accounts.find((a) => `${a.type}:${a.login}` === accountKey); + if (!acc) return; + + let cancelled = false; + setReposLoading(true); + setReposErr(null); + setSelectedRepo(null); + + api + .get(`/github-tokens/${tokenID}/repos`, { + query: { account: acc.login, account_type: acc.type }, + }) + .then((r) => { + if (!cancelled) { + setRepos(r.repos); + setReposLoading(false); + } + }) + .catch((e) => { + if (cancelled) return; + const msg = + e instanceof ApiError ? e.detail : e instanceof Error ? 
e.message : String(e); + setReposErr(msg); + setReposLoading(false); + }); + return () => { + cancelled = true; + }; + }, [tokenID, accountKey, accounts]); + + const filteredRepos = useMemo(() => { + if (!repos) return []; + if (!repoQuery.trim()) return repos.slice(0, 100); + const needle = repoQuery.toLowerCase(); + return repos.filter((r) => r.full_name.toLowerCase().includes(needle)).slice(0, 100); + }, [repos, repoQuery]); + + // The "ready to submit" gate. Either we have a picked repo OR a + // valid manual URL, plus a non-empty branch. + const githubUrl = selectedRepo?.html_url ?? manualUrl.trim(); + const validUrl = /^https:\/\/github\.com\/[^/]+\/[^/]+/.test(githubUrl); + const canSubmit = validUrl && branch.trim() !== '' && !submitting; + + async function submit() { + setSubmitting(true); + setSubmitErr(null); + try { + const payload: Record = { + github_url: githubUrl, + branch: branch.trim(), + webhook_mode: webhookMode, + }; + if (tokenID && tokenID !== NO_TOKEN) { + payload.token_id = tokenID; + } + const resp = await api.post( + `/workspaces/${workspaceID}/repos`, + payload, + ); + setCreated(resp); + onAdded(); + } catch (e) { + const msg = + e instanceof ApiError + ? e.detail + : e instanceof Error + ? e.message + : String(e); + setSubmitErr(msg); + } finally { + setSubmitting(false); + } + } + + function reset() { + setTokenID(''); + setAccounts(null); + setAccountsErr(null); + setAccountsLoading(false); + setAccountKey(''); + setRepos(null); + setReposErr(null); + setSelectedRepo(null); + setManualUrl(''); + setBranch('main'); + setWebhookMode('manual'); + setSubmitErr(null); + setCreated(null); + setRepoQuery(''); + } + + // The "result" view replaces the form once the repo is created so + // the user can copy the webhook URL/secret (they're only surfaced + // here + once via /webhook-info). + if (created) { + return ( + { + setOpen(v); + if (!v) reset(); + }} + > + + + + + + Repository attached + + Clone + indexing is queued. 
The card will progress through + cloning → indexing → indexed as the worker picks it up. + + + + + + + + + ); + } + + return ( + { + setOpen(v); + if (!v) reset(); + }} + > + + + + {/* `min-w-0` on every direct grid child is the trick: DialogContent + is `display: grid`, and grid items default to `min-width: auto` + (= min-content). A long unbreakable repo full_name then blows + out the grid track and the whole dialog widens past max-w-lg. + Applying min-w-0 lets the track shrink and the inner truncate + actually take effect. */} + + + Add repository + + Pick a token, then an account and a repository. Branch + defaults to main — change it if the repo + uses a different one (the picker shows each repo's default + in the column on the right). + + + +
    + {/* Step 1: token */} +
    + + + {tokens?.length === 0 && ( +

    + No tokens stored yet. Add one under GitHub Tokens{' '} + in the sidebar. +

    + )} +
    + + {/* Accounts fetch is paginated server-side (/user + up to 5 + pages of /user/repos) and can take a few seconds against + an SSO-protected org. Surface a spinner so the form + doesn't look frozen between picking the token and the + account selector appearing. */} + {tokenID && tokenID !== NO_TOKEN && accountsLoading && ( +
    + + Loading accounts visible to this token… +
    + )} + + {/* Step 2: account — the PAT owner plus every org they + belong to. The operator must pick one specifically so we + always know which slice of GitHub to ask. */} + {tokenID && tokenID !== NO_TOKEN && accounts !== null && ( +
    + + + {accounts.length === 0 && ( +

    + GitHub returned no accounts for this token. Check the + PAT has at least the read:user +{' '} + read:org scopes. +

    + )} + {accountsErr && ( +

    {accountsErr}

    + )} +
    + )} + + {/* Step 3: repository — only shown once accounts are loaded + (and therefore an account auto-selected). Showing the + Repository label before that just renders an empty box + that adds to the "form is frozen" feeling. */} + {tokenID && tokenID !== NO_TOKEN && accounts !== null && ( +
    + + {reposLoading ? ( +
    + + Loading repos accessible to this token… +
    + ) : reposErr ? ( + + {reposErr} + + ) : repos === null ? null : ( + <> + { + setRepoQuery(e.target.value); + setSelectedRepo(null); + }} + /> +
    + {filteredRepos.length === 0 ? ( +
    + No matching repositories. {repos.length} total visible + to this token. +
    + ) : ( +
      + {filteredRepos.map((r) => { + const active = selectedRepo?.full_name === r.full_name; + return ( +
    • + +
    • + ); + })} +
    + )} +
    + + )} +
    + )} + + {/* Step 2 (no-token variant): manual URL input */} + {tokenID === NO_TOKEN && ( +
    + + setManualUrl(e.target.value)} + /> +

    + Only public repositories can be cloned without a token. +

    +
    + )} + + {/* Step 4 (step 3 without a token): branch — needs a URL to be meaningful */} + {validUrl && ( +
    + + setBranch(e.target.value)} + /> +
    + )} + + {/* Step 5 (step 4 without a token): webhook mode — needs everything above */} + {validUrl && ( +
    + + setWebhookMode(v as WebhookMode)} + > + + + + +
    + )} + + {submitErr && ( + + {submitErr} + + )} +
    + + + + + +
    +
    + ); +} + +function WebhookModeOption({ + value, + title, + hint, + selected, + disabled, +}: { + value: WebhookMode; + title: string; + hint: string; + selected: boolean; + disabled?: boolean; +}) { + return ( + + ); +} + +function CreatedResult({ + created, + mode, +}: { + created: WorkspaceRepoCreated; + mode: WebhookMode; +}) { + return ( +
    +
    + Project:{' '} + {created.repo.project_path} +
    + + {mode === 'auto' && ( + + + {created.auto_registered + ? 'Webhook registered with GitHub' + : 'Auto-register failed'} + + {!created.auto_registered && created.auto_register_note && ( + {created.auto_register_note} + )} + + )} + + {mode === 'manual' && ( + <> + + Configure the webhook in GitHub + + Add a webhook in Settings → Webhooks → Add webhook{' '} + for the repo with the URL and secret below. Content-type:{' '} + application/json. Events: push. + + + + +

    + The secret is shown once here. Store it in a password manager — + you can also re-fetch it via the API's webhook-info endpoint. +

    + + )} + + {mode === 'disabled' && ( + + Webhook disabled + + This repo will only be reindexed when you click Reindex{' '} + on its card. + + + )} +
    + ); +} + +function CopyableField({ + label, + value, + mono, +}: { + label: string; + value: string; + mono?: boolean; +}) { + const [copied, setCopied] = useState(false); + return ( +
    + +
    + + +
    +
    + ); +} diff --git a/server/dashboard/src/modules/workspaces/components/CreateWorkspaceDialog.tsx b/server/dashboard/src/modules/workspaces/components/CreateWorkspaceDialog.tsx new file mode 100644 index 0000000..8e53edd --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/CreateWorkspaceDialog.tsx @@ -0,0 +1,95 @@ +import { useState } from 'react'; +import { Plus } from 'lucide-react'; +import { api } from '@/api/client'; +import { Button } from '@/ui/button'; +import { Alert, AlertDescription } from '@/ui/alert'; +import { + Dialog, + DialogContent, + DialogDescription, + DialogFooter, + DialogHeader, + DialogTitle, + DialogTrigger, +} from '@/ui/dialog'; +import { Input } from '@/ui/input'; +import { Label } from '@/ui/label'; + +export function CreateWorkspaceDialog({ onCreated }: { onCreated: () => void }) { + const [open, setOpen] = useState(false); + const [name, setName] = useState(''); + const [description, setDescription] = useState(''); + const [busy, setBusy] = useState(false); + const [err, setErr] = useState(null); + + async function submit() { + setBusy(true); + setErr(null); + try { + await api.post('/workspaces', { name, description }); + setName(''); + setDescription(''); + setOpen(false); + onCreated(); + } catch (e) { + setErr(e instanceof Error ? e.message : String(e)); + } finally { + setBusy(false); + } + } + + return ( + + + + + + + Create workspace + + A workspace groups GitHub repositories for cross-project + semantic search. After creating it, open the workspace to + attach repositories. + + +
    +
    + + setName(e.target.value)} + placeholder="platform" + /> +
    +
    + + setDescription(e.target.value)} + placeholder="microservices cluster" + /> +
    + {err && ( + + {err} + + )} +
    + + + + +
    +
    + ); +} diff --git a/server/dashboard/src/modules/workspaces/components/RepoCard.tsx b/server/dashboard/src/modules/workspaces/components/RepoCard.tsx new file mode 100644 index 0000000..990d493 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/RepoCard.tsx @@ -0,0 +1,221 @@ +import { useEffect, useState } from 'react'; +import { + AlertTriangle, + CheckCircle2, + Link2, + Loader2, + RefreshCw, + Trash2, + Webhook, + WebhookOff, +} from 'lucide-react'; +import { api } from '@/api/client'; +import { Badge } from '@/ui/badge'; +import { Button } from '@/ui/button'; +import { Card, CardContent } from '@/ui/card'; +import { formatRelative } from '@/lib/formatDate'; +import type { WorkspaceRepo } from '../types'; +import { isInFlight } from '../types'; + +// RepoCard renders one workspace_repo as a self-contained card. Status +// drives the visual treatment — a spinner with an elapsed-time counter +// is shown while clone/index is in flight so the operator gets feedback +// without staring at a stale page. +export function RepoCard({ + repo, + onDeleted, + onReindexed, +}: { + repo: WorkspaceRepo; + onDeleted: () => void; + onReindexed: () => void; +}) { + const [busy, setBusy] = useState<'delete' | 'reindex' | null>(null); + const inFlight = isInFlight(repo.status); + + async function handleDelete() { + const detachMsg = repo.is_linked + ? `Detach the linked project "${repo.github_url}@${repo.branch}" from this workspace?\n\nThe project itself stays — only this workspace's link to it is removed.` + : `Detach "${repo.github_url}@${repo.branch}" from this workspace?\n\nThe indexed project will also be removed.`; + if (!confirm(detachMsg)) { + return; + } + setBusy('delete'); + try { + await api.delete(`/workspaces/${repo.workspace_id}/repos/${repo.id}`); + onDeleted(); + } catch (e) { + alert(e instanceof Error ? 
e.message : String(e)); + } finally { + setBusy(null); + } + } + + async function handleReindex() { + setBusy('reindex'); + try { + await api.post( + `/workspaces/${repo.workspace_id}/repos/${repo.id}/reindex`, + {}, + ); + onReindexed(); + } catch (e) { + alert(e instanceof Error ? e.message : String(e)); + } finally { + setBusy(null); + } + } + + return ( + + +
    +
    +
    + {repo.github_url} + @ {repo.branch} +
    +
    + {repo.project_path} +
    +
    +
    + {/* Reindex is a no-op for linked rows — they don't own the + clone, so the canonical workspace must trigger it. Hide + the button to make that contract obvious. */} + {!repo.is_linked && ( + + )} + +
    +
    + +
    + + {repo.is_linked ? ( + + linked + + ) : ( + + )} + {repo.last_indexed_at && ( + + · indexed {formatRelative(repo.last_indexed_at)} + + )} +
    + + {repo.last_error && ( +
    + {repo.last_error} +
    + )} +
    +
    + ); +} + +// StatusBadge renders the colour-coded status + an elapsed-time read +// while a clone/index job is running. The elapsed counter ticks once a +// second so the user can tell the job hasn't silently stalled. +function StatusBadge({ repo }: { repo: WorkspaceRepo }) { + const inFlight = isInFlight(repo.status); + const elapsed = useElapsedSince(inFlight ? repo.updated_at : null); + + if (repo.status === 'indexed') { + return ( + + indexed + + ); + } + if (repo.status === 'failed') { + return ( + + failed + + ); + } + return ( + + + {repo.status} + {elapsed !== null && ( + · {formatDuration(elapsed)} + )} + + ); +} + +function WebhookBadge({ repo }: { repo: WorkspaceRepo }) { + switch (repo.webhook_mode) { + case 'auto': + return ( + + auto + + ); + case 'manual': + return ( + + manual + + ); + case 'disabled': + return ( + + disabled + + ); + } +} + +// useElapsedSince ticks once a second so the in-flight badge shows +// elapsed time without re-fetching from the server. +function useElapsedSince(iso: string | null): number | null { + const [now, setNow] = useState(() => Date.now()); + useEffect(() => { + if (iso === null) return; + const t = setInterval(() => setNow(Date.now()), 1000); + return () => clearInterval(t); + }, [iso]); + if (iso === null) return null; + const ts = Date.parse(iso); + if (Number.isNaN(ts)) return null; + return Math.max(0, Math.floor((now - ts) / 1000)); +} + +function formatDuration(seconds: number): string { + if (seconds < 60) return `${seconds}s`; + const m = Math.floor(seconds / 60); + const s = seconds % 60; + return `${m}m ${s}s`; +} diff --git a/server/dashboard/src/modules/workspaces/components/WorkspaceCard.tsx b/server/dashboard/src/modules/workspaces/components/WorkspaceCard.tsx new file mode 100644 index 0000000..6400b32 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/WorkspaceCard.tsx @@ -0,0 +1,125 @@ +import { useEffect, useState } from 'react'; +import { Link } from 'react-router-dom'; 
+import { Boxes, ChevronRight, Loader2 } from 'lucide-react'; +import { api } from '@/api/client'; +import { Badge } from '@/ui/badge'; +import { Card, CardContent } from '@/ui/card'; +import type { Workspace, WorkspaceRepo, WorkspaceRepoListResponse } from '../types'; +import { isInFlight } from '../types'; +import { formatRelative } from '@/lib/formatDate'; + +// WorkspaceCard mirrors the projects ProjectCard so the dashboard reads +// with one visual language: counts at-a-glance, status badge, "click +// anywhere" surface. Repos are loaded lazily so the list page renders +// instantly and each card fills in as soon as its summary arrives. +export function WorkspaceCard({ workspace }: { workspace: Workspace }) { + const [repos, setRepos] = useState(null); + + useEffect(() => { + let cancelled = false; + api + .get(`/workspaces/${workspace.id}/repos`) + .then((r) => { + if (!cancelled) setRepos(r.repos); + }) + .catch(() => { + if (!cancelled) setRepos([]); + }); + return () => { + cancelled = true; + }; + }, [workspace.id]); + + const summary = computeSummary(repos); + + return ( + + + +
    +
    +
    + + {workspace.name} +
    + {workspace.description && ( +
    + {workspace.description} +
    + )} +
    + +
    + +
    + {summary.busy ? ( + + + {summary.busy === 1 ? '1 in progress' : `${summary.busy} in progress`} + + ) : repos === null ? ( + + Loading… + + ) : repos.length === 0 ? ( + + No repos yet + + ) : summary.failed > 0 ? ( + {summary.failed} failed + ) : ( + Ready + )} + {repos !== null && repos.length > 0 && ( + + {summary.indexed}/{repos.length} indexed + + )} +
    + +
    + + {repos !== null && repos.length > 0 + ? `Updated ${formatRelative(latestUpdate(repos))}` + : `Created ${formatRelative(workspace.created_at)}`} + +
    +
    +
    + + ); +} + +// computeSummary turns the repo list into the three numbers the card +// surface needs. Lives in this file because no other view computes the +// same shape. +function computeSummary(repos: WorkspaceRepo[] | null): { + indexed: number; + busy: number; + failed: number; +} { + if (!repos) return { indexed: 0, busy: 0, failed: 0 }; + let indexed = 0; + let busy = 0; + let failed = 0; + for (const r of repos) { + if (r.status === 'indexed') indexed++; + else if (r.status === 'failed') failed++; + else if (isInFlight(r.status)) busy++; + } + return { indexed, busy, failed }; +} + +// latestUpdate returns the most recent updated_at across a repo list. +// Used so the card's "Updated …" footer tracks the freshest signal +// rather than the workspace row's stale updated_at. +function latestUpdate(repos: WorkspaceRepo[]): string { + let best = repos[0]?.updated_at ?? ''; + for (const r of repos) { + if (r.updated_at > best) best = r.updated_at; + } + return best; +} diff --git a/server/dashboard/src/modules/workspaces/components/WorkspaceSearchDialog.tsx b/server/dashboard/src/modules/workspaces/components/WorkspaceSearchDialog.tsx new file mode 100644 index 0000000..c1e8787 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/components/WorkspaceSearchDialog.tsx @@ -0,0 +1,235 @@ +import { useRef, useState } from 'react'; +import { Loader2, Search } from 'lucide-react'; +import { api } from '@/api/client'; +import type { components } from '@/api/generated'; +import { Alert, AlertDescription } from '@/ui/alert'; +import { Button } from '@/ui/button'; +import { + Dialog, + DialogContent, + DialogDescription, + DialogHeader, + DialogTitle, + DialogTrigger, +} from '@/ui/dialog'; +import { Input } from '@/ui/input'; +import type { Workspace } from '../types'; + +// Pull the response shape straight from the OpenAPI-generated types so +// any future schema change (added fields, renamed properties) shows up +// as a TS error here instead of a silent 
contract drift like the one +// the boost-score refactor created. +type SearchResponse = components['schemas']['WorkspaceSearchResponse']; + +export function WorkspaceSearchDialog({ workspace }: { workspace: Workspace }) { + const [open, setOpen] = useState(false); + const [query, setQuery] = useState(''); + const [busy, setBusy] = useState(false); + const [resp, setResp] = useState(null); + const [err, setErr] = useState(null); + + // Holding the active AbortController on a ref so a fast second + // submit cancels the first request — without this the slower + // response can land after the newer one and overwrite the displayed + // results. The same ref is used to cancel on dialog close. + const abortRef = useRef(null); + + async function submit() { + if (!query.trim()) return; + abortRef.current?.abort(); + const ctl = new AbortController(); + abortRef.current = ctl; + setBusy(true); + setErr(null); + try { + const r = await api.get( + `/workspaces/${workspace.id}/search`, + { query: { q: query }, signal: ctl.signal }, + ); + setResp(r); + } catch (e) { + if (e instanceof DOMException && e.name === 'AbortError') return; + setErr(e instanceof Error ? e.message : String(e)); + } finally { + // Only flip busy off if THIS request is the active one — a + // newer submit might have already replaced abortRef.current. + if (abortRef.current === ctl) setBusy(false); + } + } + + function reset() { + abortRef.current?.abort(); + abortRef.current = null; + setResp(null); + setQuery(''); + setErr(null); + setBusy(false); + } + + return ( + { + setOpen(v); + if (!v) reset(); + }} + > + + + + {/* `min-w-0` on every direct grid child — DialogContent is + display: grid, and grid items default to min-width: auto + (= min-content). A long unbreakable line in the markdown + chunk content would then blow past max-w-3xl. Letting the + track shrink lets the inner
    <pre>'s overflow-x-auto actually
    +          kick in. */}
    +      
    +        
    +          Search: {workspace.name}
    +          
    +            Fan-out across every repo in this workspace. Chunks ranked by
    +            raw similarity score; the projects panel ranks repos by the
    +            mean of their top hits, capped at a few chunks per repo so a
    +            single dominant project can't hide the others.
    +          
    +        
    +        
    +
    + setQuery(e.target.value)} + onKeyDown={(e) => { + if (e.key === 'Enter' && !busy) void submit(); + }} + placeholder="e.g. JWT validation across services" + /> + +
    + {err && ( + + {err} + + )} + {resp?.pending_repos && resp.pending_repos.length > 0 && ( + + + {resp.pending_repos.length} repo + {resp.pending_repos.length === 1 ? '' : 's'} still + indexing — their matches won't appear yet. + + + )} + {resp?.failed_repos && resp.failed_repos.length > 0 && ( + + + {resp.failed_repos.length} repo + {resp.failed_repos.length === 1 ? '' : 's'} failed to + query — results below are incomplete. Check server logs + for details. + + + )} + {resp?.stale_fts_repos && resp.stale_fts_repos.length > 0 && ( + + + {resp.stale_fts_repos.length} repo + {resp.stale_fts_repos.length === 1 ? '' : 's'} indexed + before BM25 was enabled — keyword matching is empty + for {resp.stale_fts_repos.length === 1 ? 'it' : 'them'}, + so results below fall back to dense-only ranking. + Re-index each to enable hybrid search:{' '} + + {resp.stale_fts_repos + .map((r) => r.project_path.split('/').pop() ?? r.project_path) + .join(', ')} + + + + )} + {resp && resp.status === 'empty' && ( + + + No chunks matched the query above the relevance threshold. + + + )} + {resp && resp.status === 'ok' && } +
    +
    +
    + ); +} + +function SearchResults({ resp }: { resp: SearchResponse }) { + return ( +
    + {resp.projects.length > 0 && ( +
    +
    + Top projects +
    +
      + {resp.projects.map((p) => ( +
    • +
      + + {p.label || p.project_path} + + + {p.project_score.toFixed(3)} + +
      +
      + {p.num_hits} hit{p.num_hits === 1 ? '' : 's'} ·{' '} + bm25 {p.bm25_score.toFixed(3)} · dense{' '} + {p.dense_score.toFixed(3)} · {p.project_path} +
      +
    • + ))} +
    +
    + )} +
    +
    + Top chunks +
    +
      + {resp.chunks.map((c, i) => ( +
    • +
      + + {c.file_path}:{c.start_line}-{c.end_line} + + + {c.score.toFixed(3)} + +
      +
      + {c.project_path} + {c.symbol_name && · {c.symbol_name}} +
      + {/* whitespace-pre keeps source indentation; the parent + min-w-0 chain lets overflow-x-auto produce a scrollbar + instead of pushing the dialog wider than max-w-3xl. */} +
      +                {c.content}
      +              
      +
    • + ))} +
    +
    +
    + ); +} diff --git a/server/dashboard/src/modules/workspaces/index.ts b/server/dashboard/src/modules/workspaces/index.ts new file mode 100644 index 0000000..f52e2a5 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/index.ts @@ -0,0 +1,12 @@ +import { Boxes } from 'lucide-react'; +import type { Module } from '../types'; +import WorkspacesPage from './WorkspacesPage'; + +export const WorkspacesModule: Module = { + id: 'workspaces', + label: 'Workspaces', + icon: Boxes, + path: '/workspaces', + element: WorkspacesPage, + weight: 25, +}; diff --git a/server/dashboard/src/modules/workspaces/types.ts b/server/dashboard/src/modules/workspaces/types.ts new file mode 100644 index 0000000..6000cb8 --- /dev/null +++ b/server/dashboard/src/modules/workspaces/types.ts @@ -0,0 +1,102 @@ +// Shared wire types for the workspaces module. These mirror the OpenAPI +// schemas but are hand-rolled because the generated `components/schemas` +// types are wrapped in `paths[...].get.responses` indirection that's +// noisy to consume directly. When the spec changes, update both. 
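The header comment above asks maintainers to update the hand-rolled types and the OpenAPI spec together. One way to have the compiler flag a mismatch is a type-level equality check, sketched below. This is an illustration, not part of the diff: `GeneratedWorkspace`, `HandRolledWorkspace`, `Equal`, and `workspaceInSync` are hypothetical names, and the generated type is a local stand-in rather than the real import.

```typescript
// Hypothetical stand-in for the OpenAPI-generated schema type. In the real
// module this would be imported from the generated file, not declared here.
type GeneratedWorkspace = {
  id: string;
  name: string;
  description: string;
  created_at: string;
  updated_at: string;
};

// Hand-rolled wire type with the same fields as the Workspace type in types.ts.
type HandRolledWorkspace = {
  id: string;
  name: string;
  description: string;
  created_at: string;
  updated_at: string;
};

// Resolves to `true` only when A and B are the same type, `false` otherwise.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2)
    ? true
    : false;

// Compiles only while the two shapes agree. If either side gains, loses, or
// retypes a field, Equal<...> becomes `false` and assigning `true` is a type error.
const workspaceInSync: Equal<HandRolledWorkspace, GeneratedWorkspace> = true;
```

The check has no runtime cost beyond one constant. It only moves the "update both" reminder from a comment into the type checker.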
+ +export type Workspace = { + id: string; + name: string; + description: string; + created_at: string; + updated_at: string; +}; + +export type WorkspaceListResponse = { + workspaces: Workspace[]; + total: number; +}; + +export type WebhookMode = 'manual' | 'auto' | 'disabled'; + +export type RepoStatus = + | 'pending' + | 'cloning' + | 'indexing' + | 'indexed' + | 'failed'; + +export type WorkspaceRepo = { + id: string; + workspace_id: string; + github_url: string; + branch: string; + project_path: string; + token_id: string | null; + auto_webhook: boolean; + webhook_mode: WebhookMode; + status: RepoStatus; + last_sha: string | null; + last_error: string | null; + last_indexed_at: string | null; + is_linked: boolean; + created_at: string; + updated_at: string; +}; + +export type WorkspaceRepoListResponse = { + repos: WorkspaceRepo[]; + total: number; +}; + +export type GithubToken = { + id: string; + name: string; + scopes: string[]; + created_at: string; + last_used_at?: string | null; +}; + +export type GithubTokenListResponse = { + tokens: GithubToken[]; + total: number; +}; + +export type GithubRepo = { + full_name: string; + default_branch: string; + private: boolean; + html_url: string; + description?: string; +}; + +export type GithubRepoListResponse = { + repos: GithubRepo[]; + total: number; +}; + +export type GithubAccountType = 'user' | 'org'; + +export type GithubAccount = { + login: string; + type: GithubAccountType; + avatar_url?: string; +}; + +export type GithubAccountListResponse = { + accounts: GithubAccount[]; + total: number; +}; + +export type WorkspaceRepoCreated = { + repo: WorkspaceRepo; + webhook_url: string; + webhook_secret: string; + auto_registered?: boolean; + auto_register_note?: string; +}; + +// Whether the repo's status counts as "still doing something". Polling +// stops as soon as every repo in the workspace is in a terminal state. 
+export function isInFlight(status: RepoStatus): boolean { + return status === 'pending' || status === 'cloning' || status === 'indexing'; +} diff --git a/server/dashboard/tsconfig.tsbuildinfo b/server/dashboard/tsconfig.tsbuildinfo index 9c15565..9a79695 100644 --- a/server/dashboard/tsconfig.tsbuildinfo +++ b/server/dashboard/tsconfig.tsbuildinfo @@ -1 +1 @@ -{"root":["./src/main.tsx","./src/vite-env.d.ts","./src/api/client.ts","./src/api/generated.ts","./src/api/types.ts","./src/app/app.tsx","./src/app/footer.tsx","./src/app/shell.tsx","./src/app/sidebar.tsx","./src/app/themeprovider.tsx","./src/app/updatebanner.tsx","./src/app/providers.tsx","./src/auth/authprovider.tsx","./src/auth/bootstrapneededpage.tsx","./src/auth/changepasswordpage.tsx","./src/auth/loginpage.tsx","./src/auth/useauth.ts","./src/lib/cn.ts","./src/lib/editorpreference.ts","./src/lib/formatdate.ts","./src/lib/theme.ts","./src/lib/useserverstatus.ts","./src/modules/registry.ts","./src/modules/types.ts","./src/modules/api-keys/apikeyspage.tsx","./src/modules/api-keys/hooks.ts","./src/modules/api-keys/index.ts","./src/modules/api-keys/components/apikeytable.tsx","./src/modules/api-keys/components/createapikeydialog.tsx","./src/modules/api-keys/components/revokeapikeydialog.tsx","./src/modules/home/homepage.tsx","./src/modules/home/index.ts","./src/modules/projects/projectdetailpage.tsx","./src/modules/projects/projectslistpage.tsx","./src/modules/projects/projectspage.tsx","./src/modules/projects/hooks.ts","./src/modules/projects/index.ts","./src/modules/projects/components/deleteprojectdialog.tsx","./src/modules/projects/components/projectcard.tsx","./src/modules/projects/components/projectinfocard.tsx","./src/modules/search/searchpage.tsx","./src/modules/search/hooks.ts","./src/modules/search/index.ts","./src/modules/search/components/filters.tsx","./src/modules/search/components/resultfilecard.tsx","./src/modules/search/components/resultsnippet.tsx","./src/modules/search/components/searchinput
.tsx","./src/modules/server/serverpage.tsx","./src/modules/server/hooks.ts","./src/modules/server/index.ts","./src/modules/server/components/saveandrestartdialog.tsx","./src/modules/server/components/sidecarstatebadge.tsx","./src/modules/server/components/sourcepill.tsx","./src/modules/server/sections/advancedsection.tsx","./src/modules/server/sections/embeddingmodelsection.tsx","./src/modules/server/sections/runtimeparamssection.tsx","./src/modules/server/sections/sidecarsection.tsx","./src/modules/settings/settingspage.tsx","./src/modules/settings/hooks.ts","./src/modules/settings/index.ts","./src/modules/settings/components/changepasswordform.tsx","./src/modules/settings/components/sessionrow.tsx","./src/modules/settings/sections/editorsection.tsx","./src/modules/settings/sections/profilesection.tsx","./src/modules/settings/sections/sessionssection.tsx","./src/modules/settings/sections/themesection.tsx","./src/modules/users/userspage.tsx","./src/modules/users/hooks.ts","./src/modules/users/index.ts","./src/modules/users/components/deleteuserdialog.tsx","./src/modules/users/components/disableuserbutton.tsx","./src/modules/users/components/inviteuserdialog.tsx","./src/modules/users/components/userroleselect.tsx","./src/modules/users/components/userstable.tsx","./src/ui/alert.tsx","./src/ui/badge.tsx","./src/ui/button.tsx","./src/ui/card.tsx","./src/ui/dialog.tsx","./src/ui/input.tsx","./src/ui/label.tsx","./src/ui/radio-group.tsx","./src/ui/scroll-area.tsx","./src/ui/select.tsx","./src/ui/skeleton.tsx","./src/ui/slider.tsx","./src/ui/sonner.tsx","./src/ui/switch.tsx","./src/ui/table.tsx","./src/ui/tabs.tsx","./src/ui/tooltip.tsx"],"version":"5.9.3"} \ No newline at end of file 
+{"root":["./src/main.tsx","./src/vite-env.d.ts","./src/api/client.ts","./src/api/generated.ts","./src/api/types.ts","./src/app/app.tsx","./src/app/footer.tsx","./src/app/shell.tsx","./src/app/sidebar.tsx","./src/app/themeprovider.tsx","./src/app/updatebanner.tsx","./src/app/providers.tsx","./src/auth/authprovider.tsx","./src/auth/bootstrapneededpage.tsx","./src/auth/changepasswordpage.tsx","./src/auth/loginpage.tsx","./src/auth/useauth.ts","./src/lib/cn.ts","./src/lib/editorpreference.ts","./src/lib/formatdate.ts","./src/lib/theme.ts","./src/lib/useserverstatus.ts","./src/modules/registry.ts","./src/modules/types.ts","./src/modules/api-keys/apikeyspage.tsx","./src/modules/api-keys/hooks.ts","./src/modules/api-keys/index.ts","./src/modules/api-keys/components/apikeytable.tsx","./src/modules/api-keys/components/createapikeydialog.tsx","./src/modules/api-keys/components/revokeapikeydialog.tsx","./src/modules/github-tokens/githubtokenspage.tsx","./src/modules/github-tokens/index.ts","./src/modules/home/homepage.tsx","./src/modules/home/index.ts","./src/modules/projects/projectdetailpage.tsx","./src/modules/projects/projectslistpage.tsx","./src/modules/projects/projectspage.tsx","./src/modules/projects/hooks.ts","./src/modules/projects/index.ts","./src/modules/projects/components/deleteprojectdialog.tsx","./src/modules/projects/components/projectcard.tsx","./src/modules/projects/components/projectinfocard.tsx","./src/modules/search/searchpage.tsx","./src/modules/search/hooks.ts","./src/modules/search/index.ts","./src/modules/search/components/filters.tsx","./src/modules/search/components/resultfilecard.tsx","./src/modules/search/components/resultsnippet.tsx","./src/modules/search/components/searchinput.tsx","./src/modules/server/serverpage.tsx","./src/modules/server/hooks.ts","./src/modules/server/index.ts","./src/modules/server/components/saveandrestartdialog.tsx","./src/modules/server/components/sidecarstatebadge.tsx","./src/modules/server/components/sourcepill.tsx","
./src/modules/server/sections/advancedsection.tsx","./src/modules/server/sections/embeddingmodelsection.tsx","./src/modules/server/sections/runtimeparamssection.tsx","./src/modules/server/sections/sidecarsection.tsx","./src/modules/settings/settingspage.tsx","./src/modules/settings/hooks.ts","./src/modules/settings/index.ts","./src/modules/settings/components/changepasswordform.tsx","./src/modules/settings/components/sessionrow.tsx","./src/modules/settings/sections/editorsection.tsx","./src/modules/settings/sections/profilesection.tsx","./src/modules/settings/sections/sessionssection.tsx","./src/modules/settings/sections/themesection.tsx","./src/modules/users/userspage.tsx","./src/modules/users/hooks.ts","./src/modules/users/index.ts","./src/modules/users/components/deleteuserdialog.tsx","./src/modules/users/components/disableuserbutton.tsx","./src/modules/users/components/inviteuserdialog.tsx","./src/modules/users/components/userroleselect.tsx","./src/modules/users/components/userstable.tsx","./src/modules/workspaces/workspacedetailpage.tsx","./src/modules/workspaces/workspaceslistpage.tsx","./src/modules/workspaces/workspacespage.tsx","./src/modules/workspaces/index.ts","./src/modules/workspaces/types.ts","./src/modules/workspaces/components/addexistingprojectdialog.tsx","./src/modules/workspaces/components/addrepodialog.tsx","./src/modules/workspaces/components/createworkspacedialog.tsx","./src/modules/workspaces/components/repocard.tsx","./src/modules/workspaces/components/workspacecard.tsx","./src/modules/workspaces/components/workspacesearchdialog.tsx","./src/ui/alert.tsx","./src/ui/badge.tsx","./src/ui/button.tsx","./src/ui/card.tsx","./src/ui/dialog.tsx","./src/ui/input.tsx","./src/ui/label.tsx","./src/ui/radio-group.tsx","./src/ui/scroll-area.tsx","./src/ui/select.tsx","./src/ui/skeleton.tsx","./src/ui/slider.tsx","./src/ui/sonner.tsx","./src/ui/switch.tsx","./src/ui/table.tsx","./src/ui/tabs.tsx","./src/ui/tooltip.tsx"],"version":"5.9.3"} \ No newline at 
end of file diff --git a/server/go.mod b/server/go.mod index c5e8332..c659bb9 100644 --- a/server/go.mod +++ b/server/go.mod @@ -5,22 +5,36 @@ go 1.25.9 require ( github.com/getkin/kin-openapi v0.135.0 github.com/go-chi/chi/v5 v5.2.4 + github.com/go-git/go-git/v5 v5.19.0 github.com/google/uuid v1.6.0 github.com/oapi-codegen/runtime v1.4.0 github.com/odvcencio/gotreesitter v0.0.0-20260423084729-38e2b42712f2 github.com/philippgille/chromem-go v0.7.0 golang.org/x/crypto v0.50.0 + golang.org/x/sync v0.20.0 modernc.org/sqlite v1.34.1 ) require ( + dario.cat/mergo v1.0.0 // indirect + github.com/Microsoft/go-winio v0.6.2 // indirect + github.com/ProtonMail/go-crypto v1.1.6 // indirect github.com/apapsch/go-jsonmerge/v2 v2.0.0 // indirect + github.com/cloudflare/circl v1.6.3 // indirect + github.com/cyphar/filepath-securejoin v0.6.1 // indirect github.com/dprotaso/go-yit v0.0.0-20220510233725-9ba8df137936 // indirect github.com/dustin/go-humanize v1.0.1 // indirect + github.com/emirpasic/gods v1.18.1 // indirect + github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 // indirect + github.com/go-git/go-billy/v5 v5.9.0 // indirect github.com/go-openapi/jsonpointer v0.22.4 // indirect github.com/go-openapi/swag/jsonname v0.25.4 // indirect + github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 // indirect github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect + github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 // indirect github.com/josharian/intern v1.0.0 // indirect + github.com/kevinburke/ssh_config v1.2.0 // indirect + github.com/klauspost/cpuid/v2 v2.3.0 // indirect github.com/mailru/easyjson v0.9.1 // indirect github.com/mattn/go-isatty v0.0.20 // indirect github.com/mohae/deepcopy v0.0.0-20170929034955-c48cc78d4826 // indirect @@ -29,17 +43,22 @@ require ( github.com/oasdiff/yaml v0.0.9 // indirect github.com/oasdiff/yaml3 v0.0.9 // indirect github.com/perimeterx/marshmallow v1.1.5 // indirect + github.com/pjbgf/sha1cd v0.6.0 // indirect 
github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect + github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 // indirect + github.com/skeema/knownhosts v1.3.1 // indirect github.com/speakeasy-api/jsonpath v0.6.3 // indirect github.com/speakeasy-api/openapi v1.19.2 // indirect github.com/vmware-labs/yaml-jsonpath v0.3.2 // indirect github.com/woodsbury/decimal128 v1.4.0 // indirect + github.com/xanzy/ssh-agent v0.3.3 // indirect go.yaml.in/yaml/v3 v3.0.4 // indirect golang.org/x/mod v0.34.0 // indirect - golang.org/x/sync v0.20.0 // indirect + golang.org/x/net v0.53.0 // indirect golang.org/x/sys v0.43.0 // indirect golang.org/x/text v0.36.0 // indirect golang.org/x/tools v0.43.0 // indirect + gopkg.in/warnings.v0 v0.1.2 // indirect gopkg.in/yaml.v3 v3.0.1 // indirect modernc.org/gc/v3 v3.0.0-20240107210532-573471604cb6 // indirect modernc.org/libc v1.55.3 // indirect diff --git a/server/go.sum b/server/go.sum index 1debe9e..0fb3957 100644 --- a/server/go.sum +++ b/server/go.sum @@ -1,10 +1,25 @@ +dario.cat/mergo v1.0.0 h1:AGCNq9Evsj31mOgNPcLyXc+4PNABt905YmuqPYYpBWk= +dario.cat/mergo v1.0.0/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk= +github.com/Microsoft/go-winio v0.5.2/go.mod h1:WpS1mjBmmwHBEWmogvA2mj8546UReBk4v8QkMxJ6pZY= +github.com/Microsoft/go-winio v0.6.2 h1:F2VQgta7ecxGYO8k3ZZz3RS8fVIXVxONVUPlNERoyfY= +github.com/Microsoft/go-winio v0.6.2/go.mod h1:yd8OoFMLzJbo9gZq8j5qaps8bJ9aShtEA8Ipt1oGCvU= +github.com/ProtonMail/go-crypto v1.1.6 h1:ZcV+Ropw6Qn0AX9brlQLAUXfqLBc7Bl+f/DmNxpLfdw= +github.com/ProtonMail/go-crypto v1.1.6/go.mod h1:rA3QumHc/FZ8pAHreoekgiAbzpNsfQAosU5td4SnOrE= github.com/RaveNoX/go-jsoncommentstrip v1.0.0/go.mod h1:78ihd09MekBnJnxpICcwzCMzGrKSKYe4AqU6PDYYpjk= +github.com/anmitsu/go-shlex v0.0.0-20200514113438-38f4b401e2be h1:9AeTilPcZAjCFIImctFaOjnTIavg87rW78vTPkQqLI8= +github.com/anmitsu/go-shlex v0.0.0-20200514113438-38f4b401e2be/go.mod h1:ySMOLuWl6zY27l47sB3qLNK6tF2fkHG55UZxx8oIVo4= 
github.com/apapsch/go-jsonmerge/v2 v2.0.0 h1:axGnT1gRIfimI7gJifB699GoE/oq+F2MU7Dml6nw9rQ= github.com/apapsch/go-jsonmerge/v2 v2.0.0/go.mod h1:lvDnEdqiQrp0O42VQGgmlKpxL1AP2+08jFMw88y4klk= +github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5 h1:0CwZNZbxp69SHPdPJAN/hZIm0C4OItdklCFmMRWYpio= +github.com/armon/go-socks5 v0.0.0-20160902184237-e75332964ef5/go.mod h1:wHh0iHkYZB8zMSxRWpUBQtwG5a7fFgvEO+odwuTv2gs= github.com/bmatcuk/doublestar v1.1.1/go.mod h1:UD6OnuiIn0yFxxA2le/rnRU1G4RaI4UvFv1sNto9p6w= github.com/chzyer/logex v1.1.10/go.mod h1:+Ywpsq7O8HXn0nuIou7OrIPyXbp3wmkHB+jjWRnGsAI= github.com/chzyer/readline v0.0.0-20180603132655-2972be24d48e/go.mod h1:nSuG5e5PlCu98SY8svDHJxuZscDgtXS6KTTbou5AhLI= github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMnBNeIyt5eFwwo7qiLfzFZmjNmxjkiQlU= +github.com/cloudflare/circl v1.6.3 h1:9GPOhQGF9MCYUeXyMYlqTR6a5gTrgR/fBLXvUgtVcg8= +github.com/cloudflare/circl v1.6.3/go.mod h1:2eXP6Qfat4O/Yhh8BznvKnJ+uzEoTQ6jVKJRn81BiS4= +github.com/cyphar/filepath-securejoin v0.6.1 h1:5CeZ1jPXEiYt3+Z6zqprSAgSWiggmpVyciv8syjIpVE= +github.com/cyphar/filepath-securejoin v0.6.1/go.mod h1:A8hd4EnAeyujCJRrICiOWqjS1AX0a9kM5XL+NwKoYSc= github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM= @@ -14,13 +29,27 @@ github.com/dprotaso/go-yit v0.0.0-20220510233725-9ba8df137936 h1:PRxIJD8XjimM5aT github.com/dprotaso/go-yit v0.0.0-20220510233725-9ba8df137936/go.mod h1:ttYvX5qlB+mlV1okblJqcSMtR4c52UKxDiX9GRBS8+Q= github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY= github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto= +github.com/elazarl/goproxy v1.7.2 h1:Y2o6urb7Eule09PjlhQRGNsqRfPmYI3KKQLFpCAV3+o= +github.com/elazarl/goproxy 
v1.7.2/go.mod h1:82vkLNir0ALaW14Rc399OTTjyNREgmdL2cVoIbS6XaE= +github.com/emirpasic/gods v1.18.1 h1:FXtiHYKDGKCW2KzwZKx0iC0PQmdlorYgdFG9jPXJ1Bc= +github.com/emirpasic/gods v1.18.1/go.mod h1:8tpGGwCnJ5H4r6BWwaV6OrWmMoPhUl5jm/FMNAnJvWQ= github.com/fsnotify/fsnotify v1.4.7/go.mod h1:jwhsz4b93w/PPRr/qN1Yymfu8t87LnFCMoQvtojpjFo= github.com/fsnotify/fsnotify v1.4.9 h1:hsms1Qyu0jgnwNXIxa+/V/PDsU6CfLf6CNO8H7IWoS4= github.com/fsnotify/fsnotify v1.4.9/go.mod h1:znqG4EE+3YCdAaPaxE2ZRY/06pZUdp0tY4IgpuI1SZQ= github.com/getkin/kin-openapi v0.135.0 h1:751SjYfbiwqukYuVjwYEIKNfrSwS5YpA7DZnKSwQgtg= github.com/getkin/kin-openapi v0.135.0/go.mod h1:6dd5FJl6RdX4usBtFBaQhk9q62Yb2J0Mk5IhUO/QqFI= +github.com/gliderlabs/ssh v0.3.8 h1:a4YXD1V7xMF9g5nTkdfnja3Sxy1PVDCj1Zg4Wb8vY6c= +github.com/gliderlabs/ssh v0.3.8/go.mod h1:xYoytBv1sV0aL3CavoDuJIQNURXkkfPA/wxQ1pL1fAU= github.com/go-chi/chi/v5 v5.2.4 h1:WtFKPHwlywe8Srng8j2BhOD9312j9cGUxG1SP4V2cR4= github.com/go-chi/chi/v5 v5.2.4/go.mod h1:X7Gx4mteadT3eDOMTsXzmI4/rwUpOwBHLpAfupzFJP0= +github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376 h1:+zs/tPmkDkHx3U66DAb0lQFJrpS6731Oaa12ikc+DiI= +github.com/go-git/gcfg v1.5.1-0.20230307220236-3a3c6141e376/go.mod h1:an3vInlBmSxCcxctByoQdvwPiA7DTK7jaaFDBTtu0ic= +github.com/go-git/go-billy/v5 v5.9.0 h1:jItGXszUDRtR/AlferWPTMN4j38BQ88XnXKbilmmBPA= +github.com/go-git/go-billy/v5 v5.9.0/go.mod h1:jCnQMLj9eUgGU7+ludSTYoZL/GGmii14RxKFj7ROgHw= +github.com/go-git/go-git-fixtures/v4 v4.3.2-0.20231010084843-55a94097c399 h1:eMje31YglSBqCdIqdhKBW8lokaMrL3uTkpGYlE2OOT4= +github.com/go-git/go-git-fixtures/v4 v4.3.2-0.20231010084843-55a94097c399/go.mod h1:1OCfN199q1Jm3HZlxleg+Dw/mwps2Wbk9frAWm+4FII= +github.com/go-git/go-git/v5 v5.19.0 h1:+WkVUQZSy/F1Gb13udrMKjIM2PrzsNfDKFSfo5tkMtc= +github.com/go-git/go-git/v5 v5.19.0/go.mod h1:Pb1v0c7/g8aGQJwx9Us09W85yGoyvSwuhEGMH7zjDKQ= github.com/go-openapi/jsonpointer v0.22.4 h1:dZtK82WlNpVLDW2jlA1YCiVJFVqkED1MegOUy9kR5T4= github.com/go-openapi/jsonpointer v0.22.4/go.mod 
h1:elX9+UgznpFhgBuaMQ7iu4lvvX1nvNsesQ3oxmYTw80= github.com/go-openapi/swag/jsonname v0.25.4 h1:bZH0+MsS03MbnwBXYhuTttMOqk+5KcQ9869Vye1bNHI= @@ -30,6 +59,8 @@ github.com/go-openapi/testify/v2 v2.0.2/go.mod h1:HCPmvFFnheKK2BuwSA0TbbdxJ3I16p github.com/go-task/slim-sprig v0.0.0-20210107165309-348f09dbbbc0/go.mod h1:fyg7847qk6SyHyPtNmDHnmrv/HOrqktSC+C9fM+CJOE= github.com/go-test/deep v1.0.8 h1:TDsG77qcSprGbC6vTN8OuXp5g+J+b5Pcguhf7Zt61VM= github.com/go-test/deep v1.0.8/go.mod h1:5C2ZWiW0ErCdrYzpqxLbTX7MG14M9iiw8DgHncVwcsE= +github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8 h1:f+oWsMOmNPc8JmEHVZIycC7hBoQxHH9pNKQORJNozsQ= +github.com/golang/groupcache v0.0.0-20241129210726-2c02b8208cf8/go.mod h1:wcDNUvekVysuuOpQKo3191zZyTpiI6se1N1ULghS0sw= github.com/golang/protobuf v1.2.0/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U= github.com/golang/protobuf v1.4.0-rc.1/go.mod h1:ceaxUfeHdC40wWswd/P6IGgMaK3YpKi5j83Wpe3EHw8= github.com/golang/protobuf v1.4.0-rc.1.0.20200221234624-67d41d38c208/go.mod h1:xKAWHe0F5eneWXFV3EuXVDTCmh+JuBKY0li0aMyXATA= @@ -43,8 +74,8 @@ github.com/google/go-cmp v0.3.0/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMyw github.com/google/go-cmp v0.3.1/go.mod h1:8QqcDgzrUqlUb/G2PQTWiueGozuR1884gddMywk6iLU= github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= github.com/google/go-cmp v0.5.5/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE= -github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= -github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= +github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8= +github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU= github.com/google/pprof v0.0.0-20210407192527-94a9f03dee38/go.mod h1:kpwsk12EmLew5upagYY7GY0pfYCcupk39gWOCRROcvE= github.com/google/pprof v0.0.0-20240409012703-83162a5b38cd h1:gbpYu9NMq8jhDVbvlGkMFWCjLFlqqEZjEmObmhUy6Vo= 
github.com/google/pprof v0.0.0-20240409012703-83162a5b38cd/go.mod h1:kf6iHlnVGwgKolg33glAes7Yg/8iWP8ukqeldJSO7jw= @@ -54,9 +85,15 @@ github.com/hashicorp/golang-lru/v2 v2.0.7 h1:a+bsQ5rvGLjzHuww6tVxozPZFVghXaHOwFs github.com/hashicorp/golang-lru/v2 v2.0.7/go.mod h1:QeFd9opnmA6QUJc5vARoKUSoFhyfM2/ZepoAG6RGpeM= github.com/hpcloud/tail v1.0.0/go.mod h1:ab1qPbhIpdTxEkNHXyeSf5vhxWSCs/tWer42PpOxQnU= github.com/ianlancetaylor/demangle v0.0.0-20200824232613-28f6c0f3b639/go.mod h1:aSSvb/t6k1mPoxDqO4vJh6VOCGPwU4O0C2/Eqndh1Sc= +github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99 h1:BQSFePA1RWJOlocH6Fxy8MmwDt+yVQYULKfN0RoTN8A= +github.com/jbenet/go-context v0.0.0-20150711004518-d14ea06fba99/go.mod h1:1lJo3i6rXxKeerYnT8Nvf0QmHCRC1n8sfWVwXF2Frvo= github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY= github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y= github.com/juju/gnuflag v0.0.0-20171113085948-2ce1bb71843d/go.mod h1:2PavIy+JPciBPrBUjwbNvtwB6RQlve+hkpll6QSNmOE= +github.com/kevinburke/ssh_config v1.2.0 h1:x584FjTGwHzMwvHx18PXxbBVzfnxogHaAReU4gf13a4= +github.com/kevinburke/ssh_config v1.2.0/go.mod h1:CT57kijsi8u/K/BOFA39wgDQJ9CxiF4nAY/ojJ6r6mM= +github.com/klauspost/cpuid/v2 v2.3.0 h1:S4CRMLnYUhGeDFDqkGriYKdfoFlDnMtqTiI/sFzhA9Y= +github.com/klauspost/cpuid/v2 v2.3.0/go.mod h1:hqwkgyIinND0mEev00jJYCxPNVRVXFQeu1XKlok6oO0= github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= @@ -95,27 +132,37 @@ github.com/onsi/gomega v1.7.0/go.mod h1:ex+gbHU/CVuBBDIJjb2X0qEXbFg53c61hWP/1Cpa github.com/onsi/gomega v1.7.1/go.mod h1:XdKZgCCFLUoM/7CFJVPcG8C1xQ1AJ0vpAezJrB7JYyY= github.com/onsi/gomega v1.10.1/go.mod h1:iN09h71vgCQne3DLsj+A5owkum+a2tYe+TOCB1ybHNo= github.com/onsi/gomega v1.17.0/go.mod 
h1:HnhC7FXeEQY45zxNK3PPoIUhzk/80Xly9PcubAlGdZY= -github.com/onsi/gomega v1.19.0 h1:4ieX6qQjPP/BfC3mpsAtIGGlxTWPeA3Inl/7DtXw1tw= github.com/onsi/gomega v1.19.0/go.mod h1:LY+I3pBVzYsTBU1AnDwOSxaYi9WoWiqgwooUqq9yPro= +github.com/onsi/gomega v1.34.1 h1:EUMJIKUjM8sKjYbtxQI9A4z2o+rruxnzNvpknOXie6k= +github.com/onsi/gomega v1.34.1/go.mod h1:kU1QgUvBDLXBJq618Xvm2LUX6rSAfRaFRTcdOeDLwwY= github.com/perimeterx/marshmallow v1.1.5 h1:a2LALqQ1BlHM8PZblsDdidgv1mWi1DgC2UmX50IvK2s= github.com/perimeterx/marshmallow v1.1.5/go.mod h1:dsXbUu8CRzfYP5a87xpp0xq9S3u0Vchtcl8we9tYaXw= github.com/philippgille/chromem-go v0.7.0 h1:4jfvfyKymjKNfGxBUhHUcj1kp7B17NL/I1P+vGh1RvY= github.com/philippgille/chromem-go v0.7.0/go.mod h1:hTd+wGEm/fFPQl7ilfCwQXkgEUxceYh86iIdoKMolPo= +github.com/pjbgf/sha1cd v0.6.0 h1:3WJ8Wz8gvDz29quX1OcEmkAlUg9diU4GxJHqs0/XiwU= +github.com/pjbgf/sha1cd v0.6.0/go.mod h1:lhpGlyHLpQZoxMv8HcgXvZEhcGs0PG/vsZnEJ7H0iCM= +github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= +github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE= github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo= -github.com/rogpeppe/go-internal v1.13.1 h1:KvO1DLK/DRN07sQ1LQKScxyZJuNnedQ5/wKSR38lUII= -github.com/rogpeppe/go-internal v1.13.1/go.mod h1:uMEvuHeurkdAXX61udpOXGD/AzZDWNMNyH2VO9fmH0o= -github.com/sergi/go-diff v1.1.0 h1:we8PVUC3FE2uYfodKH/nBHMSetSfHDR6scGdBi+erh0= +github.com/rogpeppe/go-internal v1.14.1 
h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ= +github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc= github.com/sergi/go-diff v1.1.0/go.mod h1:STckp+ISIX8hZLjrqAeVduY0gWCT9IjLuqbuNXdaHfM= +github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3 h1:n661drycOFuPLCN3Uc8sB6B/s6Z4t2xvBgU1htSHuq8= +github.com/sergi/go-diff v1.3.2-0.20230802210424-5b0b94c5c0d3/go.mod h1:A0bzQcvG0E7Rwjx0REVgAGH58e96+X0MeOfepqsbeW4= +github.com/sirupsen/logrus v1.7.0/go.mod h1:yWOB1SBYBC5VeMP7gHvWumXLIWorT60ONWic61uBYv0= +github.com/skeema/knownhosts v1.3.1 h1:X2osQ+RAjK76shCbvhHHHVl3ZlgDm8apHEHFqRjnBY8= +github.com/skeema/knownhosts v1.3.1/go.mod h1:r7KTdC8l4uxWRyK2TpQZ/1o5HaSzh06ePQNxPwTcfiY= github.com/speakeasy-api/jsonpath v0.6.3 h1:c+QPwzAOdrWvzycuc9HFsIZcxKIaWcNpC+xhOW9rJxU= github.com/speakeasy-api/jsonpath v0.6.3/go.mod h1:2cXloNuQ+RSXi5HTRaeBh7JEmjRXTiaKpFTdZiL7URI= github.com/speakeasy-api/openapi v1.19.2 h1:md90tE71/M8jS3cuRlsuWP5Aed4xoG5PSRvXeZgCv/M= github.com/speakeasy-api/openapi v1.19.2/go.mod h1:UfKa7FqE4jgexJZuj51MmdHAFGmDv0Zaw3+yOd81YKU= github.com/spkg/bom v0.0.0-20160624110644-59b7046e48ad/go.mod h1:qLr4V1qq6nMqFKkMo8ZTx3f+BZEkzsRUY10Xsm2mwU0= github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= +github.com/stretchr/testify v1.2.2/go.mod h1:a8OnRcib4nhh0OaRAV+Yts87kKdq0PP7pXfy6kDkUVs= github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= github.com/stretchr/testify v1.5.1/go.mod h1:5W2xD1RspED5o8YsWQXVCued0rvSQ+mT+I5cxcmMvtA= @@ -127,14 +174,19 @@ github.com/vmware-labs/yaml-jsonpath v0.3.2 h1:/5QKeCBGdsInyDCyVNLbXyilb61MXGi9N github.com/vmware-labs/yaml-jsonpath v0.3.2/go.mod h1:U6whw1z03QyqgWdgXxvVnQ90zN1BWz5V+51Ewf8k+rQ= github.com/woodsbury/decimal128 v1.4.0 h1:xJATj7lLu4f2oObouMt2tgGiElE5gO6mSWUjQsBgUlc= github.com/woodsbury/decimal128 v1.4.0/go.mod 
h1:BP46FUrVjVhdTbKT+XuQh2xfQaGki9LMIRJSFuh6THU= +github.com/xanzy/ssh-agent v0.3.3 h1:+/15pJfg/RsTxqYcX6fHqOXZwwMP+2VyYWJeWM2qQFM= +github.com/xanzy/ssh-agent v0.3.3/go.mod h1:6dzNDKs0J9rVPHPhaGCukekBHKqfl+L3KghI1Bc68Uw= github.com/yuin/goldmark v1.2.1/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc= go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg= golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= golang.org/x/crypto v0.0.0-20191011191535-87dc89f01550/go.mod h1:yigFU9vqHzYiE8UmvKecakEJjdnWj3jj499lnFckfCI= golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto= +golang.org/x/crypto v0.0.0-20220622213112-05595931fe9d/go.mod h1:IxCIyHEi3zRg3s0A5j5BB6A9Jmi73HwBIUl50j+osU4= golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI= golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q= +golang.org/x/exp v0.0.0-20260410095643-746e56fc9e2f h1:W3F4c+6OLc6H2lb//N1q4WpJkhzJCK5J6kUi1NTVXfM= +golang.org/x/exp v0.0.0-20260410095643-746e56fc9e2f/go.mod h1:J1xhfL/vlindoeF/aINzNzt2Bket5bjo9sdOYzOsU80= golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA= golang.org/x/mod v0.34.0 h1:xIHgNUUnW6sYkcM5Jleh05DvLOtwc6RitGHbDk4akRI= golang.org/x/mod v0.34.0/go.mod h1:ykgH52iCZe79kzLLMhyCUzhMci+nQj+0XkbXpNYtVjY= @@ -144,9 +196,10 @@ golang.org/x/net v0.0.0-20190620200207-3b0461eec859/go.mod h1:z5CRVTTTmAJ677TzLL golang.org/x/net v0.0.0-20200520004742-59133d7f0dd7/go.mod h1:qpuaurCH72eLCgpAm/N6yyVIVM9cpaDIP3A8BGJEC5A= golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwYZr8TS3Oi6o0r6Gce1SSxlDquU= golang.org/x/net v0.0.0-20210428140749-89ef3d95e781/go.mod h1:OJAsFXCWl8Ukc7SiCT/9KSuxbyM7479/AVlXFRxuMCk= +golang.org/x/net v0.0.0-20211112202133-69e39bad7dc2/go.mod 
h1:9nx3DQGgdP8bBQD5qxJ1jj9UTztislL4KSBs9R2vV5Y= golang.org/x/net v0.0.0-20220225172249-27dd8689420f/go.mod h1:CfG3xpIq0wQ8r1q4Su4UZFWDARRcnwPjda9FqA0JpMk= -golang.org/x/net v0.52.0 h1:He/TN1l0e4mmR3QqHMT2Xab3Aj3L9qjbhRm78/6jrW0= -golang.org/x/net v0.52.0/go.mod h1:R1MAz7uMZxVMualyPXb+VaqGSa3LIaUqk0eEt3w36Sw= +golang.org/x/net v0.53.0 h1:d+qAbo5L0orcWAr0a9JweQpjXF19LMXJE8Ey7hwOdUA= +golang.org/x/net v0.53.0/go.mod h1:JvMuJH7rrdiCfbeHoo3fCQU24Lf5JJwT9W3sJFulfgs= golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM= @@ -157,20 +210,25 @@ golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5h golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20190904154756-749cb33beabd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191005200804-aed5e4c7ecf9/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys v0.0.0-20191026070338-33540a1f6037/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191120155948-bd437916bb0e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20191204072324-ce4227a45e2e/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200323222414-85ca7c5b95cd/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20200930185726-fdedc70b468f/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210112080510-489259a85091/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= +golang.org/x/sys 
v0.0.0-20210124154548-22da62e12c0c/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210423082822-04245dca01da/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20210615035016-665e8c7367d1/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.0.0-20211216021012-1d35b9e2eb4e/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= golang.org/x/sys v0.43.0 h1:Rlag2XtaFTxp19wS8MXlJwTvoh8ArU6ezoyFsMyCTNI= golang.org/x/sys v0.43.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw= golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo= golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8= +golang.org/x/term v0.42.0 h1:UiKe+zDFmJobeJ5ggPwOshJIVt6/Ft0rcfrXZDLWAWY= +golang.org/x/term v0.42.0/go.mod h1:Dq/D+snpsbazcBG5+F9Q1n2rXV8Ma+71xEjTRufARgY= golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ= @@ -201,11 +259,12 @@ gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EV gopkg.in/fsnotify.v1 v1.4.7/go.mod h1:Tz8NjZHkW78fSQdbUxIjBTcgA1z1m8ZHf0WmKUhAMys= gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7 h1:uRGJdciOHaEIrze2W8Q3AKkepLTh2hOroT7a+7czfdQ= gopkg.in/tomb.v1 v1.0.0-20141024135613-dd632973f1e7/go.mod h1:dt/ZhP58zS4L8KSrWDmTeBkI65Dw0HsyUHuEVlX15mw= +gopkg.in/warnings.v0 v0.1.2 h1:wFXVbFY8DY5/xOe1ECiWdKCzZlxgshcYVNkBHstARME= +gopkg.in/warnings.v0 v0.1.2/go.mod h1:jksf8JmL6Qr/oQM2OXTHunEvvTAsrWBLb6OOjuVWRNI= gopkg.in/yaml.v2 v2.2.1/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 
gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= gopkg.in/yaml.v2 v2.3.0/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= -gopkg.in/yaml.v2 v2.4.0 h1:D8xgwECY7CYvx+Y2n4sBz93Jn9JRvxdiyyo8CTfuKaY= gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ= gopkg.in/yaml.v3 v3.0.0-20191026110619-0b21df46bc1d/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/server/internal/callgraph/callgraph.go b/server/internal/callgraph/callgraph.go new file mode 100644 index 0000000..75a9a9d --- /dev/null +++ b/server/internal/callgraph/callgraph.go @@ -0,0 +1,328 @@ +// Package callgraph extracts approximate caller→callee edges from the +// symbols + refs tables that the existing indexer populates. The output +// is the call_edges table, which the PR5 community detector consumes as +// the structural signal for Louvain clustering. +// +// IMPORTANT — this is a CO-OCCURRENCE graph, not a precision call graph. +// We resolve callees by name only (refs.name → symbols.name), constrained +// by the caller's enclosing function/method scope (refs.line inside +// symbols.line..end_line). Weights are inversely proportional to the +// callee-name popcount so common names like init/run/handle contribute +// less to the structural signal. The eval harness (eval/) verifies the +// downstream community quality on hand-labeled fixtures. +// +// Why not use tree-sitter scopes for true resolution? Two reasons: +// +// 1. Cost. A precision call graph would require per-language scope +// analysis (Python globals, Go receiver types, JS module resolution). +// The Louvain community detector is robust to noisy edges — adding +// 50% precision typically buys <5% recall improvement in practice, +// not worth the complexity budget. +// +// 2. 
Substitutability. The eval harness in eval/ measures recall@1 on +// hand-labeled fixtures. If this heuristic falls below threshold for +// a language, we swap in the symbol co-occurrence fallback (same-file +// edges) without touching downstream code — only the source column +// in call_edges changes. +package callgraph + +import ( + "context" + "database/sql" + "fmt" + "strings" +) + +// Source values written to call_edges.source. Useful when the eval +// harness decides to swap one graph for another, or when we want to +// compare two heuristics side-by-side on the same project. +const ( + SourceRefsHeuristic = "refs_heuristic" + SourceCoOccurrence = "co_occurrence" +) + +// DefaultPopcountDrop is the popcount threshold above which a callee +// name is considered too ambiguous to contribute meaningful edges. 20 +// is empirical — names with 20+ definitions in one project are nearly +// always identifiers like "init", "process", "handle". Lowering further +// hurts recall on legitimate library-style code (many small handlers). +const DefaultPopcountDrop = 20 + +// Options configures Build. +type Options struct { + // PopcountDrop is the upper bound on the number of callee candidates + // for a single ref-name lookup. Above this we treat the name as + // noise and emit no edges. Default: DefaultPopcountDrop. + PopcountDrop int + // SameFileBonus multiplies weight when caller and callee live in + // the same file — strong signal that the lookup is the intended + // target. Default: 2.0. + SameFileBonus float64 + // SameParentBonus multiplies weight when caller and callee share + // parent_name (same class/module). Default: 1.5. + SameParentBonus float64 + // MinWeight drops edges whose final weight falls below this. Avoids + // flooding the graph with vanishingly-weighted edges that contribute + // noise to Louvain. Default: 0.01. + MinWeight float64 +} + +// DefaultOptions returns the production-tuned defaults. 
+func DefaultOptions() Options { + return Options{ + PopcountDrop: DefaultPopcountDrop, + SameFileBonus: 2.0, + SameParentBonus: 1.5, + MinWeight: 0.01, + } +} + +// Stats reports what Build produced. Surfaced in logs + the eval +// harness output. +type Stats struct { + RefsConsidered int + RefsWithCaller int + RefsAboveThreshold int + EdgesEmitted int + EdgesAccumulated int // distinct (caller, callee) pairs in DB +} + +// Build runs the refs-heuristic extractor against a single project and +// REPLACES every call_edges row for that project_path. Idempotent — call +// after each FinishIndexing. Build runs directly on the *sql.DB: the +// DELETE of prior edges executes outside the transaction that wraps the +// inserts, so the two steps are not atomic with each other. +func Build(ctx context.Context, db *sql.DB, projectPath string, opts Options) (Stats, error) { + if opts.PopcountDrop <= 0 { + opts.PopcountDrop = DefaultPopcountDrop + } + if opts.SameFileBonus == 0 { + opts.SameFileBonus = 2.0 + } + if opts.SameParentBonus == 0 { + opts.SameParentBonus = 1.5 + } + if opts.MinWeight == 0 { + opts.MinWeight = 0.01 + } + + if _, err := db.ExecContext(ctx, `DELETE FROM call_edges WHERE project_path = ?`, projectPath); err != nil { + return Stats{}, fmt.Errorf("clear prior edges: %w", err) + } + + // 1. Build the callee-name → []callee_symbol index for this project. + // Function/method symbols only — we don't model field accesses. + calleesByName, err := loadCallees(ctx, db, projectPath) + if err != nil { + return Stats{}, err + } + + // 2. Build the file → [...caller symbols ordered by line] index so + // refs can resolve their enclosing scope with a per-file linear scan. + callersByFile, err := loadCallers(ctx, db, projectPath) + if err != nil { + return Stats{}, err + } + + // 3. Walk refs, emit edges. Accumulator collapses duplicate (caller, + // callee) pairs by summing weights — gives heavy-use edges a + // proportional contribution to Louvain modularity.
+ rows, err := db.QueryContext(ctx, + `SELECT name, file_path, line FROM refs WHERE project_path = ?`, projectPath) + if err != nil { + return Stats{}, fmt.Errorf("scan refs: %w", err) + } + defer rows.Close() + + type edgeKey struct{ caller, callee string } + edges := make(map[edgeKey]float64) + + stats := Stats{} + for rows.Next() { + var ( + name, filePath string + line int + ) + if err := rows.Scan(&name, &filePath, &line); err != nil { + return Stats{}, fmt.Errorf("scan ref row: %w", err) + } + stats.RefsConsidered++ + + caller := resolveCaller(callersByFile, filePath, line) + if caller == nil { + continue + } + stats.RefsWithCaller++ + + candidates := calleesByName[name] + if len(candidates) == 0 || len(candidates) > opts.PopcountDrop { + continue + } + stats.RefsAboveThreshold++ + + baseWeight := 1.0 / float64(len(candidates)) + for _, cb := range candidates { + // Self-edges (recursion) are valid for the call graph but + // they don't contribute to community separation — drop them + // to keep the graph clean. + if cb.ID == caller.ID { + continue + } + w := baseWeight + if cb.FilePath == caller.FilePath { + w *= opts.SameFileBonus + } + if cb.ParentName != "" && cb.ParentName == caller.ParentName { + w *= opts.SameParentBonus + } + if w < opts.MinWeight { + continue + } + edges[edgeKey{caller: caller.ID, callee: cb.ID}] += w + stats.EdgesEmitted++ + } + } + if err := rows.Err(); err != nil { + return Stats{}, fmt.Errorf("refs iterator: %w", err) + } + + // 4. Bulk insert the accumulated edges. modernc.org/sqlite handles + // multi-statement INSERTs cleanly when wrapped in a transaction. 
+ if len(edges) == 0 { + return stats, nil + } + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return stats, fmt.Errorf("begin tx: %w", err) + } + stmt, err := tx.PrepareContext(ctx, + `INSERT INTO call_edges (project_path, caller_symbol, callee_symbol, weight, source) + VALUES (?, ?, ?, ?, ?)`) + if err != nil { + _ = tx.Rollback() + return stats, fmt.Errorf("prepare insert: %w", err) + } + defer stmt.Close() + for k, w := range edges { + if _, err := stmt.ExecContext(ctx, projectPath, k.caller, k.callee, w, SourceRefsHeuristic); err != nil { + _ = tx.Rollback() + return stats, fmt.Errorf("insert edge: %w", err) + } + } + if err := tx.Commit(); err != nil { + return stats, fmt.Errorf("commit edges: %w", err) + } + stats.EdgesAccumulated = len(edges) + return stats, nil +} + +// callerSymbol is a slimmed projection of symbols for the per-file +// caller-resolution structure. We carry FilePath + ParentName so the +// weight calculation can apply the same-file / same-parent bonuses +// without re-querying. +type callerSymbol struct { + ID string + Line int + EndLine int + FilePath string + ParentName string +} + +// loadCallers groups function/method symbols by file_path, sorted by +// (line ASC). Resolution at a given line is a linear walk over the +// file's symbols to the narrowest enclosing scope. +func loadCallers(ctx context.Context, db *sql.DB, projectPath string) (map[string][]callerSymbol, error) { + rows, err := db.QueryContext(ctx, ` + SELECT id, line, end_line, file_path, COALESCE(parent_name, '') + FROM symbols + WHERE project_path = ?
AND kind IN ('function', 'method') + ORDER BY file_path, line`, projectPath) + if err != nil { + return nil, fmt.Errorf("load callers: %w", err) + } + defer rows.Close() + out := map[string][]callerSymbol{} + for rows.Next() { + var c callerSymbol + if err := rows.Scan(&c.ID, &c.Line, &c.EndLine, &c.FilePath, &c.ParentName); err != nil { + return nil, err + } + out[c.FilePath] = append(out[c.FilePath], c) + } + return out, rows.Err() +} + +// resolveCaller returns the narrowest function/method symbol enclosing +// (file_path, line). When no function contains the line (e.g. a ref at +// module scope) it returns nil — module-level refs simply contribute no +// edges, which is the desired behaviour for community detection. +func resolveCaller(byFile map[string][]callerSymbol, filePath string, line int) *callerSymbol { + list, ok := byFile[filePath] + if !ok { + return nil + } + // Pick the symbol with the smallest (end_line - line) span that + // still contains `line`. Nested scopes are rare in code + // (decorators, closures) but cheap to walk linearly. + var best *callerSymbol + for i := range list { + s := &list[i] + if line < s.Line || line > s.EndLine { + continue + } + if best == nil || (s.EndLine-s.Line) < (best.EndLine-best.Line) { + best = s + } + } + return best +} + +// loadCallees indexes function/method symbols by name. Returned slice +// per name MAY contain multiple entries (overloads / homonyms across +// files). The Build loop reads len() of the slice as the popcount used +// for inverse-frequency weighting. +func loadCallees(ctx context.Context, db *sql.DB, projectPath string) (map[string][]callerSymbol, error) { + rows, err := db.QueryContext(ctx, ` + SELECT id, line, end_line, file_path, COALESCE(parent_name, ''), name + FROM symbols + WHERE project_path = ?
AND kind IN ('function', 'method')`, projectPath) + if err != nil { + return nil, fmt.Errorf("load callees: %w", err) + } + defer rows.Close() + out := map[string][]callerSymbol{} + for rows.Next() { + var ( + c callerSymbol + name string + ) + if err := rows.Scan(&c.ID, &c.Line, &c.EndLine, &c.FilePath, &c.ParentName, &name); err != nil { + return nil, err + } + out[name] = append(out[name], c) + } + return out, rows.Err() +} + +// CountEdges returns the number of rows in call_edges for a project. +// Used by /api/v1/workspaces/{id}/repos to surface graph completion +// state in the dashboard ("graph: 1234 edges"). +func CountEdges(ctx context.Context, db *sql.DB, projectPath string) (int, error) { + var n int + err := db.QueryRowContext(ctx, + `SELECT COUNT(*) FROM call_edges WHERE project_path = ?`, projectPath).Scan(&n) + if err != nil { + return 0, fmt.Errorf("count edges: %w", err) + } + return n, nil +} + +// --- helpers shared with the eval harness --- + +// NormaliseName lowercases identifiers and strips Go's exported-case +// distinction so the eval harness can specify pairs without worrying +// about which exact form the parser captured. Caller must ensure both +// sides are passed through the same way. +func NormaliseName(name string) string { + return strings.ToLower(name) +} diff --git a/server/internal/callgraph/callgraph_test.go b/server/internal/callgraph/callgraph_test.go new file mode 100644 index 0000000..bf55f10 --- /dev/null +++ b/server/internal/callgraph/callgraph_test.go @@ -0,0 +1,282 @@ +package callgraph + +import ( + "context" + "database/sql" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/db" +) + +// fixtureDB stands up a project with hand-crafted symbols + refs so we +// can assert the heuristic's outputs without driving the full indexer. +// Each test calls this helper and supplies a seed function. 
+func fixtureDB(t *testing.T, seed func(d *sql.DB)) (*sql.DB, string) { + t.Helper() + d, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open: %v", err) + } + t.Cleanup(func() { _ = d.Close() }) + const projectPath = "github.com/test/repo@main" + now := time.Now().UTC().Format(time.RFC3339Nano) + if _, err := d.Exec(`INSERT INTO projects (host_path, container_path, languages, settings, stats, status, created_at, updated_at, path_hash) + VALUES (?, ?, '[]', '{}', '{}', 'created', ?, ?, 'abc')`, + projectPath, projectPath, now, now); err != nil { + t.Fatalf("seed project: %v", err) + } + seed(d) + return d, projectPath +} + +func insertSymbol(t *testing.T, d *sql.DB, projectPath, id, name, kind, file string, line, end int, parent string) { + t.Helper() + if _, err := d.Exec( + `INSERT INTO symbols (id, project_path, name, kind, file_path, line, end_line, language, signature, parent_name) + VALUES (?, ?, ?, ?, ?, ?, ?, 'go', '', NULLIF(?, ''))`, + id, projectPath, name, kind, file, line, end, parent, + ); err != nil { + t.Fatalf("insert symbol: %v", err) + } +} + +func insertRef(t *testing.T, d *sql.DB, projectPath, name, file string, line int) { + t.Helper() + if _, err := d.Exec( + `INSERT INTO refs (project_path, name, file_path, line, col, language) + VALUES (?, ?, ?, ?, 0, 'go')`, + projectPath, name, file, line, + ); err != nil { + t.Fatalf("insert ref: %v", err) + } +} + +func loadEdges(t *testing.T, d *sql.DB, projectPath string) []edgeRow { + t.Helper() + rows, err := d.Query(`SELECT caller_symbol, callee_symbol, weight FROM call_edges WHERE project_path = ?`, projectPath) + if err != nil { + t.Fatalf("query edges: %v", err) + } + defer rows.Close() + var out []edgeRow + for rows.Next() { + var e edgeRow + _ = rows.Scan(&e.Caller, &e.Callee, &e.Weight) + out = append(out, e) + } + return out +} + +type edgeRow struct { + Caller, Callee string + Weight float64 +} + +// Happy path: one caller, one callee, in-the-same-file, popcount=1. 
+// Resulting weight: 1/1 × same-file bonus = 2.0. +func TestBuildSingleEdge(t *testing.T) { + const pp = "github.com/test/repo@main" + d, _ := fixtureDB(t, func(d *sql.DB) { + insertSymbol(t, d, pp, "fn1", "doX", "function", "main.go", 10, 20, "") + insertSymbol(t, d, pp, "fn2", "helper", "function", "main.go", 30, 35, "") + insertRef(t, d, pp, "helper", "main.go", 15) // call from inside fn1 + }) + + stats, err := Build(context.Background(), d, pp, DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 1 { + t.Fatalf("expected 1 edge, got %d (stats: %+v)", stats.EdgesAccumulated, stats) + } + edges := loadEdges(t, d, pp) + if len(edges) != 1 { + t.Fatalf("expected 1 row, got %d", len(edges)) + } + if edges[0].Caller != "fn1" || edges[0].Callee != "fn2" { + t.Fatalf("wrong edge direction: %+v", edges[0]) + } + if edges[0].Weight < 1.9 || edges[0].Weight > 2.1 { + t.Fatalf("expected weight ~2.0 (same-file bonus), got %v", edges[0].Weight) + } +} + +// popcount=N → weight = 1/N. Refs to a common name (e.g. "init") in +// 25 places: every edge dropped (PopcountDrop=20 default). +func TestPopcountDrop(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "caller", "doX", "function", "caller.go", 10, 20, "") + // 25 callee definitions of "init" across many files. 
+ for i := 0; i < 25; i++ { + file := "a.go" + if i%2 == 0 { + file = "b.go" + } + id := "init_" + string(rune('a'+i%26)) + string(rune('a'+i/26)) + insertSymbol(t, d, pp, id, "init", "function", file, 100+i*10, 105+i*10, "") + } + insertRef(t, d, pp, "init", "caller.go", 15) + }) + pp := "github.com/test/repo@main" + stats, err := Build(context.Background(), d, pp, DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 0 { + t.Fatalf("expected 0 edges (popcount drop), got %d", stats.EdgesAccumulated) + } +} + +// Module-scope ref: line lies outside any function's span → no edge. +func TestModuleScopeRefSkipped(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "fn", "main", "function", "main.go", 10, 20, "") + // Ref at line 5 — before any function in this file. + insertRef(t, d, pp, "main", "main.go", 5) + }) + stats, err := Build(context.Background(), d, "github.com/test/repo@main", DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 0 { + t.Fatalf("expected 0 edges, got %d (stats: %+v)", stats.EdgesAccumulated, stats) + } +} + +// Self-edges (recursion) are dropped — they don't help Louvain. +func TestSelfEdgeDropped(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "fac", "factorial", "function", "f.go", 10, 30, "") + // Ref to "factorial" from INSIDE factorial — recursion. + insertRef(t, d, pp, "factorial", "f.go", 15) + }) + stats, err := Build(context.Background(), d, "github.com/test/repo@main", DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 0 { + t.Fatalf("expected 0 edges (self-recursion), got %d", stats.EdgesAccumulated) + } +} + +// Cross-file caller/callee → 1/N weight without the same-file bonus. 
+func TestCrossFileWeight(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "main", "main", "function", "cmd/main.go", 10, 20, "") + insertSymbol(t, d, pp, "lib1", "helper", "function", "lib/a.go", 10, 15, "") + insertSymbol(t, d, pp, "lib2", "helper", "function", "lib/b.go", 10, 15, "") + // popcount=2 → base weight 0.5 + insertRef(t, d, pp, "helper", "cmd/main.go", 15) + }) + stats, err := Build(context.Background(), d, "github.com/test/repo@main", DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 2 { + t.Fatalf("expected 2 edges (one per candidate), got %d", stats.EdgesAccumulated) + } + edges := loadEdges(t, d, "github.com/test/repo@main") + for _, e := range edges { + // Cross-file callees → no same-file bonus → weight = 0.5 + if e.Weight < 0.49 || e.Weight > 0.51 { + t.Fatalf("expected cross-file weight ≈0.5, got %v", e.Weight) + } + } +} + +// Same parent (method calls within a class) → 1.5× bonus. +func TestSameParentBonus(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + // Two methods on the same struct, same file. + insertSymbol(t, d, pp, "m1", "Process", "method", "svc.go", 10, 20, "Service") + insertSymbol(t, d, pp, "m2", "helper", "method", "svc.go", 25, 35, "Service") + insertRef(t, d, pp, "helper", "svc.go", 15) + }) + stats, err := Build(context.Background(), d, "github.com/test/repo@main", DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + if stats.EdgesAccumulated != 1 { + t.Fatalf("expected 1 edge, got %d", stats.EdgesAccumulated) + } + edges := loadEdges(t, d, "github.com/test/repo@main") + // 1/1 × 2.0 (same-file) × 1.5 (same-parent) = 3.0 + if edges[0].Weight < 2.95 || edges[0].Weight > 3.05 { + t.Fatalf("expected weight ≈3.0 (file+parent bonus), got %v", edges[0].Weight) + } +} + +// Multiple refs to the same callee accumulate into one row (sum of weights). 
+func TestAccumulation(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "fn", "main", "function", "main.go", 1, 100, "") + insertSymbol(t, d, pp, "helper", "helper", "function", "main.go", 200, 210, "") + insertRef(t, d, pp, "helper", "main.go", 5) + insertRef(t, d, pp, "helper", "main.go", 6) + insertRef(t, d, pp, "helper", "main.go", 7) + }) + stats, err := Build(context.Background(), d, "github.com/test/repo@main", DefaultOptions()) + if err != nil { + t.Fatalf("Build: %v", err) + } + edges := loadEdges(t, d, "github.com/test/repo@main") + if len(edges) != 1 { + t.Fatalf("expected 1 distinct edge, got %d", len(edges)) + } + // 3 refs × (1/1 × 2.0 same-file) = 6.0 + if edges[0].Weight < 5.9 || edges[0].Weight > 6.1 { + t.Fatalf("expected accumulated weight ≈6.0, got %v", edges[0].Weight) + } + if stats.EdgesEmitted != 3 || stats.EdgesAccumulated != 1 { + t.Fatalf("unexpected stats: %+v", stats) + } +} + +// Idempotency: running Build twice yields the same edge set, not a duplicate. 
+func TestIdempotent(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "a", "doX", "function", "main.go", 10, 20, "") + insertSymbol(t, d, pp, "b", "helper", "function", "main.go", 30, 35, "") + insertRef(t, d, pp, "helper", "main.go", 15) + }) + pp := "github.com/test/repo@main" + if _, err := Build(context.Background(), d, pp, DefaultOptions()); err != nil { + t.Fatalf("first Build: %v", err) + } + first := loadEdges(t, d, pp) + if _, err := Build(context.Background(), d, pp, DefaultOptions()); err != nil { + t.Fatalf("second Build: %v", err) + } + second := loadEdges(t, d, pp) + if len(first) != len(second) { + t.Fatalf("idempotency broken: %d vs %d edges", len(first), len(second)) + } +} + +func TestCountEdges(t *testing.T) { + d, _ := fixtureDB(t, func(d *sql.DB) { + pp := "github.com/test/repo@main" + insertSymbol(t, d, pp, "a", "doX", "function", "main.go", 10, 20, "") + insertSymbol(t, d, pp, "b", "helper", "function", "main.go", 30, 35, "") + insertRef(t, d, pp, "helper", "main.go", 15) + }) + pp := "github.com/test/repo@main" + if _, err := Build(context.Background(), d, pp, DefaultOptions()); err != nil { + t.Fatalf("Build: %v", err) + } + n, err := CountEdges(context.Background(), d, pp) + if err != nil { + t.Fatalf("Count: %v", err) + } + if n != 1 { + t.Fatalf("expected 1, got %d", n) + } +} diff --git a/server/internal/callgraph/eval/eval_test.go b/server/internal/callgraph/eval/eval_test.go new file mode 100644 index 0000000..fdfb9d3 --- /dev/null +++ b/server/internal/callgraph/eval/eval_test.go @@ -0,0 +1,331 @@ +// Package eval is the gate that decides whether the refs-heuristic call +// graph is good enough to feed Louvain. Hand-labeled (caller, callee) +// pairs across Go / Python / TypeScript fixtures pass through the full +// chunker → symbols/refs → callgraph.Build pipeline, and the harness +// asserts the labeled edge appears in call_edges. 
+// +// Threshold: precision per language must be >= 0.6, matching the Plan +// agent's mitigation rule. If a language falls below, we don't ship the +// downstream Louvain feature with this graph as the substrate — the +// reasonable fallback is the symbol co-occurrence graph (edges between +// symbols in the same file), implemented as a separate Source in +// callgraph. +package eval + +import ( + "context" + "database/sql" + "fmt" + "sort" + "testing" + "time" + + "github.com/google/uuid" + + "github.com/dvcdsys/code-index/server/internal/callgraph" + "github.com/dvcdsys/code-index/server/internal/chunker" + "github.com/dvcdsys/code-index/server/internal/db" +) + +// labeledPair is one (caller, callee) expectation. The caller and callee +// values are symbol NAMES — the harness resolves them via the symbols +// table after chunking. Names that match multiple symbols (overloads) +// are disallowed in fixtures; pick unique names. +type labeledPair struct { + Caller string + Callee string +} + +// langFixture pairs a small piece of source code with the (caller, callee) +// edges the heuristic should produce. The fixture is exercised end-to-end: +// chunked → symbols+refs persisted → callgraph.Build → edge presence asserted. +type langFixture struct { + Name string // displayed in test output + Language string // language constant chunker recognises + FilePath string // virtual path; only the extension matters for some chunkers + Source string + Expected []labeledPair +} + +// Threshold below which we'd fall back to a co-occurrence graph. 0.6 is +// the Plan agent's recommended floor; tighten once we have real data. +const precisionFloor = 0.6 + +// fixtures kept tiny on purpose — every line is curated. A typical real +// project has thousands of refs; the eval is about checking the heuristic +// resolves the SHAPE correctly on small but realistic code, not about +// stress-testing throughput. 
+var fixtures = []langFixture{ + { + Name: "go-handlers", + Language: "go", + FilePath: "handlers.go", + Source: `package main + +import "fmt" + +type Server struct{} + +func (s *Server) HandleLogin(req string) string { + user := parseUser(req) + if !s.validate(user) { + return "denied" + } + return s.issueToken(user) +} + +func (s *Server) validate(user string) bool { + return user != "" +} + +func (s *Server) issueToken(user string) string { + return fmt.Sprintf("token:%s", user) +} + +func parseUser(req string) string { + return req +} + +func main() { + s := &Server{} + s.HandleLogin("user1") +} +`, + Expected: []labeledPair{ + {Caller: "HandleLogin", Callee: "parseUser"}, + {Caller: "HandleLogin", Callee: "validate"}, + {Caller: "HandleLogin", Callee: "issueToken"}, + {Caller: "main", Callee: "HandleLogin"}, + }, + }, + { + Name: "python-pipeline", + Language: "python", + FilePath: "pipeline.py", + Source: `class Pipeline: + def run(self, source): + rows = self.fetch(source) + cleaned = self.clean(rows) + return self.persist(cleaned) + + def fetch(self, source): + return open_source(source) + + def clean(self, rows): + return [r for r in rows if r] + + def persist(self, rows): + write_rows(rows) + return len(rows) + + +def open_source(name): + return [name] + + +def write_rows(rows): + print(rows) + + +def main(): + p = Pipeline() + p.run("local") +`, + Expected: []labeledPair{ + {Caller: "run", Callee: "fetch"}, + {Caller: "run", Callee: "clean"}, + {Caller: "run", Callee: "persist"}, + {Caller: "fetch", Callee: "open_source"}, + {Caller: "persist", Callee: "write_rows"}, + {Caller: "main", Callee: "run"}, + }, + }, + { + Name: "typescript-store", + Language: "typescript", + FilePath: "store.ts", + Source: `function loadConfig(): string { + return readFile("config.json"); +} + +function readFile(path: string): string { + return path; +} + +function applyMigrations(cfg: string): void { + const items = parseItems(cfg); + runItems(items); +} + +function 
parseItems(cfg: string): string[] { + return [cfg]; +} + +function runItems(items: string[]): void { + console.log(items.length); +} + +function bootstrap(): void { + const cfg = loadConfig(); + applyMigrations(cfg); +} +`, + Expected: []labeledPair{ + {Caller: "loadConfig", Callee: "readFile"}, + {Caller: "applyMigrations", Callee: "parseItems"}, + {Caller: "applyMigrations", Callee: "runItems"}, + {Caller: "bootstrap", Callee: "loadConfig"}, + {Caller: "bootstrap", Callee: "applyMigrations"}, + }, + }, +} + +// TestEval runs every fixture, records precision per language, and fails +// the build only when at least one language falls below precisionFloor. +// +// Output format is deliberately verbose — when we eventually swap the +// heuristic, this log gives a clean before/after comparison. +func TestEval(t *testing.T) { + results := map[string]float64{} + for _, fx := range fixtures { + got, total, err := runFixture(t, fx) + if err != nil { + t.Fatalf("[%s] fixture failed: %v", fx.Name, err) + } + prec := float64(got) / float64(total) + results[fx.Name] = prec + t.Logf("[%s] %d / %d expected edges present (precision %.2f)", + fx.Name, got, total, prec) + } + + // Sort for stable output regardless of map iteration order. + keys := make([]string, 0, len(results)) + for k := range results { + keys = append(keys, k) + } + sort.Strings(keys) + + failing := []string{} + for _, k := range keys { + if results[k] < precisionFloor { + failing = append(failing, fmt.Sprintf("%s=%.2f", k, results[k])) + } + } + if len(failing) > 0 { + t.Fatalf("call-graph eval failed precision floor %.2f for: %v\nFall back to co-occurrence graph (callgraph.SourceCoOccurrence) before shipping PR5.", + precisionFloor, failing) + } +} + +// runFixture executes the full chunk → persist → build → assert pipeline +// for one source file. Returns (edges-found, total-expected, err). 
+func runFixture(t *testing.T, fx langFixture) (int, int, error) { + t.Helper() + database, err := db.Open(":memory:") + if err != nil { + return 0, 0, fmt.Errorf("open db: %w", err) + } + defer database.Close() + + const projectPath = "github.com/test/repo@main" + now := time.Now().UTC().Format(time.RFC3339Nano) + if _, err := database.Exec(`INSERT INTO projects (host_path, container_path, languages, settings, stats, status, created_at, updated_at, path_hash) + VALUES (?, ?, '[]', '{}', '{}', 'created', ?, ?, 'abc')`, + projectPath, projectPath, now, now); err != nil { + return 0, 0, fmt.Errorf("seed project: %w", err) + } + + // Activate every language the chunker registry knows about — defaults + // to all when nothing is configured by tests (chunker uses package + // state). Configure with an empty slice means "default registry". + chunker.Configure(nil) + + chunks, refs, err := chunker.ChunkFile(fx.FilePath, fx.Source, fx.Language, 0) + if err != nil { + return 0, 0, fmt.Errorf("chunk: %w", err) + } + + if err := persistChunks(database, projectPath, fx.Language, chunks, refs); err != nil { + return 0, 0, err + } + + if _, err := callgraph.Build(context.Background(), database, projectPath, callgraph.DefaultOptions()); err != nil { + return 0, 0, fmt.Errorf("build: %w", err) + } + + got := 0 + for _, pair := range fx.Expected { + ok, err := hasEdgeByName(database, projectPath, pair.Caller, pair.Callee) + if err != nil { + return 0, 0, err + } + if ok { + got++ + } else { + t.Logf("[%s] MISSING edge: %s → %s", fx.Name, pair.Caller, pair.Callee) + } + } + return got, len(fx.Expected), nil +} + +// persistChunks mirrors indexer.ProcessFiles' symbol+ref persistence +// without the embedding loop. Lifted to keep the eval harness independent +// of the embeddings sidecar. 
+func persistChunks(d *sql.DB, projectPath, language string, chunks []chunker.Chunk, refs []chunker.Reference) error { + for _, c := range chunks { + if c.SymbolName == nil { + continue + } + switch c.ChunkType { + case "function", "class", "method", "type": + default: + continue + } + var sig string + if c.SymbolSignature != nil { + sig = *c.SymbolSignature + } + var parent any + if c.ParentName != nil && *c.ParentName != "" { + parent = *c.ParentName + } else { + parent = nil + } + if _, err := d.Exec( + `INSERT INTO symbols (id, project_path, name, kind, file_path, line, end_line, language, signature, parent_name) + VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + uuid.NewString(), projectPath, *c.SymbolName, c.ChunkType, c.FilePath, + c.StartLine, c.EndLine, language, sig, parent, + ); err != nil { + return fmt.Errorf("insert symbol: %w", err) + } + } + for _, r := range refs { + if _, err := d.Exec( + `INSERT INTO refs (project_path, name, file_path, line, col, language) + VALUES (?, ?, ?, ?, ?, ?)`, + projectPath, r.Name, r.FilePath, r.Line, r.Col, language, + ); err != nil { + return fmt.Errorf("insert ref: %w", err) + } + } + return nil +} + +// hasEdgeByName resolves both names to symbol ids and checks call_edges. +// Returns false (not an error) when either name resolves to zero +// symbols — that's typically a missing chunk and is informative for the +// harness output. +func hasEdgeByName(d *sql.DB, projectPath, caller, callee string) (bool, error) { + var n int + err := d.QueryRow(` + SELECT COUNT(*) FROM call_edges + WHERE project_path = ? + AND caller_symbol IN (SELECT id FROM symbols WHERE project_path = ? AND name = ?) + AND callee_symbol IN (SELECT id FROM symbols WHERE project_path = ? 
AND name = ?)`, + projectPath, projectPath, caller, projectPath, callee).Scan(&n) + if err != nil { + return false, err + } + return n > 0, nil +} diff --git a/server/internal/chunksfts/chunksfts.go b/server/internal/chunksfts/chunksfts.go new file mode 100644 index 0000000..4077e55 --- /dev/null +++ b/server/internal/chunksfts/chunksfts.go @@ -0,0 +1,254 @@ +// Package chunksfts maintains the BM25-searchable FTS5 mirror of every +// chunk that lives in the chromem-go vector store. +// +// We index the same chunk content (plus symbol_name and file_path) into +// a trigram-tokenized FTS5 virtual table so that workspace search can +// combine dense vector retrieval with sparse keyword retrieval. The +// sparse signal serves two roles: +// +// 1. Acronym / short-token precision. Short product codes and +// unique identifiers get diffuse cosine scores from the embedding +// model — BM25 over literal tokens recovers the precision. +// +// 2. Project-relevance gating. Pure dense fan-out returns N nearest +// vectors from every project's chromem collection regardless of +// semantic distance, so projects that share zero vocabulary with +// the query still show up at chunk_score ~0.2-0.3. BM25 returning +// zero hits is a strong "this project has nothing" signal that +// dense alone cannot produce. +// +// chunks_fts is filterable only by rowid; chunks_meta is the regular +// indexed shadow that lets us look up (and delete) the rows belonging +// to a given (project_path, file_path) pair efficiently. The two +// tables are kept consistent inside the indexer's per-file SQL +// transaction. +package chunksfts + +import ( + "context" + "database/sql" + "fmt" + "strings" +) + +// Chunk is the unit the indexer hands us. Mirrors vectorstore.Chunk +// with only the fields BM25 indexing cares about — the embedding stays +// in chromem. 
+type Chunk struct { + Content string + FilePath string + StartLine int + EndLine int + ChunkType string + SymbolName string + Language string +} + +// Hit is one BM25 ranking row. +type Hit struct { + FilePath string + StartLine int + EndLine int + ChunkType string + SymbolName string + Language string + Content string + // Score is positive ("higher is better"). SQLite's bm25() returns + // negative values where smaller = more relevant; we flip the sign + // at the storage boundary so callers don't need to know. + Score float64 +} + +// UpsertByFileTx replaces the FTS rows for one (project_path, file_path) +// pair atomically inside the caller's transaction. Existing rows for the +// pair are deleted first, then the new chunks are inserted. Both tables +// (chunks_meta and chunks_fts) are kept in sync via a per-row rowid +// generated by chunks_meta's autoincrement primary key. +// +// The caller (indexer) wraps this in the same per-file SQL tx that +// commits symbols/refs/file_hashes, so a tx rollback unwinds every +// SQLite-side change for the file atomically. 
+func UpsertByFileTx(ctx context.Context, tx *sql.Tx, projectPath, filePath string, chunks []Chunk) error { + if err := DeleteByFileTx(ctx, tx, projectPath, filePath); err != nil { + return err + } + if len(chunks) == 0 { + return nil + } + metaStmt, err := tx.PrepareContext(ctx, + `INSERT INTO chunks_meta + (project_path, file_path, start_line, end_line, chunk_type, symbol_name, language) + VALUES (?, ?, ?, ?, ?, ?, ?)`, + ) + if err != nil { + return fmt.Errorf("prepare chunks_meta insert: %w", err) + } + defer metaStmt.Close() + + ftsStmt, err := tx.PrepareContext(ctx, + `INSERT INTO chunks_fts(rowid, content, symbol_name, file_path) + VALUES (?, ?, ?, ?)`, + ) + if err != nil { + return fmt.Errorf("prepare chunks_fts insert: %w", err) + } + defer ftsStmt.Close() + + for _, c := range chunks { + res, err := metaStmt.ExecContext(ctx, + projectPath, filePath, + c.StartLine, c.EndLine, + nullIfEmpty(c.ChunkType), + nullIfEmpty(c.SymbolName), + nullIfEmpty(c.Language), + ) + if err != nil { + return fmt.Errorf("insert chunks_meta: %w", err) + } + rowid, err := res.LastInsertId() + if err != nil { + return fmt.Errorf("chunks_meta LastInsertId: %w", err) + } + if _, err := ftsStmt.ExecContext(ctx, rowid, c.Content, c.SymbolName, c.FilePath); err != nil { + return fmt.Errorf("insert chunks_fts: %w", err) + } + } + return nil +} + +// DeleteByFileTx removes the BM25 rows for one (project_path, file_path). +// chunks_fts must be cleared via rowid IN (SELECT rowid FROM chunks_meta +// WHERE ...) because FTS5 cannot filter directly on unindexed metadata. +func DeleteByFileTx(ctx context.Context, tx *sql.Tx, projectPath, filePath string) error { + if _, err := tx.ExecContext(ctx, + `DELETE FROM chunks_fts + WHERE rowid IN ( + SELECT rowid FROM chunks_meta WHERE project_path = ? AND file_path = ? 
+ )`, + projectPath, filePath, + ); err != nil { + return fmt.Errorf("delete chunks_fts: %w", err) + } + if _, err := tx.ExecContext(ctx, + `DELETE FROM chunks_meta WHERE project_path = ? AND file_path = ?`, + projectPath, filePath, + ); err != nil { + return fmt.Errorf("delete chunks_meta: %w", err) + } + return nil +} + +// DeleteByProjectTx wipes every BM25 row for a project. Used by the +// indexer's full-reindex wipe path and by projects.Delete so removing a +// project leaves no stranded FTS rows. +func DeleteByProjectTx(ctx context.Context, tx *sql.Tx, projectPath string) error { + if _, err := tx.ExecContext(ctx, + `DELETE FROM chunks_fts + WHERE rowid IN (SELECT rowid FROM chunks_meta WHERE project_path = ?)`, + projectPath, + ); err != nil { + return fmt.Errorf("delete chunks_fts by project: %w", err) + } + if _, err := tx.ExecContext(ctx, + `DELETE FROM chunks_meta WHERE project_path = ?`, + projectPath, + ); err != nil { + return fmt.Errorf("delete chunks_meta by project: %w", err) + } + return nil +} + +// DeleteByProject is the non-tx form for callers that don't already hold +// one (admin DeleteProject handler, manual cleanup). +func DeleteByProject(ctx context.Context, db *sql.DB, projectPath string) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() //nolint:errcheck // no-op after commit + if err := DeleteByProjectTx(ctx, tx, projectPath); err != nil { + return err + } + return tx.Commit() +} + +// SearchProject runs an OR-joined trigram FTS5 query restricted to a +// single project and returns up to `limit` chunks ranked by BM25. +// +// Empty or all-tokens-too-short queries return a nil slice without +// hitting the DB — there is nothing to match. 
+func SearchProject(ctx context.Context, db *sql.DB, projectPath, query string, limit int) ([]Hit, error) { + if limit <= 0 { + limit = 20 + } + fts5Q := buildFTS5Query(query) + if fts5Q == "" { + return nil, nil + } + rows, err := db.QueryContext(ctx, + `SELECT cm.file_path, cm.start_line, cm.end_line, + cm.chunk_type, cm.symbol_name, cm.language, + cf.content, bm25(chunks_fts) AS bm + FROM chunks_fts cf + JOIN chunks_meta cm ON cm.rowid = cf.rowid + WHERE chunks_fts MATCH ? AND cm.project_path = ? + ORDER BY bm ASC + LIMIT ?`, + fts5Q, projectPath, limit, + ) + if err != nil { + return nil, fmt.Errorf("chunks_fts search: %w", err) + } + defer rows.Close() + var out []Hit + for rows.Next() { + var ( + h Hit + chunkT sql.NullString + symName sql.NullString + language sql.NullString + bm float64 + ) + if err := rows.Scan(&h.FilePath, &h.StartLine, &h.EndLine, + &chunkT, &symName, &language, &h.Content, &bm); err != nil { + return nil, fmt.Errorf("scan chunks_fts row: %w", err) + } + h.ChunkType = chunkT.String + h.SymbolName = symName.String + h.Language = language.String + // SQLite returns more-negative bm25 for better matches. Flip so + // callers can blend with cosine-style "higher is better" scores. + h.Score = -bm + out = append(out, h) + } + if err := rows.Err(); err != nil { + return nil, fmt.Errorf("iterate chunks_fts: %w", err) + } + return out, nil +} + +// buildFTS5Query turns a free-text query into a safe FTS5 expression: +// each whitespace-separated word becomes a double-quoted phrase, all +// phrases are OR-joined. Single-character tokens are dropped (trigram +// tokenizer falls back to prefix-search for sub-3 tokens which is +// noisy). Internal double quotes are doubled per the FTS5 grammar. 
+func buildFTS5Query(q string) string { + words := strings.Fields(q) + parts := make([]string, 0, len(words)) + for _, w := range words { + w = strings.TrimSpace(w) + if len([]rune(w)) < 2 { + continue + } + w = strings.ReplaceAll(w, `"`, `""`) + parts = append(parts, `"`+w+`"`) + } + return strings.Join(parts, " OR ") +} + +func nullIfEmpty(s string) any { + if s == "" { + return nil + } + return s +} diff --git a/server/internal/chunksfts/chunksfts_test.go b/server/internal/chunksfts/chunksfts_test.go new file mode 100644 index 0000000..ab9e4de --- /dev/null +++ b/server/internal/chunksfts/chunksfts_test.go @@ -0,0 +1,272 @@ +package chunksfts + +import ( + "context" + "database/sql" + "strings" + "testing" + + "github.com/dvcdsys/code-index/server/internal/db" +) + +func openTestDB(t *testing.T) *sql.DB { + t.Helper() + d, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open test db: %v", err) + } + t.Cleanup(func() { d.Close() }) + return d +} + +func upsert(t *testing.T, d *sql.DB, project, file string, chunks []Chunk) { + t.Helper() + ctx := context.Background() + tx, err := d.BeginTx(ctx, nil) + if err != nil { + t.Fatalf("begin tx: %v", err) + } + defer tx.Rollback() //nolint:errcheck + if err := UpsertByFileTx(ctx, tx, project, file, chunks); err != nil { + t.Fatalf("UpsertByFileTx: %v", err) + } + if err := tx.Commit(); err != nil { + t.Fatalf("commit: %v", err) + } +} + +func TestUpsertAndSearchProject_FindsLiteralToken(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + + upsert(t, d, "proj-a", "src/widget_processor.go", []Chunk{ + {Content: "func ProcessWidget(w *Widget) error { ... 
}", FilePath: "src/widget_processor.go", StartLine: 1, EndLine: 5, SymbolName: "ProcessWidget", Language: "go"}, + {Content: "// WIDGET is the internal product code", FilePath: "src/widget_processor.go", StartLine: 10, EndLine: 10, Language: "go"}, + }) + upsert(t, d, "proj-b", "src/util.go", []Chunk{ + {Content: "func helloWorld() {}", FilePath: "src/util.go", StartLine: 1, EndLine: 3, SymbolName: "helloWorld", Language: "go"}, + }) + + hits, err := SearchProject(ctx, d, "proj-a", "WIDGET", 10) + if err != nil { + t.Fatalf("SearchProject: %v", err) + } + if len(hits) == 0 { + t.Fatal("expected at least one WIDGET hit in proj-a") + } + for _, h := range hits { + if !strings.Contains(strings.ToLower(h.Content+h.SymbolName), "widget") { + t.Errorf("hit doesn't actually mention widget: %+v", h) + } + } + + hitsB, err := SearchProject(ctx, d, "proj-b", "WIDGET", 10) + if err != nil { + t.Fatalf("SearchProject b: %v", err) + } + if len(hitsB) != 0 { + t.Errorf("expected zero WIDGET hits in proj-b, got %d", len(hitsB)) + } +} + +func TestSearchProject_RanksMoreMentionsHigher(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + + // Two chunks in the same project; chunk-1 mentions "ping" once, + // chunk-2 mentions it many times. BM25 should rank chunk-2 higher. 
+ upsert(t, d, "p", "f.go", []Chunk{ + {Content: "this code does a ping once", FilePath: "f.go", StartLine: 1, EndLine: 1, Language: "go"}, + {Content: "ping ping ping ping ping handle request ping loop", FilePath: "f.go", StartLine: 2, EndLine: 2, Language: "go"}, + }) + + hits, err := SearchProject(ctx, d, "p", "ping", 10) + if err != nil { + t.Fatalf("SearchProject: %v", err) + } + if len(hits) < 2 { + t.Fatalf("expected >=2 hits, got %d", len(hits)) + } + if hits[0].StartLine != 2 { + t.Errorf("expected line 2 ranked first; got order %v %v", hits[0].StartLine, hits[1].StartLine) + } + if hits[0].Score <= hits[1].Score { + t.Errorf("expected hits[0].Score > hits[1].Score, got %v vs %v", hits[0].Score, hits[1].Score) + } +} + +func TestSearchProject_OrJoinsTokens(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + + upsert(t, d, "p", "f.go", []Chunk{ + {Content: "totally unrelated content", FilePath: "f.go", StartLine: 1, EndLine: 1, Language: "go"}, + {Content: "this mentions ping only", FilePath: "f.go", StartLine: 2, EndLine: 2, Language: "go"}, + {Content: "this mentions WIDGET only", FilePath: "f.go", StartLine: 3, EndLine: 3, Language: "go"}, + {Content: "this mentions both ping and WIDGET", FilePath: "f.go", StartLine: 4, EndLine: 4, Language: "go"}, + }) + + hits, err := SearchProject(ctx, d, "p", "ping WIDGET", 10) + if err != nil { + t.Fatalf("SearchProject: %v", err) + } + if len(hits) != 3 { + t.Fatalf("expected 3 hits (lines 2,3,4), got %d", len(hits)) + } + if hits[0].StartLine != 4 { + t.Errorf("expected the both-tokens chunk (line 4) ranked first, got line %d", hits[0].StartLine) + } +} + +func TestSearchProject_TrigramMatchesInsideCamelCase(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p", "f.go", []Chunk{ + {Content: "func processWidgetItemEvent() {}", FilePath: "f.go", StartLine: 1, EndLine: 1, SymbolName: "processWidgetItemEvent", Language: "go"}, + {Content: "func helloWorld() {}", 
FilePath: "f.go", StartLine: 2, EndLine: 2, SymbolName: "helloWorld", Language: "go"}, + }) + hits, err := SearchProject(ctx, d, "p", "Widget", 10) + if err != nil { + t.Fatalf("SearchProject: %v", err) + } + if len(hits) != 1 || hits[0].StartLine != 1 { + t.Errorf("trigram should match inside CamelCase identifier, got %v", hits) + } +} + +func TestSearchProject_EmptyQueryReturnsNil(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p", "f.go", []Chunk{{Content: "anything", FilePath: "f.go", StartLine: 1, EndLine: 1}}) + for _, q := range []string{"", " ", "a", " a b "} { + hits, err := SearchProject(ctx, d, "p", q, 10) + if err != nil { + t.Fatalf("SearchProject %q: %v", q, err) + } + if len(hits) != 0 { + t.Errorf("query %q expected 0 hits, got %d", q, len(hits)) + } + } +} + +func TestUpsertByFile_ReplacesExisting(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p", "f.go", []Chunk{ + {Content: "old WIDGET content", FilePath: "f.go", StartLine: 1, EndLine: 1, Language: "go"}, + }) + upsert(t, d, "p", "f.go", []Chunk{ + {Content: "new replacement content", FilePath: "f.go", StartLine: 1, EndLine: 1, Language: "go"}, + }) + hits, err := SearchProject(ctx, d, "p", "WIDGET", 10) + if err != nil { + t.Fatalf("SearchProject: %v", err) + } + if len(hits) != 0 { + t.Errorf("old content should be gone after upsert, got %d hits", len(hits)) + } + hits2, err := SearchProject(ctx, d, "p", "replacement", 10) + if err != nil { + t.Fatalf("SearchProject 2: %v", err) + } + if len(hits2) != 1 { + t.Errorf("new content should be searchable, got %d hits", len(hits2)) + } +} + +func TestDeleteByFile_RemovesFromFTS(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p", "f.go", []Chunk{{Content: "WIDGET is here", FilePath: "f.go", StartLine: 1, EndLine: 1}}) + upsert(t, d, "p", "g.go", []Chunk{{Content: "also WIDGET here", FilePath: "g.go", StartLine: 1, EndLine: 1}}) + + tx, err := 
d.BeginTx(ctx, nil) + if err != nil { + t.Fatal(err) + } + if err := DeleteByFileTx(ctx, tx, "p", "f.go"); err != nil { + t.Fatal(err) + } + if err := tx.Commit(); err != nil { + t.Fatal(err) + } + + hits, _ := SearchProject(ctx, d, "p", "WIDGET", 10) + if len(hits) != 1 || hits[0].FilePath != "g.go" { + t.Errorf("expected only g.go to remain, got %+v", hits) + } + + // chunks_meta must be drained too — no orphan rows. + var n int + if err := d.QueryRowContext(ctx, + `SELECT COUNT(*) FROM chunks_meta WHERE project_path = ? AND file_path = ?`, + "p", "f.go").Scan(&n); err != nil { + t.Fatal(err) + } + if n != 0 { + t.Errorf("expected 0 chunks_meta rows for deleted file, got %d", n) + } +} + +func TestDeleteByProject_RemovesEverything(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p1", "a.go", []Chunk{{Content: "WIDGET here", FilePath: "a.go", StartLine: 1, EndLine: 1}}) + upsert(t, d, "p1", "b.go", []Chunk{{Content: "more WIDGET", FilePath: "b.go", StartLine: 1, EndLine: 1}}) + upsert(t, d, "p2", "c.go", []Chunk{{Content: "p2 WIDGET", FilePath: "c.go", StartLine: 1, EndLine: 1}}) + + if err := DeleteByProject(ctx, d, "p1"); err != nil { + t.Fatal(err) + } + + hits1, _ := SearchProject(ctx, d, "p1", "WIDGET", 10) + if len(hits1) != 0 { + t.Errorf("expected p1 wiped, got %d hits", len(hits1)) + } + hits2, _ := SearchProject(ctx, d, "p2", "WIDGET", 10) + if len(hits2) != 1 { + t.Errorf("expected p2 untouched, got %d hits", len(hits2)) + } + + var n int + if err := d.QueryRowContext(ctx, `SELECT COUNT(*) FROM chunks_meta WHERE project_path = ?`, "p1").Scan(&n); err != nil { + t.Fatal(err) + } + if n != 0 { + t.Errorf("expected 0 chunks_meta rows for p1, got %d", n) + } +} + +func TestSearchProject_ScopedToProject(t *testing.T) { + d := openTestDB(t) + ctx := context.Background() + upsert(t, d, "p1", "a.go", []Chunk{{Content: "WIDGET order", FilePath: "a.go", StartLine: 1, EndLine: 1}}) + upsert(t, d, "p2", "b.go", []Chunk{{Content: 
"WIDGET payment", FilePath: "b.go", StartLine: 1, EndLine: 1}}) + + hits, err := SearchProject(ctx, d, "p1", "WIDGET", 10) + if err != nil { + t.Fatal(err) + } + if len(hits) != 1 || hits[0].FilePath != "a.go" { + t.Errorf("expected only a.go from p1, got %+v", hits) + } +} + +func TestBuildFTS5Query(t *testing.T) { + cases := []struct { + in, want string + }{ + {"", ""}, + {" ", ""}, + {"a b", ""}, + {"WIDGET", `"WIDGET"`}, + {"add ping WIDGET", `"add" OR "ping" OR "WIDGET"`}, + {`oh "yes"`, `"oh" OR """yes"""`}, + } + for _, c := range cases { + got := buildFTS5Query(c.in) + if got != c.want { + t.Errorf("buildFTS5Query(%q) = %q, want %q", c.in, got, c.want) + } + } +} diff --git a/server/internal/config/config.go b/server/internal/config/config.go index 372ddc7..b2e1bb9 100644 --- a/server/internal/config/config.go +++ b/server/internal/config/config.go @@ -93,6 +93,52 @@ type Config struct { VersionCheckEnabled bool VersionCheckInterval time.Duration VersionCheckRepo string + + // WorkspacesEnabled gates the entire workspaces feature surface + // (workspaces / github_tokens CRUD; later releases add webhook receiver, + // jobs, communities, two-stage search). Default OFF — the feature is in + // active development on the feature branch and disabled in main releases + // until the full pipeline ships. Source: CIX_WORKSPACES_ENABLED (default + // false). + WorkspacesEnabled bool + + // SecretKey / SecretKeyFile control encryption-at-rest for the + // workspaces feature's github_tokens. Resolution order: + // 1. CIX_SECRET_KEY (env var, hex or base64 32-byte value) + // 2. CIX_SECRET_KEYFILE (env var, path to a 0600-perm key file) + // 3. /.secret_key (auto-generated on first run; only used + // when workspaces are enabled and the operator hasn't explicitly + // pointed at a key) + // PR1 keeps both fields here for documentation — the actual resolution + // happens in internal/secrets.Open() which reads the env vars directly. 
+ SecretKey string + SecretKeyFile string + + // SecretsDataDir is the directory used as the auto-generated keyfile + // destination when neither CIX_SECRET_KEY nor CIX_SECRET_KEYFILE is + // set. Defaults to the SQLite parent directory so the generated key + // lives alongside the encrypted data — losing one almost certainly + // means losing both. + SecretsDataDir string + + // WorkspacesDataDir is the parent directory the worker pool clones + // GitHub repositories under (workspace_repos.{id}/). Defaults to + // /repos. Source: CIX_WORKSPACES_DATA_DIR. + WorkspacesDataDir string + + // WorkerConcurrency controls how many jobs goroutines drain the + // queue at once. Clone+index is mostly IO; 2 is a sensible default + // for a self-hosted single-node deployment. Source: + // CIX_WORKER_CONCURRENCY (default 2). + WorkerConcurrency int + + // PublicBaseURL is the externally-reachable URL of this server + // (e.g. "https://cix.example.com"). Used to construct webhook + // delivery URLs surfaced to operators on POST /workspaces/{id}/repos. + // Optional — when empty, handlers return path-only URLs and trust + // the operator to prepend their tunnel/proxy origin. Source: + // CIX_PUBLIC_URL. 
+ PublicBaseURL string } // ModelSafeName returns the embedding model name normalised for use inside @@ -265,6 +311,25 @@ func Load() (*Config, error) { c.VersionCheckRepo = getenv("CIX_VERSION_CHECK_REPO", "dvcdsys/code-index") + workspacesOn, err := getenvBool("CIX_WORKSPACES_ENABLED", false) + if err != nil { + return nil, err + } + c.WorkspacesEnabled = workspacesOn + + c.SecretKey = getenv("CIX_SECRET_KEY", "") + c.SecretKeyFile = getenv("CIX_SECRET_KEYFILE", "") + c.SecretsDataDir = getenv("CIX_SECRETS_DATA_DIR", filepath.Dir(c.SQLitePath)) + c.WorkspacesDataDir = getenv("CIX_WORKSPACES_DATA_DIR", filepath.Join(filepath.Dir(c.SQLitePath), "repos")) + + workerConc, err := getenvInt("CIX_WORKER_CONCURRENCY", 2) + if err != nil { + return nil, err + } + c.WorkerConcurrency = workerConc + + c.PublicBaseURL = strings.TrimSpace(getenv("CIX_PUBLIC_URL", "")) + return c, nil } diff --git a/server/internal/db/db.go b/server/internal/db/db.go index 942ec6e..c696718 100644 --- a/server/internal/db/db.go +++ b/server/internal/db/db.go @@ -6,10 +6,12 @@ package db import ( "crypto/sha1" "database/sql" + "errors" "fmt" "net/url" "os" "path/filepath" + "strings" _ "modernc.org/sqlite" ) @@ -76,9 +78,217 @@ func Open(path string) (*sql.DB, error) { return nil, fmt.Errorf("migrate indexed_with_model: %w", err) } + // PR10 — extend workspace_repos with webhook_mode so the dashboard + // can distinguish manual/auto/disabled intents. Older databases get + // the column with a sensible default; rows where auto_webhook=1 are + // retro-fitted to 'auto' so they keep the same effective behaviour. + if err := migrateWebhookMode(db); err != nil { + _ = db.Close() + return nil, fmt.Errorf("migrate webhook_mode: %w", err) + } + + // PR13 — workspace_repos.is_linked + drop the legacy global UNIQUE + // on project_path. 
The rebuild path is taken only when the old + // constraint is still present; freshly-created DBs hit the new + // CREATE TABLE shape via Schema and the rebuild becomes a no-op. + if err := migrateWorkspaceReposLinked(db); err != nil { + _ = db.Close() + return nil, fmt.Errorf("migrate workspace_repos is_linked: %w", err) + } + + // PR14 — workspace search switched from the Louvain-centroid two- + // stage pipeline to a weighted fan-out. The communities + + // community_members tables stop being written; drop them on + // upgrade so the schema reflects what's actually used. + if err := migrateDropCommunities(db); err != nil { + _ = db.Close() + return nil, fmt.Errorf("migrate drop communities: %w", err) + } + return db, nil } +// migrateDropCommunities removes the PR5–PR12 communities + +// community_members tables. The PR14 fan-out search doesn't need +// them; leaving them around would just confuse anyone reading the +// schema. Idempotent via IF EXISTS, child rows in community_members +// go first to avoid FK-on-DELETE noise. +func migrateDropCommunities(db *sql.DB) error { + for _, stmt := range []string{ + `DROP TABLE IF EXISTS community_members`, + `DROP TABLE IF EXISTS communities`, + } { + if _, err := db.Exec(stmt); err != nil { + return fmt.Errorf("exec %q: %w", stmt, err) + } + } + return nil +} + +// migrateWorkspaceReposLinked brings pre-PR13 workspace_repos tables up to +// the current shape: adds the is_linked column and removes the legacy +// global UNIQUE on project_path so the same indexed project can live in +// multiple workspaces. Three cases: +// +// 1. Table doesn't exist yet (fresh DB) — nothing to migrate; Schema's +// CREATE TABLE IF NOT EXISTS already laid the new shape down. +// 2. Table has the old shape (project_path declared UNIQUE inline). We +// read the stored DDL from sqlite_master, and if it still contains +// "project_path TEXT NOT NULL UNIQUE", do the standard SQLite +// table-rebuild dance inside a transaction. 
is_linked is folded into +// the new table so we avoid a second ALTER pass. +// 3. Table has the new shape but is_linked is still missing (operator +// applied a partial migration manually) — ALTER TABLE ADD COLUMN. +// +// The check is conservative: any DDL string that doesn't contain the +// legacy UNIQUE marker is treated as already-migrated. +func migrateWorkspaceReposLinked(db *sql.DB) error { + tableExists, haveIsLinked, err := workspaceReposColumns(db) + if err != nil { + return err + } + if !tableExists { + return nil + } + + needRebuild, err := workspaceReposNeedsUniqueDrop(db) + if err != nil { + return err + } + if needRebuild { + return rebuildWorkspaceReposWithoutGlobalUnique(db) + } + if !haveIsLinked { + if _, err := db.Exec( + `ALTER TABLE workspace_repos ADD COLUMN is_linked INTEGER NOT NULL DEFAULT 0`, + ); err != nil { + return fmt.Errorf("add is_linked column: %w", err) + } + } + return nil +} + +// workspaceReposColumns reports whether workspace_repos exists and +// whether the is_linked column is already present. +func workspaceReposColumns(db *sql.DB) (tableExists, haveIsLinked bool, err error) { + rows, qerr := db.Query(`PRAGMA table_info(workspace_repos)`) + if qerr != nil { + return false, false, fmt.Errorf("table_info workspace_repos: %w", qerr) + } + defer rows.Close() + for rows.Next() { + var ( + cid int + name, typ string + notnull, pk int + dflt sql.NullString + ) + if scanErr := rows.Scan(&cid, &name, &typ, &notnull, &dflt, &pk); scanErr != nil { + return false, false, scanErr + } + tableExists = true + if name == "is_linked" { + haveIsLinked = true + } + } + return tableExists, haveIsLinked, rows.Err() +} + +// workspaceReposNeedsUniqueDrop returns true when the stored DDL for +// workspace_repos still has project_path declared as inline-UNIQUE. +// String inspection is the only reasonable signal — PRAGMA index_list +// also lists the auto-index from the composite UNIQUE so column-level +// detection is unreliable. 
+func workspaceReposNeedsUniqueDrop(db *sql.DB) (bool, error) { + var ddl sql.NullString + row := db.QueryRow( + `SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'workspace_repos'`, + ) + if err := row.Scan(&ddl); err != nil { + if errors.Is(err, sql.ErrNoRows) { + return false, nil + } + return false, fmt.Errorf("read workspace_repos ddl: %w", err) + } + if !ddl.Valid { + return false, nil + } + // Whitespace varies (formatting, indentation) — collapse and + // uppercase to make the substring match robust. + normalised := strings.ToUpper(strings.Join(strings.Fields(ddl.String), " ")) + return strings.Contains(normalised, "PROJECT_PATH TEXT NOT NULL UNIQUE"), nil +} + +// rebuildWorkspaceReposWithoutGlobalUnique creates a new +// workspace_repos table with the current shape (no global UNIQUE on +// project_path, is_linked present), copies all rows from the old +// table, drops the old one, renames the new one, and recreates the +// indices. Wrapped in a transaction so a mid-rebuild failure leaves +// the original table intact. 
+func rebuildWorkspaceReposWithoutGlobalUnique(db *sql.DB) error { + tx, err := db.Begin() + if err != nil { + return fmt.Errorf("begin rebuild tx: %w", err) + } + defer func() { _ = tx.Rollback() }() + + if _, err := tx.Exec(`CREATE TABLE workspace_repos_new ( + id TEXT PRIMARY KEY, + workspace_id TEXT NOT NULL, + github_url TEXT NOT NULL, + branch TEXT NOT NULL, + project_path TEXT NOT NULL, + token_id TEXT, + webhook_secret TEXT NOT NULL, + webhook_id INTEGER, + auto_webhook INTEGER NOT NULL DEFAULT 0, + webhook_mode TEXT NOT NULL DEFAULT 'manual', + status TEXT NOT NULL DEFAULT 'pending', + last_sha TEXT, + last_error TEXT, + last_indexed_at TEXT, + is_linked INTEGER NOT NULL DEFAULT 0, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + UNIQUE(workspace_id, github_url, branch), + FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE, + FOREIGN KEY (token_id) REFERENCES github_tokens(id) ON DELETE SET NULL + )`); err != nil { + return fmt.Errorf("create workspace_repos_new: %w", err) + } + + if _, err := tx.Exec(`INSERT INTO workspace_repos_new + (id, workspace_id, github_url, branch, project_path, + token_id, webhook_secret, webhook_id, auto_webhook, webhook_mode, + status, last_sha, last_error, last_indexed_at, + created_at, updated_at) + SELECT id, workspace_id, github_url, branch, project_path, + token_id, webhook_secret, webhook_id, auto_webhook, webhook_mode, + status, last_sha, last_error, last_indexed_at, + created_at, updated_at + FROM workspace_repos`); err != nil { + return fmt.Errorf("copy workspace_repos rows: %w", err) + } + + if _, err := tx.Exec(`DROP TABLE workspace_repos`); err != nil { + return fmt.Errorf("drop old workspace_repos: %w", err) + } + if _, err := tx.Exec(`ALTER TABLE workspace_repos_new RENAME TO workspace_repos`); err != nil { + return fmt.Errorf("rename workspace_repos_new: %w", err) + } + if _, err := tx.Exec(`CREATE INDEX IF NOT EXISTS idx_workspace_repos_workspace ON workspace_repos(workspace_id)`); 
err != nil { + return fmt.Errorf("recreate workspace index: %w", err) + } + if _, err := tx.Exec(`CREATE INDEX IF NOT EXISTS idx_workspace_repos_project ON workspace_repos(project_path)`); err != nil { + return fmt.Errorf("recreate project index: %w", err) + } + + if err := tx.Commit(); err != nil { + return fmt.Errorf("commit rebuild tx: %w", err) + } + return nil +} + // migratePathHash brings older databases up to the current schema by adding // the path_hash column when missing and backfilling it from host_path. The // schema DDL is idempotent via CREATE TABLE IF NOT EXISTS so we rely on @@ -179,6 +389,53 @@ func migrateIndexedWithModel(db *sql.DB) error { return nil } +// migrateWebhookMode adds workspace_repos.webhook_mode to pre-PR10 +// databases and backfills it from the older auto_webhook bool so rows +// inserted before this migration keep their effective behaviour. Same +// PRAGMA-table_info / ALTER-only-if-absent pattern as the other helpers. +func migrateWebhookMode(db *sql.DB) error { + // workspace_repos may not exist yet on databases that pre-date the + // workspaces feature entirely — PRAGMA table_info returns no rows in + // that case and we have nothing to migrate. 
+ rows, err := db.Query(`PRAGMA table_info(workspace_repos)`) + if err != nil { + return fmt.Errorf("table_info workspace_repos: %w", err) + } + have := false + tableExists := false + for rows.Next() { + var ( + cid int + name, typ string + notnull, pk int + dflt sql.NullString + ) + if err := rows.Scan(&cid, &name, &typ, &notnull, &dflt, &pk); err != nil { + rows.Close() + return err + } + tableExists = true + if name == "webhook_mode" { + have = true + } + } + rows.Close() + if !tableExists || have { + return nil + } + if _, err := db.Exec( + `ALTER TABLE workspace_repos ADD COLUMN webhook_mode TEXT NOT NULL DEFAULT 'manual'`, + ); err != nil { + return fmt.Errorf("add webhook_mode column: %w", err) + } + if _, err := db.Exec( + `UPDATE workspace_repos SET webhook_mode = 'auto' WHERE auto_webhook = 1`, + ); err != nil { + return fmt.Errorf("backfill webhook_mode: %w", err) + } + return nil +} + // HashHostPath returns the 16-char SHA1 prefix used as the URL segment for // projects. Exported so projects.Create and the migration share one // implementation (keep it byte-identical to projects.HashPath). diff --git a/server/internal/db/db_test.go b/server/internal/db/db_test.go index 293ec35..f0cfba9 100644 --- a/server/internal/db/db_test.go +++ b/server/internal/db/db_test.go @@ -17,8 +17,15 @@ func TestOpenInMemoryAppliesSchema(t *testing.T) { } defer database.Close() + // FTS5 virtual tables create implementation-detail shadow tables + // (chunks_fts_config, chunks_fts_content, chunks_fts_data, + // chunks_fts_docsize, chunks_fts_idx). Exclude them — we only audit + // the tables we explicitly declare. 
rows, err := database.Query( - `SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'`, + `SELECT name FROM sqlite_master + WHERE type='table' + AND name NOT LIKE 'sqlite_%' + AND name NOT LIKE 'chunks_fts_%'`, ) if err != nil { t.Fatalf("query sqlite_master: %v", err) @@ -219,3 +226,121 @@ func TestSymbolsIndexExists(t *testing.T) { t.Errorf("idx_symbols_project_name count = %d, want 1", n) } } + +// TestMigrate_DropsGlobalUniqueOnProjectPath verifies that opening a +// pre-PR13 database (workspace_repos with `project_path TEXT NOT NULL UNIQUE`) +// migrates it to the current shape, dropping the global UNIQUE so the +// same indexed project can live in multiple workspaces. +// +// Strategy: create a fresh file-backed DB, manually lay down the +// legacy table shape + seed one row, close, reopen via Open() so the +// migration runs, then try inserting a second row with the same +// project_path in a different workspace_id — pre-migration this would +// fail with UNIQUE constraint failed; post-migration it must succeed. +func TestMigrate_DropsGlobalUniqueOnProjectPath(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "test.db") + + // Open with the regular driver (bypass Schema by hand-rolling DDL). + raw, err := sql.Open(DriverName, "file:"+path+"?_pragma=foreign_keys(ON)") + if err != nil { + t.Fatalf("raw open: %v", err) + } + + // Lay down only the minimum tables the legacy workspace_repos needs: + // workspaces (FK target) + the OLD workspace_repos shape with the + // inline UNIQUE on project_path. github_tokens is FK-referenced but + // nullable, so we can skip it for this test. 
+ legacy := ` + CREATE TABLE workspaces ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL UNIQUE, + description TEXT NOT NULL DEFAULT '', + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL + ); + CREATE TABLE workspace_repos ( + id TEXT PRIMARY KEY, + workspace_id TEXT NOT NULL, + github_url TEXT NOT NULL, + branch TEXT NOT NULL, + project_path TEXT NOT NULL UNIQUE, + token_id TEXT, + webhook_secret TEXT NOT NULL, + webhook_id INTEGER, + auto_webhook INTEGER NOT NULL DEFAULT 0, + webhook_mode TEXT NOT NULL DEFAULT 'manual', + status TEXT NOT NULL DEFAULT 'pending', + last_sha TEXT, + last_error TEXT, + last_indexed_at TEXT, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + UNIQUE(workspace_id, github_url, branch), + FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE + ); + INSERT INTO workspaces (id, name, created_at, updated_at) + VALUES ('ws-a', 'alpha', '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z'); + INSERT INTO workspaces (id, name, created_at, updated_at) + VALUES ('ws-b', 'beta', '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z'); + INSERT INTO workspace_repos (id, workspace_id, github_url, branch, project_path, + webhook_secret, created_at, updated_at) + VALUES ('repo-1', 'ws-a', 'https://github.com/x/y', 'main', + 'github.com/x/y@main', 's', '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z'); + ` + if _, err := raw.Exec(legacy); err != nil { + t.Fatalf("seed legacy: %v", err) + } + + // Confirm pre-migration: the second insert with same project_path + // would fail. We catch the error so the test is honest about the + // invariant we're removing. 
+ _, err = raw.Exec(`INSERT INTO workspace_repos (id, workspace_id, github_url, branch, project_path, + webhook_secret, created_at, updated_at) VALUES ('repo-bad', 'ws-b', 'https://github.com/x/y', 'main', + 'github.com/x/y@main', 's', '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z')`) + if err == nil { + _ = raw.Close() + t.Fatalf("pre-migration insert should fail UNIQUE — test setup is wrong") + } + _ = raw.Close() + + // Now reopen via the real Open() so the migration runs. + migrated, err := Open(path) + if err != nil { + t.Fatalf("Open: %v", err) + } + defer migrated.Close() + + // is_linked column should be present and default to 0 on the + // migrated row. + var isLinked int + if err := migrated.QueryRow( + `SELECT is_linked FROM workspace_repos WHERE id = 'repo-1'`, + ).Scan(&isLinked); err != nil { + t.Fatalf("read is_linked: %v", err) + } + if isLinked != 0 { + t.Fatalf("pre-existing rows must keep is_linked=0, got %d", isLinked) + } + + // And the post-migration invariant we care about: same project_path + // in a different workspace now succeeds. + if _, err := migrated.Exec(`INSERT INTO workspace_repos (id, workspace_id, github_url, branch, project_path, + webhook_secret, status, is_linked, created_at, updated_at) + VALUES ('repo-2', 'ws-b', 'https://github.com/x/y', 'main', + 'github.com/x/y@main', 's', 'indexed', 1, + '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z')`); err != nil { + t.Fatalf("post-migration cross-workspace insert should succeed: %v", err) + } + + // Per-workspace UNIQUE must still bite — adding the same repo+branch + // to ws-b a second time should fail. 
+ _, err = migrated.Exec(`INSERT INTO workspace_repos (id, workspace_id, github_url, branch, project_path, + webhook_secret, status, is_linked, created_at, updated_at) + VALUES ('repo-3', 'ws-b', 'https://github.com/x/y', 'main', + 'github.com/x/y@main', 's', 'indexed', 1, + '2026-05-11T00:00:00Z', '2026-05-11T00:00:00Z')`) + if err == nil { + t.Fatalf("per-workspace UNIQUE should still reject duplicate (workspace_id, github_url, branch)") + } +} diff --git a/server/internal/db/schema.go b/server/internal/db/schema.go index 34b9910..d55f788 100644 --- a/server/internal/db/schema.go +++ b/server/internal/db/schema.go @@ -149,6 +149,182 @@ CREATE TABLE IF NOT EXISTS runtime_settings ( updated_at TEXT NOT NULL, updated_by TEXT ); + +-- Workspaces feature (PR1 — skeleton). Workspaces group GitHub repositories +-- for cross-project semantic search. Server-wide shared: every authenticated +-- user can see and modify any workspace (per the chosen visibility model). +-- The richer workspace_repos / call_edges / communities tables land in +-- subsequent PRs of the workspaces feature branch. +CREATE TABLE IF NOT EXISTS workspaces ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL UNIQUE, + description TEXT, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL +); + +-- github_tokens stores GitHub Personal Access Tokens encrypted at rest with +-- AES-256-GCM. Only the encrypted blob ever lives in SQLite — the plaintext +-- is returned exactly once on POST /api/v1/github-tokens and then forgotten. +-- The encryption key comes from CIX_SECRET_KEY / CIX_SECRET_KEYFILE / a +-- generated keyfile (see internal/secrets). +CREATE TABLE IF NOT EXISTS github_tokens ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL UNIQUE, + encrypted BLOB NOT NULL, + scopes TEXT, + created_at TEXT NOT NULL, + last_used_at TEXT +); + +-- Workspaces feature PR2 — workspace_repos + jobs. +-- +-- One workspace_repos row per (repo, branch). 
project_path is the canonical +-- "github.com/owner/repo@branch" string used as host_path in projects, so +-- existing per-project SQL stays uniform across local + remote sources. +-- webhook_secret is generated server-side at create time and shown exactly +-- once to the operator (or used by the auto-register flow added in PR3). +-- token_id stays nullable so public repos can be added without storing a PAT. +-- last_sha / last_indexed_at survive across reindexes so an incremental +-- fetch_repo job can short-circuit when HEAD hasn't moved. +-- is_linked discriminates owned rows (the canonical Add Repo flow that +-- clones + indexes + owns a webhook) from linked rows (a lightweight +-- membership pointer to an already-indexed project — no clone, no +-- webhook). Uniqueness is per-workspace; the same project_path may live +-- in many workspaces as long as it appears at most once in each. +CREATE TABLE IF NOT EXISTS workspace_repos ( + id TEXT PRIMARY KEY, + workspace_id TEXT NOT NULL, + github_url TEXT NOT NULL, + branch TEXT NOT NULL, + project_path TEXT NOT NULL, + token_id TEXT, + webhook_secret TEXT NOT NULL, + webhook_id INTEGER, + auto_webhook INTEGER NOT NULL DEFAULT 0, + -- webhook_mode is the operator's stated intent for how this repo gets + -- kept fresh: 'auto' (server calls GitHub to register the hook), + -- 'manual' (operator pastes the URL+secret into GitHub themselves), + -- 'disabled' (no auto-sync, reindex via the dashboard button only). + -- Stored separately from auto_webhook so the dashboard can distinguish + -- "manual, still pending operator action" from "deliberately disabled". 
+ webhook_mode TEXT NOT NULL DEFAULT 'manual', + status TEXT NOT NULL DEFAULT 'pending', + last_sha TEXT, + last_error TEXT, + last_indexed_at TEXT, + is_linked INTEGER NOT NULL DEFAULT 0, + created_at TEXT NOT NULL, + updated_at TEXT NOT NULL, + UNIQUE(workspace_id, github_url, branch), + FOREIGN KEY (workspace_id) REFERENCES workspaces(id) ON DELETE CASCADE, + FOREIGN KEY (token_id) REFERENCES github_tokens(id) ON DELETE SET NULL +); +CREATE INDEX IF NOT EXISTS idx_workspace_repos_workspace ON workspace_repos(workspace_id); +CREATE INDEX IF NOT EXISTS idx_workspace_repos_project ON workspace_repos(project_path); + +-- jobs is the persistent worker queue. Survives process restarts; one +-- worker pool drains it. dedupe_key is the partial-unique mechanism that +-- collapses webhook bursts (e.g. 50 push deliveries for the same repo +-- become 1 pending fetch_repo job). status transitions: +-- pending → running → completed | failed +-- Failed jobs may be re-enqueued by the worker up to a per-type retry +-- budget (attempts column tracks the count). +CREATE TABLE IF NOT EXISTS jobs ( + id TEXT PRIMARY KEY, + type TEXT NOT NULL, + status TEXT NOT NULL DEFAULT 'pending', + dedupe_key TEXT, + payload TEXT NOT NULL DEFAULT '{}', + attempts INTEGER NOT NULL DEFAULT 0, + max_attempts INTEGER NOT NULL DEFAULT 3, + last_error TEXT, + scheduled_at TEXT NOT NULL, + started_at TEXT, + completed_at TEXT, + created_at TEXT NOT NULL +); +-- Partial unique index — at most one active job per dedupe_key. Insert +-- a second pending row for the same key and SQLite raises a UNIQUE +-- constraint error, which the jobs service translates to "already +-- enqueued, no-op". 
+CREATE UNIQUE INDEX IF NOT EXISTS idx_jobs_dedupe_active ON jobs(dedupe_key) + WHERE dedupe_key IS NOT NULL AND status IN ('pending','running'); +CREATE INDEX IF NOT EXISTS idx_jobs_ready ON jobs(status, scheduled_at) + WHERE status = 'pending'; +CREATE INDEX IF NOT EXISTS idx_jobs_type_status ON jobs(type, status); + +-- Workspaces feature PR4 — call_edges + edge_provenance. +-- +-- call_edges holds the approximate caller→callee edges that Louvain +-- consumes in PR5. It is a CO-OCCURRENCE graph, not an exact call graph: +-- we resolve callees by name lookup in the symbols table constrained +-- by the caller's enclosing scope, then weight each edge inversely with +-- the callee-name popcount (1 / N). That keeps Louvain robust to +-- name-collision noise (common in Python/JS) without needing per-language +-- type resolution. +-- +-- One row per (caller, callee). Duplicate caller→callee pairs collapse +-- by accumulating weight at insert time (handler-side INSERT OR REPLACE +-- against the unique constraint below). source identifies which heuristic +-- produced the edge — useful for the eval harness and for swapping in +-- alternative graph builders without dropping the table. +CREATE TABLE IF NOT EXISTS call_edges ( + project_path TEXT NOT NULL, + caller_symbol TEXT NOT NULL, + callee_symbol TEXT NOT NULL, + weight REAL NOT NULL, + source TEXT NOT NULL DEFAULT 'refs_heuristic', + PRIMARY KEY (project_path, caller_symbol, callee_symbol), + FOREIGN KEY (project_path) REFERENCES projects(host_path) ON DELETE CASCADE, + FOREIGN KEY (caller_symbol) REFERENCES symbols(id) ON DELETE CASCADE, + FOREIGN KEY (callee_symbol) REFERENCES symbols(id) ON DELETE CASCADE +); +CREATE INDEX IF NOT EXISTS idx_call_edges_project ON call_edges(project_path); +CREATE INDEX IF NOT EXISTS idx_call_edges_caller ON call_edges(caller_symbol); +CREATE INDEX IF NOT EXISTS idx_call_edges_callee ON call_edges(callee_symbol); + +-- PR14 dropped the workspaces communities/community_members tables. 
+-- Workspace search is now a weighted fan-out across per-project chromem +-- collections (no Louvain, no centroid index). migrateDropCommunities +-- DROPs the tables on upgrade for installs that ran any of PR5..PR12. + +-- chunks_meta is the row-level shadow for chunks_fts: it stores the +-- non-content metadata we need to retrieve when a BM25 query matches. +-- chunks_fts (FTS5 virtual table) can only filter efficiently by rowid, +-- so we keep a regular indexed table here for (project_path, file_path) +-- lookups and join by rowid on retrieval/deletion. +CREATE TABLE IF NOT EXISTS chunks_meta ( + rowid INTEGER PRIMARY KEY AUTOINCREMENT, + project_path TEXT NOT NULL, + file_path TEXT NOT NULL, + start_line INTEGER NOT NULL, + end_line INTEGER NOT NULL, + chunk_type TEXT, + symbol_name TEXT, + language TEXT +); +CREATE INDEX IF NOT EXISTS idx_chunks_meta_project_file + ON chunks_meta(project_path, file_path); +CREATE INDEX IF NOT EXISTS idx_chunks_meta_project + ON chunks_meta(project_path); + +-- chunks_fts is the BM25-searchable side, parallel to chromem-go's dense +-- vector store. Workspace search runs both in parallel per project then +-- fuses by RRF; project-relevance gating uses BM25 signal to drop repos +-- that share no token with the query (the dense-only fan-out leaks +-- semantically-distant repos as false positives). +-- +-- tokenize='trigram': substring matching on identifiers (CamelCase / +-- snake_case / dotted paths are not tokenized to word boundaries +-- predictably enough for code). Short acronyms like "ABC" become a +-- single trigram; lookups are exact-substring within a word. +CREATE VIRTUAL TABLE IF NOT EXISTS chunks_fts USING fts5( + content, + symbol_name, + file_path, + tokenize = 'trigram' +); ` // ExpectedTables lists the tables the schema creates. 
Used by db_test and by @@ -163,4 +339,11 @@ var ExpectedTables = []string{ "sessions", "api_keys", "runtime_settings", + "workspaces", + "github_tokens", + "workspace_repos", + "jobs", + "call_edges", + "chunks_meta", + "chunks_fts", } diff --git a/server/internal/githubapi/githubapi.go b/server/internal/githubapi/githubapi.go new file mode 100644 index 0000000..f30ee4c --- /dev/null +++ b/server/internal/githubapi/githubapi.go @@ -0,0 +1,574 @@ +// Package githubapi is a tiny raw-HTTP client for the handful of GitHub +// REST calls the workspaces feature needs. We deliberately do NOT pull +// in google/go-github (which is ~10MB of generated code) for just the +// two operations we use — registering and deleting a repository webhook. +// +// Authentication: callers pass a Personal Access Token (PAT). The token +// is sent as `Authorization: token ` which matches what GitHub +// documents for both fine-grained tokens and classic PATs. +// +// Errors are surfaced verbatim from GitHub when the response carries a +// JSON `message` field. Callers usually display these in the dashboard +// so the operator can fix scope / permission issues without trawling +// server logs. +package githubapi + +import ( + "bytes" + "context" + "encoding/json" + "errors" + "fmt" + "io" + "net/http" + "net/url" + "strings" + "time" +) + +// ErrUnauthorized is returned for 401/403 responses — usually the PAT +// is missing the admin:repo_hook scope. Handlers translate this into +// "you said auto-register but the token can't manage hooks; switch to +// manual or rotate the PAT". +var ErrUnauthorized = errors.New("github API rejected the token") + +// TokenInfo carries the metadata we learn about a PAT by calling +// GET /user. The truth about scopes lives on GitHub, not in user input, +// so we always read X-OAuth-Scopes from the response. +// +// Fine-grained PATs (github_pat_*) do not advertise scopes via this +// header — for them Scopes is empty and FineGrained is true. 
+type TokenInfo struct { + Login string + Scopes []string + FineGrained bool +} + +// ErrNotFound is the 404 sentinel (e.g. repo missing or token can't see +// it). +var ErrNotFound = errors.New("github API: not found") + +// Client is the per-call wrapper. Bare struct so handlers can construct +// per-request without coordinating lifecycle. +type Client struct { + HTTPClient *http.Client + // BaseURL defaults to https://api.github.com. Overridable for + // GitHub Enterprise — and the test suite, which substitutes an + // httptest server. + BaseURL string +} + +// New returns a Client with sane defaults: 30s timeout, the canonical +// api.github.com base. +func New() *Client { + return &Client{ + HTTPClient: &http.Client{Timeout: 30 * time.Second}, + BaseURL: "https://api.github.com", + } +} + +// Repo is the slice of GET /user/repos we care about for the dashboard +// add-repo flow. We deliberately keep this small — only the fields the +// repo-picker UI actually renders — so we don't bloat the JSON payload +// (a single user can have several hundred repos visible via a PAT). +type Repo struct { + FullName string `json:"full_name"` // "owner/name" + DefaultBranch string `json:"default_branch"` // used to auto-fill the branch input + Private bool `json:"private"` // shown as a lock icon in the dropdown + HTMLURL string `json:"html_url"` // canonical https://github.com/... form + Description string `json:"description,omitempty"` + Owner RepoOwner `json:"owner"` +} + +// RepoOwner mirrors the `owner` slice of GitHub's repo payload. GitHub +// uses the strings "User" / "Organization" (capitalized) so we +// preserve them verbatim and translate at the application boundary. +type RepoOwner struct { + Login string `json:"login"` + Type string `json:"type"` // "User" | "Organization" + AvatarURL string `json:"avatar_url"` +} + +// AccountType discriminates between a personal account (the PAT +// owner) and a GitHub organization. 
The dashboard uses this to pick +// the right repo-list endpoint when the user drills into an account. +type AccountType string + +const ( + AccountTypeUser AccountType = "user" + AccountTypeOrg AccountType = "org" +) + +// Account is the rendered shape of an entry in the account selector +// — either the PAT owner's personal account or one of the orgs they +// belong to. Reflects what GitHub returns from /user + /user/orgs. +type Account struct { + Login string `json:"login"` + Type AccountType `json:"type"` + AvatarURL string `json:"avatar_url,omitempty"` +} + +// ListAccounts returns the user that owns the PAT plus every distinct +// org owner found in /user/repos. +// +// Why not /user/orgs or /user/memberships/orgs? Both require explicit +// scopes (classic: `read:org`; fine-grained: "Members → Read"). PATs +// that lack those still see and clone repos through /user/repos — so +// scoping the dashboard's account picker to "everything the PAT +// already shows me" is both more permissive and more honest. The +// listed accounts always correspond to repos the token can actually +// access. The trade-off is that orgs with zero visible repos won't +// surface; that's a feature, not a bug — there'd be nothing for the +// operator to pick anyway. +// +// /user is still hit first as a token-validity probe and to pick up +// the operator's own login + avatar even when /user/repos is empty. +func (c *Client) ListAccounts(ctx context.Context, pat string) ([]Account, error) { + if pat == "" { + return nil, fmt.Errorf("PAT required") + } + + user, err := c.fetchUser(ctx, pat) + if err != nil { + return nil, err + } + + const maxPages = 5 + repos, rerr := c.ListUserRepos(ctx, pat, maxPages) + if rerr != nil { + // Don't fail the whole flow on a /user/repos hiccup — return + // at least the personal account so the dialog renders. 
+ return []Account{{ + Login: user.Login, + Type: AccountTypeUser, + AvatarURL: user.AvatarURL, + }}, nil + } + + // The user's own account always comes first so the dashboard can + // default-select it. Then every distinct owner — Users vs Orgs + // distinguished by the owner.type field from /user/repos. Login + // comparison is case-insensitive (GitHub treats `Foo` and `foo` + // as the same account). + out := []Account{{ + Login: user.Login, + Type: AccountTypeUser, + AvatarURL: user.AvatarURL, + }} + seen := map[string]struct{}{ + strings.ToLower(user.Login): {}, + } + for _, rp := range repos { + key := strings.ToLower(rp.Owner.Login) + if key == "" { + continue + } + if _, dup := seen[key]; dup { + continue + } + seen[key] = struct{}{} + out = append(out, Account{ + Login: rp.Owner.Login, + Type: ownerTypeToAccountType(rp.Owner.Type), + AvatarURL: rp.Owner.AvatarURL, + }) + } + return out, nil +} + +// ownerTypeToAccountType normalises GitHub's "User"/"Organization" +// strings to our user/org enum. Unknown values default to user, which +// is the conservative choice — repos will be fetched from +// /users/{login}/repos and just return what's accessible. +func ownerTypeToAccountType(s string) AccountType { + switch strings.ToLower(s) { + case "organization": + return AccountTypeOrg + default: + return AccountTypeUser + } +} + +// fetchUser is a small private helper for the two callsites (this +// package's own ValidateToken and the new ListAccounts). Returns the +// few fields we care about. 
+func (c *Client) fetchUser(ctx context.Context, pat string) (struct { + Login string `json:"login"` + AvatarURL string `json:"avatar_url"` +}, error) { + type userResp struct { + Login string `json:"login"` + AvatarURL string `json:"avatar_url"` + } + var u userResp + req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.BaseURL+"/user", nil) + if err != nil { + return u, err + } + c.signRequest(req, pat) + resp, err := c.HTTPClient.Do(req) + if err != nil { + return u, fmt.Errorf("github API: %w", err) + } + defer resp.Body.Close() + body, _ := io.ReadAll(resp.Body) + switch resp.StatusCode { + case http.StatusOK: + if err := json.Unmarshal(body, &u); err != nil { + return u, fmt.Errorf("parse /user: %w", err) + } + return u, nil + case http.StatusUnauthorized, http.StatusForbidden: + return u, fmt.Errorf("%w: %s", ErrUnauthorized, githubMessage(body)) + default: + return u, fmt.Errorf("github API %d: %s", resp.StatusCode, githubMessage(body)) + } +} + +// ListReposForAccount returns repos owned by a specific account. Use +// AccountTypeUser to hit /users/{login}/repos (which lists that user's +// public repos plus, when the caller IS that user, all repos they own +// regardless of visibility) and AccountTypeOrg to hit /orgs/{login}/repos +// (which respects the PAT's organization membership / SAML state). +// +// For the "all my repos" case, callers should fall back to +// ListUserRepos — /user/repos returns the affiliations-aggregated view +// in a single call, which is what we want when no account filter is set. 
+func (c *Client) ListReposForAccount(ctx context.Context, pat string, accountType AccountType, login string, maxPages int) ([]Repo, error) { + if pat == "" { + return nil, fmt.Errorf("PAT required") + } + if login == "" { + return nil, fmt.Errorf("login required") + } + var base string + switch accountType { + case AccountTypeUser: + base = c.BaseURL + "/users/" + url.PathEscape(login) + "/repos?per_page=100&sort=pushed&type=all" + case AccountTypeOrg: + base = c.BaseURL + "/orgs/" + url.PathEscape(login) + "/repos?per_page=100&sort=pushed&type=all" + default: + return nil, fmt.Errorf("unknown account type %q", accountType) + } + return c.fetchRepoPages(ctx, pat, base, maxPages) +} + +// fetchRepoPages is the shared paginator for any /repos-shaped GitHub +// endpoint. Walks Link rel=next up to maxPages; 0 means no cap. +func (c *Client) fetchRepoPages(ctx context.Context, pat, firstURL string, maxPages int) ([]Repo, error) { + out := []Repo{} + page := 0 + pageURL := firstURL + for pageURL != "" { + page++ + req, err := http.NewRequestWithContext(ctx, http.MethodGet, pageURL, nil) + if err != nil { + return nil, err + } + c.signRequest(req, pat) + resp, err := c.HTTPClient.Do(req) + if err != nil { + return nil, fmt.Errorf("github API: %w", err) + } + body, _ := io.ReadAll(resp.Body) + resp.Body.Close() + switch resp.StatusCode { + case http.StatusOK: + var batch []Repo + if err := json.Unmarshal(body, &batch); err != nil { + return nil, fmt.Errorf("parse repos page: %w", err) + } + out = append(out, batch...) 
+ case http.StatusUnauthorized, http.StatusForbidden: + return nil, fmt.Errorf("%w: %s", ErrUnauthorized, githubMessage(body)) + case http.StatusNotFound: + return nil, fmt.Errorf("%w: %s", ErrNotFound, githubMessage(body)) + default: + return nil, fmt.Errorf("github API %d: %s", resp.StatusCode, githubMessage(body)) + } + if maxPages > 0 && page >= maxPages { + break + } + pageURL = parseNextLink(resp.Header.Get("Link")) + } + return out, nil +} + +// ListUserRepos walks /user/repos pages, returning every repo the PAT +// can see as owner / collaborator / org member. The endpoint is +// inherently paginated (per_page=100 is the GitHub max) — we follow +// the Link rel=next header up to maxPages so an outlier user with a +// thousand affiliated repos still completes in bounded time. +// +// maxPages of 0 is interpreted as "no cap" (used in tests); production +// callers should pass a sensible ceiling (typical: 5 = up to 500 repos). +// +// Useful when the operator has not chosen a specific account in the +// dashboard yet — /user/repos is GitHub's affiliations-aggregated view +// and surfaces SAML-protected and collaborator repos that don't appear +// under /orgs/{login}/repos. +func (c *Client) ListUserRepos(ctx context.Context, pat string, maxPages int) ([]Repo, error) { + if pat == "" { + return nil, fmt.Errorf("PAT required") + } + first := c.BaseURL + "/user/repos?per_page=100&sort=pushed&affiliation=owner,collaborator,organization_member" + return c.fetchRepoPages(ctx, pat, first, maxPages) +} + +// parseNextLink extracts the URL of rel=next from a GitHub Link header. +// Format per RFC 5988: `; rel="next", <...>; rel="last"`. +// Empty string when no next page exists — that's the terminator for the +// pagination loop in ListUserRepos. 
+func parseNextLink(header string) string { + if header == "" { + return "" + } + for _, part := range strings.Split(header, ",") { + segs := strings.Split(strings.TrimSpace(part), ";") + if len(segs) < 2 { + continue + } + isNext := false + for _, p := range segs[1:] { + if strings.TrimSpace(p) == `rel="next"` { + isNext = true + break + } + } + if !isNext { + continue + } + u := strings.TrimSpace(segs[0]) + u = strings.TrimPrefix(u, "<") + u = strings.TrimSuffix(u, ">") + return u + } + return "" +} + +// CreateWebhookOptions parameterises a hook registration. Events defaults +// to ["push"] when nil. +type CreateWebhookOptions struct { + Owner string + Repo string + PAT string + URL string // the cix-server delivery URL + Secret string // HMAC secret cix-server expects + Events []string + Insecure bool // mostly for tests against http:// origins +} + +// HookResponse is the slice of the GitHub response we care about. +type HookResponse struct { + ID int64 `json:"id"` + URL string `json:"url"` + Active bool `json:"active"` +} + +// CreateWebhook calls POST /repos/{owner}/{repo}/hooks. Returns the +// hook id so callers can persist it for later DeleteWebhook. 
+func (c *Client) CreateWebhook(ctx context.Context, opts CreateWebhookOptions) (HookResponse, error) { + if opts.Owner == "" || opts.Repo == "" { + return HookResponse{}, fmt.Errorf("owner/repo required") + } + if opts.PAT == "" { + return HookResponse{}, fmt.Errorf("PAT required") + } + events := opts.Events + if len(events) == 0 { + events = []string{"push"} + } + body := map[string]any{ + "name": "web", + "active": true, + "events": events, + "config": map[string]any{ + "url": opts.URL, + "content_type": "json", + "secret": opts.Secret, + "insecure_ssl": insecureSSLValue(opts.Insecure), + }, + } + raw, err := json.Marshal(body) + if err != nil { + return HookResponse{}, err + } + endpoint := c.BaseURL + "/repos/" + url.PathEscape(opts.Owner) + "/" + url.PathEscape(opts.Repo) + "/hooks" + req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(raw)) + if err != nil { + return HookResponse{}, err + } + c.signRequest(req, opts.PAT) + resp, err := c.HTTPClient.Do(req) + if err != nil { + return HookResponse{}, fmt.Errorf("github API: %w", err) + } + defer resp.Body.Close() + respBody, _ := io.ReadAll(resp.Body) + switch resp.StatusCode { + case http.StatusCreated, http.StatusOK: + var hr HookResponse + if err := json.Unmarshal(respBody, &hr); err != nil { + return HookResponse{}, fmt.Errorf("parse hook response: %w", err) + } + return hr, nil + case http.StatusUnauthorized, http.StatusForbidden: + return HookResponse{}, fmt.Errorf("%w: %s", ErrUnauthorized, githubMessage(respBody)) + case http.StatusNotFound: + return HookResponse{}, fmt.Errorf("%w: %s", ErrNotFound, githubMessage(respBody)) + default: + return HookResponse{}, fmt.Errorf("github API %d: %s", resp.StatusCode, githubMessage(respBody)) + } +} + +// ValidateToken probes GET /user with the given PAT, returning the +// authenticated login plus the scopes GitHub advertises in the +// X-OAuth-Scopes response header. 
A 401/403 yields ErrUnauthorized so +// the caller can reject token creation with a precise message. +// +// We treat X-OAuth-Scopes as the only authoritative source of scope +// information: it is what GitHub will actually enforce, so storing +// anything else (e.g. user-typed strings) just invites drift. +func (c *Client) ValidateToken(ctx context.Context, pat string) (TokenInfo, error) { + if pat == "" { + return TokenInfo{}, fmt.Errorf("PAT required") + } + req, err := http.NewRequestWithContext(ctx, http.MethodGet, c.BaseURL+"/user", nil) + if err != nil { + return TokenInfo{}, err + } + c.signRequest(req, pat) + resp, err := c.HTTPClient.Do(req) + if err != nil { + return TokenInfo{}, fmt.Errorf("github API: %w", err) + } + defer resp.Body.Close() + respBody, _ := io.ReadAll(resp.Body) + switch resp.StatusCode { + case http.StatusOK: + var u struct { + Login string `json:"login"` + } + if err := json.Unmarshal(respBody, &u); err != nil { + return TokenInfo{}, fmt.Errorf("parse /user response: %w", err) + } + info := TokenInfo{ + Login: u.Login, + Scopes: parseScopeHeader(resp.Header.Get("X-OAuth-Scopes")), + FineGrained: strings.HasPrefix(pat, "github_pat_"), + } + return info, nil + case http.StatusUnauthorized, http.StatusForbidden: + return TokenInfo{}, fmt.Errorf("%w: %s", ErrUnauthorized, githubMessage(respBody)) + default: + return TokenInfo{}, fmt.Errorf("github API %d: %s", resp.StatusCode, githubMessage(respBody)) + } +} + +// parseScopeHeader splits the comma-separated X-OAuth-Scopes value +// GitHub returns on classic PATs. An empty header (typical for +// fine-grained PATs or a token with no scopes) yields a nil slice +// rather than [""]; callers that need a stable JSON shape replace nil +// with []string{} at the boundary. 
+func parseScopeHeader(h string) []string { + h = strings.TrimSpace(h) + if h == "" { + return nil + } + parts := strings.Split(h, ",") + out := make([]string, 0, len(parts)) + for _, p := range parts { + s := strings.TrimSpace(p) + if s == "" { + continue + } + out = append(out, s) + } + if len(out) == 0 { + return nil + } + return out +} + +// DeleteWebhook calls DELETE /repos/{owner}/{repo}/hooks/{id}. Treats +// 404 as success — if the hook is already gone the post-condition is +// satisfied. +func (c *Client) DeleteWebhook(ctx context.Context, owner, repo, pat string, hookID int64) error { + endpoint := fmt.Sprintf("%s/repos/%s/%s/hooks/%d", c.BaseURL, + url.PathEscape(owner), url.PathEscape(repo), hookID) + req, err := http.NewRequestWithContext(ctx, http.MethodDelete, endpoint, nil) + if err != nil { + return err + } + c.signRequest(req, pat) + resp, err := c.HTTPClient.Do(req) + if err != nil { + return fmt.Errorf("github API: %w", err) + } + defer resp.Body.Close() + respBody, _ := io.ReadAll(resp.Body) + switch resp.StatusCode { + case http.StatusNoContent, http.StatusNotFound: + return nil + case http.StatusUnauthorized, http.StatusForbidden: + return fmt.Errorf("%w: %s", ErrUnauthorized, githubMessage(respBody)) + default: + return fmt.Errorf("github API %d: %s", resp.StatusCode, githubMessage(respBody)) + } +} + +// --- helpers --- + +func (c *Client) signRequest(req *http.Request, pat string) { + req.Header.Set("Authorization", "token "+pat) + req.Header.Set("Accept", "application/vnd.github+json") + req.Header.Set("X-GitHub-Api-Version", "2022-11-28") + req.Header.Set("User-Agent", "cix-server") +} + +func insecureSSLValue(insecure bool) string { + if insecure { + return "1" + } + return "0" +} + +func githubMessage(body []byte) string { + body = bytes.TrimSpace(body) + if len(body) == 0 { + return "no body" + } + var env struct { + Message string `json:"message"` + } + if err := json.Unmarshal(body, &env); err == nil && env.Message != "" { + return 
env.Message + } + // Fall back to the raw body, truncated to keep error strings sane. + const maxLen = 200 + s := strings.TrimSpace(string(body)) + if len(s) > maxLen { + s = s[:maxLen] + "..." + } + return s +} + +// ParseOwnerRepo extracts {owner, repo} from an https://github.com/owner/repo URL. +// Mirrors the same logic as workspacerepos.parseGitHubURL but kept private +// to that package — we re-implement here to avoid an import cycle. +func ParseOwnerRepo(githubURL string) (owner, repo string, err error) { + u, perr := url.Parse(strings.TrimSpace(githubURL)) + if perr != nil { + return "", "", fmt.Errorf("invalid URL: %w", perr) + } + if !strings.EqualFold(u.Host, "github.com") { + return "", "", fmt.Errorf("not a github.com URL") + } + path := strings.Trim(u.Path, "/") + path = strings.TrimSuffix(path, ".git") + parts := strings.Split(path, "/") + if len(parts) != 2 || parts[0] == "" || parts[1] == "" { + return "", "", fmt.Errorf("expected /owner/repo path, got %q", u.Path) + } + return parts[0], parts[1], nil +} diff --git a/server/internal/githubapi/githubapi_test.go b/server/internal/githubapi/githubapi_test.go new file mode 100644 index 0000000..9b1d9b5 --- /dev/null +++ b/server/internal/githubapi/githubapi_test.go @@ -0,0 +1,436 @@ +package githubapi + +import ( + "context" + "encoding/json" + "errors" + "io" + "net/http" + "net/http/httptest" + "strings" + "testing" +) + +// fakeServer returns a Client pointing at an httptest.Server that +// captures requests for later assertion. 
+type recordedReq struct { + Path string + Method string + Auth string + Body map[string]any +} + +func fakeServer(t *testing.T, handler http.Handler) (*Client, *[]recordedReq) { + t.Helper() + var recs []recordedReq + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + raw, _ := io.ReadAll(r.Body) + var body map[string]any + _ = json.Unmarshal(raw, &body) + recs = append(recs, recordedReq{ + Path: r.URL.Path, + Method: r.Method, + Auth: r.Header.Get("Authorization"), + Body: body, + }) + handler.ServeHTTP(w, r) + })) + t.Cleanup(srv.Close) + c := New() + c.BaseURL = srv.URL + return c, &recs +} + +func TestCreateWebhookSendsExpectedRequest(t *testing.T) { + c, recs := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(http.StatusCreated) + _, _ = w.Write([]byte(`{"id": 42, "url": "https://api.github.com/repos/o/r/hooks/42", "active": true}`)) + })) + hr, err := c.CreateWebhook(context.Background(), CreateWebhookOptions{ + Owner: "o", + Repo: "r", + PAT: "ghp_xxx", + URL: "https://cix.test/api/v1/webhooks/github/abc", + Secret: "s3cr3t", + }) + if err != nil { + t.Fatalf("CreateWebhook: %v", err) + } + if hr.ID != 42 { + t.Fatalf("expected id=42, got %d", hr.ID) + } + if len(*recs) != 1 { + t.Fatalf("expected 1 request, got %d", len(*recs)) + } + r := (*recs)[0] + if r.Path != "/repos/o/r/hooks" { + t.Fatalf("path: %q", r.Path) + } + if r.Method != http.MethodPost { + t.Fatalf("method: %q", r.Method) + } + if r.Auth != "token ghp_xxx" { + t.Fatalf("auth: %q", r.Auth) + } + if cfg, _ := r.Body["config"].(map[string]any); cfg["secret"] != "s3cr3t" { + t.Fatalf("secret not forwarded: %+v", r.Body) + } +} + +func TestCreateWebhookUnauthorized(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusUnauthorized) + _, _ = w.Write([]byte(`{"message": "Bad 
credentials"}`)) + })) + _, err := c.CreateWebhook(context.Background(), CreateWebhookOptions{ + Owner: "o", Repo: "r", PAT: "x", URL: "https://x", Secret: "y", + }) + if !errors.Is(err, ErrUnauthorized) { + t.Fatalf("expected ErrUnauthorized, got %v", err) + } +} + +func TestCreateWebhookForbiddenIsUnauthorized(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusForbidden) + _, _ = w.Write([]byte(`{"message": "Resource not accessible by personal access token"}`)) + })) + _, err := c.CreateWebhook(context.Background(), CreateWebhookOptions{ + Owner: "o", Repo: "r", PAT: "x", URL: "https://x", Secret: "y", + }) + if !errors.Is(err, ErrUnauthorized) { + t.Fatalf("expected ErrUnauthorized, got %v", err) + } + if !strings.Contains(err.Error(), "not accessible") { + t.Fatalf("error should surface github message, got %v", err) + } +} + +func TestDeleteWebhookTreats404AsSuccess(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusNotFound) + })) + if err := c.DeleteWebhook(context.Background(), "o", "r", "x", 42); err != nil { + t.Fatalf("404 should be success on DELETE, got %v", err) + } +} + +func TestValidateTokenReturnsScopesFromHeader(t *testing.T) { + c, recs := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.Header().Set("X-OAuth-Scopes", "repo, admin:repo_hook, read:org") + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{"login": "alice"}`)) + })) + info, err := c.ValidateToken(context.Background(), "ghp_xxx") + if err != nil { + t.Fatalf("ValidateToken: %v", err) + } + if info.Login != "alice" { + t.Fatalf("login: %q", info.Login) + } + want := []string{"repo", "admin:repo_hook", "read:org"} + if len(info.Scopes) != len(want) { + t.Fatalf("scopes: got %v, want %v", info.Scopes, want) + } + for i, s := range want { + if info.Scopes[i] != s { + 
t.Fatalf("scope[%d]=%q want %q", i, info.Scopes[i], s) + } + } + if info.FineGrained { + t.Fatalf("ghp_ prefix should not be fine-grained") + } + if len(*recs) != 1 || (*recs)[0].Path != "/user" { + t.Fatalf("expected GET /user, got %+v", *recs) + } +} + +func TestValidateTokenFineGrainedHasEmptyScopes(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + // Fine-grained PATs: GitHub omits X-OAuth-Scopes entirely. + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{"login": "alice"}`)) + })) + info, err := c.ValidateToken(context.Background(), "github_pat_yyy") + if err != nil { + t.Fatalf("ValidateToken: %v", err) + } + if !info.FineGrained { + t.Fatalf("github_pat_ prefix should be fine-grained") + } + if len(info.Scopes) != 0 { + t.Fatalf("expected empty scopes for fine-grained, got %v", info.Scopes) + } +} + +func TestValidateTokenUnauthorized(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusUnauthorized) + _, _ = w.Write([]byte(`{"message": "Bad credentials"}`)) + })) + _, err := c.ValidateToken(context.Background(), "bad") + if !errors.Is(err, ErrUnauthorized) { + t.Fatalf("expected ErrUnauthorized, got %v", err) + } + if !strings.Contains(err.Error(), "Bad credentials") { + t.Fatalf("error should surface github message, got %v", err) + } +} + +func TestValidateTokenEmptyHeaderYieldsNilScopes(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.Header().Set("X-OAuth-Scopes", "") + _, _ = w.Write([]byte(`{"login": "alice"}`)) + })) + info, err := c.ValidateToken(context.Background(), "ghp_x") + if err != nil { + t.Fatalf("ValidateToken: %v", err) + } + if len(info.Scopes) != 0 { + t.Fatalf("empty header should yield empty scopes, got %v", info.Scopes) + } +} + +func TestListUserReposFollowsLinkHeader(t *testing.T) { + // Two-page response: first 
page sends Link rel=next pointing at + // page 2, which has no further Link header → terminator. + var baseURL string + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + switch r.URL.Query().Get("page") { + case "", "1": + w.Header().Set("Link", `<`+baseURL+`/user/repos?page=2>; rel="next", <`+baseURL+`/user/repos?page=2>; rel="last"`) + _, _ = w.Write([]byte(`[{"full_name":"o/r1","default_branch":"main","private":false,"html_url":"https://github.com/o/r1"}]`)) + case "2": + _, _ = w.Write([]byte(`[{"full_name":"o/r2","default_branch":"develop","private":true,"html_url":"https://github.com/o/r2"}]`)) + default: + http.Error(w, "unexpected", http.StatusBadRequest) + } + })) + t.Cleanup(srv.Close) + baseURL = srv.URL + c := New() + c.BaseURL = srv.URL + + repos, err := c.ListUserRepos(context.Background(), "ghp_x", 5) + if err != nil { + t.Fatalf("ListUserRepos: %v", err) + } + if len(repos) != 2 { + t.Fatalf("expected 2 repos across pages, got %d", len(repos)) + } + if repos[0].FullName != "o/r1" || repos[1].FullName != "o/r2" { + t.Fatalf("unexpected repos: %+v", repos) + } + if !repos[1].Private { + t.Fatalf("private flag should round-trip, got %+v", repos[1]) + } +} + +func TestListUserReposHonoursMaxPages(t *testing.T) { + // Server claims an infinite next-page chain; ListUserRepos must + // stop after maxPages so we don't run forever on a misbehaving + // upstream. 
+ var baseURL string + page := 0 + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + page++ + w.Header().Set("Link", `<`+baseURL+`/user/repos?page=999>; rel="next"`) + _, _ = w.Write([]byte(`[{"full_name":"o/r","default_branch":"main"}]`)) + })) + t.Cleanup(srv.Close) + baseURL = srv.URL + c := New() + c.BaseURL = srv.URL + + _, err := c.ListUserRepos(context.Background(), "ghp_x", 3) + if err != nil { + t.Fatalf("ListUserRepos: %v", err) + } + if page != 3 { + t.Fatalf("expected exactly 3 page hits with maxPages=3, got %d", page) + } +} + +func TestListUserReposUnauthorized(t *testing.T) { + c, _ := fakeServer(t, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusUnauthorized) + _, _ = w.Write([]byte(`{"message": "Bad credentials"}`)) + })) + _, err := c.ListUserRepos(context.Background(), "bad", 1) + if !errors.Is(err, ErrUnauthorized) { + t.Fatalf("expected ErrUnauthorized, got %v", err) + } +} + +func TestParseNextLink(t *testing.T) { + in := `<https://api.github.com/user/repos?page=2>; rel="next", <https://api.github.com/user/repos?page=2>; rel="last"` + want := "https://api.github.com/user/repos?page=2" + if got := parseNextLink(in); got != want { + t.Fatalf("parseNextLink(%q) = %q, want %q", in, got, want) + } + if got := parseNextLink(""); got != "" { + t.Fatalf("empty header should yield empty, got %q", got) + } + // rel=last only — there's no next, must terminate. + if got := parseNextLink(`<https://api.github.com/user/repos?page=2>; rel="last"`); got != "" { + t.Fatalf("rel=last only should not advance, got %q", got) + } +} + +func TestListAccountsDerivedFromUserRepos(t *testing.T) { + // /user/repos returns repos under three owners: the PAT owner + // (alice/User), one Org (acme), and one User collaborator (bob). + // ListAccounts must dedupe, preserve case, and tag types correctly. 
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + switch r.URL.Path { + case "/user": + _, _ = w.Write([]byte(`{"login":"alice","avatar_url":"https://x/alice"}`)) + case "/user/repos": + _, _ = w.Write([]byte(`[ + {"full_name":"alice/dotfiles","default_branch":"main","private":false,"html_url":"x","owner":{"login":"alice","type":"User","avatar_url":"https://x/alice"}}, + {"full_name":"acme/api","default_branch":"main","private":true,"html_url":"x","owner":{"login":"acme","type":"Organization","avatar_url":"https://x/acme"}}, + {"full_name":"acme/web","default_branch":"main","private":true,"html_url":"x","owner":{"login":"acme","type":"Organization","avatar_url":"https://x/acme"}}, + {"full_name":"bob/shared","default_branch":"main","private":false,"html_url":"x","owner":{"login":"bob","type":"User","avatar_url":"https://x/bob"}} + ]`)) + default: + http.Error(w, "unexpected: "+r.URL.Path, http.StatusNotFound) + } + })) + t.Cleanup(srv.Close) + c := New() + c.BaseURL = srv.URL + + got, err := c.ListAccounts(context.Background(), "ghp_x") + if err != nil { + t.Fatalf("ListAccounts: %v", err) + } + if len(got) != 3 { + t.Fatalf("expected user + acme + bob (deduped), got %d: %+v", len(got), got) + } + if got[0].Login != "alice" || got[0].Type != AccountTypeUser { + t.Fatalf("first must be the PAT owner, got %+v", got[0]) + } + // acme should be tagged as org (GitHub's "Organization" → our "org"). + var acmeFound bool + for _, a := range got { + if a.Login == "acme" { + acmeFound = true + if a.Type != AccountTypeOrg { + t.Fatalf("acme must be org-type, got %+v", a) + } + } + } + if !acmeFound { + t.Fatalf("acme org should be present in %+v", got) + } +} + +func TestListAccountsTokenOwnerNotDuplicated(t *testing.T) { + // /user/repos includes the user's own repos. The /user step also + // reports the same login. Without dedupe the owner would appear + // twice — once from /user, once from /user/repos. 
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + switch r.URL.Path { + case "/user": + _, _ = w.Write([]byte(`{"login":"Alice"}`)) + case "/user/repos": + // owner.login casing differs from /user response — GitHub + // is case-insensitive but capitalisation can drift; we must + // dedupe regardless. + _, _ = w.Write([]byte(`[ + {"full_name":"alice/repo","default_branch":"main","private":false,"html_url":"x","owner":{"login":"alice","type":"User","avatar_url":""}} + ]`)) + } + })) + t.Cleanup(srv.Close) + c := New() + c.BaseURL = srv.URL + + got, err := c.ListAccounts(context.Background(), "ghp_x") + if err != nil { + t.Fatalf("ListAccounts: %v", err) + } + if len(got) != 1 { + t.Fatalf("PAT owner must not be duplicated, got %+v", got) + } +} + +func TestListAccountsSurvivesUserReposError(t *testing.T) { + // If /user/repos fails we must still return at least the personal + // account — the dialog needs something to render. + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + switch r.URL.Path { + case "/user": + _, _ = w.Write([]byte(`{"login":"alice"}`)) + case "/user/repos": + w.WriteHeader(http.StatusForbidden) + _, _ = w.Write([]byte(`{"message":"saml enforced"}`)) + } + })) + t.Cleanup(srv.Close) + c := New() + c.BaseURL = srv.URL + + got, err := c.ListAccounts(context.Background(), "ghp_x") + if err != nil { + t.Fatalf("ListAccounts should swallow /user/repos errors, got %v", err) + } + if len(got) != 1 || got[0].Login != "alice" { + t.Fatalf("expected just the user, got %+v", got) + } +} + +func TestListReposForAccountUsesCorrectEndpoint(t *testing.T) { + var lastPath string + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + lastPath = r.URL.Path + _, _ = w.Write([]byte(`[{"full_name":"x/y","default_branch":"main","private":false,"html_url":"https://github.com/x/y"}]`)) + })) + t.Cleanup(srv.Close) + c := New() + c.BaseURL = 
srv.URL + + if _, err := c.ListReposForAccount(context.Background(), "ghp", AccountTypeUser, "alice", 1); err != nil { + t.Fatalf("user: %v", err) + } + if lastPath != "/users/alice/repos" { + t.Fatalf("user account → /users/{login}/repos, got %q", lastPath) + } + + if _, err := c.ListReposForAccount(context.Background(), "ghp", AccountTypeOrg, "acme", 1); err != nil { + t.Fatalf("org: %v", err) + } + if lastPath != "/orgs/acme/repos" { + t.Fatalf("org account → /orgs/{login}/repos, got %q", lastPath) + } +} + +func TestParseOwnerRepo(t *testing.T) { + cases := map[string][2]string{ + "https://github.com/spf13/cobra": {"spf13", "cobra"}, + "https://github.com/spf13/cobra.git": {"spf13", "cobra"}, + "https://github.com/spf13/cobra/": {"spf13", "cobra"}, + "https://github.com/spf13/cobra.git/": {"spf13", "cobra"}, + } + for in, want := range cases { + o, r, err := ParseOwnerRepo(in) + if err != nil { + t.Errorf("%q: %v", in, err) + continue + } + if o != want[0] || r != want[1] { + t.Errorf("%q → (%q,%q), want %v", in, o, r, want) + } + } + bad := []string{ + "https://gitlab.com/x/y", + "https://github.com/onlyowner", + "not a url at all", + } + for _, b := range bad { + if _, _, err := ParseOwnerRepo(b); err == nil { + t.Errorf("%q: expected error", b) + } + } +} diff --git a/server/internal/githubtokens/githubtokens.go b/server/internal/githubtokens/githubtokens.go new file mode 100644 index 0000000..af80db8 --- /dev/null +++ b/server/internal/githubtokens/githubtokens.go @@ -0,0 +1,261 @@ +// Package githubtokens manages GitHub Personal Access Tokens used by the +// workspaces feature to clone private repositories and (optionally) register +// webhooks. Tokens are encrypted at rest via internal/secrets — the plaintext +// is supplied by the dashboard exactly once on POST /api/v1/github-tokens +// and never returned in any subsequent response. +// +// Server-wide shared model: any authenticated user may create / select / delete +// tokens. 
Tracking of an "owner" is omitted in PR1 by design; when the feature +// matures we'll re-introduce ownership if multi-team semantics are wanted. +// +// Scope validation against the real GitHub API is deferred to PR2 (which +// brings in the go-github client). PR1 stores `scopes` as a free-form JSON +// string so the field is stable in the schema. +package githubtokens + +import ( + "context" + "database/sql" + "encoding/json" + "errors" + "fmt" + "strings" + "time" + + "github.com/google/uuid" + + "github.com/dvcdsys/code-index/server/internal/secrets" +) + +// Errors. +var ( + ErrNotFound = errors.New("github token not found") + ErrNameTaken = errors.New("github token name already in use") + ErrNameEmpty = errors.New("github token name is required") + ErrEmpty = errors.New("github token value is required") +) + +// Token is the metadata view. Plaintext is NEVER stored in this struct — use +// Reveal() to obtain it for an outbound API call (e.g. webhook registration). +type Token struct { + ID string + Name string + Scopes []string // best-effort; empty until validated by the GitHub API in PR2 + CreatedAt time.Time + LastUsedAt *time.Time +} + +// Service wraps the github_tokens table + secrets.Service for encryption. +type Service struct { + DB *sql.DB + Secrets *secrets.Service +} + +// New returns a Service. Both DB and Secrets are required. +func New(db *sql.DB, sec *secrets.Service) *Service { + return &Service{DB: db, Secrets: sec} +} + +// Create stores a new token. The plaintext value is encrypted via +// Secrets.Encrypt and never persisted in cleartext. Returns the metadata +// view — the caller already has the plaintext and is responsible for not +// echoing it back to the user. 
+func (s *Service) Create(ctx context.Context, name, plaintext string, scopes []string) (Token, error) { + name = strings.TrimSpace(name) + if name == "" { + return Token{}, ErrNameEmpty + } + if strings.TrimSpace(plaintext) == "" { + return Token{}, ErrEmpty + } + if s.Secrets == nil { + return Token{}, fmt.Errorf("secrets service not configured") + } + + encrypted, err := s.Secrets.Encrypt([]byte(plaintext)) + if err != nil { + return Token{}, fmt.Errorf("encrypt: %w", err) + } + + id := uuid.NewString() + now := time.Now().UTC().Format(time.RFC3339Nano) + + scopesJSON, err := encodeScopes(scopes) + if err != nil { + return Token{}, err + } + + _, err = s.DB.ExecContext(ctx, + `INSERT INTO github_tokens (id, name, encrypted, scopes, created_at) + VALUES (?, ?, ?, ?, ?)`, + id, name, encrypted, nullableString(scopesJSON), now, + ) + if err != nil { + if isUniqueConstraintViolation(err) { + return Token{}, ErrNameTaken + } + return Token{}, fmt.Errorf("insert github token: %w", err) + } + return s.GetByID(ctx, id) +} + +// GetByID returns metadata for a token. Plaintext is not loaded — use +// Reveal for that. +func (s *Service) GetByID(ctx context.Context, id string) (Token, error) { + row := s.DB.QueryRowContext(ctx, + `SELECT id, name, scopes, created_at, last_used_at + FROM github_tokens WHERE id = ?`, id) + return scanRow(row) +} + +// List returns all tokens, newest first. Plaintext is never loaded. +func (s *Service) List(ctx context.Context) ([]Token, error) { + rows, err := s.DB.QueryContext(ctx, + `SELECT id, name, scopes, created_at, last_used_at + FROM github_tokens ORDER BY created_at DESC`) + if err != nil { + return nil, fmt.Errorf("list github tokens: %w", err) + } + defer rows.Close() + return scanRows(rows) +} + +// Reveal loads the encrypted blob for a token and decrypts it. Used by the +// workspaces clone / webhook flows when an outbound GitHub API call needs +// the plaintext. 
ErrNotFound when the row is absent; decryption errors are +// wrapped to discourage oracle-style probing. +func (s *Service) Reveal(ctx context.Context, id string) (string, error) { + if s.Secrets == nil { + return "", fmt.Errorf("secrets service not configured") + } + row := s.DB.QueryRowContext(ctx, `SELECT encrypted FROM github_tokens WHERE id = ?`, id) + var enc []byte + if err := row.Scan(&enc); err != nil { + if errors.Is(err, sql.ErrNoRows) { + return "", ErrNotFound + } + return "", fmt.Errorf("load github token: %w", err) + } + plain, err := s.Secrets.Decrypt(enc) + if err != nil { + return "", fmt.Errorf("decrypt github token: %w", err) + } + return string(plain), nil +} + +// Touch updates last_used_at on every successful Reveal-callsite. Not +// invoked by Reveal itself because some callers (e.g. background cloners) +// hit the same token many times in a tight loop and we'd rather take one +// timestamp at the end of the batch. +func (s *Service) Touch(ctx context.Context, id string) error { + now := time.Now().UTC().Format(time.RFC3339Nano) + _, err := s.DB.ExecContext(ctx, + `UPDATE github_tokens SET last_used_at = ? WHERE id = ?`, now, id) + if err != nil { + return fmt.Errorf("touch github token: %w", err) + } + return nil +} + +// Delete removes a token. ErrNotFound when absent. +func (s *Service) Delete(ctx context.Context, id string) error { + res, err := s.DB.ExecContext(ctx, `DELETE FROM github_tokens WHERE id = ?`, id) + if err != nil { + return fmt.Errorf("delete github token: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return fmt.Errorf("rows affected: %w", err) + } + if n == 0 { + return ErrNotFound + } + return nil +} + +// CountWithEncryption returns the number of rows in github_tokens. Used at +// startup to decide whether a missing encryption key is fatal (any encrypted +// row would otherwise be unreadable). 
+func (s *Service) CountWithEncryption(ctx context.Context) (int, error) { + var n int + err := s.DB.QueryRowContext(ctx, `SELECT COUNT(1) FROM github_tokens`).Scan(&n) + if err != nil { + return 0, fmt.Errorf("count github tokens: %w", err) + } + return n, nil +} + +// --- helpers --- + +func scanRow(r interface{ Scan(dest ...any) error }) (Token, error) { + var ( + t Token + scopesJSON sql.NullString + createdAt string + lastUsedAt sql.NullString + ) + err := r.Scan(&t.ID, &t.Name, &scopesJSON, &createdAt, &lastUsedAt) + if err != nil { + if errors.Is(err, sql.ErrNoRows) { + return Token{}, ErrNotFound + } + return Token{}, fmt.Errorf("scan github token: %w", err) + } + t.Scopes = decodeScopes(scopesJSON.String) + t.CreatedAt, _ = time.Parse(time.RFC3339Nano, createdAt) + if lastUsedAt.Valid { + ts, _ := time.Parse(time.RFC3339Nano, lastUsedAt.String) + t.LastUsedAt = &ts + } + return t, nil +} + +func scanRows(rows *sql.Rows) ([]Token, error) { + out := []Token{} + for rows.Next() { + t, err := scanRow(rows) + if err != nil { + return nil, err + } + out = append(out, t) + } + return out, rows.Err() +} + +func encodeScopes(scopes []string) (string, error) { + if len(scopes) == 0 { + return "", nil + } + b, err := json.Marshal(scopes) + if err != nil { + return "", fmt.Errorf("encode scopes: %w", err) + } + return string(b), nil +} + +func decodeScopes(s string) []string { + if s == "" { + return nil + } + var out []string + if err := json.Unmarshal([]byte(s), &out); err != nil { + return nil + } + return out +} + +func nullableString(s string) any { + if s == "" { + return nil + } + return s +} + +func isUniqueConstraintViolation(err error) bool { + if err == nil { + return false + } + msg := err.Error() + return strings.Contains(msg, "UNIQUE constraint failed") || + strings.Contains(msg, "constraint failed: UNIQUE") +} diff --git a/server/internal/githubtokens/githubtokens_test.go b/server/internal/githubtokens/githubtokens_test.go new file mode 100644 index 
0000000..846fb09 --- /dev/null +++ b/server/internal/githubtokens/githubtokens_test.go @@ -0,0 +1,152 @@ +package githubtokens + +import ( + "context" + "errors" + "testing" + + "github.com/dvcdsys/code-index/server/internal/db" + "github.com/dvcdsys/code-index/server/internal/secrets" +) + +func mustOpen(t *testing.T) *Service { + t.Helper() + database, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open db: %v", err) + } + t.Cleanup(func() { _ = database.Close() }) + + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + sec, err := secrets.Open(secrets.OpenOptions{ + DataDir: t.TempDir(), + AllowGenerate: true, + }) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + return New(database, sec) +} + +func TestCreateAndReveal(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + + tok, err := svc.Create(ctx, "personal", "ghp_secret_value", []string{"repo", "admin:repo_hook"}) + if err != nil { + t.Fatalf("Create: %v", err) + } + if tok.ID == "" || tok.Name != "personal" { + t.Fatalf("unexpected token: %+v", tok) + } + if len(tok.Scopes) != 2 || tok.Scopes[0] != "repo" { + t.Fatalf("scopes round-trip failed: %+v", tok.Scopes) + } + + plain, err := svc.Reveal(ctx, tok.ID) + if err != nil { + t.Fatalf("Reveal: %v", err) + } + if plain != "ghp_secret_value" { + t.Fatalf("Reveal returned %q, want plaintext", plain) + } +} + +func TestCreateRejectsEmpty(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + if _, err := svc.Create(ctx, " ", "x", nil); !errors.Is(err, ErrNameEmpty) { + t.Fatalf("expected ErrNameEmpty, got %v", err) + } + if _, err := svc.Create(ctx, "name", " ", nil); !errors.Is(err, ErrEmpty) { + t.Fatalf("expected ErrEmpty, got %v", err) + } +} + +func TestCreateDuplicateName(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + if _, err := svc.Create(ctx, "personal", "v1", nil); err != nil { + t.Fatalf("first: %v", err) + } + if _, err := svc.Create(ctx, "personal", "v2", 
nil); !errors.Is(err, ErrNameTaken) { + t.Fatalf("expected ErrNameTaken, got %v", err) + } +} + +func TestListDoesNotLeakPlaintext(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + plaintext := "ghp_some_secret_we_should_never_see_in_list" + if _, err := svc.Create(ctx, "personal", plaintext, nil); err != nil { + t.Fatalf("Create: %v", err) + } + list, err := svc.List(ctx) + if err != nil { + t.Fatalf("List: %v", err) + } + if len(list) != 1 { + t.Fatalf("expected 1 token, got %d", len(list)) + } + // The Token struct does not carry a plaintext field, so a full + // re-encode-and-scan of the value is unnecessary here. A direct + // check on the Name field is the simplest assertion: + if list[0].Name == plaintext { + t.Fatalf("plaintext leaked into Name field") + } +} + +func TestDelete(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + tok, _ := svc.Create(ctx, "x", "y", nil) + if err := svc.Delete(ctx, tok.ID); err != nil { + t.Fatalf("Delete: %v", err) + } + if err := svc.Delete(ctx, tok.ID); !errors.Is(err, ErrNotFound) { + t.Fatalf("expected ErrNotFound on second delete, got %v", err) + } +} + +func TestRevealNotFound(t *testing.T) { + svc := mustOpen(t) + if _, err := svc.Reveal(context.Background(), "no-such-id"); !errors.Is(err, ErrNotFound) { + t.Fatalf("expected ErrNotFound, got %v", err) + } +} + +func TestTouch(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + tok, _ := svc.Create(ctx, "x", "y", nil) + if tok.LastUsedAt != nil { + t.Fatalf("LastUsedAt should be nil on fresh token") + } + if err := svc.Touch(ctx, tok.ID); err != nil { + t.Fatalf("Touch: %v", err) + } + updated, err := svc.GetByID(ctx, tok.ID) + if err != nil { + t.Fatalf("GetByID: %v", err) + } + if updated.LastUsedAt == nil { + t.Fatalf("LastUsedAt should be set after Touch") + } +} + +func TestCountWithEncryption(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + n, err := 
svc.CountWithEncryption(ctx) + if err != nil || n != 0 { + t.Fatalf("expected (0, nil), got (%d, %v)", n, err) + } + if _, err := svc.Create(ctx, "a", "b", nil); err != nil { + t.Fatalf("Create: %v", err) + } + n, _ = svc.CountWithEncryption(ctx) + if n != 1 { + t.Fatalf("expected 1, got %d", n) + } +} diff --git a/server/internal/httpapi/githubtokens.go b/server/internal/httpapi/githubtokens.go new file mode 100644 index 0000000..6375845 --- /dev/null +++ b/server/internal/httpapi/githubtokens.go @@ -0,0 +1,306 @@ +package httpapi + +import ( + "encoding/json" + "errors" + "net/http" + "strings" + "time" + + "github.com/dvcdsys/code-index/server/internal/githubapi" + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" +) + +// githubAPI returns a per-request GitHub client. The Deps override lets +// tests point at an httptest server; in production BaseURL stays at +// the canonical api.github.com via githubapi.New. +func (s *Server) githubAPI() *githubapi.Client { + c := githubapi.New() + if s.Deps.GithubAPIBaseURL != "" { + c.BaseURL = s.Deps.GithubAPIBaseURL + } + return c +} + +// githubTokenPayload mirrors openapi.GithubToken on the wire. Plaintext is +// never carried — only the metadata. The plaintext appears solely in the +// POST request body (see CreateGithubToken); no response echoes it, since +// the caller already holds the value it supplied. 
+type githubTokenPayload struct { + ID string `json:"id"` + Name string `json:"name"` + Scopes []string `json:"scopes"` + CreatedAt time.Time `json:"created_at"` + LastUsedAt *time.Time `json:"last_used_at,omitempty"` +} + +func githubTokenToPayload(t githubtokens.Token) githubTokenPayload { + scopes := t.Scopes + if scopes == nil { + scopes = []string{} + } + return githubTokenPayload{ + ID: t.ID, + Name: t.Name, + Scopes: scopes, + CreatedAt: t.CreatedAt, + LastUsedAt: t.LastUsedAt, + } +} + +// githubTokensUnavailable returns 503 when the feature flag is off OR the +// service is nil (e.g. no encryption key configured at boot). +func (s *Server) githubTokensUnavailable(w http.ResponseWriter) bool { + if !s.Deps.WorkspacesEnabled || s.Deps.GithubTokens == nil { + writeError(w, http.StatusServiceUnavailable, "workspaces feature is disabled (set CIX_WORKSPACES_ENABLED=true and restart)") + return true + } + return false +} + +// ListGithubTokens — GET /api/v1/github-tokens. +func (s *Server) ListGithubTokens(w http.ResponseWriter, r *http.Request) { + if s.githubTokensUnavailable(w) { + return + } + list, err := s.Deps.GithubTokens.List(r.Context()) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not list github tokens") + return + } + out := make([]githubTokenPayload, 0, len(list)) + for _, t := range list { + out = append(out, githubTokenToPayload(t)) + } + writeJSON(w, http.StatusOK, map[string]any{ + "tokens": out, + "total": len(out), + }) +} + +// CreateGithubToken — POST /api/v1/github-tokens. The plaintext token +// arrives in the request body, gets encrypted, and is then dropped on the +// floor — only the metadata view comes back to the caller. We deliberately +// do NOT echo the plaintext in the response: the caller already has it, +// and re-serialising it would be a needless place for it to leak. 
+// +// Scopes are NOT taken from the request body — GitHub is the only +// source of truth, so we call GET /user with the PAT, parse +// X-OAuth-Scopes from the response header, and store what GitHub +// actually advertises. The Scopes field on the request stays for +// backwards compatibility with older clients but is ignored. +func (s *Server) CreateGithubToken(w http.ResponseWriter, r *http.Request) { + if s.githubTokensUnavailable(w) { + return + } + var body openapi.CreateGithubTokenRequest + if err := json.NewDecoder(r.Body).Decode(&body); err != nil { + writeError(w, http.StatusUnprocessableEntity, "invalid JSON body") + return + } + if body.Token == "" { + writeError(w, http.StatusUnprocessableEntity, "token value is required") + return + } + + info, verr := s.githubAPI().ValidateToken(r.Context(), body.Token) + if verr != nil { + if errors.Is(verr, githubapi.ErrUnauthorized) { + writeError(w, http.StatusUnprocessableEntity, + "GitHub rejected the token: "+verr.Error()) + return + } + writeError(w, http.StatusBadGateway, + "could not validate token with GitHub: "+verr.Error()) + return + } + + tok, err := s.Deps.GithubTokens.Create(r.Context(), body.Name, body.Token, info.Scopes) + if err != nil { + switch { + case errors.Is(err, githubtokens.ErrNameEmpty): + writeError(w, http.StatusUnprocessableEntity, "name is required") + case errors.Is(err, githubtokens.ErrEmpty): + writeError(w, http.StatusUnprocessableEntity, "token value is required") + case errors.Is(err, githubtokens.ErrNameTaken): + writeError(w, http.StatusConflict, "token name already exists") + default: + writeError(w, http.StatusInternalServerError, "could not store github token") + } + return + } + writeJSON(w, http.StatusCreated, githubTokenToPayload(tok)) +} + +// ListTokenAccounts — GET /api/v1/github-tokens/{id}/accounts. +// +// Returns the PAT owner plus every org the PAT can see (/user/orgs). 
+// The dashboard uses this as the first step of the add-repo flow so +// the operator can drill into a specific account before picking a +// repository — useful when /user/repos misses SAML-protected org +// repos that only surface under /orgs/{login}/repos. +func (s *Server) ListTokenAccounts(w http.ResponseWriter, r *http.Request, id string) { + if s.githubTokensUnavailable(w) { + return + } + + pat, err := s.Deps.GithubTokens.Reveal(r.Context(), id) + if err != nil { + if errors.Is(err, githubtokens.ErrNotFound) { + writeError(w, http.StatusNotFound, "github token not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load github token") + return + } + + accounts, lerr := s.githubAPI().ListAccounts(r.Context(), pat) + if lerr != nil { + if errors.Is(lerr, githubapi.ErrUnauthorized) { + writeError(w, http.StatusUnprocessableEntity, + "GitHub rejected the token: "+lerr.Error()) + return + } + writeError(w, http.StatusBadGateway, + "could not list accounts via GitHub: "+lerr.Error()) + return + } + _ = s.Deps.GithubTokens.Touch(r.Context(), id) + + out := make([]openapi.GithubAccount, 0, len(accounts)) + for _, a := range accounts { + out = append(out, openapi.GithubAccount{ + Login: a.Login, + Type: openapi.GithubAccountType(a.Type), + AvatarUrl: ptrStr(a.AvatarURL), + }) + } + writeJSON(w, http.StatusOK, map[string]any{ + "accounts": out, + "total": len(out), + }) +} + +// ListTokenRepos — GET /api/v1/github-tokens/{id}/repos. +// +// Reveals the PAT server-side and returns the repos visible to it. +// When `account` is set, the server scopes the listing to that +// account's /users/{login}/repos or /orgs/{login}/repos endpoint; +// when not, it falls back to /user/repos (affiliations-aggregated). +// +// Up to 500 repos (5 pages × 100) so a worst-case org-member with +// lots of affiliations doesn't have to deal with infinite scroll. 
The +// optional ?q= substring filter is applied server-side so the +// dashboard fetch stays a single round-trip. +func (s *Server) ListTokenRepos( + w http.ResponseWriter, + r *http.Request, + id string, + params openapi.ListTokenReposParams, +) { + if s.githubTokensUnavailable(w) { + return + } + + pat, err := s.Deps.GithubTokens.Reveal(r.Context(), id) + if err != nil { + if errors.Is(err, githubtokens.ErrNotFound) { + writeError(w, http.StatusNotFound, "github token not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load github token") + return + } + + const maxPages = 5 + + var ( + repos []githubapi.Repo + lerr error + ) + if params.Account != nil && *params.Account != "" { + if params.AccountType == nil { + writeError(w, http.StatusUnprocessableEntity, + "account_type is required when account is set") + return + } + accountType := githubapi.AccountType(*params.AccountType) + repos, lerr = s.githubAPI().ListReposForAccount( + r.Context(), pat, accountType, *params.Account, maxPages, + ) + } else { + repos, lerr = s.githubAPI().ListUserRepos(r.Context(), pat, maxPages) + } + if lerr != nil { + if errors.Is(lerr, githubapi.ErrUnauthorized) { + writeError(w, http.StatusUnprocessableEntity, + "GitHub rejected the token: "+lerr.Error()) + return + } + if errors.Is(lerr, githubapi.ErrNotFound) { + writeError(w, http.StatusNotFound, + "account not found on GitHub: "+lerr.Error()) + return + } + writeError(w, http.StatusBadGateway, + "could not list repos via GitHub: "+lerr.Error()) + return + } + _ = s.Deps.GithubTokens.Touch(r.Context(), id) + + // Optional client-supplied filter — applied here so the dashboard + // fetch is a single round-trip even for larger result sets. 
+ if params.Q != nil && *params.Q != "" { + needle := strings.ToLower(*params.Q) + filtered := repos[:0] + for _, rp := range repos { + if strings.Contains(strings.ToLower(rp.FullName), needle) { + filtered = append(filtered, rp) + } + } + repos = filtered + } + + out := make([]openapi.GithubRepo, 0, len(repos)) + for _, rp := range repos { + desc := rp.Description + out = append(out, openapi.GithubRepo{ + FullName: rp.FullName, + DefaultBranch: rp.DefaultBranch, + Private: rp.Private, + HtmlUrl: rp.HTMLURL, + Description: ptrStr(desc), + }) + } + writeJSON(w, http.StatusOK, map[string]any{ + "repos": out, + "total": len(out), + }) +} + +// ptrStr returns nil for the empty string and &s otherwise, matching +// the OpenAPI nullable + omitempty convention without leaking "" into +// the wire format. +func ptrStr(s string) *string { + if s == "" { + return nil + } + return &s +} + +// DeleteGithubToken — DELETE /api/v1/github-tokens/{id}. +func (s *Server) DeleteGithubToken(w http.ResponseWriter, r *http.Request, id string) { + if s.githubTokensUnavailable(w) { + return + } + if err := s.Deps.GithubTokens.Delete(r.Context(), id); err != nil { + if errors.Is(err, githubtokens.ErrNotFound) { + writeError(w, http.StatusNotFound, "github token not found") + return + } + writeError(w, http.StatusInternalServerError, "could not delete github token") + return + } + w.WriteHeader(http.StatusNoContent) +} diff --git a/server/internal/httpapi/indexing_test.go b/server/internal/httpapi/indexing_test.go index dcdce3a..c0b3adf 100644 --- a/server/internal/httpapi/indexing_test.go +++ b/server/internal/httpapi/indexing_test.go @@ -357,3 +357,67 @@ func TestSemanticSearch_HTTP_NoEmbeddings(t *testing.T) { t.Errorf("status=%d", w.Code) } } + +// TestSemanticSearch_DefaultMinScoreIs02 guards the lowered default +// for per-project search (was 0.4, now 0.2). 
A natural-language query +// whose best chunk scores in [0.2, 0.4] would silently return empty +// under the old default; the new default surfaces it. +// +// fakeEmbedder maps each byte of the input to one vector dimension +// (byte/255). Query "Z" → vec[0]=0.353, all other dims 0. Content of +// 16 'A's → vec[0..15]=0.255 each. Cosine ≈ 0.25 — squarely inside +// the (0.2, 0.4) gap the default change targets. +func TestSemanticSearch_DefaultMinScoreIs02(t *testing.T) { + d, hash := newIndexerTestDeps(t, "/proj-floor") + router := NewRouter(d) + + beginW := doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/begin", map[string]any{}) + var begin indexBeginResponse + _ = json.Unmarshal(beginW.Body.Bytes(), &begin) + + // One file whose chunk will score ~0.25 against query "Z" with + // the byte-positional fakeEmbedder. The chunker splits at unique + // content; a single short body fits in one chunk. + body := "AAAAAAAAAAAAAAAAA" + doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/files", map[string]any{ + "run_id": begin.RunID, + "files": []map[string]any{ + {"path": "/proj-floor/a.txt", "content": body, "content_hash": shaHex(body), "language": "text"}, + }, + }) + doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/index/finish", map[string]any{ + "run_id": begin.RunID, + }) + + // Default request: no min_score. Under the new default 0.2 the + // ~0.25-cosine chunk survives; under the old default 0.4 it + // would have been filtered. 
+ w := doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/search", map[string]any{ + "query": "Z", + "limit": 10, + }) + if w.Code != http.StatusOK { + t.Fatalf("default min_score: status=%d body=%s", w.Code, w.Body.String()) + } + var defaultResp searchResponse + _ = json.Unmarshal(w.Body.Bytes(), &defaultResp) + if defaultResp.Total == 0 { + t.Fatalf("default min_score=0.2 should surface a chunk with cos≈0.25, got 0 results") + } + + // Same query at explicit min_score=0.4: chunk drops out. + w = doRequest(t, router, http.MethodPost, "/api/v1/projects/"+hash+"/search", map[string]any{ + "query": "Z", + "limit": 10, + "min_score": 0.4, + }) + if w.Code != http.StatusOK { + t.Fatalf("min_score=0.4: status=%d", w.Code) + } + var strictResp searchResponse + _ = json.Unmarshal(w.Body.Bytes(), &strictResp) + if strictResp.Total != 0 { + t.Fatalf("min_score=0.4 should reject the cos≈0.25 chunk, got %d results: %+v", + strictResp.Total, strictResp.Results) + } +} diff --git a/server/internal/httpapi/jobs.go b/server/internal/httpapi/jobs.go new file mode 100644 index 0000000..0b8aa23 --- /dev/null +++ b/server/internal/httpapi/jobs.go @@ -0,0 +1,89 @@ +package httpapi + +import ( + "encoding/json" + "net/http" + "time" + + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + "github.com/dvcdsys/code-index/server/internal/jobs" +) + +type jobPayload struct { + ID string `json:"id"` + Type string `json:"type"` + Status string `json:"status"` + DedupeKey *string `json:"dedupe_key"` + Payload map[string]any `json:"payload"` + Attempts int `json:"attempts"` + MaxAttempts int `json:"max_attempts"` + LastError *string `json:"last_error"` + ScheduledAt time.Time `json:"scheduled_at"` + StartedAt *time.Time `json:"started_at"` + CompletedAt *time.Time `json:"completed_at"` + CreatedAt time.Time `json:"created_at"` +} + +func jobToPayload(j jobs.Job) jobPayload { + var dedupe *string + if j.DedupeKey != "" { + v := j.DedupeKey + dedupe = &v + } + var 
lastErr *string + if j.LastError != "" { + v := j.LastError + lastErr = &v + } + payload := map[string]any{} + if len(j.Payload) > 0 { + _ = json.Unmarshal(j.Payload, &payload) + } + return jobPayload{ + ID: j.ID, + Type: j.Type, + Status: j.Status, + DedupeKey: dedupe, + Payload: payload, + Attempts: j.Attempts, + MaxAttempts: j.MaxAttempts, + LastError: lastErr, + ScheduledAt: j.ScheduledAt, + StartedAt: j.StartedAt, + CompletedAt: j.CompletedAt, + CreatedAt: j.CreatedAt, + } +} + +// ListJobs — GET /api/v1/jobs. +func (s *Server) ListJobs(w http.ResponseWriter, r *http.Request, params openapi.ListJobsParams) { + if !s.Deps.WorkspacesEnabled || s.Deps.Jobs == nil { + writeError(w, http.StatusServiceUnavailable, "workspaces feature is disabled (set CIX_WORKSPACES_ENABLED=true and restart)") + return + } + status := "" + if params.Status != nil { + status = string(*params.Status) + } + jobType := "" + if params.Type != nil { + jobType = *params.Type + } + limit := 100 + if params.Limit != nil { + limit = *params.Limit + } + list, err := s.Deps.Jobs.List(r.Context(), status, jobType, limit) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not list jobs") + return + } + out := make([]jobPayload, 0, len(list)) + for _, j := range list { + out = append(out, jobToPayload(j)) + } + writeJSON(w, http.StatusOK, map[string]any{ + "jobs": out, + "total": len(out), + }) +} diff --git a/server/internal/httpapi/middleware.go b/server/internal/httpapi/middleware.go index 1a909b4..9dc01cb 100644 --- a/server/internal/httpapi/middleware.go +++ b/server/internal/httpapi/middleware.go @@ -193,6 +193,12 @@ func isPublicPath(p string) bool { if strings.HasPrefix(p, "/dashboard/") { return true } + // GitHub webhook deliveries are authenticated by per-row HMAC secret, + // NOT by the cix Bearer/session auth — leave the prefix public so + // deliveries from github.com don't get 401'd at the gate. 
+ if strings.HasPrefix(p, "/api/v1/webhooks/") { + return true + } return false } diff --git a/server/internal/httpapi/openapi/openapi.gen.go b/server/internal/httpapi/openapi/openapi.gen.go index a7b29ae..94436b0 100644 --- a/server/internal/httpapi/openapi/openapi.gen.go +++ b/server/internal/httpapi/openapi/openapi.gen.go @@ -26,6 +26,27 @@ const ( BearerAuthScopes bearerAuthContextKey = "bearerAuth.Scopes" ) +// Defines values for AddWorkspaceRepoRequestWebhookMode. +const ( + AddWorkspaceRepoRequestWebhookModeAuto AddWorkspaceRepoRequestWebhookMode = "auto" + AddWorkspaceRepoRequestWebhookModeDisabled AddWorkspaceRepoRequestWebhookMode = "disabled" + AddWorkspaceRepoRequestWebhookModeManual AddWorkspaceRepoRequestWebhookMode = "manual" +) + +// Valid indicates whether the value is a known member of the AddWorkspaceRepoRequestWebhookMode enum. +func (e AddWorkspaceRepoRequestWebhookMode) Valid() bool { + switch e { + case AddWorkspaceRepoRequestWebhookModeAuto: + return true + case AddWorkspaceRepoRequestWebhookModeDisabled: + return true + case AddWorkspaceRepoRequestWebhookModeManual: + return true + default: + return false + } +} + // Defines values for CreateUserRequestRole. const ( CreateUserRequestRoleAdmin CreateUserRequestRole = "admin" @@ -44,6 +65,24 @@ func (e CreateUserRequestRole) Valid() bool { } } +// Defines values for GithubAccountType. +const ( + GithubAccountTypeOrg GithubAccountType = "org" + GithubAccountTypeUser GithubAccountType = "user" +) + +// Valid indicates whether the value is a known member of the GithubAccountType enum. +func (e GithubAccountType) Valid() bool { + switch e { + case GithubAccountTypeOrg: + return true + case GithubAccountTypeUser: + return true + default: + return false + } +} + // Defines values for HealthResponseStatus. const ( HealthResponseStatusOk HealthResponseStatus = "ok" @@ -161,6 +200,30 @@ func (e IndexProgressResponseStatus) Valid() bool { } } +// Defines values for JobStatus. 
+const ( + JobStatusCompleted JobStatus = "completed" + JobStatusFailed JobStatus = "failed" + JobStatusPending JobStatus = "pending" + JobStatusRunning JobStatus = "running" +) + +// Valid indicates whether the value is a known member of the JobStatus enum. +func (e JobStatus) Valid() bool { + switch e { + case JobStatusCompleted: + return true + case JobStatusFailed: + return true + case JobStatusPending: + return true + case JobStatusRunning: + return true + default: + return false + } +} + // Defines values for MeResponseAuthMethod. const ( MeResponseAuthMethodApiKey MeResponseAuthMethod = "api_key" @@ -203,6 +266,33 @@ func (e ProjectStatus) Valid() bool { } } +// Defines values for ProjectWorkspaceEntryStatus. +const ( + ProjectWorkspaceEntryStatusCloning ProjectWorkspaceEntryStatus = "cloning" + ProjectWorkspaceEntryStatusFailed ProjectWorkspaceEntryStatus = "failed" + ProjectWorkspaceEntryStatusIndexed ProjectWorkspaceEntryStatus = "indexed" + ProjectWorkspaceEntryStatusIndexing ProjectWorkspaceEntryStatus = "indexing" + ProjectWorkspaceEntryStatusPending ProjectWorkspaceEntryStatus = "pending" +) + +// Valid indicates whether the value is a known member of the ProjectWorkspaceEntryStatus enum. +func (e ProjectWorkspaceEntryStatus) Valid() bool { + switch e { + case ProjectWorkspaceEntryStatusCloning: + return true + case ProjectWorkspaceEntryStatusFailed: + return true + case ProjectWorkspaceEntryStatusIndexed: + return true + case ProjectWorkspaceEntryStatusIndexing: + return true + case ProjectWorkspaceEntryStatusPending: + return true + default: + return false + } +} + // Defines values for ReferenceItemChunkType. const ( Reference ReferenceItemChunkType = "reference" @@ -218,6 +308,24 @@ func (e ReferenceItemChunkType) Valid() bool { } } +// Defines values for ReindexEnqueuedResponseStatus. 
+const ( + ReindexEnqueuedResponseStatusAlreadyRunning ReindexEnqueuedResponseStatus = "already_running" + ReindexEnqueuedResponseStatusEnqueued ReindexEnqueuedResponseStatus = "enqueued" +) + +// Valid indicates whether the value is a known member of the ReindexEnqueuedResponseStatus enum. +func (e ReindexEnqueuedResponseStatus) Valid() bool { + switch e { + case ReindexEnqueuedResponseStatusAlreadyRunning: + return true + case ReindexEnqueuedResponseStatusEnqueued: + return true + default: + return false + } +} + // Defines values for RuntimeConfigSource. const ( Db RuntimeConfigSource = "db" @@ -335,6 +443,123 @@ func (e UserWithStatsRole) Valid() bool { } } +// Defines values for WebhookAcceptedStatus. +const ( + WebhookAcceptedStatusAlreadyRunning WebhookAcceptedStatus = "already_running" + WebhookAcceptedStatusEnqueued WebhookAcceptedStatus = "enqueued" + WebhookAcceptedStatusIgnored WebhookAcceptedStatus = "ignored" + WebhookAcceptedStatusPing WebhookAcceptedStatus = "ping" +) + +// Valid indicates whether the value is a known member of the WebhookAcceptedStatus enum. +func (e WebhookAcceptedStatus) Valid() bool { + switch e { + case WebhookAcceptedStatusAlreadyRunning: + return true + case WebhookAcceptedStatusEnqueued: + return true + case WebhookAcceptedStatusIgnored: + return true + case WebhookAcceptedStatusPing: + return true + default: + return false + } +} + +// Defines values for WorkspaceRepoStatus. +const ( + WorkspaceRepoStatusCloning WorkspaceRepoStatus = "cloning" + WorkspaceRepoStatusFailed WorkspaceRepoStatus = "failed" + WorkspaceRepoStatusIndexed WorkspaceRepoStatus = "indexed" + WorkspaceRepoStatusIndexing WorkspaceRepoStatus = "indexing" + WorkspaceRepoStatusPending WorkspaceRepoStatus = "pending" +) + +// Valid indicates whether the value is a known member of the WorkspaceRepoStatus enum. 
+func (e WorkspaceRepoStatus) Valid() bool { + switch e { + case WorkspaceRepoStatusCloning: + return true + case WorkspaceRepoStatusFailed: + return true + case WorkspaceRepoStatusIndexed: + return true + case WorkspaceRepoStatusIndexing: + return true + case WorkspaceRepoStatusPending: + return true + default: + return false + } +} + +// Defines values for WorkspaceRepoWebhookMode. +const ( + Auto WorkspaceRepoWebhookMode = "auto" + Disabled WorkspaceRepoWebhookMode = "disabled" + Manual WorkspaceRepoWebhookMode = "manual" +) + +// Valid indicates whether the value is a known member of the WorkspaceRepoWebhookMode enum. +func (e WorkspaceRepoWebhookMode) Valid() bool { + switch e { + case Auto: + return true + case Disabled: + return true + case Manual: + return true + default: + return false + } +} + +// Defines values for WorkspaceSearchPendingRepoStatus. +const ( + WorkspaceSearchPendingRepoStatusCloning WorkspaceSearchPendingRepoStatus = "cloning" + WorkspaceSearchPendingRepoStatusFailed WorkspaceSearchPendingRepoStatus = "failed" + WorkspaceSearchPendingRepoStatusIndexing WorkspaceSearchPendingRepoStatus = "indexing" + WorkspaceSearchPendingRepoStatusPending WorkspaceSearchPendingRepoStatus = "pending" +) + +// Valid indicates whether the value is a known member of the WorkspaceSearchPendingRepoStatus enum. +func (e WorkspaceSearchPendingRepoStatus) Valid() bool { + switch e { + case WorkspaceSearchPendingRepoStatusCloning: + return true + case WorkspaceSearchPendingRepoStatusFailed: + return true + case WorkspaceSearchPendingRepoStatusIndexing: + return true + case WorkspaceSearchPendingRepoStatusPending: + return true + default: + return false + } +} + +// Defines values for WorkspaceSearchResponseStatus. 
+const ( + Empty WorkspaceSearchResponseStatus = "empty" + Ok WorkspaceSearchResponseStatus = "ok" + PartialFailure WorkspaceSearchResponseStatus = "partial_failure" +) + +// Valid indicates whether the value is a known member of the WorkspaceSearchResponseStatus enum. +func (e WorkspaceSearchResponseStatus) Valid() bool { + switch e { + case Empty: + return true + case Ok: + return true + case PartialFailure: + return true + default: + return false + } +} + // Defines values for ListApiKeysParamsOwner. const ( All ListApiKeysParamsOwner = "all" @@ -350,6 +575,48 @@ func (e ListApiKeysParamsOwner) Valid() bool { } } +// Defines values for ListTokenReposParamsAccountType. +const ( + ListTokenReposParamsAccountTypeOrg ListTokenReposParamsAccountType = "org" + ListTokenReposParamsAccountTypeUser ListTokenReposParamsAccountType = "user" +) + +// Valid indicates whether the value is a known member of the ListTokenReposParamsAccountType enum. +func (e ListTokenReposParamsAccountType) Valid() bool { + switch e { + case ListTokenReposParamsAccountTypeOrg: + return true + case ListTokenReposParamsAccountTypeUser: + return true + default: + return false + } +} + +// Defines values for ListJobsParamsStatus. +const ( + ListJobsParamsStatusCompleted ListJobsParamsStatus = "completed" + ListJobsParamsStatusFailed ListJobsParamsStatus = "failed" + ListJobsParamsStatusPending ListJobsParamsStatus = "pending" + ListJobsParamsStatusRunning ListJobsParamsStatus = "running" +) + +// Valid indicates whether the value is a known member of the ListJobsParamsStatus enum. +func (e ListJobsParamsStatus) Valid() bool { + switch e { + case ListJobsParamsStatusCompleted: + return true + case ListJobsParamsStatusFailed: + return true + case ListJobsParamsStatusPending: + return true + case ListJobsParamsStatusRunning: + return true + default: + return false + } +} + // Defines values for IndexFilesParamsAccept. 
const ( Applicationjson IndexFilesParamsAccept = "application/json" @@ -368,6 +635,40 @@ func (e IndexFilesParamsAccept) Valid() bool { } } +// AddWorkspaceRepoRequest defines model for AddWorkspaceRepoRequest. +type AddWorkspaceRepoRequest struct { + // AutoWebhook Legacy field. New clients should send `webhook_mode` instead. + // When both are provided, `webhook_mode` wins; when only the + // bool is set, `true` is mapped to `webhook_mode = "auto"`. + // Deprecated: this property has been marked as deprecated upstream, but no `x-deprecated-reason` was set + AutoWebhook *bool `json:"auto_webhook,omitempty"` + Branch string `json:"branch"` + + // GithubUrl https://github.com/owner/repo URL. + GithubUrl string `json:"github_url"` + + // TokenId Optional id of a stored GitHub PAT. Required for private repos. + TokenId *string `json:"token_id,omitempty"` + + // WebhookMode How the server should keep this repo fresh: + // - `auto` — server registers the webhook in GitHub on your + // behalf (requires admin:repo_hook on the PAT). + // - `manual` — server stores a webhook_secret and returns it + // once; you paste the URL + secret into GitHub yourself. + // - `disabled` — no auto-sync at all; reindex via the + // dashboard button only. + WebhookMode *AddWorkspaceRepoRequestWebhookMode `json:"webhook_mode,omitempty"` +} + +// AddWorkspaceRepoRequestWebhookMode How the server should keep this repo fresh: +// - `auto` — server registers the webhook in GitHub on your +// behalf (requires admin:repo_hook on the PAT). +// - `manual` — server stores a webhook_secret and returns it +// once; you paste the URL + secret into GitHub yourself. +// - `disabled` — no auto-sync at all; reindex via the +// dashboard button only. +type AddWorkspaceRepoRequestWebhookMode string + // ApiKey defines model for ApiKey. 
type ApiKey struct { CreatedAt time.Time `json:"created_at"` @@ -422,6 +723,25 @@ type CreateApiKeyRequest struct { Name string `json:"name"` } +// CreateGithubTokenRequest defines model for CreateGithubTokenRequest. +type CreateGithubTokenRequest struct { + // Name Human-friendly label shown in the dashboard. + Name string `json:"name"` + + // Scopes Ignored. The server now derives real scopes from GitHub's + // X-OAuth-Scopes response header by calling GET /user with the + // supplied token, so user-supplied scope hints are no longer + // consulted. Kept for backwards compatibility with older + // clients that still send it. + // Deprecated: this property has been marked as deprecated upstream, but no `x-deprecated-reason` was set + Scopes *[]string `json:"scopes,omitempty"` + + // Token The plaintext PAT. The server encrypts it with AES-GCM before + // persisting; this is the only request body that ever carries + // the plaintext value. + Token string `json:"token"` +} + // CreateProjectRequest defines model for CreateProjectRequest. type CreateProjectRequest struct { HostPath string `json:"host_path"` @@ -440,6 +760,13 @@ type CreateUserRequest struct { // CreateUserRequestRole defines model for CreateUserRequest.Role. type CreateUserRequestRole string +// CreateWorkspaceRequest defines model for CreateWorkspaceRequest. +type CreateWorkspaceRequest struct { + // Description Optional free-form description. + Description *string `json:"description,omitempty"` + Name string `json:"name"` +} + // DefinitionItem defines model for DefinitionItem. type DefinitionItem struct { EndLine int `json:"end_line"` @@ -530,6 +857,56 @@ type FileSearchResponse struct { Total int `json:"total"` } +// GithubAccount A GitHub account the PAT can see. The user owning the PAT is +// returned first, followed by every org accessible via /user/orgs. 
+// The dashboard's add-repo flow shows these in a Select before +// the repository picker so the operator can drill into a specific +// org instead of relying on the affiliations-aggregated view. +type GithubAccount struct { + AvatarUrl *string `json:"avatar_url,omitempty"` + + // Login GitHub login (user name or org slug). + Login string `json:"login"` + + // Type "user" for the PAT owner; "org" for organisations. + Type GithubAccountType `json:"type"` +} + +// GithubAccountType "user" for the PAT owner; "org" for organisations. +type GithubAccountType string + +// GithubRepo A repository visible to a stored PAT. +type GithubRepo struct { + // DefaultBranch The repo's default branch; the dashboard pre-fills the branch + // input with this when the user picks a repo from the list. + DefaultBranch string `json:"default_branch"` + Description *string `json:"description,omitempty"` + + // FullName owner/name + FullName string `json:"full_name"` + HtmlUrl string `json:"html_url"` + Private bool `json:"private"` +} + +// GithubToken defines model for GithubToken. +type GithubToken struct { + CreatedAt time.Time `json:"created_at"` + Id string `json:"id"` + LastUsedAt *time.Time `json:"last_used_at,omitempty"` + Name string `json:"name"` + + // Scopes Best-effort scope list. PR1 stores whatever the client supplies; + // later releases populate this by calling GitHub's /user endpoint + // with the plaintext token. + Scopes []string `json:"scopes"` +} + +// GithubTokenListResponse defines model for GithubTokenListResponse. +type GithubTokenListResponse struct { + Tokens []GithubToken `json:"tokens"` + Total int `json:"total"` +} + // HealthResponse defines model for HealthResponse. type HealthResponse struct { // Reason Set only when `status` is `unhealthy`. @@ -646,6 +1023,44 @@ type IndexProgressResponse struct { // `completed`/`cancelled`/`failed`/`running` — last-run status from `index_runs`. type IndexProgressResponseStatus string +// Job defines model for Job. 
+type Job struct { + Attempts int `json:"attempts"` + CompletedAt *time.Time `json:"completed_at,omitempty"` + CreatedAt time.Time `json:"created_at"` + DedupeKey *string `json:"dedupe_key,omitempty"` + Id string `json:"id"` + LastError *string `json:"last_error,omitempty"` + MaxAttempts int `json:"max_attempts"` + + // Payload Raw JSON payload — shape depends on `type`. Render as-is in + // the dashboard; don't assume structure. + Payload *map[string]interface{} `json:"payload,omitempty"` + ScheduledAt time.Time `json:"scheduled_at"` + StartedAt *time.Time `json:"started_at,omitempty"` + Status JobStatus `json:"status"` + Type string `json:"type"` +} + +// JobStatus defines model for Job.Status. +type JobStatus string + +// JobListResponse defines model for JobListResponse. +type JobListResponse struct { + Jobs []Job `json:"jobs"` + Total int `json:"total"` +} + +// LinkExistingProjectRequest defines model for LinkExistingProjectRequest. +type LinkExistingProjectRequest struct { + // ProjectHash The 16-hex `path_hash` of an indexed project — the same value + // used in /api/v1/projects/{path}. The server resolves it to + // the canonical `host_path` and inserts a linked workspace_repo + // row. The project must already be in status='indexed' and have + // a host_path of the form "github.com/owner/repo@branch". + ProjectHash string `json:"project_hash"` +} + // LoginRequest defines model for LoginRequest. type LoginRequest struct { Email openapi_types.Email `json:"email"` @@ -765,6 +1180,26 @@ type ProjectSummary struct { TotalSymbols int `json:"total_symbols"` } +// ProjectWorkspaceEntry defines model for ProjectWorkspaceEntry. +type ProjectWorkspaceEntry struct { + Branch string `json:"branch"` + IsLinked bool `json:"is_linked"` + + // RepoId workspace_repos.id — same value used in /repos endpoints. 
+ RepoId string `json:"repo_id"` + Status ProjectWorkspaceEntryStatus `json:"status"` + WorkspaceId string `json:"workspace_id"` + WorkspaceName string `json:"workspace_name"` +} + +// ProjectWorkspaceEntryStatus defines model for ProjectWorkspaceEntry.Status. +type ProjectWorkspaceEntryStatus string + +// ProjectWorkspaceList defines model for ProjectWorkspaceList. +type ProjectWorkspaceList struct { + Workspaces []ProjectWorkspaceEntry `json:"workspaces"` +} + // ReferenceItem defines model for ReferenceItem. type ReferenceItem struct { ChunkType ReferenceItemChunkType `json:"chunk_type"` @@ -796,6 +1231,15 @@ type ReferenceResponse struct { Total int `json:"total"` } +// ReindexEnqueuedResponse defines model for ReindexEnqueuedResponse. +type ReindexEnqueuedResponse struct { + Repo *WorkspaceRepo `json:"repo,omitempty"` + Status ReindexEnqueuedResponseStatus `json:"status"` +} + +// ReindexEnqueuedResponseStatus defines model for ReindexEnqueuedResponse.Status. +type ReindexEnqueuedResponseStatus string + // RestartAccepted defines model for RestartAccepted. type RestartAccepted struct { // RestartId Opaque ID; future versions may expose per-restart progress under this id. @@ -1012,6 +1456,14 @@ type UpdateUserRequest struct { // when set to `viewer`. type UpdateUserRequestRole string +// UpdateWorkspaceRequest Both fields are optional — omitting a field leaves the existing +// value unchanged. Passing an empty string for `description` clears +// it. `name` must be non-empty when provided. +type UpdateWorkspaceRequest struct { + Description *string `json:"description,omitempty"` + Name *string `json:"name,omitempty"` +} + // User defines model for User. type User struct { CreatedAt time.Time `json:"created_at"` @@ -1075,6 +1527,249 @@ type VersionCheckStatus struct { Error *string `json:"error,omitempty"` } +// WebhookAccepted defines model for WebhookAccepted. 
+type WebhookAccepted struct { + RepoId *string `json:"repo_id,omitempty"` + Status WebhookAcceptedStatus `json:"status"` +} + +// WebhookAcceptedStatus defines model for WebhookAccepted.Status. +type WebhookAcceptedStatus string + +// WebhookInfoResponse defines model for WebhookInfoResponse. +type WebhookInfoResponse struct { + // AutoRegistered True when the server successfully auto-registered the webhook + // against the GitHub API (auto_webhook=true on create + PAT had + // admin:repo_hook). When false, the operator must register manually. + AutoRegistered bool `json:"auto_registered"` + + // WebhookSecret HMAC secret. Treat as sensitive — rotates on repo recreate. + WebhookSecret string `json:"webhook_secret"` + + // WebhookUrl Full URL to paste into GitHub's webhook config. Empty path-only + // value when CIX_PUBLIC_URL is unset — prepend your tunnel origin. + WebhookUrl string `json:"webhook_url"` +} + +// Workspace defines model for Workspace. +type Workspace struct { + CreatedAt time.Time `json:"created_at"` + + // Description Free-form description. Empty string when absent. + Description string `json:"description"` + + // Id ULID-like opaque identifier. + Id string `json:"id"` + + // Name Unique workspace name. + Name string `json:"name"` + UpdatedAt time.Time `json:"updated_at"` +} + +// WorkspaceListResponse defines model for WorkspaceListResponse. +type WorkspaceListResponse struct { + Total int `json:"total"` + Workspaces []Workspace `json:"workspaces"` +} + +// WorkspaceRepo defines model for WorkspaceRepo. +type WorkspaceRepo struct { + // AutoWebhook Legacy alias for `webhook_mode == "auto"`. Always present so + // old clients keep working; new clients should consult + // `webhook_mode` instead. + AutoWebhook bool `json:"auto_webhook"` + Branch string `json:"branch"` + CreatedAt time.Time `json:"created_at"` + + // GithubUrl Canonical https://github.com/owner/repo URL. 
+ GithubUrl string `json:"github_url"` + Id string `json:"id"` + + // IsLinked True when this row is a lightweight pointer to a project + // already owned by another workspace_repo — added via the + // "Add Existing Project" flow. Linked rows have no clone on + // disk, no webhook, and no token; reindex is a no-op (must + // be triggered from the canonical owning row). + IsLinked bool `json:"is_linked"` + LastError *string `json:"last_error,omitempty"` + LastIndexedAt *time.Time `json:"last_indexed_at,omitempty"` + + // LastSha HEAD SHA at last successful clone. + LastSha *string `json:"last_sha,omitempty"` + + // ProjectPath Indexed project's host_path — "github.com/owner/repo@branch". + // Use this with the existing /api/v1/projects/{path}/* endpoints + // (path = first 16 hex chars of SHA1). + ProjectPath string `json:"project_path"` + Status WorkspaceRepoStatus `json:"status"` + + // TokenId GitHub token used for clone+webhook calls. Null when the + // repo is public. + TokenId *string `json:"token_id,omitempty"` + UpdatedAt time.Time `json:"updated_at"` + + // WebhookMode Operator's intent for how this repo gets kept fresh. `auto` + // asks the server to register the GitHub webhook; `manual` + // means the operator pastes the URL+secret into GitHub + // themselves; `disabled` skips auto-sync entirely — reindex + // via the dashboard button only. + WebhookMode WorkspaceRepoWebhookMode `json:"webhook_mode"` + WorkspaceId string `json:"workspace_id"` +} + +// WorkspaceRepoStatus defines model for WorkspaceRepo.Status. +type WorkspaceRepoStatus string + +// WorkspaceRepoWebhookMode Operator's intent for how this repo gets kept fresh. `auto` +// asks the server to register the GitHub webhook; `manual` +// means the operator pastes the URL+secret into GitHub +// themselves; `disabled` skips auto-sync entirely — reindex +// via the dashboard button only. +type WorkspaceRepoWebhookMode string + +// WorkspaceRepoCreated defines model for WorkspaceRepoCreated. 
+type WorkspaceRepoCreated struct { + Repo WorkspaceRepo `json:"repo"` + + // WebhookSecret HMAC secret. **Returned once on create + once via + // webhook-info.** Use as the "Secret" field in GitHub's webhook + // UI; deliveries are validated by HMAC-SHA256 over the body. + // Empty string for linked rows (no webhook). + WebhookSecret string `json:"webhook_secret"` + + // WebhookUrl Publicly-reachable POST endpoint to register in GitHub when + // doing the webhook setup manually. Includes the workspace_repo + // id segment. Empty string for linked rows (no webhook). + WebhookUrl string `json:"webhook_url"` +} + +// WorkspaceRepoListResponse defines model for WorkspaceRepoListResponse. +type WorkspaceRepoListResponse struct { + Repos []WorkspaceRepo `json:"repos"` + Total int `json:"total"` +} + +// WorkspaceSearchChunk defines model for WorkspaceSearchChunk. +type WorkspaceSearchChunk struct { + Content string `json:"content"` + EndLine int `json:"end_line"` + FilePath string `json:"file_path"` + Language *string `json:"language,omitempty"` + ProjectPath string `json:"project_path"` + + // Score Raw cosine similarity between the query and this chunk — + // the value chunks are sorted by. No per-project boost is + // applied (a previous revision multiplied this by + // project_score, which let one repo dominate every result + // for short queries like product-name acronyms). + Score float32 `json:"score"` + StartLine int `json:"start_line"` + SymbolName *string `json:"symbol_name,omitempty"` +} + +// WorkspaceSearchFailedRepo defines model for WorkspaceSearchFailedRepo. +type WorkspaceSearchFailedRepo struct { + ProjectPath string `json:"project_path"` + + // Reason Short category for the failure — `vectorstore_error`, + // `timeout`, etc. Intentionally not the raw error message so + // internal details don't leak; check the server logs by + // `workspace_id` for the full error. 
+ Reason string `json:"reason"` +} + +// WorkspaceSearchPendingRepo defines model for WorkspaceSearchPendingRepo. +type WorkspaceSearchPendingRepo struct { + ProjectPath string `json:"project_path"` + + // Status Current row state in `workspace_repos.status`. Anything + // other than `indexed` means the repo hasn't contributed to + // this response. + Status WorkspaceSearchPendingRepoStatus `json:"status"` +} + +// WorkspaceSearchPendingRepoStatus Current row state in `workspace_repos.status`. Anything +// other than `indexed` means the repo hasn't contributed to +// this response. +type WorkspaceSearchPendingRepoStatus string + +// WorkspaceSearchProject defines model for WorkspaceSearchProject. +type WorkspaceSearchProject struct { + // Bm25Score Mean of the top-N raw BM25 scores in this project (sign + // flipped from SQLite's bm25() so positive = better). + // Surfaced so the dashboard can show "this repo surfaced + // on literal token overlap" vs. "pure semantic similarity". + Bm25Score float32 `json:"bm25_score"` + + // DenseScore Mean of the top-N raw cosine similarities in this + // project. Together with `bm25_score`, the two raw signals + // that feed into `project_score`. + DenseScore float32 `json:"dense_score"` + + // Label Short human-readable label derived from the project_path's + // last segment (e.g. "owner/repo@main" → "repo@main"). + Label string `json:"label"` + + // NumHits Chunks from this project that survived the per-project + // chunk cap and made it into the global chunks list. + NumHits int `json:"num_hits"` + ProjectPath string `json:"project_path"` + + // ProjectScore Hybrid candidacy in [0,1] — the α-blend of per-query + // min-max normalized BM25 and dense signals (α=0.5) the + // project-relevance gate ranks by. The "Top projects" + // panel sorts by this value. + ProjectScore float32 `json:"project_score"` +} + +// WorkspaceSearchResponse defines model for WorkspaceSearchResponse. 
+type WorkspaceSearchResponse struct { + Chunks []WorkspaceSearchChunk `json:"chunks"` + + // FailedRepos Repos whose per-project vector search returned an error + // during this request (e.g. corrupt collection on disk). The + // rest of the workspace is still aggregated; surface this so + // the operator knows the result set is incomplete. + FailedRepos *[]WorkspaceSearchFailedRepo `json:"failed_repos,omitempty"` + + // PendingRepos Repos that belong to the workspace but weren't queryable + // yet — clone or index hasn't completed (or the last attempt + // failed). Their matches will appear once they reach + // `status=indexed`. Empty if every repo is ready. + PendingRepos *[]WorkspaceSearchPendingRepo `json:"pending_repos,omitempty"` + + // Projects Top projects ranked by `project_score`. Surfaces which repos + // in the workspace the query is most relevant to, independent + // of which individual chunks rank highest globally. + Projects []WorkspaceSearchProject `json:"projects"` + + // StaleFtsRepos Repos that were indexed before the BM25 mirror + // (`chunks_fts`) was added: they're queryable via dense + // search but the sparse half of the hybrid is empty, which + // collapses the algorithm to pure-dense fan-out for these + // entries. Trigger a reindex on each to backfill the FTS + // side. Empty once every workspace repo has been reindexed + // under the new schema. + StaleFtsRepos *[]WorkspaceSearchStaleFTSRepo `json:"stale_fts_repos,omitempty"` + + // Status `ok` — results follow. `empty` — workspace queried fine but + // nothing cleared the `min_score` floor. `partial_failure` — + // no chunks returned but at least one repo errored out during + // the fan-out (see `failed_repos`). + Status WorkspaceSearchResponseStatus `json:"status"` +} + +// WorkspaceSearchResponseStatus `ok` — results follow. `empty` — workspace queried fine but +// nothing cleared the `min_score` floor. 
`partial_failure` — +// no chunks returned but at least one repo errored out during +// the fan-out (see `failed_repos`). +type WorkspaceSearchResponseStatus string + +// WorkspaceSearchStaleFTSRepo defines model for WorkspaceSearchStaleFTSRepo. +type WorkspaceSearchStaleFTSRepo struct { + ProjectPath string `json:"project_path"` +} + // ProjectHash defines model for ProjectHash. type ProjectHash = string @@ -1099,6 +1794,9 @@ type Unauthorized = Error // Unprocessable defines model for Unprocessable. type Unprocessable = Error +// WorkspacesDisabled defines model for WorkspacesDisabled. +type WorkspacesDisabled = Error + // bearerAuthContextKey is the context key for bearerAuth security scheme type bearerAuthContextKey string @@ -1112,6 +1810,32 @@ type ListApiKeysParams struct { // ListApiKeysParamsOwner defines parameters for ListApiKeys. type ListApiKeysParamsOwner string +// ListTokenReposParams defines parameters for ListTokenRepos. +type ListTokenReposParams struct { + // Q Optional case-insensitive substring filter on full_name. + Q *string `form:"q,omitempty" json:"q,omitempty"` + + // Account Optional account login to scope the listing to. When set, + // `account_type` must also be set. + Account *string `form:"account,omitempty" json:"account,omitempty"` + + // AccountType Required when `account` is set; ignored otherwise. + AccountType *ListTokenReposParamsAccountType `form:"account_type,omitempty" json:"account_type,omitempty"` +} + +// ListTokenReposParamsAccountType defines parameters for ListTokenRepos. +type ListTokenReposParamsAccountType string + +// ListJobsParams defines parameters for ListJobs. +type ListJobsParams struct { + Status *ListJobsParamsStatus `form:"status,omitempty" json:"status,omitempty"` + Type *string `form:"type,omitempty" json:"type,omitempty"` + Limit *int `form:"limit,omitempty" json:"limit,omitempty"` +} + +// ListJobsParamsStatus defines parameters for ListJobs. 
+type ListJobsParamsStatus string + // IndexFilesParams defines parameters for IndexFiles. type IndexFilesParams struct { // Accept `application/x-ndjson` switches to a streamed response @@ -1122,6 +1846,31 @@ type IndexFilesParams struct { // IndexFilesParamsAccept defines parameters for IndexFiles. type IndexFilesParamsAccept string +// ReceiveGithubWebhookJSONBody defines parameters for ReceiveGithubWebhook. +type ReceiveGithubWebhookJSONBody map[string]interface{} + +// ReceiveGithubWebhookParams defines parameters for ReceiveGithubWebhook. +type ReceiveGithubWebhookParams struct { + // XHubSignature256 HMAC-SHA256 over the body, hex-encoded with sha256= prefix. + XHubSignature256 *string `json:"X-Hub-Signature-256,omitempty"` + XGitHubEvent *string `json:"X-GitHub-Event,omitempty"` +} + +// WorkspaceSearchParams defines parameters for WorkspaceSearch. +type WorkspaceSearchParams struct { + Q string `form:"q" json:"q"` + TopProjects *int `form:"top_projects,omitempty" json:"top_projects,omitempty"` + TopChunks *int `form:"top_chunks,omitempty" json:"top_chunks,omitempty"` + + // MinScore Floor on raw cosine similarity. Chunks below this are + // dropped before aggregation. Default 0 — relies on + // chromem's natural ordering. Set higher (e.g. 0.3) to cut + // noise when querying long natural-language sentences; + // keep at 0 for short tokens / acronyms where embedding + // magnitudes are inherently smaller. + MinScore *float32 `form:"min_score,omitempty" json:"min_score,omitempty"` +} + // PutRuntimeConfigJSONRequestBody defines body for PutRuntimeConfig for application/json ContentType. type PutRuntimeConfigJSONRequestBody = RuntimeConfigUpdate @@ -1140,6 +1889,9 @@ type ChangePasswordJSONRequestBody = ChangePasswordRequest // LoginJSONRequestBody defines body for Login for application/json ContentType. type LoginJSONRequestBody = LoginRequest +// CreateGithubTokenJSONRequestBody defines body for CreateGithubToken for application/json ContentType. 
+type CreateGithubTokenJSONRequestBody = CreateGithubTokenRequest + // CreateProjectJSONRequestBody defines body for CreateProject for application/json ContentType. type CreateProjectJSONRequestBody = CreateProjectRequest @@ -1170,6 +1922,21 @@ type SearchReferencesJSONRequestBody = ReferenceRequest // SearchSymbolsJSONRequestBody defines body for SearchSymbols for application/json ContentType. type SearchSymbolsJSONRequestBody = SymbolSearchRequest +// ReceiveGithubWebhookJSONRequestBody defines body for ReceiveGithubWebhook for application/json ContentType. +type ReceiveGithubWebhookJSONRequestBody ReceiveGithubWebhookJSONBody + +// CreateWorkspaceJSONRequestBody defines body for CreateWorkspace for application/json ContentType. +type CreateWorkspaceJSONRequestBody = CreateWorkspaceRequest + +// UpdateWorkspaceJSONRequestBody defines body for UpdateWorkspace for application/json ContentType. +type UpdateWorkspaceJSONRequestBody = UpdateWorkspaceRequest + +// AddWorkspaceRepoJSONRequestBody defines body for AddWorkspaceRepo for application/json ContentType. +type AddWorkspaceRepoJSONRequestBody = AddWorkspaceRepoRequest + +// LinkExistingProjectJSONRequestBody defines body for LinkExistingProject for application/json ContentType. +type LinkExistingProjectJSONRequestBody = LinkExistingProjectRequest + // ServerInterface represents all server handlers. 
type ServerInterface interface { // List GGUF model files cached on disk (admin only) @@ -1229,6 +1996,24 @@ type ServerInterface interface { // End one of my sessions (sign out a single device) // (DELETE /api/v1/auth/sessions/{id}) DeleteMySession(w http.ResponseWriter, r *http.Request, id string) + // List stored GitHub PATs (metadata only) + // (GET /api/v1/github-tokens) + ListGithubTokens(w http.ResponseWriter, r *http.Request) + // Store a new GitHub PAT (encrypted-at-rest) + // (POST /api/v1/github-tokens) + CreateGithubToken(w http.ResponseWriter, r *http.Request) + // Delete a stored GitHub PAT + // (DELETE /api/v1/github-tokens/{id}) + DeleteGithubToken(w http.ResponseWriter, r *http.Request, id string) + // List the GitHub accounts visible to a stored PAT + // (GET /api/v1/github-tokens/{id}/accounts) + ListTokenAccounts(w http.ResponseWriter, r *http.Request, id string) + // List GitHub repositories visible to a stored PAT + // (GET /api/v1/github-tokens/{id}/repos) + ListTokenRepos(w http.ResponseWriter, r *http.Request, id string, params ListTokenReposParams) + // List background jobs (status / type filter) + // (GET /api/v1/jobs) + ListJobs(w http.ResponseWriter, r *http.Request, params ListJobsParams) // List all registered projects // (GET /api/v1/projects) ListProjects(w http.ResponseWriter, r *http.Request) @@ -1277,9 +2062,51 @@ type ServerInterface interface { // Project overview (top dirs, recent symbols, totals) // (GET /api/v1/projects/{path}/summary) GetProjectSummary(w http.ResponseWriter, r *http.Request, path ProjectHash) + // List workspaces that contain this project + // (GET /api/v1/projects/{path}/workspaces) + ListProjectWorkspaces(w http.ResponseWriter, r *http.Request, path ProjectHash) // Server / sidecar status (authenticated) // (GET /api/v1/status) GetStatus(w http.ResponseWriter, r *http.Request) + // Receive a GitHub webhook delivery (public, HMAC-authenticated) + // (POST /api/v1/webhooks/github/{repo_id}) + 
ReceiveGithubWebhook(w http.ResponseWriter, r *http.Request, repoId string, params ReceiveGithubWebhookParams) + // List all workspaces + // (GET /api/v1/workspaces) + ListWorkspaces(w http.ResponseWriter, r *http.Request) + // Create a new workspace + // (POST /api/v1/workspaces) + CreateWorkspace(w http.ResponseWriter, r *http.Request) + // Delete a workspace + // (DELETE /api/v1/workspaces/{id}) + DeleteWorkspace(w http.ResponseWriter, r *http.Request, id string) + // Get a single workspace + // (GET /api/v1/workspaces/{id}) + GetWorkspace(w http.ResponseWriter, r *http.Request, id string) + // Update workspace metadata + // (PATCH /api/v1/workspaces/{id}) + UpdateWorkspace(w http.ResponseWriter, r *http.Request, id string) + // List repositories attached to a workspace + // (GET /api/v1/workspaces/{id}/repos) + ListWorkspaceRepos(w http.ResponseWriter, r *http.Request, id string) + // Attach a GitHub repository to a workspace + // (POST /api/v1/workspaces/{id}/repos) + AddWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string) + // Attach an already-indexed project to a workspace + // (POST /api/v1/workspaces/{id}/repos/link) + LinkExistingProject(w http.ResponseWriter, r *http.Request, id string) + // Detach a repository from a workspace + // (DELETE /api/v1/workspaces/{id}/repos/{repo_id}) + DeleteWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string, repoId string) + // Manually re-trigger the clone + index pipeline + // (POST /api/v1/workspaces/{id}/repos/{repo_id}/reindex) + ReindexWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string, repoId string) + // Get the webhook URL + secret for manual GitHub setup + // (GET /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info) + GetWorkspaceRepoWebhookInfo(w http.ResponseWriter, r *http.Request, id string, repoId string) + // Hybrid BM25+dense search across all repos in a workspace + // (GET /api/v1/workspaces/{id}/search) + WorkspaceSearch(w http.ResponseWriter, r *http.Request, id 
string, params WorkspaceSearchParams) // Liveness probe (public) // (GET /health) GetHealth(w http.ResponseWriter, r *http.Request) @@ -1403,6 +2230,42 @@ func (_ Unimplemented) DeleteMySession(w http.ResponseWriter, r *http.Request, i w.WriteHeader(http.StatusNotImplemented) } +// List stored GitHub PATs (metadata only) +// (GET /api/v1/github-tokens) +func (_ Unimplemented) ListGithubTokens(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Store a new GitHub PAT (encrypted-at-rest) +// (POST /api/v1/github-tokens) +func (_ Unimplemented) CreateGithubToken(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Delete a stored GitHub PAT +// (DELETE /api/v1/github-tokens/{id}) +func (_ Unimplemented) DeleteGithubToken(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// List the GitHub accounts visible to a stored PAT +// (GET /api/v1/github-tokens/{id}/accounts) +func (_ Unimplemented) ListTokenAccounts(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// List GitHub repositories visible to a stored PAT +// (GET /api/v1/github-tokens/{id}/repos) +func (_ Unimplemented) ListTokenRepos(w http.ResponseWriter, r *http.Request, id string, params ListTokenReposParams) { + w.WriteHeader(http.StatusNotImplemented) +} + +// List background jobs (status / type filter) +// (GET /api/v1/jobs) +func (_ Unimplemented) ListJobs(w http.ResponseWriter, r *http.Request, params ListJobsParams) { + w.WriteHeader(http.StatusNotImplemented) +} + // List all registered projects // (GET /api/v1/projects) func (_ Unimplemented) ListProjects(w http.ResponseWriter, r *http.Request) { @@ -1499,12 +2362,96 @@ func (_ Unimplemented) GetProjectSummary(w http.ResponseWriter, r *http.Request, w.WriteHeader(http.StatusNotImplemented) } -// Server / sidecar status (authenticated) -// (GET /api/v1/status) +// List 
workspaces that contain this project +// (GET /api/v1/projects/{path}/workspaces) +func (_ Unimplemented) ListProjectWorkspaces(w http.ResponseWriter, r *http.Request, path ProjectHash) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Server / sidecar status (authenticated) +// (GET /api/v1/status) func (_ Unimplemented) GetStatus(w http.ResponseWriter, r *http.Request) { w.WriteHeader(http.StatusNotImplemented) } +// Receive a GitHub webhook delivery (public, HMAC-authenticated) +// (POST /api/v1/webhooks/github/{repo_id}) +func (_ Unimplemented) ReceiveGithubWebhook(w http.ResponseWriter, r *http.Request, repoId string, params ReceiveGithubWebhookParams) { + w.WriteHeader(http.StatusNotImplemented) +} + +// List all workspaces +// (GET /api/v1/workspaces) +func (_ Unimplemented) ListWorkspaces(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Create a new workspace +// (POST /api/v1/workspaces) +func (_ Unimplemented) CreateWorkspace(w http.ResponseWriter, r *http.Request) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Delete a workspace +// (DELETE /api/v1/workspaces/{id}) +func (_ Unimplemented) DeleteWorkspace(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Get a single workspace +// (GET /api/v1/workspaces/{id}) +func (_ Unimplemented) GetWorkspace(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Update workspace metadata +// (PATCH /api/v1/workspaces/{id}) +func (_ Unimplemented) UpdateWorkspace(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// List repositories attached to a workspace +// (GET /api/v1/workspaces/{id}/repos) +func (_ Unimplemented) ListWorkspaceRepos(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Attach a GitHub repository to a workspace +// (POST 
/api/v1/workspaces/{id}/repos) +func (_ Unimplemented) AddWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Attach an already-indexed project to a workspace +// (POST /api/v1/workspaces/{id}/repos/link) +func (_ Unimplemented) LinkExistingProject(w http.ResponseWriter, r *http.Request, id string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Detach a repository from a workspace +// (DELETE /api/v1/workspaces/{id}/repos/{repo_id}) +func (_ Unimplemented) DeleteWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string, repoId string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Manually re-trigger the clone + index pipeline +// (POST /api/v1/workspaces/{id}/repos/{repo_id}/reindex) +func (_ Unimplemented) ReindexWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string, repoId string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Get the webhook URL + secret for manual GitHub setup +// (GET /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info) +func (_ Unimplemented) GetWorkspaceRepoWebhookInfo(w http.ResponseWriter, r *http.Request, id string, repoId string) { + w.WriteHeader(http.StatusNotImplemented) +} + +// Hybrid BM25+dense search across all repos in a workspace +// (GET /api/v1/workspaces/{id}/search) +func (_ Unimplemented) WorkspaceSearch(w http.ResponseWriter, r *http.Request, id string, params WorkspaceSearchParams) { + w.WriteHeader(http.StatusNotImplemented) +} + // Liveness probe (public) // (GET /health) func (_ Unimplemented) GetHealth(w http.ResponseWriter, r *http.Request) { @@ -1955,6 +2902,249 @@ func (siw *ServerInterfaceWrapper) DeleteMySession(w http.ResponseWriter, r *htt handler.ServeHTTP(w, r) } +// ListGithubTokens operation middleware +func (siw *ServerInterfaceWrapper) ListGithubTokens(w http.ResponseWriter, r *http.Request) { + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = 
r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListGithubTokens(w, r) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// CreateGithubToken operation middleware +func (siw *ServerInterfaceWrapper) CreateGithubToken(w http.ResponseWriter, r *http.Request) { + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.CreateGithubToken(w, r) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// DeleteGithubToken operation middleware +func (siw *ServerInterfaceWrapper) DeleteGithubToken(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.DeleteGithubToken(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ListTokenAccounts operation middleware +func (siw *ServerInterfaceWrapper) ListTokenAccounts(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = 
runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListTokenAccounts(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ListTokenRepos operation middleware +func (siw *ServerInterfaceWrapper) ListTokenRepos(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + // Parameter object where we will unmarshal all parameters from the context + var params ListTokenReposParams + + // ------------- Optional query parameter "q" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "q", r.URL.Query(), &params.Q, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "q"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "q", Err: 
err}) + } + return + } + + // ------------- Optional query parameter "account" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "account", r.URL.Query(), &params.Account, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "account"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "account", Err: err}) + } + return + } + + // ------------- Optional query parameter "account_type" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "account_type", r.URL.Query(), &params.AccountType, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "account_type"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "account_type", Err: err}) + } + return + } + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListTokenRepos(w, r, id, params) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ListJobs operation middleware +func (siw *ServerInterfaceWrapper) ListJobs(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + // Parameter object where we will unmarshal all parameters from the context + var params ListJobsParams + + // ------------- Optional query parameter "status" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "status", r.URL.Query(), &params.Status, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + 
if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "status"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "status", Err: err}) + } + return + } + + // ------------- Optional query parameter "type" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "type", r.URL.Query(), &params.Type, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "type"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "type", Err: err}) + } + return + } + + // ------------- Optional query parameter "limit" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "limit", r.URL.Query(), &params.Limit, runtime.BindQueryParameterOptions{Type: "integer", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "limit"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "limit", Err: err}) + } + return + } + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListJobs(w, r, params) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + // ListProjects operation middleware func (siw *ServerInterfaceWrapper) ListProjects(w http.ResponseWriter, r *http.Request) { @@ -2467,18 +3657,561 @@ func (siw *ServerInterfaceWrapper) GetProjectSummary(w http.ResponseWriter, r *h handler.ServeHTTP(w, r) } -// GetStatus operation middleware -func (siw *ServerInterfaceWrapper) GetStatus(w http.ResponseWriter, r *http.Request) { - - ctx := 
r.Context() +// ListProjectWorkspaces operation middleware +func (siw *ServerInterfaceWrapper) ListProjectWorkspaces(w http.ResponseWriter, r *http.Request) { - ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + var err error + _ = err - r = r.WithContext(ctx) + // ------------- Path parameter "path" ------------- + var path ProjectHash - handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { - siw.Handler.GetStatus(w, r) - })) + err = runtime.BindStyledParameterWithOptions("simple", "path", chi.URLParam(r, "path"), &path, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "path", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListProjectWorkspaces(w, r, path) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// GetStatus operation middleware +func (siw *ServerInterfaceWrapper) GetStatus(w http.ResponseWriter, r *http.Request) { + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.GetStatus(w, r) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ReceiveGithubWebhook operation middleware +func (siw *ServerInterfaceWrapper) ReceiveGithubWebhook(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "repo_id" ------------- + var repoId string + + err = 
runtime.BindStyledParameterWithOptions("simple", "repo_id", chi.URLParam(r, "repo_id"), &repoId, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "repo_id", Err: err}) + return + } + + // Parameter object where we will unmarshal all parameters from the context + var params ReceiveGithubWebhookParams + + headers := r.Header + + // ------------- Optional header parameter "X-Hub-Signature-256" ------------- + if valueList, found := headers[http.CanonicalHeaderKey("X-Hub-Signature-256")]; found { + var XHubSignature256 string + n := len(valueList) + if n != 1 { + siw.ErrorHandlerFunc(w, r, &TooManyValuesForParamError{ParamName: "X-Hub-Signature-256", Count: n}) + return + } + + err = runtime.BindStyledParameterWithOptions("simple", "X-Hub-Signature-256", valueList[0], &XHubSignature256, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationHeader, Explode: false, Required: false, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "X-Hub-Signature-256", Err: err}) + return + } + + params.XHubSignature256 = &XHubSignature256 + + } + + // ------------- Optional header parameter "X-GitHub-Event" ------------- + if valueList, found := headers[http.CanonicalHeaderKey("X-GitHub-Event")]; found { + var XGitHubEvent string + n := len(valueList) + if n != 1 { + siw.ErrorHandlerFunc(w, r, &TooManyValuesForParamError{ParamName: "X-GitHub-Event", Count: n}) + return + } + + err = runtime.BindStyledParameterWithOptions("simple", "X-GitHub-Event", valueList[0], &XGitHubEvent, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationHeader, Explode: false, Required: false, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "X-GitHub-Event", Err: err}) + return + } + 
+ params.XGitHubEvent = &XGitHubEvent + + } + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ReceiveGithubWebhook(w, r, repoId, params) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ListWorkspaces operation middleware +func (siw *ServerInterfaceWrapper) ListWorkspaces(w http.ResponseWriter, r *http.Request) { + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListWorkspaces(w, r) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// CreateWorkspace operation middleware +func (siw *ServerInterfaceWrapper) CreateWorkspace(w http.ResponseWriter, r *http.Request) { + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.CreateWorkspace(w, r) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// DeleteWorkspace operation middleware +func (siw *ServerInterfaceWrapper) DeleteWorkspace(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) 
+ + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.DeleteWorkspace(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// GetWorkspace operation middleware +func (siw *ServerInterfaceWrapper) GetWorkspace(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.GetWorkspace(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// UpdateWorkspace operation middleware +func (siw *ServerInterfaceWrapper) UpdateWorkspace(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, 
r *http.Request) { + siw.Handler.UpdateWorkspace(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ListWorkspaceRepos operation middleware +func (siw *ServerInterfaceWrapper) ListWorkspaceRepos(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ListWorkspaceRepos(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// AddWorkspaceRepo operation middleware +func (siw *ServerInterfaceWrapper) AddWorkspaceRepo(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.AddWorkspaceRepo(w, r, id) + })) + + for _, 
middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// LinkExistingProject operation middleware +func (siw *ServerInterfaceWrapper) LinkExistingProject(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.LinkExistingProject(w, r, id) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// DeleteWorkspaceRepo operation middleware +func (siw *ServerInterfaceWrapper) DeleteWorkspaceRepo(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + // ------------- Path parameter "repo_id" ------------- + var repoId string + + err = runtime.BindStyledParameterWithOptions("simple", "repo_id", chi.URLParam(r, "repo_id"), &repoId, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err 
!= nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "repo_id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.DeleteWorkspaceRepo(w, r, id, repoId) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// ReindexWorkspaceRepo operation middleware +func (siw *ServerInterfaceWrapper) ReindexWorkspaceRepo(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + // ------------- Path parameter "repo_id" ------------- + var repoId string + + err = runtime.BindStyledParameterWithOptions("simple", "repo_id", chi.URLParam(r, "repo_id"), &repoId, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "repo_id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.ReindexWorkspaceRepo(w, r, id, repoId) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// GetWorkspaceRepoWebhookInfo operation middleware +func (siw 
*ServerInterfaceWrapper) GetWorkspaceRepoWebhookInfo(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + // ------------- Path parameter "repo_id" ------------- + var repoId string + + err = runtime.BindStyledParameterWithOptions("simple", "repo_id", chi.URLParam(r, "repo_id"), &repoId, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "repo_id", Err: err}) + return + } + + ctx := r.Context() + + ctx = context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.GetWorkspaceRepoWebhookInfo(w, r, id, repoId) + })) + + for _, middleware := range siw.HandlerMiddlewares { + handler = middleware(handler) + } + + handler.ServeHTTP(w, r) +} + +// WorkspaceSearch operation middleware +func (siw *ServerInterfaceWrapper) WorkspaceSearch(w http.ResponseWriter, r *http.Request) { + + var err error + _ = err + + // ------------- Path parameter "id" ------------- + var id string + + err = runtime.BindStyledParameterWithOptions("simple", "id", chi.URLParam(r, "id"), &id, runtime.BindStyledParameterOptions{ParamLocation: runtime.ParamLocationPath, Explode: false, Required: true, Type: "string", Format: ""}) + if err != nil { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "id", Err: err}) + return + } + + ctx := r.Context() + + ctx = 
context.WithValue(ctx, BearerAuthScopes, []string{}) + + r = r.WithContext(ctx) + + // Parameter object where we will unmarshal all parameters from the context + var params WorkspaceSearchParams + + // ------------- Required query parameter "q" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, true, "q", r.URL.Query(), &params.Q, runtime.BindQueryParameterOptions{Type: "string", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "q"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "q", Err: err}) + } + return + } + + // ------------- Optional query parameter "top_projects" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "top_projects", r.URL.Query(), &params.TopProjects, runtime.BindQueryParameterOptions{Type: "integer", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "top_projects"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "top_projects", Err: err}) + } + return + } + + // ------------- Optional query parameter "top_chunks" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "top_chunks", r.URL.Query(), &params.TopChunks, runtime.BindQueryParameterOptions{Type: "integer", Format: ""}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "top_chunks"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "top_chunks", Err: err}) + } + return + } + + // ------------- Optional query parameter "min_score" ------------- + + err = runtime.BindQueryParameterWithOptions("form", true, false, "min_score", r.URL.Query(), &params.MinScore, 
runtime.BindQueryParameterOptions{Type: "number", Format: "float"}) + if err != nil { + var requiredError *runtime.RequiredParameterError + if errors.As(err, &requiredError) { + siw.ErrorHandlerFunc(w, r, &RequiredParamError{ParamName: "min_score"}) + } else { + siw.ErrorHandlerFunc(w, r, &InvalidParamFormatError{ParamName: "min_score", Err: err}) + } + return + } + + handler := http.Handler(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + siw.Handler.WorkspaceSearch(w, r, id, params) + })) for _, middleware := range siw.HandlerMiddlewares { handler = middleware(handler) @@ -2671,6 +4404,24 @@ func HandlerWithOptions(si ServerInterface, options ChiServerOptions) http.Handl r.Group(func(r chi.Router) { r.Delete(options.BaseURL+"/api/v1/auth/sessions/{id}", wrapper.DeleteMySession) }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/github-tokens", wrapper.ListGithubTokens) + }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/github-tokens", wrapper.CreateGithubToken) + }) + r.Group(func(r chi.Router) { + r.Delete(options.BaseURL+"/api/v1/github-tokens/{id}", wrapper.DeleteGithubToken) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/github-tokens/{id}/accounts", wrapper.ListTokenAccounts) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/github-tokens/{id}/repos", wrapper.ListTokenRepos) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/jobs", wrapper.ListJobs) + }) r.Group(func(r chi.Router) { r.Get(options.BaseURL+"/api/v1/projects", wrapper.ListProjects) }) @@ -2719,9 +4470,51 @@ func HandlerWithOptions(si ServerInterface, options ChiServerOptions) http.Handl r.Group(func(r chi.Router) { r.Get(options.BaseURL+"/api/v1/projects/{path}/summary", wrapper.GetProjectSummary) }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/projects/{path}/workspaces", wrapper.ListProjectWorkspaces) + }) r.Group(func(r chi.Router) { 
r.Get(options.BaseURL+"/api/v1/status", wrapper.GetStatus) }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/webhooks/github/{repo_id}", wrapper.ReceiveGithubWebhook) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/workspaces", wrapper.ListWorkspaces) + }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/workspaces", wrapper.CreateWorkspace) + }) + r.Group(func(r chi.Router) { + r.Delete(options.BaseURL+"/api/v1/workspaces/{id}", wrapper.DeleteWorkspace) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/workspaces/{id}", wrapper.GetWorkspace) + }) + r.Group(func(r chi.Router) { + r.Patch(options.BaseURL+"/api/v1/workspaces/{id}", wrapper.UpdateWorkspace) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/workspaces/{id}/repos", wrapper.ListWorkspaceRepos) + }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/workspaces/{id}/repos", wrapper.AddWorkspaceRepo) + }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/workspaces/{id}/repos/link", wrapper.LinkExistingProject) + }) + r.Group(func(r chi.Router) { + r.Delete(options.BaseURL+"/api/v1/workspaces/{id}/repos/{repo_id}", wrapper.DeleteWorkspaceRepo) + }) + r.Group(func(r chi.Router) { + r.Post(options.BaseURL+"/api/v1/workspaces/{id}/repos/{repo_id}/reindex", wrapper.ReindexWorkspaceRepo) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/workspaces/{id}/repos/{repo_id}/webhook-info", wrapper.GetWorkspaceRepoWebhookInfo) + }) + r.Group(func(r chi.Router) { + r.Get(options.BaseURL+"/api/v1/workspaces/{id}/search", wrapper.WorkspaceSearch) + }) r.Group(func(r chi.Router) { r.Get(options.BaseURL+"/health", wrapper.GetHealth) }) @@ -2734,206 +4527,352 @@ func HandlerWithOptions(si ServerInterface, options ChiServerOptions) http.Handl // const string: with thousands of chunks the chained `+` fold is several // times slower for the Go compiler than parsing a slice literal. 
var swaggerSpec = []string{ - "7H3bchs5kuivZHA3YiQvSclu91zU0Q+yZLd92m17JXtnz5nqwwKrkiRGRaAaQEnidjhin/YDNvYL50tO", - "IAHUhUSRlC3ZPRPnyRarCpdEZiLv+esgk8tSChRGD05+HZRMsSUaVPTXOyX/ipl5yfTC/pmjzhQvDZdi", - "cDJ4wZU28Pj3sMBbyBZMaZAzSC9fnj4+WEhtJiUzi8N0DJeIiUi5MKgEK45KN6ge22HfMbNIx4kYDAfc", - "Dmq/GQwHgi2x+UvhLxVXmA9OjKpwONDZApfMrghv2bIs7KvfTv+QP8n+hI/ZN7M/Hj99Mhjar+2Ug5PB", - "//0LG82OR3/6+dfHv//4z4PhwKxK+5E2iov54OPHj3YSXUqhkTZ+JsWs4Jmx/8+kMCjov6wsC54xC4Cj", - "v2oLhV9bi/lnhbPByeCfjhqQHrmn+ui5UlK5ibpQvEAtK5UhsEIhy1eAt1wbDQc4no8Bl4wXYNgVisPB", - "x+HghVRTnucoHn5hp5VZoDB2VMyHMK0MFCy70mAWCOFEQMkC7cJeiRxvUX0Q7Jrxgk3tmTz0CmlOLuag", - "UV3zDEFIA5kUMz6vLLbQshzSuTEefEUfxIKJvMCcloQK0L05HLyR5oWsRP4FEcpCY0ZzfhwOPghWmYVU", - "/D/wC6zhJ661PRipgItrVvAcTt+9gitcubWUSmao9ZdBk59YMZNqaZEVf6lQG5jKfGXXtvTLrLF5xrHI", - "9cCO4Ye1s56W/EdcEXdUskRluGMSmUJLGxNGK7dz2P8NcmZwZPgSN/nMcMAJ+hs/F0ybSaW3DyaqwlOW", - "Y4NbRuGlHeUOH1Rsrw8cX45sQN4IVHYoNenZYqlwxm83r5FzrsuCrUZSFCtwL9l7xHKZWVUUFmk8M0wz", - "fjthj6dPsm/yp+nhOBGvpZgDClnNF2AkKMzkXHCNwAUUlo0OQS+kMvU7C2aAm0RkTFjysB8IbVSVGZpQ", - "Kj7nghXuQtrYgsJreYXt7U2lLJCJ1sPPOMCP7ZvuLxZV1uHqD6AG5rCNg836fq6HllN71drlOSQ+c69v", - "4jIr+eTKIfk2IvOk8HE4sGcTvuge6PsFQlkwe9/fGjq+a1ZUOIZHjy7QVEpgDnjLMlOsQIoMx48ewaWR", - "CulkNGaVwmIFf/vP/7FnYn/WICTcsJU7Y6M4XtuXoWAGVfSs1kAZdtdadj+MXnNtLrww0Aso+j83uNT7", - "g8zPx5Ri7m9pWNFCJguxOaq+1etB+CS29mdSGm0UKy8NM5Xu34BAzPVkGl6PnJ+qEG4WKIgkLOppMBZt", - "7UHgsjSrcQPwmgDW1rw+S2zJZwsm5viOaX0jVX7hmHOEzVZKobDipHvR/rbk4jWKuVkMTh7H2BTedF5f", - "v50EX1ZL+CNJrSyz0u4Y3kioyhIVTO2dabfYmuSPuzBsY5Fri4jun4jR4Ufv7gPH7W7hZbVkYjRTHEVe", - "rKBgUywsq7sRlvXZc8uZXkwlU/kY3rdYaSKIGO1RzlGgstzACysjzXMEJuw9GSNTorOtgF/HAbv0/o17", - "5aJ357UOEblO1mZqXu2f7oNG1TsXydkdvu1+id3gghvOii349VY4jg/hFToQgTdETLCstLGYJ+YIUsCM", - "1KhCzrkYJ8KeFcuXXIBeMIVW2uYaZGVGcjaaMpFvHMMfYxeVdIIVimpJHMSOOBgOrjneoGoBqQeeYfMb", - "e/VDx6B8jjN6XYpXBpcREIt8UnCBMYY3HMx4gX2HPRxccdEnN4l5xeZxmaR/tl4xpmRExb3PNZ8LZiqF", - 
"u3HS39S09Pb+/LqGDUBa29gO2F70/VTo8SU3Dn9nrCrM4OTxMeGWZY+Dk+NhBHR6tZzKYicPXgOG/2rX", - "9vruLIW6Ksz+d+4aLm67e7ftdm0TYRXbruFzrp4Lo1Y9Z5TJyuk524G8H9fz6NQaOLaiWvXtLidH41ne", - "9kn8e7GRX/ACf1CyKi8IMJtzTFGbic6kI5eatc4KSeKqH1BUy+k+TGArrS+ZyRa4P4bYtf9kv9lEjjUA", - "tCm3taFmyj7QuOE3xZlFJa4m7ovIRlq68Maz7SxUoLaawILfgVDe0DcvuYnRyB1OThumzJa1Ofrv46vr", - "zKIZrMMlA2jCyoZtWPadwju2KiSLaDwtQK8Zcd6/GP0RrPIyhmdcMLUCiwPayldVkZNdZYqgq+mSG4P5", - "OCYl+NEni6jp9PLl6ejJt85ymvM5akOmU/9RGh1xK/r3Eo3m/4F3ZHMe1xtod/bih+wDt2MFcQlgf/Je", - "sxCgwczKqeGVIUgFVpkGPoNK5P75+M4qdudW3nYH261dIlPZovcO3rxMn+y8TH+pUEU06Mtq6hYMjsfk", - "wOaMC20grVecju8ojbu5dm3uvm7gNVz4gjfwS2SF2boT5u2Ka0BHA2SDItU31aRGp1ZRSiuxoEFXccp0", - "r7blbXk1GA7qr3bL236E2HbIyP0M53yL9FcVRQfxZqzQuG4G/TNp9JYo4IaXqJ2jwSKZd8iAXQXCFGdS", - "IcgShX1oVReNWnMpetT+rUvuPYVK9BkKtZHK3mNM+wud5TnJcqx41xlj48t1u28JMyWXxL3B0gz87b/+", - "GwLvlTPwSnuxGrk5wXO6MTxflmaViNoKEkC0YBoEXqOCKaLVtXO8xRwOpILUHgNxnRRumCblD/PDjnkq", - "wGgdrR0w1rfeiw5nTGRY9AM3o+dF3FK5brio3+2dztKy3qp73I0vhCuZxLbbV+6zb483WUSDJHdhdDU0", - "3cp2basXiFa20JOssZhu5+U024RlGZZ3eN97QjCffAo/XJtzuL7ovlm2wERw3X/H5ViglTEtMXXPfIMW", - "P/EsPWefuHXnXGfyGtVueMZxYOc+7/fwazDv/mDzzrDEQtDd+7rYnHYDAXoB8E7JuUKtn19HZeC3AgHt", - "o2BOfHP+vy7fvgFtFLIloJN8YbqC9N3by/dwRJzwiNaT0g2aCPtZVnA7iEaRa0hPCVFPoO3kux2J/K9a", - "itTZKVOaNXWeuERYBFB8yQUz6DzP10xxJsx3IM0ClffYAVMIpSyrguyZTIPCAq+ZMI79rqmlVqiaBMl4", - "82wcDLc9ayPG5ju4nGI+cXRRq05cmN8/HcRQAcMRBEwgGY+UoJqEJzRv8ydNkTd/55I0JPeMFP7hYIFM", - "mSmSwua27N9yL/wcob0Z63okWu4tGppOud+A12V/d2B5m68uUes7KztbhAqj9/XRrptC6XR20tErMZOb", - "ZBSeQumuPIfjLDP8GkdeqgoYDRlTilu57BpJ5cQi/85R0YJbwYBnrBjNWFFMWXZVf0Uia/g0XYNwOkyE", - "/41gnQ7Jvp92sTiNEcldOSAWrLRnqjGTIl+DtqysStZj8bkLm/8ETtva/h6GtwXTHcu5wgz5tUWM4VYO", - "vQX5Pu7Cnf5rqPRv7JKqNlGxc8V0kTLleYEp+VeFDLI9YR0cQY1fVRPJM6boLRdr474LHzlMts9r2KRH", - "aS1SpkfpjHH3H1UJUX9fMG1GqhLg1ujEdDfHRFVCe4wMh2AXTN4It4bOUQxbEqxlYNz9x0/3WarXa7lN", - "67qDy2hvz2WPG2arF9Gvsg+FKo1qF/p80BERij6MTfgTbvGOV2YxWaJZyIhf7D0Whe46J62oQPe4kaAr", - 
"NWMZQjIo5FxWJhnAgUe0Q5AqEQuek9v/wDvEwd42WjeRAr/TIKRZkNoqoZBzkJUBOTvsopMfdDCs4wJi", - "9Px5gBt2QBEFo8yx6PEV8Aj0Xr4AhaWEV+eQo+LXmDuyITGLZRaqXGFmpFqBYEv0QTMUQHK0tJMdjuPI", - "aSIWytOplkVlvN5sJE0zns+rmVOnpYCc66u4PYT/B06mK4NxCegOYjypcd4+1xq1F5yveTSAwEJnknMV", - "j1U5e/Xvkx9++PBicnZ69vL55PzVhYsTskq8zpgQmHuDAMUUwQ03CxBSjCgWAurR4XvLTxsYaRd9FwUR", - "ncf+WnMLV3a5K/zIw9auY+BqDP93dVBsd0L85nwGzWbC4mLg8LEIMWAouWSTOJFcoJaFJUR6C5ejuYRM", - "FgVm9oUWPc6kco58b0caw5sPr187S6MLWl2W1X4W7GFY0h2orGfItlojhWFcoOrZ6TvLBbigCBFiOOF9", - "OJAzgwLwl4oVlk80kd9xv8gnxEx2AkF62BQR3EobXDqOJZ3aao+SGal+p2HJsgUXOI7HdJAdb2JJe0IU", - "tDnVc1K5yChvXwCeozB8xlF5MSjETDXHTCzEyjqJOFB46Gfxhy8FKHmjHa8pFY4sDCBXfGbAKJZd2an8", - "1ZaI5sa0GrjRbgymIRl8EFdC3ohkAIq5u3TBhH1EY7mrb49IUOf9uKNVhwJIA/Q+J3TVHlqPsyyeZ7CW", - "ZuDEUhem9uHidet0xnfKBBgONBrDxXwnT/Ys4zK8bj/9peAGdzGLy399ze1JM8OmTPsb1nGIoBs6FGsQ", - "pT59jy65FL8zgLel1AhWN2RzBC5mci8G4pd5rwzEitF7g4zejRvBasNlS9j3+LXVclGV+R35SsTrGTyc", - "DcPZ4IxtSmnhSgDAsLHNdSJzW8vbJJotF9L2WNSQSbO3HBGuuXtyx9Xzb/PHrdPJpiJ1mxVVTmRjifSO", - "HGjJbifOYHZ3T/fGzOvDbdtPQPg1yd0fa+0V2W5vcMbuxuC4z9t3GtoJUfqOgGlPNFzb09qi1yfaBrJq", - "uWQxdWdbpOcnX02/nRtFYYbCtI9iL2K9pPd7pP4284y4UMpJED75HZxzdfBaH3/47WFqH9uu2XCbXXfR", - "eisabwJx4xxjmH6BM1QoMoxHwHRVq8bG6D+K3my9cUqnxQ1b+Yh8b5VDQJGXkgsDrXejIm9bjYuPGyT6", - "tNGtUjhQOAvpAOSt1mDkFQo9JEVGMTFH3Rb9947x3RrfdK+6YjvqZ3esWVeDbM2zI2SoRoVPjNrdjCf6", - "9ssH57Y2cV+RQV0S+YKBQRdIh33aclFt7ISwIWZ9e1uyXyqEV+ffwawylUK4RqW5FFaxXAVRvEQ18qNA", - "sN1TgJpX/3nMGrS5lbCK6C4qYeXZM0pzjVmlvZbap8a2zIhSAevTn40E1rJlxSMSC7Zkk643tT6zxzH8", - "dF9k5vZO74vJvKwmBVv5rPTuhkaP4XtgRQHuBTj4CQ0rjs4+nJ8eDuEYvoezdx/ITRbnSmEOs1DI8sgE", - "dogCDdCLI5/YyyojRy7wcDzYRZZWqGwOJpPCBR5lq90QUJjJ5RJF7hB2K2G1MeOi9Z1lDJQSvC2YKlxG", - "+ZQY4fWgO3fsZlozEaEakdPSZ1GGpCS5ZvHPmACFRBMMksH5s2QAR4lIBs/Ftf0vJIPW4pMBlLwoQOCt", - "sUiJLFuEfMIfcaVdhKSzkbQiAsgCrk8gXaOHdAhpFwnTIYzH0SCtdaUyFk63QFAO7JOgC4KSN7XhB24U", - "NwZFE7FKRiJy2qK4PmqBmGIYuACczTxSfZolJSx6uootWgLXukKXkkQrfPfh/RAyVlqm1nIpeENEK/Tv", - 
"bqG164xog/ij1L1JjtuoJ8KCalTfyTsvupS1k43uxf72YXn7srm9WNUdmc0uhfjrHNrOs/pAOB0TVYsQ", - "ASRLx9XGcIkiB+aYBPkV0RwpLAuWOdu1vEaleI4wkyoRZE+jMYYUpgRpMkgGKRz4CGw3/KGl3/Q4hQNR", - "LVHxrP7dyEScvX5+etEd+4AYloUGudQ1kFPdMjBxDUfQovvDcSLe+ngqv5crxNIOx1UIUfUsLxKnEcHU", - "3cbeCObuNvFtYvK+36xj9v7ftTB990dbMX/X57EojUtcMmF4tiPy35uRIqLDs4JlV+Q0tPpZrmQJXlKF", - "m4UMtl+fSARMNAUQFOiQBGB5710s8p9myI+mAq4HVN9S3rRz4YGcwYtXr5/DXMmq1HBAjizSpg99on6l", - "xB7CERdNjlg8UTuTmgsEzZe8YIqb1RgsxZDR3MtjftlwcDx+6gj7TOZ4wcQV+W1G//rHQ88ZLBXjbVnw", - "jJuCSgrknCqRuJIThZS+pMBuD2YdBrt+y3KD9akTMdOF/9BHX2eT3E9ayDr29ymANMKEZKBlDBqsKEZZ", - "IbMroDepaoPIVkNQsiLBx0h4DDlmfMkKID7dlX56o8c+JSmlnbD4QMrncA0kceC6GJR7KSqDtyVXqO+j", - "EA3XE3/l9BSGCI6qEAKWMaVWLlGE61BhJ5Yp4v0eGlHcaaHNV3cpakMf7FXUJhZy0vHdtKC7tocOuLac", - "8nYvjofkHQzDHnc+o6RIPec2g8klzzFj6rI2NK+7Oiazgs8XZpuv/JcKK4QcS7MA5orsLOXSSjRyBpot", - "y8Kzue2XBIEdQyZzPJwmHiscM+Ych8gLyBa8yMFHkwLXwAqr9Ry4OEI4CndDfrh7jVS+ra82kLfp9IOM", - "iGuK5gZRgAujtiBy0fXancRRsC25Ih26ZDcCfChkT36Ws4R3zc0+NNIbP91//cjujzqKMuy+x+Xq9d46", - "4PcOPNOtKgBt2EKmKCbuqGnjlNVJ8BlP/iqnkZvorE738iDohJxSNMbuU2Yln3jTX7cA4fXjGPeyYj+K", - "CA4+cw/aYSS+ytVcpvHYmd22vWo+t9t6YVWdEKUShmU3ViyxmHS0LhpNjkc//PDhRc+09r7Wpr3p7qyv", - "6TllXDDdlL7z78PBDTcLWTnaT93Do+vUizvDRLjlHY+/HT9OD8fwpioKsMpf4QQy8tXpikI9Z1UBpSyK", - "gPSo9wxvIWBMCsm8tr9hy/GBM9i18nm6I4upMhpchUa7IS7g2+NjWNoFvGCFblVMCh9xDYGmrFC3YBoy", - "xfQC8465qUWqbX/+GnewTBoUzrk2qDCHuojmHmyJzmVSqQjG/MDNy2oazg5KNnfOTnvLp92DT/3R0D4r", - "F3O0X7gJwbKNP/tl0PaHeUw6RSa3Va/yk46yBWZXdZ1IexQUmwmswchHaSICHKQADxk7NZVAEngTQqu8", - "1U+E0pNUg+wFmR25Bul1eUvWVMiMFtNdCFU0M3BjuWEiDjQaSM9e/fvk355fXL56+2Zy9vL52Y+T529O", - "n71+fv49pfOmbU3F0gAX88M+RPKzTWi2XeLEv7mXz+y7/q7vTTgL7GzjVLuMcY3ghhHDUit2JMq9o9dA", - "yyN+bzVwtrkd9/Mnbqn0s81B6HazrWDC/y+Z9Iklkxxod5ht7DSfbSW5Q42He9LKO1u7L6fsBi5+Qb+s", - "M+7uqgb3yWGZH3un3FoRrhZ9ezw/VEghFEgkp5aQUEgxd87BunTyGC5wVlm5yJubvdsFBQ3vi73RHWGv", - "ASNp6D7GHgq7dVf0Bm+oEHOtnNs1dSaG+Ly+woGfOHXl4daSr/apHbcJYJ888/m2jf5zaG76NLw0YYYq", - 
"dmg0Yzj3P/oSlq7kayLahwPXnDVV+t5ehCrFfeBvzfPpoc53qTUYZ+HLSpuJqyDYKUPYjy97H+Y9RNJy", - "d+G7LdECela8LUY2onl2sWu7TWVLZjFhw96M0U71Z24Wdczy1uQbN/ZWZtcZz6quRfF2Njj5yz6JZsMe", - "vTdYc5rycmuKr/0Z5MxlLZE5Kw8GPN0kpxDX2EsBvsLVfpP5isSBrjSlo1HdgDvMSMYfKpAZ9cT/JEn7", - "zFyhAW+VtIhl/2MxVhu2LOHg4sXZN9988ycrNL/xtaNq/t3UdSnkfI45UDHOT/TBr5fvjR7SBiA3seXn", - "j8NBRDiPRBVidtUTpvDa8nxSmVuQ+PD+bAgXL87AwcPpdb7uT6Nz268+PQzB3zPble0SFZc5z4K2RQvl", - "OmhXcYNWbQaM7JSegS8eMAxHvGxhCE3hDFx+41IE88InRDmIPi5Fim9WKW5Wl5aIQ0VEplCdVtFcJkcj", - "vnAQMA3pqa/ZT7h8As/oa0iq4+NvMqsvnr57Nfnx+f+mHzAd+Kr1BC96tYHfwpjSFcfn0UoFL9+/f0dU", - "GoSFNOO33gySgvauIMhkjiNS1CBnuJSWSlzRWqvLgkMVe4LTlcGRj55mmZJar9mF9HdumpbamCbCBe9w", - "AekRK/nR9eOjUIfL8OxKU8GlEkVOZjyfNt7VREP1EUYe9xumcj3iwrJUZrhdja+8WzCRa1r9P/0TtHpd", - "cCloSzcSSqZYUWBB8hz564Ibf4Gg2RK98c6snLHtxH44gkePnil5QzLHURP39OjRSaiV4ndmRz0itpa6", - "nFLX7ONfEgGNTELROhqYgJfGlG+p2IOUV9wdUGAqvniKf0ICkDB2HFYZuWR2YwWVcSbXohV5La9kSxz5", - "WC3vt9FjuAzXgpJFYYeYSWWhCI+fQs5WuokQInEyOHzcxs9ev4IjuDz/kXa7DXs98/OYa8/M8x5LATdM", - "25l9pFJTZCYAruQjyzhTHwLGFLo055HOZGlJR7gwsynaYcIdxEXOr3leESiCUZJRtBWZ6IgruYI1DjHe", - "VdOCZ3VINbm4HS4EFnB4AukPz9/Dkav0lg79n7nMNNXeoL9kiYKVfLxiy6J+pY0EdZn0kcd2+2kfrtgj", - "It3GmYxOP7x/OTl/delMRa78mL7ipXYxcc7e5Os7rJrw8IMcr7GQpXPFCF9vn8ENU2TX4trfhIcEijra", - "LRhGDVNGO7Rlwsegd0rFmwAknQha6LO3b99fvr84fTc5Pf/p1ZvJ859OX71O4V8g+vTd6eXln99enKcu", - "ggdzpzq5m8npTAczqTLnPvY0XVNNt6z24RhOocA5y1Z+LZ5vpqT5SAEMZgr1osnJ4xr4spTKVxNioLmY", - "F5iIFMX1qD6vNAg2bbmG+QUG5uI1UmB5rpAajhBy+V/TOmshdUE+OiTXgi6oDpwb0nesmCKE2j9g9bcP", - "F69B49weo4bMXpHFaghaBgNxIIkGiQ27QmCQ/mrn/JjCh4vXiajbPvkWGa5Mw6NHs/16PD16NE7EmUti", - "tkdPeLGj4xPB5pLqSRHCeeuffdDF/fD1kVtxt9TUQgpZKbdcX18qhQWyHNVJIjQK8kJurzwF+oa7MAzX", - "2sgpFRQCnQiBNwUXOMqRjD9WcHY1sCwcNktppeAEAD30xJGItC7ElPqaWo4WHx+D952N4W2RB9bjToBC", - "0YQEt/BEuC25QoPrPXLSQ5ij8684LPfYOqKSXWE/AeRUD1rbP06LwgtMde8rS8PN9UZ9cvSClXgC6a+J", - "rwWdDE4gGTg27iUtx8aTwUd7sB2OGFDJRbze2s1YsdznNUNdI7OuZ9TEJxerRNSFjX5NfOFON/t4PPaz", - 
"WRGHG/LANRKLJctBbQZ3zrmPw4FnxIOTwTfj4/E3g1YET81oLeUeNeUZ5miiIS1X2vGtbuGItNU+QYMU", - "CCiMWlk5tx1MDx+0ZWjELVqh0b/TUNvJR859V/LsyrJb6ViK9pm/C3aNFHlopTtoQvqt3AULJtYKVgTm", - "7cqE8FbOUDuxvBunSCwRR9TDpXTZ/2Xly8ERO9LeSOVS7LkUr3IrhXNtfgolKDot1Z4cH99br6em2Eek", - "39MZo5K5DoAFvTQcPD1+3DdovcqjTpMs+uib3R81LdlI5g85jgQJsOjhV+JKgmRucb58Chy4q8xSx6HF", - "ZDbXjZnmZztgFzF9sPkoq9M/ogh64THQ8zOXd+6/9R3S4OD8GYWq/+2//puCUu2/7bBUJz+0nJl1tdRW", - "j7VQmHgIZVFpcqdR+HUKS1a6DICCmDrF7ZN0/zsdEgS2pQbYByE5AOrcgERsTw4gvtoKle3i5g9outkz", - "D4ih3YkiWPrcCZ7XuHYuXwdZL5DlPvNgc0m7sHQ4KKsoElIsn+7NkhjDCx+7HcKfg2rhtYpEWIlG+VDo", - "Jrb6e+JV/SHVlrwIJ35AY+XXc4ka3rx9DyF+pu2mD1dRg4ZB5wKNVi4ymAgvkBANbgTjzAzlsrRiFN59", - "eB9DwHdVBAFpp8+kCx26f9zzofMfu5YLqyd8/Jro75aVf2mkHw6ePnmyzzTtDoVdUrlkmwQSUFPfmaGv", - "IRMZ96SOUNO5soyWMBXXousOvjnWZN6TlTkcgkHVLpjq2bZVBVuxbsN2DJlX5JxS74J5OvsbJyLcKE+O", - "nwBfLjHnzGCx+s7Z05xG29mQr+xnJMgpCWVOgQtxE+62qeNj6E//yCgmNMmAY3glRi4srKUfTEPM9Ho4", - "YSDIG+5NfG5bz5W6rEpU11xLZbedCGrBNLV8ZpQrfo0CvCwW6gXBQZrxW1DobF1O2PWKiLdZHMYo3OeY", - "+sjJzQvmyf1R2Fo2a7wTqGNQ9TtfjMq+dV88bGPPOspU12GaFimsdk5cPeADt7xcyJEsN2695jqIxn19", - "KjU38UxePNsQRLqxtQ/IibsTRaDonoAWrNQL+ZWEZb/KOhLXc4+7wr/27UXBbiXyD95D92Dw3vBNxi4/", - "jeprayZWgHIGu93SXfRCanWqQ0MW6nS9XZxTh30jO27W2ti9bzfF4xpmBSPXWxpzFnvLph2P2PsUE7Fh", - "/+Nms6HeBotuugE+kPi12W5wL+Hr8b2iYFQx9sW1vqCwdfyn3V/UvdbvQzp7Ja65QcvvA2Z9Eg85+pXn", - "H5uOBxEvN9MZy6n6Se1F/p1unOoWUYPTO0Ti0MtuwL5AoBjCntMXNcJ2kOZprFuSK4z8JU/56e4v6gbo", - "3fNyqwW211lRsKGzU2sKmeB2wz5E0AUguuiTLq0NW3Sz7s79mWyAWbReX+vMltIgSNVJFYyEcflmCa6Q", - "Ruwsm8CzB2I+m5FtX1jz62M+XuH77aLlPTAf19DYheI1yJLTzXYXPuT9m1sFGdc2WA82aGKt2DsrCldz", - "nSaiLuvD2kLtDGZXuNrA3FOxcoW0sdBIbodKaDSH9afOnlwUxPaIyxG6E0W66NWaJMk3O2hTYR2OVhSx", - "gMKfHxA/Iz2+I9j6I66+toC2XDVRUxb+zDVw1sBn7iw7WBRQpl9ea9uEHz2qG7Q/euQ6V02ucJV2Gj4H", - "nGg5kN537GR6IW907e5jkMlyBdPKGCno/mOQDFzZo8YHlDi7wkpWTo7TiC4UjKy2ySA4oMdw2UQqUA0A", - "/7nDP+fvcylEab+U53uvP6Sc123f/YUlvW5T/x48zj5X7PtsmUzrKohkHqXjqBvhgTsFMYuSxGC89+Ba", - 
"XmEwGN8IL3+dCn9Bt95hYpWIK1xZ6exaXvmghxLVktnN1XZhJW+sOmoJz6GdC3BYMnWFeSKcq9vHmFD8", - "sXdrsCrnVMmZU0hbqZCMC/nQkkgiWoE4PjCGIkuYMbgsTcsi5+qYNOasp8eP45Ynu4Ia4R9CUNote7pF", - "/L3InhcBEfbHyli0zk4vXPprMhCIuZ7UnyaDE4rz/5g23tlO+Iz30W7wXOceI3Ubb8uCCUYF5nWmXD/C", - "xjsLB8mA6StfGivYNUmaLQvpIqAgFnrziBwq14xmyS3HJStZMjgcwxvZTm7gUtShUD0Ot2dhxw9v6Vqb", - "atv1Xr/qDU2dcM3ByV9+bqNJO2C1OQg6UGdroE429dHCQUlxY53ruTKLCCY5s8WoHcIfv7v/DRWfURyE", - "t+Y3JpYhuLh5UlRSgTftR+Q+04mImlTS4AOwVBBkQRcFF8Ko5Ywu50Q47cw0MYatsg8hpLLeR3DfWSy+", - "ougVyqo/HEPtiDOyyhaNfON4rdRIsXyxgL3oHU/TvmvSCR7klu9Mcqd7/mmsj4IHUfW56tA96SotD1Gw", - "YbTyM3bgL9nY+rH2bR1YPXQ2wvQSzeiMEOgEWuGr3zv/Cs+da+W7Otb1u0RcsiVecoPfX1Im7nfwjpnF", - "90epvbYbgZbw0zdZ86EIfVjvtDGLcTfdRK5WJIxUGdIQ65jt+ayPlWciEAyjjh/RgBiC0cPgZqdB1RfW", - "87ttpyI89nVIpHCtCHNvmW9QINaC2WdvOB5zENBgCGtYcDjYJqp8/NJE1XNxPL/1dmkf2N3Ep84kBQys", - "bXfve8P1x9riKyZZWbcCdkfUuSVMaEVay/q50EZVmXFvTl3UOsWVubiLTog55cr1UvB38BO7HZ3O8fvj", - "tIcM7JL34ZEBC+ripZ9wlh1W91zkHT7X9P/aAWeX0bwzwoqYDzPGhXZ5g3C3fmY3t/CVcAXjoIdDbQRG", - "XQmraTe90hIxZwZhVin6QbBrPnfi2BQXnFTvOOfqkdJ+wgeN1sNtfOKsdfvcx2mH8dp5ni4HdPeBt0sx", - "bT12JyxF0umcyBRMY0MqG6HNiOREFxKciLRdRIoq0rZKXIUWne0qVjVGhFzjROhSGqjEjC15wZly7i7t", - "e+g2Van8bWeVVd0u2+UiazfrdvVFdK4um4JRD+eqjlTLijmsPaQ/wz7XQZjTDqXq+gTbeLk35kTsFTFv", - "Tg3Qr6aq3weX/Tz127JlKdDCe7lqwH+g+VxQ48aQdQE5XvMMt1+M7do5vUbzd03RkQfD4ljnnggWh/yP", - "z7Iyf+vWvf2jVz4NI8TRREMBIqWFWuCuf2qbl2MG19Bg6CEtrmuVIr6wybVuodR/pJ9tcHVe84cNnjpt", - "mtQ5SyPXnVwkVrhiV3jLtct8/xTp+B5Q9MJjpjcglzWGRZAzwg981tA2E/Jp7Rkbw7mSLnWuBg9pk9xo", - "8H1fhqBwpoeuDeqCMrSGibB3dh2Xqcdwjk60tjcLClnNF84y5wovhKysdrhAImpzCIUnUjsVCpvnpj8s", - "oE1we0YGUBrjVOarw9+yM/az8aaOLAgHSS6poqCz9C2tKOGvzzPbYXp9cXy98D/+ghznS13f93AqP1DC", - "ZUNdlK9E3Zvid01XNorN27wSIPXSDrglsqNN76FOkNeXqEwM6IXiwqV5h7Dj0J06EQfrXeOG0Gkad/id", - "J/IWHU8RXNiITITmhcuiqTPpaxTtjxh52Hs1WoHpC9uTtmB5yBUoPxvbf5MhJPdAVe+oCUqgqTrHZjdj", - "678wfZrtFIOJ9/MocWtIAoNUVWLCfcAoRfWz0ioGdc7ZiJqNUb6IxTV/8SbiIHUPJu6H9DCoyi6blsjZ", - 
"d1cGBjkWho3hHdPaRT0QWqeJMBJueNmwJSpPG8y8gQeMwVId1XgMzSYiBEs5wc/w4ay+zQQtUn1I0mxP", - "uFstlyWKL6dRfilpWbRuAr9RqhbthGRv7Ftv8/1lOUKddbHrKyv6qA+iKaLaZSZvSxTANjfc4iB1c949", - "OEjGROaKFT8EC6HFGkmumE2LX+MkSn9NBm4lBeYt7zufAUtEONIbpuGK21eGkPqyq5wqWSE9c+fsrGxn", - "r1+RlU97FykXkOMM1YiqMFQlFd1gitLeuaGaDHNGyU4uTN0X/pNVkUOB7BoToSoBhcyuNFhhhLrPSFVL", - "E01N5MejhawUvH//upcBnTmoPzRXcNNsNe86oIdggdB34e9EXHWrd9jlaHyDDRzwHJeltAA9/EQSqfuk", - "PgSFXKLI7RVLaXP2TiXF1ftKNUyxkA7XvOPC8u/6Ph4nInR9+fbYp7CXZIMuCioo8ehRU8JD4FwaTif9", - "6NGJ76a0vfKGFYgVZmghS/r9J9XaSMQB1Vqg4hol5dYIbFLSu/U3fOWNwzH8uS5MzroVNlxMV2zlvtxG", - "p04I96V2IrU33KJfWLAFGkl9BC3mPnGYXG/2bBKxVtU8QtYvfO/aHTG/O6qcSGAespjX+/BAjMI5wPRw", - "DOeu8u1JpAxJKwLYAbOxYDtYxmOA1znQcBBbf2+c8AMJWP7QvooutIk1xJaiYGl6ELgyuXSs9L+aib05", - "J2T0tBQhFcu3KMxwcDL4NRnQw2RwkrgSzBT9ZS/NYTJwbIGeqdFj+skyMvphybgYzyX9SB8Sc0sGJ4+H", - "yaBpNpYMTp4cf0zE5kTUPMpPFB3VdZeyIz6JDhAqI+01wjBxVcknS/v3t0/ja8qlwE9aUM106EWj6ccn", - "x09+Pzp+Onryh/eP/3Dy5NuT4+P/kwzWP3WwqmcmrjsJSbwEvnrqibc1J4OTb57+oX7ZS5OYT6gSqX16", - "bPfnbrf9cbDDBqIpuNyFvwRPjUM0h3lw4KtkU/M81uLlDiETQVvWcNAUqHFKm7QbBS5cyMHWG4SysD9T", - "nnhY1SF4BYQ0vunA2wtwdNT67ahWPZdcU1ewr6Q8PCwwvPJRp8ZQX9Qf3n2oM66nlV6NgcrN2f8OIb1A", - "o1ajU3tXpvUt7Wo8h8QDXc3nqC3O3DBu4MCX1fIZ+q3go9ZY3c1slDH/uJafXE2X3KxLURoOluwWvj3+", - "dMFPcL24P8kvKjHQFA96U9oZvu5V6Vaw2ziRyaWrNfP3yzN865XfDsf4THPDGR3JmrH9sywOO8Lyibmw", - "jhmHVLuTOrdtyXOK6a5buPu4nXLBNKZDSN0tm3OdyWtUmB/VF+4RXbj2ne4FTSU9sWClvZQ9fwqhPkHX", - "cmxPyMjSEtGtzrZe1Xi9kVVI8aPCppQ3la5JBn6hbgVrax3Dq1nbB5oIqo8tYcE1pRswChpwRQEdtElw", - "4XmBTVm+CDN6+CSAjtiyI/7CnW2IGraKn93A4VcJhnltNeAa39ZCkFTlGohTxnFQEqmgW5xCPs9vtoW+", - "NLUdeShLBdVzcc531ymU0hMqco9b6U/JW75kBkEgU6jNSCCfL6ayUuAWlghfjq6NvL/TkC2UXOJyNJeQ", - "yaJAF6wNF74fK1OYCLuk0YwXLgZmuoK0bkxrqZkKRaZWVPW9XYe+gdno7cWo7teaCGLEh0NIQ1PgFA6m", - "oQvw0JXVpHe4mB/WoX6+ZW/qSjIaywaWqFz4tJFWTxuR1ca32iW4MKsr2YVOURvfBp+WS3aZ1urr6t36", - "JBEAo7rc2t/+67/X2+emx+OnKRxkrOBTRWbUmVSw0UrXjRO66FJLXVfQiXIzycjisofZVF4jvHxz+WfX", - 
"WXftw1JqTlY1+7XrxO3eSkTa6aFKdUS39IuNMJxu+9oHEoDiHaK/sAzU06g3xvQ6iEQBKL6we2jU8+Xc", - "Nv9oqk2kGtUQrjEzUgH5KK2cZtVkYuVtfScRBy3FxFcvtvrNTgVmXcglscLSB+lBtVrt7SKkNtkJY9kT", - "/apPQJEDt5lDz2xbV4//Yato5t45ynFGVYJ8kPVDqD2ODM5bEz0M7TczfCW6by+gn+Z/8rcNtEH/D0jm", - "3TAjOTJy1OzY3pMufg98k7lPwN17dtPEsDbY9x8CX+3YX/Weai9gD3z1fiqz+MdHVwuZEVXZrkXJz+Ky", - "CmeoUGQP51Vs5z8V0qGD9m08UkdpqZXZkdLISYJvdTo28opykVKPVyTecfqvL/3uS5STfBy6PsL337vG", - "E/SXF5ZLWVaF86trwcsSjQZahfOse+wGBrOqcEW4QeFIIcutWE3Z7VVhvqur7+sFfTeTRSFvoCqdjbGW", - "kxyAgWrms9w5/mjQnCvMTDxROSB9fSgPVPc3TPCV6Ls1fz95t6Dwj0/VlF4f9uv9rkQbn0bWPgL9Ya+g", - "Sz/JAylMkc6sX1pdinVQ3XYRBbD/o+OrA0xgcdMViUpw4OwcR/XNdHhX5A0T/Loriv7Sv/nwYcZhppir", - "IDz6uwlSCt4CeY3qmuMNHBhZ2guJclVcX8SQu0JWX334EPH2LRTYszCONz4t0bCcGXYSesMN2x0mmmrn", - "1EdjGNKgfQnNWNlmSmhl2YJi7g/Sdi/y9HAIolpOUYGc2dt/I+3NGeXqd1ohyKFAee2m+Kuc6p6M6i9Q", - "2HlnlRtf2dkbuO8jVfbSwfmohnRdp7kdermGXVP0/MG1+tmrXFLLqSCvkkG7TJI/a1fJi2ULuPzX19wg", - "yWlcwONEOPdKE/v57fE3PqSrO3LdfMiFtoT2QnV3Id9J0E56/gxK0kkYL/QYfGe4v/3n/4CQa43uQvup", - "XZWRXjpoPCCGuBm2u0UJltyKzASJ+7aW3WUJTS+oAxzPxxbmlajp+HBrqY3X/BoFeXMsvsVKadSI2B2l", - "25LzLz9b6chhWCyy7rLVMtHivBzlaDDr9C5xHdnUdZ1br0uklVSqGJwMjkgA86va6MJJAHBVsz3x2mXr", - "JpDObePj8NfedOIZZqvM8r2ziw/nh50vHZ/f/Nhd/MOWhWjYyK0u09FxxTU1qBnc/7059PuFQhyR47Th", - "m6WSRmYkBYcM9+C82hzh9N0ryGVWLVEYQsHmq1xm0e34rMqhq4rgy6UMm/IrLtdruJ7i70el3PLIOuqs", - "MVeGYckEm6NdVetTKua6+a2vgFhX2+q2q6zzPsnX8vrV0eX5j3aO1rihGt3Hnz/+vwAAAP//", + "7L3tbhtJljZ4KwecF7DkIinZVa7pkWHglWW7rC5/aCR7qnc7a5nBzCAZrWREVkSkKHbBwP4aYP8OFniv", + "YC+gr+H93xfRV7KIcyLyg8wkKUuyqxrzp7ssZmZ8nTjf5zm/9hI1z5Xk0pre0a+9nGk255Zr/NeZVn/h", + "iX3NzMz9M+Um0SK3QsneUe+V0MbCo+9hxq8hmTFtQE0gvnh9/Ghvpowd5czO9uMhXHAeyVhIy7Vk2UFO", + "HzVD99kzZmfxMJK9fk+4j7p3ev2eZHNe/UvzXwqhedo7srrg/Z5JZnzO3Iz4NZvnmXv0yfhf08fJv/FH", + "7NvJHw6/e9zru7fdkL2j3v/1ZzaYHA7+7edfH33/6X/0+j27zN1Lxmohp71Pnz65QUyupOG48BMlJ5lI", + "rPvvREnLJf4ny/NMJMxtwMFfjNuFX2uT+R+aT3pHvX85qLb0gH41By+1VpoGau7iOTeq0AkHlmnO0iXw", + 
"a2GsgT0+nA6Bz5nIwLJLLvd7n/q9V0qPRZpyef8TOy7sjEvrvsrTPowLCxlLLg3YGYdwIqBVxt3ETmXK", + "r7n+KNkVExkbuzO57xnimEJOwXB9JRIOUllIlJyIaeGoBadFREffuPcZfZQzJtOMpzglroHTk/3eO2Vf", + "qUKmX5Cg3G5McMxP/d5HyQo7U1r8lX+BObwVxriDURqEvGKZSOH47BQu+ZLmkmuVcGO+DJm8ZdlE6bkj", + "Vv5LwY2FsUqXbm5zP82SmieCZ6lxc/xJ6UuTs4SbFwLn+QV2rRoTJpzZQnMQBlI/PigJdiaMJy3HVm0k", + "45PTP41+en/+48XZ8cnLi9HLd8fP37x88cwxyhiYdIs2lmkLVgGX7kuO27rB/XzcdI/TtBz8nOfqnDYK", + "BYJWOddWEF9khVWjBR/PlLokiTBhRWZ7RxOWGe4WlGuOHCPw6uYS3/ApS5a0z0N4xxeQZMLtDpiZKjJ3", + "b2QKsR9hNFcpj0FIYzlLh5H8acYljJWdAdMccq2uROqY08oLCyHNU1i4h5XMlo5jRXKsVAa4fbYPMe2P", + "MDBnec5TtzuNj8AziHC1Uc+LJy8y3Gc4k45GxprJBAXjXMg3XE7trHf0aE249HtTYWfFeFTobF2KzqzN", + "zdHBAT0zTNT8QC0k1wea5wo+nr8Z9lq+aNUllyORrn/vPf4Hy0CkThwzMFY50v5B2NfFGM6OPwzhvKR3", + "pSHX4opZx9JzZRpLrYar70zj2HtzJguW9VbP+bVaoJzwbNAf7iXnOdEwLm6iuZkdRXIAsdvpGP7xf/+/", + "4Q3Np8I4NQQ/48cHIcM6lISlKnQkAcZ8xrIJ7PlbbIClcyGP3BAjfAsvDndL3x/iaDTpxni4SwZYGGpk", + "eKK59TfIFloaENYNp2TCn7rBIWfGcvz0x/M38A34V4S0KkzTzdHwbELjhqtMI0sFbtkDs5QJMAssy56C", + "5sJJNbgSjMgWIGVmNlZMp04GW0U0TQfFZTHvHf25OgX3QXcYgWf9vK7q1NWpP9dJsyTo6iU1dpqao4Dj", + "XPzIl+sMIdHc3fYRQ2bh+Kz7r17KLB9YMedt1ERku/bnjBk7Kszmj8ki89oFsZcNXxG5+8oNXijYTi+Q", + "btqyALy37lN61LHEXPOJuF6/tC+EyTO2HCC3oofc5XWkNSmyzAlOrxDGibgesUfjx8m36Xexo+c3Sk6B", + "S1VMZ46LaZ6oqRSGu8uSOVWy766ftuUzM2aRlBMmnYrgXpDG6iKxOKDSYiokyzpYgeZX6pLXl1fjiP7H", + "WxzgCnmKtLe6r/4Ays3s12mwml83EZ/Q4y3CLRejSyLyTfLbX4VP/Z47m/BG80A/zDjkGXM2z7XF47ti", + "WcGH8PDhOXITngK/ZonNlshQhg8fwoVjQXgyhieF5tkS2YSdcVIEpIIFW9IZWy34lXsYMma5bj2rla0M", + "q6tNu3uP3ghjz71B1LlR+N/C8rnZfcv8eExrRv9WlmU1YnI7NuW6a/amF15pm/tzpayxmuUXltnCdC9A", + "cp6a0Tg83nJ+uuCkQLgr4UjPgHVk6w6Cz3O7HLaoBCtzXh2lbconMyan/IwZs1A67dS7kkJrLp1JTQ/u", + "oHRIvmg8vqqhSzEv5vAHtNxZ4kTtEN4pKPKcaxg7u8EtsTbIH7ZR2NokVybRun68jEQfnasPHHdFxyjm", + "TA4mWnCZZkvI2JhnjtUtpGN97txKyTmEDzVWGkm8jO4op1xy7biB1wMGRqS8JvRXrynes40bv0oDburd", + "C/8Bxe8Hp83d4+q3zdnZAiqn0bZp8KdT6fRJ2lGvPEm1gJRrccWdZscyoM/BRKu514QemEj+afD+uLCz", + 
"wQX9GhwuMOMsdTS3hIRlmTPIfnj5AQ7crYOFsDNShEzhLC5U1S+57INReC8H5d9xUJgJZ00480AqyJSc", + "ch1JJ+GKzLpp/8hzi3rvmCWXC6ZTA45hMSvGIhN2SSOqLMX3vHGCMtNYkWVkoAjrXVaB+a0r6Gt87pKc", + "NpvkBCrntX3lMtHL3DrNk6Z1/PJi8MPJWxjzidI8kjnXRhgr5PQp6dWC9GXUIxrWLq6Au48mTGvBTSRt", + "Y2yST59H32F53XTuHYmdNF76C1s2c2XE6tHu4T4arjvHQp9aQz+hv7RpqlJYwbINfPS9JM0GwiO4/5Iv", + "kDhhXhjrOKycukOBCbpMMzUVchhJd9Joq4CZMWd94BGqwg7UZDBmMl07jj+0KWSKnCjBFsAv9vq9K8EX", + "XG+3AMLi19bqP929yzWXQcdWN/aq01KdaM4H7jCg9kCr2RtY4Z1w4Bd8gmtW8tTyeQudyHSUCcnbtJN+", + "byIy3kWx/d6lkF1GjpwWbNpuQHSP1mlz5AxFbufvRkwlupK2Xyx/lXHq9fX5efWrDaktY/PGdhLG5+6e", + "mAvbcEA8OsQL4nSZ3tFhv2XrzHI+VtlNqca/tW15XQqm5k7e7K4gr9DiJkV502pXFhFmsUlnfiH0S2n1", + "suOMElWQv3PzJu/Guj051T7cNqPSV7/KS6zn25sH8c+1ffmVyPgPWhX5OW7M+hhjbuzIJIquSykfJplC", + "29J/UBbz8S5MYONdnzObzPjuFOLm/ta9s04cKxtQv7m1BVVDdm0NfX7d9pgV8nJEb7QspOYTX/ttMwuV", + "3DizfSZucFHe4TuvhW27Izc4OfSJb5gb3f8uvrrKLKqPNbhk2Jows359L7tO4YwtM8Va3BO1jV6JOn14", + "NfgDOC1uCM+FZHoJjgZKn7pUFsYcTDGeC+uU4DbR6r8+mrXGei9eHw8eP6FQbyqmTq1UE4j9S3HrFzeS", + "f+elMeKv/IZsztN6tduNtfhPdm03sYJ2DWD3673izuOWJ86oDI/0QWmQzvwUEyhk6n8f3tgf1pDKm2Sw", + "W9oFZzqZdcrgdWH6eKsw/aXgusXddVGMacJAPCYFNmVCGgtxOeN4eEPTgsbatri7ksArtPAFJTD5AI6T", + "UtSuxOFDLIHREyGWAQmTYDgnixGtDbWQ7gzCA8JEsnRjoOnRh4nKMrXgqbO2nTm4BKWn7tPcGDHOOAYe", + "0PQ+UHpqvJVSuhIeGGBpOqDwTaYW6HFAm5MczgwueMYTW1qolCyQKyOs0kvIRXLJtbPd0UzNuWZWaVxK", + "qp15jcETBibniZiIJJJuej4A6FiO5tkSI8vk42CTicgERmHNgE2nmk/Rm+OMH7KeVjyXV8wyHSJx61fa", + "WWbrR+APAH+FPdxqJxjcpXbTM1kx3W8P03mh2fxc1HOfiHrohgiHhU7upxD1lJ76n5SeMikMra4Z8HEf", + "6PXds9sNPFqUf6qbAM95rtqor3Z6V4JohI6I4opnxx+Ga9vsOcqoipCuuz3cdx8Y8I8CPfq06bmCXPPB", + "RGQZuTXomUgKmRc2uIaEaTpqkcYMsBBjVHP8KRPGdoQ0VqzUtd/RX97uhaNQbXCCrL44s/Osk9Z83LUt", + "kLLK9Mvx+6s7W32mNlr3GX8IXqjfeBCv25yt+Snr5/CcGzvgk4nS1vsB8bzh7PxRCO0uZsyiA8xRAzn2", + "wPsOzdNIYhzFsRfODDeQq7xwfyICq7smvTvT+ye5THMlpI1k8FTWfGroFbuZp7At/uWP3q+9Ee/actSb", + "wzg4vd0lZJ2EbhHJ8aNukoivOcvsRtnOTJtH6YJbcn0iQ4gNRoEwyyMu5Aw/umzXVenRuhtNXfb6vfKt", + 
"7VzWf6FtOZin9pxPxQZ/SJFl7fk0jQQh5HPuysBC5NxQrqCjSp9TCW4W3AtfJ15LfcA4+d7wqtW5zcYp", + "d55CIbvi3CQbUP/2cjdNBXn7zpokuJEX996ynDi40yLBaZHwj//8LwjWiJqAjzlly4GXR173H8LLeW6X", + "kSxlQ9iiGTMgkRGMOZeA6RY8hT2lIXbHcESpQQtm0KfL0/2G2Ah7tKro0WasLr2THE6YTHjWvbkJ/p61", + "B9pX427ls53DOe3WbPTG3UxTDkYqOjKuT+m1J4frXKEikpuo/uVu0sy2LatzE521bUZJFfDfbN3gaCOn", + "DOc3eN4nM/J09DkWwsqY/dVJd42yYU+kMLMNLvmMO+HhLlPzzLfGsHY8S8/ZRzTvVJhEXXG9fT/baWDr", + "Ou/28Mtt3v7CusxwlwV3d2dxsT7sGgF0bsCZVlPNjXl51eoVei+5s/CkDfHgdy/+ePH+HRirOZsDJ1+Q", + "U23is/cXH+AAOeEBzidGCUrWW1CVuEwNxMdIqEdQT3+9Hsj0L0bJmEzRGEeNKckzko4AtJgLySwnNf6K", + "acGkfQrKzrj2SbeU0em1rhSYQV3siknbZseNmU1mo+ArWj8b2sNNv9UJY/0ZPh/zdET3olRhhbTff9dr", + "IwUejiBQAno90C1YXuERjlv9E4dIq3+nCn2G9Bu6wPu9GWfajjm6MGnJ/il64OeWuzdhTTWslp2Fn8ZT", + "7g5pNdnfDVje+qNzbsyN3X8blAprdrVOViOceDpb79GpnLQYwOFXyEnkEY2zxIorPvBaVaDoEFv3nhUk", + "7Kd0i2bCKQYiYdlgwrJszJLL8i1UWcOr8coOx/1I+r/hXsd9TE+Jm1Qct12Sm3JAnrHcnanhiZLpym6r", + "whlsHTGQm7D5z+C0teXvEIqaMdMIiGuecHHlCKO/kUNvIL5P22inWwzl/oltWtU6KTZETJMoY5FmvMwi", + "DlSISu0BlPRVVMU4QyzAonKZkPdMLxElu9/LvYkP4lKljA/iCRP0H7qQsnzfGfoDXUigOZKaTmOMdCFN", + "3HRYuQljkgHNoXEU/ZoG6xiYoP/ww93K9PqjGrekL1rL5zmxk820VM7xVt6Mz3GvpDwtch6SS7cOsckb", + "w0M8detX5ux6tPvm5FWgqt28a0sfO2cLQDXEv020OGM5h5TnqGMoCbEbLR7COZcp18DMQBgQXiEpvYNP", + "IVXygQVmTDHnQDnMheat9hrVu6RFdsOD8EL8VgSwrim6ldI1CFTevBD+Evy8wae8Q+I0PtKvdM3ybFeO", + "emVvtnqY/qjGmz1Lf1Hj3e1Jd0dv4U/CsTZ5k94IefnSO0q25aF5J0FHFNRJ/kffD2b8GmKnweBjMZbZ", + "VI6E4Gfwudtg2JxTdl0kC8NTp44fsFwcXD0qS2EPfnWf+9RI/tPcqOyKY/KfVV4XZ1JJp0RAXKbBUZWX", + "kIZra4BBJpyWCYuQnDXSPFeR1GpBnw/zw9S0UHQ6xugNEcqzB34pD/DLM3bFI8mgHLCsTFB6DlGvtXDp", + "f3pXfc+nsLHr0l78vhkF/H4bd28cSev5qk2etRtk++2cXN2RQbcx0dnPsuvGYEBny0X5aFrIH19sG/At", + "35DAX9jZaM7tTLWkNH7gId5SxWEWM462mlVgCj1hCYeol6mpKmzUgz2vTOyD0pGciRQrE/Z8zr6PLlbF", + "DA8MSGVn6JpUkKkpqMKCmuw3VQb/Uce3fOlCGzu83cb1G1vRuo0q5VlHhlRb/d3rVxR4On3hU6LTKgaV", + "sMTtqtA8wXgaBhGproeuztwN1h5KDBbTSnhubFRWWO8btRRWHU6nxYRcpkpCKsxlu89b/JWPxkvL263c", + 
"G7hqUNj4rITaVzu30wmPNpdnMuOjVOh2xnty+qfRDz98fDU6OT55/XL04vSckpoXzIBJmJQ89U5fjP1Q", + "eFAqOcByDSi/Ds+czlztkaEi6dYtwvPYXZLVaGVbYMd/uV9bddt2VelON03L2px69ZvLlKoWEybXth1e", + "fLdthlZzNmq/JOckSVPAp/h8MFWQqCzjiXugdh8pHi9MkJFDePfxzRuKJhG2wDwvdsvb6Ycp3eCWdXyy", + "YY9Iy4TkumOlZ44LCIlFLMhwwvOwpyaWS+C/FCxzfKIC6GjPBvsMk6WRw9/BpvDCLY3lc+JYPo0jJII8", + "MDBnyUxIPmxPx0e9ZOSuNtYit9RTv0S3GqYiuQdApFxaMRHOjEBTN2QLVMeMLMSZSpHc03zfj+IPX0nQ", + "lOLCLKYjuD2AVIuJBatZcumG8qItkpXEtG4HDX2DGYh6H+WlVAsZ9UAzkqUzJt1P+C0SfTsUq1LO1w09", + "92gHht27jSVTar27wsGsoMGQuUeVdB/P39ROZ3gjwJZ+z3DrFPqtPNmzjIvwuHv1l0xYvo1ZXPz7G+FO", + "mlk2ZsZL2JCx40kJSawilPL0PbmQecqvc2U4piGwqdOzJ2onBuKneacMxGn4O28ZPtse6CiDUzWHjqev", + "jd7pIk9vyFdacj1DXmfFcNY4Y/2m1GglbEDNJm4UD9emt35pNgikzdZwsPJ21iOCmLujJMRy/E1W8uo9", + "WTekrpOsSPHauEt6Qw40Z9cjCorcPL93beTVz21aTyD4Fc3dH2sZ+d7s6qKAZhVU2uXpG32alChzw42p", + "D9RfWdPKpFcH2rRlxXzO2sydTUV6ny2afjsSRfOES1s/ip0u6wU+36H115lnS5g8HwXlU9wgAaMs2eni", + "D789Su1i2yUbrrPrJllvJOP1TVw7xw2UXhYwdtj3VfrquipqRuRo6wLCyFUrPE/TLWeGwnvASychlD5C", + "fKBMMDQ75q1VnuUkU7ImmVeF9AYXczXJjrhC9cBuZl/jg2uvV9tVQtDUKKPa6V1Ost21UA54Yxm8QiLb", + "zPraQG2zPecTrrlMeHu9SdOkr+KX/qXWs+qsCjrOFmzpwSp8xI+X1LS10rbuPmj/brAk48qmj2FP80lA", + "yvB5t5Tv2UcDWjM55aZucu5cUbuxmuhOfRT1GpvtlV1Nz0VtnC0FOiUpfGaN7Hr1zpMvXwpbW8Rd1eE0", + "r8gXLMM5J+Stl/KXghc83bQiKpTYtIoGml47o+Z+oF6/56Mwo7uIeJ8T3N9xLYtn7UCQqNvB49gvBYfT", + "F09hUiAA4RXXRihpYM6WwZLNuR4EVMGQ3oBVbd57JtqcqesnEmbRuopCOnPwBME824I63snT5QWqeeGV", + "BtblfsJClsoV3F7GmLE5GzUTzkrSe9R2zeiNxF7f6Hk5mubFKGNLj73bXNDgETwDlmVAD8DeW25ZdnDy", + "8cXxfh8O4RmcnH0kZLjepjHszJFaywDuExm3gA8OfAwS8emoWnHY28ZdnE1WHUyiJOVmJ8vtO6B5ouZz", + "LlMi2I38oU4Z57X33CVD4NNN+ebh8qVj5OdXvebYP2+rDuqdcT3AvC6PkxaAd9RKwCxhErTPXoCo9+J5", + "1IODSEa9l/LK/SdEvdrkox7kIstAUtUIcJbMAmLYj3xpqKySXIy1pEkMIJkjiFfuQ9yHuEmEcR+Gw468", + "iKZPpq3iYMZB07aPgisFtFqUflNYaGEtl1WZa4W9yeXVQW2LMc1TSOCTiSeqz3NEhkmPl22TViCMKTiB", + "seAMzz5+6EPCcsfUahE578erVUfcrB53lRGtXf7W271+HTfdnhYWVJL6Vt553rxZW9noTuxvF5a3K5vb", + 
"iVXdkNls8yd9nUPbelYfkabbNO4sJEkrj5kzhAsuU2DeZrQKDLfOYsxYQqEfdcW1FimmaUQS3dH4jb6H", + "2I16US+GPV+2TZ/fd/c3PoxhTxZzrkVS/t2qSJ68eXl83vz2HjIstxuYdWgQVIuQhq/gAGr3fn8Yyfc+", + "5dyvxaPBcqFDFU8dhGorpW6PlbRQ7nYP+Tol7/rOKmXv/l6N0re/tJHyt73elsh6wedMWpFsgQvwXti2", + "4suMJZcYc3dmZqpVDl7hhsVMhdCJRx8BJiuIUw0mIAcMbwSn9rlxsFb8oNWas2tERqQIOKgJvDp98xKm", + "WhW5gT2MA6Mzat9DcRZa7qAcCVkBy7RDMSbKCMnBiLnImBZ2OQR3YzDm5PWxUCy9dzj8ji72iUr5OZOX", + "GPYc/Psf9j1ncLeYX+eZSITNEDTUYwJT6lamlAcN3Z4AUFYKrUpZYXl56niZPezc/R59CUFxN1gSq9Tf", + "ZfXhF0aoA83bdoNl2SDJVHIJ+CTisspk2QetClR8rIJHkPJEzFkGyKeb2k9ngv3nIFnUUY7uyYbur2xJ", + "++ZSCtedVJzz61xobu6iSl2YkRc5HdCvIc4bsuQTpvWSamkRvRx55LAVFR7DhoZzeaOJVm/dBLYaX9gJ", + "trotY6sR+qzt7soaGtu14ZQ3B0H9Tt4gruJp5xapweWYm/w+FyLlCdMXpYNmNVI4mmRiOrObUk3QkwMp", + "z+0MGCG0zNXcaTRqAobN88yzuc1Copmu356N1l5O1ebMOQyJS5DMRIaJyZiHKQywzFk9exQCgIOyx8T+", + "9jmip6or6OF9Ot1bhpdrzO2CcwlUaea2iAoQDZ3EQfAt+e4VOVtI8MnxHSXs5F9res3LvHr8mE+257V/", + "lIUmG9DynaHp7d6yJuoGPJNmFTatXyOmVkrcglpNxuooRHNGIcu+ucUnZUW834JGVQ4mM20/ZZaLkXf9", + "NdssXT1q415O7eeyhQaf0w/1LCyPYz9VcXvq2XbfXjGdumW9cqZOSPIKn2ULp5Y4SjpYVY1Gh4Mffvj4", + "qmNYJ6+NrS96pXMJ/h4AQsoGP/552FsIO1MF3f2Yfjy4ir26048kTe9w+GT4KN4fwrsiy8AZfxkpZBjq", + "NgVmSk+KDHKVZYHoudkxOww3Y5Qp5q39NV+OzzvjTS+fv3foMdUWIZxTwkEWEp4cHsLcTeAVy0wNEz28", + "JAyEO+WUuhkzkGhmZjztaptST4dZ4Q6OSZetP6paCrMLW8Jzae+x4vGb/DOQsynlCjgpHzcPPvZHg+ss", + "KGVvt2wt3Ms6/ewGMtKdJTVqtNLahE/vBx0kM55clt2w3FFgajOwiiIfxpEM+6DKbipuaAQ5l3wRMhO9", + "10+GBlvYZeAVuh2FAeVteXetsVUBTqY5EexZYGHhuGEk9wy3gO2K/uPl+cXp+3ejk9cvT34sOxYh4klc", + "t1TcHRByut9FSH60EY62TZ34D3r4xD3rZX1nTX5gZ2un2mSMKxeu3+JYqqVetXLvVjFQSyi5M+DcTdHT", + "3cKiG+CBN8U5aTWbUBb/G2f5M3GWaWu3uG3cMLf2ktwAGPKOrPLG0u4qtrxGi18wvEzO3W31h5+d1fyp", + "c8iNWPhprbldJ9ZUiayXMFl1VQBWaxA5hHM+wcwl7272YRfqOJd6mHuUEU4MWIWf7mLsAdK+OaN3fIHt", + "Jkvj3M2pMTC0j+tBoPzAMQHjr9Sn74Ka37HBbTD4K1qwsrM6zEnw4qPnDCWpE7LMB/Uyzq48UkrA9wod", + "SwpJvQTSIZwx6l3IpE/xCZ58p9PUho8hyTjTJpLCDiF2zCem2s8xr5VH4Q6FhnptHvhtIIm7QvOvb6Iv", + 
"4Lu9g6ibmCt1qezBNmI29u0AhxB6PPpOP9QZK5J1Ckdg1LLJw/vz0NCyi4Zr43x+ucVNWlW0y0F30iMi", + "mkYXi+5Lt/ONuINsfkFaEy0JJ9Ax4015+i3me5O6tqEgdiLYIDXsLF3cUD8JOyvrJjZmCtK3N0qMxvec", + "/Z9l7ye9oz/vUuza73AeBJfYqANt+AQhhtWEWAP6BNPgBTVVgRyy3p28CJd8udtgvnFbuFcGS2IRn+oG", + "I6IHDQFvW9MZ3io04RMCtPKuXUdY7j8cxRrL5jnsnb86+fbbb//NWR7vPGp3KQQr/MBMTaeYq7sSSfn8", + "dnfth7S2kevU8vOnfq/FwmnJMOXJZUeuxxsnONHvUNuJjx9O+nD+6gRoP8g49viSlePCvfX5uRxeWG/2", + "WORcC5WKJJisOFFhgona7hUsfaktK8XfwINU9cMRz2sUgkOQl9AvXMngo/mMVBHZzaV+oqajm5L2ynzy", + "HdwKG1IL+z1BDbycgL9trqGf9qmcqI1YA2pU+XS2+TKCT6p0hWVLSkOruYVqLWEjGZDm3R89bTgWsldv", + "VYztkCFcdg7fIOT2jKWRXGkWuz8E1HwJf7WJUo5aU5gGUM/VrFMBaDaSbXFjvj0+8S1jh/DBzQuYU0ek", + "EZiT5HRDrSyi5ilJ6Yya0wKGm5r0trrAXjny/Xj+xmnA1Le21qb2gSk77FL79FBI7+xSbEoaFFA8ppPT", + "P43OPj5/c3oywoobA4V0yrWbca4RxQcb34ItpOSZz5TbpUdlfQlrO9hfI6VWmgza+B3plJs6WL1qbVzl", + "987r47hjbGy4tMNuvW2loceb0xeDTFw62sOE3GZRU6fuvdqMXrh3y5IEzBZsff9u9LgSobyaRLfatvHo", + "Plth+4w6j4peblDbsUlta6Z+79BBvaU/OssEI21rpSl5vSs5+EqMXHODCo2KpMrSsqM6pmS4SWN7QLne", + "a913RYxkZ7/1LW3P7wQ2YFNz9JMS7+jz2qR3h/+r4q1uYSQoy1UQqtJ0Zhfc/S9gBQ2B4bAQoohkgFIq", + "1VbmU3WbJV/IJVmaYpMK3+M76h2nKQSMKvDOnaiHLTaG8IYQnRCGYMausLNlkinpJBriq172sTkwHSLh", + "Q0pFhTdVR3FchlQDlcOek2SRHHOwWkynXDdBasKe+3YiWi06XfA3BJi7MxwCSlCYsRax+vL4BVy8PgZm", + "yRVUC6rhpu0UzQmgU+0YAadNqK8HpgaO5Y53B1Csj8a3FijbBpRY7h3gYAcPq1rASO7hYM+8/t1WI7vf", + "0e3ifkoGkdhaa0q8ToYPQOmpw6P4ptQ6WJaZeuQNrwVeF2EgL8aZSHaMgd5cllWKhmN/bUUxJTiJwKov", + "XMBMLUIyUK5gypHh5hYmmpvZEGLHpeNIMnNp6notpgx6DbKmsfoZPIWY1Mo4knPOpGkqoKi40d8+nr/5", + "hvSiuiKHnte54dkVN08rX1cM5lLkhvRos5RJFbZDLZM4RCQ9Q6qVMYwLaxVl8jf9pTRNr5Nty6HYUk/a", + "pkysVIzWhEStRrRxS+s4h3UZu3K8de7/meqJE+ydTeQ/qzrsRrZCrYW8kknTrME/XAkWSf/JAcKNPHwI", + "juEwop2od4GfcgKG4rlyzRCI5MfTp5DyTFxxxCVmGguUBW6RE29uToOL18fYiCH0VRmr1BHKy1WPdFYT", + "YXuVrOriURtNmTNkB9lyoDlLZhgpxvyhsra1fsfKpXlE81SFjhiB9xhui7yy5uBUUp4zPbSCoihSMHw6", + "rxpMfPYi10JIuer1NxtAW4lys96M9eQ314sDjX5uEh4Nu5PGTPG+k1khLze2P7wh2NktCotX9YC2hkRt", + 
"Od3nbLGez10mvjnKoixhp6qhFMGUcseNKXhG1jblmePVM0rTvcMm/TnXgwBeNVbKWOz1xnwj9D2nlfIr", + "oQonna4EujnnRWaFb6BOLY0iGVaHi+jDYiaSGRYBOt0S5VqqCLjfI5pToDOSmJI+U9riMhx7QGs11yot", + "EjtAXEOWaCWXc7O/a575nRZxr4iGXWq6yx6dntB2INVXqBC1G3pbSaezixFubMIsn1agdBwdkYUmv1B8", + "hXgXWGFP6jfCxDv9RhU27gO3ieNkbh0YacyWmIqDOT1s0fR9ouGIFo1kGVD7XOOxqzLOLp8CpfPUtJhM", + "TYmC4rqcjqu5OkUOB9mF962cld+XHbb/jBTXz9z/LnB1n0CJth/1UxIS4lXgDt9cagjHcolYpqEyE5On", + "Yq80x1CpcXihZsy4fXU0psW4sFgY4O48apLEupua1hbtvFMn37zJm7y6K5vchbk4nj9+0lnTwpkMOMFW", + "5YN3SHbP3z5+AviGoaYkNQy+PSOmMpKTTOR5MEcJgu2BATfU3j4YBdiLUFxxeOa4qeXasZcLgqRN28tw", + "jVPVo16lrHsE2zSSSkImLNeIZOHsE6fIZCyPenBlhhD1cnfjjC8RqbHyAGy8naulXBp+s21aFRyi2q6S", + "Zw/hg5pSdAQtyLg6jZjc1nah8GuYmZSZUAPIMWJlFcQN7h/vuh6sd+5iWrNizqTTy1JUy6g2eg0Gt06K", + "D0wkyUwntcrn8Ua9mtk8Z0JGPez8FfVqf+nSHmUxL5tKr1xtEqh+JjXqw70xhb7Cmfp406D07ZB0TliO", + "AnvOCOAYt9E9O83UmGVBXK91mqxj5m9jSo1DaTEElmMtkKxTkbJk6ejiz4f9Rz+XcON//9tgnHGJzVLd", + "GlDPiORcyMGcXYN0B5yJv/KUbqNbD5JooBPY+/vfnh0On+yTDe7nM/DNeBIOU8cRNXMrddrIBzQoPqi8", + "TNaNepHMmcSqeG1NGcGtFXduI7PNvItIcHWvaufer/Om5hXcgeFtayl1cxW6rte2aNLEwkelfr4KSJmr", + "spqupveRCgAGvw1ln18mSfBGMi30asWSv12J0rrIbR0E1wNF7+OBRlL7Dt8NG4hyvUWWQdVp92mJB47j", + "GA9UX7osLqXvD+x1R8zCwmYOIbl9pT/mDTa0pn21AcuR3Ny8rXjvxzxThEXeXO64sLDgmjuBjdfIMbVI", + "Ln2oy7tgNSH/V5Ldd3HAjoJlQprvtRBJOmzaZ6HL0sgFbmuec6bJiLcz7vRtlswi6TtZPgtKRbA9xaTU", + "y8lThh7oz9/Quj7VtqMbEver64+8wTcWWxEx4AW18aYGHo3TPVc2vrKPhAkReeoFBlb1cb/d6XJpI6km", + "/mNCpuJKpEXFiN1EYCamM0fMxKOz2+xONyqnsSzjo4k1O1Cbo6iyWUQtkwLZ8VzQ3d0L/Z0m1sT7CLSB", + "MYMjpIsHmlcEiWEEZHGR9Mxg7OtQTM604TBj2SRc5hkJEOGBwbzRF0nHClhuvNuDZVOlhZ3NMVxcaD4g", + "GTFhcqAKG/R8NyR3iiw3Q/hAoQRsvEw3QkkCMrEKcQEmjsTd1199uIgQozrQMRI8UXJFBEFdpjad/ptO", + "aQtIQxxjWnRYn3+qF+7oXn246CL6zhZM6jL23lOqd6bO6kOIcWPpt2o1ZCenMHGa3biwkQwtEDA90ysd", + "cVkmHvsabYhzpq1g2cjbfzE5CbCqj6g8cH536gxNNlOz4FEY8BRbLJBEIA4djnLPcA5xXQTFK30Y0IeK", + "i0JAzcZsbtDosFYK4eXoDrK4cTo3NfE2aRE/tzV0NTwpnHZ/4UjFGznucPRx0YpqTplqvk0sMAOxe1Bp", + 
"8VfMKDuC5/g2RMXh4bfJyemfRsdnp6MfX/4f+Ace+8Y7cwyq4aOVKjSzNu99+oSo52196V5/+HCGiS7B", + "5o4Tce0ruuLKZElUygd0HVPG50oOI0k9/RdCUw8XhgJ5vLR84HFUWaKVMSslboZ6s8e1Cpg4kqUDOQ5h", + "q9B12WIb9hqvzpa+SVizqCb0mmTIJBZMp2ZA2gGz2G2eckEhYzI1OPt/+Rc4rjJzhZK4pIWCnGmWZTzD", + "1HRMXgmIJKEPDyVS2CXVDR65Fwfw8OFzrRaY+XtQ2Y4PHx6Fzph+Ze6rB5hcGJPRhZmr8E0kocoMRuAh", + "49Sw19bm77G1n1KXgg4opPb5Vpn+F0xDdsIMMEQzZ25hWbakfLOxccqbtLiCgYed8gqdGcJFSM7UKsvc", + "JyZKu12ER99Bypam1gKfoSSl2nVa+MmbUziAixc/4mo3Ua9PQfSU687Myy13AxbMuJE96FLVUjRsXC4G", + "l3xpYo9mhTnwzr4bYE/zFK0QZ6qPuftMyAStJHpG5YiOXzEEjsJqQ7R+fU4DEgZFBqowKXrhkRYCH9g/", + "gviHlx/ggPp6x33/z1QlBl1o+C+Vc8lyMVyyeVY+UieCsVLWWM3ygad292oXrbgjogx/rH47/vjh9ejF", + "6QVVvVGzaQrPocFFvjbfzW9ZBTX2Un7FM5VTVbn0oTsGC6YxSi+Mz0fdx634aTWfzjJniyHZliUDlO1O", + "d17YsEkmkjjR5+/ff7j4cH58Njp+8fb03ejl2+PTNzF8A62/nh1fXPz0/vxFTGBETlBX+aFU/rE3UToh", + "JAx/p8tbo6R/ErdsfwjHkGEqjJ+L55sxmg9KAqM4a4XO70yKeU5OcqcsgRFy6rT1mMurQXlecUgvrmcX", + "Mz/BwFy8ouc0Ls2xygKJy/+13nkLTVoT2myAybDrN32SighgzCF0esUOdh/P3wRfh0HZL7NlHz1b3tL2", + "V6IiYssuOTCIKQ0gho/nb5yBrdmcUxSZGLjb7YcP2/MB4hXQ7Pjhw2EkT6idiTt68iEFJ3CZejB8zczs", + "zC017M0Fdg9GgvM+SPdDk/ZXExcajYVnSqpC03R9N+EYZpylXB85BRYtkC19hsEsBJlN3rBEex3zcSIp", + "+SIT0mmsWMfG09Dx2O3DeuPkGEgBMH1/OSIZl213Y99Bme7io0PwMABDeF9LtCLnEaesG5p4JGlJ1Fa+", + "vghcwD5MOanoROWeWgfYGbHmBsYtf+k0OOP+cRy86uUzmBdfibexSpfUTfEI4l+jHjnzo94RRD1i497n", + "T2w86n1yB9vgiIGUCLzv2i1GKFm6lwpJzy3L7rUV1GK2jGTZxvbXyDvyafThcOhHcyqOsAgmUGks7lr2", + "yopewhn41O95Rtw76n07PBx+26uBEZWM1t3cg6pR07Qtfv4Tyy4N8a1mC6nYg0w4Fdqg0uzsmSXkXNdx", + "QeGjcQwNuUXNvfzAQFnyOyAkglwkl47dKmIpIY6CaVvo6F/mHCp0UvSOzZhcaV0VmDc1DBM1FOd6i5km", + "5BqyRD6w/NoS6pKQeeGbfyM7CoEF8s0IJU/T3lHvjTD2bWhGVZKV28LHh4crcddVOsaKRTSrduqMhdjc", + "qNKuZvklM556KIcMH+r3vjt81PXRcpYHH7FEy6ksBAD63eG32196pfRYpCmXpPOHbge4E+DIw8+EmoMl", + "NDnvH4M9EmXuduw7SmZTUxVL/ew+2CRMj5s5SEok21YCPfcU6PkZdaDx7/qcbNh78RxjUv/4z/9CfD33", + "/3WEPdIfargMSUAC8V9A6EufLt+HPCsMIgMgkmQMc5aTwz5Dpo6WO2r3D0zAOt2EckqJJYRzCiXMaSQ3", + 
"45wiX605hpu0+QO3TSDge6TQ5kAtVPqSFM8rvnIuX4dYzzlLPYjq+pS2UWm/lxetRIiwZKYT8HUIrzwM", + "ZUByDKaFtyoiid4Mj+pYwUQ+Q17VjQ7prhfSxA/cOv31heIG3r3/AAEKqI44EkRRRYbB5gLDnV5keSS9", + "QoJ3cA1XaGLRT1XLDD37+KGNAM+KFgLElT5XhIJ097TnUUA/Nd0Xzk749DXJn6aVfmmi7/e+e/x4l2E8", + "uhXmhjavygVbvyCBNM2NGfoKMaEbSrUVfb/QjtFSau8KUNjet4cGfLLGfh8s15RlQ097tu1MwRpsV78O", + "h1VLG848LlFjfcNIBony+PAxiPmcp4JZni2fUlUbWbSNBfk+7laBGqNSRgZcgIAhaVNC/eA//U9WMyxa", + "UnIIp3JACFc1+2Ac4B9XkdHChcTgx4SJjJb1UuuLIuf6Shil3bIjGQrWNR+kWlxxCV4XKwNNe3EirkvX", + "Mym7IX5LPov9thvu4fI9CNy6gHl8dzdsBZi/5Y6dBwZVPvPFbtkTeuNOVooGS6sIDffAlIhzjiicdR5c", + "6bj+qlRhTepV4qAVwupzb3Pl5ffq2Zoi0oQJvEdO3ByoZRfpFzCS5WamvpKy7GdZggp67nHT/S8r7Fu3", + "3WnkH32d/L3t9xpCQJvwM1x/bcvEKVDksNuu3bUKpA8z7h1xhlv0UMdCCozmBD8cmcNmxjT1nVeFHajJ", + "YOwMVIoaSL6gMnhhYJIxLICP2yAbvGfTfQ/Z+5hj2mjT/yfsiuevjUVTqv1Hatd9H+pXNUBAcNlJ+Xp0", + "pyTYahj7IoMvqGwd/tv2N5ySmAmK191aOzuVV8Jyx+8DZX0WDzn4VaSfiOYz3gaif8JMwjCtX5VYDg9M", + "BW3hCDVATwRQIXyYPtiFadRGsC/wjZJgG0TzXYuiiI9/2VP+bvsb75R9pQqZrpwXzRbYTmeF8WLyUxsE", + "LhFuwT6NixK5qdynedf6tXuzGtP9GX2ASWvn3tqZzZXF3Jw66nkLIpVPPaGeQG1nWWFo3RPzWQfp+sKW", + "Xxfz8Qbfb5cs74D5nJAYQlSxilhSlGw34UM+vrlRkTnOxY/umbU7sZJXwrIs9kXCcyERfaFfeqjJYXbJ", + "l2uUG9LQgWeGY9gBQRn2y1fJn5xlyPaQy1HOjBuUgPjKK4mx2V79FpagUFnWlvLx8z3SJ+3bNg3tR778", + "2grafFlhF7n9dwob/kNM6CwbVBRIpltfq/uEHz7MMyak5df24UOIJ0WWjS75MgZ+zRC4FVOoPE3UAkgf", + "Gn4yM1MLU4b7GCQqX4YST4Y4sT5NvhYDisivsFQF6XGG81o6b9QLAeghXFSZCtjOxL9O9EfxPkJDjru1", + "PDrse9XzaIivpOnR4KVe107HyW3VvlvrZMYUQSXzJN1Oui08cKsi5kgSGYyPHlypSx4cxgvp9a9j6QV0", + "7Rkml5G85EunnV2pS5/0kHM9Z25xpV/Y1+0sTbgPlOAwZ/qSp5GkULfPMUEUQB/WYEUqLFjNBAJLIayG", + "vuJpn/L1aok4PjEGM0t8Ym/NI0e1e5U767vDR+2eJzeDkuDvQ1HarnvSJH4vuud5IITdqbItW2drFC7+", + "NepJzlMzKl+NekcIWfoprqKzjfQZH6Nd47kUHkNzm1/nGZPMKr0Ek2jOZSM6C3tRj5lL3+Uv+DVRm80z", + "RRlQ0JZ68xADKlcMR0kpeZ9pG/X2sViUNXLlylSojoDb87Di+/d0rQy1SbyXj3pHUyNds3f055/rZFKH", + "jasOAg+UfA0DXUgojxb2CGCiIZ4LO2uhJHJbDOpAmu2y+z+4FhPMg/De/MrF0gfCGUBDJZZ8Uf8pYLW2", + 
"ulTiEANwtyDogpQFF8AMMctbmEiSdWarHMNaB5uQUlmuoywPEQRiEUlsELI/hDIQZ1WRzCr9hnitMhxz", + "+doS9lplPA57VoF63ouUbwxyIznfwiDDd/yhfUWh7G2VWoQo+DBqKKlb6Bd9bN1U+76EN+yTjzC+4HZw", + "ggR0BLX01WcUXxEphVaelrmuTyN5web8Qlj+7AKbCjyFM2Znzw5iJ7YrhRbpM2fLTLHUpyJ0UT1ZY5hO", + "38SkrmXCKJ1w/MQqZXs+6+ssmAwXhmHZU2tCDO7R/dAmfvsr2fl+7G4e+ybAmfb6PcpewzlUJNBS9Rkw", + "VInH7AUy6MMKFez3Nqkqn770peoQHC+vvV/aJ3ZX+akThQkDK8vdWW5kaqqKTbFi1JXrQEEDI9Kq/ZhT", + "aR3rF9JYXSSWnhxT1jrmlVHeRSPFHAuDOm/wU3jLrgfHU/7sMO64Bm7Ku/DIQAVlH+bPOMsGq3sp0waf", + "83Pevs+EDLE1wwqZD7OWUru8Q7jZCriJ8F1iwnRwqLXEKCp7XARFREUSy2YnhcY/SHYlpqSOjflMoOnd", + "zrk6tLS3/F6z9fgmPnFSkz53cdrhe3W0dUJi337g9a5yG4+dlKUWUOtQGEeusT52wDF2gHoipQRHMq73", + "w8Pm2rVufV4ri+sN+UqKCG0TImlyZaGQEzYXmWCawl2GykDiqsGel3bOWDX1DoSUWbvegrAro3N5UfW+", + "u79QdUvjv7aAtd/pW/jnGgRz3LippjzBOl3uTDkt/oq2aE65oV/NVL8LLns789uxZSx5nsB8WW0/Indg", + "fWGouoCUX4mEbxaMBOk2QNyN7fd3zi1LmWUoiekyI/pN6oE7nHroNr8PWD9h+hWIuRlG8ix4T0PJhjNb", + "3r38j5fntfpJj1MQKi+eVnnw7luRLF2wWLAVMEfEehVCoxqisc6u+/oDPvSB9uIeb2xtnG23Fh+6nU/9", + "yS7eobLi04QuHG3OdX/YHsvt7PiDgb2SJlZDNE3S6vawUy6WAQbV0RI5lZ50sosR1g5Ztc834jLRy9xi", + "VxVyzBy/vBj8cPIWla6yjoaA2yi4nHNthLHGUxTWeol8xrUblj5e0lCoomissHRw1umwwp0VshnimWG9", + "JFy46xCQ+xz/WMdLj6TTdISBlE+4pjsFDDMLdWia9xTOzh/RKXi5VHi0MbpvkbziesysmGO4Qy67ffw1", + "GrxXR39tnK/k7a+vtPOGEWXfgqF/gYyNu7rKFxatcAwsVFcZ9vx14umA2YHmxm66zV0yZGvk4SyECrBL", + "+VyVXZbC6DDO1LgRyKowv4NLFkUe+my1uy1cBugTzzjSKpt1iHcmVRz7IEZyjO3BcXbuIpZ+ZP+mqb6Y", + "NtEDRh4oI2NUHScknJ0/poGEtIi6x2lSr37szkhZvXj3n5hyw3j+3RBZmaGyJjM2i4g71+o2EuoBS7CZ", + "i9nJYnW3BINmDzDiZbB7mP8Cuc+8N1hPmRTGV8WHNxEFjXMSMuuhCSRfjw3rG9FanoOa0BdYmg6wXG2S", + "qUUwbMr4hPt2qj0lOrFhcp6IiUgiGebnnW+5SC6pzxmSs8A4iKPzwvBJ4YGfMZvrwNO7uzrygS0hhso1", + "UvkcgRldHL99M8i1sjxxV1jpaYg8ezgdwgs5cD8c/Ipm1ycaYL/EQnCbVElV392nasNGQv/pio3vB6Ei", + "Yf8kXeXxEkTaperh/TsOh39LXW+1x1NFUjthoRBH8JO5Dcgrq9bThfO6Lv+OqdpRTQIlG9h7RCbzN3A4", + "HL7Dw9z/cvzHi8b7TYIvmxH/hQi2JBvigF9gBifY/UEqS/hSnkHescJeAzcvT/dKGET0ID5B3Pm3xpZL", + 
"AKcd6jQJ06nBY/ugdIoNFcbLejMpxyryAttiYzExtNQSNxmtVZCrvHBaeA0On+qLK4iH2G9uXG/K3LBH", + "vX3KJhORCdJfBpGsENzgSvAF7GFFUMV895Ez19DFauuMpOEIJo4SqY8wcmysUB649XspRDAiMOfzMdfU", + "VimSjfkaXwTvjaqZsAbikM9b59QxtRX1SadBrigNcQtbJxhaJt00+kDwM4TPWm3XyFFFjEfhnb/45wC+", + "4IyoFnk0F8jo6dNeBtlljs0y3JgtouieZQx8zB2hPDk89ORI4Vjv0dh7gr3ODaKkPTo83B/CG6YRpKtG", + "DaEdDPYDUNLDGlAAws01khORWa59h3dHgcBg7kR6kPVh/zbKvHMPBL4x/fF9aI2aMMMHQlb9sEwxDljr", + "OB2sHCgyAoQedmQy/rIx7tTvHD2QGNIVIpqgGY1ZxL5Nh1W+VZjhtl8RNlEW9QpjmVEw5thjtDPZ0r93", + "s4mee3bnm5quXKqn4Ju8UbBzIULp/4bxcd6tOZ/en6r09D5SP2+DUU/qy30D1G9SXEpN1s3/v/WU36Ge", + "Ui6zOsjfnp7yFzXenFz+R/dA+yRWrnuJx7d+0Suk8apbZAlnuhlqvH2sNZay43vUgL7+YtmJ/tHhYb83", + "Z9fUA/bJYb0x/aOWRu33mZn+RzXe5kT/oxr/ZlzoY5ZcTrXjIeDoCfZ8zeYBICgNSdW6463W8K7hdavj", + "wHZS5FkF93hvB+DH2HYIAbnrlgdxuP2lU+/4DyyttYiz1si0hokZNr38Uz1s0eZGD3C09+lC92N8Jfd5", + "CbjbfaS3TpUn7/n9yrHjEmbe54gL00CRCxEc7ABnbuOjvyWJnoeuSeShz0sKayHOFn7g8d42ueCPy5qm", + "IbzQikAPy+1BR6YzAqnVi+mD5hPTRyQmmCG2Xj+STKYVooYZwgtOSVHOSOBSFdMZucKpcX0w6eqFnhTw", + "wkRW1NMrFUPYbvd5/cLt6DpHAMqxSpf7v+UyulvTTelxDweJxURZhmcZoK4xjtlRU9dgel0IDJ37f/gF", + "Oc6XDIbc8lR+8I2cykZRS7xDnbKmqTu2jVs9Enbqtfvghprc+n2/CEhSlOlmfHMWLSQB9AbAmODuj+Qe", + "v8a8O8cn3TpNH+bseoTNnIz4K99/6i957R6POVDBr4qkERlF+0oM5JJEu2t971euNsb4SpnAG6g8oDzl", + "t6b232Tx7x3cqjNH6OWdKtHRtjO2boHpAVLHPCTn3+4mbiwmZRDrQmKzLuazbOcsBzWp0AIH3uL1tOYF", + "byT3YvphRH+I90MskByGeJ0TDynLIOWZZUM4Y8ZQvSqSdRxJq2Ah8ootUYstHyMMPGAI7tahT9u75Nou", + "LKK5Puf3l69fDVC7qvd5NesDbk+oVDmXXzI+/2W0ZVmTBH6hwpRKsk/TJtT7qoPTl+UIO9rl1Cdaf5Ts", + "iomspabnfc4lsPUF1zhI2eptBw6SMJlQc677YCE4WauwiGY9V7sq74l/jXo0k4yntbpJMQEWyXCkC2bg", + "UrhH+hBPWGY4PiGdxoLtTvCcKWXh5M0ptYj2xW1CUqraAPGzi5w6jWgELBYW0bSnDJ30BDBEvHmB7kQM", + "uURSFxIylVxintw0dHAN2kQhraBeJY8GM1Vo+PDhTScDOqFdv2+uQMNsTMynTQ9lnqbIfk/qKs2eqIvu", + "+Bob2BMpn+fKbej+Z14RBPW9rxtywWXqRCwCHjqZioarr3Izvs+UqFpNOf5dyuNhJN+SXxOeHHrw4Ryr", + "B7IMg4gPH1bg65JPlaX43cOHRwQ+vgUz3SnEmifc7Sza95+Fkh7JPUTJRlj0HFHRJK/AhJvI6R4zfX8I", + 
"P/nuDc4wb2CjUzV+28w9UHpLLDuSLajpNOlXbtvCHYlrXdEmoeE6nk1Udh8zndcaP7UdrWULPj1FENzO", + "8rRch9/E1n0Oe7o/hBfk9D5qAZCvhfNoMyvnOe1lO3rLKgfq99rm3xnmuycFyx/aV7GF1qkG2VLrtrid", + "vGaObnAadKz4XyUTe/cCidHfpZar4vgWliD0jnq/Rj38MeodRdShGOv2ndDsRz1iC/ibHjzCPzlGhn+Y", + "MyGHU4V/xBepgX/v6FE/6iGFo30c9Y4eH36K5PpA2IHJD9T6VWrR5L74uPUDoafFTl/oRz18fjR3/37y", + "XfucUiX5Z02oZDr4oDX4x8eHj78fHH43ePyvHx7969HjJ0eHh/9n1Ft9lfaqHBm57ijAr+L2lUOPvK85", + "6h19+92/lg97bZKnI4wfu18P3fpIuu1Ogw020AqeKqhwOdTYEKER5cGeD5DtAxWqlrycCDKSuGQDe1Vr", + "ATLaFOY9C0nFohslCOLn3lKfuF/TIUQFpLIwwUDX+3Oge1T720Fpes6Fwf6LX8l4uN/N8MZHCWqGmVk/", + "nH0ssXLHhVn6FBb3n32Iz7nVy8Gxk5VxKaV9npaHjDLFdMqNo5kFE9g9DhuieGzlWtl47VvNxayFaD+t", + "1AgU47mwq1qUgb05u4Ynh5+v+ElhZnen+bVqDDjEvUpKN8LXFZU0g+3OiUTNKR3x98szCnkp1UL+djjG", + "Ld0NJ3gkK872W3kctgAqIXNhDTcOmnZHJSrhXKSIxpN78RcqrvMZMzzuQ0xSNhUmUVdc8/SgFLgHKHDd", + "M00Bjc3YOHYzTUeeP4Ui7WBrEduTqmVqkWz21SHI+jKR17f2H+lCmhi0WgRwRkyPxHTNeEUz8BOlGazM", + "dQink3oMNJIzZtzEZsJggQTDpAFq50S7jYqLSDNeNVRqYUb3D9/UUFu25F/Q2Qa8F2f4uQXsf5Uy5jfO", + "Ai7pbaV4XBcSBSVixVa9nHXRcUNuFzfbcL+ole99eSoQid/UGi0jsFSB4XGn/Wl1LebMcpCcaW7sQHIx", + "nY1VERqOR7JeR+on/8BAMtNqzueDqap1Fx/CuW+SyzSPpJvSgNKNfLfIqu1tH2Js8RU7VVVYngnsVYS4", + "kYP354MyCziSyIj3+xD7KKF7Z5yx5JLewYZo+IyQ0/0SpEFOCzZ1z2IzLevYwJxrAr6xCjuso9dmqlWR", + "U+Vu1dB6zI2lbwJOF/0y9aa9oe+qOYokwKDMyP/Hf/5XyHL3mjrEh8PvYthLWCbGGt2oE6XhRKX8nMlL", + "PKDBv/9hn77Dr90tFe6t2LfiQFRNdLIQ7isbqysOr99d/EStg1dexPRGR/nubcyP8E9FMkYSGGG7n7nB", + "JGJME6NC4EeQ8kTMWQbYHaiN4Vz4VVPP3ntSgJqDfCUdaHUSG5heg5AwAcW35PUNo794uvA/j2nT0kek", + "D1c8sUpTuq7T05yZjKy8bu9Ecq9mmPi+k86+2WrArCq5qFa4+4F2UGlWe78Imk1uwLb0127TJ5DIHi1m", + "3zPbmujxf9iomtEzBymfYH8HD49zH2YPXYMXtYHu5+5XI3yle1+fQPedf+ulDdS3/p/wmjfTjNTAqkG1", + "YicnKX8PAVk+j3bvOEzTRrXBv38f9Oq+/VXlVH0CO9Crj1PZ2T8/ubqdGWB/1Kqg7DZctsRsMPeduoPF", + "ZyrxRXvUgD2mmxYD1bpjGT+Tvs39RCD23CWiyMWerlC9E/ifvmmvby6L+jGX6SgTksOzZ9QyHP/llWVf", + "joo7JkWec2sAZ7Hw9YNI3cCwLI9oSvOB5gzBbxCXuMjs07Jvsq87nKgsUwsocvIxlnoSbTBgt2OWUuAP", + 
"P5oKzRPbDjEbiL48lHvq2BgG+Er3uzZ+9/Wu7cI//61GYOSwXh93xbvxedfaZ6Dfrwi68IPck8GEX/+6", + "5lJjCjsIorDt/+z0ShsTWNx4iaoS7JGf46CUTPs3Jd4wwK/bsugv/JP3n2YcRmoLFYSffjdJSiFaoK64", + "JtwEq3InkLBWBf3DZe0Ken3N/n3k228ggVrR3m6YqOUL5BWaMdNIrCxxcvsI95CC0pHMhLzkKSXnlcU7", + "bIp4pybAYWG7GYh6VXVi1INkJnJDuEIBG9XpA+Rq/0sxz4PLvZpWyi0TGX4f/W1tjfLDJATiBeVME6yM", + "rC9vyS3pMZwi6c5sZ7X6FK9VYCS83FbQzPcSYBQjZ5Fs4nKhjyEUuxDYhXGLpJ7/Y84lrsBtYRc8gj/c", + "aqO+wLUsB+vqyv+2Wsrv6IJilWV1B3x3DSUtQ3zEWsLw/V7LHTuNeJ9wwHk8giuujVCyX2/ZX7WPnquU", + "Z/2AK+17Erb1wUUwOZbMsBRmL8b3RpliKU/j/T7Iwp0tIny1VKOSr7x8plYZEDo+l9HDv6ix6YCo/gKd", + "cre2DfGtcn3c6S6why9onw/KnS4b39YzoleY/pg3xfaCj2dKXRqPP3Twq+MjI49euB1IwD9947aJrdbl", + "GWLYl6C3CPsT++8eF3aG9uJ4mTNjHMc/bjaTESaSOdcDrRbIHbHL158Gr4vx4EJMJbOF5oPHT76Pg6t1", + "MRPJDHw770i+fnt8Mrh4ffz4yfchElfHXIVLvqywkpp894GJZOx3ckRAq/EQ3voAPU/BhAmYEC357vDR", + "0xDUj2Ts9zEug9HfHX43hPcSGOGlQpwXZhb7WgMOVrMEA0GayWRGt6/Eg8XO7pgDO+E2meEUY3c/YC/l", + "aZFzgknKnYAcF9rYSKY8E1dcC+5RgjwURZwLOY2h+jVM//HhIZnIUiH5AZ9MUEZRHU4kDbdFTpxDz+mA", + "EEoKN669BxXmAxOOyk+0l9tSXhtHdhWyalW67MOMXw+4TFTKU2/Lz9jjJ98/8/G7YVfKagvB7AQbsf4d", + "2sIBJdFthdz+PJuLpakg38lZDbuGLmALdMyXs7b8AW7qMH/mQZywuXogMVTA77LV/Q4TeeGHLnvdw56/", + "QynpUyWo8SQT05mtpwrcb3DJ0XfFO5oZP18iIemjTzpqcjsQt4cD6eh6cl4WBXgG5JlqSR6h0Ukf8O53", + "SrouvJBbmCSNLqehOQOBlA6BZPFgIVLH+2ZMI5idEWORCVvBZRMiNhjOTV0z9Bk4Trg4OwFrmDr08y+j", + "mDc08k1aTfngbwZaBuHy6pvUShKb8UzKke4V0aQc5SthmlSr3Hiwd4Rr8vtABaeT8aAjixoZ7M5YtmJ/", + "n9fwviv2otWCQLop6a7WQNkqSKhjfCT3VoC4IWMyNR6Fe/8pTAqUFGfnhiC5/YuEKdbHWqN5IYXTE/pl", + "1XXCpdVKpHAy02rO6tlSnQgkzSvyzw7fvZUSuqFCNmzU4Re+yr+37f+B1/q47HAEG22FiqGdvoC9j29O", + "XwwyccnBx13rrVeSJnveL42F23bWKZFJ2sA+7lvorIzylcIwGyk1QH4svjzF/q6kFO1TTXgEp93NBdUa", + "yPQGlTOg5t4/cbiRtqme7pnb9tn/KmwN1dQGyGe9/x+7U053d6yr1WN3Kg3X2LVoVS/RauG0Eu+SjD2e", + "JxUeVB6qSMZJpiT3LqomLCTFdPB3dF4J4/MzgqUUyVAAUTq4VNk2dhW+OldZZiIZb7wHMcaKfIsHP3er", + "GeI+K2yZFckzra5EyiHGdBZ02jkNDafJINfiyt3Nqq8ELiOSMSusGnl71rfWw0oP73PwjVTGBdX/LJ3s", + 
"Q7DbIleSgkVn59/CQoSuR+57g+Ayx4ypln5KcHx2WsJ+V7XroYckAyX5wMyUhdJ/Wegshm9g1Z8ZSaMQ", + "brzebWPOJHZYLoEkSan1RruQYRZisrYB7jRZZnjoZeUGIXAVo1YaofFI+tcGQk5U5SBmaRp27ts2VfU4", + "TRss5Z4k6+owX9uec3M4CVZbB+MsWc43nnSDy+u/RW4r0z7G/aocU7W+Mbsx7W0i+CAT8nK3oMuX4d5z", + "pi95ig1BKcb+DFGzAK8eVnnYSDJJUKFlJK6K3WHDf6KtGctzLk0fpKKnPD+PZKA6/GnF54eBwzIiGFxo", + "KFhQdKAzdsHd/9aC3TQ/rsGoalmDjF/xDCacgjCR3KMUlj40DOLQSgPViv3AJutR/ZKXVmVGdfhUH0wK", + "8aOJ0vNIRh6ie5io+QH2qcAT/58+dtODPTHkw9IJONFqXtKZ47qFTvh+aayX5Z9+KCEhJkn17IE/gQfx", + "EM64HtTSA6CQ4peCS24MYF9JbFOeloEyrFDCm2GE5fDx3em/f3wZSRbai0yLjGk4TlM4L7uCuBOp4WzW", + "e/+5o63G3gvZGkCEtF+Lcv0b7ij1vhCmyqLAjIwFkxbTN8Zoo9WJjZqTaDWvD3RMhVPVH55j7EcVFnLm", + "G9Z6jeIbgqmQiXLyciHaq4neCHn50g95v2CMLSP9xsXIGzzL/xYXG8WFDPdisMId70pqNOL1N/I4Vqy+", + "pmiH3CMn15SEVJhLQikuC7VrS9Bq4Vj0nAnpmQHp54MihwCH5hg9+igRKm2lk+cQSKXNMlL7Uj5oKJJN", + "gRBJ5A51NRJF0g5OylL52+6oJLXo9+ep9OpJTS/xguTm5uTtVY7+3SWL7H4DDjRHCr0XJap//+kvL6vE", + "jYnmZgYN29hdpJD94f4yhBcruRzYlot0CKvFdMq1IXmIxYCVljagMDZ+0V1bbOirJI8kaWuCYp8B+i7H", + "+4omAzBsP92evoF7v+XGPb7DqgMcz+9ZutlRRDpnGdTfq0X1QxqZ2whKjtn/vd39t5URPvAHX+k58I3X", + "uHOR80zIO5A0B3VjfKf2dBSzz5aDKhEwKPkfz9+Uqi2lO6AnYBjJMyY8rirm73hZUCJ3/+M//wt8cocJ", + "radWHAwIn7HiYdDKMssNiEmLNKTG0wSzgFJP80EwlDtSCxsE7+dz6vbl/pNr3DAbY/N+ixOldCqkW/bv", + "MQJU9yg5avkmHKXjhuSACsSBSWf/dGKugt9ovWir+BloxkqYMGkAcS4XCtzKs4xnYIrxwD0lCNEzkl6X", + "O4KUS8Nhz4NmQKKMkN7sNDnT7reLf3/jjMNXHy6ewPO3j59EEpPuPIzMxJp9sjtLFXfGcXQPWpFhy3mm", + "OUwKw9NIOuPznCfCaeAsg3MmL+FVQfCql8++P6RUxONEK1PmSmNN4d//NhhnHFEhEiZTkSICZ6I0h734", + "73+D//2/YDx//GQk0f7+BvYeDf7+t333Z1wl/j0mtvL3vz07HD7pw1jZGaVaZQbmQg7m7DqS7kGWuUuA", + "YhD3dz8gjGqeMZSQdubEtcrSSO7F1YT+8f/8f4TS8b//FxwOv4v3IcU2I+VKMCEck4VAqkiW1YaY8qwg", + "49fYztFtcsby0LHFH/MQzgrNB7ggx+jkwB126Th1z70LkCkekAD9KUynGYHNRJKNjcoKy52eb5lMeL/h", + "NUHsDSskz5bBKZ5GUminvV8xaUOfRwtSCROcLEQ5YMRcZEwLuyRPPBHMlFkOE3Ed0uHHS1+rSQ4lcGYB", + "tYrwLhS7wOYJdC4WWxgymHPmhPakyGCiGWYqhOfdhpdOGqJMqs8gvFYJ40JkNK5jVQOtxkJiEarOOLsS", + 
"cnoUSUewg0ekQFMamCn0lXC/Vi0sCCWeySXS9+BxH7hNhv1IJizPiWDKm2AUrilVcyHDxjnSfWDBskvf", + "TTqSJlN2CMfZgi3drK84qiVS1QMjmrsVYIAEnWYpH6tCtrsvSv5aoqHs0JXtl42May7kGy6ndlbvcba1", + "+ZrKR7X+Uq291Bqt1LZ0UtswDB15+yCP64M8PtxhlCanfZUphb0+NVusk/kQTojcxhx75mInc80j6W69", + "I4hAMb7dLYIReVxdOMST1jxzjFnJSHo+/MAAJn86NqBTbHyK3UtgJqYzrn3f8cPht/sYhCqs41nCcNKI", + "cH8c0WZKTsOHBgF4CAx3ykjCzdNIXnKeuwt4iGLVzJS2VK9t4ABYopVczkOld1kEEsk5m0phKZaEPRPc", + "A9jnxMwx17G72WcJVdR+WIf93sRxX9s76k0yxWyvdniPakd3WB4dlYncc6e9lTu1GX4QSyq/OMjO3ehe", + "r5djLVLk29+QcuBLRBkJZOof5wQAOna+hpsBtaYZZ5mdbbVDVkDr1GXU+xRX/mZfPJEw6dtveWXHCV0h", + "4ZETz4mSadVb4Mnhtx4yvPnlQtKMlgSdzJlREn8YDoflmORQe/EccsS8YCIzTqBj/Yvn+fGxpwKk0rJq", + "JexOhz3ymnbjHi8AjbCZ7nEvhQG/E3eNxnSTKZTH4Xnli+dQyNIO3d+Yjv5GXFHIBAunQu55a0VV8yu/", + "9sZO89LuCN1HHUciCmtL2Lhgcz5QWkyFxOItNUi579uNKogjFWf5uC9QEEMY7HftZlLorHfUO8ACfz+r", + "tTIZ3ABS5X0Vmpu2qe4dLWNd4lXtKic8WSYZh72T848v9htvklhff5kKy/s1BKJ+hYtAnfQoB3UFZqPW", + "IJb+vf7pDzPN+QCBOasCwFwrqxJEWQjsJIAjrn/h+OwUUpUUcy4tkmD1VqqS1uX4rn196sR9kKmpKmwf", + "cmbMQunU9xLrl/iNvll16G3tSKFlHmVXMgp5zZlkUz6nmqHwqnum5d1TYwpOsID8Sl1yaoIfugyWfQUR", + "y+/N6cHFix/dGLXv5mLgnmj5dCUdCDSutTex+zDKgkrPbZ7kMJK1wgjwdRFVN//11i3IgAk0keKhfXLE", + "zFUqJstmFTWlS1OBs6NKdGQ+rXvAyXfjNrNfppDU8p7tQg2MJU0IJVuZWJK5R4REXB7+S8GljaSPWZQw", + "nTUzyceVfTUecWa/xzURuL7LflPPuDbY6v0YM5fgA+ldzkheqzesBpsoTR5bov1a2hEucC90q8qW+2Uc", + "3T0a9mEIF9hDK5JcJnqZW54OmB2QvSgYHL+8GPxw8pastzxjTjG+Rjsq2ILAr1lis2UklUy4U4zP3l98", + "IPMVPch1a1RzbHzR2Jxm6+pPP3/6/wMAAP//", } // decodeSpec returns the embedded OpenAPI spec as raw JSON bytes, diff --git a/server/internal/httpapi/project_workspaces.go b/server/internal/httpapi/project_workspaces.go new file mode 100644 index 0000000..644d9e6 --- /dev/null +++ b/server/internal/httpapi/project_workspaces.go @@ -0,0 +1,83 @@ +package httpapi + +import ( + "database/sql" + "errors" + "net/http" + + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + 
"github.com/dvcdsys/code-index/server/internal/projects" +) + +// projectWorkspaceEntryPayload is the wire shape for one membership. +// Kept JSON-tagged here rather than reusing the generated type so the +// handler can set fields by name without aligning to openapi-codegen's +// nullability quirks for the embedded enums. +type projectWorkspaceEntryPayload struct { + WorkspaceID string `json:"workspace_id"` + WorkspaceName string `json:"workspace_name"` + RepoID string `json:"repo_id"` + Branch string `json:"branch"` + Status string `json:"status"` + IsLinked bool `json:"is_linked"` +} + +// ListProjectWorkspaces — GET /api/v1/projects/{path}/workspaces. +// +// Returns every workspace that has this project attached. Used by the +// project detail page to render "Workspaces" chips linking to each +// workspace. Empty list when the project isn't part of any workspace. +// +// The workspaces feature flag is NOT consulted here: even if workspaces +// are disabled, returning an empty membership list is the right +// response — the project page should still render cleanly. +func (s *Server) ListProjectWorkspaces(w http.ResponseWriter, r *http.Request, path openapi.ProjectHash) { + hash := string(path) + + // Resolve the project first so 404 vs empty membership are clearly + // distinguishable: unknown hash → 404; known hash with zero + // memberships → 200 with workspaces=[]. + proj, err := projects.GetByHash(r.Context(), s.Deps.DB, hash) + if err != nil { + if errors.Is(err, projects.ErrNotFound) { + writeError(w, http.StatusNotFound, "project not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load project") + return + } + + rows, err := s.Deps.DB.QueryContext(r.Context(), ` + SELECT w.id, w.name, wr.id, wr.branch, wr.status, wr.is_linked + FROM workspaces w + JOIN workspace_repos wr ON wr.workspace_id = w.id + WHERE wr.project_path = ? 
+ ORDER BY w.name`, proj.HostPath) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not list workspaces: "+err.Error()) + return + } + defer rows.Close() + + entries := []projectWorkspaceEntryPayload{} + for rows.Next() { + var ( + e projectWorkspaceEntryPayload + isLinked int + ) + if scanErr := rows.Scan(&e.WorkspaceID, &e.WorkspaceName, &e.RepoID, &e.Branch, &e.Status, &isLinked); scanErr != nil { + writeError(w, http.StatusInternalServerError, "could not read row: "+scanErr.Error()) + return + } + e.IsLinked = isLinked == 1 + entries = append(entries, e) + } + if err := rows.Err(); err != nil && !errors.Is(err, sql.ErrNoRows) { + writeError(w, http.StatusInternalServerError, "could not scan rows: "+err.Error()) + return + } + + writeJSON(w, http.StatusOK, map[string]any{ + "workspaces": entries, + }) +} diff --git a/server/internal/httpapi/router.go b/server/internal/httpapi/router.go index a279c29..444cf0b 100644 --- a/server/internal/httpapi/router.go +++ b/server/internal/httpapi/router.go @@ -13,13 +13,17 @@ import ( "github.com/dvcdsys/code-index/server/internal/apikeys" "github.com/dvcdsys/code-index/server/internal/embeddings" + "github.com/dvcdsys/code-index/server/internal/githubtokens" "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" "github.com/dvcdsys/code-index/server/internal/indexer" + "github.com/dvcdsys/code-index/server/internal/jobs" "github.com/dvcdsys/code-index/server/internal/runtimecfg" "github.com/dvcdsys/code-index/server/internal/sessions" "github.com/dvcdsys/code-index/server/internal/users" "github.com/dvcdsys/code-index/server/internal/vectorstore" "github.com/dvcdsys/code-index/server/internal/versioncheck" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" "github.com/go-chi/chi/v5" "github.com/go-chi/chi/v5/middleware" ) @@ -72,6 +76,31 @@ type Deps struct { // VersionCheck polls GitHub for newer server releases. 
Nil = feature // off; GetStatus then omits the version-check fields entirely. VersionCheck *versioncheck.Service + + // WorkspacesEnabled turns on the workspaces feature endpoints. Set via + // CIX_WORKSPACES_ENABLED=true. PR1 ships CRUD over workspaces and + // github_tokens behind this flag — when disabled, the handlers return + // 503 so the dashboard can render a "feature off" state instead of a + // misleading 404. Both Workspaces and GithubTokens services must be set + // together; either nil disables both endpoint groups. + WorkspacesEnabled bool + Workspaces *workspaces.Service + GithubTokens *githubtokens.Service + // PR2 additions — repo attachment + the persistent job queue. + WorkspaceRepos *workspacerepos.Service + Jobs *jobs.Service + // PublicBaseURL is the operator-set externally-reachable URL of the + // server. Used to build the webhook URL surfaced when adding a repo + // — when empty, handlers return the path-only form and rely on the + // operator to prepend their tunnel origin manually. Source: + // CIX_PUBLIC_URL. + PublicBaseURL string + // GithubAPIBaseURL overrides the GitHub REST API base for the + // per-request client constructed inside token / webhook handlers. + // Empty in production (the client defaults to https://api.github.com). + // Tests set this to an httptest server so they can assert the + // scopes / validation flow without hitting the real API. + GithubAPIBaseURL string } // NewRouter builds the chi router with middleware and the generated @@ -85,6 +114,12 @@ type Deps struct { // middleware. The generated chi-server mounts under a sub-router so the gate // stays in one place. func NewRouter(d Deps) http.Handler { + // Ensure handlers can call d.Logger.* without nil-checking everywhere. + // Tests routinely leave Logger nil — fall back to the global slog + // default which writes to stderr.
+ if d.Logger == nil { + d.Logger = slog.Default() + } r := chi.NewRouter() r.Use(middleware.RequestID) diff --git a/server/internal/httpapi/server.go b/server/internal/httpapi/server.go index cc6547a..00a228d 100644 --- a/server/internal/httpapi/server.go +++ b/server/internal/httpapi/server.go @@ -605,7 +605,12 @@ func (s *Server) SemanticSearch(w http.ResponseWriter, r *http.Request, path ope paths := derefStringSlice(body.Paths) excludes := derefStringSlice(body.Excludes) - minScore := float32(0.4) + // Default 0.2 — light relevance floor. Was 0.4, but that silently + // returned an empty result for abstract natural-language queries + // (e.g. "end-to-end workflow lifecycle") whose best chunks score + // in [0.25, 0.35]. 0.2 still rejects clean noise. Callers that + // want strict code-symbol matching can pass 0.4+ explicitly. + minScore := float32(0.2) if body.MinScore != nil { minScore = *body.MinScore } diff --git a/server/internal/httpapi/webhooks.go b/server/internal/httpapi/webhooks.go new file mode 100644 index 0000000..aa664a7 --- /dev/null +++ b/server/internal/httpapi/webhooks.go @@ -0,0 +1,187 @@ +package httpapi + +import ( + "crypto/hmac" + "crypto/sha256" + "encoding/hex" + "encoding/json" + "errors" + "io" + "net/http" + "strings" + + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/secrets" + "github.com/dvcdsys/code-index/server/internal/workspacejobs" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" +) + +// GetWorkspaceRepoWebhookInfo — GET /workspaces/{id}/repos/{repo_id}/webhook-info. +// +// Authenticated. Returns the publicly-reachable webhook URL + the HMAC +// secret. Operators copy these into GitHub's webhook config when +// auto_webhook=false. 
+func (s *Server) GetWorkspaceRepoWebhookInfo(w http.ResponseWriter, r *http.Request, id, repoID string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + wr, err := s.Deps.WorkspaceRepos.GetByID(r.Context(), repoID) + if err != nil { + if errors.Is(err, workspacerepos.ErrNotFound) { + writeError(w, http.StatusNotFound, "repo not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load repo") + return + } + if wr.WorkspaceID != id { + writeError(w, http.StatusNotFound, "repo not found") + return + } + writeJSON(w, http.StatusOK, map[string]any{ + "webhook_url": s.buildWebhookURL(wr.ID), + "webhook_secret": wr.WebhookSecret, + "auto_registered": wr.WebhookID != nil, + }) +} + +// pushEvent is the minimal subset of GitHub's push webhook body we care +// about. We don't bind to go-github here because we only need two fields +// — the ref and the head SHA — and pulling in the dependency for that +// would be heavyweight. +type pushEvent struct { + Ref string `json:"ref"` // "refs/heads/main" + After string `json:"after"` // post-push HEAD SHA +} + +// ReceiveGithubWebhook — POST /api/v1/webhooks/github/{repo_id}. +// +// Public endpoint (added to publicPaths in middleware.go). Authenticated +// per-row by HMAC-SHA256 over the body keyed by workspace_repos.webhook_secret. +func (s *Server) ReceiveGithubWebhook(w http.ResponseWriter, r *http.Request, repoID string, params openapi.ReceiveGithubWebhookParams) { + if !s.Deps.WorkspacesEnabled || s.Deps.WorkspaceRepos == nil || s.Deps.Jobs == nil { + writeError(w, http.StatusServiceUnavailable, "workspaces feature is disabled (set CIX_WORKSPACES_ENABLED=true and restart)") + return + } + + // Read the raw body BEFORE any JSON parsing so we can compute HMAC + // against the exact byte sequence GitHub signed. 
+ body, err := io.ReadAll(r.Body) + if err != nil { + writeError(w, http.StatusBadRequest, "could not read body") + return + } + + wr, err := s.Deps.WorkspaceRepos.GetByID(r.Context(), repoID) + if err != nil { + // Any lookup failure, including an unknown repo ID, returns a generic + // 404. A bad HMAC on a known repo returns 401 further down. + writeError(w, http.StatusNotFound, "webhook target not found") + return + } + + sigHeader := "" + if params.XHubSignature256 != nil { + sigHeader = *params.XHubSignature256 + } + if sigHeader == "" { + // Fall back to direct header read in case oapi-codegen casing + // differs from what GitHub sends. + sigHeader = r.Header.Get("X-Hub-Signature-256") + } + if !validHMAC(body, []byte(wr.WebhookSecret), sigHeader) { + s.Deps.Logger.Warn("workspaces webhook: HMAC mismatch", "repo_id", repoID) + writeError(w, http.StatusUnauthorized, "invalid signature") + return + } + + event := "" + if params.XGitHubEvent != nil { + event = *params.XGitHubEvent + } + if event == "" { + event = r.Header.Get("X-GitHub-Event") + } + + switch event { + case "ping": + // GitHub sends ping on webhook creation — return 200 so the UI + // confirms the setup is wired. + writeJSON(w, http.StatusOK, map[string]any{"status": "ping"}) + return + case "push": + // handled below, after the switch + default: + // Unknown / unsupported events are ack'd quietly so GitHub records the + // delivery as successful. We log so operators can see what arrived. + s.Deps.Logger.Info("workspaces webhook: ignored event", + "repo_id", repoID, + "event", event) + writeJSON(w, http.StatusOK, map[string]any{"status": "ignored"}) + return + } + + var p pushEvent + if jerr := json.Unmarshal(body, &p); jerr != nil { + writeError(w, http.StatusBadRequest, "invalid push payload") + return + } + + // Only react to pushes on the tracked branch. GitHub sends one delivery + // per ref; deletes have After=000…000 and we treat those as ignored.
+ wantRef := "refs/heads/" + wr.Branch + if p.Ref != wantRef { + writeJSON(w, http.StatusOK, map[string]any{"status": "ignored"}) + return + } + if strings.Trim(p.After, "0") == "" { + // Branch deletion → ignore (cleanup story lives in PR4+). + writeJSON(w, http.StatusOK, map[string]any{"status": "ignored"}) + return + } + + enqueued := true + if _, eerr := s.Deps.Jobs.Enqueue(r.Context(), jobs.EnqueueRequest{ + Type: workspacejobs.TypeCloneRepo, + DedupeKey: "clone:" + wr.ID, + Payload: workspacejobs.ClonePayload{RepoID: wr.ID}, + }); eerr != nil { + if errors.Is(eerr, jobs.ErrDuplicate) { + enqueued = false + } else { + s.Deps.Logger.Error("workspaces webhook: enqueue failed", "repo_id", repoID, "err", eerr) + writeError(w, http.StatusInternalServerError, "could not enqueue reindex") + return + } + } + status := "enqueued" + if !enqueued { + status = "already_running" + } + writeJSON(w, http.StatusAccepted, map[string]any{"status": status, "repo_id": wr.ID}) +} + +// validHMAC reports whether the given header matches the HMAC-SHA256 of body keyed by secret. +// Header format is "sha256=" followed by the hex digest, per GitHub's spec. The compare +// against the expected value is constant-time to avoid leaking timing signals. +func validHMAC(body, secret []byte, header string) bool { + header = strings.TrimSpace(header) + const prefix = "sha256=" + if !strings.HasPrefix(header, prefix) { + return false + } + got, err := hex.DecodeString(header[len(prefix):]) + if err != nil { + return false + } + mac := hmac.New(sha256.New, secret) + mac.Write(body) + want := mac.Sum(nil) + // Use the secrets package's constant-time helper — same byte semantics + // as hmac.Equal, kept in one place across the codebase.
+ return secrets.ConstantTimeEqual(got, want) +} diff --git a/server/internal/httpapi/webhooks_test.go b/server/internal/httpapi/webhooks_test.go new file mode 100644 index 0000000..db16662 --- /dev/null +++ b/server/internal/httpapi/webhooks_test.go @@ -0,0 +1,289 @@ +package httpapi + +import ( + "bytes" + "context" + "crypto/hmac" + "crypto/sha256" + "encoding/hex" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/secrets" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// addRepo helper — creates a workspace + repo and returns the repo +// payload so tests can lift the repo id directly. +func addRepo(t *testing.T, router http.Handler, wsName, githubURL, branch string) workspaceRepoPayload { + t.Helper() + wsID := createWS(t, router, wsName) + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": githubURL, + "branch": branch, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("add repo: %d (%s)", rr.Code, rr.Body.String()) + } + var got struct { + Repo workspaceRepoPayload `json:"repo"` + WebhookSecret string `json:"webhook_secret"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &got) + // The webhook secret is decoded but deliberately dropped here: this + // helper returns only the repo. Tests that need the secret call + // addRepoWithSecret instead.
+ return got.Repo +} + +func addRepoWithSecret(t *testing.T, router http.Handler, wsName, githubURL, branch string) (workspaceRepoPayload, string) { + t.Helper() + wsID := createWS(t, router, wsName) + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": githubURL, + "branch": branch, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("add repo: %d (%s)", rr.Code, rr.Body.String()) + } + var got struct { + Repo workspaceRepoPayload `json:"repo"` + WebhookSecret string `json:"webhook_secret"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &got) + return got.Repo, got.WebhookSecret +} + +func signBody(body []byte, secret string) string { + mac := hmac.New(sha256.New, []byte(secret)) + mac.Write(body) + return "sha256=" + hex.EncodeToString(mac.Sum(nil)) +} + +func postWebhook(t *testing.T, router http.Handler, repoID string, body []byte, sig, event string) *httptest.ResponseRecorder { + t.Helper() + req := httptest.NewRequest(http.MethodPost, "/api/v1/webhooks/github/"+repoID, bytes.NewReader(body)) + req.Header.Set("Content-Type", "application/json") + if sig != "" { + req.Header.Set("X-Hub-Signature-256", sig) + } + if event != "" { + req.Header.Set("X-GitHub-Event", event) + } + rr := httptest.NewRecorder() + router.ServeHTTP(rr, req) + return rr +} + +func TestWebhook_PingReturns200(t *testing.T) { + router, _ := reposRouter(t) + repo, secret := addRepoWithSecret(t, router, "platform", "https://github.com/x/y", "main") + body := []byte(`{"zen":"Speak like a human."}`) + rr := postWebhook(t, router, repo.ID, body, signBody(body, secret), "ping") + if rr.Code != http.StatusOK { + t.Fatalf("ping: expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestWebhook_PushEnqueuesCloneJob(t *testing.T) { + router, jobsSvc := reposRouter(t) + repo, secret := addRepoWithSecret(t, router, "platform", "https://github.com/x/y", "main") + + // Confirm the initial clone job from the add-repo call is still pending so
+ // the webhook push below exercises the dedupe path. + ctx := context.Background() + initial, _ := jobsSvc.List(ctx, jobs.StatusPending, "clone_repo", 10) + if len(initial) != 1 { + t.Fatalf("expected 1 initial clone, got %d", len(initial)) + } + + body := []byte(`{"ref":"refs/heads/main","after":"abc123def456"}`) + rr := postWebhook(t, router, repo.ID, body, signBody(body, secret), "push") + // Dedupe with the in-flight initial clone → 202 already_running. + if rr.Code != http.StatusAccepted { + t.Fatalf("push: expected 202, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Status string `json:"status"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "already_running" { + t.Fatalf("expected dedupe → already_running, got %q", resp.Status) + } +} + +func TestWebhook_PushOnDifferentBranchIgnored(t *testing.T) { + router, _ := reposRouter(t) + repo, secret := addRepoWithSecret(t, router, "platform", "https://github.com/x/y", "main") + body := []byte(`{"ref":"refs/heads/develop","after":"abc123"}`) + rr := postWebhook(t, router, repo.ID, body, signBody(body, secret), "push") + if rr.Code != http.StatusOK { + t.Fatalf("ignored: expected 200, got %d", rr.Code) + } + var resp struct { + Status string `json:"status"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "ignored" { + t.Fatalf("expected ignored, got %q", resp.Status) + } +} + +func TestWebhook_BadSignatureRejected(t *testing.T) { + router, _ := reposRouter(t) + repo, _ := addRepoWithSecret(t, router, "platform", "https://github.com/x/y", "main") + body := []byte(`{"ref":"refs/heads/main","after":"abc"}`) + // Sign with the wrong secret.
+ rr := postWebhook(t, router, repo.ID, body, signBody(body, "wrong"), "push") + if rr.Code != http.StatusUnauthorized { + t.Fatalf("bad sig: expected 401, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestWebhook_MissingSignatureRejected(t *testing.T) { + router, _ := reposRouter(t) + repo, _ := addRepoWithSecret(t, router, "platform", "https://github.com/x/y", "main") + body := []byte(`{"ref":"refs/heads/main","after":"abc"}`) + rr := postWebhook(t, router, repo.ID, body, "", "push") + if rr.Code != http.StatusUnauthorized { + t.Fatalf("no sig: expected 401, got %d", rr.Code) + } +} + +func TestWebhook_UnknownRepoReturns404(t *testing.T) { + router, _ := reposRouter(t) + body := []byte(`{}`) + // Use the right HMAC math but a bogus repo id — must still 404 (we + // short-circuit before HMAC since there's no secret to compare against). + rr := postWebhook(t, router, "no-such-repo", body, signBody(body, "anything"), "push") + if rr.Code != http.StatusNotFound { + t.Fatalf("unknown repo: expected 404, got %d", rr.Code) + } +} + +func TestWebhook_PathIsPublic(t *testing.T) { + // Spin up a router with auth ENABLED (not the test-default) to verify + // the webhook path is reachable without credentials. + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + router := NewRouter(Deps{ + DB: d, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + AuthDisabled: false, + // Workspaces disabled — but auth middleware should still let + // the request reach our handler before the 503 fires. + }) + rr := httptest.NewRecorder() + req := httptest.NewRequest(http.MethodPost, "/api/v1/webhooks/github/anything", bytes.NewReader([]byte(`{}`))) + router.ServeHTTP(rr, req) + // We expect either 503 (feature off) or 404, NOT 401 — the public-path + // gate should let us through the auth middleware. 
+ if rr.Code == http.StatusUnauthorized { + t.Fatalf("webhook path leaked into auth-gated set, got 401") + } +} + +func TestAddRepo_AutoRegisterFailsCleanlyWithoutPublicURL(t *testing.T) { + // reposRouter sets PublicBaseURL=https://cix.example.test, but the + // auto-register flow tries a real github.com call which the test + // can't allow. Instead, this test builds a separate router with no + // public URL and exercises the empty-URL branch. + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + wsSvc := workspaces.New(d) + ghSvc := githubtokens.New(d, sec) + wrSvc := workspacerepos.New(d) + jobsSvc := jobs.New(d, jobs.Options{Concurrency: 1, PollEvery: time.Hour}) + + router := NewRouter(Deps{ + DB: d, + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: wsSvc, + GithubTokens: ghSvc, + WorkspaceRepos: wrSvc, + Jobs: jobsSvc, + // PublicBaseURL deliberately unset.
+ }) + + wsID := createWS(t, router, "platform") + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://github.com/x/y", + "branch": "main", + "auto_webhook": true, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + AutoRegistered bool `json:"auto_registered"` + Note string `json:"auto_register_note"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.AutoRegistered { + t.Fatalf("AutoRegistered should be false without public URL") + } + if resp.Note == "" { + t.Fatalf("operator-facing note should explain the reason") + } +} + +func TestWebhookInfo_ReturnsURLAndSecret(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + // Manual add — we want the wsID + repo for the URL construction. + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://github.com/a/b", + "branch": "main", + }) + if rr.Code != http.StatusCreated { + t.Fatalf("add: %d", rr.Code) + } + var created struct { + Repo workspaceRepoPayload `json:"repo"` + WebhookSecret string `json:"webhook_secret"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &created) + + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/repos/"+created.Repo.ID+"/webhook-info", nil) + if rr.Code != http.StatusOK { + t.Fatalf("webhook-info: %d (%s)", rr.Code, rr.Body.String()) + } + var info struct { + WebhookURL string `json:"webhook_url"` + WebhookSecret string `json:"webhook_secret"` + AutoRegistered bool `json:"auto_registered"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &info) + if info.WebhookSecret != created.WebhookSecret { + t.Fatalf("secret mismatch between create and info") + } + if info.WebhookURL != "https://cix.example.test/api/v1/webhooks/github/"+created.Repo.ID { + t.Fatalf("URL wrong: %q", info.WebhookURL) + } + if info.AutoRegistered { + 
t.Fatalf("AutoRegistered should be false for manual setup") + } +} diff --git a/server/internal/httpapi/workspacerepos.go b/server/internal/httpapi/workspacerepos.go new file mode 100644 index 0000000..d135149 --- /dev/null +++ b/server/internal/httpapi/workspacerepos.go @@ -0,0 +1,425 @@ +package httpapi + +import ( + "context" + "encoding/json" + "errors" + "net/http" + "strings" + "time" + + "github.com/dvcdsys/code-index/server/internal/githubapi" + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/projects" + "github.com/dvcdsys/code-index/server/internal/workspacejobs" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +type workspaceRepoPayload struct { + ID string `json:"id"` + WorkspaceID string `json:"workspace_id"` + GitHubURL string `json:"github_url"` + Branch string `json:"branch"` + ProjectPath string `json:"project_path"` + TokenID *string `json:"token_id"` + AutoWebhook bool `json:"auto_webhook"` + WebhookMode string `json:"webhook_mode"` + Status string `json:"status"` + LastSHA *string `json:"last_sha"` + LastError *string `json:"last_error"` + LastIndexedAt *time.Time `json:"last_indexed_at"` + IsLinked bool `json:"is_linked"` + CreatedAt time.Time `json:"created_at"` + UpdatedAt time.Time `json:"updated_at"` +} + +func workspaceRepoToPayload(wr workspacerepos.WorkspaceRepo) workspaceRepoPayload { + var tokenID *string + if wr.TokenID != "" { + v := wr.TokenID + tokenID = &v + } + var lastSHA *string + if wr.LastSHA != "" { + v := wr.LastSHA + lastSHA = &v + } + var lastErr *string + if wr.LastError != "" { + v := wr.LastError + lastErr = &v + } + mode := wr.WebhookMode + if mode == "" { + mode = workspacerepos.WebhookModeManual + } + return workspaceRepoPayload{ + ID: wr.ID, + 
WorkspaceID: wr.WorkspaceID, + GitHubURL: wr.GitHubURL, + Branch: wr.Branch, + ProjectPath: wr.ProjectPath, + TokenID: tokenID, + AutoWebhook: wr.AutoWebhook, + WebhookMode: mode, + Status: wr.Status, + LastSHA: lastSHA, + LastError: lastErr, + LastIndexedAt: wr.LastIndexedAt, + IsLinked: wr.IsLinked, + CreatedAt: wr.CreatedAt, + UpdatedAt: wr.UpdatedAt, + } +} + +// workspaceReposUnavailable returns 503 when the feature flag is off OR +// any required service is nil. +func (s *Server) workspaceReposUnavailable(w http.ResponseWriter) bool { + if !s.Deps.WorkspacesEnabled || s.Deps.WorkspaceRepos == nil || s.Deps.Workspaces == nil || s.Deps.Jobs == nil { + writeError(w, http.StatusServiceUnavailable, "workspaces feature is disabled (set CIX_WORKSPACES_ENABLED=true and restart)") + return true + } + return false +} + +// requireWorkspace loads the parent workspace and returns 404 if missing. +// Used by every workspace-scoped endpoint to make "wrong workspace id" +// vs "wrong repo id" distinguishable in error responses. +func (s *Server) requireWorkspace(w http.ResponseWriter, r *http.Request, workspaceID string) bool { + _, err := s.Deps.Workspaces.GetByID(r.Context(), workspaceID) + if err != nil { + if errors.Is(err, workspaces.ErrNotFound) { + writeError(w, http.StatusNotFound, "workspace not found") + } else { + writeError(w, http.StatusInternalServerError, "could not load workspace") + } + return false + } + return true +} + +// ListWorkspaceRepos — GET /api/v1/workspaces/{id}/repos. 
+func (s *Server) ListWorkspaceRepos(w http.ResponseWriter, r *http.Request, id string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + list, err := s.Deps.WorkspaceRepos.ListByWorkspace(r.Context(), id) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not list repos") + return + } + out := make([]workspaceRepoPayload, 0, len(list)) + for _, wr := range list { + out = append(out, workspaceRepoToPayload(wr)) + } + writeJSON(w, http.StatusOK, map[string]any{ + "repos": out, + "total": len(out), + }) +} + +// AddWorkspaceRepo — POST /api/v1/workspaces/{id}/repos. +// +// Creates the workspace_repos row + enqueues the clone_repo job (which +// chains to index_repo on success). Response carries the freshly-minted +// webhook_secret + a constructed webhook_url so the operator can set up +// the GitHub webhook manually (or wait for PR3's auto-register flow). +func (s *Server) AddWorkspaceRepo(w http.ResponseWriter, r *http.Request, id string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + var body openapi.AddWorkspaceRepoRequest + if err := json.NewDecoder(r.Body).Decode(&body); err != nil { + writeError(w, http.StatusUnprocessableEntity, "invalid JSON body") + return + } + req := workspacerepos.CreateRequest{ + WorkspaceID: id, + GitHubURL: body.GithubUrl, + Branch: body.Branch, + } + if body.TokenId != nil { + req.TokenID = *body.TokenId + } + if body.AutoWebhook != nil { + req.AutoWebhook = *body.AutoWebhook + } + if body.WebhookMode != nil { + req.WebhookMode = string(*body.WebhookMode) + } + wr, err := s.Deps.WorkspaceRepos.Create(r.Context(), req) + if err != nil { + switch { + case errors.Is(err, workspacerepos.ErrInvalidURL): + writeError(w, http.StatusUnprocessableEntity, "github_url must be an https://github.com/owner/repo URL") + case errors.Is(err, workspacerepos.ErrBranchEmpty): + writeError(w, 
http.StatusUnprocessableEntity, "branch is required") + case errors.Is(err, workspacerepos.ErrInvalidWebhookMode): + writeError(w, http.StatusUnprocessableEntity, "webhook_mode must be one of manual, auto, disabled") + case errors.Is(err, workspacerepos.ErrDuplicate): + writeError(w, http.StatusConflict, "this repo+branch is already attached to the workspace") + default: + writeError(w, http.StatusInternalServerError, "could not attach repo") + } + return + } + + if err := workspacejobs.EnqueueClone(r.Context(), s.Deps.Jobs, wr.ID); err != nil { + // Row created, job not — surface the error but leave the row. + // A manual reindex will retry the clone. + writeError(w, http.StatusInternalServerError, "repo attached but clone could not be enqueued: "+err.Error()) + return + } + + webhookURL := s.buildWebhookURL(wr.ID) + autoRegistered := false + autoNote := "" + if wr.AutoWebhook { + ok, note := s.tryAutoRegisterWebhook(r.Context(), wr, webhookURL) + autoRegistered = ok + autoNote = note + if ok { + // Reload so the response reflects the persisted webhook_id. + if fresh, ferr := s.Deps.WorkspaceRepos.GetByID(r.Context(), wr.ID); ferr == nil { + wr = fresh + } + } + } + + resp := map[string]any{ + "repo": workspaceRepoToPayload(wr), + "webhook_url": webhookURL, + "webhook_secret": wr.WebhookSecret, + "auto_registered": autoRegistered, + } + if autoNote != "" { + resp["auto_register_note"] = autoNote + } + writeJSON(w, http.StatusCreated, resp) +} + +// tryAutoRegisterWebhook calls the GitHub API to register a push hook for +// the given repo. Best-effort — failure does NOT roll back the +// workspace_repos row; the operator can rerun manually via the +// webhook-info endpoint. Returns (success, human-readable note). +// +// Public URL is required — without it GitHub would deliver to a path +// that's not reachable. We refuse to attempt the call when PublicBaseURL +// is empty and surface that as the note. 
+func (s *Server) tryAutoRegisterWebhook(ctx context.Context, wr workspacerepos.WorkspaceRepo, deliveryURL string) (bool, string) { + logger := s.Deps.Logger + if !strings.HasPrefix(deliveryURL, "http") { + return false, "CIX_PUBLIC_URL is not set — register the webhook manually" + } + if wr.TokenID == "" { + return false, "auto_webhook=true requires a token_id with admin:repo_hook scope" + } + pat, err := s.Deps.GithubTokens.Reveal(ctx, wr.TokenID) + if err != nil { + if errors.Is(err, githubtokens.ErrNotFound) { + return false, "token_id not found" + } + return false, "could not decrypt the GitHub token" + } + _ = s.Deps.GithubTokens.Touch(ctx, wr.TokenID) + + owner, repo, perr := githubapi.ParseOwnerRepo(wr.GitHubURL) + if perr != nil { + return false, "github_url is not a parseable owner/repo URL" + } + hr, herr := githubapi.New().CreateWebhook(ctx, githubapi.CreateWebhookOptions{ + Owner: owner, + Repo: repo, + PAT: pat, + URL: deliveryURL, + Secret: wr.WebhookSecret, + }) + if herr != nil { + if logger != nil { + logger.Warn("workspaces: auto-register webhook failed", + "repo_id", wr.ID, "owner", owner, "repo", repo, "err", herr) + } + if errors.Is(herr, githubapi.ErrUnauthorized) { + return false, "GitHub rejected the token — add admin:repo_hook scope or register manually" + } + return false, "GitHub API rejected the call: " + herr.Error() + } + if uerr := s.Deps.WorkspaceRepos.SetWebhookID(ctx, wr.ID, hr.ID); uerr != nil && logger != nil { + logger.Warn("workspaces: could not persist webhook id", "repo_id", wr.ID, "err", uerr) + } + return true, "" +} + +// DeleteWorkspaceRepo — DELETE /api/v1/workspaces/{id}/repos/{repo_id}. +func (s *Server) DeleteWorkspaceRepo(w http.ResponseWriter, r *http.Request, id, repoID string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + // Authorisation: a repo only belongs to its workspace, so we also + // require the repo's workspace_id to match. 
Otherwise users could + // detach repos across workspaces by guessing ids. + existing, err := s.Deps.WorkspaceRepos.GetByID(r.Context(), repoID) + if err != nil { + if errors.Is(err, workspacerepos.ErrNotFound) { + writeError(w, http.StatusNotFound, "repo not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load repo") + return + } + if existing.WorkspaceID != id { + writeError(w, http.StatusNotFound, "repo not found") + return + } + if err := s.Deps.WorkspaceRepos.Delete(r.Context(), repoID); err != nil { + if errors.Is(err, workspacerepos.ErrNotFound) { + writeError(w, http.StatusNotFound, "repo not found") + return + } + writeError(w, http.StatusInternalServerError, "could not delete repo") + return + } + w.WriteHeader(http.StatusNoContent) +} + +// ReindexWorkspaceRepo — POST /api/v1/workspaces/{id}/repos/{repo_id}/reindex. +func (s *Server) ReindexWorkspaceRepo(w http.ResponseWriter, r *http.Request, id, repoID string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + wr, err := s.Deps.WorkspaceRepos.GetByID(r.Context(), repoID) + if err != nil { + if errors.Is(err, workspacerepos.ErrNotFound) { + writeError(w, http.StatusNotFound, "repo not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load repo") + return + } + if wr.WorkspaceID != id { + writeError(w, http.StatusNotFound, "repo not found") + return + } + + enqueued := true + if _, eerr := s.Deps.Jobs.Enqueue(r.Context(), jobs.EnqueueRequest{ + Type: workspacejobs.TypeCloneRepo, + DedupeKey: "clone:" + wr.ID, + Payload: workspacejobs.ClonePayload{RepoID: wr.ID}, + }); eerr != nil { + if errors.Is(eerr, jobs.ErrDuplicate) { + enqueued = false + } else { + writeError(w, http.StatusInternalServerError, "could not enqueue reindex") + return + } + } + + status := "enqueued" + if !enqueued { + status = "already_running" + } + writeJSON(w, http.StatusAccepted, map[string]any{ + "status": status, + 
"repo": workspaceRepoToPayload(wr), + }) +} + +// buildWebhookURL constructs the publicly-reachable webhook delivery URL +// for a workspace_repo. When PublicBaseURL is empty (no operator-set +// origin), returns only the path so the dashboard can render it with a +// helper note. +func (s *Server) buildWebhookURL(repoID string) string { + path := "/api/v1/webhooks/github/" + repoID + base := strings.TrimRight(s.Deps.PublicBaseURL, "/") + if base == "" { + return path + } + return base + path +} + +// LinkExistingProject — POST /api/v1/workspaces/{id}/repos/link. +// +// Attaches an already-indexed project to the workspace as a lightweight +// linked row. No clone, no index job, no webhook. The response mirrors +// AddWorkspaceRepo's shape so the dashboard can reuse the same refresh +// pattern; webhook_url + webhook_secret are empty because linked rows +// have no webhook to register. +func (s *Server) LinkExistingProject(w http.ResponseWriter, r *http.Request, id string) { + if s.workspaceReposUnavailable(w) { + return + } + if !s.requireWorkspace(w, r, id) { + return + } + var body openapi.LinkExistingProjectRequest + if err := json.NewDecoder(r.Body).Decode(&body); err != nil { + writeError(w, http.StatusUnprocessableEntity, "invalid JSON body") + return + } + hash := strings.TrimSpace(body.ProjectHash) + if hash == "" { + writeError(w, http.StatusUnprocessableEntity, "project_hash is required") + return + } + + // Resolve the project by hash so we can validate status + extract + // host_path. The same lookup is used by /projects/{path} so the + // behaviour is consistent — 404 for unknown hashes, 422 for known + // but not-yet-indexed projects. 
+ proj, perr := projects.GetByHash(r.Context(), s.Deps.DB, hash) + if perr != nil { + if errors.Is(perr, projects.ErrNotFound) { + writeError(w, http.StatusNotFound, "project not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load project") + return + } + if proj.Status != "indexed" { + writeError(w, http.StatusUnprocessableEntity, + "project is not yet indexed (status="+proj.Status+") — wait for indexing to complete before linking") + return + } + + wr, err := s.Deps.WorkspaceRepos.CreateLink(r.Context(), id, proj.HostPath) + if err != nil { + switch { + case errors.Is(err, workspacerepos.ErrInvalidURL): + writeError(w, http.StatusUnprocessableEntity, + "project host_path is not a github.com/owner/repo@branch — local-path projects cannot be linked") + case errors.Is(err, workspacerepos.ErrBranchEmpty): + writeError(w, http.StatusUnprocessableEntity, "project host_path has no branch suffix") + case errors.Is(err, workspacerepos.ErrDuplicate): + writeError(w, http.StatusConflict, "this repo+branch is already attached to the workspace") + default: + writeError(w, http.StatusInternalServerError, "could not link project") + } + return + } + + // Mirror AddWorkspaceRepo's envelope so the dashboard can decode one + // shape regardless of which create path it called. Linked rows have + // no webhook, so URL/secret are empty. 
+ writeJSON(w, http.StatusCreated, map[string]any{ + "repo": workspaceRepoToPayload(wr), + "webhook_url": "", + "webhook_secret": "", + "auto_registered": false, + }) +} diff --git a/server/internal/httpapi/workspacerepos_test.go b/server/internal/httpapi/workspacerepos_test.go new file mode 100644 index 0000000..cf137ee --- /dev/null +++ b/server/internal/httpapi/workspacerepos_test.go @@ -0,0 +1,558 @@ +package httpapi + +import ( + "context" + "database/sql" + "encoding/json" + "net/http" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/projects" + "github.com/dvcdsys/code-index/server/internal/secrets" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// reposRouter spins up a router with the full workspaces+repos surface +// wired against an in-memory DB. Auth is disabled — the focus here is +// the persistence + enqueue paths. +// +// We deliberately do NOT start the jobs worker pool: we only assert the +// job row landed in the right state. End-to-end clone+index runs against +// real git remotes and the embeddings sidecar — out of scope for unit +// tests. 
+func reposRouter(t *testing.T) (http.Handler, *jobs.Service) { + t.Helper() + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + wsSvc := workspaces.New(d) + ghSvc := githubtokens.New(d, sec) + wrSvc := workspacerepos.New(d) + jobsSvc := jobs.New(d, jobs.Options{Concurrency: 1, PollEvery: time.Hour}) // never poll in tests + + router := NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: wsSvc, + GithubTokens: ghSvc, + WorkspaceRepos: wrSvc, + Jobs: jobsSvc, + PublicBaseURL: "https://cix.example.test", + }) + return router, jobsSvc +} + +func createWS(t *testing.T, router http.Handler, name string) string { + t.Helper() + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces", map[string]any{ + "name": name, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create workspace: %d (%s)", rr.Code, rr.Body.String()) + } + var got workspacePayload + _ = json.Unmarshal(rr.Body.Bytes(), &got) + return got.ID +} + +func TestRepos_AddEnqueuesCloneJob(t *testing.T) { + router, jobsSvc := reposRouter(t) + wsID := createWS(t, router, "platform") + + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://github.com/spf13/cobra", + "branch": "main", + }) + if rr.Code != http.StatusCreated { + t.Fatalf("add repo: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Repo workspaceRepoPayload `json:"repo"` + WebhookURL string `json:"webhook_url"` + WebhookSecret string `json:"webhook_secret"` + } + if err := json.Unmarshal(rr.Body.Bytes(), &resp); err != nil { + 
t.Fatalf("decode: %v", err) + } + if resp.Repo.ProjectPath != "github.com/spf13/cobra@main" { + t.Fatalf("unexpected project_path %q", resp.Repo.ProjectPath) + } + if resp.Repo.Status != workspacerepos.StatusPending { + t.Fatalf("expected status=pending, got %q", resp.Repo.Status) + } + if resp.WebhookSecret == "" { + t.Fatalf("webhook secret should be present in response") + } + if resp.WebhookURL != "https://cix.example.test/api/v1/webhooks/github/"+resp.Repo.ID { + t.Fatalf("webhook URL wrong: %q", resp.WebhookURL) + } + + // Verify the job landed on the queue. + jobList, err := jobsSvc.List(context.Background(), jobs.StatusPending, "clone_repo", 10) + if err != nil { + t.Fatalf("jobs list: %v", err) + } + if len(jobList) != 1 { + t.Fatalf("expected 1 pending clone_repo job, got %d", len(jobList)) + } + if jobList[0].DedupeKey != "clone:"+resp.Repo.ID { + t.Fatalf("unexpected dedupe_key %q", jobList[0].DedupeKey) + } +} + +func TestRepos_DuplicateRejected(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + body := map[string]any{ + "github_url": "https://github.com/a/b", + "branch": "main", + } + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", body) + if rr.Code != http.StatusCreated { + t.Fatalf("first add: %d", rr.Code) + } + rr = doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", body) + if rr.Code != http.StatusConflict { + t.Fatalf("duplicate should 409, got %d", rr.Code) + } +} + +// TestRepos_WebhookModeStored covers the three-state webhook_mode +// introduced for the add-repo UI. The DB column should round-trip the +// chosen mode; the legacy auto_webhook bool is derived (true iff +// mode == "auto") so old API clients keep behaving the same. 
+func TestRepos_WebhookModeStored(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + + cases := []struct { + name string + body map[string]any + wantMode string + wantAutoBool bool + }{ + { + name: "manual explicit", + body: map[string]any{"github_url": "https://github.com/a/manual", "branch": "main", "webhook_mode": "manual"}, + wantMode: "manual", + wantAutoBool: false, + }, + { + name: "auto explicit", + body: map[string]any{"github_url": "https://github.com/a/auto", "branch": "main", "webhook_mode": "auto"}, + wantMode: "auto", + wantAutoBool: true, + }, + { + name: "disabled explicit", + body: map[string]any{"github_url": "https://github.com/a/disabled", "branch": "main", "webhook_mode": "disabled"}, + wantMode: "disabled", + wantAutoBool: false, + }, + { + name: "legacy auto_webhook bool", + body: map[string]any{"github_url": "https://github.com/a/legacy", "branch": "main", "auto_webhook": true}, + wantMode: "auto", + wantAutoBool: true, + }, + { + name: "default when omitted", + body: map[string]any{"github_url": "https://github.com/a/default", "branch": "main"}, + wantMode: "manual", + wantAutoBool: false, + }, + } + for _, tc := range cases { + t.Run(tc.name, func(t *testing.T) { + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", tc.body) + if rr.Code != http.StatusCreated { + t.Fatalf("add: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Repo workspaceRepoPayload `json:"repo"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Repo.WebhookMode != tc.wantMode { + t.Fatalf("webhook_mode = %q, want %q", resp.Repo.WebhookMode, tc.wantMode) + } + if resp.Repo.AutoWebhook != tc.wantAutoBool { + t.Fatalf("auto_webhook = %v, want %v (for mode=%q)", + resp.Repo.AutoWebhook, tc.wantAutoBool, tc.wantMode) + } + }) + } +} + +// TestRepos_WebhookModeRejectsUnknown ensures the DB never receives an +// unknown enum value — the dashboard's three radio buttons are the only +// 
supported inputs. +func TestRepos_WebhookModeRejectsUnknown(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://github.com/a/b", + "branch": "main", + "webhook_mode": "totally-bogus", + }) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 on unknown mode, got %d", rr.Code) + } +} + +func TestRepos_BadURLRejected(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://gitlab.com/x/y", + "branch": "main", + }) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 for non-github URL, got %d", rr.Code) + } +} + +func TestRepos_DeleteCrossWorkspaceForbidden(t *testing.T) { + router, _ := reposRouter(t) + wsA := createWS(t, router, "alpha") + wsB := createWS(t, router, "bravo") + + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsA+"/repos", map[string]any{ + "github_url": "https://github.com/x/y", + "branch": "main", + }) + if rr.Code != http.StatusCreated { + t.Fatalf("add: %d", rr.Code) + } + var resp struct { + Repo workspaceRepoPayload `json:"repo"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + + // Try to delete repo from workspace B — must 404 (don't leak existence). + rr = doJSON(t, router, http.MethodDelete, "/api/v1/workspaces/"+wsB+"/repos/"+resp.Repo.ID, nil) + if rr.Code != http.StatusNotFound { + t.Fatalf("cross-workspace delete should 404, got %d", rr.Code) + } + + // Correct workspace should succeed. 
+ rr = doJSON(t, router, http.MethodDelete, "/api/v1/workspaces/"+wsA+"/repos/"+resp.Repo.ID, nil) + if rr.Code != http.StatusNoContent { + t.Fatalf("delete: %d", rr.Code) + } +} + +func TestRepos_ReindexDedupeCollapsesInFlightJob(t *testing.T) { + router, jobsSvc := reposRouter(t) + wsID := createWS(t, router, "platform") + + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos", map[string]any{ + "github_url": "https://github.com/foo/bar", + "branch": "main", + }) + var created struct { + Repo workspaceRepoPayload `json:"repo"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &created) + + // Add-time already enqueued a clone_repo job — reindex should be + // dedup'd and return status="already_running". + rr = doJSON(t, router, http.MethodPost, "/api/v1/workspaces/"+wsID+"/repos/"+created.Repo.ID+"/reindex", nil) + if rr.Code != http.StatusAccepted { + t.Fatalf("reindex: %d (%s)", rr.Code, rr.Body.String()) + } + var rresp struct { + Status string `json:"status"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &rresp) + if rresp.Status != "already_running" { + t.Fatalf("expected already_running on dedupe, got %q", rresp.Status) + } + + // Exactly one job on the queue still. 
+ all, _ := jobsSvc.List(context.Background(), jobs.StatusPending, "clone_repo", 10) + if len(all) != 1 { + t.Fatalf("expected dedupe to collapse into 1 job, got %d", len(all)) + } +} + +func TestRepos_DisabledFeatureReturns503(t *testing.T) { + router := workspaceRouter(t, false) + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/any/repos", nil) + if rr.Code != http.StatusServiceUnavailable { + t.Fatalf("expected 503, got %d", rr.Code) + } +} + +func TestJobs_ListEndpointFiltersByStatus(t *testing.T) { + router, jobsSvc := reposRouter(t) + ctx := context.Background() + if _, err := jobsSvc.Enqueue(ctx, jobs.EnqueueRequest{Type: "test_a"}); err != nil { + t.Fatalf("enqueue: %v", err) + } + if _, err := jobsSvc.Enqueue(ctx, jobs.EnqueueRequest{Type: "test_b"}); err != nil { + t.Fatalf("enqueue: %v", err) + } + rr := doJSON(t, router, http.MethodGet, "/api/v1/jobs", nil) + if rr.Code != http.StatusOK { + t.Fatalf("jobs list: %d", rr.Code) + } + var lr struct { + Jobs []jobPayload `json:"jobs"` + Total int `json:"total"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &lr) + if lr.Total != 2 { + t.Fatalf("expected 2 jobs, got %d", lr.Total) + } + rr = doJSON(t, router, http.MethodGet, "/api/v1/jobs?type=test_a", nil) + if rr.Code != http.StatusOK { + t.Fatalf("typed list: %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &lr) + if lr.Total != 1 { + t.Fatalf("expected 1 typed job, got %d", lr.Total) + } +} + +// reposRouterWithDB is the same router setup as reposRouter but also +// returns the underlying *sql.DB so link-existing tests can seed +// projects directly. We keep reposRouter signature unchanged so the +// existing call sites in webhooks_test.go and elsewhere stay green. 
+func reposRouterWithDB(t *testing.T) (http.Handler, *jobs.Service, *sql.DB) { + t.Helper() + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + wsSvc := workspaces.New(d) + ghSvc := githubtokens.New(d, sec) + wrSvc := workspacerepos.New(d) + jobsSvc := jobs.New(d, jobs.Options{Concurrency: 1, PollEvery: time.Hour}) + + router := NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: wsSvc, + GithubTokens: ghSvc, + WorkspaceRepos: wrSvc, + Jobs: jobsSvc, + PublicBaseURL: "https://cix.example.test", + }) + return router, jobsSvc, d +} + +// seedIndexedProject creates an indexed project row with the given +// host_path and returns its path_hash. Used by the link-existing tests +// so they don't need to invoke the real cloner+indexer. +func seedIndexedProject(t *testing.T, db *sql.DB, hostPath string) string { + t.Helper() + if _, err := projects.Create(context.Background(), db, projects.CreateRequest{HostPath: hostPath}); err != nil { + t.Fatalf("seed project: %v", err) + } + if _, err := db.Exec( + `UPDATE projects SET status = 'indexed', last_indexed_at = ?, updated_at = ? 
WHERE host_path = ?`, + time.Now().UTC().Format(time.RFC3339Nano), + time.Now().UTC().Format(time.RFC3339Nano), + hostPath, + ); err != nil { + t.Fatalf("mark indexed: %v", err) + } + return projects.HashPath(hostPath) +} + +func TestLinkExistingProject_SkipsCloneJob(t *testing.T) { + router, jobsSvc, d := reposRouterWithDB(t) + wsID := createWS(t, router, "platform") + hash := seedIndexedProject(t, d, "github.com/spf13/cobra@main") + + rr := doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+wsID+"/repos/link", + map[string]any{"project_hash": hash}) + if rr.Code != http.StatusCreated { + t.Fatalf("link: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Repo workspaceRepoPayload `json:"repo"` + WebhookURL string `json:"webhook_url"` + WebhookSecret string `json:"webhook_secret"` + AutoRegistered bool `json:"auto_registered"` + } + if err := json.Unmarshal(rr.Body.Bytes(), &resp); err != nil { + t.Fatalf("decode: %v", err) + } + if !resp.Repo.IsLinked { + t.Fatalf("expected IsLinked=true, got %+v", resp.Repo) + } + if resp.Repo.Status != workspacerepos.StatusIndexed { + t.Fatalf("expected status=indexed, got %q", resp.Repo.Status) + } + if resp.Repo.WebhookMode != workspacerepos.WebhookModeDisabled { + t.Fatalf("expected webhook_mode=disabled, got %q", resp.Repo.WebhookMode) + } + if resp.WebhookURL != "" || resp.WebhookSecret != "" { + t.Fatalf("linked rows must not surface webhook info, got url=%q secret-len=%d", + resp.WebhookURL, len(resp.WebhookSecret)) + } + + // Critical: no clone_repo job should have been enqueued. 
+ jobList, err := jobsSvc.List(context.Background(), jobs.StatusPending, "clone_repo", 10) + if err != nil { + t.Fatalf("jobs list: %v", err) + } + if len(jobList) != 0 { + t.Fatalf("expected 0 clone_repo jobs, got %d (linked rows must not clone)", len(jobList)) + } +} + +func TestLinkExistingProject_409OnDuplicate(t *testing.T) { + router, _, d := reposRouterWithDB(t) + wsID := createWS(t, router, "platform") + hash := seedIndexedProject(t, d, "github.com/foo/bar@main") + + rr := doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+wsID+"/repos/link", + map[string]any{"project_hash": hash}) + if rr.Code != http.StatusCreated { + t.Fatalf("first link: %d (%s)", rr.Code, rr.Body.String()) + } + rr = doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+wsID+"/repos/link", + map[string]any{"project_hash": hash}) + if rr.Code != http.StatusConflict { + t.Fatalf("expected 409 on duplicate link, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestLinkExistingProject_422IfProjectNotIndexed(t *testing.T) { + router, _, d := reposRouterWithDB(t) + wsID := createWS(t, router, "platform") + // Create the project but leave status=created (the default). 
+ hostPath := "github.com/foo/bar@main" + if _, err := projects.Create(context.Background(), d, + projects.CreateRequest{HostPath: hostPath}); err != nil { + t.Fatalf("seed project: %v", err) + } + rr := doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+wsID+"/repos/link", + map[string]any{"project_hash": projects.HashPath(hostPath)}) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestLinkExistingProject_404OnUnknownHash(t *testing.T) { + router, _ := reposRouter(t) + wsID := createWS(t, router, "platform") + rr := doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+wsID+"/repos/link", + map[string]any{"project_hash": "0000000000000000"}) + if rr.Code != http.StatusNotFound { + t.Fatalf("expected 404 for unknown project_hash, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestListProjectWorkspaces_ReturnsAllMemberships(t *testing.T) { + router, _, d := reposRouterWithDB(t) + hash := seedIndexedProject(t, d, "github.com/foo/bar@main") + wsA := createWS(t, router, "alpha") + wsB := createWS(t, router, "beta") + + for _, ws := range []string{wsA, wsB} { + rr := doJSON(t, router, http.MethodPost, + "/api/v1/workspaces/"+ws+"/repos/link", + map[string]any{"project_hash": hash}) + if rr.Code != http.StatusCreated { + t.Fatalf("link to %s: %d (%s)", ws, rr.Code, rr.Body.String()) + } + } + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/projects/"+hash+"/workspaces", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list workspaces: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Workspaces []struct { + WorkspaceID string `json:"workspace_id"` + WorkspaceName string `json:"workspace_name"` + RepoID string `json:"repo_id"` + IsLinked bool `json:"is_linked"` + Status string `json:"status"` + } `json:"workspaces"` + } + if err := json.Unmarshal(rr.Body.Bytes(), &resp); err != nil { + t.Fatalf("decode: %v", err) + } + if len(resp.Workspaces) != 2 { + 
t.Fatalf("expected 2 memberships, got %d (%+v)", len(resp.Workspaces), resp.Workspaces) + } + for _, m := range resp.Workspaces { + if !m.IsLinked { + t.Fatalf("workspace %s membership should be linked", m.WorkspaceName) + } + if m.Status != workspacerepos.StatusIndexed { + t.Fatalf("status should be indexed, got %q", m.Status) + } + } +} + +func TestListProjectWorkspaces_EmptyWhenUnused(t *testing.T) { + router, _, d := reposRouterWithDB(t) + hash := seedIndexedProject(t, d, "github.com/lonely/project@main") + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/projects/"+hash+"/workspaces", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list workspaces: %d (%s)", rr.Code, rr.Body.String()) + } + var resp struct { + Workspaces []any `json:"workspaces"` + } + if err := json.Unmarshal(rr.Body.Bytes(), &resp); err != nil { + t.Fatalf("decode: %v", err) + } + if len(resp.Workspaces) != 0 { + t.Fatalf("expected empty list, got %d", len(resp.Workspaces)) + } +} + +func TestListProjectWorkspaces_404OnUnknownHash(t *testing.T) { + router, _ := reposRouter(t) + rr := doJSON(t, router, http.MethodGet, + "/api/v1/projects/0000000000000000/workspaces", nil) + if rr.Code != http.StatusNotFound { + t.Fatalf("expected 404, got %d (%s)", rr.Code, rr.Body.String()) + } +} diff --git a/server/internal/httpapi/workspaces.go b/server/internal/httpapi/workspaces.go new file mode 100644 index 0000000..3d9d0e3 --- /dev/null +++ b/server/internal/httpapi/workspaces.go @@ -0,0 +1,152 @@ +package httpapi + +import ( + "encoding/json" + "errors" + "net/http" + "time" + + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// workspacePayload is the JSON shape sent back to clients. It mirrors the +// generated openapi.Workspace exactly (no oneOf magic), keeping the surface +// stable across regeneration cycles. 
+type workspacePayload struct { + ID string `json:"id"` + Name string `json:"name"` + Description string `json:"description"` + CreatedAt time.Time `json:"created_at"` + UpdatedAt time.Time `json:"updated_at"` +} + +func workspaceToPayload(w workspaces.Workspace) workspacePayload { + return workspacePayload{ + ID: w.ID, + Name: w.Name, + Description: w.Description, + CreatedAt: w.CreatedAt, + UpdatedAt: w.UpdatedAt, + } +} + +// workspacesUnavailable returns 503 when the feature flag is off OR the +// service is nil. Single source for the message so the dashboard's +// "feature off" UI key is stable. +func (s *Server) workspacesUnavailable(w http.ResponseWriter) bool { + if !s.Deps.WorkspacesEnabled || s.Deps.Workspaces == nil { + writeError(w, http.StatusServiceUnavailable, "workspaces feature is disabled (set CIX_WORKSPACES_ENABLED=true and restart)") + return true + } + return false +} + +// ListWorkspaces — GET /api/v1/workspaces. +func (s *Server) ListWorkspaces(w http.ResponseWriter, r *http.Request) { + if s.workspacesUnavailable(w) { + return + } + list, err := s.Deps.Workspaces.List(r.Context()) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not list workspaces") + return + } + out := make([]workspacePayload, 0, len(list)) + for _, ws := range list { + out = append(out, workspaceToPayload(ws)) + } + writeJSON(w, http.StatusOK, map[string]any{ + "workspaces": out, + "total": len(out), + }) +} + +// CreateWorkspace — POST /api/v1/workspaces. 
+func (s *Server) CreateWorkspace(w http.ResponseWriter, r *http.Request) { + if s.workspacesUnavailable(w) { + return + } + var body openapi.CreateWorkspaceRequest + if err := json.NewDecoder(r.Body).Decode(&body); err != nil { + writeError(w, http.StatusUnprocessableEntity, "invalid JSON body") + return + } + description := "" + if body.Description != nil { + description = *body.Description + } + ws, err := s.Deps.Workspaces.Create(r.Context(), body.Name, description) + if err != nil { + switch { + case errors.Is(err, workspaces.ErrNameEmpty): + writeError(w, http.StatusUnprocessableEntity, "name is required") + case errors.Is(err, workspaces.ErrNameTaken): + writeError(w, http.StatusConflict, "workspace name already exists") + default: + writeError(w, http.StatusInternalServerError, "could not create workspace") + } + return + } + writeJSON(w, http.StatusCreated, workspaceToPayload(ws)) +} + +// GetWorkspace — GET /api/v1/workspaces/{id}. +func (s *Server) GetWorkspace(w http.ResponseWriter, r *http.Request, id string) { + if s.workspacesUnavailable(w) { + return + } + ws, err := s.Deps.Workspaces.GetByID(r.Context(), id) + if err != nil { + if errors.Is(err, workspaces.ErrNotFound) { + writeError(w, http.StatusNotFound, "workspace not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load workspace") + return + } + writeJSON(w, http.StatusOK, workspaceToPayload(ws)) +} + +// UpdateWorkspace — PATCH /api/v1/workspaces/{id}. 
+func (s *Server) UpdateWorkspace(w http.ResponseWriter, r *http.Request, id string) { + if s.workspacesUnavailable(w) { + return + } + var body openapi.UpdateWorkspaceRequest + if err := json.NewDecoder(r.Body).Decode(&body); err != nil { + writeError(w, http.StatusUnprocessableEntity, "invalid JSON body") + return + } + ws, err := s.Deps.Workspaces.Update(r.Context(), id, body.Name, body.Description) + if err != nil { + switch { + case errors.Is(err, workspaces.ErrNotFound): + writeError(w, http.StatusNotFound, "workspace not found") + case errors.Is(err, workspaces.ErrNameEmpty): + writeError(w, http.StatusUnprocessableEntity, "name is required") + case errors.Is(err, workspaces.ErrNameTaken): + writeError(w, http.StatusConflict, "workspace name already exists") + default: + writeError(w, http.StatusInternalServerError, "could not update workspace") + } + return + } + writeJSON(w, http.StatusOK, workspaceToPayload(ws)) +} + +// DeleteWorkspace — DELETE /api/v1/workspaces/{id}. +func (s *Server) DeleteWorkspace(w http.ResponseWriter, r *http.Request, id string) { + if s.workspacesUnavailable(w) { + return + } + if err := s.Deps.Workspaces.Delete(r.Context(), id); err != nil { + if errors.Is(err, workspaces.ErrNotFound) { + writeError(w, http.StatusNotFound, "workspace not found") + return + } + writeError(w, http.StatusInternalServerError, "could not delete workspace") + return + } + w.WriteHeader(http.StatusNoContent) +} diff --git a/server/internal/httpapi/workspaces_test.go b/server/internal/httpapi/workspaces_test.go new file mode 100644 index 0000000..efad306 --- /dev/null +++ b/server/internal/httpapi/workspaces_test.go @@ -0,0 +1,506 @@ +package httpapi + +import ( + "bytes" + "encoding/json" + "net/http" + "net/http/httptest" + "testing" + + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/secrets" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// fakeGithubAPIScopes is the 
comma-separated X-OAuth-Scopes value the +// in-test GitHub stub returns from GET /user. Tests that need to check +// scope-from-header propagation read this constant. +const fakeGithubAPIScopes = "repo, admin:repo_hook" + +// fakeGithubAPI returns the base URL of an httptest server that +// answers GET /user with 200 + a stable X-OAuth-Scopes header — the +// minimum the token-creation handler needs to think a PAT is valid. +// Exposed so individual tests can swap in different responses (e.g. a +// 401 to exercise the rejection path) by overriding Deps.GithubAPIBaseURL. +func fakeGithubAPI(t *testing.T) string { + t.Helper() + srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + if r.URL.Path != "/user" { + http.Error(w, "not found", http.StatusNotFound) + return + } + w.Header().Set("X-OAuth-Scopes", fakeGithubAPIScopes) + w.Header().Set("Content-Type", "application/json") + _, _ = w.Write([]byte(`{"login": "test-user"}`)) + })) + t.Cleanup(srv.Close) + return srv.URL +} + +// workspaceRouter spins up a chi router with auth disabled, workspaces +// enabled, and an in-memory backing store. Helpers stay tight; the +// existing dbOpenMemory + seedless* shims live in auth_test.go. +// +// The token-creation handler now calls GET /user to validate the PAT +// and read X-OAuth-Scopes. Tests get a deterministic stub via +// fakeGithubAPI so we don't hit the real api.github.com. 
+func workspaceRouter(t *testing.T, enabled bool) http.Handler { + t.Helper() + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + var ( + wsSvc *workspaces.Service + ghSvc *githubtokens.Service + ) + if enabled { + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + wsSvc = workspaces.New(d) + ghSvc = githubtokens.New(d, sec) + } + + return NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + Logger: nil, + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: enabled, + Workspaces: wsSvc, + GithubTokens: ghSvc, + GithubAPIBaseURL: fakeGithubAPI(t), + }) +} + +func doJSON(t *testing.T, router http.Handler, method, path string, body any) *httptest.ResponseRecorder { + t.Helper() + var reader *bytes.Reader + if body != nil { + b, _ := json.Marshal(body) + reader = bytes.NewReader(b) + } else { + reader = bytes.NewReader(nil) + } + req := httptest.NewRequest(method, path, reader) + if body != nil { + req.Header.Set("Content-Type", "application/json") + } + rr := httptest.NewRecorder() + router.ServeHTTP(rr, req) + return rr +} + +func TestWorkspaces_DisabledByDefault(t *testing.T) { + router := workspaceRouter(t, false) + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces", nil) + if rr.Code != http.StatusServiceUnavailable { + t.Fatalf("expected 503 when feature disabled, got %d (body: %s)", rr.Code, rr.Body.String()) + } +} + +func TestWorkspaces_CRUD(t *testing.T) { + router := workspaceRouter(t, true) + + // Create + rr := doJSON(t, router, http.MethodPost, "/api/v1/workspaces", map[string]any{ + "name": "platform", + "description": "microservices", + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create: expected 201, got %d (%s)", rr.Code, 
rr.Body.String()) + } + var created workspacePayload + if err := json.Unmarshal(rr.Body.Bytes(), &created); err != nil { + t.Fatalf("decode created: %v", err) + } + if created.ID == "" || created.Name != "platform" || created.Description != "microservices" { + t.Fatalf("unexpected created payload: %+v", created) + } + + // Duplicate name → 409 + rr = doJSON(t, router, http.MethodPost, "/api/v1/workspaces", map[string]any{"name": "platform"}) + if rr.Code != http.StatusConflict { + t.Fatalf("duplicate: expected 409, got %d", rr.Code) + } + + // Get + rr = doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+created.ID, nil) + if rr.Code != http.StatusOK { + t.Fatalf("get: expected 200, got %d", rr.Code) + } + + // List + rr = doJSON(t, router, http.MethodGet, "/api/v1/workspaces", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list: expected 200, got %d", rr.Code) + } + var listResp struct { + Workspaces []workspacePayload `json:"workspaces"` + Total int `json:"total"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &listResp) + if listResp.Total != 1 || len(listResp.Workspaces) != 1 { + t.Fatalf("list mismatch: %+v", listResp) + } + + // Patch + rr = doJSON(t, router, http.MethodPatch, "/api/v1/workspaces/"+created.ID, map[string]any{ + "description": "renamed", + }) + if rr.Code != http.StatusOK { + t.Fatalf("patch: expected 200, got %d", rr.Code) + } + var patched workspacePayload + _ = json.Unmarshal(rr.Body.Bytes(), &patched) + if patched.Description != "renamed" || patched.Name != "platform" { + t.Fatalf("patch did not apply: %+v", patched) + } + + // Delete + rr = doJSON(t, router, http.MethodDelete, "/api/v1/workspaces/"+created.ID, nil) + if rr.Code != http.StatusNoContent { + t.Fatalf("delete: expected 204, got %d", rr.Code) + } + // Second delete → 404 + rr = doJSON(t, router, http.MethodDelete, "/api/v1/workspaces/"+created.ID, nil) + if rr.Code != http.StatusNotFound { + t.Fatalf("delete-twice: expected 404, got %d", rr.Code) + } +} + +func 
TestGithubTokens_CRUD_PlaintextNotEchoed(t *testing.T) { + router := workspaceRouter(t, true) + + const secret = "ghp_super_secret_test_value_donotleak" + // User-supplied scopes in the body are deliberately wrong here; + // the server must ignore them and use what the (stubbed) GitHub + // API advertises via X-OAuth-Scopes. + rr := doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{ + "name": "personal", + "token": secret, + "scopes": []string{"deliberately-wrong-scope"}, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create: expected 201, got %d (%s)", rr.Code, rr.Body.String()) + } + if bytes.Contains(rr.Body.Bytes(), []byte(secret)) { + t.Fatalf("CRITICAL: plaintext leaked in POST response body: %s", rr.Body.String()) + } + var created githubTokenPayload + _ = json.Unmarshal(rr.Body.Bytes(), &created) + if created.ID == "" || created.Name != "personal" { + t.Fatalf("unexpected payload: %+v", created) + } + // Scopes must come from the stub's X-OAuth-Scopes header, + // not from the request body — that's the whole point of the + // validate-against-GitHub flow. + if len(created.Scopes) != 2 || + created.Scopes[0] != "repo" || + created.Scopes[1] != "admin:repo_hook" { + t.Fatalf("expected scopes from X-OAuth-Scopes header, got %v", created.Scopes) + } + + // List must not contain plaintext anywhere. + rr = doJSON(t, router, http.MethodGet, "/api/v1/github-tokens", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list: expected 200, got %d", rr.Code) + } + if bytes.Contains(rr.Body.Bytes(), []byte(secret)) { + t.Fatalf("CRITICAL: plaintext leaked in GET list body: %s", rr.Body.String()) + } + + // Delete. 
+ rr = doJSON(t, router, http.MethodDelete, "/api/v1/github-tokens/"+created.ID, nil) + if rr.Code != http.StatusNoContent { + t.Fatalf("delete: expected 204, got %d", rr.Code) + } +} + +// TestGithubTokens_RejectInvalidToken — when GitHub answers 401 we must +// surface a 422 with a clear message rather than persisting an unusable +// token. Exercised with a one-off stub that always rejects. +func TestGithubTokens_RejectInvalidToken(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { + w.WriteHeader(http.StatusUnauthorized) + _, _ = w.Write([]byte(`{"message": "Bad credentials"}`)) + })) + t.Cleanup(stub.Close) + + router := NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: workspaces.New(d), + GithubTokens: githubtokens.New(d, sec), + GithubAPIBaseURL: stub.URL, + }) + + rr := doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{ + "name": "personal", + "token": "ghp_bad", + }) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 on invalid token, got %d (%s)", rr.Code, rr.Body.String()) + } + if !bytes.Contains(rr.Body.Bytes(), []byte("Bad credentials")) { + t.Fatalf("error body should surface GitHub message, got %s", rr.Body.String()) + } +} + +// TestGithubTokens_ListRepos exercises the new add-repo flow's first +// step: the dashboard fetches the repos visible to a stored PAT so it +// can render the repo picker. 
Validates that: +// - the PAT is never echoed in the response +// - the X-OAuth-Scopes-validated token survives long enough for the +// subsequent /repos call to use it +// - the optional q= filter is applied server-side +func TestGithubTokens_ListRepos(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + + // Combined stub serves /user (for token validation) and /user/repos + // (for the new endpoint). Two repos returned so the q= filter test + // has something to discriminate. + stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + switch r.URL.Path { + case "/user": + w.Header().Set("X-OAuth-Scopes", "repo") + _, _ = w.Write([]byte(`{"login": "alice"}`)) + case "/user/repos": + _, _ = w.Write([]byte(`[ + {"full_name":"alice/services","default_branch":"main","private":true,"html_url":"https://github.com/alice/services"}, + {"full_name":"alice/docs","default_branch":"main","private":false,"html_url":"https://github.com/alice/docs"} + ]`)) + default: + http.Error(w, "unexpected: "+r.URL.Path, http.StatusNotFound) + } + })) + t.Cleanup(stub.Close) + + router := NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: workspaces.New(d), + GithubTokens: githubtokens.New(d, sec), + GithubAPIBaseURL: stub.URL, + }) + + // Create the token so we have an id to address. 
+ const secret = "ghp_secret_value" + rr := doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{ + "name": "personal", + "token": secret, + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create token: expected 201, got %d (%s)", rr.Code, rr.Body.String()) + } + var created githubTokenPayload + _ = json.Unmarshal(rr.Body.Bytes(), &created) + + // Unfiltered list — two repos. + rr = doJSON(t, router, http.MethodGet, "/api/v1/github-tokens/"+created.ID+"/repos", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list repos: expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + if bytes.Contains(rr.Body.Bytes(), []byte(secret)) { + t.Fatalf("CRITICAL: PAT plaintext leaked in repos list body") + } + var allResp struct { + Repos []struct { + FullName string `json:"full_name"` + DefaultBranch string `json:"default_branch"` + Private bool `json:"private"` + } `json:"repos"` + Total int `json:"total"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &allResp) + if allResp.Total != 2 { + t.Fatalf("expected 2 repos, got %d (%s)", allResp.Total, rr.Body.String()) + } + + // Filtered list (q=docs) — server applies the substring filter. + rr = doJSON(t, router, http.MethodGet, "/api/v1/github-tokens/"+created.ID+"/repos?q=docs", nil) + if rr.Code != http.StatusOK { + t.Fatalf("filtered: expected 200, got %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &allResp) + if allResp.Total != 1 || allResp.Repos[0].FullName != "alice/docs" { + t.Fatalf("expected only alice/docs, got %+v", allResp) + } +} + +// TestGithubTokens_ListAccountsAndScopedRepos covers the new +// add-repo flow: the dashboard fetches the accounts visible to a PAT, +// then asks for repos scoped to a specific account. Both paths must +// keep the PAT plaintext server-side and use the right GitHub endpoint. 
+func TestGithubTokens_ListAccountsAndScopedRepos(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("open secrets: %v", err) + } + + // Records which path GitHub was hit on so the test can assert + // account-scoped requests reach the right endpoint. + var hitPath string + stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { + hitPath = r.URL.Path + switch r.URL.Path { + case "/user": + w.Header().Set("X-OAuth-Scopes", "repo") + _, _ = w.Write([]byte(`{"login":"alice"}`)) + case "/orgs/acme/repos": + _, _ = w.Write([]byte(`[{"full_name":"acme/api","default_branch":"main","private":true,"html_url":"https://github.com/acme/api","owner":{"login":"acme","type":"Organization"}}]`)) + case "/users/alice/repos": + _, _ = w.Write([]byte(`[{"full_name":"alice/dotfiles","default_branch":"main","private":false,"html_url":"https://github.com/alice/dotfiles","owner":{"login":"alice","type":"User"}}]`)) + case "/user/repos": + // /user/repos is the new source-of-truth for ListAccounts. + // owner.type tells the dashboard whether to render this as + // a user or org account in the dropdown. 
+ _, _ = w.Write([]byte(`[ + {"full_name":"alice/personal","default_branch":"main","private":false,"html_url":"https://github.com/alice/personal","owner":{"login":"alice","type":"User"}}, + {"full_name":"acme/shared","default_branch":"main","private":true,"html_url":"https://github.com/acme/shared","owner":{"login":"acme","type":"Organization"}} + ]`)) + default: + http.Error(w, "unexpected: "+r.URL.Path, http.StatusNotFound) + } + })) + t.Cleanup(stub.Close) + + router := NewRouter(Deps{ + DB: d, + ServerVersion: "test", + APIVersion: "v1", + Backend: "go", + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: workspaces.New(d), + GithubTokens: githubtokens.New(d, sec), + GithubAPIBaseURL: stub.URL, + }) + + // Create token. + rr := doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{ + "name": "personal", "token": "ghp_x", + }) + if rr.Code != http.StatusCreated { + t.Fatalf("create token: %d (%s)", rr.Code, rr.Body.String()) + } + var created githubTokenPayload + _ = json.Unmarshal(rr.Body.Bytes(), &created) + + // List accounts. + rr = doJSON(t, router, http.MethodGet, "/api/v1/github-tokens/"+created.ID+"/accounts", nil) + if rr.Code != http.StatusOK { + t.Fatalf("list accounts: %d (%s)", rr.Code, rr.Body.String()) + } + var accResp struct { + Accounts []struct { + Login string `json:"login"` + Type string `json:"type"` + } `json:"accounts"` + Total int `json:"total"` + } + _ = json.Unmarshal(rr.Body.Bytes(), &accResp) + if accResp.Total != 2 || + accResp.Accounts[0].Login != "alice" || accResp.Accounts[0].Type != "user" || + accResp.Accounts[1].Login != "acme" || accResp.Accounts[1].Type != "org" { + t.Fatalf("unexpected accounts payload: %+v", accResp) + } + + // Account-scoped repos (org). 
+ rr = doJSON(t, router, http.MethodGet, + "/api/v1/github-tokens/"+created.ID+"/repos?account=acme&account_type=org", nil) + if rr.Code != http.StatusOK { + t.Fatalf("scoped org repos: %d", rr.Code) + } + if hitPath != "/orgs/acme/repos" { + t.Fatalf("expected GitHub /orgs/acme/repos hit, got %q", hitPath) + } + + // Account-scoped repos (user). + rr = doJSON(t, router, http.MethodGet, + "/api/v1/github-tokens/"+created.ID+"/repos?account=alice&account_type=user", nil) + if rr.Code != http.StatusOK { + t.Fatalf("scoped user repos: %d", rr.Code) + } + if hitPath != "/users/alice/repos" { + t.Fatalf("expected GitHub /users/alice/repos hit, got %q", hitPath) + } + + // No account → legacy aggregated /user/repos. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/github-tokens/"+created.ID+"/repos", nil) + if rr.Code != http.StatusOK { + t.Fatalf("unscoped: %d", rr.Code) + } + if hitPath != "/user/repos" { + t.Fatalf("expected GitHub /user/repos hit, got %q", hitPath) + } + + // account without account_type → 422. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/github-tokens/"+created.ID+"/repos?account=acme", nil) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("missing account_type should 422, got %d", rr.Code) + } +} + +func TestGithubTokens_RejectMissingFields(t *testing.T) { + router := workspaceRouter(t, true) + + // Missing token value. + rr := doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{"name": "x"}) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 on missing token, got %d", rr.Code) + } + // Missing name. 
+ rr = doJSON(t, router, http.MethodPost, "/api/v1/github-tokens", map[string]any{"token": "y"}) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 on missing name, got %d", rr.Code) + } +} diff --git a/server/internal/httpapi/workspacesearch.go b/server/internal/httpapi/workspacesearch.go new file mode 100644 index 0000000..9a2518a --- /dev/null +++ b/server/internal/httpapi/workspacesearch.go @@ -0,0 +1,751 @@ +package httpapi + +import ( + "context" + "errors" + "math" + "net/http" + "runtime" + "sort" + "strconv" + "sync" + + "golang.org/x/sync/errgroup" + + "github.com/dvcdsys/code-index/server/internal/chunksfts" + "github.com/dvcdsys/code-index/server/internal/httpapi/openapi" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// Tuning constants for the hybrid workspace search. +// +// - perProjectLimit / bm25Limit: per-side retrieval depth per project. +// 50 leaves room for RRF fusion to differentiate the top candidates +// without making rare-but-real later hits unreachable. +// - topNPerProject: how many of a project's strongest hits feed into +// the per-side aggregate signal used for candidacy. +// - topProjectsDefault: default "Top projects" panel size. +// - perProjectChunkCap: max chunks from any single project in the +// final flat chunks list. Prevents the dominant repo from eating +// every slot. +// - alpha: weight of the BM25 (sparse) signal in the candidacy +// blend. 0.5 = equal weighting. BM25 carries the project-gating +// signal (a project with zero literal token matches is a strong +// "irrelevant" cue dense alone can't produce) so we don't tilt +// toward dense even when scores look more authoritative there. +// - relativeProjThreshold: surviving projects must score ≥ best * +// this fraction. Relative-not-absolute so the gate stays useful +// across queries of varying strength. +// - rrfK: standard RRF constant from Cormack 2009. 
60 is the +// widely-used default — small enough that rank-1 dominates, +// large enough that ranks 5-10 still contribute. +const ( + workspaceSearchPerProjectLimit = 50 + workspaceSearchBM25Limit = 50 + workspaceSearchTopNPerProject = 5 + workspaceSearchTopProjects = 10 + workspaceSearchPerProjChunkCap = 5 + workspaceSearchAlpha = 0.5 + workspaceSearchProjThreshold = 0.4 + rrfK = 60 +) + +// workspaceSearchProjectPayload mirrors WorkspaceSearchProject from +// the OpenAPI spec. Hand-rolled so the JSON shape stays plain Go +// types rather than the generated alias indirection. +type workspaceSearchProjectPayload struct { + ProjectPath string `json:"project_path"` + Label string `json:"label"` + ProjectScore float32 `json:"project_score"` + NumHits int `json:"num_hits"` + // BM25Score and DenseScore are the per-signal aggregates that + // feed into ProjectScore. Surfaced so the dashboard can show + // "this repo ranked high because BM25 matched literal tokens" vs. + // "ranked on dense semantic similarity only". + BM25Score float32 `json:"bm25_score"` + DenseScore float32 `json:"dense_score"` +} + +type workspaceSearchChunkPayload struct { + ProjectPath string `json:"project_path"` + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + SymbolName string `json:"symbol_name,omitempty"` + Language string `json:"language,omitempty"` + Score float32 `json:"score"` + Content string `json:"content"` +} + +type workspaceSearchPendingRepoPayload struct { + ProjectPath string `json:"project_path"` + Status string `json:"status"` +} + +type workspaceSearchFailedRepoPayload struct { + ProjectPath string `json:"project_path"` + Reason string `json:"reason"` +} + +// workspaceSearchStaleFTSRepoPayload reports a repo that was indexed +// before the chunks_fts mirror existed: dense search works, BM25 +// returns nothing for it, hybrid degrades to pure-dense for that one +// entry. 
Dashboard renders a banner telling the operator to reindex. +type workspaceSearchStaleFTSRepoPayload struct { + ProjectPath string `json:"project_path"` +} + +// projectHits is the per-project intermediate state accumulated across +// the parallel fan-out. Dense and BM25 sides arrive separately and are +// fused inside the goroutine before being collected. +type projectHits struct { + ProjectPath string + // FusedChunks are the per-project chunks ranked by RRF over the + // dense + BM25 lists. Highest fused rank first. + FusedChunks []workspaceSearchChunkPayload + // DenseSignal is the mean of the top-N dense scores in the + // project (cosine, [0,1]). + DenseSignal float32 + // BM25Signal is the mean of the top-N BM25 scores in the project + // (positive, unbounded — SQLite's bm25() flipped via -bm25 at + // the chunksfts boundary). Scaled by the per-query maximum + // (raw/max) before being blended into candidacy. + BM25Signal float32 + // Candidacy is the α-blended, per-query-normalized score the + // projects panel ranks by; computed once every project's + // fan-out has completed so the normalization sees all candidates. + Candidacy float32 +} + +// WorkspaceSearch — GET /api/v1/workspaces/{id}/search. +// +// Hybrid BM25+dense fan-out. Each project runs two queries in +// parallel: dense (chromem cosine) and sparse (SQLite FTS5 BM25 over +// chunks_fts). Per project, the two ranked lists are fused via +// Reciprocal Rank Fusion. Across projects, an α-blended candidacy +// score (each signal scaled by its per-query maximum) plus +// a relative threshold (`candidacy ≥ best × 0.4`) keeps the result +// set focused on repos that actually share vocabulary or semantics +// with the query — pure-dense fan-out leaked every workspace repo at +// noise-level cosine similarity, since chromem returns the N nearest +// vectors regardless of how far away "nearest" actually is. 
+// +// Observed pre-hybrid: in a workspace of N repos, the repos that +// contained zero literal mentions of the query term still surfaced +// 50 chunks each at noise-level cosine (0.17-0.27). With hybrid + +// project threshold those repos drop out, restoring the cross-project +// signal the user needs to scope an agent's follow-up search. +func (s *Server) WorkspaceSearch(w http.ResponseWriter, r *http.Request, id string, params openapi.WorkspaceSearchParams) { + if s.workspaceReposUnavailable(w) { + return + } + if s.Deps.VectorStore == nil || s.Deps.EmbeddingSvc == nil { + writeError(w, http.StatusServiceUnavailable, + "embeddings or vectorstore not configured — workspace search requires both") + return + } + if _, err := s.Deps.Workspaces.GetByID(r.Context(), id); err != nil { + if errors.Is(err, workspaces.ErrNotFound) { + writeError(w, http.StatusNotFound, "workspace not found") + return + } + writeError(w, http.StatusInternalServerError, "could not load workspace") + return + } + + if params.Q == "" { + writeError(w, http.StatusUnprocessableEntity, "q is required") + return + } + topProjects := clampInt(params.TopProjects, workspaceSearchTopProjects, 1, 50) + topChunks := clampInt(params.TopChunks, 20, 1, 200) + // Default 0.4 matches per-project SemanticSearch default so a query + // that returns nothing from a single project doesn't surface a wall + // of weak-cosine noise when broadcast across the workspace. Cross- + // project sweeps that want long-tail recall must pass min_score=0 + // explicitly. 
+ minScore := clampFloat32(params.MinScore, 0.4, 0, 1) + + queryEmbedding, err := s.Deps.EmbeddingSvc.EmbedQuery(r.Context(), params.Q) + if err != nil { + writeError(w, http.StatusServiceUnavailable, "could not embed query: "+err.Error()) + return + } + if len(queryEmbedding) == 0 { + writeError(w, http.StatusServiceUnavailable, "embedder returned empty vector") + return + } + + repos, err := s.Deps.WorkspaceRepos.ListByWorkspace(r.Context(), id) + if err != nil { + writeError(w, http.StatusInternalServerError, "could not load workspace repos: "+err.Error()) + return + } + if len(repos) == 0 { + writeJSON(w, http.StatusOK, workspaceSearchResponse( + "empty", + []workspaceSearchProjectPayload{}, + []workspaceSearchChunkPayload{}, + nil, + nil, + nil, + )) + return + } + + seenProjects := make(map[string]struct{}, len(repos)) + projectPaths := make([]string, 0, len(repos)) + pendingRepos := make([]workspaceSearchPendingRepoPayload, 0) + for _, rp := range repos { + if rp.Status != workspacerepos.StatusIndexed { + pendingRepos = append(pendingRepos, workspaceSearchPendingRepoPayload{ + ProjectPath: rp.ProjectPath, + Status: rp.Status, + }) + continue + } + if _, ok := seenProjects[rp.ProjectPath]; ok { + continue + } + seenProjects[rp.ProjectPath] = struct{}{} + projectPaths = append(projectPaths, rp.ProjectPath) + } + + if len(projectPaths) == 0 { + writeJSON(w, http.StatusOK, workspaceSearchResponse( + "empty", + []workspaceSearchProjectPayload{}, + []workspaceSearchChunkPayload{}, + pendingRepos, + nil, + nil, + )) + return + } + + // Detect repos that were indexed before chunks_fts existed: + // file_hashes has rows for them (so they're "indexed") but + // chunks_meta is empty, meaning the BM25 side is permanently 0 + // until a reindex backfills it. 
We still run the search (dense + // works) but surface the list so the dashboard can prompt for a + // reindex — otherwise the operator sees no observable difference + // from the pre-hybrid algorithm and assumes the change didn't + // take effect. + staleRepos := s.detectStaleFTSRepos(r.Context(), projectPaths) + + hits, failedRepos, err := s.fanOutHybrid(r.Context(), id, projectPaths, params.Q, queryEmbedding, minScore) + if err != nil { + writeError(w, http.StatusInternalServerError, "fan-out search failed: "+err.Error()) + return + } + + // Per-query max normalization (raw/max) on each signal + // independently, then α-blend. Both signals are >= 0; dividing by + // the max instead of using strict min-max, (raw-min)/(max-min), + // means a project at 60% of best keeps a normalized signal of 0.6. + // Strict min-max would project the weakest project to 0 even when + // it is close to the best. + var bm25Max, denseMax float32 + for _, ph := range hits { + if ph.BM25Signal > bm25Max { + bm25Max = ph.BM25Signal + } + if ph.DenseSignal > denseMax { + denseMax = ph.DenseSignal + } + } + for i := range hits { + var bm25Norm, denseNorm float32 + if bm25Max > 0 { + bm25Norm = hits[i].BM25Signal / bm25Max + } + if denseMax > 0 { + denseNorm = hits[i].DenseSignal / denseMax + } + hits[i].Candidacy = workspaceSearchAlpha*bm25Norm + (1-workspaceSearchAlpha)*denseNorm + } + + var bestCand float32 + for _, ph := range hits { + if ph.Candidacy > bestCand { + bestCand = ph.Candidacy + } + } + threshold := bestCand * workspaceSearchProjThreshold + surviving := make([]projectHits, 0, len(hits)) + for _, ph := range hits { + // A project with zero chunks contributes nothing regardless of + // candidacy — keeping the entry would create a row in the + // projects panel with num_hits=0 which is just visual noise. 
+ if len(ph.FusedChunks) == 0 { + continue + } + if ph.Candidacy < threshold || ph.Candidacy <= 0 { + continue + } + surviving = append(surviving, ph) + } + + if len(surviving) == 0 { + status := "empty" + if len(failedRepos) > 0 { + status = "partial_failure" + } + writeJSON(w, http.StatusOK, workspaceSearchResponse( + status, + []workspaceSearchProjectPayload{}, + []workspaceSearchChunkPayload{}, + pendingRepos, + failedRepos, + staleRepos, + )) + return + } + + // Build the projects panel + the flat chunk list. Per-project cap + // is applied to each project's fused chunk list so one dominant + // repo can't take every slot in the round-robin interleave below; + // the projects panel sees every surviving project (its num_hits + // reflects the post-cap count so the UI doesn't dangle a "10 + // hits" badge against a chunk list with 5 entries). + for i := range surviving { + if len(surviving[i].FusedChunks) > workspaceSearchPerProjChunkCap { + surviving[i].FusedChunks = surviving[i].FusedChunks[:workspaceSearchPerProjChunkCap] + } + } + + projectPayloads := make([]workspaceSearchProjectPayload, 0, len(surviving)) + for _, ph := range surviving { + projectPayloads = append(projectPayloads, workspaceSearchProjectPayload{ + ProjectPath: ph.ProjectPath, + Label: projectLabel(ph.ProjectPath), + ProjectScore: round4(ph.Candidacy), + NumHits: len(ph.FusedChunks), + BM25Score: round4(ph.BM25Signal), + DenseScore: round4(ph.DenseSignal), + }) + } + + sort.SliceStable(projectPayloads, func(i, j int) bool { + return projectPayloads[i].ProjectScore > projectPayloads[j].ProjectScore + }) + if len(projectPayloads) > topProjects { + projectPayloads = projectPayloads[:topProjects] + } + + // Restrict the interleave to projects that survived the panel + // truncation. 
Otherwise a workspace with > top_projects surviving + // repos can surface chunks whose project_path is absent from + // projects[] — agents lose access to bm25_score/dense_score and + // the response looks inconsistent. Filter to the panel before + // round-robin. + panelSet := make(map[string]struct{}, len(projectPayloads)) + for _, p := range projectPayloads { + panelSet[p.ProjectPath] = struct{}{} + } + panelSurviving := make([]projectHits, 0, len(projectPayloads)) + for _, ph := range surviving { + if _, ok := panelSet[ph.ProjectPath]; ok { + panelSurviving = append(panelSurviving, ph) + } + } + + // Round-robin across surviving projects so rank-1 from each + // project lands in the first N slots, then rank-2, etc. This + // gives every surviving repo a chance to surface its top chunk + // before any repo's tail entries appear — matches the project- + // picker use case where the user wants to see each project's + // most-relevant hit before diving into the dominant repo's tail. + merged := interleaveByRank(panelSurviving, topChunks) + + status := "ok" + if len(merged) == 0 { + status = "empty" + if len(failedRepos) > 0 { + status = "partial_failure" + } + } + writeJSON(w, http.StatusOK, workspaceSearchResponse( + status, + projectPayloads, + merged, + pendingRepos, + failedRepos, + staleRepos, + )) +} + +// interleaveByRank returns up to `limit` chunks by walking the surviving +// projects round-robin — rank-1 from every project before any rank-2, +// then rank-2, and so on. Projects are visited in candidacy-desc order +// so the strongest project still leads, but every other surviving +// project gets a chance to surface its top chunk before tail entries +// from the leader appear. +// +// Inside the same rank tier, dedupe keeps the natural workspace order +// so two chunks of identical content from different projects still +// both appear (with their respective project_path). 
+func interleaveByRank(projects []projectHits, limit int) []workspaceSearchChunkPayload { + if limit <= 0 || len(projects) == 0 { + return []workspaceSearchChunkPayload{} + } + ordered := make([]projectHits, len(projects)) + copy(ordered, projects) + sort.SliceStable(ordered, func(i, j int) bool { + return ordered[i].Candidacy > ordered[j].Candidacy + }) + + out := make([]workspaceSearchChunkPayload, 0, limit) + dedupKey := func(c workspaceSearchChunkPayload) string { + return c.ProjectPath + "|" + c.FilePath + "|" + + strconv.Itoa(c.StartLine) + "-" + strconv.Itoa(c.EndLine) + } + seen := make(map[string]struct{}, limit) + // rank index walks 0,1,2,... ; we stop when no project has a + // chunk at this rank (every list exhausted). + for r := 0; ; r++ { + progressed := false + for _, p := range ordered { + if r >= len(p.FusedChunks) { + continue + } + // A chunk exists at this rank, so keep walking even if it + // turns out to be a duplicate; otherwise a duplicate-only + // tier would end the loop before later ranks are visited. + progressed = true + c := p.FusedChunks[r] + if c.ProjectPath == "" { + c.ProjectPath = p.ProjectPath + } + k := dedupKey(c) + if _, ok := seen[k]; ok { + continue + } + seen[k] = struct{}{} + out = append(out, c) + if len(out) >= limit { + return out + } + } + if !progressed { + break + } + } + return out +} + +// workspaceSearchResponse builds the final JSON payload. Single +// builder so the early-empty path and the happy path can't drift on +// which optional fields they include. 
+func workspaceSearchResponse( + status string, + projects []workspaceSearchProjectPayload, + chunks []workspaceSearchChunkPayload, + pending []workspaceSearchPendingRepoPayload, + failed []workspaceSearchFailedRepoPayload, + stale []workspaceSearchStaleFTSRepoPayload, +) map[string]any { + out := map[string]any{ + "status": status, + "projects": projects, + "chunks": chunks, + } + if len(pending) > 0 { + out["pending_repos"] = pending + } + if len(failed) > 0 { + out["failed_repos"] = failed + } + if len(stale) > 0 { + out["stale_fts_repos"] = stale + } + return out +} + +// detectStaleFTSRepos returns the subset of projectPaths whose +// chunks_meta is empty but file_hashes has at least one row — meaning +// the project was indexed before the FTS5 mirror existed and needs a +// reindex before BM25 can contribute. A best-effort detector: if any +// SQL probe errors out we log + return nil rather than fail the +// request, since the warning is informational, not load-bearing. +func (s *Server) detectStaleFTSRepos(ctx context.Context, projectPaths []string) []workspaceSearchStaleFTSRepoPayload { + out := make([]workspaceSearchStaleFTSRepoPayload, 0) + for _, pp := range projectPaths { + var nMeta, nFiles int + if err := s.Deps.DB.QueryRowContext(ctx, + `SELECT COUNT(*) FROM chunks_meta WHERE project_path = ? LIMIT 1`, pp).Scan(&nMeta); err != nil { + s.Deps.Logger.Warn("workspaces search: stale-fts probe (chunks_meta)", + "project_path", pp, "err", err) + return nil + } + if nMeta > 0 { + continue + } + if err := s.Deps.DB.QueryRowContext(ctx, + `SELECT COUNT(*) FROM file_hashes WHERE project_path = ? 
LIMIT 1`, pp).Scan(&nFiles); err != nil { + s.Deps.Logger.Warn("workspaces search: stale-fts probe (file_hashes)", + "project_path", pp, "err", err) + return nil + } + if nFiles > 0 { + out = append(out, workspaceSearchStaleFTSRepoPayload{ProjectPath: pp}) + } + } + return out +} + +// fanOutHybrid runs dense + BM25 in parallel per project, fuses each +// project's two ranked lists via RRF, and returns the per-project +// aggregates the candidacy step needs. Bounded by NumCPU goroutines +// across the workspace; each project is one slot regardless of +// whether it issues one or two sub-queries. +// +// Per-project failures: a BM25-side error is logged but does not mark +// the project as failed (FTS5 might not be populated yet for a +// pre-existing install; dense still works). A dense-side error is +// surfaced via failed_repos and dense_signal is left at 0 — the +// project can still be retained if BM25 alone is strong. +func (s *Server) fanOutHybrid( + ctx context.Context, + workspaceID string, + projectPaths []string, + rawQuery string, + queryEmbedding []float32, + minScore float32, +) ([]projectHits, []workspaceSearchFailedRepoPayload, error) { + concurrency := runtime.NumCPU() + if concurrency < 1 { + concurrency = 1 + } + + g, gctx := errgroup.WithContext(ctx) + g.SetLimit(concurrency) + + results := make([]projectHits, len(projectPaths)) + failures := make([]workspaceSearchFailedRepoPayload, len(projectPaths)) + failed := make([]bool, len(projectPaths)) + var mu sync.Mutex + + for i, pp := range projectPaths { + i, pp := i, pp + g.Go(func() error { + var ( + denseRes []workspaceSearchChunkPayload + bm25Res []workspaceSearchChunkPayload + denseErr error + ) + + rawDense, derr := s.Deps.VectorStore.Search(gctx, pp, queryEmbedding, workspaceSearchPerProjectLimit, nil) + if derr != nil { + denseErr = derr + s.Deps.Logger.Warn("workspaces search: dense query failed", + "workspace_id", workspaceID, + "project_path", pp, + "err", derr) + } else { + denseRes = 
make([]workspaceSearchChunkPayload, 0, len(rawDense)) + for _, h := range rawDense { + if h.Score < minScore { + continue + } + denseRes = append(denseRes, workspaceSearchChunkPayload{ + ProjectPath: pp, + FilePath: h.FilePath, + StartLine: h.StartLine, + EndLine: h.EndLine, + SymbolName: h.SymbolName, + Language: h.Language, + Score: h.Score, + Content: h.Content, + }) + } + } + + rawBM25, berr := chunksfts.SearchProject(gctx, s.Deps.DB, pp, rawQuery, workspaceSearchBM25Limit) + if berr != nil { + s.Deps.Logger.Warn("workspaces search: bm25 query failed", + "workspace_id", workspaceID, + "project_path", pp, + "err", berr) + } else { + bm25Res = make([]workspaceSearchChunkPayload, 0, len(rawBM25)) + for _, h := range rawBM25 { + bm25Res = append(bm25Res, workspaceSearchChunkPayload{ + ProjectPath: pp, + FilePath: h.FilePath, + StartLine: h.StartLine, + EndLine: h.EndLine, + SymbolName: h.SymbolName, + Language: h.Language, + // Score field carries the dense cosine for the + // merged chunk; for BM25-only hits we leave it + // at 0 (BM25 score is on a different scale and + // would mislead a client reading "score" as + // cosine). 
+ Score: 0, + Content: h.Content, + }) + } + } + + fused := fuseRRF(denseRes, bm25Res) + denseSig := meanTopN(denseScoresOf(denseRes), workspaceSearchTopNPerProject) + bm25Sig := meanTopN(bm25ScoresOf(rawBM25), workspaceSearchTopNPerProject) + + mu.Lock() + if denseErr != nil { + failures[i] = workspaceSearchFailedRepoPayload{ + ProjectPath: pp, + Reason: "vectorstore_error", + } + failed[i] = true + } + results[i] = projectHits{ + ProjectPath: pp, + FusedChunks: fused, + DenseSignal: float32(denseSig), + BM25Signal: float32(bm25Sig), + } + mu.Unlock() + return nil + }) + } + if err := g.Wait(); err != nil { + return nil, nil, err + } + failedOut := make([]workspaceSearchFailedRepoPayload, 0) + for i, f := range failed { + if f { + failedOut = append(failedOut, failures[i]) + } + } + return results, failedOut, nil +} + +// fuseRRF returns chunks ranked by Reciprocal Rank Fusion over the two +// per-project lists. RRF score per chunk is sum(1/(k+rank_i)) across +// the lists where it appears. Chunks present in both lists naturally +// bubble to the top; chunks unique to one list still score positively. +// +// Chunk identity is (project_path, file_path, start_line, end_line) — +// matching chunks across the two lists must be the same span. The +// dense-side payload is preferred when both exist (it carries the +// non-zero `score` field). 
+func fuseRRF(dense, bm25 []workspaceSearchChunkPayload) []workspaceSearchChunkPayload { + type entry struct { + c workspaceSearchChunkPayload + rrf float64 + } + key := func(c workspaceSearchChunkPayload) string { + return c.ProjectPath + "|" + c.FilePath + "|" + + strconv.Itoa(c.StartLine) + "-" + strconv.Itoa(c.EndLine) + } + byKey := make(map[string]*entry) + for rank, c := range dense { + k := key(c) + byKey[k] = &entry{c: c, rrf: 1.0 / float64(rrfK+rank+1)} + } + for rank, c := range bm25 { + k := key(c) + add := 1.0 / float64(rrfK+rank+1) + if e, ok := byKey[k]; ok { + e.rrf += add + continue + } + byKey[k] = &entry{c: c, rrf: add} + } + out := make([]entry, 0, len(byKey)) + for _, e := range byKey { + out = append(out, *e) + } + sort.SliceStable(out, func(i, j int) bool { + return out[i].rrf > out[j].rrf + }) + chunks := make([]workspaceSearchChunkPayload, len(out)) + for i, e := range out { + chunks[i] = e.c + } + return chunks +} + +func denseScoresOf(chunks []workspaceSearchChunkPayload) []float64 { + out := make([]float64, len(chunks)) + for i, c := range chunks { + out[i] = float64(c.Score) + } + return out +} + +func bm25ScoresOf(hits []chunksfts.Hit) []float64 { + out := make([]float64, len(hits)) + for i, h := range hits { + out[i] = h.Score + } + return out +} + +// meanTopN returns the arithmetic mean of the top-n values in xs. +// Returns 0 when xs is empty. xs is sorted in descending order +// in-place; callers must pass a slice they own. +func meanTopN(xs []float64, n int) float64 { + if len(xs) == 0 { + return 0 + } + sort.Sort(sort.Reverse(sort.Float64Slice(xs))) + if n > len(xs) { + n = len(xs) + } + var sum float64 + for i := 0; i < n; i++ { + sum += xs[i] + } + return sum / float64(n) +} + +// projectLabel derives a short display label from a project_path. +// The path convention is `host/owner/repo@branch`; we strip everything +// up to the last `/` so the dashboard's "Top projects" panel shows +// compact, recognisable entries. 
+func projectLabel(projectPath string) string { + for i := len(projectPath) - 1; i >= 0; i-- { + if projectPath[i] == '/' { + return projectPath[i+1:] + } + } + return projectPath +} + +// round4 rounds f to 4 decimal places — matches the chunk-side +// rounding chromem already applies, so scores in the response look +// consistent across nested fields. +func round4(f float32) float32 { + if math.IsNaN(float64(f)) { + return 0 + } + const scale = 10000 + return float32(math.Round(float64(f)*scale)) / scale +} + +func clampInt(v *int, def, min, max int) int { + if v == nil { + return def + } + if *v < min { + return min + } + if *v > max { + return max + } + return *v +} + +func clampFloat32(v *float32, def, min, max float32) float32 { + if v == nil { + return def + } + if *v < min { + return min + } + if *v > max { + return max + } + return *v +} diff --git a/server/internal/httpapi/workspacesearch_test.go b/server/internal/httpapi/workspacesearch_test.go new file mode 100644 index 0000000..4c74160 --- /dev/null +++ b/server/internal/httpapi/workspacesearch_test.go @@ -0,0 +1,1093 @@ +package httpapi + +import ( + "context" + "database/sql" + "encoding/json" + "math" + "net/http" + "path/filepath" + "strconv" + "testing" + "time" + + "github.com/google/uuid" + + "github.com/dvcdsys/code-index/server/internal/chunksfts" + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/secrets" + "github.com/dvcdsys/code-index/server/internal/vectorstore" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// fixedEmbedder is the stub query embedder. Workspace search calls +// EmbedQuery once before the fan-out; tests need a vector with the +// same dim as the seeded chunks, so the embedding is wired per-test +// via a struct field rather than baked into the type.
+type fixedEmbedder struct { + q []float32 +} + +func (e fixedEmbedder) EmbedQuery(_ context.Context, _ string) ([]float32, error) { + return e.q, nil +} +func (e fixedEmbedder) Ready(_ context.Context) error { return nil } + +// newSearchRouter wires the minimum surface workspace search needs: +// workspaces, workspace_repos, jobs (unused but in Deps), vectorstore +// (real, on tmpdir), and a query embedder the caller controls. +func newSearchRouter(t *testing.T, d *sql.DB, vs *vectorstore.Store, emb fixedEmbedder) http.Handler { + t.Helper() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + sec, err := secrets.Open(secrets.OpenOptions{DataDir: t.TempDir(), AllowGenerate: true}) + if err != nil { + t.Fatalf("secrets: %v", err) + } + return NewRouter(Deps{ + DB: d, + AuthDisabled: true, + Users: seedlessUsers(d), + Sessions: seedlessSessions(d), + APIKeys: seedlessAPIKeys(d), + WorkspacesEnabled: true, + Workspaces: workspaces.New(d), + GithubTokens: githubtokens.New(d, sec), + WorkspaceRepos: workspacerepos.New(d), + Jobs: jobs.New(d, jobs.Options{Concurrency: 1, PollEvery: time.Hour}), + VectorStore: vs, + EmbeddingSvc: emb, + }) +} + +// seedRepoWithChunks inserts a projects + workspace_repos row for the +// given project_path inside the workspace, then upserts the supplied +// chunks into chromem so /search has something to retrieve. Bypasses +// the clone+index job chain — those are exercised in workspacerepos +// tests already. 
+func seedRepoWithChunks( + t *testing.T, + d *sql.DB, + vs *vectorstore.Store, + wsID, projectPath string, + chunks []vectorstore.Chunk, + embeddings [][]float32, +) { + t.Helper() + now := time.Now().UTC().Format(time.RFC3339Nano) + if _, err := d.Exec( + `INSERT INTO projects (host_path, container_path, languages, settings, stats, status, created_at, updated_at, path_hash) + VALUES (?, ?, '[]', '{}', '{}', 'created', ?, ?, 'h')`, + projectPath, projectPath, now, now, + ); err != nil { + t.Fatalf("insert project %q: %v", projectPath, err) + } + if _, err := d.Exec( + `INSERT INTO workspace_repos + (id, workspace_id, github_url, branch, project_path, webhook_secret, status, created_at, updated_at, last_indexed_at) + VALUES (?, ?, ?, 'main', ?, 'sec', 'indexed', ?, ?, ?)`, + uuid.NewString(), wsID, "https://"+projectPath, projectPath, now, now, now, + ); err != nil { + t.Fatalf("insert workspace_repo %q: %v", projectPath, err) + } + if err := vs.UpsertChunks(context.Background(), projectPath, chunks, embeddings); err != nil { + t.Fatalf("upsert chunks for %q: %v", projectPath, err) + } + // Mirror the production indexer: every chunk that lands in + // chromem also gets written to chunks_fts so the BM25 side of + // workspace search has data. Grouped by file so the per-file + // upsert path is exercised the same way as in the indexer. 
+ byFile := map[string][]chunksfts.Chunk{} + for _, c := range chunks { + byFile[c.FilePath] = append(byFile[c.FilePath], chunksfts.Chunk{ + Content: c.Content, + FilePath: c.FilePath, + StartLine: c.StartLine, + EndLine: c.EndLine, + ChunkType: c.ChunkType, + SymbolName: c.SymbolName, + Language: c.Language, + }) + } + tx, err := d.Begin() + if err != nil { + t.Fatalf("begin tx: %v", err) + } + for fp, cs := range byFile { + if err := chunksfts.UpsertByFileTx(context.Background(), tx, projectPath, fp, cs); err != nil { + t.Fatalf("chunksfts upsert %q: %v", fp, err) + } + } + if err := tx.Commit(); err != nil { + t.Fatalf("commit chunksfts tx: %v", err) + } +} + +// l2 returns v scaled to unit length. chromem expects normalized +// vectors for clean cosine similarity; pre-normalising saves the +// re-norm chromem does internally and makes scores predictable. +func l2(v []float32) []float32 { + var sum float64 + for _, x := range v { + sum += float64(x) * float64(x) + } + if sum == 0 { + return v + } + scale := float32(1.0 / math.Sqrt(sum)) + out := make([]float32, len(v)) + for i, x := range v { + out[i] = x * scale + } + return out +} + +func openTestVectorStore(t *testing.T) *vectorstore.Store { + t.Helper() + vs, err := vectorstore.Open(filepath.Join(t.TempDir(), "chroma")) + if err != nil { + t.Fatalf("vectorstore: %v", err) + } + return vs +} + +type searchProjectResp struct { + ProjectPath string `json:"project_path"` + Label string `json:"label"` + ProjectScore float32 `json:"project_score"` + NumHits int `json:"num_hits"` + BM25Score float32 `json:"bm25_score"` + DenseScore float32 `json:"dense_score"` +} + +type searchChunkResp struct { + ProjectPath string `json:"project_path"` + FilePath string `json:"file_path"` + StartLine int `json:"start_line"` + EndLine int `json:"end_line"` + SymbolName string `json:"symbol_name,omitempty"` + Score float32 `json:"score"` + Content string `json:"content"` +} + +type searchPendingResp struct { + ProjectPath string 
`json:"project_path"` + Status string `json:"status"` +} + +type searchFailedResp struct { + ProjectPath string `json:"project_path"` + Reason string `json:"reason"` +} + +type searchResp struct { + Status string `json:"status"` + Projects []searchProjectResp `json:"projects"` + Chunks []searchChunkResp `json:"chunks"` + PendingRepos []searchPendingResp `json:"pending_repos,omitempty"` + FailedRepos []searchFailedResp `json:"failed_repos,omitempty"` + StaleFTSRepos []searchStaleFTSRepoR `json:"stale_fts_repos,omitempty"` +} + +type searchStaleFTSRepoR struct { + ProjectPath string `json:"project_path"` +} + +// TestWorkspaceSearch_EmptyWorkspace covers the no-repos case — the +// handler must return a well-formed response with empty arrays rather +// than a 500 or a misleading "ok" status. Mirrors what the dashboard +// expects when an operator opens search on a fresh workspace. +func TestWorkspaceSearch_EmptyWorkspace(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "empty") + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=anything", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "empty" { + t.Fatalf("expected status=empty, got %q", resp.Status) + } + if len(resp.Projects) != 0 || len(resp.Chunks) != 0 { + t.Fatalf("expected empty arrays, got %+v", resp) + } +} + +// TestWorkspaceSearch_ProjectsRankByMeanNotCount verifies the change +// motivated by a workspace where one large repo (many hits, mean +// ≈ 0.5) drowned out a smaller-but-equally-relevant repo (few hits, +// mean ≈ 0.41) with the previous mean×log(1+N_hits) formula. 
The new +// pure-mean project_score keeps these projects close together so +// both surface in the panel, and the per-project chunk cap stops the +// higher-count project from monopolising the chunks list. +func TestWorkspaceSearch_ProjectsRankByMeanNotCount(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "rank") + + // "Big" project — many marginal hits. Top mean comes out small + // because chunks past the first few are well off-axis. + bigChunks := make([]vectorstore.Chunk, 10) + bigEmbs := make([][]float32, 10) + for i := range bigChunks { + bigChunks[i] = vectorstore.Chunk{ + Content: "b", FilePath: "b.go", + StartLine: i*10 + 1, EndLine: i*10 + 9, + ChunkType: "function", SymbolName: "B", Language: "go", + } + // X coefficient decays from 0.6 → 0.15 so the top-5 mean is + // well below the "small" project's top-1 score. + bigEmbs[i] = l2([]float32{0.6 - 0.05*float32(i), 0.4, 0.0, 0.0}) + } + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/big@main", bigChunks, bigEmbs) + + // "Small" project — one very strong hit, nothing else. Top-N + // mean is dominated by that one chunk. 
+ seedRepoWithChunks(t, d, vs, wsID, "github.com/o/small@main", + []vectorstore.Chunk{ + {Content: "s", FilePath: "s.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "S", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.0, 0.0, 0.0})}, + ) + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "ok" { + t.Fatalf("expected status=ok, got %q (body=%s)", resp.Status, rr.Body.String()) + } + if len(resp.Projects) != 2 { + t.Fatalf("expected 2 projects, got %d (%+v)", len(resp.Projects), resp.Projects) + } + // Small project (mean ≈ 1.0) must beat big project (mean of + // top-5 well below 1.0) — i.e. count alone doesn't determine + // rank. + if resp.Projects[0].ProjectPath != "github.com/o/small@main" { + t.Fatalf("strong-mean project should rank first, got %q (scores=%v)", + resp.Projects[0].ProjectPath, + []float32{resp.Projects[0].ProjectScore, resp.Projects[1].ProjectScore}) + } + + // Chunks are sorted by raw score (no boost), so the small + // project's strongest hit must lead. + if len(resp.Chunks) == 0 { + t.Fatalf("expected chunks, got none") + } + if resp.Chunks[0].ProjectPath != "github.com/o/small@main" { + t.Fatalf("top chunk should be the small project's strong hit, got %+v", resp.Chunks[0]) + } + for i := 1; i < len(resp.Chunks); i++ { + if resp.Chunks[i-1].Score < resp.Chunks[i].Score { + t.Fatalf("chunks not sorted by score at i=%d: %v < %v", + i, resp.Chunks[i-1].Score, resp.Chunks[i].Score) + } + } +} + +// TestWorkspaceSearch_PerProjectChunkCap is the regression for the +// "one repo eats every slot" failure mode where a dominant repo's +// hits crowded out every other surviving project from the chunks +// list. 
Seeds one project with 12 strong hits and another with 1; +// the global chunks list must not exceed the per-project cap for +// the dominant repo. +func TestWorkspaceSearch_PerProjectChunkCap(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "cap") + + // 12 strong chunks in one project; l2 normalises each embedding to + // the unit x-axis, so all score ≈ 1.0, above the small project's hit. + bigChunks := make([]vectorstore.Chunk, 12) + bigEmbs := make([][]float32, 12) + for i := range bigChunks { + bigChunks[i] = vectorstore.Chunk{ + Content: "b", FilePath: "b.go", + StartLine: i*10 + 1, EndLine: i*10 + 9, + ChunkType: "function", SymbolName: "B", Language: "go", + } + bigEmbs[i] = l2([]float32{1.0 - 0.01*float32(i), 0.0, 0.0, 0.0}) + } + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/big@main", bigChunks, bigEmbs) + + // One lower-scoring chunk in the other project. Its score + // (~0.71) is below every chunk from the big project, so without + // the per-project cap it would never make it into top-20.
+ seedRepoWithChunks(t, d, vs, wsID, "github.com/o/small@main", + []vectorstore.Chunk{ + {Content: "s", FilePath: "s.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "S", Language: "go"}, + }, + [][]float32{l2([]float32{0.5, 0.5, 0.0, 0.0})}, + ) + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + + byProject := map[string]int{} + for _, c := range resp.Chunks { + byProject[c.ProjectPath]++ + } + if byProject["github.com/o/big@main"] > 5 { + t.Fatalf("per-project chunk cap violated: big project got %d slots (cap=5)", + byProject["github.com/o/big@main"]) + } + if byProject["github.com/o/small@main"] != 1 { + t.Fatalf("small project should contribute its one chunk despite lower score; got %+v", + byProject) + } +} + +// TestWorkspaceSearch_MinScoreDropsLowScoringChunks verifies the +// optional `?min_score=` query parameter. Default is 0 (everything +// chromem returns is fair game); set higher to drop low-relevance +// projects from the response. +// +// The floor filters the dense list before candidacy aggregation: +// chunks below it are excluded from dense_signal, which can push a +// marginal project's candidacy under the per-query relative +// threshold and drop it from the response. +func TestWorkspaceSearch_MinScoreDropsLowScoringChunks(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "minscore") + + // Strong-hit project at cosine ≈ 1.0.
+ seedRepoWithChunks(t, d, vs, wsID, "github.com/o/good@main", + []vectorstore.Chunk{ + {Content: "g", FilePath: "g.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "G", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.0, 0.0, 0.0})}, + ) + // Weak-hit project at cosine ≈ 0.49 (above the natural cosine + // floor chromem applies but just below the 0.5 min_score used here). + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/weak@main", + []vectorstore.Chunk{ + {Content: "w", FilePath: "w.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "W", Language: "go"}, + }, + [][]float32{l2([]float32{0.45, 0.8, 0.0, 0.0})}, // cosine with q ≈ 0.49 + ) + + // min_score=0.5: weak project's only chunk drops, dense_signal=0 + // → candidacy=0 → filtered by the project gate. Only good survives. + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x&min_score=0.5", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var filteredResp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &filteredResp) + if len(filteredResp.Projects) != 1 { + t.Fatalf("min_score=0.5: expected 1 project (good only), got %d (%+v)", + len(filteredResp.Projects), filteredResp.Projects) + } + if filteredResp.Projects[0].ProjectPath != "github.com/o/good@main" { + t.Fatalf("wrong project after filter: %+v", filteredResp.Projects[0]) + } +} + +// TestWorkspaceSearch_ProjectGateDropsDeadWeightRepos is the regression +// for the pre-hybrid failure mode that motivated this redesign: in a +// multi-repo workspace, repos with zero literal mentions of the query +// term still surfaced 50 chunks each at noise-level cosine similarity. +// The hybrid gate must drop projects whose normalised candidacy falls +// below 40% of the best project's candidacy, regardless of how many +// chunks chromem happily returned.
+func TestWorkspaceSearch_ProjectGateDropsDeadWeightRepos(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "gate") + + // Strong project: 5 chunks all close to the query. + strongChunks := []vectorstore.Chunk{} + strongEmbs := [][]float32{} + for i := 0; i < 5; i++ { + strongChunks = append(strongChunks, vectorstore.Chunk{ + Content: "strong content", FilePath: "s.go", + StartLine: i*10 + 1, EndLine: i*10 + 9, ChunkType: "function", Language: "go", + }) + strongEmbs = append(strongEmbs, l2([]float32{1.0 - 0.01*float32(i), 0.0, 0.0, 0.0})) + } + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/strong@main", strongChunks, strongEmbs) + + // Dead-weight project: 5 chunks all orthogonal to the query + // (cosine == 0) and content that shares no token with the query. + deadChunks := []vectorstore.Chunk{} + deadEmbs := [][]float32{} + for i := 0; i < 5; i++ { + deadChunks = append(deadChunks, vectorstore.Chunk{ + Content: "totally unrelated material", FilePath: "d.go", + StartLine: i*10 + 1, EndLine: i*10 + 9, ChunkType: "function", Language: "go", + }) + deadEmbs = append(deadEmbs, l2([]float32{0.0, 1.0, 0.0, 0.0})) + } + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/dead@main", deadChunks, deadEmbs) + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=strong", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Projects) != 1 { + t.Fatalf("expected only strong project to survive gate, got %d (%+v)", + len(resp.Projects), resp.Projects) + } + if resp.Projects[0].ProjectPath != "github.com/o/strong@main" { + t.Fatalf("wrong project survived: %+v", resp.Projects[0]) + } + for _, c := range resp.Chunks { + if c.ProjectPath 
== "github.com/o/dead@main" { + t.Errorf("dead project chunk leaked into output: %+v", c) + } + } +} + +// TestWorkspaceSearch_FlagsStaleFTSRepos exercises the pre-FTS5-mirror +// detection. We seed a project the way the old indexer used to — +// chromem + file_hashes populated, chunks_meta left empty — and +// verify the response calls it out in stale_fts_repos. +func TestWorkspaceSearch_FlagsStaleFTSRepos(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "stale") + + // Seed a normal repo (chunks_fts populated via the helper). + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/fresh@main", + []vectorstore.Chunk{ + {Content: "needle here", FilePath: "f.go", StartLine: 1, EndLine: 9, ChunkType: "function", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.0, 0.0, 0.0})}, + ) + + // Simulate a pre-FTS5-mirror repo: insert project + workspace_repo + // + chromem chunk + file_hashes row, but skip chunks_fts/meta. 
+ now := time.Now().UTC().Format(time.RFC3339Nano) + stalePath := "github.com/o/stale@main" + if _, err := d.Exec( + `INSERT INTO projects (host_path, container_path, languages, settings, stats, status, created_at, updated_at, path_hash) + VALUES (?, ?, '[]', '{}', '{}', 'created', ?, ?, 'h')`, + stalePath, stalePath, now, now, + ); err != nil { + t.Fatalf("insert stale project: %v", err) + } + if _, err := d.Exec( + `INSERT INTO workspace_repos + (id, workspace_id, github_url, branch, project_path, webhook_secret, status, created_at, updated_at, last_indexed_at) + VALUES (?, ?, ?, 'main', ?, 'sec', 'indexed', ?, ?, ?)`, + uuid.NewString(), wsID, "https://"+stalePath, stalePath, now, now, now, + ); err != nil { + t.Fatalf("insert stale workspace_repo: %v", err) + } + if err := vs.UpsertChunks(context.Background(), stalePath, + []vectorstore.Chunk{{Content: "stale chunk", FilePath: "s.go", StartLine: 1, EndLine: 9, Language: "go"}}, + [][]float32{l2([]float32{0.9, 0.1, 0.0, 0.0})}, + ); err != nil { + t.Fatalf("upsert stale chromem chunks: %v", err) + } + if _, err := d.Exec( + `INSERT INTO file_hashes (project_path, file_path, content_hash, indexed_at) + VALUES (?, 's.go', 'hash', ?)`, + stalePath, now, + ); err != nil { + t.Fatalf("insert stale file_hashes: %v", err) + } + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=needle", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.StaleFTSRepos) != 1 || resp.StaleFTSRepos[0].ProjectPath != stalePath { + t.Fatalf("expected stale_fts_repos to flag %q, got %+v", stalePath, resp.StaleFTSRepos) + } + // Sanity: the fresh repo must NOT appear in the stale list. 
+ for _, s := range resp.StaleFTSRepos { + if s.ProjectPath == "github.com/o/fresh@main" { + t.Errorf("fresh repo wrongly flagged as stale: %+v", s) + } + } +} + +// TestWorkspaceSearch_BM25SurfacesLiteralTokenDenseMissed seeds two +// projects with content the dense embedder ranks identically (we use +// orthogonal vectors so cosine == 0 for both) but where one project +// contains the literal query token. BM25 must promote the literal +// match into the surviving set even though dense gives zero signal. +func TestWorkspaceSearch_BM25SurfacesLiteralTokenDenseMissed(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "bm25-only") + + // Project A: content literally contains "needle"; embedding + // orthogonal to the query so dense sees nothing. + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/literal@main", + []vectorstore.Chunk{ + {Content: "the needle is buried here", FilePath: "a.go", StartLine: 1, EndLine: 9, ChunkType: "function", Language: "go"}, + }, + [][]float32{l2([]float32{0.0, 1.0, 0.0, 0.0})}, + ) + // Project B: content unrelated, also orthogonal. 
+ seedRepoWithChunks(t, d, vs, wsID, "github.com/o/unrelated@main", + []vectorstore.Chunk{ + {Content: "haystack of unrelated material", FilePath: "b.go", StartLine: 1, EndLine: 9, ChunkType: "function", Language: "go"}, + }, + [][]float32{l2([]float32{0.0, 0.0, 1.0, 0.0})}, + ) + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=needle", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Projects) != 1 || resp.Projects[0].ProjectPath != "github.com/o/literal@main" { + t.Fatalf("BM25 should have surfaced only the literal-match project, got %+v", resp.Projects) + } + // bm25_score in the response rounds to 4 decimals; with only + // 2 chunks in the FTS corpus the IDF term collapses near zero so + // we don't assert a magnitude here — surfacing the chunk at all + // is what proves the BM25 path fired (dense was orthogonal in + // both projects so any non-zero ranking signal came from BM25). + if len(resp.Chunks) != 1 || resp.Chunks[0].FilePath != "a.go" { + t.Errorf("expected the literal-match chunk in output, got %+v", resp.Chunks) + } +} + +// TestWorkspaceSearch_RRFFusesBothSides exercises the per-project +// RRF: one chunk wins on dense, a different chunk wins on BM25. +// The fused order must put both ahead of a dense-only third chunk +// that has neither distinction. 
+func TestWorkspaceSearch_RRFFusesBothSides(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "rrf") + + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/fuse@main", + []vectorstore.Chunk{ + // dense-strong, no token match + {Content: "the alpha chunk", FilePath: "f.go", StartLine: 1, EndLine: 9, ChunkType: "function", Language: "go"}, + // dense-weak, contains query token + {Content: "needle goes here", FilePath: "f.go", StartLine: 11, EndLine: 19, ChunkType: "function", Language: "go"}, + // dense-medium, no token match + {Content: "the beta filler", FilePath: "f.go", StartLine: 21, EndLine: 29, ChunkType: "function", Language: "go"}, + }, + [][]float32{ + l2([]float32{1.0, 0.0, 0.0, 0.0}), // cosine 1.0 with query + l2([]float32{0.0, 1.0, 0.0, 0.0}), // cosine 0 + l2([]float32{0.6, 0.8, 0.0, 0.0}), // cosine 0.6 + }, + ) + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=needle", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d", rr.Code) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Chunks) < 2 { + t.Fatalf("expected >= 2 chunks for the project, got %d", len(resp.Chunks)) + } + + // The first two slots must contain the BM25 winner (lines 11-19) + // and the dense winner (lines 1-9). Order between those two is + // implementation-dependent; we just require both appear before + // the filler (lines 21-29). 
+ top2 := map[string]bool{} + for _, c := range resp.Chunks[:2] { + key := strconv.Itoa(c.StartLine) + "-" + strconv.Itoa(c.EndLine) + top2[key] = true + } + if !top2["1-9"] || !top2["11-19"] { + t.Errorf("RRF should fuse dense-winner (1-9) and bm25-winner (11-19) into top-2; got %+v", + resp.Chunks[:2]) + } +} + +// TestWorkspaceSearch_ParallelFanoutNoRace seeds many small projects +// in one workspace and runs the search. The point is to stress the +// errgroup-bounded fan-out under `go test -race`: any concurrent +// write to the shared results slice without the mutex shows up here. +// +// 32 projects is well above typical NumCPU so the semaphore actually +// queues some goroutines, not just runs them all at once on a fast +// machine. +func TestWorkspaceSearch_ParallelFanoutNoRace(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "race") + + const projects = 32 + for i := 0; i < projects; i++ { + pp := "github.com/race/p" + strconv.Itoa(i) + "@main" + seedRepoWithChunks(t, d, vs, wsID, pp, + []vectorstore.Chunk{ + {Content: "c", FilePath: "f.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "Fn", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.01 * float32(i), 0.0, 0.0})}, + ) + } + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "ok" { + t.Fatalf("expected status=ok, got %q", resp.Status) + } + // Every seeded project hits exactly one chunk above threshold — + // the projects list is capped at the default 10, so we expect + // exactly that. 
+ if len(resp.Projects) != 10 { + t.Fatalf("expected 10 projects (top_projects cap), got %d", len(resp.Projects)) + } +} + +func TestWorkspaceSearch_MissingQueryReturns422(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "platform") + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=", nil) + if rr.Code != http.StatusUnprocessableEntity { + t.Fatalf("expected 422 on missing q, got %d", rr.Code) + } +} + +func TestWorkspaceSearch_NotFoundWorkspace(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/no-such-id/search?q=anything", nil) + if rr.Code != http.StatusNotFound { + t.Fatalf("expected 404, got %d", rr.Code) + } +} + +// TestWorkspaceSearch_RejectsEmptyEmbedding covers the early-bail +// guard for misbehaving embedders. Without it a zero-length vector +// would propagate to every per-project chromem call and produce +// failed_repos the size of the workspace, masking the real problem. +func TestWorkspaceSearch_RejectsEmptyEmbedding(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: []float32{}}) + wsID := createWS(t, router, "empty-emb") + // Need at least one indexed repo, otherwise the early empty- + // workspace path short-circuits before we hit the embedder check + // in the order it currently runs. 
+ seedRepoWithChunks(t, d, vs, wsID, "github.com/o/r@main", + []vectorstore.Chunk{ + {Content: "x", FilePath: "x.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "X", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.0, 0.0, 0.0})}, + ) + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusServiceUnavailable { + t.Fatalf("expected 503 for empty embedding, got %d (%s)", rr.Code, rr.Body.String()) + } +} + +func TestWorkspaceSearch_Disabled(t *testing.T) { + router := workspaceRouter(t, false) + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/any/search?q=x", nil) + if rr.Code != http.StatusServiceUnavailable { + t.Fatalf("expected 503, got %d", rr.Code) + } +} + +// seedPendingRepo inserts a workspace_repos row with a non-`indexed` +// status (no projects row, no chromem collection). Mirrors what the +// DB looks like while clone/index jobs are still in flight. +func seedPendingRepo(t *testing.T, d *sql.DB, wsID, projectPath, status string) { + t.Helper() + now := time.Now().UTC().Format(time.RFC3339Nano) + if _, err := d.Exec( + `INSERT INTO workspace_repos + (id, workspace_id, github_url, branch, project_path, webhook_secret, status, created_at, updated_at) + VALUES (?, ?, ?, 'main', ?, 'sec', ?, ?, ?)`, + uuid.NewString(), wsID, "https://"+projectPath, projectPath, status, now, now, + ); err != nil { + t.Fatalf("insert pending workspace_repo %q: %v", projectPath, err) + } +} + +// TestWorkspaceSearch_SurfacesPendingRepos verifies that repos whose +// workspace_repos.status ≠ 'indexed' are reported back in +// `pending_repos` instead of being silently dropped. The dashboard +// uses this to render a "still indexing" banner — without it the +// operator sees a partial result set with no hint that anything's +// missing. 
+func TestWorkspaceSearch_SurfacesPendingRepos(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "pending") + + // One indexed repo with a strong hit — must still surface in the + // happy-path projects/chunks panels. + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/ready@main", + []vectorstore.Chunk{ + {Content: "ready", FilePath: "r.go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "R", Language: "go"}, + }, + [][]float32{l2([]float32{1.0, 0.0, 0.0, 0.0})}, + ) + + // Two repos still in flight — different statuses to make sure + // every non-indexed value propagates verbatim. + seedPendingRepo(t, d, wsID, "github.com/o/cloning@main", workspacerepos.StatusCloning) + seedPendingRepo(t, d, wsID, "github.com/o/indexing@main", workspacerepos.StatusIndexing) + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + + if resp.Status != "ok" { + t.Fatalf("expected status=ok (indexed repo had a hit), got %q", resp.Status) + } + if len(resp.Projects) != 1 || resp.Projects[0].ProjectPath != "github.com/o/ready@main" { + t.Fatalf("expected exactly the indexed repo in projects, got %+v", resp.Projects) + } + if len(resp.PendingRepos) != 2 { + t.Fatalf("expected 2 pending repos, got %d (%+v)", len(resp.PendingRepos), resp.PendingRepos) + } + gotStatuses := map[string]string{} + for _, p := range resp.PendingRepos { + gotStatuses[p.ProjectPath] = p.Status + } + if gotStatuses["github.com/o/cloning@main"] != workspacerepos.StatusCloning { + t.Fatalf("cloning repo lost its status: %+v", resp.PendingRepos) + } + if gotStatuses["github.com/o/indexing@main"] != 
workspacerepos.StatusIndexing { + t.Fatalf("indexing repo lost its status: %+v", resp.PendingRepos) + } +} + +// TestWorkspaceSearch_AllPendingReturnsEmpty covers the all-repos- +// still-indexing case. The handler should skip the fan-out entirely +// (chromem would just return nil per missing collection) and let the +// client know via pending_repos that the workspace is alive but not +// yet ready. +func TestWorkspaceSearch_AllPendingReturnsEmpty(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "all-pending") + + seedPendingRepo(t, d, wsID, "github.com/o/p1@main", workspacerepos.StatusPending) + seedPendingRepo(t, d, wsID, "github.com/o/p2@main", workspacerepos.StatusCloning) + + rr := doJSON(t, router, http.MethodGet, "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "empty" { + t.Fatalf("expected status=empty, got %q", resp.Status) + } + if len(resp.PendingRepos) != 2 { + t.Fatalf("expected 2 pending repos, got %d", len(resp.PendingRepos)) + } + if len(resp.Projects) != 0 || len(resp.Chunks) != 0 { + t.Fatalf("expected empty projects/chunks, got %+v", resp) + } +} + +// TestWorkspaceSearch_ClampsParams covers the trio of guard rails for +// `top_projects` / `top_chunks` / `min_score`. The OpenAPI spec +// declares min/max but oapi-codegen doesn't enforce them at runtime — +// the handler must clamp itself to keep negative slice indices, zero +// loop bounds, and absurd map allocations from leaking through. 
+func TestWorkspaceSearch_ClampsParams(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "clamp") + + // Seed three projects so a clamped top_projects=1 has something + // to bite. Each has one strong chunk on the x-axis so they all + // rank above the min_score floor. + for i, name := range []string{"a", "b", "c"} { + // Slight x/y mix so no chunk sits at cosine=1.0 exactly — + // the min_score=5 (clamped to 1) assertion below relies on + // the floor being unreachable. + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/"+name+"@main", + []vectorstore.Chunk{ + {Content: name, FilePath: name + ".go", StartLine: 1, EndLine: 9, ChunkType: "function", SymbolName: "S", Language: "go"}, + }, + [][]float32{l2([]float32{0.9 - 0.01*float32(i), 0.1, 0.0, 0.0})}, + ) + } + + // top_projects=-5 used to slice projects[:-5] and panic. Clamped + // to 1 it must return exactly one project. + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&top_projects=-5", nil) + if rr.Code != http.StatusOK { + t.Fatalf("top_projects=-5: expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Projects) != 1 { + t.Fatalf("top_projects=-5 should clamp to 1 project, got %d", len(resp.Projects)) + } + + // top_chunks=0 used to break the merge loop immediately. Clamped + // to 1 → at least one chunk surfaces. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&top_chunks=0", nil) + if rr.Code != http.StatusOK { + t.Fatalf("top_chunks=0: expected 200, got %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Chunks) == 0 { + t.Fatalf("top_chunks=0 should clamp to 1, got empty chunks") + } + + // top_chunks=999_999_999 must not blow allocations. 
Clamped to + // 200, so the response has at most 200 chunks (here far fewer). + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&top_chunks=999999999", nil) + if rr.Code != http.StatusOK { + t.Fatalf("top_chunks=huge: expected 200, got %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Chunks) > 200 { + t.Fatalf("top_chunks=huge should clamp to 200, got %d", len(resp.Chunks)) + } + + // min_score=-1 → clamped to 0 → every chunk we seeded survives. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&min_score=-1", nil) + if rr.Code != http.StatusOK { + t.Fatalf("min_score=-1: expected 200, got %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Chunks) == 0 { + t.Fatalf("min_score=-1 should clamp to 0 and keep chunks, got none") + } + + // min_score=5 → clamped to 1 → no chunk has cosine=1 exactly here + // (each seeded chunk loses a tiny bit to the per-i decay), so the + // response should be empty. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&min_score=5", nil) + if rr.Code != http.StatusOK { + t.Fatalf("min_score=5: expected 200, got %d", rr.Code) + } + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if resp.Status != "empty" { + t.Fatalf("min_score=5 should clamp to 1 and return empty, got status=%q (%d chunks)", + resp.Status, len(resp.Chunks)) + } +} + +// TestWorkspaceSearch_ChunksOnlyFromPanelProjects guards against the +// `interleaveByRank` vs `projects[]` panel inconsistency: when more +// projects survive the gate than `top_projects` allows in the panel, +// chunks must only come from projects that are actually visible in +// the panel. Otherwise agents see a chunk with a project_path they +// can't look up for bm25_score/dense_score. 
+func TestWorkspaceSearch_ChunksOnlyFromPanelProjects(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: l2([]float32{1, 0, 0, 0})}) + wsID := createWS(t, router, "panel-vs-chunks") + + // Seed 12 projects with monotonically decreasing strength. All + // stay above the 0.4*best gate (relative threshold). With + // top_projects=10 the bottom 2 projects must be dropped from the + // panel AND from chunks[]. + for i := 0; i < 12; i++ { + name := "p" + strconv.Itoa(i) + // First component decays slowly so cosine spread is gentle + // (0.99 down to ~0.85) — every project survives 0.4 × best. + x := float32(0.99) - 0.012*float32(i) + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/"+name+"@main", + []vectorstore.Chunk{ + {Content: "rate limit middleware " + name, FilePath: name + ".go", + StartLine: 1, EndLine: 9, ChunkType: "function", + SymbolName: "S", Language: "go"}, + }, + [][]float32{l2([]float32{x, 0.1, 0.0, 0.0})}, + ) + } + + rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=rate+limit&top_projects=10&min_score=0", nil) + if rr.Code != http.StatusOK { + t.Fatalf("expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var resp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &resp) + if len(resp.Projects) != 10 { + t.Fatalf("expected 10 projects in panel (top_projects=10), got %d", len(resp.Projects)) + } + panel := make(map[string]struct{}, len(resp.Projects)) + for _, p := range resp.Projects { + panel[p.ProjectPath] = struct{}{} + } + for i, c := range resp.Chunks { + if _, ok := panel[c.ProjectPath]; !ok { + t.Fatalf("chunk[%d] is from project %q which is NOT in projects[] panel "+ + "(panel has %d entries: %+v)", i, c.ProjectPath, len(resp.Projects), panel) + } + } +} + +// TestWorkspaceSearch_DefaultMinScoreIs04 verifies the documented +// default has effect: a request without 
?min_score=... drops chunks +// whose cosine sits below 0.4, matching the per-project SemanticSearch +// default. A request that explicitly passes ?min_score=0 keeps them. +// +// Geometry: strong cosine 0.5 (lead), weak cosine 0.3 (in the gap +// (0,0.4) the change targets). Both projects must survive the +// relative project gate (0.4 × best = 0.2); the only thing +// differentiating them is the min_score filter. +func TestWorkspaceSearch_DefaultMinScoreIs04(t *testing.T) { + d, err := dbOpenMemory(t) + if err != nil { + t.Fatalf("open db: %v", err) + } + vs := openTestVectorStore(t) + query := l2([]float32{1, 0, 0, 0}) + router := newSearchRouter(t, d, vs, fixedEmbedder{q: query}) + wsID := createWS(t, router, "default-floor") + + // Strong project: cosine = 0.5 with query (l2-normalized vec + // projects onto x by exactly 0.5). + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/strong@main", + []vectorstore.Chunk{ + {Content: "s", FilePath: "s.go", StartLine: 1, EndLine: 9, + ChunkType: "function", SymbolName: "S", Language: "go"}, + }, + [][]float32{l2([]float32{0.5, 0.866, 0, 0})}, // |v|=1, cos(v,q)=0.5 + ) + // Weak project: cosine = 0.3 — survives min_score=0 (and would + // survive old default 0) but is filtered by new default 0.4. + // Relative gate: 0.4 × strong_candidacy. Candidacy is + // 0.5*dense_norm; dense_norm(weak)=0.3/0.5=0.6 → candidacy(weak)=0.3. + // Threshold = 0.4 × 0.5 = 0.2 → weak survives the gate at + // min_score=0. + seedRepoWithChunks(t, d, vs, wsID, "github.com/o/weak@main", + []vectorstore.Chunk{ + {Content: "w", FilePath: "w.go", StartLine: 1, EndLine: 9, + ChunkType: "function", SymbolName: "W", Language: "go"}, + }, + [][]float32{l2([]float32{0.3, 0.9539, 0, 0})}, // cos = 0.3 + ) + + // Default: no min_score param → new 0.4 floor → weak's only chunk + // (cosine 0.3) is filtered before candidacy aggregation, weak's + // dense_signal drops to 0, gate drops the project. 
+ rr := doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x", nil) + if rr.Code != http.StatusOK { + t.Fatalf("default min_score: expected 200, got %d (%s)", rr.Code, rr.Body.String()) + } + var defaultResp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &defaultResp) + if len(defaultResp.Projects) != 1 || defaultResp.Projects[0].ProjectPath != "github.com/o/strong@main" { + t.Fatalf("default min_score=0.4: expected only strong project, got %+v", + defaultResp.Projects) + } + + // Explicit override: min_score=0 → weak project survives gate. + rr = doJSON(t, router, http.MethodGet, + "/api/v1/workspaces/"+wsID+"/search?q=x&min_score=0", nil) + if rr.Code != http.StatusOK { + t.Fatalf("min_score=0 override: expected 200, got %d", rr.Code) + } + var openResp searchResp + _ = json.Unmarshal(rr.Body.Bytes(), &openResp) + if len(openResp.Projects) != 2 { + t.Fatalf("min_score=0: expected both projects to survive, got %d (%+v)", + len(openResp.Projects), openResp.Projects) + } +} diff --git a/server/internal/indexer/indexer.go b/server/internal/indexer/indexer.go index ee766ea..67d429e 100644 --- a/server/internal/indexer/indexer.go +++ b/server/internal/indexer/indexer.go @@ -19,6 +19,7 @@ import ( "github.com/google/uuid" "github.com/dvcdsys/code-index/server/internal/chunker" + "github.com/dvcdsys/code-index/server/internal/chunksfts" "github.com/dvcdsys/code-index/server/internal/embeddings" "github.com/dvcdsys/code-index/server/internal/langdetect" "github.com/dvcdsys/code-index/server/internal/symbolindex" @@ -239,6 +240,9 @@ func (s *Service) BeginIndexing(ctx context.Context, projectPath string, full bo return "", nil, fmt.Errorf("full wipe: %w", err) } } + if err := chunksfts.DeleteByProjectTx(ctx, tx2, projectPath); err != nil { + return "", nil, fmt.Errorf("full wipe chunks_fts: %w", err) + } if err := tx2.Commit(); err != nil { return "", nil, fmt.Errorf("commit (full): %w", err) } @@ -336,8 +340,6 @@ func (s *Service) 
ProcessFilesStreaming( now := nowUTC() filesAccepted := 0 batchChunks := 0 - var batchSymbols []symbolindex.Symbol - var batchRefs []symbolindex.Reference // maxContentBytes guards against files that grew past the CLI's MaxFileSize // filter between discovery and indexing (e.g. a log file written in-flight). @@ -345,19 +347,17 @@ func (s *Service) ProcessFilesStreaming( // the queue slot for tens of seconds per file. const maxContentBytes = 512 * 1024 - // Open the per-batch transaction. Every per-file DB change lives inside a - // SAVEPOINT of this tx so a single bad file only rolls back that file's - // rows, not the whole batch. - tx, err := s.db.BeginTx(ctx, nil) - if err != nil { - return 0, 0, 0, fmt.Errorf("begin batch tx: %w", err) - } - txCommitted := false - defer func() { - if !txCommitted { - _ = tx.Rollback() - } - }() + // Per-file transactions (not per-batch). Earlier revisions wrapped the + // whole loop in a single BeginTx and used SAVEPOINTs per file, which held + // SQLite's WAL writer lock across every embed call (a network RTT to + // llama-server per file). On a multi-minute batch any concurrent write — + // most visibly POST /projects from the dashboard add-repo flow — timed + // out against busy_timeout=5s with `database is locked (5) (SQLITE_BUSY)`. + // Per-file tx caps lock-holding to the actual DB writes (sub-ms) and + // releases the writer between files so other connections can interleave. + // Side benefit: a fatal mid-batch error (embed ErrBusy, etc.) no longer + // rolls back all of this batch's work — successfully-indexed files stay + // committed and the next batch resumes from where this one stopped. for fi, fp := range files { // file_started — emit even for files we'll skip below, so the client @@ -500,43 +500,11 @@ func (s *Service) ProcessFilesStreaming( EmbedMS: time.Since(embedStart).Milliseconds(), }) - // Per-file SAVEPOINT so a partial failure rolls back only this file. 
- // savepointName is derived from filesAccepted (monotonically increasing - // within the tx) so nested savepoints cannot collide. - savepointName := fmt.Sprintf("f%d", filesAccepted) - if _, err := tx.ExecContext(ctx, "SAVEPOINT "+savepointName); err != nil { - return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("savepoint: %w", err) - } - // Rollback helper for the failure path below. - rollback := func() { - _, _ = tx.ExecContext(ctx, "ROLLBACK TO SAVEPOINT "+savepointName) - _, _ = tx.ExecContext(ctx, "RELEASE SAVEPOINT "+savepointName) - } - - // Delete old symbols/refs before insert (matches Python). - if err := symbolindex.DeleteByFileTx(ctx, tx, projectPath, fp.Path); err != nil { - s.logger.Error("indexer: symbols delete by file", "path", fp.Path, "err", err) - rollback() - continue - } - if err := symbolindex.DeleteRefsByFileTx(ctx, tx, projectPath, fp.Path); err != nil { - s.logger.Error("indexer: refs delete by file", "path", fp.Path, "err", err) - rollback() - continue - } - - // Vector store has no transactions — delete is best-effort. If the - // savepoint rolls back below we leave any vectors in place; they get - // overwritten on the next successful indexing of this file. - if s.vs != nil { - if err := s.vs.DeleteByFile(ctx, projectPath, fp.Path); err != nil { - s.logger.Error("indexer: vectorstore delete by file", "path", fp.Path, "err", err) - rollback() - continue - } - } - - // Upsert chunks. + // Vector store has no transactions — do its writes BEFORE opening + // the DB tx so the writer lock is acquired strictly for the DB part. + // If the DB tx fails we leave the new vectors in place; next reindex + // will see file_hashes was not updated and re-process the file, + // overwriting them. Acceptable for an infrequent failure mode. 
vsChunks := make([]vectorstore.Chunk, len(chunks)) for i, c := range chunks { sym := "" @@ -554,36 +522,95 @@ func (s *Service) ProcessFilesStreaming( } } if s.vs != nil { + if err := s.vs.DeleteByFile(ctx, projectPath, fp.Path); err != nil { + s.logger.Error("indexer: vectorstore delete by file", "path", fp.Path, "err", err) + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: "vectorstore delete: " + err.Error(), + Fatal: false, + }) + continue + } if err := s.vs.UpsertChunks(ctx, projectPath, vsChunks, embs); err != nil { s.logger.Error("indexer: vectorstore upsert", "path", fp.Path, "err", err) - rollback() + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: "vectorstore upsert: " + err.Error(), + Fatal: false, + }) continue } } - if _, err := tx.ExecContext(ctx, - `INSERT OR REPLACE INTO file_hashes - (project_path, file_path, content_hash, indexed_at) - VALUES (?, ?, ?, ?)`, - projectPath, fp.Path, fp.ContentHash, now, - ); err != nil { - s.logger.Error("indexer: file_hashes upsert", "path", fp.Path, "err", err) - rollback() - continue + // Build chunksfts payload from the same chunks we just pushed to + // chromem. The FTS side reuses content + metadata; embeddings stay + // on the vector side only. + ftsChunks := make([]chunksfts.Chunk, len(vsChunks)) + for i, c := range vsChunks { + ftsChunks[i] = chunksfts.Chunk{ + Content: c.Content, + FilePath: c.FilePath, + StartLine: c.StartLine, + EndLine: c.EndLine, + ChunkType: c.ChunkType, + SymbolName: c.SymbolName, + Language: c.Language, + } } - if _, err := tx.ExecContext(ctx, "RELEASE SAVEPOINT "+savepointName); err != nil { - emitTerminal(progress, ProgressEvent{ - Event: EventError, - Message: "release savepoint: " + err.Error(), - Fatal: true, + // Per-file DB tx: delete-old + insert-new symbols/refs + chunks_fts + // + file_hashes commit atomically. 
Anonymous func so the deferred + // rollback fires per file rather than at function return. + fileErr := func() error { + ftx, err := s.db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("begin file tx: %w", err) + } + defer ftx.Rollback() //nolint:errcheck // no-op after commit + + if err := symbolindex.DeleteByFileTx(ctx, ftx, projectPath, fp.Path); err != nil { + return fmt.Errorf("symbols delete: %w", err) + } + if err := symbolindex.DeleteRefsByFileTx(ctx, ftx, projectPath, fp.Path); err != nil { + return fmt.Errorf("refs delete: %w", err) + } + if len(fileSymbols) > 0 { + if err := symbolindex.UpsertSymbolsTx(ctx, ftx, projectPath, fileSymbols); err != nil { + return fmt.Errorf("upsert symbols: %w", err) + } + } + if len(fileRefs) > 0 { + if err := symbolindex.UpsertReferencesTx(ctx, ftx, projectPath, fileRefs); err != nil { + return fmt.Errorf("upsert refs: %w", err) + } + } + if err := chunksfts.UpsertByFileTx(ctx, ftx, projectPath, fp.Path, ftsChunks); err != nil { + return fmt.Errorf("upsert chunks_fts: %w", err) + } + if _, err := ftx.ExecContext(ctx, + `INSERT OR REPLACE INTO file_hashes + (project_path, file_path, content_hash, indexed_at) + VALUES (?, ?, ?, ?)`, + projectPath, fp.Path, fp.ContentHash, now, + ); err != nil { + return fmt.Errorf("file_hashes upsert: %w", err) + } + return ftx.Commit() + }() + if fileErr != nil { + s.logger.Error("indexer: file tx failed", "path", fp.Path, "err", fileErr) + progressSend(progress, ProgressEvent{ + Event: EventFileError, + Path: fp.Path, + Message: fileErr.Error(), + Fatal: false, }) - return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("release savepoint: %w", err) + continue } batchChunks += len(chunks) - batchSymbols = append(batchSymbols, fileSymbols...) - batchRefs = append(batchRefs, fileRefs...) s.mu.Lock() sess.languagesSeen[language] = struct{}{} @@ -597,40 +624,6 @@ func (s *Service) ProcessFilesStreaming( }) } - // M2 — these upserts are part of the outer tx. 
Any failure returns the - // whole batch's work via deferred tx.Rollback, so the session counters - // below only advance on a successful commit. - if len(batchSymbols) > 0 { - if err := symbolindex.UpsertSymbolsTx(ctx, tx, projectPath, batchSymbols); err != nil { - emitTerminal(progress, ProgressEvent{ - Event: EventError, - Message: "upsert symbols: " + err.Error(), - Fatal: true, - }) - return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("upsert symbols: %w", err) - } - } - if len(batchRefs) > 0 { - if err := symbolindex.UpsertReferencesTx(ctx, tx, projectPath, batchRefs); err != nil { - emitTerminal(progress, ProgressEvent{ - Event: EventError, - Message: "upsert refs: " + err.Error(), - Fatal: true, - }) - return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("upsert refs: %w", err) - } - } - - if err := tx.Commit(); err != nil { - emitTerminal(progress, ProgressEvent{ - Event: EventError, - Message: "commit batch: " + err.Error(), - Fatal: true, - }) - return filesAccepted, batchChunks, sess.filesProcessed, fmt.Errorf("commit batch: %w", err) - } - txCommitted = true - s.mu.Lock() sess.filesProcessed += filesAccepted sess.chunksCreated += batchChunks @@ -691,6 +684,9 @@ func (s *Service) FinishIndexing( if err := symbolindex.DeleteRefsByFile(ctx, s.db, projectPath, dp); err != nil { s.logger.Warn("indexer: refs delete by file (finish)", "path", dp, "err", err) } + if err := deleteChunksFTSByFile(ctx, s.db, projectPath, dp); err != nil { + s.logger.Warn("indexer: chunks_fts delete by file (finish)", "path", dp, "err", err) + } if _, err := s.db.ExecContext(ctx, `DELETE FROM file_hashes WHERE project_path = ? AND file_path = ?`, projectPath, dp, @@ -978,3 +974,19 @@ func marshalJSONStringArray(langs []string) string { return b.String() } +// deleteChunksFTSByFile is the standalone-db wrapper used by the +// FinishIndexing deletedPaths loop, which operates outside the per-file +// tx. 
Internally it opens a short tx so chunks_fts and chunks_meta +// stay consistent if one of the two DELETEs fails. +func deleteChunksFTSByFile(ctx context.Context, db *sql.DB, projectPath, filePath string) error { + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return err + } + defer tx.Rollback() //nolint:errcheck // no-op after commit + if err := chunksfts.DeleteByFileTx(ctx, tx, projectPath, filePath); err != nil { + return err + } + return tx.Commit() +} + diff --git a/server/internal/jobs/jobs.go b/server/internal/jobs/jobs.go new file mode 100644 index 0000000..989de93 --- /dev/null +++ b/server/internal/jobs/jobs.go @@ -0,0 +1,473 @@ +// Package jobs implements the persistent worker queue that drives the +// workspaces feature's long-running operations (clone, fetch, index, +// build-call-graph, community-recompute). +// +// Why persistent: clone+index can take minutes, webhook bursts can be +// frequent, and a single binary that operators restart needs to keep its +// work plan across SIGTERM. The cost is one polling SELECT every poll +// interval, which is irrelevant at the concurrency levels we run +// (default 2 workers). +// +// Dedup: every job may carry a `dedupe_key`. The schema has a partial +// unique index on (dedupe_key) WHERE status IN ('pending','running'), so +// 50 webhook deliveries for the same repo collapse into 1 pending job. The +// service translates the resulting UNIQUE error into a no-op return. +package jobs + +import ( + "context" + "database/sql" + "encoding/json" + "errors" + "fmt" + "log/slog" + "strings" + "sync" + "time" + + "github.com/google/uuid" +) + +// Status constants. +const ( + StatusPending = "pending" + StatusRunning = "running" + StatusCompleted = "completed" + StatusFailed = "failed" +) + +// Errors. +var ( + ErrNotFound = errors.New("job not found") + // ErrDuplicate is returned by Enqueue when a job with the same + // dedupe_key is already pending or running. 
Callers usually treat + // this as a soft no-op — the work is already on the queue. + ErrDuplicate = errors.New("job with this dedupe_key is already active") +) + +// Job is the wire view of a row. +type Job struct { + ID string + Type string + Status string + DedupeKey string // empty when not set + Payload []byte // raw JSON + Attempts int + MaxAttempts int + LastError string + ScheduledAt time.Time + StartedAt *time.Time + CompletedAt *time.Time + CreatedAt time.Time +} + +// EnqueueRequest is the input to Enqueue. Payload is encoded as JSON; pass +// a struct, map, or pre-marshalled []byte (which is passed through). +type EnqueueRequest struct { + Type string + DedupeKey string + Payload any + MaxAttempts int // default 3 when 0 + Delay time.Duration // default 0 +} + +// Handler is the signature for job type registrations. The handler runs +// inside the worker goroutine; long-running handlers MUST honour ctx for +// graceful shutdown. +type Handler func(ctx context.Context, job Job) error + +// Service is the SQLite-backed queue + in-process worker pool. +type Service struct { + db *sql.DB + logger *slog.Logger + concurrency int + pollEvery time.Duration + + mu sync.RWMutex + handlers map[string]Handler + + stop chan struct{} + done chan struct{} +} + +// Options configures New. Sensible defaults are filled in on zero values. +type Options struct { + Concurrency int // default: 2 + PollEvery time.Duration // default: 1s + Logger *slog.Logger +} + +// New returns a Service. Workers are NOT started yet — call Start.
+func New(db *sql.DB, opts Options) *Service { + if opts.Concurrency <= 0 { + opts.Concurrency = 2 + } + if opts.PollEvery <= 0 { + opts.PollEvery = time.Second + } + if opts.Logger == nil { + opts.Logger = slog.Default() + } + return &Service{ + db: db, + logger: opts.Logger, + concurrency: opts.Concurrency, + pollEvery: opts.PollEvery, + handlers: make(map[string]Handler), + stop: make(chan struct{}), + done: make(chan struct{}), + } +} + +// Register binds a handler to a job type. Re-registering a type replaces +// the prior handler. Must be called BEFORE Start — handlers added after +// Start are still picked up on the next poll, but the gap is racy with +// existing jobs of that type. +func (s *Service) Register(jobType string, h Handler) { + s.mu.Lock() + defer s.mu.Unlock() + s.handlers[jobType] = h +} + +// Start launches the worker pool in a background goroutine and returns +// immediately. Call it once per Service: a second call would start a +// second pool, and both pools would close the done channel on exit. +func (s *Service) Start(ctx context.Context) { + go s.runPool(ctx) +} + +// Stop signals the worker pool to drain. Blocks until all in-flight jobs +// finish or ctx is cancelled. Safe to call repeatedly, though not concurrently. +func (s *Service) Stop(ctx context.Context) error { + select { + case <-s.stop: + // already stopping + default: + close(s.stop) + } + select { + case <-s.done: + return nil + case <-ctx.Done(): + return ctx.Err() + } +} + +// runPool is the supervisor goroutine: it owns the worker count, starts +// one polling goroutine per worker, and waits for all of them to exit. +func (s *Service) runPool(ctx context.Context) { + defer close(s.done) + + // Per-worker ticker is cheaper than a single ticker fanned out.
+ wg := sync.WaitGroup{} + for i := 0; i < s.concurrency; i++ { + wg.Add(1) + go func(workerID int) { + defer wg.Done() + s.workerLoop(ctx, workerID) + }(i) + } + wg.Wait() +} + +func (s *Service) workerLoop(ctx context.Context, workerID int) { + tick := time.NewTicker(s.pollEvery) + defer tick.Stop() + for { + select { + case <-ctx.Done(): + return + case <-s.stop: + return + case <-tick.C: + } + // Pull one job per tick. Higher throughput would benefit from a + // LIMIT batch, but the work is dominated by clone/index time, + // not queue overhead — keep this simple. + job, err := s.claimNext(ctx) + if err != nil { + s.logger.Error("jobs: claim failed", "worker", workerID, "err", err) + continue + } + if job == nil { + continue + } + s.execute(ctx, workerID, *job) + } +} + +// claimNext atomically picks the oldest pending job whose scheduled_at +// has elapsed, marks it running, and returns it. Returns (nil, nil) when +// the queue is empty. +// +// We use a SELECT … FOR UPDATE-ish pattern via an UPDATE … WHERE id IN +// (SELECT … LIMIT 1) … RETURNING. modernc.org/sqlite supports RETURNING +// as of 1.27 (already in go.mod). Concurrent claims race on the inner +// SELECT but the outer WHERE id = ? AND status = 'pending' ensures only +// one worker wins. Lost races re-poll on the next tick. +func (s *Service) claimNext(ctx context.Context) (*Job, error) { + now := time.Now().UTC().Format(time.RFC3339Nano) + row := s.db.QueryRowContext(ctx, ` + UPDATE jobs + SET status = 'running', + started_at = ?, + attempts = attempts + 1 + WHERE id = ( + SELECT id FROM jobs + WHERE status = 'pending' + AND scheduled_at <= ? 
+ ORDER BY scheduled_at, created_at + LIMIT 1 + ) + RETURNING id, type, status, dedupe_key, payload, attempts, max_attempts, + last_error, scheduled_at, started_at, completed_at, created_at`, + now, now) + job, err := scanRow(row) + if err != nil { + if errors.Is(err, sql.ErrNoRows) { + return nil, nil // queue empty + } + return nil, fmt.Errorf("claim job: %w", err) + } + return &job, nil +} + +func (s *Service) execute(ctx context.Context, workerID int, job Job) { + s.mu.RLock() + h, ok := s.handlers[job.Type] + s.mu.RUnlock() + if !ok { + s.markFailed(ctx, job, fmt.Errorf("no handler registered for type %q", job.Type), false) + return + } + s.logger.Info("jobs: running", + "worker", workerID, + "id", job.ID, + "type", job.Type, + "attempt", job.Attempts, + "of", job.MaxAttempts) + + // Run the handler — never panic the worker, capture as a failed job. + var err error + func() { + defer func() { + if r := recover(); r != nil { + err = fmt.Errorf("handler panic: %v", r) + } + }() + err = h(ctx, job) + }() + + if err == nil { + s.markCompleted(ctx, job) + s.logger.Info("jobs: completed", + "worker", workerID, + "id", job.ID, + "type", job.Type) + return + } + + retry := job.Attempts < job.MaxAttempts + s.markFailed(ctx, job, err, retry) + if retry { + s.logger.Warn("jobs: failed, will retry", + "worker", workerID, + "id", job.ID, + "type", job.Type, + "attempts", job.Attempts, + "err", err) + } else { + s.logger.Error("jobs: failed permanently", + "worker", workerID, + "id", job.ID, + "type", job.Type, + "err", err) + } +} + +func (s *Service) markCompleted(ctx context.Context, job Job) { + now := time.Now().UTC().Format(time.RFC3339Nano) + if _, err := s.db.ExecContext(ctx, + `UPDATE jobs SET status = 'completed', completed_at = ?, last_error = NULL WHERE id = ?`, + now, job.ID); err != nil { + s.logger.Error("jobs: mark completed failed", "id", job.ID, "err", err) + } +} + +// markFailed transitions to 'failed' OR back to 'pending' (when retry=true) +// with a 
small linear backoff (attempts × 10s). +func (s *Service) markFailed(ctx context.Context, job Job, err error, retry bool) { + now := time.Now().UTC().Format(time.RFC3339Nano) + msg := err.Error() + if len(msg) > 1024 { + msg = msg[:1024] + } + if retry { + backoff := time.Duration(job.Attempts) * 10 * time.Second + newSchedule := time.Now().UTC().Add(backoff).Format(time.RFC3339Nano) + if _, qerr := s.db.ExecContext(ctx, + `UPDATE jobs SET status = 'pending', scheduled_at = ?, last_error = ? WHERE id = ?`, + newSchedule, msg, job.ID); qerr != nil { + s.logger.Error("jobs: re-enqueue failed", "id", job.ID, "err", qerr) + } + return + } + if _, qerr := s.db.ExecContext(ctx, + `UPDATE jobs SET status = 'failed', completed_at = ?, last_error = ? WHERE id = ?`, + now, msg, job.ID); qerr != nil { + s.logger.Error("jobs: mark failed failed", "id", job.ID, "err", qerr) + } +} + +// Enqueue inserts a new job. ErrDuplicate when dedupe_key collides with an +// already-active job. +func (s *Service) Enqueue(ctx context.Context, req EnqueueRequest) (Job, error) { + if strings.TrimSpace(req.Type) == "" { + return Job{}, fmt.Errorf("job type required") + } + payload, err := marshalPayload(req.Payload) + if err != nil { + return Job{}, fmt.Errorf("encode payload: %w", err) + } + maxAttempts := req.MaxAttempts + if maxAttempts <= 0 { + maxAttempts = 3 + } + id := uuid.NewString() + now := time.Now().UTC() + scheduledAt := now.Add(req.Delay).Format(time.RFC3339Nano) + createdAt := now.Format(time.RFC3339Nano) + + _, err = s.db.ExecContext(ctx, + `INSERT INTO jobs (id, type, status, dedupe_key, payload, attempts, max_attempts, + scheduled_at, created_at) + VALUES (?, ?, 'pending', ?, ?, 0, ?, ?, ?)`, + id, req.Type, nullableString(req.DedupeKey), string(payload), maxAttempts, + scheduledAt, createdAt, + ) + if err != nil { + if isUniqueConstraintViolation(err) { + return Job{}, ErrDuplicate + } + return Job{}, fmt.Errorf("insert job: %w", err) + } + return s.GetByID(ctx, id) +} + +// 
GetByID returns a single job. +func (s *Service) GetByID(ctx context.Context, id string) (Job, error) { + row := s.db.QueryRowContext(ctx, ` + SELECT id, type, status, dedupe_key, payload, attempts, max_attempts, + last_error, scheduled_at, started_at, completed_at, created_at + FROM jobs WHERE id = ?`, id) + job, err := scanRow(row) + if err != nil { + if errors.Is(err, sql.ErrNoRows) { + return Job{}, ErrNotFound + } + return Job{}, err + } + return job, nil +} + +// List returns jobs filtered by status / type. Empty filters mean "any". +// Always newest-first, capped at limit (default 100). +func (s *Service) List(ctx context.Context, status, jobType string, limit int) ([]Job, error) { + if limit <= 0 { + limit = 100 + } + q := `SELECT id, type, status, dedupe_key, payload, attempts, max_attempts, + last_error, scheduled_at, started_at, completed_at, created_at + FROM jobs WHERE 1=1` + var args []any + if status != "" { + q += " AND status = ?" + args = append(args, status) + } + if jobType != "" { + q += " AND type = ?" + args = append(args, jobType) + } + q += " ORDER BY created_at DESC LIMIT ?" + args = append(args, limit) + rows, err := s.db.QueryContext(ctx, q, args...) 
+ if err != nil { + return nil, fmt.Errorf("list jobs: %w", err) + } + defer rows.Close() + out := []Job{} + for rows.Next() { + j, err := scanRow(rows) + if err != nil { + return nil, err + } + out = append(out, j) + } + return out, rows.Err() +} + +// --- helpers --- + +func scanRow(r interface{ Scan(dest ...any) error }) (Job, error) { + var ( + j Job + dedupe, lastErr, startedAt, completedAt sql.NullString + payload string + scheduledAt, createdAt string + ) + err := r.Scan(&j.ID, &j.Type, &j.Status, &dedupe, &payload, + &j.Attempts, &j.MaxAttempts, &lastErr, + &scheduledAt, &startedAt, &completedAt, &createdAt) + if err != nil { + return Job{}, err + } + j.DedupeKey = dedupe.String + j.Payload = []byte(payload) + j.LastError = lastErr.String + j.ScheduledAt, _ = time.Parse(time.RFC3339Nano, scheduledAt) + j.CreatedAt, _ = time.Parse(time.RFC3339Nano, createdAt) + if startedAt.Valid { + t, _ := time.Parse(time.RFC3339Nano, startedAt.String) + j.StartedAt = &t + } + if completedAt.Valid { + t, _ := time.Parse(time.RFC3339Nano, completedAt.String) + j.CompletedAt = &t + } + return j, nil +} + +func marshalPayload(p any) ([]byte, error) { + if p == nil { + return []byte("{}"), nil + } + if raw, ok := p.([]byte); ok { + return raw, nil + } + return json.Marshal(p) +} + +func nullableString(s string) any { + if s == "" { + return nil + } + return s +} + +func isUniqueConstraintViolation(err error) bool { + if err == nil { + return false + } + msg := err.Error() + return strings.Contains(msg, "UNIQUE constraint failed") || + strings.Contains(msg, "constraint failed: UNIQUE") +} + +// UnmarshalPayload is a tiny convenience for handlers: pass the job and +// a pointer to a typed struct. 
+func UnmarshalPayload(job Job, out any) error { + if len(job.Payload) == 0 { + return nil + } + return json.Unmarshal(job.Payload, out) +} diff --git a/server/internal/jobs/jobs_test.go b/server/internal/jobs/jobs_test.go new file mode 100644 index 0000000..4491eff --- /dev/null +++ b/server/internal/jobs/jobs_test.go @@ -0,0 +1,213 @@ +package jobs + +import ( + "context" + "errors" + "sync" + "sync/atomic" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/db" +) + +func openSvc(t *testing.T) *Service { + t.Helper() + d, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open db: %v", err) + } + t.Cleanup(func() { _ = d.Close() }) + return New(d, Options{Concurrency: 1, PollEvery: 20 * time.Millisecond}) +} + +func TestEnqueueAndExecute(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + svc := openSvc(t) + + var ran atomic.Bool + var receivedPayload atomic.Value + svc.Register("test_job", func(_ context.Context, job Job) error { + var p struct{ Name string } + _ = UnmarshalPayload(job, &p) + receivedPayload.Store(p.Name) + ran.Store(true) + return nil + }) + + if _, err := svc.Enqueue(ctx, EnqueueRequest{ + Type: "test_job", + Payload: map[string]string{"Name": "hello"}, + }); err != nil { + t.Fatalf("Enqueue: %v", err) + } + svc.Start(ctx) + defer func() { + stopCtx, c := context.WithTimeout(context.Background(), time.Second) + defer c() + _ = svc.Stop(stopCtx) + }() + + waitFor(t, time.Second, ran.Load) + if got, _ := receivedPayload.Load().(string); got != "hello" { + t.Fatalf("payload not delivered, got %q", got) + } +} + +func TestRetryOnError(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + svc := openSvc(t) + // MaxAttempts is pinned to 1 below, so the first failure is terminal + // and no retry is ever scheduled.
The backoff is attempts × 10s; first retry + // fires at +10s which is too slow for a unit test, so we'll just + // check the row transitions to status=failed eventually rather than + // waiting for retry. + var attempts atomic.Int32 + svc.Register("flaky_job", func(_ context.Context, _ Job) error { + attempts.Add(1) + return errors.New("boom") + }) + j, err := svc.Enqueue(ctx, EnqueueRequest{Type: "flaky_job", MaxAttempts: 1}) + if err != nil { + t.Fatalf("Enqueue: %v", err) + } + svc.Start(ctx) + defer func() { + stopCtx, c := context.WithTimeout(context.Background(), time.Second) + defer c() + _ = svc.Stop(stopCtx) + }() + + waitFor(t, time.Second, func() bool { + got, err := svc.GetByID(ctx, j.ID) + if err != nil { + return false + } + return got.Status == StatusFailed + }) + if a := attempts.Load(); a < 1 { + t.Fatalf("expected >=1 attempts, got %d", a) + } +} + +func TestDedupeKey(t *testing.T) { + ctx := context.Background() + svc := openSvc(t) + if _, err := svc.Enqueue(ctx, EnqueueRequest{ + Type: "x", DedupeKey: "k1", Payload: nil, + }); err != nil { + t.Fatalf("first: %v", err) + } + _, err := svc.Enqueue(ctx, EnqueueRequest{ + Type: "x", DedupeKey: "k1", Payload: nil, + }) + if !errors.Is(err, ErrDuplicate) { + t.Fatalf("expected ErrDuplicate, got %v", err) + } +} + +func TestDedupeKeyAllowsAfterCompletion(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + svc := openSvc(t) + var done sync.WaitGroup + done.Add(1) + svc.Register("x", func(_ context.Context, _ Job) error { + done.Done() + return nil + }) + if _, err := svc.Enqueue(ctx, EnqueueRequest{ + Type: "x", DedupeKey: "k", Payload: nil, + }); err != nil { + t.Fatalf("first: %v", err) + } + svc.Start(ctx) + defer func() { + stopCtx, c := context.WithTimeout(context.Background(), time.Second) + defer c() + _ = svc.Stop(stopCtx) + }() + done.Wait() + // Tiny pause to allow markCompleted to settle. 
+ time.Sleep(100 * time.Millisecond) + if _, err := svc.Enqueue(ctx, EnqueueRequest{ + Type: "x", DedupeKey: "k", Payload: nil, + }); err != nil { + t.Fatalf("second enqueue (after completion) should succeed, got %v", err) + } +} + +func TestUnregisteredTypeFailsLoudly(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + svc := openSvc(t) + // Force max_attempts=1 so the job goes straight to failed without retry. + j, err := svc.Enqueue(ctx, EnqueueRequest{Type: "missing", MaxAttempts: 1}) + if err != nil { + t.Fatalf("Enqueue: %v", err) + } + svc.Start(ctx) + defer func() { + stopCtx, c := context.WithTimeout(context.Background(), time.Second) + defer c() + _ = svc.Stop(stopCtx) + }() + waitFor(t, time.Second, func() bool { + got, err := svc.GetByID(ctx, j.ID) + return err == nil && got.Status == StatusFailed + }) +} + +func TestPanicRecovered(t *testing.T) { + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + svc := openSvc(t) + svc.Register("crash", func(_ context.Context, _ Job) error { + panic("oops") + }) + j, err := svc.Enqueue(ctx, EnqueueRequest{Type: "crash", MaxAttempts: 1}) + if err != nil { + t.Fatalf("Enqueue: %v", err) + } + svc.Start(ctx) + defer func() { + stopCtx, c := context.WithTimeout(context.Background(), time.Second) + defer c() + _ = svc.Stop(stopCtx) + }() + waitFor(t, time.Second, func() bool { + got, err := svc.GetByID(ctx, j.ID) + return err == nil && got.Status == StatusFailed + }) +} + +func TestList(t *testing.T) { + ctx := context.Background() + svc := openSvc(t) + for i := 0; i < 3; i++ { + if _, err := svc.Enqueue(ctx, EnqueueRequest{Type: "x"}); err != nil { + t.Fatalf("Enqueue: %v", err) + } + } + all, err := svc.List(ctx, "", "", 10) + if err != nil { + t.Fatalf("List: %v", err) + } + if len(all) != 3 { + t.Fatalf("expected 3 jobs, got %d", len(all)) + } +} + +func waitFor(t *testing.T, max time.Duration, fn func() bool) { + t.Helper() + deadline := 
time.Now().Add(max) + for time.Now().Before(deadline) { + if fn() { + return + } + time.Sleep(20 * time.Millisecond) + } + t.Fatalf("condition not met within %s", max) +} diff --git a/server/internal/projects/projects.go b/server/internal/projects/projects.go index e8212c4..559311e 100644 --- a/server/internal/projects/projects.go +++ b/server/internal/projects/projects.go @@ -12,6 +12,8 @@ import ( "fmt" "strings" "time" + + "github.com/dvcdsys/code-index/server/internal/chunksfts" ) // ErrNotFound is returned when a project does not exist. @@ -263,12 +265,27 @@ func Patch(ctx context.Context, db *sql.DB, hostPath string, req UpdateRequest) } // Delete removes a project and its cascading records. Returns ErrNotFound if absent. +// +// chunks_meta and chunks_fts are not bound to projects via FK because +// chunks_fts is a virtual table and cannot participate in foreign keys. +// We wipe them in the same tx that drops the projects row so a failure +// rolls back the partial state. func Delete(ctx context.Context, db *sql.DB, hostPath string) error { if _, err := Get(ctx, db, hostPath); err != nil { return err } - _, err := db.ExecContext(ctx, `DELETE FROM projects WHERE host_path = ?`, hostPath) - return err + tx, err := db.BeginTx(ctx, nil) + if err != nil { + return fmt.Errorf("begin delete tx: %w", err) + } + defer tx.Rollback() //nolint:errcheck // no-op after commit + if err := chunksfts.DeleteByProjectTx(ctx, tx, hostPath); err != nil { + return err + } + if _, err := tx.ExecContext(ctx, `DELETE FROM projects WHERE host_path = ?`, hostPath); err != nil { + return fmt.Errorf("delete project: %w", err) + } + return tx.Commit() } // --------------------------------------------------------------------------- diff --git a/server/internal/repocloner/repocloner.go b/server/internal/repocloner/repocloner.go new file mode 100644 index 0000000..fef0bea --- /dev/null +++ b/server/internal/repocloner/repocloner.go @@ -0,0 +1,204 @@ +// Package repocloner is the workspaces 
feature's git boundary. It wraps +// go-git so the rest of the codebase doesn't need to know about plumbing +// objects, references, or storage layers. +// +// Why go-git (not `git` shell-out): the production CUDA image runs on +// distroless/cc-debian13 which has no shell and no git binary. Pulling +// go-git into the binary keeps the runtime image untouched. +// +// What this package does: +// - Clone a branch (public OR PAT-authenticated) +// - Fetch + reset to remote HEAD on subsequent runs +// - Report the current HEAD SHA (for last_sha bookkeeping) +// - Resolve a "github.com/owner/repo" + branch to a deterministic local +// directory under DataDir/repos/{repo_id}/ +// +// Errors are deliberately coarse — the worker pool surfaces them in the +// job row and the dashboard renders them verbatim. There's no point +// distinguishing "wrong PAT" from "branch missing" deep in the call chain. +package repocloner + +import ( + "context" + "errors" + "fmt" + "os" + "path/filepath" + "strings" + + "github.com/go-git/go-git/v5" + "github.com/go-git/go-git/v5/config" + "github.com/go-git/go-git/v5/plumbing" + "github.com/go-git/go-git/v5/plumbing/transport/http" +) + +// ErrAlreadyUpToDate signals a fetch found no new commits. Callers can +// short-circuit reindex on this. +var ErrAlreadyUpToDate = errors.New("repo already up to date") + +// CloneOptions parameterises a clone or fetch. +type CloneOptions struct { + // GitHubURL is the canonical HTTPS URL — "https://github.com/owner/repo" + // (with or without ".git" suffix; both work). + GitHubURL string + Branch string + // PAT, when non-empty, is sent as HTTP BasicAuth with username + // "x-access-token" — works for fine-grained tokens, classic PATs, and + // GitHub App installation tokens alike. + PAT string + // LocalDir is the absolute destination. Created if missing; reused + // (fetch+reset) if it already contains a git repository for the same + // remote URL. 
+ LocalDir string +} + +// Result is what handlers care about post-clone. +type Result struct { + HeadSHA string +} + +// CloneOrFetch clones the repo when LocalDir is empty, otherwise fetches +// + resets the local checkout to origin/{branch}. Returns the HEAD SHA +// after the operation completes. +// +// The caller is responsible for choosing a LocalDir that won't collide +// across repos — typically `/repos//` keyed by the +// workspace_repos row id (NOT the github URL, which can change with +// rename + redirect). +func CloneOrFetch(ctx context.Context, opts CloneOptions) (Result, error) { + if strings.TrimSpace(opts.GitHubURL) == "" { + return Result{}, fmt.Errorf("GitHubURL required") + } + if strings.TrimSpace(opts.Branch) == "" { + return Result{}, fmt.Errorf("Branch required") + } + if strings.TrimSpace(opts.LocalDir) == "" { + return Result{}, fmt.Errorf("LocalDir required") + } + url := normaliseURL(opts.GitHubURL) + auth := authFor(opts.PAT) + + // First-time clone path: LocalDir is missing or empty. + if needsClone(opts.LocalDir) { + if err := os.MkdirAll(opts.LocalDir, 0o755); err != nil { + return Result{}, fmt.Errorf("mkdir clone target: %w", err) + } + repo, err := git.PlainCloneContext(ctx, opts.LocalDir, false, &git.CloneOptions{ + URL: url, + Auth: auth, + ReferenceName: plumbing.NewBranchReferenceName(opts.Branch), + SingleBranch: true, + Depth: 1, // shallow — minimises bandwidth + disk + }) + if err != nil { + // Cleanup so the next retry isn't stuck with a half-clone. + _ = os.RemoveAll(opts.LocalDir) + return Result{}, fmt.Errorf("clone: %w", err) + } + head, err := repo.Head() + if err != nil { + return Result{}, fmt.Errorf("resolve HEAD: %w", err) + } + return Result{HeadSHA: head.Hash().String()}, nil + } + + // Reuse path: open the existing repo, ensure the remote matches, fetch, + // reset to origin/{branch}. 
+ repo, err := git.PlainOpen(opts.LocalDir) + if err != nil { + return Result{}, fmt.Errorf("open existing repo at %s: %w", opts.LocalDir, err) + } + if err := ensureRemote(repo, url); err != nil { + return Result{}, err + } + + err = repo.FetchContext(ctx, &git.FetchOptions{ + Auth: auth, + RefSpecs: []config.RefSpec{config.RefSpec(fmt.Sprintf("+refs/heads/%s:refs/remotes/origin/%s", opts.Branch, opts.Branch))}, + Depth: 1, + Force: true, + }) + if err != nil && !errors.Is(err, git.NoErrAlreadyUpToDate) { + return Result{}, fmt.Errorf("fetch: %w", err) + } + + remoteRef, err := repo.Reference(plumbing.NewRemoteReferenceName("origin", opts.Branch), true) + if err != nil { + return Result{}, fmt.Errorf("resolve remote ref: %w", err) + } + + wt, err := repo.Worktree() + if err != nil { + return Result{}, fmt.Errorf("worktree: %w", err) + } + // Hard reset — discards any local mutation that crept in. Worker-managed + // checkouts have no human edits we'd want to preserve. + if err := wt.Reset(&git.ResetOptions{ + Commit: remoteRef.Hash(), + Mode: git.HardReset, + }); err != nil { + return Result{}, fmt.Errorf("reset: %w", err) + } + + head, err := repo.Head() + if err != nil { + return Result{}, fmt.Errorf("resolve HEAD post-reset: %w", err) + } + return Result{HeadSHA: head.Hash().String()}, nil +} + +// LocalDirFor returns the canonical path for a workspace_repo's checkout +// under dataDir. Centralised so the worker pool and the cleanup path agree. +// The id segment is treated as opaque (UUID/ULID), no validation here. +func LocalDirFor(dataDir, id string) string { + return filepath.Join(dataDir, "repos", id) +} + +// --- helpers --- + +func authFor(pat string) *http.BasicAuth { + pat = strings.TrimSpace(pat) + if pat == "" { + return nil + } + return &http.BasicAuth{ + // x-access-token is the username GitHub accepts for App / fine-grained + // token auth. Classic PATs also accept it. 
+ Username: "x-access-token", + Password: pat, + } +} + +func normaliseURL(u string) string { + u = strings.TrimSpace(u) + u = strings.TrimSuffix(u, "/") + if !strings.HasSuffix(u, ".git") { + u += ".git" + } + return u +} + +func needsClone(dir string) bool { + gitDir := filepath.Join(dir, ".git") + if _, err := os.Stat(gitDir); err != nil { + // Either dir doesn't exist or has no .git/ — fresh clone path. + return true + } + return false +} + +func ensureRemote(repo *git.Repository, wantURL string) error { + remote, err := repo.Remote("origin") + if err != nil { + return fmt.Errorf("no origin remote: %w", err) + } + urls := remote.Config().URLs + if len(urls) == 0 || urls[0] != wantURL { + // Repo on disk points at a different URL — likely the workspace + // admin changed the github_url. Easiest fix: nuke + reclone, but + // the caller can't see that from here. Surface as an error so the + // operator at least sees the mismatch in the failed job. + return fmt.Errorf("local repo remote %v does not match expected %s", urls, wantURL) + } + return nil +} diff --git a/server/internal/repoindexer/repoindexer.go b/server/internal/repoindexer/repoindexer.go new file mode 100644 index 0000000..c9d3c55 --- /dev/null +++ b/server/internal/repoindexer/repoindexer.go @@ -0,0 +1,239 @@ +// Package repoindexer is the in-process driver that turns a cloned git +// repository on disk into an indexed cix project. It bridges the +// workspaces feature's job pipeline (clone_repo → ??? → workspace_repo +// status=indexed) and the existing three-phase indexer that drives all +// other code indexing in cix. +// +// Why in-process: the CLI traditionally walks the filesystem locally, +// hashes files, then streams batches to the server over HTTP. For the +// workspaces feature the "source" is already on the server's disk (the +// worker just cloned it). 
Going out-and-back through HTTP for that case +// would mean dragging the entire 3-phase NDJSON streaming machinery into +// the worker, when we can call the same Service.BeginIndexing / +// ProcessFiles / FinishIndexing methods directly. +// +// Boundary: this package owns walking + chunk-payload construction. It +// does NOT own embedding, tokenisation, vectorstore mutation — those +// continue to live in indexer.Service. If embeddings are not configured +// (e.g. in CI tests), the indexer service returns errors that propagate +// back as job failures. +package repoindexer + +import ( + "context" + "crypto/sha256" + "encoding/hex" + "errors" + "fmt" + "io/fs" + "log/slog" + "os" + "path/filepath" + "strings" + + "github.com/dvcdsys/code-index/server/internal/indexer" + "github.com/dvcdsys/code-index/server/internal/langdetect" +) + +// BatchSize controls how many files we hand the indexer per ProcessFiles +// call. The CLI typically sends a few hundred; a smaller batch here keeps +// batch tx commits tight and bounds memory. +const BatchSize = 50 + +// FileFilter decides whether a candidate file should be indexed. Returning +// false skips it silently (no log noise). The default filter rejects +// node_modules, selected hidden dirs, common build outputs, and files +// over a size cap. +type FileFilter struct { + ExcludeDirs []string // path segment match — "node_modules", ".git", etc. + MaxFileSize int64 // bytes; 0 disables the check + // SkipBinaries, when true (as DefaultFilter sets it), drops files whose + // first 512 bytes contain a NUL — a cheap-and-cheerful proxy for "not text". + SkipBinaries bool +} + +// DefaultFilter returns a sensible default ruleset. Mirrors the CLI's +// "obvious junk to skip" list so per-repo settings remain consistent +// across local + workspace projects.
+func DefaultFilter() FileFilter { + return FileFilter{ + ExcludeDirs: []string{ + ".git", "node_modules", ".venv", "__pycache__", + "dist", "build", ".next", ".cache", ".DS_Store", + "target", ".idea", ".vscode", ".gradle", + "vendor", // Go vendor — usually mirror of deps already indexed elsewhere + }, + MaxFileSize: 524288, // 512 KiB + SkipBinaries: true, + } +} + +// IndexDir runs a full end-to-end index pass against a local directory: +// BeginIndexing(full=true) → ProcessFiles batches → FinishIndexing. The +// projects table row must already exist (caller's responsibility — the +// worker creates it before clone_repo runs). On any error mid-way, the +// indexer's internal session timer cleans up after an hour; we don't +// explicitly cancel since "best-effort retry" is the expected pattern. +// +// Returns (filesIndexed, chunksCreated, err). +func IndexDir( + ctx context.Context, + idx *indexer.Service, + projectPath, rootDir string, + filter FileFilter, + logger *slog.Logger, +) (int, int, error) { + if idx == nil { + return 0, 0, errors.New("indexer not configured") + } + if logger == nil { + logger = slog.Default() + } + + runID, _, err := idx.BeginIndexing(ctx, projectPath, true) + if err != nil { + return 0, 0, fmt.Errorf("begin indexing: %w", err) + } + + totalFiles := 0 + totalChunks := 0 + totalAccepted := 0 + batch := make([]indexer.FilePayload, 0, BatchSize) + + flush := func() error { + if len(batch) == 0 { + return nil + } + _, chunks, _, ferr := idx.ProcessFiles(ctx, projectPath, runID, batch) + if ferr != nil { + return fmt.Errorf("process batch: %w", ferr) + } + totalAccepted += len(batch) + totalChunks += chunks + batch = batch[:0] + return nil + } + + err = filepath.WalkDir(rootDir, func(path string, d fs.DirEntry, walkErr error) error { + if walkErr != nil { + // Permission errors on a subtree shouldn't kill the whole index. 
+ logger.Warn("repoindexer: walk skipped", "path", path, "err", walkErr) + if d != nil && d.IsDir() { + return fs.SkipDir + } + return nil + } + if d.IsDir() { + if filter.shouldSkipDir(path, rootDir, d.Name()) { + return fs.SkipDir + } + return nil + } + // Regular file. + rel, rerr := filepath.Rel(rootDir, path) + if rerr != nil { + return nil + } + rel = filepath.ToSlash(rel) + totalFiles++ + + fp, ok, ferr := buildPayload(path, rel, filter) + if ferr != nil { + logger.Warn("repoindexer: file dropped", "path", rel, "err", ferr) + return nil + } + if !ok { + return nil + } + batch = append(batch, fp) + if len(batch) >= BatchSize { + if err := flush(); err != nil { + return err + } + } + return nil + }) + if err != nil { + return totalAccepted, totalChunks, fmt.Errorf("walk: %w", err) + } + if err := flush(); err != nil { + return totalAccepted, totalChunks, err + } + + if _, _, _, ferr := idx.FinishIndexing(ctx, projectPath, runID, nil, totalFiles); ferr != nil { + return totalAccepted, totalChunks, fmt.Errorf("finish indexing: %w", ferr) + } + return totalAccepted, totalChunks, nil +} + +// buildPayload reads a file and turns it into an indexer.FilePayload. +// Returns (payload, true, nil) on success, (_, false, nil) when the file +// should be silently skipped (size cap, binary content), and an error on +// IO failure. 
+func buildPayload(absPath, relPath string, filter FileFilter) (indexer.FilePayload, bool, error) { + info, err := os.Stat(absPath) + if err != nil { + return indexer.FilePayload{}, false, err + } + if !info.Mode().IsRegular() { + return indexer.FilePayload{}, false, nil + } + if filter.MaxFileSize > 0 && info.Size() > filter.MaxFileSize { + return indexer.FilePayload{}, false, nil + } + + raw, err := os.ReadFile(absPath) + if err != nil { + return indexer.FilePayload{}, false, err + } + if filter.SkipBinaries && looksBinary(raw) { + return indexer.FilePayload{}, false, nil + } + + sum := sha256.Sum256(raw) + lang := langdetect.Detect(relPath) + if lang == "" { + return indexer.FilePayload{}, false, nil + } + + return indexer.FilePayload{ + Path: relPath, + Content: string(raw), + ContentHash: hex.EncodeToString(sum[:]), + Language: lang, + Size: len(raw), + }, true, nil +} + +// shouldSkipDir returns true when the directory should be pruned from +// the walk. We match on the leaf segment for the common cases +// (node_modules anywhere in the tree), not the full path. +func (f FileFilter) shouldSkipDir(absPath, rootDir, name string) bool { + if absPath == rootDir { + return false + } + for _, ex := range f.ExcludeDirs { + if strings.EqualFold(name, ex) { + return true + } + } + return false +} + +// looksBinary reports whether b looks like binary content. The heuristic +// is a NUL byte anywhere in the first `probe` bytes; UTF-8 source files +// do not normally contain one. Empty input is treated as text. +// Only the prefix is scanned, so large files stay cheap to check. +func looksBinary(b []byte) bool { + const probe = 512 + n := len(b) + if n > probe { + n = probe + } + for i := 0; i < n; i++ { + if b[i] == 0 { + return true + } + } + return false +} diff --git a/server/internal/repoindexer/repoindexer_test.go b/server/internal/repoindexer/repoindexer_test.go new file mode 100644 index 0000000..b039a42 --- /dev/null +++ b/server/internal/repoindexer/repoindexer_test.go @@ -0,0 +1,105 @@ +package repoindexer + +import ( + "os" + "path/filepath" + "testing" +) + +// Tests focus on the parts that don't require the embeddings sidecar: +// file filtering, walk pruning, and binary detection.
The full pipeline +// (IndexDir) is exercised by the integration test in httpapi that +// stands up a fake indexer service. + +func TestBuildPayloadSkipsLargeFiles(t *testing.T) { + dir := t.TempDir() + bigPath := filepath.Join(dir, "big.go") + // 600 KiB — over the 512 KiB default cap. + if err := os.WriteFile(bigPath, make([]byte, 600*1024), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + _, ok, err := buildPayload(bigPath, "big.go", DefaultFilter()) + if err != nil { + t.Fatalf("buildPayload: %v", err) + } + if ok { + t.Fatalf("expected oversized file to be skipped") + } +} + +func TestBuildPayloadSkipsBinaries(t *testing.T) { + dir := t.TempDir() + binPath := filepath.Join(dir, "x.go") + // Embed a NUL near the start — flips the binary heuristic. + content := append([]byte("package x\n"), 0x00, 0x01, 0x02) + if err := os.WriteFile(binPath, content, 0o644); err != nil { + t.Fatalf("write: %v", err) + } + _, ok, err := buildPayload(binPath, "x.go", DefaultFilter()) + if err != nil { + t.Fatalf("buildPayload: %v", err) + } + if ok { + t.Fatalf("expected binary-looking content to be skipped") + } +} + +func TestBuildPayloadAcceptsRegular(t *testing.T) { + dir := t.TempDir() + p := filepath.Join(dir, "main.go") + src := "package main\n\nfunc main() {}\n" + if err := os.WriteFile(p, []byte(src), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + fp, ok, err := buildPayload(p, "main.go", DefaultFilter()) + if err != nil || !ok { + t.Fatalf("expected ok payload, got ok=%v err=%v", ok, err) + } + if fp.Language != "go" { + t.Fatalf("language detection wrong: %q", fp.Language) + } + if fp.Content != src { + t.Fatalf("content mismatch") + } + if fp.ContentHash == "" { + t.Fatalf("hash empty") + } + if fp.Path != "main.go" { + t.Fatalf("path mismatch: %q", fp.Path) + } +} + +func TestBuildPayloadSkipsUnknownLanguage(t *testing.T) { + dir := t.TempDir() + p := filepath.Join(dir, "README.unknown_ext_zz") + if err := os.WriteFile(p, []byte("plain text"), 0o644); 
err != nil { + t.Fatalf("write: %v", err) + } + _, ok, _ := buildPayload(p, "README.unknown_ext_zz", DefaultFilter()) + if ok { + t.Fatalf("expected unknown extension to be skipped") + } +} + +func TestShouldSkipDir(t *testing.T) { + f := DefaultFilter() + root := "/tmp/root" + cases := map[string]bool{ + "node_modules": true, + ".git": true, + ".venv": true, + "vendor": true, + "src": false, + "pkg": false, + } + for name, want := range cases { + got := f.shouldSkipDir(filepath.Join(root, name), root, name) + if got != want { + t.Errorf("shouldSkipDir(%q) = %v, want %v", name, got, want) + } + } + // Root itself must never be skipped, even if its name matches. + if f.shouldSkipDir(root, root, ".git") { + t.Fatal("root directory should never be skipped") + } +} diff --git a/server/internal/secrets/secrets.go b/server/internal/secrets/secrets.go new file mode 100644 index 0000000..eda2a65 --- /dev/null +++ b/server/internal/secrets/secrets.go @@ -0,0 +1,338 @@ +// Package secrets implements at-rest encryption for sensitive values (most +// notably GitHub Personal Access Tokens) stored in the SQLite database. +// +// Threat model: an attacker who lifts the SQLite file off disk should NOT be +// able to recover GitHub tokens without also obtaining the encryption key, +// which lives outside the database (env var, keyfile, or distinct file with +// 0600 perms). The service is intentionally simple — AES-256-GCM with a +// random nonce per encrypt, prefixed with a single-byte version so future +// key-rotation rolls can be deployed without a one-shot migration. +// +// Ciphertext layout: [1 byte version=0x01][12 byte nonce][N bytes ciphertext+tag] +// +// Key resolution order (the first source that yields a valid 32-byte key wins): +// 1. CIX_SECRET_KEY — base64 (std or url) or hex-encoded 32 bytes +// 2. CIX_SECRET_KEYFILE — absolute path to a file containing the key +// (raw 32 bytes, base64, or hex). 
File must have 0600 permissions or +// stricter or the service refuses to start. +// 3. /.secret_key — auto-generated on first run when neither env +// var is set. Written with 0600. The operator can rotate by deleting the +// file before any github_tokens row exists; once tokens exist, deleting +// the key bricks them — Open() refuses to start in that situation. +package secrets + +import ( + "crypto/aes" + "crypto/cipher" + "crypto/rand" + "crypto/subtle" + "encoding/base64" + "encoding/hex" + "errors" + "fmt" + "io" + "log/slog" + "os" + "path/filepath" + "runtime" + "strings" +) + +// KeySize is the AES-256 key length in bytes. +const KeySize = 32 + +// nonceSize is the GCM nonce length in bytes. 12 is the AEAD-recommended +// value and what crypto/cipher.NewGCM defaults to. +const nonceSize = 12 + +// currentVersion is the version byte prepended to every ciphertext. Bump +// when changing the AEAD or KDF; older rows decrypt via legacy branches. +const currentVersion byte = 0x01 + +// minKeyFilePerm is the mask of permission bits that must be clear on a +// key file: 0o077 covers every group/other bit, so 0o600 or stricter +// passes and anything wider makes us refuse to read it. +const minKeyFilePerm os.FileMode = 0o077 + +var ( + // ErrNoKey means key resolution found no candidate at any of the + // resolution sources. Surfaced at Open() so the operator sees a clear + // startup error rather than a confusing "decrypt failed" later. + ErrNoKey = errors.New("no encryption key configured (CIX_SECRET_KEY / CIX_SECRET_KEYFILE / generated keyfile)") + + // ErrCiphertextTooShort signals a malformed input — short of the + // version+nonce+tag overhead, so it cannot even be parsed. + ErrCiphertextTooShort = errors.New("ciphertext too short") + + // ErrUnknownVersion is returned when the version byte does not match + // any decryption branch we know about. Usually means the DB came from + // a future server build.
+ ErrUnknownVersion = errors.New("unknown ciphertext version") +) + +// Service is the in-process encryption helper. Construct via Open. Safe for +// concurrent use — the underlying cipher.AEAD is stateless. +type Service struct { + aead cipher.AEAD + source string // human-friendly description for logs / status: "env", "keyfile:/path", "generated:/path" + autogenKey bool // true when we wrote a fresh keyfile this boot +} + +// OpenOptions configures Open. EnvVar / EnvKeyFileVar override the default +// "CIX_SECRET_KEY" / "CIX_SECRET_KEYFILE" lookup; useful for tests. +// DataDir is the directory the auto-generated keyfile is created in when +// nothing else resolved. +type OpenOptions struct { + EnvVar string // default: "CIX_SECRET_KEY" + EnvKeyFileVar string // default: "CIX_SECRET_KEYFILE" + DataDir string // required when fallback generation is desired + Logger *slog.Logger + // AllowGenerate, when true, lets Open() create a keyfile under + // DataDir if no other key source resolved. Tests that exercise the + // "no key" path can leave this false to force ErrNoKey. + AllowGenerate bool +} + +// Open resolves the encryption key from the configured sources and returns +// a ready-to-use Service. Returns ErrNoKey when AllowGenerate is false and +// none of the sources yield a key. +func Open(opts OpenOptions) (*Service, error) { + envKey := opts.EnvVar + if envKey == "" { + envKey = "CIX_SECRET_KEY" + } + envKeyFile := opts.EnvKeyFileVar + if envKeyFile == "" { + envKeyFile = "CIX_SECRET_KEYFILE" + } + + logger := opts.Logger + if logger == nil { + logger = slog.Default() + } + + var ( + key []byte + source string + autogen bool + ) + + // 1. Direct env var. + if v := strings.TrimSpace(os.Getenv(envKey)); v != "" { + parsed, err := decodeKey(v) + if err != nil { + return nil, fmt.Errorf("%s: %w", envKey, err) + } + key = parsed + source = "env:" + envKey + } + + // 2. Keyfile path env var. 
+ if key == nil { + if path := strings.TrimSpace(os.Getenv(envKeyFile)); path != "" { + parsed, err := readKeyFile(path) + if err != nil { + return nil, fmt.Errorf("%s=%s: %w", envKeyFile, path, err) + } + key = parsed + source = "keyfile:" + path + } + } + + // 3. Auto-generated keyfile under DataDir. + if key == nil && opts.AllowGenerate && opts.DataDir != "" { + path := filepath.Join(opts.DataDir, ".secret_key") + existing, err := readKeyFile(path) + switch { + case err == nil: + key = existing + source = "keyfile:" + path + case errors.Is(err, os.ErrNotExist): + generated, gerr := generateKeyFile(path) + if gerr != nil { + return nil, fmt.Errorf("auto-generate keyfile %s: %w", path, gerr) + } + key = generated + source = "generated:" + path + autogen = true + logger.Warn("secrets: generated new encryption keyfile", + "path", path, + "action_required", "back this file up — losing it makes all encrypted github_tokens unreadable") + default: + return nil, fmt.Errorf("read keyfile %s: %w", path, err) + } + } + + if key == nil { + return nil, ErrNoKey + } + if len(key) != KeySize { + return nil, fmt.Errorf("encryption key must be %d bytes, got %d (source=%s)", KeySize, len(key), source) + } + + block, err := aes.NewCipher(key) + if err != nil { + return nil, fmt.Errorf("aes.NewCipher: %w", err) + } + aead, err := cipher.NewGCM(block) + if err != nil { + return nil, fmt.Errorf("cipher.NewGCM: %w", err) + } + // Wipe the key bytes from the local slice now that the AEAD holds its + // internal copy. (cipher.NewGCM copies the key into the gcm struct.) + for i := range key { + key[i] = 0 + } + runtime.KeepAlive(key) + + return &Service{ + aead: aead, + source: source, + autogenKey: autogen, + }, nil +} + +// Source returns a human-friendly identifier of where the key came from. +// Used by /api/v1/status (admin-only) so operators can confirm the source +// without exposing the key itself. 
+func (s *Service) Source() string { + if s == nil { + return "" + } + return s.source +} + +// Autogenerated reports whether Open() created a new keyfile this boot. +// Triggers a one-time warning banner in the startup log so the operator +// knows to back up the file before it gets paved by a redeploy. +func (s *Service) Autogenerated() bool { + if s == nil { + return false + } + return s.autogenKey +} + +// Encrypt returns the canonical ciphertext layout for plaintext. Each call +// uses a fresh random nonce. +func (s *Service) Encrypt(plaintext []byte) ([]byte, error) { + nonce := make([]byte, nonceSize) + if _, err := io.ReadFull(rand.Reader, nonce); err != nil { + return nil, fmt.Errorf("nonce: %w", err) + } + out := make([]byte, 0, 1+nonceSize+len(plaintext)+s.aead.Overhead()) + out = append(out, currentVersion) + out = append(out, nonce...) + out = s.aead.Seal(out, nonce, plaintext, nil) + return out, nil +} + +// Decrypt reverses Encrypt. Returns ErrCiphertextTooShort or +// ErrUnknownVersion for malformed input; the underlying AEAD error +// otherwise (likely "cipher: message authentication failed" — surface as +// a generic decryption failure to avoid oracling). +func (s *Service) Decrypt(ciphertext []byte) ([]byte, error) { + if len(ciphertext) < 1+nonceSize+s.aead.Overhead() { + return nil, ErrCiphertextTooShort + } + version := ciphertext[0] + if version != currentVersion { + return nil, ErrUnknownVersion + } + nonce := ciphertext[1 : 1+nonceSize] + body := ciphertext[1+nonceSize:] + plain, err := s.aead.Open(nil, nonce, body, nil) + if err != nil { + return nil, fmt.Errorf("decrypt: %w", err) + } + return plain, nil +} + +// ConstantTimeEqual compares two byte slices in constant time. Re-exported +// so callers don't need to import subtle themselves for things like +// HMAC-secret comparisons. 
+func ConstantTimeEqual(a, b []byte) bool { + if len(a) != len(b) { + return false + } + return subtle.ConstantTimeCompare(a, b) == 1 +} + +// --- helpers --- + +// decodeKey accepts the key in one of three encodings: +// - hex (64 chars) +// - base64 (std or url, with or without padding) +// - raw 32 bytes (treated as already-decoded — only matched when len is +// exactly KeySize, which would also match a 32-byte raw read from a +// file but never a plausible env var value) +func decodeKey(v string) ([]byte, error) { + v = strings.TrimSpace(v) + if len(v) == 0 { + return nil, fmt.Errorf("empty key") + } + // hex first — unambiguous given the strict 64-char + hex alphabet. + if len(v) == KeySize*2 { + if b, err := hex.DecodeString(v); err == nil && len(b) == KeySize { + return b, nil + } + } + // base64 (try url then std; both with and without padding). + for _, enc := range []*base64.Encoding{base64.RawURLEncoding, base64.URLEncoding, base64.RawStdEncoding, base64.StdEncoding} { + if b, err := enc.DecodeString(v); err == nil && len(b) == KeySize { + return b, nil + } + } + // Last resort: caller passed raw bytes directly. + if len(v) == KeySize { + return []byte(v), nil + } + return nil, fmt.Errorf("key must be %d bytes encoded as hex or base64", KeySize) +} + +// readKeyFile loads a key file from disk. Refuses files whose permissions +// allow group/other read — the operator obviously meant 0600 and a wider +// mask is almost always a mistake. +func readKeyFile(path string) ([]byte, error) { + info, err := os.Stat(path) + if err != nil { + return nil, err + } + if info.Mode().Perm()&minKeyFilePerm != 0 { + return nil, fmt.Errorf("keyfile %s has insecure permissions %o, expected 0600", path, info.Mode().Perm()) + } + raw, err := os.ReadFile(path) + if err != nil { + return nil, err + } + // File may be hex / base64 / raw bytes; try the structured forms first + // so we don't accidentally treat a 32-byte hex string as raw. 
+ if v := strings.TrimSpace(string(raw)); v != "" { + if b, err := decodeKey(v); err == nil { + return b, nil + } + } + if len(raw) == KeySize { + return raw, nil + } + return nil, fmt.Errorf("keyfile %s did not contain %d bytes of key material", path, KeySize) +} + +// generateKeyFile creates a new 32-byte CSPRNG key and writes it to path +// (hex-encoded for human readability) with 0600 permissions. Returns the +// generated key bytes. +func generateKeyFile(path string) ([]byte, error) { + if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { + return nil, fmt.Errorf("mkdir keyfile parent: %w", err) + } + key := make([]byte, KeySize) + if _, err := io.ReadFull(rand.Reader, key); err != nil { + return nil, fmt.Errorf("rand: %w", err) + } + // Hex encoding is friendlier to ops (curl-able, copy-pasteable) and + // adds nothing to the security cost — the bytes hit disk regardless. + encoded := []byte(hex.EncodeToString(key) + "\n") + if err := os.WriteFile(path, encoded, 0o600); err != nil { + return nil, fmt.Errorf("write keyfile: %w", err) + } + return key, nil +} diff --git a/server/internal/secrets/secrets_test.go b/server/internal/secrets/secrets_test.go new file mode 100644 index 0000000..5b931bb --- /dev/null +++ b/server/internal/secrets/secrets_test.go @@ -0,0 +1,177 @@ +package secrets + +import ( + "bytes" + "encoding/hex" + "errors" + "os" + "path/filepath" + "strings" + "testing" +) + +func TestRoundTrip(t *testing.T) { + dir := t.TempDir() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + + svc, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("Open: %v", err) + } + if !svc.Autogenerated() { + t.Fatalf("expected autogenerated keyfile") + } + plain := []byte("ghp_super_secret_token_value") + ct, err := svc.Encrypt(plain) + if err != nil { + t.Fatalf("Encrypt: %v", err) + } + if bytes.Contains(ct, plain) { + t.Fatalf("ciphertext leaked plaintext bytes") + } + got, err := 
svc.Decrypt(ct) + if err != nil { + t.Fatalf("Decrypt: %v", err) + } + if !bytes.Equal(got, plain) { + t.Fatalf("roundtrip mismatch: got %q want %q", got, plain) + } +} + +func TestPersistentKeyfile(t *testing.T) { + dir := t.TempDir() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + + svc1, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("first Open: %v", err) + } + ct, err := svc1.Encrypt([]byte("hello")) + if err != nil { + t.Fatalf("Encrypt: %v", err) + } + + // Second Open in the same dir must reuse the keyfile (NOT regenerate). + svc2, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("second Open: %v", err) + } + if svc2.Autogenerated() { + t.Fatalf("second Open should reuse the existing keyfile, not regenerate") + } + plain, err := svc2.Decrypt(ct) + if err != nil { + t.Fatalf("Decrypt with reused key: %v", err) + } + if string(plain) != "hello" { + t.Fatalf("unexpected plaintext %q", plain) + } +} + +func TestErrNoKey(t *testing.T) { + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + _, err := Open(OpenOptions{AllowGenerate: false}) + if !errors.Is(err, ErrNoKey) { + t.Fatalf("expected ErrNoKey, got %v", err) + } +} + +func TestEnvKeyHex(t *testing.T) { + dir := t.TempDir() + key := bytes.Repeat([]byte{0xAB}, KeySize) + t.Setenv("CIX_SECRET_KEY", hex.EncodeToString(key)) + t.Setenv("CIX_SECRET_KEYFILE", "") + + svc, err := Open(OpenOptions{DataDir: dir}) + if err != nil { + t.Fatalf("Open: %v", err) + } + if !strings.HasPrefix(svc.Source(), "env:") { + t.Fatalf("expected env source, got %q", svc.Source()) + } + ct, _ := svc.Encrypt([]byte("x")) + got, err := svc.Decrypt(ct) + if err != nil { + t.Fatalf("Decrypt: %v", err) + } + if string(got) != "x" { + t.Fatalf("unexpected plaintext") + } +} + +func TestKeyFileBadPerms(t *testing.T) { + dir := t.TempDir() + path := filepath.Join(dir, "key") + key := bytes.Repeat([]byte{0xCD}, 
KeySize) + if err := os.WriteFile(path, []byte(hex.EncodeToString(key)), 0o644); err != nil { + t.Fatalf("write: %v", err) + } + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", path) + _, err := Open(OpenOptions{}) + if err == nil { + t.Fatalf("expected error for 0644 keyfile") + } + if !strings.Contains(err.Error(), "insecure permissions") { + t.Fatalf("expected permission error, got %v", err) + } +} + +func TestDecryptTampered(t *testing.T) { + dir := t.TempDir() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + svc, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("Open: %v", err) + } + ct, _ := svc.Encrypt([]byte("payload")) + // Flip a byte in the body — must trigger AEAD failure. + ct[len(ct)-1] ^= 0xFF + if _, err := svc.Decrypt(ct); err == nil { + t.Fatalf("expected decrypt failure on tampered ciphertext") + } +} + +func TestDecryptUnknownVersion(t *testing.T) { + dir := t.TempDir() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + svc, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("Open: %v", err) + } + ct, _ := svc.Encrypt([]byte("payload")) + ct[0] = 0xFE // unknown version byte + if _, err := svc.Decrypt(ct); !errors.Is(err, ErrUnknownVersion) { + t.Fatalf("expected ErrUnknownVersion, got %v", err) + } +} + +func TestDecryptTooShort(t *testing.T) { + dir := t.TempDir() + t.Setenv("CIX_SECRET_KEY", "") + t.Setenv("CIX_SECRET_KEYFILE", "") + svc, err := Open(OpenOptions{DataDir: dir, AllowGenerate: true}) + if err != nil { + t.Fatalf("Open: %v", err) + } + if _, err := svc.Decrypt([]byte{0x01, 0x02}); !errors.Is(err, ErrCiphertextTooShort) { + t.Fatalf("expected ErrCiphertextTooShort, got %v", err) + } +} + +func TestConstantTimeEqual(t *testing.T) { + if !ConstantTimeEqual([]byte("abc"), []byte("abc")) { + t.Fatal("equal slices should compare equal") + } + if ConstantTimeEqual([]byte("abc"), []byte("abcd")) { + 
t.Fatal("different-length slices should not compare equal") + } + if ConstantTimeEqual([]byte("abc"), []byte("abd")) { + t.Fatal("different slices should not compare equal") + } +} diff --git a/server/internal/vectorstore/store.go b/server/internal/vectorstore/store.go index e00bcf9..94630e6 100644 --- a/server/internal/vectorstore/store.go +++ b/server/internal/vectorstore/store.go @@ -209,3 +209,4 @@ func (s *Store) Count(projectPath string) int { func round4(f float32) float32 { return float32(math.Round(float64(f)*10000) / 10000) } + diff --git a/server/internal/vectorstore/store_test.go b/server/internal/vectorstore/store_test.go index bf8c5ed..82bd078 100644 --- a/server/internal/vectorstore/store_test.go +++ b/server/internal/vectorstore/store_test.go @@ -340,3 +340,4 @@ func TestSearchLatencyGate(t *testing.T) { t.Errorf("P95 latency %.1fms ≥ 200ms gate", p95) } } + diff --git a/server/internal/workspacejobs/workspacejobs.go b/server/internal/workspacejobs/workspacejobs.go new file mode 100644 index 0000000..7b7f2e2 --- /dev/null +++ b/server/internal/workspacejobs/workspacejobs.go @@ -0,0 +1,231 @@ +// Package workspacejobs wires the workspaces feature's job handlers into +// the generic internal/jobs queue. It owns nothing — just composes the +// other workspaces packages (workspacerepos, githubtokens, repocloner, +// repoindexer) behind a thin Register function called from main. +// +// Lifecycle for a repo: +// +// 1. POST /api/v1/workspaces/{id}/repos +// - inserts a workspace_repos row (status=pending) +// - enqueues clone_repo job (dedupe_key="clone:") +// +// 2. clone_repo handler +// - reveals PAT via githubtokens.Reveal (if token_id set) +// - calls repocloner.CloneOrFetch into DataDir/repos// +// - registers projects row (host_path = workspace_repos.project_path) +// - flips status → indexing +// - enqueues index_repo job (dedupe_key="index:") +// +// 3. 
index_repo handler +// - calls repoindexer.IndexDir with the workspace_repo.project_path +// - flips status → indexed (or failed on error) +// +// Workspace-level search is served straight from the per-project +// chromem collections via a weighted fan-out (see +// internal/httpapi/workspacesearch.go) — there is no background +// "build centroid index" step anymore. +package workspacejobs + +import ( + "context" + "database/sql" + "encoding/json" + "errors" + "fmt" + "log/slog" + "time" + + "github.com/dvcdsys/code-index/server/internal/githubtokens" + "github.com/dvcdsys/code-index/server/internal/indexer" + "github.com/dvcdsys/code-index/server/internal/jobs" + "github.com/dvcdsys/code-index/server/internal/projects" + "github.com/dvcdsys/code-index/server/internal/repocloner" + "github.com/dvcdsys/code-index/server/internal/repoindexer" + "github.com/dvcdsys/code-index/server/internal/vectorstore" + "github.com/dvcdsys/code-index/server/internal/workspacerepos" +) + +// Job type constants. Kept here so handlers and enqueue-sites share one +// string — typos in job types are a notoriously easy source of "why isn't +// this running?" bugs. +const ( + TypeCloneRepo = "clone_repo" + TypeIndexRepo = "index_repo" +) + +// ClonePayload is the JSON shape stored on a clone_repo job. +type ClonePayload struct { + RepoID string `json:"repo_id"` +} + +// IndexPayload is the JSON shape stored on an index_repo job. +type IndexPayload struct { + RepoID string `json:"repo_id"` +} + +// Deps bundles everything the handlers need. Keeping it explicit makes +// wiring obvious in main and means tests can swap any single piece for a +// fake. +type Deps struct { + DB *sql.DB + Jobs *jobs.Service + WorkspaceRepos *workspacerepos.Service + GithubTokens *githubtokens.Service + Indexer *indexer.Service + VectorStore *vectorstore.Store + DataDir string // root for cloned repos: /repos// + Logger *slog.Logger +} + +// Register hooks the workspaces job handlers into a jobs.Service. 
Call +// once at startup, BEFORE jobs.Start. +func Register(d Deps) { + if d.Logger == nil { + d.Logger = slog.Default() + } + d.Jobs.Register(TypeCloneRepo, func(ctx context.Context, job jobs.Job) error { + return handleClone(ctx, d, job) + }) + d.Jobs.Register(TypeIndexRepo, func(ctx context.Context, job jobs.Job) error { + return handleIndex(ctx, d, job) + }) +} + +// EnqueueClone inserts a clone_repo job. The index_repo job is chained +// on successful clone — callers don't enqueue it directly. +func EnqueueClone(ctx context.Context, j *jobs.Service, repoID string) error { + _, err := j.Enqueue(ctx, jobs.EnqueueRequest{ + Type: TypeCloneRepo, + DedupeKey: "clone:" + repoID, + Payload: ClonePayload{RepoID: repoID}, + }) + if errors.Is(err, jobs.ErrDuplicate) { + // Already queued — soft no-op. + return nil + } + return err +} + +func handleClone(ctx context.Context, d Deps, job jobs.Job) error { + var p ClonePayload + if err := jobs.UnmarshalPayload(job, &p); err != nil { + return fmt.Errorf("decode payload: %w", err) + } + if p.RepoID == "" { + return errors.New("empty repo_id") + } + wr, err := d.WorkspaceRepos.GetByID(ctx, p.RepoID) + if err != nil { + return fmt.Errorf("load workspace_repo: %w", err) + } + + if err := d.WorkspaceRepos.SetStatus(ctx, wr.ID, workspacerepos.StatusCloning, "", "", nil); err != nil { + return fmt.Errorf("mark cloning: %w", err) + } + + pat := "" + if wr.TokenID != "" { + token, terr := d.GithubTokens.Reveal(ctx, wr.TokenID) + if terr != nil { + d.recordFailure(ctx, wr.ID, fmt.Errorf("reveal token: %w", terr)) + return terr + } + pat = token + // Best-effort last_used bookkeeping; ignore errors. 
+ _ = d.GithubTokens.Touch(ctx, wr.TokenID) + } + + result, err := repocloner.CloneOrFetch(ctx, repocloner.CloneOptions{ + GitHubURL: wr.GitHubURL, + Branch: wr.Branch, + PAT: pat, + LocalDir: repocloner.LocalDirFor(d.DataDir, wr.ID), + }) + if err != nil { + d.recordFailure(ctx, wr.ID, fmt.Errorf("clone: %w", err)) + return err + } + + // Register the project row (idempotent — Get-or-Create pattern). Two + // branches: + // a) project already exists → leave it alone (incremental updates + // happen via subsequent index runs) + // b) project missing → create it with the project_path as host_path + if _, gerr := projects.Get(ctx, d.DB, wr.ProjectPath); gerr != nil { + if _, cerr := projects.Create(ctx, d.DB, projects.CreateRequest{ + HostPath: wr.ProjectPath, + }); cerr != nil && !errors.Is(cerr, projects.ErrConflict) { + d.recordFailure(ctx, wr.ID, fmt.Errorf("register project: %w", cerr)) + return cerr + } + } + + if err := d.WorkspaceRepos.SetStatus(ctx, wr.ID, workspacerepos.StatusIndexing, result.HeadSHA, "", nil); err != nil { + // Non-fatal — still chain the index job. + d.Logger.Warn("workspacejobs: set status indexing failed", "repo_id", wr.ID, "err", err) + } + + // Chain index_repo. Use the same dedupe pattern so a manual reindex + // fired by the user mid-clone collapses into the natural follow-up. 
+ if _, eerr := d.Jobs.Enqueue(ctx, jobs.EnqueueRequest{ + Type: TypeIndexRepo, + DedupeKey: "index:" + wr.ID, + Payload: IndexPayload{RepoID: wr.ID}, + }); eerr != nil && !errors.Is(eerr, jobs.ErrDuplicate) { + d.recordFailure(ctx, wr.ID, fmt.Errorf("enqueue index: %w", eerr)) + return eerr + } + return nil +} + +func handleIndex(ctx context.Context, d Deps, job jobs.Job) error { + var p IndexPayload + if err := jobs.UnmarshalPayload(job, &p); err != nil { + return fmt.Errorf("decode payload: %w", err) + } + if p.RepoID == "" { + return errors.New("empty repo_id") + } + wr, err := d.WorkspaceRepos.GetByID(ctx, p.RepoID) + if err != nil { + return fmt.Errorf("load workspace_repo: %w", err) + } + cloneDir := repocloner.LocalDirFor(d.DataDir, wr.ID) + + _, _, err = repoindexer.IndexDir(ctx, d.Indexer, wr.ProjectPath, cloneDir, repoindexer.DefaultFilter(), d.Logger) + if err != nil { + d.recordFailure(ctx, wr.ID, fmt.Errorf("index: %w", err)) + return err + } + + now := time.Now().UTC() + if err := d.WorkspaceRepos.SetStatus(ctx, wr.ID, workspacerepos.StatusIndexed, "", "", &now); err != nil { + return fmt.Errorf("mark indexed: %w", err) + } + return nil +} + +// recordFailure flips the workspace_repo into status=failed with the +// error message attached. Logs the error too (handler return value also +// gets logged by the jobs service but at a different layer — duplicate +// is fine). +func (d Deps) recordFailure(ctx context.Context, repoID string, err error) { + if err == nil { + return + } + d.Logger.Error("workspacejobs: repo failed", "repo_id", repoID, "err", err) + msg := err.Error() + if len(msg) > 1024 { + msg = msg[:1024] + } + if uerr := d.WorkspaceRepos.SetStatus(ctx, repoID, workspacerepos.StatusFailed, "", msg, nil); uerr != nil { + d.Logger.Error("workspacejobs: could not write failed status", "repo_id", repoID, "err", uerr) + } +} + +// Compile-time guard: ClonePayload / IndexPayload encode cleanly. 
+var _ = func() (any, any) { + a, _ := json.Marshal(ClonePayload{}) + b, _ := json.Marshal(IndexPayload{}) + return a, b +} diff --git a/server/internal/workspacerepos/workspacerepos.go b/server/internal/workspacerepos/workspacerepos.go new file mode 100644 index 0000000..f235fa1 --- /dev/null +++ b/server/internal/workspacerepos/workspacerepos.go @@ -0,0 +1,466 @@ +// Package workspacerepos is the service layer for the workspace_repos +// table — one row per (workspace, github_url, branch). Each row maps 1:1 +// to an indexed project (host_path = "github.com/owner/repo@branch"). +// +// Lifecycle (PR2): +// +// create row (status=pending) → enqueue clone_repo job → worker clones +// → enqueue index_repo job → worker indexes → status=indexed +// +// PR3 adds webhook delivery → enqueue fetch_repo on push; PR4+ feeds +// call-graph + community recompute. This package stays small — handlers +// own service composition; we just persist rows. +package workspacerepos + +import ( + "context" + "crypto/rand" + "database/sql" + "encoding/base64" + "errors" + "fmt" + "net/url" + "strings" + "time" + + "github.com/google/uuid" +) + +// Status values. Kept as bare strings since they map straight to the DB +// column and the JSON wire format. +const ( + StatusPending = "pending" // row created, work not yet scheduled + StatusCloning = "cloning" // clone_repo job running + StatusIndexing = "indexing" // index_repo job running + StatusIndexed = "indexed" // happy path + StatusFailed = "failed" // last attempt errored (see LastError) +) + +// Webhook modes. The legacy AutoWebhook bool stays in the struct for +// backwards compatibility with old API consumers, but new code should +// consult WebhookMode — it carries the operator's stated intent (auto +// vs manual-pending vs deliberately disabled). 
+const ( + WebhookModeManual = "manual" + WebhookModeAuto = "auto" + WebhookModeDisabled = "disabled" +) + +// NormaliseWebhookMode rejects unknown values up front so the database +// only ever stores one of the three documented states. Empty input maps +// to the default ('manual'), so old API clients that omit the field +// keep working unchanged. +func NormaliseWebhookMode(s string) (string, error) { + switch strings.ToLower(strings.TrimSpace(s)) { + case "": + return WebhookModeManual, nil + case WebhookModeManual: + return WebhookModeManual, nil + case WebhookModeAuto: + return WebhookModeAuto, nil + case WebhookModeDisabled: + return WebhookModeDisabled, nil + default: + return "", ErrInvalidWebhookMode + } +} + +// Errors. +var ( + ErrNotFound = errors.New("workspace repo not found") + ErrDuplicate = errors.New("repo is already in this workspace on that branch") + ErrInvalidURL = errors.New("github_url must be an https://github.com/owner/repo URL") + ErrBranchEmpty = errors.New("branch is required") + ErrInvalidWebhookMode = errors.New("webhook_mode must be one of manual, auto, disabled") +) + +// WorkspaceRepo is the wire view. Tokens themselves are referenced by +// id — Reveal happens server-side via internal/githubtokens. +type WorkspaceRepo struct { + ID string + WorkspaceID string + GitHubURL string + Branch string + ProjectPath string + TokenID string // empty when no PAT is associated (public repo) + WebhookSecret string + WebhookID *int64 // GitHub hook id (set by PR3 auto-register) + AutoWebhook bool + WebhookMode string // 'manual' | 'auto' | 'disabled' + Status string + LastSHA string + LastError string + LastIndexedAt *time.Time + IsLinked bool // true for lightweight references to projects owned by another workspace_repo + CreatedAt time.Time + UpdatedAt time.Time +} + +// Service wraps the workspace_repos table. +type Service struct { + DB *sql.DB +} + +// New returns a Service. 
+func New(db *sql.DB) *Service { return &Service{DB: db} } + +// CreateRequest is what handlers pass in. +type CreateRequest struct { + WorkspaceID string + GitHubURL string + Branch string + TokenID string // optional + AutoWebhook bool // legacy: kept for old clients; new code uses WebhookMode + WebhookMode string // 'manual' | 'auto' | 'disabled'; empty = manual +} + +// Create inserts a workspace_repo and generates a webhook secret. The +// resulting ProjectPath is "github.com/owner/repo@branch" — the canonical +// id for downstream tables (projects.host_path). +func (s *Service) Create(ctx context.Context, req CreateRequest) (WorkspaceRepo, error) { + owner, repo, err := parseGitHubURL(req.GitHubURL) + if err != nil { + return WorkspaceRepo{}, err + } + if strings.TrimSpace(req.Branch) == "" { + return WorkspaceRepo{}, ErrBranchEmpty + } + projectPath := fmt.Sprintf("github.com/%s/%s@%s", owner, repo, req.Branch) + + secret, err := generateWebhookSecret() + if err != nil { + return WorkspaceRepo{}, fmt.Errorf("generate webhook secret: %w", err) + } + + id := uuid.NewString() + now := time.Now().UTC().Format(time.RFC3339Nano) + githubURL := canonicaliseURL(req.GitHubURL) + + // WebhookMode is the source of truth in the DB; AutoWebhook stays + // derived so the legacy SELECT path keeps working until removed. + mode, merr := NormaliseWebhookMode(req.WebhookMode) + if merr != nil { + return WorkspaceRepo{}, merr + } + // If the caller used the legacy bool but left WebhookMode empty, + // honour the bool — otherwise mode wins. 
+ if req.WebhookMode == "" && req.AutoWebhook { + mode = WebhookModeAuto + } + auto := 0 + if mode == WebhookModeAuto { + auto = 1 + } + tokenID := nullableString(req.TokenID) + + _, err = s.DB.ExecContext(ctx, + `INSERT INTO workspace_repos ( + id, workspace_id, github_url, branch, project_path, + token_id, webhook_secret, auto_webhook, webhook_mode, status, + created_at, updated_at + ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`, + id, req.WorkspaceID, githubURL, req.Branch, projectPath, + tokenID, secret, auto, mode, StatusPending, + now, now, + ) + if err != nil { + if isUniqueConstraintViolation(err) { + return WorkspaceRepo{}, ErrDuplicate + } + return WorkspaceRepo{}, fmt.Errorf("insert workspace_repo: %w", err) + } + return s.GetByID(ctx, id) +} + +// CreateLink inserts a workspace_repo with is_linked=1: a lightweight +// pointer to an already-indexed project. Unlike Create, there is no +// clone job, no webhook, no PAT — the row exists only so the project +// participates in workspace-level features (search, communities, +// the repo list UI). The canonical project must already exist in the +// projects table; the caller (HTTP handler) is responsible for that +// check + the status='indexed' precondition before calling here. +// +// projectPath must be the same canonical form Create produces, i.e. +// "github.com/owner/repo@branch" — we round-trip through parseProjectPath +// so the resulting (workspace_id, github_url, branch) triple matches +// what an owned row would produce. This is what makes the +// UNIQUE(workspace_id, github_url, branch) constraint catch an attempt +// to link the same project that's already attached as owned. +// +// webhook_secret is generated but never used — the column is NOT NULL. +// webhook_mode is set to 'disabled' so the dashboard hides the webhook +// UI for linked rows. 
+func (s *Service) CreateLink(ctx context.Context, workspaceID, projectPath string) (WorkspaceRepo, error) { + githubURL, branch, err := parseProjectPath(projectPath) + if err != nil { + return WorkspaceRepo{}, err + } + + secret, err := generateWebhookSecret() + if err != nil { + return WorkspaceRepo{}, fmt.Errorf("generate webhook secret: %w", err) + } + + id := uuid.NewString() + now := time.Now().UTC().Format(time.RFC3339Nano) + + _, err = s.DB.ExecContext(ctx, + `INSERT INTO workspace_repos ( + id, workspace_id, github_url, branch, project_path, + token_id, webhook_secret, auto_webhook, webhook_mode, status, + is_linked, created_at, updated_at + ) VALUES (?, ?, ?, ?, ?, NULL, ?, 0, ?, ?, 1, ?, ?)`, + id, workspaceID, githubURL, branch, projectPath, + secret, WebhookModeDisabled, StatusIndexed, + now, now, + ) + if err != nil { + if isUniqueConstraintViolation(err) { + return WorkspaceRepo{}, ErrDuplicate + } + return WorkspaceRepo{}, fmt.Errorf("insert linked workspace_repo: %w", err) + } + return s.GetByID(ctx, id) +} + +// GetByID returns one row, or ErrNotFound when no row has that id. +func (s *Service) GetByID(ctx context.Context, id string) (WorkspaceRepo, error) { + row := s.DB.QueryRowContext(ctx, selectColumns+` WHERE id = ?`, id) + return scanRow(row) +} + +// ListByWorkspace returns every repo in a workspace, newest first. +func (s *Service) ListByWorkspace(ctx context.Context, workspaceID string) ([]WorkspaceRepo, error) { + rows, err := s.DB.QueryContext(ctx, + selectColumns+` WHERE workspace_id = ? ORDER BY created_at DESC`, workspaceID) + if err != nil { + return nil, fmt.Errorf("list repos: %w", err) + } + defer rows.Close() + return scanRows(rows) +} + +// SetStatus is the workhorse called from job handlers. An empty lastSHA or a +// nil indexed time leaves that column unchanged; lastError is always written, +// and an empty string clears it (stored as NULL). 
+func (s *Service) SetStatus(ctx context.Context, id, status string, lastSHA, lastError string, indexed *time.Time) error { + now := time.Now().UTC().Format(time.RFC3339Nano) + // We use a single UPDATE with COALESCE to keep optional fields atomic. + var indexedStr any + if indexed != nil { + indexedStr = indexed.UTC().Format(time.RFC3339Nano) + } else { + indexedStr = nil + } + res, err := s.DB.ExecContext(ctx, ` + UPDATE workspace_repos + SET status = ?, + last_sha = COALESCE(NULLIF(?, ''), last_sha), + last_error = CASE WHEN ? = '' THEN NULL ELSE ? END, + last_indexed_at = COALESCE(?, last_indexed_at), + updated_at = ? + WHERE id = ?`, + status, lastSHA, lastError, lastError, indexedStr, now, id, + ) + if err != nil { + return fmt.Errorf("set status: %w", err) + } + n, _ := res.RowsAffected() + if n == 0 { + return ErrNotFound + } + return nil +} + +// SetWebhookID persists the GitHub-side hook id returned by the +// auto-register flow. ErrNotFound when the row is gone (race with +// concurrent delete — caller can ignore). +func (s *Service) SetWebhookID(ctx context.Context, id string, hookID int64) error { + res, err := s.DB.ExecContext(ctx, + `UPDATE workspace_repos SET webhook_id = ?, updated_at = ? WHERE id = ?`, + hookID, time.Now().UTC().Format(time.RFC3339Nano), id) + if err != nil { + return fmt.Errorf("set webhook_id: %w", err) + } + n, _ := res.RowsAffected() + if n == 0 { + return ErrNotFound + } + return nil +} + +// Delete removes a workspace_repo. The on-disk clone, indexed project, and +// associated rows are NOT cleaned up here — handlers should enqueue a +// cleanup job (PR3+) or accept the orphan for now. 
+func (s *Service) Delete(ctx context.Context, id string) error { + res, err := s.DB.ExecContext(ctx, `DELETE FROM workspace_repos WHERE id = ?`, id) + if err != nil { + return fmt.Errorf("delete workspace_repo: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return fmt.Errorf("rows affected: %w", err) + } + if n == 0 { + return ErrNotFound + } + return nil +} + +// --- helpers --- + +const selectColumns = ` + SELECT id, workspace_id, github_url, branch, project_path, + token_id, webhook_secret, webhook_id, auto_webhook, + webhook_mode, status, last_sha, last_error, last_indexed_at, + is_linked, created_at, updated_at + FROM workspace_repos` + +func scanRow(r interface{ Scan(dest ...any) error }) (WorkspaceRepo, error) { + var ( + wr WorkspaceRepo + tokenID sql.NullString + webhookID sql.NullInt64 + autoWebhook int + webhookMode string + lastSHA sql.NullString + lastError sql.NullString + lastIndexed sql.NullString + isLinked int + createdAt string + updatedAt string + ) + err := r.Scan(&wr.ID, &wr.WorkspaceID, &wr.GitHubURL, &wr.Branch, &wr.ProjectPath, + &tokenID, &wr.WebhookSecret, &webhookID, &autoWebhook, + &webhookMode, &wr.Status, &lastSHA, &lastError, &lastIndexed, + &isLinked, &createdAt, &updatedAt) + if err != nil { + if errors.Is(err, sql.ErrNoRows) { + return WorkspaceRepo{}, ErrNotFound + } + return WorkspaceRepo{}, fmt.Errorf("scan workspace_repo: %w", err) + } + wr.TokenID = tokenID.String + if webhookID.Valid { + v := webhookID.Int64 + wr.WebhookID = &v + } + wr.AutoWebhook = autoWebhook == 1 + wr.WebhookMode = webhookMode + if wr.WebhookMode == "" { + wr.WebhookMode = WebhookModeManual + } + wr.LastSHA = lastSHA.String + wr.LastError = lastError.String + if lastIndexed.Valid { + t, _ := time.Parse(time.RFC3339Nano, lastIndexed.String) + wr.LastIndexedAt = &t + } + wr.IsLinked = isLinked == 1 + wr.CreatedAt, _ = time.Parse(time.RFC3339Nano, createdAt) + wr.UpdatedAt, _ = time.Parse(time.RFC3339Nano, updatedAt) + return wr, nil +} + 
+func scanRows(rows *sql.Rows) ([]WorkspaceRepo, error) { + out := []WorkspaceRepo{} + for rows.Next() { + wr, err := scanRow(rows) + if err != nil { + return nil, err + } + out = append(out, wr) + } + return out, rows.Err() +} + +// parseGitHubURL extracts owner + repo from an HTTPS GitHub URL. Accepts +// trailing slash and ".git" suffix. Rejects anything not on github.com so +// we don't accidentally try to clone arbitrary forge URLs (each forge has +// its own quirks — supporting them is out of scope). +func parseGitHubURL(s string) (owner, repo string, err error) { + s = strings.TrimSpace(s) + if s == "" { + return "", "", ErrInvalidURL + } + u, perr := url.Parse(s) + if perr != nil { + return "", "", ErrInvalidURL + } + if !strings.EqualFold(u.Host, "github.com") { + return "", "", ErrInvalidURL + } + path := strings.Trim(u.Path, "/") + path = strings.TrimSuffix(path, ".git") + parts := strings.Split(path, "/") + if len(parts) != 2 || parts[0] == "" || parts[1] == "" { + return "", "", ErrInvalidURL + } + return parts[0], parts[1], nil +} + +// parseProjectPath splits a canonical project_path of the form +// "github.com/owner/repo@branch" back into (github_url, branch) so we +// can reuse the per-workspace uniqueness key when creating a linked +// row from a project hash. Inverse of the Sprintf at Create(). 
+// +// Errors: +// - empty input or missing "@" → ErrInvalidURL +// - prefix not "github.com/" → ErrInvalidURL (linked rows only make +// sense for GitHub-derived projects; local paths can't map to a +// workspace_repo since the schema requires github_url + branch) +// - branch portion empty → ErrBranchEmpty +func parseProjectPath(projectPath string) (githubURL, branch string, err error) { + s := strings.TrimSpace(projectPath) + at := strings.LastIndex(s, "@") + if at <= 0 { + return "", "", ErrInvalidURL + } + left, right := s[:at], s[at+1:] + if right == "" { + return "", "", ErrBranchEmpty + } + const prefix = "github.com/" + if !strings.HasPrefix(left, prefix) { + return "", "", ErrInvalidURL + } + ownerRepo := strings.Trim(left[len(prefix):], "/") + parts := strings.Split(ownerRepo, "/") + if len(parts) != 2 || parts[0] == "" || parts[1] == "" { + return "", "", ErrInvalidURL + } + return "https://github.com/" + parts[0] + "/" + parts[1], right, nil +} + +// canonicaliseURL strips trailing slash + ".git" so two forms of the same +// URL aren't treated as distinct repos. 
+func canonicaliseURL(s string) string { + s = strings.TrimSpace(s) + s = strings.TrimSuffix(s, "/") + s = strings.TrimSuffix(s, ".git") + return s +} + +func generateWebhookSecret() (string, error) { + var buf [32]byte + if _, err := rand.Read(buf[:]); err != nil { + return "", err + } + return base64.RawURLEncoding.EncodeToString(buf[:]), nil +} + +func nullableString(s string) any { + if s == "" { + return nil + } + return s +} + +func isUniqueConstraintViolation(err error) bool { + if err == nil { + return false + } + msg := err.Error() + return strings.Contains(msg, "UNIQUE constraint failed") || + strings.Contains(msg, "constraint failed: UNIQUE") +} diff --git a/server/internal/workspacerepos/workspacerepos_test.go b/server/internal/workspacerepos/workspacerepos_test.go new file mode 100644 index 0000000..aa5437c --- /dev/null +++ b/server/internal/workspacerepos/workspacerepos_test.go @@ -0,0 +1,244 @@ +package workspacerepos + +import ( + "context" + "errors" + "testing" + "time" + + "github.com/dvcdsys/code-index/server/internal/db" + "github.com/dvcdsys/code-index/server/internal/workspaces" +) + +// withWorkspace creates a workspaces row and returns its id. Tests need a +// real FK target since workspace_repos.workspace_id has ON DELETE CASCADE. 
+func withWorkspace(t *testing.T) (*Service, string) { + t.Helper() + d, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open: %v", err) + } + t.Cleanup(func() { _ = d.Close() }) + ws, err := workspaces.New(d).Create(context.Background(), "ws", "") + if err != nil { + t.Fatalf("seed workspace: %v", err) + } + return New(d), ws.ID +} + +func TestCreateAndGet(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + wr, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, + GitHubURL: "https://github.com/spf13/cobra", + Branch: "main", + }) + if err != nil { + t.Fatalf("Create: %v", err) + } + if wr.ProjectPath != "github.com/spf13/cobra@main" { + t.Fatalf("unexpected project_path %q", wr.ProjectPath) + } + if wr.WebhookSecret == "" { + t.Fatalf("webhook secret should be auto-generated") + } + if wr.Status != StatusPending { + t.Fatalf("expected pending status, got %q", wr.Status) + } + + got, err := svc.GetByID(ctx, wr.ID) + if err != nil { + t.Fatalf("GetByID: %v", err) + } + if got.ProjectPath != wr.ProjectPath { + t.Fatalf("get/create mismatch") + } +} + +func TestURLNormalisation(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + // trailing slash + .git suffix should be collapsed. 
+ wr, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, + GitHubURL: "https://github.com/spf13/cobra.git/", + Branch: "main", + }) + if err != nil { + t.Fatalf("Create: %v", err) + } + if wr.GitHubURL != "https://github.com/spf13/cobra" { + t.Fatalf("URL not canonicalised, got %q", wr.GitHubURL) + } + if wr.ProjectPath != "github.com/spf13/cobra@main" { + t.Fatalf("project_path wrong: %q", wr.ProjectPath) + } +} + +func TestDuplicateRejected(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + if _, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/x/y", Branch: "main", + }); err != nil { + t.Fatalf("first: %v", err) + } + if _, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/x/y", Branch: "main", + }); !errors.Is(err, ErrDuplicate) { + t.Fatalf("expected ErrDuplicate, got %v", err) + } + // Different branch should succeed. + if _, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/x/y", Branch: "develop", + }); err != nil { + t.Fatalf("different branch should succeed: %v", err) + } +} + +func TestInvalidURL(t *testing.T) { + svc, wsID := withWorkspace(t) + cases := []string{ + "", + "not a url", + "https://gitlab.com/x/y", + "https://github.com", + "https://github.com/onlyowner", + } + for _, c := range cases { + _, err := svc.Create(context.Background(), CreateRequest{ + WorkspaceID: wsID, GitHubURL: c, Branch: "main", + }) + if !errors.Is(err, ErrInvalidURL) { + t.Fatalf("URL %q: expected ErrInvalidURL, got %v", c, err) + } + } +} + +func TestSetStatus(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + wr, _ := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/x/y", Branch: "main", + }) + now := time.Now().UTC() + if err := svc.SetStatus(ctx, wr.ID, StatusIndexed, "abc123", "", &now); err != nil { + t.Fatalf("SetStatus: %v", err) + } + got, _ := 
svc.GetByID(ctx, wr.ID) + if got.Status != StatusIndexed || got.LastSHA != "abc123" { + t.Fatalf("status/sha not persisted: %+v", got) + } + if got.LastIndexedAt == nil { + t.Fatalf("LastIndexedAt should be set") + } +} + +func TestDeleteCascade(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + wr, _ := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/a/b", Branch: "main", + }) + if err := svc.Delete(ctx, wr.ID); err != nil { + t.Fatalf("Delete: %v", err) + } + if err := svc.Delete(ctx, wr.ID); !errors.Is(err, ErrNotFound) { + t.Fatalf("expected ErrNotFound, got %v", err) + } +} + +func TestCreateLink_HappyPath(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + wr, err := svc.CreateLink(ctx, wsID, "github.com/spf13/cobra@main") + if err != nil { + t.Fatalf("CreateLink: %v", err) + } + if !wr.IsLinked { + t.Fatalf("expected IsLinked=true, got %v", wr.IsLinked) + } + if wr.Status != StatusIndexed { + t.Fatalf("expected status=indexed, got %q", wr.Status) + } + if wr.WebhookMode != WebhookModeDisabled { + t.Fatalf("expected webhook_mode=disabled, got %q", wr.WebhookMode) + } + if wr.TokenID != "" { + t.Fatalf("linked rows must have empty token_id, got %q", wr.TokenID) + } + if wr.GitHubURL != "https://github.com/spf13/cobra" { + t.Fatalf("github_url derived wrong: %q", wr.GitHubURL) + } + if wr.Branch != "main" { + t.Fatalf("branch derived wrong: %q", wr.Branch) + } + if wr.ProjectPath != "github.com/spf13/cobra@main" { + t.Fatalf("project_path mismatch: %q", wr.ProjectPath) + } +} + +func TestCreateLink_DuplicateInSameWorkspace(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + if _, err := svc.CreateLink(ctx, wsID, "github.com/foo/bar@main"); err != nil { + t.Fatalf("first: %v", err) + } + // Second link with the same (workspace, repo, branch) → ErrDuplicate. 
+ if _, err := svc.CreateLink(ctx, wsID, "github.com/foo/bar@main"); !errors.Is(err, ErrDuplicate) { + t.Fatalf("expected ErrDuplicate, got %v", err) + } + // An owned row in the same workspace conflicts with the linked one too — + // both share the same UNIQUE(workspace_id, github_url, branch) key. + if _, err := svc.Create(ctx, CreateRequest{ + WorkspaceID: wsID, GitHubURL: "https://github.com/foo/bar", Branch: "main", + }); !errors.Is(err, ErrDuplicate) { + t.Fatalf("owned-after-linked: expected ErrDuplicate, got %v", err) + } +} + +func TestCreateLink_AllowedAcrossWorkspaces(t *testing.T) { + svcA, wsA := withWorkspace(t) + // Reuse the same underlying DB — withWorkspace gives us a Service + // bound to a fresh DB; for a cross-workspace test we need two + // workspaces on one DB. Seed a second workspace explicitly. + wsB, err := workspaces.New(svcA.DB).Create(context.Background(), "ws-b", "") + if err != nil { + t.Fatalf("seed second workspace: %v", err) + } + ctx := context.Background() + // Same canonical project_path attaches as owned in A, then linked + // in B without tripping the legacy global UNIQUE. 
+ if _, err := svcA.Create(ctx, CreateRequest{ + WorkspaceID: wsA, GitHubURL: "https://github.com/x/y", Branch: "main", + }); err != nil { + t.Fatalf("owned in A: %v", err) + } + if _, err := svcA.CreateLink(ctx, wsB.ID, "github.com/x/y@main"); err != nil { + t.Fatalf("linked in B (same project): %v", err) + } +} + +func TestCreateLink_InvalidProjectPath(t *testing.T) { + svc, wsID := withWorkspace(t) + ctx := context.Background() + cases := []struct { + name string + path string + want error + }{ + {"empty", "", ErrInvalidURL}, + {"no at", "github.com/foo/bar", ErrInvalidURL}, + {"empty branch", "github.com/foo/bar@", ErrBranchEmpty}, + {"non-github", "gitlab.com/foo/bar@main", ErrInvalidURL}, + {"missing repo", "github.com/foo@main", ErrInvalidURL}, + } + for _, c := range cases { + t.Run(c.name, func(t *testing.T) { + if _, err := svc.CreateLink(ctx, wsID, c.path); !errors.Is(err, c.want) { + t.Fatalf("path=%q: expected %v, got %v", c.path, c.want, err) + } + }) + } +} diff --git a/server/internal/workspaces/workspaces.go b/server/internal/workspaces/workspaces.go new file mode 100644 index 0000000..efb7cf7 --- /dev/null +++ b/server/internal/workspaces/workspaces.go @@ -0,0 +1,197 @@ +// Package workspaces is the service layer for the workspaces table — the +// top-level entity of the workspaces feature. A workspace groups one or +// more GitHub repos for cross-project semantic search powered by +// community-detection on the call graph (PRs 2–7 of the feature branch). +// +// PR1 scope: bare CRUD. workspace_repos / call_edges / communities land in +// later PRs. Visibility model is server-wide shared: every authenticated +// user can list/create/modify any workspace. The decision is captured in +// the workspaces.md plan; revisit if a per-user ACL becomes necessary. +package workspaces + +import ( + "context" + "database/sql" + "errors" + "fmt" + "strings" + "time" + + "github.com/google/uuid" +) + +// Errors. 
ErrNotFound is the not-found sentinel used by handlers; ErrNameTaken +// surfaces UNIQUE-name collisions so handlers can return 409 instead of 500. +var ( + ErrNotFound = errors.New("workspace not found") + ErrNameTaken = errors.New("workspace name already in use") + ErrNameEmpty = errors.New("workspace name is required") +) + +// Workspace is the metadata view. Pointers are NOT used for description +// because zero-string "" is the desired absent representation (the column +// is nullable but the JSON shape sends "" — see openapi.yaml). +type Workspace struct { + ID string + Name string + Description string + CreatedAt time.Time + UpdatedAt time.Time +} + +// Service wraps the workspaces table. +type Service struct { + DB *sql.DB +} + +// New returns a Service. +func New(db *sql.DB) *Service { return &Service{DB: db} } + +// Create inserts a new workspace. Name must be non-empty and unique. +func (s *Service) Create(ctx context.Context, name, description string) (Workspace, error) { + name = strings.TrimSpace(name) + if name == "" { + return Workspace{}, ErrNameEmpty + } + description = strings.TrimSpace(description) + + id := uuid.NewString() + now := time.Now().UTC().Format(time.RFC3339Nano) + + _, err := s.DB.ExecContext(ctx, + `INSERT INTO workspaces (id, name, description, created_at, updated_at) + VALUES (?, ?, ?, ?, ?)`, + id, name, nullableString(description), now, now, + ) + if err != nil { + if isUniqueConstraintViolation(err) { + return Workspace{}, ErrNameTaken + } + return Workspace{}, fmt.Errorf("insert workspace: %w", err) + } + return s.GetByID(ctx, id) +} + +// GetByID returns one workspace. ErrNotFound when absent. +func (s *Service) GetByID(ctx context.Context, id string) (Workspace, error) { + row := s.DB.QueryRowContext(ctx, + `SELECT id, name, description, created_at, updated_at + FROM workspaces WHERE id = ?`, id) + return scanRow(row) +} + +// List returns every workspace, newest first. 
+func (s *Service) List(ctx context.Context) ([]Workspace, error) { + rows, err := s.DB.QueryContext(ctx, + `SELECT id, name, description, created_at, updated_at + FROM workspaces ORDER BY created_at DESC`) + if err != nil { + return nil, fmt.Errorf("list workspaces: %w", err) + } + defer rows.Close() + return scanRows(rows) +} + +// Update accepts pointers so callers can express "leave this field alone". +// A pointer-to-empty-string clears description; nil keeps the prior value. +// Name nil = no change; name "" returns ErrNameEmpty. +func (s *Service) Update(ctx context.Context, id string, name *string, description *string) (Workspace, error) { + current, err := s.GetByID(ctx, id) + if err != nil { + return Workspace{}, err + } + newName := current.Name + if name != nil { + trimmed := strings.TrimSpace(*name) + if trimmed == "" { + return Workspace{}, ErrNameEmpty + } + newName = trimmed + } + newDesc := current.Description + if description != nil { + newDesc = strings.TrimSpace(*description) + } + now := time.Now().UTC().Format(time.RFC3339Nano) + _, err = s.DB.ExecContext(ctx, + `UPDATE workspaces SET name = ?, description = ?, updated_at = ? WHERE id = ?`, + newName, nullableString(newDesc), now, id) + if err != nil { + if isUniqueConstraintViolation(err) { + return Workspace{}, ErrNameTaken + } + return Workspace{}, fmt.Errorf("update workspace: %w", err) + } + return s.GetByID(ctx, id) +} + +// Delete removes a workspace. Not idempotent here: deleting an absent workspace +// returns ErrNotFound so the handler can choose between 404 and 204. 
+func (s *Service) Delete(ctx context.Context, id string) error { + res, err := s.DB.ExecContext(ctx, `DELETE FROM workspaces WHERE id = ?`, id) + if err != nil { + return fmt.Errorf("delete workspace: %w", err) + } + n, err := res.RowsAffected() + if err != nil { + return fmt.Errorf("rows affected: %w", err) + } + if n == 0 { + return ErrNotFound + } + return nil +} + +// --- helpers --- + +func scanRow(r interface{ Scan(dest ...any) error }) (Workspace, error) { + var ( + w Workspace + description sql.NullString + createdAt string + updatedAt string + ) + err := r.Scan(&w.ID, &w.Name, &description, &createdAt, &updatedAt) + if err != nil { + if errors.Is(err, sql.ErrNoRows) { + return Workspace{}, ErrNotFound + } + return Workspace{}, fmt.Errorf("scan workspace: %w", err) + } + w.Description = description.String + w.CreatedAt, _ = time.Parse(time.RFC3339Nano, createdAt) + w.UpdatedAt, _ = time.Parse(time.RFC3339Nano, updatedAt) + return w, nil +} + +func scanRows(rows *sql.Rows) ([]Workspace, error) { + out := []Workspace{} + for rows.Next() { + w, err := scanRow(rows) + if err != nil { + return nil, err + } + out = append(out, w) + } + return out, rows.Err() +} + +func nullableString(s string) any { + if s == "" { + return nil + } + return s +} + +// isUniqueConstraintViolation detects sqlite UNIQUE-failures by the prefix +// modernc.org/sqlite emits ("constraint failed: UNIQUE ..."). Brittle to a +// driver change but the canonical match used elsewhere in this codebase +// (e.g. users.Create) — keep this in sync with that pattern. 
+func isUniqueConstraintViolation(err error) bool { + if err == nil { + return false + } + msg := err.Error() + return strings.Contains(msg, "UNIQUE constraint failed") || + strings.Contains(msg, "constraint failed: UNIQUE") +} diff --git a/server/internal/workspaces/workspaces_test.go b/server/internal/workspaces/workspaces_test.go new file mode 100644 index 0000000..f6b17a0 --- /dev/null +++ b/server/internal/workspaces/workspaces_test.go @@ -0,0 +1,121 @@ +package workspaces + +import ( + "context" + "errors" + "testing" + + "github.com/dvcdsys/code-index/server/internal/db" +) + +func mustOpen(t *testing.T) *Service { + t.Helper() + database, err := db.Open(":memory:") + if err != nil { + t.Fatalf("open: %v", err) + } + t.Cleanup(func() { _ = database.Close() }) + return New(database) +} + +func TestCreateAndGet(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + + w, err := svc.Create(ctx, "platform", "microservices") + if err != nil { + t.Fatalf("Create: %v", err) + } + if w.ID == "" || w.Name != "platform" || w.Description != "microservices" { + t.Fatalf("unexpected workspace: %+v", w) + } + + got, err := svc.GetByID(ctx, w.ID) + if err != nil { + t.Fatalf("GetByID: %v", err) + } + if got.Name != "platform" { + t.Fatalf("got name %q", got.Name) + } +} + +func TestCreateEmptyNameRejected(t *testing.T) { + svc := mustOpen(t) + if _, err := svc.Create(context.Background(), " ", "x"); !errors.Is(err, ErrNameEmpty) { + t.Fatalf("expected ErrNameEmpty, got %v", err) + } +} + +func TestCreateDuplicateName(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + if _, err := svc.Create(ctx, "alpha", ""); err != nil { + t.Fatalf("first create: %v", err) + } + if _, err := svc.Create(ctx, "alpha", ""); !errors.Is(err, ErrNameTaken) { + t.Fatalf("expected ErrNameTaken, got %v", err) + } +} + +func TestList(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + if _, err := svc.Create(ctx, "alpha", ""); err != nil { + 
t.Fatalf("create: %v", err) + } + if _, err := svc.Create(ctx, "bravo", ""); err != nil { + t.Fatalf("create: %v", err) + } + list, err := svc.List(ctx) + if err != nil { + t.Fatalf("List: %v", err) + } + if len(list) != 2 { + t.Fatalf("expected 2 workspaces, got %d", len(list)) + } +} + +func TestUpdate(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + w, _ := svc.Create(ctx, "alpha", "old") + newName := "alpha-renamed" + newDesc := "new" + updated, err := svc.Update(ctx, w.ID, &newName, &newDesc) + if err != nil { + t.Fatalf("Update: %v", err) + } + if updated.Name != newName || updated.Description != newDesc { + t.Fatalf("update did not apply: %+v", updated) + } + + // nil description = leave alone. + finalName := "alpha-final" + updated2, err := svc.Update(ctx, w.ID, &finalName, nil) + if err != nil { + t.Fatalf("Update again: %v", err) + } + if updated2.Description != newDesc { + t.Fatalf("description should have been preserved, got %q", updated2.Description) + } +} + +func TestUpdateNotFound(t *testing.T) { + svc := mustOpen(t) + name := "x" + if _, err := svc.Update(context.Background(), "no-such-id", &name, nil); !errors.Is(err, ErrNotFound) { + t.Fatalf("expected ErrNotFound, got %v", err) + } +} + +func TestDelete(t *testing.T) { + svc := mustOpen(t) + ctx := context.Background() + w, _ := svc.Create(ctx, "x", "") + if err := svc.Delete(ctx, w.ID); err != nil { + t.Fatalf("Delete: %v", err) + } + if err := svc.Delete(ctx, w.ID); !errors.Is(err, ErrNotFound) { + t.Fatalf("second delete should be ErrNotFound, got %v", err) + } +} diff --git a/skills/README.md b/skills/README.md index 230130d..b2804aa 100644 --- a/skills/README.md +++ b/skills/README.md @@ -24,4 +24,63 @@ Loads navigation guidance into context for the rest of the session. To activate automatically in every session, add `cix` usage instructions to `~/.claude/CLAUDE.md` (see the [Agent Integration](../README.md#agent-integration) -section in the main README). 
\ No newline at end of file +section in the main README). + +--- + +## cix-workspace — Cross-Project Research + +Structures the agent's workflow for tasks that touch more than one +repo: how to identify which repos are in scope, how to investigate +them (single-project search or parallel sub-agent fan-out), and how +to synthesize a per-repo change plan. Includes a worked example +showing the failure mode that motivated the hybrid BM25+dense +algorithm. + +The skill answers three questions per request: + +1. Which repos does this request touch? +2. Which code in those repos is relevant? +3. What changes need to land, and in what order? + +It also handles the *primary project* nuance — the agent is usually +`cd`'d into a specific repo, and the user's task is rooted there; the +workspace is for the surrounding context. + +### Bundled sub-agent + +This skill ships with a dedicated `cix-workspace-investigator` +sub-agent — a thin, read-only shell around `cix search` / `cix def` / +`cix refs` with scope-isolation invariants baked in (one repo per +spawn, no edits, no recursion). When the main session fans out across +3+ repos, each spawn runs in its own context, keeping the main session +free of per-repo code chunks. The methodology (what to look for, in +what format) is the main agent's call per spawn; the sub-agent just +follows instructions and reports. + +### Install + +```bash +# Skill body +cp -r skills/cix-workspace ~/.claude/skills/cix-workspace + +# Bundled sub-agent — must live in ~/.claude/agents/ for Claude Code +# to discover it +mkdir -p ~/.claude/agents +cp skills/cix-workspace/agents/cix-workspace-investigator.md ~/.claude/agents/ +``` + +### Usage + +In a Claude Code session: + +``` +/cix-workspace +``` + +Loads the cross-project research workflow into context. Pair with +`/cix` for the single-repo navigation guidance. 
+ +To activate automatically when the user's request looks cross-cutting, +mention `cix-workspace` in your `~/.claude/CLAUDE.md` alongside the +`cix` instructions. \ No newline at end of file diff --git a/skills/cix-workspace/SKILL.md b/skills/cix-workspace/SKILL.md new file mode 100644 index 0000000..a11c4b5 --- /dev/null +++ b/skills/cix-workspace/SKILL.md @@ -0,0 +1,624 @@ +--- +name: cix-workspace +description: Cross-project research via cix workspace search. Use when a task touches more than the project you're cd'd into — microservices that talk to each other, a feature whose implementation lives in N repos (backend + api + shared-models + workers + infra + …), or any time the user mentions a product name / service / event that isn't fully defined in the primary repo. The skill structures the research around three questions and a sub-agent fan-out so the answer doesn't drown in chunks. +user-invocable: true +--- + +# `cix workspace` — Cross-Project Research Workflow + +You usually work inside one repo — your **primary project** — the +directory the user opened you in. Most tasks are fully contained there +and `cix search` / `cix definitions` / `cix references` are the right +tools. + +But some tasks are not contained. A request like "wire feature X +through the platform" can touch a half-dozen repos in different +languages, layers, and shapes — a service, a shared library, the +infra manifests, an API spec. Reading the primary repo alone gives +you 1/N of the picture. Worse, you don't know which N repos are +actually involved until you look. + +`cix workspace` is the tool for that. It searches every repo in a +named workspace at once and tells you: + +1. **Which repos are actually relevant to this request.** +2. **Which code in those repos is the entry point.** +3. **What changes need to land in each, and in what order.** + +Those three questions are the *goal* of using this skill. Don't jump +to implementation before you can answer all three with evidence. 
+ +--- + +## When to reach for workspace search + +| Signal in the user's request | What to do | +|---|---| +| Names a product / acronym you don't fully recognize from primary repo | Workspace search the acronym, see where it lives | +| "Add X to the Y flow", "wire Z into A" | Workspace search Y or Z — likely cross-cutting | +| "Across services", "between repos", "end-to-end" | Workspace search the feature | +| Talks about an event / topic / contract / API endpoint | Workspace search the event name | +| References infra / deployment alongside code | Workspace search — infra repo is probably in the workspace too | +| "How do I change X in production / staging" | Workspace search BUT look past top-1 — the answer is usually a manifests/config/contract repo even when a code repo ranks higher (rule 7 below) | +| Plain bugfix entirely inside one file | **Don't** workspace search. `cix search` is enough | +| User points at a specific symbol / file path | **Don't.** `cix definitions ` or just Read the path | + +If you're not sure, run `cix ws` once to see whether the primary +project is even part of a workspace. If it isn't, this skill doesn't +apply. + +--- + +## The workflow + +The goal-driven loop. Don't shortcut it. Each step is fast. + +### Step 0 — orient + +```bash +cix ws # list workspaces; find the one your primary is in +cix ws # describe — confirm repos are indexed (✓ count) +``` + +If the workspace shows `stale_fts_repos` in any search response later, +trust the dense ranking less — see the troubleshooting section. + +### Step 1 — answer "which repos?" 
+ +Run workspace search with a **short, term-rich query**, not the full +user sentence: + +```bash +# GOOD — short, term-rich (a product acronym + an action verb) +cix ws platform search "rate-limit middleware" + +# BAD — full sentence dilutes BM25 with stopwords ("add", "to", "a") +cix ws platform search "Add a rate limit to every API endpoint" +``` + +Why short: the hybrid algorithm fuses BM25 (literal token match) with +dense (semantic). BM25 carries the project-gating signal — repos that +share zero vocabulary with the query drop out. Common words ("add", +"flow", "for") match everywhere and dilute that signal. + +Read the response: + +- **`projects[]` is the answer to Q1.** Sorted by `project_score` + (candidacy). Each entry has `bm25_score` (literal-token overlap) + and `dense_score` (semantic similarity). +- Projects below the per-query relative threshold are already + filtered out — you only see the survivors. +- Top entry's `project_score` is your reference. Entries at 60-100% + of top are core relevant. Entries at 40-60% are secondary. Below + 40% would have been dropped server-side. + +**Always include the primary project** even if workspace search ranks +it low — the user's task is rooted there. The workspace's other +repos are dependencies / consumers / providers / counter-parties. + +### Step 2 — answer "what code is relevant?" + +For each repo from step 1, look at the chunks panel. The chunk list +is interleaved by rank across surviving projects so each repo's top +hit appears early. Use these chunks as **starting points** for a +deeper read, not as the full answer. + +For repos other than the primary, you have two options: + +**A. Quick scan (≤ 2 repos to investigate):** use single-project +search directly. 
+ +```bash +# Search inside one specific project +curl -G -H "Authorization: Bearer $CIX_KEY" \ + --data-urlencode "q=rate limit middleware handler" \ + --data-urlencode "min_score=0" \ + "$CIX_URL/api/v1/projects/$(project_hash)/search" +``` + +The per-project default `min_score` is `0.2` — light floor that +keeps abstract NL queries non-empty. For drill-down on a natural- +language question ("how does X work end-to-end"), pass `min_score=0` +explicitly to be safe. For strict code-symbol matching, pass `0.4+`. + +**B. Fan-out to sub-agents (≥ 3 repos, or you need a thorough read):** +spawn one `cix-workspace-investigator` sub-agent per relevant repo, in +parallel. See the dedicated [Sub-agent fan-out pattern](#sub-agent-fan-out-pattern) +section below for the four-part prompt template, including how to pass +seed chunks with your interpretive commentary. + +Run them concurrently (one message, multiple Agent tool calls). When +they report back, you have N independent reads to synthesize, not N +sequential rabbit-holes. + +### Step 3 — answer "what changes?" + +This is your job, not a sub-agent's. Sub-agents report findings; you +write the plan. + +For each relevant repo: + +- What needs to change (specific file:line, or a new file). +- Why (which step of the data flow this implements). +- Order constraints (e.g. "shared-models migration must deploy + before backend reads new field"). +- Tests that prove it works. + +Confirm with the user before any of this lands. The plan is the +deliverable of this skill; the implementation is a separate step. + +### Throughout — ask, don't guess + +Trigger a clarifying question when: + +- Top-2 projects are at near-equal `project_score` and have different + labels — the request might fit either repo, ask which. +- `bm25_score` is 0 across all projects → either the FTS index is + stale (see troubleshooting) OR the user's term doesn't exist + literally in any repo. 
Ask the user for the term that *would* + appear in code ("we call it `Order` in code, not `Trade`"). +- A sub-agent reports it can't find a clear entry point — surface + that uncertainty back to the user, don't paper over it. +- The implementation plan needs a deploy-order assumption — confirm + who owns each repo and what their cycle looks like. + +Don't ask if the answer is obvious from the chunks. The bar is "I +have two plausible interpretations and the wrong one costs the user +real time." + +--- + +## Reading the projects panel — what the numbers mean + +``` +project-a@main 0.500 5 hits bm25 0.421 dense 0.556 +project-b@main 0.412 5 hits bm25 0.318 dense 0.498 +project-c@main 0.288 3 hits bm25 0.155 dense 0.362 +``` + +- `project_score` (first column): the α-blended candidacy in [0, 1]. + Top = strongest signal across both retrieval modes. +- `bm25_score` and `dense_score`: the raw per-mode signals. The + algorithm normalizes these per query before blending — useful for + diagnosis, not for sorting. +- If `bm25_score` >> `dense_score` for a project: it's relevant + because of literal token overlap (product name appears in code). + Trust the surface area but verify semantic relevance manually. +- If `dense_score` >> `bm25_score`: it's relevant because of + semantic similarity (handler shape matches the query intent) but + the literal term isn't there. Common when the user's term is a + product nickname not used in code. +- If both are near zero: you're seeing the project because nothing + else cleared the gate either. Treat with skepticism. + +--- + +## Trust rules — making sense of the response + +These ten rules were derived from a calibration eval (113 synthetic +queries + 5 real engineering tasks against a mixed-domain workspace). +Apply them before acting on workspace-search output. Numbers below +are empirical, not vibes. 
+ +### Rule 1 — `chunk.score >= 0.4` is the trust threshold + +Chunks with `score < 0.4` are noise about 75% of the time +(rank-inversion and weak-signal FPs from the relative project gate). +Skim them only when the higher-scored chunks don't answer the +question. With the default `min_score=0.4` you usually won't see them +at all; if you passed `min_score=0` (intentional broad sweep), apply +this rule yourself. + +### Rule 2 — `chunk.score == 0` is a BM25-only hit, not low confidence + +The chunk's project matched the literal query tokens via FTS5 but the +embedding side didn't surface it. These are valuable when the query +carries project-specific identifiers (CamelCase symbols, file names, +acronyms). Discount them when the query is a generic English word +(`error`, `data`, `config`) — common-word BM25 hits are noise. + +### Rule 3 — Top-1 of `projects[]` is correct ~70% of the time in real tasks + +The synthetic eval measured 91% on single-target queries; real +engineering tasks hit ~70% because real queries often span layers +(see rule 7). When the top-1 project doesn't match your task's +intent, **scan ranks 2–5 before reformulating** — the right repo is +usually there. The `projects[]` panel is the answer to "where do +the words live", not "where should the change happen". + +### Rule 4 — Drop down to single-project search for depth + +When `projects[]` shows the target at rank 1 with a clear lead +(`project_score` ≥ 1.5× the next), switch to per-project search. +You get file-grouped, deeper results without the cross-project +round-robin cap of 5 chunks per repo. + +### Rule 5 — `min_score=0` for intentional cross-project sweeps + +Default workspace `min_score` is `0.4`. For queries that should +legitimately span many repos ("authentication", "configuration +loading", "Kafka consumers"), pass `min_score=0` explicitly. +Expect `projects[]` to list 5–8 entries — that's the feature, not a +bug. 
Ignore rule 1 in this mode: many real positives sit below 0.4 +in genuine cross-cutting queries. + +### Rule 6 — Add a 3rd disambiguating token, carefully + +If two query words are each domain-overloaded (e.g. "client SDK" +could be the generated API client, the shared library, or a model +type), add a third word. **Prefer meta-tokens** (`endpoint`, +`route`, `handler`, `manifest`, `migration`, `config file`) over +tech-stack guesses (`grpc`, `kafka`, `terraform`) — wrong stack +guesses actively rotate the ranking away from the right answer. If +unsure of the stack, run the query without a disambiguator first, +read the top-1 project's language/path patterns, then refine. + +### Rule 7 — "Change X in production" → manifests repo, not code repo + +For tasks framed as deploying / configuring / overriding a feature, +the answer usually lives in a manifests / config / contract repo +(K8s overlays, Helm charts, OpenAPI specs, environment-specific +yaml). Workspace search ranks by token frequency, so the code repo +typically wins. Look at `projects[]` for repos with **manifests, +config, platform, deploy, contract, openapi, infra** in their +names — those are often the right targets even at rank 3–5. + +### Rule 8 — When top-1 doesn't fit, scan first, reformulate second + +If you think top-1 is wrong: + +1. First, scan ranks 2–5. The right project is there ~80% of the + time when the layer mismatch caused rule 3 to fail. +2. Only after scanning, reformulate. Reformulating before scanning + wastes a round-trip and risks the new query introducing fresh + layer confusion. + +### Rule 9 — For per-project NL drill-down, pass `min_score=0` explicitly + +When dropping from workspace to per-project search with a natural- +language query (e.g. "how does X work"), pass `min_score=0` to be +safe. 
The per-project default `min_score=0.2` is lighter than it +used to be (`0.4`) and usually fine, but abstract semantic queries +can score in the 0.2–0.3 range that the default still rejects. + +### Rule 10 — Words ≠ change location (the intent-vs-tokens watchword) + +Workspace search ranks projects by *where the words live*. Your +task is usually about *where the change should happen*. These +coincide ~70% of the time, not 91%. When in doubt: read the +chunks in ranks 2–5 before committing to a target repo. + +### Quick example — when rules 7 and 10 save you + +> User: "Change the database timeout for the staging environment of +> the order service." + +Workspace search ranks the **order-service code repo** at #1 (it's +where the word "database" appears most). But the change needs to +land in the **environment-platform manifests repo** at rank #4. If +you stopped at top-1 you'd edit the wrong file. Rules 7 and 10 +remind you to scan further. + +--- + +## Primary project nuance + +You are typically `cd`'d into a single repo. That's the *primary +project*. The user's task is framed *from* that repo — they're +extending it, integrating with something it depends on, or wiring up +something that consumes it. + +Patterns: + +- **The change centers on primary, others are consumers/providers.** + Most common. Primary gets the bulk of the implementation; the + other repos get small adapter changes (new field consumption, new + webhook subscriber, new client method). +- **The change is in another repo, primary just calls it.** Less + common but real. Primary's role is the integration test or the + feature-flag flip; the heavy lifting is elsewhere. +- **The change is genuinely distributed.** Migrations, schema changes + rolling through many services, protocol bumps. Each repo gets a + coordinated change with deploy-order constraints. + +Workspace search tells you which pattern you're in. Don't assume. 
+ +--- + +## Sub-agent fan-out pattern + +When you have 3+ relevant repos, fan out. Sub-agents run with isolated +context — the main session stays clean (no per-repo code chunks bloating +it) and the investigations run in parallel. + +Use the dedicated **`cix-workspace-investigator`** sub-agent, which ships +with this skill. It's a thin, read-only shell around `cix search` / `cix +def` / `cix refs` / `Read` / `Grep` with three hard rules baked in: +stay inside the assigned project, no edits, no recursion. The +methodology — what to look for, what to report, in what format — is +**your** call, per spawn. The sub-agent follows your instructions; it +doesn't second-guess them. + +### The four parts of a good per-spawn prompt + +You'll write one prompt per repo. A good one has four parts: + +#### 1. The user's task, verbatim + +Sub-agents have zero prior context. Paste the original user request even +if it feels redundant — your interpretation might be wrong, and the +user's wording is the ground truth the sub-agent should reason from. + +#### 2. The `project_path` you're assigning + +Plus the workspace ID or `cix` command-prefix if your setup needs it. +One repo per spawn. + +#### 3. Seed chunks **with your commentary** + +This is the part most often done badly. Don't just paste raw chunk +pointers and hope the sub-agent figures out what matters. You saw the +workspace search response; you have hunches about which chunks are real +entry points and which are noise; pass that down. + +For each chunk you cite, add one short line of interpretation. 
For +the response as a whole, flag suspicious signals: + +- Which chunk looks like the most likely entry point and why +- Which chunks look like test fixtures / dead code / wrong-layer the + sub-agent should de-prioritize +- Numeric signals that need a second opinion: `score=0` (BM25-only + literal — verify the token isn't a false friend), `score < 0.4` (low + confidence, possible rank-inversion), `bm25_score` high + `dense_score` + near zero (literal-only match — concept may not actually live here) +- Whether you suspect this repo is wrong-layer (rule 7) — tell the + sub-agent to confirm relevance before diving into the chunks + +**Example "good chunk block":** + +``` +Seed chunks from workspace search: +- `internal/gateway/server.go:412-418` (score 0.55) — looks like the + HTTP handler entry point for the rate-limit feature; confirm it + invokes the limiter middleware rather than just returning 429. +- `internal/gateway/middleware.go:89-93` (score 0.49) — middleware + registration site. Verify whether rate-limit is wired here or + elsewhere. +- `tests/integration/rate_limit_test.go` (score 0.41) — integration + test. Useful for understanding the expected shape, but not where + the change lands. Skim only. +- `pkg/shared/util.go:1-30` (score 0) — BM25-only hit, "limit" + appears in a comment. Almost certainly noise; skip unless you need + shared utilities. + +Panel-level notes: +- Workspace ranked this project #1 with a clear lead (project_score + 1.000 vs next 0.860). High confidence this is the right repo. +- bm25_score=8.5, dense_score=0.54 — strong on both signals, not a + wrong-layer concern. +``` + +#### 4. Explicit deliverable + +Tell the sub-agent **exactly** what to return and in what shape. Each +task has different needs: + +- "Confirm whether this repo is in scope. Yes / no / partial + one + sentence why." +- "Find the entry point for the rate-limit middleware. Report + file:line of the entry and a five-step trace through the call + graph." 
+- "List every file that would need to change to add a new audit-log + event type. No code, just file path + one-line per-file reason." + +Vague deliverables (`"investigate this repo"`) → vague answers. + +### Anti-patterns to avoid + +- **"Investigate this repo for rate-limit"** — no deliverable. The + sub-agent guesses scope and you can't verify the result. +- **Three paragraphs of context with nested questions** — sub-agent + answers the wrong question. Pick one deliverable per spawn. +- **"Read all the auth code"** — unbounded. Either fails or returns a + wall of text. +- **Pasting raw chunks without interpretation** — you saw the + response, you have hunches about what matters. Sub-agent doesn't. + Skipping commentary throws away the most valuable thing you can pass + down. + +### Mechanics + +Run all sub-agents in **one message with multiple Agent calls** so they +execute in parallel. Wait for completion. Synthesize their reports +yourself — sub-agents don't see each other's work; you do. Surface +inconsistencies (e.g. two repos disagree on which event format is +canonical) back to the user. + +--- + +## Worked example — why this skill exists + +A representative failure mode that motivated the hybrid algorithm: + +**The naïve approach:** running workspace search with a full natural- +language sentence ("Add feature X to product Y"). The pre-hybrid +implementation was pure-dense — it returned the N nearest vectors +regardless of how far away "nearest" actually was. Every repo in the +workspace surfaced, including repos that contained **zero literal +mentions** of either the feature name or the product code. Confidently +reporting all of them as "relevant" wasted time on completely +unrelated repos. + +**The structural failure:** + +1. Pure-dense fan-out cannot tell "no signal" apart from "weak + signal" — chromem always returns the K nearest vectors. +2. Long natural-language queries dilute the few tokens that carry + the actual gating signal. +3. 
Without a sparse-retrieval channel, an acronym or unique + identifier query has nothing to lock onto. + +**What this skill teaches instead:** + +1. Query with **just the high-precision term** first — the product + acronym, the feature name, the unique symbol. Everything else + is noise. +2. Verify that projects with `bm25_score = 0` aren't masquerading + as relevant. After the hybrid landed, repos with no literal + matches AND only marginal dense similarity drop out automatically + via the project gate. +3. Confirm with the user before treating "this repo surfaced in + search" as "this repo is in scope for the change". + +**The lesson encoded in this skill:** + +- Step 1: query the term, not the sentence. +- Step 1: trust the project gate; if a repo dropped out, it dropped + out for a reason. +- Step 2: read the surface area from `projects[]` first, then read + the chunks as starting points. +- Step 3: never assume "in search results" == "in scope". Verify. + +--- + +## Troubleshooting + +### `bm25_score` is 0.000 on every project + +The workspace was indexed before the FTS5 mirror existed and the +sparse half of the hybrid is empty. Hybrid degrades to pure-dense +fan-out — the same algorithm that produces the false-positive +failure mode described in the worked example above. + +The response includes `stale_fts_repos` listing the affected +project_paths. Fix: reindex each repo (dashboard → repo card → +reindex button, or `POST /api/v1/workspaces/{id}/repos/{repo_id}/reindex`). +After reindex, BM25 populates incrementally per-file as chunks are +written. + +Until reindex completes, **don't trust the project gating** — the +algorithm is producing the old failure mode. Verify project relevance +by literal grep on the term. + +### `status: "empty"` despite obviously-relevant repos in the workspace + +Either: + +- The query terms don't appear literally in any repo AND the dense + similarity is below threshold for everything (project-gate dropped + everyone). 
Re-phrase with the term the code actually uses, or + lower `min_score`. +- Every workspace repo is still indexing. Check `pending_repos` in + the response. + +### `status: "partial_failure"` + +At least one repo errored out (`failed_repos` array names them). +Common cause: corrupt chromem collection. The remaining repos still +returned results. Surface to the user; don't silently treat as +complete. + +### Top-2 projects are at near-equal candidacy + +The algorithm isn't confident which repo is more relevant. Possible +causes: + +- The feature genuinely lives in both. Ask the user which they + intended as primary scope. +- The query is too broad — both repos match generic vocabulary. + Re-query with a more specific term. +- One repo is a fork or duplicate. Confirm with `cix ws ` + describe. + +### One project absolutely dominates everything else + +Could be legit (the user's task is mostly contained in one repo and +that repo is just very dense with relevant content). Or could be a +single repo accidentally matching the user's stopwords across many +files. Spot-check: is the project's `bm25_score` driven by the +high-IDF term (the product name) or by common words? + +### Top-1 is wrong-layer (rule 7 / rule 10 in action) + +The top-1 project contains the words but isn't where the change +should land. Classic example: "deploy X to staging" → workspace +ranks the code repo for X at #1, but the staging overlay lives in +a manifests repo at rank #4. Or: "add API endpoint Y" → ranks the +backend implementation at #1, but the OpenAPI contract repo at #3 +must be updated first. + +**Fix:** scan ranks 2–5 explicitly. Look for projects whose names +hint at a different layer (`*-platform`, `*-manifests`, +`*-contracts`, `*-config`, `*-infra`, `openapi*`). If you see one, +that's probably your real target. 
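The name scan in the fix above can be scripted. This is a minimal sketch, assuming the project names have already been read out of the `projects[]` panel in rank order. The helper name `layer_hint` and the sample repo names are hypothetical; the glob list simply mirrors the hints listed above.

```bash
# Hypothetical helper: succeeds when a project name hints at a
# non-code layer (manifests, config, contracts, infra, ...).
layer_hint() {
  repo="${1%@*}"      # drop a trailing @branch, if present
  repo="${repo##*/}"  # drop a leading host/owner path, if present
  case "$repo" in
    *-platform|*-manifests|*-contracts|*-config|*-infra|openapi*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical ranked names (rank 1 first), as read from projects[].
ranked="order-service@main billing-worker@main shared-models@main environment-manifests@main openapi-specs@main"

rank=0
candidates=""
for name in $ranked; do
  rank=$((rank + 1))
  # Scan ranks 2-5 only; rank 1 is the suspected wrong-layer hit.
  if [ "$rank" -ge 2 ] && [ "$rank" -le 5 ] && layer_hint "$name"; then
    candidates="$candidates$rank:$name "
  fi
done
echo "$candidates"
```

A name match is only a hint: confirm the candidate by reading its chunks before treating it as the target repo.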
+ +### Disambiguator backfired — the query lost its grip + +You added a 3rd word to discriminate between two overloaded terms, +and the response is *worse* — top projects all have mediocre scores +and the right repo isn't among them anymore. This usually happens +when the added token belongs to a different stack than your target +(e.g. you guessed a transport / framework / library that the canonical +repo doesn't use), so the extra token rotates the ranking toward +unrelated repos. + +**Fix:** strip the guessed-stack token. Try a meta-token instead +(`endpoint`, `route`, `handler`, `manifest`, `migration`). Or: run +the 2-word query as-is, scan the top-1 project's path patterns and +language to see what stack it actually uses, then refine. + +--- + +## Quick command reference + +```bash +# List workspaces +cix ws +cix ws list --json + +# Describe one workspace (always do this before searching) +cix ws platform +cix ws platform describe --json + +# List repos attached to a workspace +cix ws platform list +cix ws platform repos --verbose + +# Search a workspace +cix ws platform search "rate-limit middleware" +cix ws platform search "JWT validation" --top-projects 8 --top-chunks 30 +cix ws platform search "audit logging" --json +``` + +Flags: + +- `--top-projects N` — surface up to N projects in the panel + (default 10, max 50). Increase for very broad explorations. +- `--top-chunks K` — return up to K chunks total (default 20, max + 200). Round-robin interleaved across surviving projects. +- `--min-score F` — drop dense hits below cosine F before scoring. + **Default 0.4** (stricter than the per-project search default of `0.2`). + Pass `0` explicitly for intentional cross-project sweeps that + need long-tail recall — broad concepts like "authentication" or + "Kafka consumers" that legitimately live in many repos. Higher + values (0.5+) for queries you want laser-focused. +- `--json` — raw machine-readable response. 
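For scripted triage, the Step 1 bands (60–100% of the top `project_score` is core, 40–60% is secondary) can be applied mechanically. This is a minimal sketch, assuming the scores have already been extracted from the `--json` output; the extraction is omitted because the exact response layout is not shown here. `score_band` is a hypothetical helper, not part of the `cix` CLI.

```bash
# Hypothetical helper: classify a project_score relative to the top score.
# Bands follow Step 1: >= 60% of top is core, 40-60% is secondary.
score_band() {
  awk -v top="$1" -v s="$2" 'BEGIN {
    if (top <= 0) { print "unknown"; exit }
    r = s / top
    if (r >= 0.6) print "core"
    else if (r >= 0.4) print "secondary"
    else print "below-gate"
  }'
}

score_band 0.500 0.412   # about 82% of top
score_band 0.500 0.288   # about 58% of top
```

With the sample panel values from earlier in this skill, the two calls print `core` and `secondary`. The `below-gate` branch should rarely fire in practice, since such projects are filtered server-side.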
+ +--- + +## TL;DR + +When the user's task plausibly spans more than one repo: + +1. `cix ws` → find the workspace, then `cix ws ` describe it. +2. Workspace search with a **short, term-rich** query. +3. Read `projects[]` → that's your scope (Q1 answered). +4. For each repo in scope, either single-project search or spawn a + `cix-workspace-investigator` sub-agent — in parallel, with seed + chunks AND your interpretive commentary on what to trust. +5. Synthesize the sub-agent reports → plan changes per repo, with + order constraints (Q2 + Q3 answered). +6. Ask the user to confirm the scope and plan before implementing. + +If `bm25_score` is 0 across the board, the FTS index is stale — +fix it before trusting the result. diff --git a/skills/cix-workspace/agents/cix-workspace-investigator.md b/skills/cix-workspace/agents/cix-workspace-investigator.md new file mode 100644 index 0000000..7d24379 --- /dev/null +++ b/skills/cix-workspace/agents/cix-workspace-investigator.md @@ -0,0 +1,63 @@ +--- +name: cix-workspace-investigator +description: Read-only deep-dive of ONE repository inside a workspace fan-out task. Receives the user task + project_path + seed chunks (with the main agent's commentary on what to trust and what to question) + an explicit deliverable. Returns whatever the main agent asked for, in the format they asked for. Use only when the main session is running the cix-workspace skill workflow and has identified one or more cross-project repos to investigate in parallel. Do not use for: single-repo questions (use cix search directly), tasks not framed by the cix-workspace skill, anything that requires editing or running code. +tools: Bash, Read, Grep +--- + +# `cix-workspace-investigator` + +You investigate ONE repository as part of a larger cross-project workspace task. +The main agent has full context about the user's goal; you only see what they +passed to you in this single prompt. 
+ +## Your tools + +You have a read-only toolkit for code investigation inside the assigned project: + +- **`cix search ""`** — semantic / hybrid lookups inside the assigned + project. Default tool for "find code that means X". +- **`cix def `** — go-to-definition. +- **`cix refs `** — find every usage. +- **Read** — open specific files when chunk inspection isn't enough. +- **Grep** — exact literal strings only (error messages, config keys, import + paths). Not for semantic search. +- **Bash** — for running the `cix` CLI and small read-only shell commands + (`ls`, `wc`, `head`, `cat` short files). Never mutate state. + +The cix index already covers this project — you don't need to (and can't) +re-index. + +## Hard rules — non-negotiable + +1. **Stay inside the assigned `project_path`.** Don't read or query other + workspace repos. If you discover a finding that requires looking elsewhere, + surface it as an uncertainty for the main agent to fan out further. +2. **Read-only.** No `Write`, no `Edit`, no `git` mutations, no shell side + effects. If you see a bug, describe it — don't fix it. +3. **No recursion.** Don't spawn further sub-agents. You are one level of + fan-out; the main agent handles synthesis. +4. **Follow the main agent's instructions exactly.** Output format, depth, + word budget, and what to look for are the main agent's call — not yours. + If they ask for three bullets, give three bullets. If they ask for a + five-step trace, give that. Don't volunteer extra structure. +5. **Report what you can't do.** If a file is missing, if `cix` returns + empty for a term that should exist, if a seed chunk doesn't match what + the main agent suggested — say so explicitly. Don't fabricate findings + to fill a template. + +## Output contract + +Return exactly what the main agent asked for, in exactly the format they +asked for. The main agent already knows how to parse the response they +requested. 
Don't add a preamble, don't add a meta-summary unless asked, +don't restate the task back at them. + +If the request is ambiguous, pick the most-likely interpretation, execute it, +and flag the ambiguity in one short line at the end. + +## What you are NOT + +You are not a generic code-explorer. You are not a planner. You are not a +reviewer. You are a focused, read-only investigator for one repo, working +under explicit per-call instructions from a main agent that already knows +the workspace and the user. diff --git a/workspaces.md b/workspaces.md new file mode 100644 index 0000000..93291c7 --- /dev/null +++ b/workspaces.md @@ -0,0 +1,870 @@ +# Workspaces + +> [!WARNING] +> **Experimental.** Workspaces ship behind a feature flag and the HTTP + UI +> surface is still evolving. The defaults and search algorithm are +> calibrated on a 113-query eval (see [§ Search algorithm](#search-algorithm)), +> but expect breaking changes to API shape, dashboard layout, and CLI flags +> before this graduates to stable. + +A **workspace** is a named group of repositories that cix can search **as +one corpus**. Where `cix search` is for the project you're `cd`'d into, a +workspace is for tasks that span multiple repos — microservices that talk +to each other, a feature whose implementation crosses several services, +or any time the answer is "look in N repos, not one". + +Workspaces clone GitHub repositories server-side and index them next to +your local projects, then expose a single hybrid (BM25 + dense) +cross-project search endpoint. 
+ +[← back to main README](README.md) + +--- + +## Table of contents + +- [What you get](#what-you-get) +- [Enabling workspaces](#enabling-workspaces) +- [Concepts](#concepts) +- [Quick start](#quick-start) +- [Adding repositories](#adding-repositories) +- [GitHub tokens](#github-tokens) +- [Searching a workspace](#searching-a-workspace) +- [Search algorithm](#search-algorithm) +- [Webhooks (auto-reindex on push)](#webhooks-auto-reindex-on-push) +- [Strengths and weaknesses](#strengths-and-weaknesses) +- [Configuration reference](#configuration-reference) +- [REST API reference](#rest-api-reference) +- [Troubleshooting](#troubleshooting) +- [Agent integration](#agent-integration) + +--- + +## What you get + +- **Cross-project semantic search.** One query, one ranked response across + every indexed repo in the workspace. Returns `projects[]` (which repos + look relevant) and `chunks[]` (round-robin interleaved snippets). +- **Server-side clones of GitHub repositories.** Add a repo by URL + + branch; the server clones it under its data directory and indexes it + the same way it indexes local projects. +- **GitHub PAT management.** Store a personal access token once, use it + to clone private repos and (optionally) auto-register push webhooks. + Tokens are AES-256-GCM encrypted at rest. +- **Auto-reindex on push.** With `auto` webhook mode the server + registers a GitHub webhook on the repo and re-clones + reindexes on + every push to the tracked branch. +- **Dashboard UI.** Browser-facing CRUD for workspaces, repos, tokens, + and a two-stage search interface. +- **CLI integration.** `cix ws` for listing workspaces, describing + them, and running cross-project search from the terminal. +- **Agent skill.** A `cix-workspace` skill teaches AI agents how to use + the workspace search responsibly, with a dedicated + `cix-workspace-investigator` sub-agent for parallel per-repo + investigation. See [`skills/cix-workspace/SKILL.md`](skills/cix-workspace/SKILL.md). 
+ +--- + +## Enabling workspaces + +The feature is off by default. Set the flag in `.env` (or the equivalent +in your deployment): + +```bash +CIX_WORKSPACES_ENABLED=true +``` + +…and restart the server. Without the flag every workspace endpoint +returns `503 Service Unavailable` with `workspaces feature is disabled +(set CIX_WORKSPACES_ENABLED=true and restart)`. + +You may also want to set: + +```bash +# Where workspace repo clones live on disk. Defaults to /repos. +CIX_WORKSPACES_DATA_DIR=/var/lib/cix/repos + +# Public URL of this server — required if you want auto-registered +# GitHub webhooks. Without it, webhook mode falls back to `manual`. +CIX_PUBLIC_URL=https://cix.example.com + +# Encryption key for GitHub tokens. If unset, the server auto-generates +# one at /.secret_key on first boot. +CIX_SECRET_KEY=$(openssl rand -hex 32) +``` + +--- + +## Concepts + +### Workspace + +A named group with an `id` (UUID), `name` (unique), and `description`. +A user creates a workspace, then attaches repositories to it. A workspace +has no built-in access control beyond what the server's auth layer already +provides — anyone authenticated can list and search workspaces today. + +### Workspace repo + +A row in `workspace_repos` that ties one GitHub repo+branch to a +workspace. Two kinds: + +- **Owned** (`is_linked=0`): the server clones the repo to disk and runs + indexing. Status transitions: `pending → cloning → indexing → indexed` + (or `failed`). These are the "true" workspace repos. +- **Linked** (`is_linked=1`): a lightweight pointer to an *already-indexed* + local project (one that's tracked in the `projects` table because you + `cix init`'d it). No clone, no separate index. Useful for including + your primary repo in a workspace without duplicating data. + +### GitHub token + +A personal access token (PAT) the server uses to clone private repos and +optionally register webhooks. 
Stored AES-256-GCM-encrypted; the plaintext +is returned to the client exactly once (on creation) and never again. + +### Project path + +Workspace repos use the canonical form `github.com/owner/repo@branch` +(e.g. `github.com/acme/api-server@main`) as their `project_path`. This +is the same identifier you'll see in the `projects[]` panel and in chunk +records when searching. + +--- + +## Quick start + +End-to-end walkthrough, assuming the server is up and you have a fresh +admin login. + +### 1. Enable the feature + +```bash +echo 'CIX_WORKSPACES_ENABLED=true' >> .env +docker compose restart # or `make run` for native +``` + +### 2. Create a workspace + +From the dashboard: open `/dashboard/workspaces` → **Create workspace**. + +Or from the API: + +```bash +curl -X POST http://localhost:21847/api/v1/workspaces \ + -H "Authorization: Bearer $CIX_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{"name":"platform","description":"Platform services"}' +# → {"id":"4f2a785c-...","name":"platform",...} +``` + +### 3. Add a GitHub token (if any repo is private) + +Dashboard: **API Keys** page → **GitHub tokens** tab → **Add token**. +Paste a PAT scoped at minimum to `repo` (for private repos) and +`admin:repo_hook` (for auto-registered webhooks). + +Or: + +```bash +curl -X POST http://localhost:21847/api/v1/github-tokens \ + -H "Authorization: Bearer $CIX_API_KEY" \ + -d '{"name":"my-pat","token":"ghp_xxx..."}' +# → {"id":"abc-123","name":"my-pat","scopes":["repo","admin:repo_hook"]} +``` + +The token's plaintext is **never echoed back**. Lose the value and you +must rotate. + +### 4. Attach a repo + +Dashboard: open the workspace → **Add repository** → walk through the +staged dialog (token → account/org → repo → branch → webhook mode). 
+ +Or: + +```bash +curl -X POST http://localhost:21847/api/v1/workspaces//repos \ + -H "Authorization: Bearer $CIX_API_KEY" \ + -d '{ + "github_url":"https://github.com/acme/api-server", + "branch":"main", + "token_id":"abc-123", + "webhook_mode":"manual" + }' +# → {"id":"...","status":"pending","project_path":"github.com/acme/api-server@main",...} +``` + +Status will transition through `cloning → indexing → indexed` over the +next minutes (depends on repo size + embedding throughput). + +### 5. Watch the indexing progress + +```bash +curl -H "Authorization: Bearer $CIX_API_KEY" \ + http://localhost:21847/api/v1/workspaces//repos +# Look for `status: "indexed"` per repo. +``` + +### 6. Search + +CLI: + +```bash +cix ws platform search "authentication middleware" +``` + +Or directly: + +```bash +curl -G -H "Authorization: Bearer $CIX_API_KEY" \ + --data-urlencode "q=authentication middleware" \ + http://localhost:21847/api/v1/workspaces//search +``` + +--- + +## Adding repositories + +### Owned vs linked + +| | Owned repo (`is_linked=0`) | Linked project (`is_linked=1`) | +|---|---|---| +| Source | GitHub clone | Existing `cix init`'d local project | +| Clone path | `/repos//` | n/a (uses original) | +| Index lifecycle | Server-managed | Whatever the user runs locally | +| Indexed by | Server's index pipeline | `cix init` / `cix watch` | +| Webhooks | Supported | Not applicable | +| API | `POST /workspaces/{id}/repos` | `POST /workspaces/{id}/repos/link` | +| Dashboard | **Add repository** button | **Link existing project** button | + +Use **linked** when the primary project you're working in should appear +in the workspace search but you don't want a second clone. + +### From the dashboard + +The **Add repository** dialog is staged: + +1. **Pick a GitHub token** — required for private repos. Public repos + can be added without a token (HTTPS anonymous clone). +2. **Pick an account** — your user or one of the orgs visible to the + token's scopes. +3. 
**Pick a repo** — fetched from GitHub via the token (up to 500 + shown; search to narrow). +4. **Pick a branch** — defaults to the repo's default branch. +5. **Pick a webhook mode** — `manual` / `auto` / `disabled`. See + [Webhooks](#webhooks-auto-reindex-on-push). + +The dialog calls `POST /workspaces/{id}/repos` at the end. The clone + +index job runs in the background. + +### From the API + +```bash +curl -X POST http://localhost:21847/api/v1/workspaces//repos \ + -H "Authorization: Bearer $CIX_API_KEY" \ + -H "Content-Type: application/json" \ + -d '{ + "github_url": "https://github.com/owner/repo", + "branch": "main", + "token_id": "", + "webhook_mode": "manual" + }' +``` + +Response fields worth knowing: + +- `id` — workspace_repo UUID (use this for delete / reindex / webhook + endpoints) +- `project_path` — `github.com/owner/repo@branch`, the search identifier +- `status` — starts at `pending`, becomes `indexed` when the pipeline + finishes +- `webhook_secret` — server-generated HMAC secret. Returned exactly + once if you set `webhook_mode=manual`. Use it when you configure the + webhook on GitHub manually. + +### Cloning, indexing, and status transitions + +What happens when you add a repo: + +1. **`pending`** — row inserted in `workspace_repos`. Clone job queued. +2. **`cloning`** — server fetches via `git clone` (or `git fetch + + checkout` if the repo is already on disk). Private repos use the + attached token. Result lands at `//`. +3. **`indexing`** — indexer scans the clone with the standard pipeline + (tree-sitter chunking → embeddings → vector store + FTS5 mirror). +4. **`indexed`** — `last_indexed_at` updated, repo is searchable. +5. **`failed`** — clone or index errored out. `last_error` populated. + Common causes: invalid token, repo not found, branch doesn't exist, + embedder unavailable. + +Clone + index parallelism: `CIX_WORKER_CONCURRENCY` (default `2`). +Increase for fleet onboarding; lower if you saturate disk or GPU. 
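
The status lifecycle above lends itself to a small client-side check. The following is a minimal sketch, not part of cix: `summarize_repos` is a hypothetical helper, and it assumes the body of `GET /workspaces/{id}/repos` has already been decoded into a list of dicts carrying the documented `project_path`, `status`, and `last_error` fields (the exact response envelope is an assumption).

```python
from collections import Counter

# Statuses named in the lifecycle above; "indexed" and "failed" are terminal.
IN_PROGRESS = {"pending", "cloning", "indexing"}


def summarize_repos(repos):
    """Group workspace repos by lifecycle state.

    `repos` is assumed to be a list of dicts with `project_path`, `status`,
    and (for failures) `last_error`. This shape is an assumption.
    """
    counts = Counter(r["status"] for r in repos)
    waiting = [r["project_path"] for r in repos if r["status"] in IN_PROGRESS]
    failed = {
        r["project_path"]: r.get("last_error", "")
        for r in repos
        if r["status"] == "failed"
    }
    return {
        "counts": dict(counts),
        "waiting": waiting,
        "failed": failed,
        # Nothing is still cloning or indexing once `waiting` is empty.
        "settled": not waiting,
    }
```

A polling loop could call this after each list request and stop once `settled` is true, then report anything left in `failed`.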
+ +### Reindexing a single repo + +```bash +curl -X POST http://localhost:21847/api/v1/workspaces//repos//reindex \ + -H "Authorization: Bearer $CIX_API_KEY" +``` + +Use this after a manual content update, after the embedding model +changes, or after the stale-FTS warning (see [Search algorithm](#search-algorithm)). + +### Removing a repo + +```bash +curl -X DELETE http://localhost:21847/api/v1/workspaces//repos/ \ + -H "Authorization: Bearer $CIX_API_KEY" +``` + +The clone is deleted from disk; the `projects` row is cleaned up if no +other workspace_repo references it; vectors are removed from chromem. + +--- + +## GitHub tokens + +### Why store tokens + +Three uses, all server-side: + +1. **Clone private repositories** during the add-repo flow. +2. **List your accessible orgs and repos** so the dashboard's + add-repo dialog can show a picker instead of asking for raw URLs. +3. **Auto-register push webhooks** on GitHub so the server can rebuild + the index when the upstream changes. + +### Storage and encryption + +- Tokens are stored in the `github_tokens` table. +- The plaintext is **AES-256-GCM-encrypted** before insert via + `internal/secrets/secrets.go`. +- Encryption key resolution order: + 1. `CIX_SECRET_KEY` (hex or base64 32 bytes) + 2. `CIX_SECRET_KEYFILE` (path to a 0o600+ file) + 3. Auto-generated at `/.secret_key` (mode 0600) on first + boot +- The server **refuses to read tokens at startup** if encrypted rows + exist but no key resolves. Losing the key means re-pasting every PAT. +- Plaintext is returned exactly once on `POST /github-tokens` and never + again. The dashboard caches it in memory just long enough to show + it to the user. + +### Token lifecycle + +| | | +|---|---| +| Create | `POST /api/v1/github-tokens` — validates scopes against GitHub `/user` endpoint, encrypts, stores. | +| List | `GET /api/v1/github-tokens` — metadata only (id, name, scopes, timestamps). 
| +| List accounts | `GET /api/v1/github-tokens/{id}/accounts` — PAT owner + visible orgs. | +| List repos | `GET /api/v1/github-tokens/{id}/repos?account=...` — up to 500. | +| Delete | `DELETE /api/v1/github-tokens/{id}` — revokes the metadata row. Does **not** revoke the PAT on GitHub itself. | + +`last_used_at` updates on every successful decrypt (clone job, webhook +registration, repo listing). + +### Recommended scopes + +| Scope | Needed for | +|---|---| +| `repo` | Cloning private repos | +| `read:org` | Listing private org repos in the dashboard picker | +| `admin:repo_hook` | Auto-registering push webhooks (`webhook_mode=auto`) | + +Use a separate PAT per token entry if you want easy revocation paths; +the server doesn't multiplex one PAT across users. + +--- + +## Searching a workspace + +Three surfaces, same backend. + +### Dashboard + +`/dashboard/workspaces/` → **Search** button. Two-stage UI: + +1. Type a query, hit Enter. +2. See the **projects panel** (which repos look relevant, with + `project_score` + per-signal `bm25_score` and `dense_score`) plus + the **chunks panel** (round-robin interleaved snippets, file:line + + score). +3. Drill into a chunk to open the full file in the project's detail + page. + +### CLI + +```bash +# List workspaces +cix ws +cix ws list --json + +# Describe one workspace +cix ws platform +cix ws platform describe --json + +# List repos in a workspace +cix ws platform list +cix ws platform repos --verbose + +# Search +cix ws platform search "authentication middleware" +cix ws platform search "JWT validation" --top-projects 8 --top-chunks 30 +cix ws platform search "feature flag rollout" --json +``` + +CLI flags: + +- `--top-projects N` — surface up to N projects in the panel (1–50, + default 10). +- `--top-chunks K` — return up to K chunks total (1–200, default 20). + Round-robin interleaved across surviving projects. +- `--json` — raw response, machine-readable. 
+- `-v` / `--verbose` — extra columns in the human-readable output. + +### REST API + +```bash +curl -G -H "Authorization: Bearer $CIX_API_KEY" \ + --data-urlencode "q=cross-project query here" \ + --data-urlencode "top_projects=10" \ + --data-urlencode "top_chunks=20" \ + --data-urlencode "min_score=0.4" \ + http://localhost:21847/api/v1/workspaces//search +``` + +Response shape (abbreviated): + +```jsonc +{ + "status": "ok", // "ok" | "empty" | "partial_failure" + "projects": [ + { + "project_path": "github.com/acme/api-server@main", + "label": "api-server@main", + "project_score": 0.87, // blended candidacy [0,1] + "bm25_score": 6.42, // per-signal aggregate (raw) + "dense_score": 0.54, + "num_hits": 5 + }, + ... + ], + "chunks": [ + { + "project_path": "github.com/acme/api-server@main", + "file_path": "internal/auth/middleware.go", + "start_line": 42, + "end_line": 58, + "symbol_name": "RequireAuth", + "language": "go", + "score": 0.61, // cosine; 0 for BM25-only matches + "content": "..." + }, + ... + ], + "pending_repos": [...], // repos still cloning / indexing + "failed_repos": [...], // repos that errored out + "stale_fts_repos": [...] // pre-FTS-mirror repos — reindex +} +``` + +--- + +## Search algorithm + +Workspace search is **hybrid BM25 + dense**, fanned out per-project and +fused across projects. The full implementation lives in +[`server/internal/httpapi/workspacesearch.go`](server/internal/httpapi/workspacesearch.go). 
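
To make the per-project fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion with the documented `k = 60`. It is an illustration under stated assumptions, not the code in `workspacesearch.go`: the function name `rrf_fuse`, 1-based ranks, and the tie-break by chunk ID are all choices made for this example.

```python
from collections import defaultdict

RRF_K = 60  # same value as the documented rrfK constant


def rrf_fuse(dense_ids, bm25_ids, k=RRF_K):
    """Fuse two best-first lists of chunk IDs with Reciprocal Rank Fusion.

    A chunk earns 1 / (k + rank) from each list it appears in (rank starts
    at 1); the per-list contributions are summed.
    """
    scores = defaultdict(float)
    for ranking in (dense_ids, bm25_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID (an assumption).
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

Because only ranks enter the formula, raw BM25 values and cosine similarities never need to share a scale. A chunk that appears in both lists outranks one that tops a single list.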
+ +### Pipeline + +``` +query + │ + ├──► EmbedQuery (llama.cpp sidecar) + │ + ▼ +For every indexed workspace repo, in parallel: + │ + ├── dense path: chromem.Search(query_embedding) → top-50 by cosine + ├── sparse path: SQLite FTS5 BM25 over chunks_fts → top-50 by BM25 + │ + ▼ + Per-project RRF fusion (k=60) → single ranked chunk list per project + Per-project signal aggregates: mean of top-5 dense, mean of top-5 BM25 + │ + ▼ +Across-project candidacy: + - Per-signal min-max normalize over all projects + - candidacy = α · bm25_norm + (1 − α) · dense_norm (α = 0.5) + - Drop projects below `0.4 × best_candidacy` (project gate) + │ + ▼ +Build projects panel (top N by candidacy) +Round-robin interleave chunks across surviving projects (per-project cap = 5) +Return projects[] + chunks[] +``` + +### Tunable parameters + +All defaults are calibrated on a 113-query eval (33 identifier + 33 +conceptual + 22 mixed + 25 cross-project). Source: +`workspacesearch.go:43-52`. + +| Constant | Value | Meaning | +|---|---|---| +| `workspaceSearchPerProjectLimit` | 50 | Per-side retrieval depth per project | +| `workspaceSearchBM25Limit` | 50 | Same for BM25 side | +| `workspaceSearchTopNPerProject` | 5 | Top-N hits feeding per-signal aggregate | +| `workspaceSearchTopProjects` | 10 | Default panel size (1–50 via param) | +| `workspaceSearchPerProjChunkCap` | 5 | Max chunks from one project in chunks[] | +| `workspaceSearchAlpha` | 0.5 | BM25 weight in candidacy blend | +| `workspaceSearchProjThreshold` | 0.4 | Relative gate: candidacy ≥ best × 0.4 | +| `rrfK` | 60 | Reciprocal Rank Fusion constant (Cormack 2009) | + +Request-time: + +| Param | Default | Range | +|---|---|---| +| `top_projects` | 10 | 1–50 | +| `top_chunks` | 20 | 1–200 | +| `min_score` | **0.4** | 0–1 | + +### `min_score` semantics + +- **Default `0.4`** — matches per-project search default. Filters + weak-cosine chunks before they enter the per-signal aggregate. 
+ Calibrated on the eval: 91–99% of false positives are eliminated by + this floor. +- **Pass `0` explicitly** for intentional cross-workspace sweeps where + long-tail recall matters more than precision (broad queries like + "authentication and authorization" that legitimately span many repos). +- **Higher values (0.5+)** when you want laser-focused results and can + tolerate occasional recall misses. + +### Why hybrid + +The pre-hybrid algorithm (pure dense fan-out) had a known failure mode: +chromem always returns the nearest K vectors regardless of how far +"nearest" actually is. A workspace with repos that share zero +vocabulary with the query still surfaced 50 chunks per repo at +noise-level cosine. BM25 fixes this: a repo that scores 0 on the +literal token side gets caught by the relative project gate even if +dense is mildly positive. + +The equal-weight blend (α = 0.5) was tuned on the eval to balance two +opposite failure modes: pure BM25 over-favors literal-token matches +(misses semantic similarity); pure dense over-favors common-domain +vocabulary (false friends across unrelated repos). + +### Pre-FTS repos + +If a workspace repo was indexed before the chunks_fts mirror existed, +BM25 will be permanently 0 for it. The response surfaces this via +`stale_fts_repos: [{project_path: "..."}]`. Run a reindex on each: + +```bash +curl -X POST http://localhost:21847/api/v1/workspaces//repos//reindex \ + -H "Authorization: Bearer $CIX_API_KEY" +``` + +--- + +## Webhooks (auto-reindex on push) + +Each workspace repo has a `webhook_mode`: + +- **`disabled`** — server never reindexes automatically. Reindexing happens + only when triggered via the API. +- **`manual`** — server generates a `webhook_secret` and exposes a + delivery URL. Configure the webhook on GitHub yourself. Pushes to + the tracked branch trigger reindex. +- **`auto`** — server calls the GitHub API to **create the webhook** + on the repo using your stored PAT. 
Requires `CIX_PUBLIC_URL` set + and a token with `admin:repo_hook` scope. Best-effort: failure to + register doesn't block the add-repo flow but sets a warning. + +### Delivery endpoint + +``` +POST /api/v1/webhooks/github/{repo_id} +``` + +GitHub's payload is HMAC-SHA256-signed with `webhook_secret`; the +server verifies via the `X-Hub-Signature-256` header. + +Event handling: + +| Event | Action | +|---|---| +| `push` (ref matches tracked branch) | Enqueue clone+index (dedupes burst pushes) | +| `push` (other ref) | Ignored | +| `ping` | 200 with no-op | +| Anything else | Ignored | + +### Manual webhook setup + +When `webhook_mode=manual`, the add-repo response includes a +`webhook_secret` (returned once) and a `webhook_url` (always +returnable). Configure on GitHub: + +- **Payload URL:** `/api/v1/webhooks/github/` +- **Content type:** `application/json` +- **Secret:** the returned `webhook_secret` +- **Events:** Just `push` (the server ignores everything else) + +--- + +## Strengths and weaknesses + +Honest assessment from the calibration eval and 5 real engineering tasks. + +### Strengths + +- **Hybrid signal is robust.** Identifier-style queries hit ~91% top-1 + precision; conceptual queries ~70%; mixed (identifier + concept) + ~96%. BM25 catches what dense misses and vice versa. +- **Project gate works.** The relative `0.4 × best` threshold filtered + zero true positives across 88 single-target queries — projects with + no shared vocabulary AND marginal semantic similarity fall out + cleanly. +- **Round-robin + chunk cap prevents domination.** No single repo + monopolizes the chunks panel even when its scores dwarf the rest. +- **Per-project drill-down is fast.** Once you know which repo to dig + into, switching to a per-project search returns deeper, file-grouped + results without the cross-project interleave overhead. +- **Index is incremental.** Webhook-driven re-indexes only re-embed + changed files (SHA-256 file hashes). 
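
The project gate and the round-robin chunk cap credited in the strengths above can be sketched as follows. This is a hedged illustration using the documented defaults (α = 0.5, gate at `0.4 × best`, per-project cap of 5, `top_chunks` of 20). The helper names, the tuple layout, and the handling of a constant signal (mapped to all zeros) are assumptions made for this example, not the server's implementation.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def gate_projects(projects, alpha=0.5, threshold=0.4):
    """Blend normalized signals and drop projects below threshold * best.

    `projects` is a list of (name, bm25_score, dense_score) tuples.
    Returns (name, candidacy) pairs, best first.
    """
    if not projects:
        return []
    bm25 = min_max([p[1] for p in projects])
    dense = min_max([p[2] for p in projects])
    scored = [
        (p[0], alpha * b + (1 - alpha) * d)
        for p, b, d in zip(projects, bm25, dense)
    ]
    best = max(c for _, c in scored)
    kept = [(n, c) for n, c in scored if c >= threshold * best]
    return sorted(kept, key=lambda nc: -nc[1])


def round_robin(chunks_by_project, order, per_project_cap=5, top_chunks=20):
    """Take one chunk per project per pass, in `order`, up to both caps."""
    out = []
    for i in range(per_project_cap):
        for name in order:
            chunks = chunks_by_project.get(name, [])
            if i < len(chunks):
                out.append(chunks[i])
                if len(out) == top_chunks:
                    return out
    return out
```

Since the gate is relative to the best project, a weak query still returns its strongest repo. The interleave emits every surviving project's first chunk before any project's second, which is why a single high-scoring repo cannot fill the chunks panel.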
+ +### Weaknesses + +- **Top-1 is ~70% on real tasks, not 91%.** The synthetic eval framed + each query as single-target by construction. Real tasks often have a + "right repo for the words" that's different from the "right repo for + the change". When the task is action-oriented (modify / configure / + deploy), the manifests / config / contracts repo is often the right + target even when a code repo ranks higher. +- **Token-frequency bias.** A repo that mentions your query terms + dozens of times (in comments, tests, fixtures) outranks a repo + where the canonical implementation lives but uses different + vocabulary. The path-aware preamble in indexing helps, but doesn't + fully neutralize this. +- **`chunk.score=0` is misleading.** Chunks matched via BM25 only + (no dense overlap) report `score=0` in the response. Agents and + UIs that read the score field as "confidence" can wrongly discard + valid literal-match hits. Inspect `projects[].bm25_score` to know + whether the project survived on literal-token strength. +- **Disambiguator backfire.** Adding a third query word to discriminate + between overloaded terms can rotate the ranking away from the right + answer if the added word belongs to the wrong stack (e.g. naming a + protocol your target repo doesn't use). Prefer meta-tokens + (`endpoint`, `route`, `handler`, `manifest`) over guesses at the + underlying technology. +- **Index gaps.** The indexer skips files outside its language + allow-list and files larger than `CIX_MAX_FILE_SIZE`. If your + workspace contains a language the indexer doesn't recognize, those + files contribute zero to either BM25 or dense — search will look + past that repo entirely. Check `cix summary` per repo if results + look thin. +- **No multi-tenancy.** Anyone authenticated can read every workspace + in the deployment. Don't share a single cix-server across teams who + shouldn't see each other's code. 
+- **Stale-FTS repos.** Workspaces created before the FTS5 mirror + existed don't have BM25 data. The response flags them in + `stale_fts_repos`; you must reindex to repair. +- **Clone storage grows.** Each workspace repo is a full `git clone`. + Plan for several hundred MB to several GB per repo depending on + history size. `git gc` is not automated; the cleanest reset is + remove + re-add. + +### When to use vs when not to + +Use a workspace when: + +- The task plausibly spans 2+ repos and you need to know which. +- You want an agent to find cross-cutting code (e.g. event flows, + API contracts mirrored across services) without N separate searches. +- You're onboarding to an unfamiliar codebase and need to see what's + where before diving in. + +Don't use a workspace when: + +- The task is fully contained in one repo. `cix search` from inside + that repo is faster and more precise. +- You're looking for an exact symbol or file path. `cix definitions + ` or `cix files ` against the project directly skips + the cross-project interleave. +- The repos truly share no vocabulary. The project gate will collapse + the response to one repo anyway — search there directly. + +--- + +## Configuration reference + +### Server environment variables + +| Variable | Default | Description | +|---|---|---| +| `CIX_WORKSPACES_ENABLED` | `false` | **Required** to enable the feature. Restart after change. | +| `CIX_WORKSPACES_DATA_DIR` | `/repos` | Where workspace repo clones live on disk. | +| `CIX_PUBLIC_URL` | — | Public origin used to build webhook delivery URLs. Required for `webhook_mode=auto`. | +| `CIX_SECRET_KEY` | — | 32-byte hex/base64 key for at-rest encryption of GitHub tokens. Falls back to a keyfile or auto-generated key. | +| `CIX_SECRET_KEYFILE` | — | Path to an alternative key source (file with mode ≤ 0600). | +| `CIX_WORKER_CONCURRENCY` | `2` | Parallelism for clone + index workers. 
| + +### CLI configuration + +`cix ws` reuses the standard `~/.cix/config.yaml` — no extra setup +needed beyond `api.url` and `api.key`. + +--- + +## REST API reference + +All endpoints require `Authorization: Bearer ` or a valid cookie +session. All return `503` if `CIX_WORKSPACES_ENABLED=false`. + +### Workspaces + +``` +GET /api/v1/workspaces list +POST /api/v1/workspaces create (body: {name, description}) +GET /api/v1/workspaces/{id} detail +PATCH /api/v1/workspaces/{id} rename / update description +DELETE /api/v1/workspaces/{id} remove (cascades to repos + clones) +``` + +### Workspace repos + +``` +GET /api/v1/workspaces/{id}/repos list +POST /api/v1/workspaces/{id}/repos add (clones + indexes) +POST /api/v1/workspaces/{id}/repos/link link existing local project +DELETE /api/v1/workspaces/{id}/repos/{repo_id} remove +POST /api/v1/workspaces/{id}/repos/{repo_id}/reindex trigger fresh index +GET /api/v1/workspaces/{id}/repos/{repo_id}/webhook-info dashboard helper +``` + +### Workspace search + +``` +GET /api/v1/workspaces/{id}/search?q=...&top_projects=10&top_chunks=20&min_score=0.4 +``` + +See [§ Searching a workspace](#searching-a-workspace) for response shape. + +### GitHub tokens + +``` +GET /api/v1/github-tokens list (metadata only) +POST /api/v1/github-tokens create (returns plaintext once) +GET /api/v1/github-tokens/{id}/accounts PAT owner + orgs +GET /api/v1/github-tokens/{id}/repos?account=... repos visible to PAT +DELETE /api/v1/github-tokens/{id} revoke (server-side only) +``` + +### Webhooks + +``` +POST /api/v1/webhooks/github/{repo_id} GitHub delivery endpoint +``` + +Full OpenAPI: `doc/openapi.yaml` and `http://:21847/docs`. + +--- + +## Troubleshooting + +**`503 workspaces feature is disabled`** +→ `CIX_WORKSPACES_ENABLED=true` is missing or the server hasn't been +restarted. + +**`status: "failed"` on a repo, `last_error: "authentication required"`** +→ Private repo with no token, or token's scopes are insufficient. 
+Re-create the token with `repo` scope (and `admin:repo_hook` if you +want auto webhooks), then retry by deleting and re-adding the repo. + +**`status: "failed"`, `last_error: "branch not found"`** +→ Typo or the branch was deleted upstream. Delete the repo entry and +re-add with the correct branch. + +**Search returns `empty` for a query that should match** +→ Three likely causes: +1. Default `min_score=0.4` filtered everything. Retry with `min_score=0`. +2. Repo is still indexing (`status: pending|cloning|indexing`). Check + `GET /workspaces/{id}/repos`. +3. The literal terms genuinely don't appear in any repo AND dense + similarity is below threshold. Re-phrase with the term the code + actually uses. + +**`stale_fts_repos` populated on every search** +→ These repos were indexed pre-FTS5 mirror. Run +`POST /workspaces/{id}/repos/{repo_id}/reindex` on each. + +**`status: "partial_failure"`** +→ At least one repo's dense search errored (corrupt chromem collection, +disk pressure). Other repos still returned. Check server logs; the +fastest fix is usually a reindex of the failed repo. + +**Webhook isn't triggering reindex** +→ Verify: +1. GitHub's webhook deliveries page shows 200 OK. +2. Push was to the *tracked* branch (the one in `workspace_repos.branch`). +3. Server logs show signature verification succeeding. +4. `CIX_PUBLIC_URL` is set and reachable from GitHub (for `auto` mode). + +**Token gone after a restart, all repos failing** +→ The encryption key resolved differently. Common cause: switched from +auto-generated `/.secret_key` to `CIX_SECRET_KEY` env var +without copying the original key value. Either restore the key or +re-create every token entry. + +--- + +## Agent integration + +The `cix-workspace` skill teaches AI agents how to use workspace search +responsibly — when to fan out, how to read the `projects[]` panel, how +to interpret `score=0` hits, how to spawn parallel per-repo investigators. 
+See [`skills/cix-workspace/SKILL.md`](skills/cix-workspace/SKILL.md). + +Install for Claude Code: + +```bash +cp -r skills/cix-workspace ~/.claude/skills/cix-workspace +mkdir -p ~/.claude/agents +cp skills/cix-workspace/agents/cix-workspace-investigator.md ~/.claude/agents/ +``` + +Then in a Claude Code session: + +``` +/cix-workspace add a new rate-limit middleware and wire it through +the gateway, the backend, and the deployment manifests +``` + +The skill loads the cross-project workflow, the agent runs workspace +search, identifies the relevant repos, and spawns +`cix-workspace-investigator` sub-agents in parallel — one per repo — +to do the deep dive without bloating the main session's context. + +--- + +## Roadmap + +This feature is experimental. Known direction: + +- **Multi-tenancy / workspace ACLs.** Today any authenticated user sees + every workspace. Per-workspace owner + reader roles are planned. +- **`project_kind` enum in `projects[]`.** Surface whether each project + is `code` / `manifests` / `contracts` / `docs` so agents can reason + about the "words vs change location" mismatch noted above. +- **Auto-detect stale indexes.** Today reindex is manual; the server + should detect when a repo's vectors are incompatible with the current + embedding model and prompt automatically. +- **Broader language coverage in the indexer.** Expand the + `CIX_LANGUAGES` allow-list to cover more domain-specific file types, + and raise the file-size cap for prose-heavy docs. + +Track open issues at [github.com/dvcdsys/code-index/issues](https://github.com/dvcdsys/code-index/issues).