feat(mcp): first-class metric source support#2437
Conversation
🦋 Changeset detectedLatest commit: 30812d1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR promotes metric sources to first-class status in the MCP layer: two new tools (
Confidence Score: 4/5Safe to merge for all query and describe paths; the list_metrics tool can hang indefinitely on slow ClickHouse tables due to a missing timeout wrapper. The new packages/api/src/mcp/tools/sources/listMetrics.ts — the tool handler and fetchMetricsForKind both need timeout and abort-signal handling to match the pattern established by describeMetric.ts and describeSource.ts. Important Files Changed
Sequence Diagram%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Agent
participant MCP
participant CH as ClickHouse
Agent->>MCP: clickstack_list_sources
MCP-->>Agent: sourceId + kind:"metric"
Agent->>MCP: clickstack_describe_source(sourceId)
MCP->>CH: getColumns(representativeMetricTable)
MCP->>CH: getMapKeys / getAllKeyValues / sampleMetricNames (per kind, parallel)
CH-->>MCP: columns, map keys, metric name samples
MCP-->>Agent: "schema + metricNames{gauge,sum,histogram}"
opt Need more metrics
Agent->>MCP: clickstack_list_metrics(sourceId, kind?, namePattern?, cursor?)
MCP->>CH: SELECT MetricName GROUP BY ORDER BY LIMIT (per kind)
CH-->>MCP: rows
MCP-->>Agent: metrics[] + optional nextCursor
end
Agent->>MCP: clickstack_describe_metric(sourceId, metricName, kind)
MCP->>CH: fetchAttributeKeys (CTE LIMIT 100k + max_execution_time:8)
MCP->>CH: sampleAttributeValues (CTE LIMIT 100k + max_execution_time:8)
CH-->>MCP: attribute keys + sampled values
MCP-->>Agent: "kindDetail{attributeKeys, attributeValues, usage}"
Agent->>MCP: "clickstack_timeseries(sourceId, select:[{metricType,metricName,aggFn}])"
MCP->>MCP: validateMetricSelectItems + applyMetricSelectDefaults
MCP->>MCP: assertSourceKindMatchesSelect
MCP->>CH: renderChartConfig → SQL with metricTables
CH-->>MCP: timeseries rows
MCP-->>Agent: result + optional increase top-N hint
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant Agent
participant MCP
participant CH as ClickHouse
Agent->>MCP: clickstack_list_sources
MCP-->>Agent: sourceId + kind:"metric"
Agent->>MCP: clickstack_describe_source(sourceId)
MCP->>CH: getColumns(representativeMetricTable)
MCP->>CH: getMapKeys / getAllKeyValues / sampleMetricNames (per kind, parallel)
CH-->>MCP: columns, map keys, metric name samples
MCP-->>Agent: "schema + metricNames{gauge,sum,histogram}"
opt Need more metrics
Agent->>MCP: clickstack_list_metrics(sourceId, kind?, namePattern?, cursor?)
MCP->>CH: SELECT MetricName GROUP BY ORDER BY LIMIT (per kind)
CH-->>MCP: rows
MCP-->>Agent: metrics[] + optional nextCursor
end
Agent->>MCP: clickstack_describe_metric(sourceId, metricName, kind)
MCP->>CH: fetchAttributeKeys (CTE LIMIT 100k + max_execution_time:8)
MCP->>CH: sampleAttributeValues (CTE LIMIT 100k + max_execution_time:8)
CH-->>MCP: attribute keys + sampled values
MCP-->>Agent: "kindDetail{attributeKeys, attributeValues, usage}"
Agent->>MCP: "clickstack_timeseries(sourceId, select:[{metricType,metricName,aggFn}])"
MCP->>MCP: validateMetricSelectItems + applyMetricSelectDefaults
MCP->>MCP: assertSourceKindMatchesSelect
MCP->>CH: renderChartConfig → SQL with metricTables
CH-->>MCP: timeseries rows
MCP-->>Agent: result + optional increase top-N hint
Reviews (11): Last reviewed commit: "[HDX-4347] fix(mcp): base timeseries hin..." | Re-trigger Greptile |
91f17a6 to
6f63d7f
Compare
E2E Test Results✅ All tests passed • 200 passed • 3 skipped • 1358s
Tests ran across 4 shards in parallel. |
| ); | ||
| throw e; | ||
| } finally { | ||
| if (timeoutId !== undefined) clearTimeout(timeoutId); |
There was a problem hiding this comment.
Drive-by fix: this setTimeout-never-cleared pattern is pre-existing and was not flagged by greptile (the surrounding code is untouched by HDX-4347). Fixing here for consistency with the same fix in describeMetric.ts two files over — both files now clearTimeout in finally so a stale controller.abort() will no longer fire DESCRIBE_TIMEOUT_MS after every successful describe call, and the setTimeout closure is released immediately on resolution.
31bc21b to
e27d683
Compare
…AttributeKeys
Greptile pointed out that sampleAttributeValues hit the same
(MetricName, time range) filter on the same metric table as
fetchAttributeKeys but lacked any of the scan bounds the earlier
commit added there:
- no inner LIMIT subquery
- no max_execution_time / timeout_overflow_mode
On a hot metric with millions of matching rows in the default 24-hour
window the query could run until the client-side AbortController
fires the wall-clock timeout, after which the ClickHouse server may
still be processing the work.
Apply the same pattern used in fetchAttributeKeys:
- Wrap the FROM/WHERE clause in a subquery that LIMITs at
METRIC_ATTR_KEYS_SAMPLE_SIZE matching rows. The inner SELECT
projects only the distinct set of map columns referenced by the
sampled keys.
- Add clickhouse_settings: { max_execution_time:
METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, timeout_overflow_mode: 'break' }
as a server-side ceiling.
Live retest against the metric source on staging confirms the
attributeValues payload still comes back rich (same k8s.* diversity
on container.cpu.usage) — the 100k-row sample is plenty for value
discovery.
Refs: #2437 (comment)
🔴 Tier 4 — CriticalTouches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD. Why this tier:
Review process: Deep review from a domain expert. Synchronous walkthrough may be required. Stats
|
| return await Source.findOne({ _id: sourceId, team }); | ||
| } catch { | ||
| // Defense-in-depth: if Mongoose still throws (e.g. a future cast | ||
| // path), treat it as "not found" so the MCP tools surface a clean |
There was a problem hiding this comment.
docs(non-blocking): we should make this comment more generic since this controller is used across the app (eg so the caller can surface a clean error)
There was a problem hiding this comment.
Done in 0c2e0d9 — reworded both inline comments in getSource to caller-neutral language ("the caller can surface a clean error") since the controller is shared across the REST routers, external-api v2, and MCP tools.
| @@ -92,14 +109,28 @@ | |||
| } | |||
| const { startDate, endDate } = timeRange; | |||
|
|
|||
There was a problem hiding this comment.
issue(existing, non-blocking): Not related to this PR, but during testing this the top reason that LLM went from using timeseries to raw sql was the following:
timeseries.ts threads granularity into the tile config (lines 135–142), but buildTile in
helpers.ts (lines 187–202) parses the input through externalDashboardTileSchemaWithId. The
line/stacked_bar sub-schema in zod.ts (lines 246–261) has no granularity field, so Zod
silently strips it.
// packages/api/src/utils/zod.ts:246-261
// ❌ granularity is not declared — Zod strips any extra fields
const lineTileSchema = z.object({
type: z.literal('line'),
// ... other fields, but no granularity
});Downstream, convertToInternalTileConfig never re-injects it — contrast with the REST charts path,
which explicitly defaults it:
// packages/api/src/routes/charts.ts:289
granularity: translatedGranularity ?? 'auto',Impact
The renderer only includes the time bucket in SELECT/GROUP BY/ORDER BY when
chartConfig.granularity != null:
// packages/api/src/renderChartConfig.ts:109-128
if (chartConfig.granularity != null) {
// adds __hdx_time_bucket to SELECT, GROUP BY, ORDER BY
}Because granularity was stripped, every MCP timeseries call for gauge/sum metrics (and
log/trace sources) returns a single collapsed row per group — no timestamp column at all —
regardless of what the agent passed.
The gauge/sum translation does bucket the inner CTE with granularity || 'auto', but then
spreads ...restChartConfig for the outer query, so the outer aggregation collapses. Histogram
only escapes this by hard-coding __hdx_time_bucket directly (line 2106).
There was a problem hiding this comment.
Fixed in ebc8b48. Great catch — this was the root cause.
granularity was indeed stripped: buildTile parses through externalDashboardTileSchemaWithId, whose line/stacked_bar schemas declare no granularity field, and convertToInternalTileConfig never re-injected it. So the renderer's isUsingGranularity guard failed and __hdx_time_bucket was omitted — every MCP timeseries collapsed to one row per group.
Rather than widen the shared external-dashboard schema, I re-inject granularity in runConfigTile for line/stacked_bar tiles, defaulting to "auto" to mirror the REST charts path (charts.ts:289). The tool input value is now threaded through. The gauge timeseries test asserts __hdx_time_bucket is present in result.meta as a regression guard.
| // 1 row, the data likely collapsed into a single time bucket. Add a | ||
| // hint so the agent knows to adjust the time range. | ||
| if ( | ||
| result.content?.[0]?.type === 'text' && |
There was a problem hiding this comment.
issue(existing, non-blocking): Another existing but more clear one now with metrics:
Bug: Hint Heuristics Fire on Wrong Signal, Recommend Wrong Fix, and Clobber Each Other
1. Single-Bucket Hint Triggers on the Wrong Signal
The "only 1 time bucket" hint in timeseries.ts:170-189 triggers on data.length === 1, which conflates three distinct cases:
- Genuine single-bucket collapse (granularity too coarse)
- A single group when
groupByis set (working as intended) - The no-time-column collapse from previous comment — where the advice to widen range or use coarser granularity can't help and "coarser" is actually backwards
A more robust signal would be to inspect result.meta for the presence of the __hdx_time_bucket column rather than counting rows. This distinguishes the no-time-column case (issue #1) from a genuine single-bucket result, and avoids misfiring when groupBy produces exactly one group.
There was a problem hiding this comment.
Fixed in 5f28571, taking your suggestion to key on result.meta instead of the row count.
The hint now inspects whether the __hdx_time_bucket column (FIXED_TIME_BUCKET_EXPR_ALIAS) is present:
- absent -> "result is not bucketed over time; pass a granularity" (drops the backwards "coarser granularity" advice)
- present + exactly 1 row -> genuine single-bucket collapse hint
- present + many rows -> no hint
Combined with the granularity fix in ebc8b48 (granularity now defaults to auto), the no-bucket case should essentially not happen for timeseries anymore, but the meta-based signal is the robust guard and no longer misfires on a legitimate single groupBy group.
| `aggFn:"increase" combined with groupBy is capped at the top ${INCREASE_TOP_N_CAP} ` + | ||
| `groups by max bucket sum at the renderer layer. ` + | ||
| `Results may not include every group present in the underlying data.`; | ||
| first.text = JSON.stringify(parsed, null, 2); |
There was a problem hiding this comment.
suggestion: annotateIncreaseTopNHint in helpers.ts:99-121 emits a "may be truncated" disclaimer whenever increase + groupBy returns any data — even with fewer than 20 groups — making it a false alarm rather than an actual detection.
Additionally, both this function and the single-bucket hint in timeseries.ts assign to parsed.hint directly. Since the single-bucket hint runs first (timeseries.ts:165) and annotateIncreaseTopNHint runs after (timeseries.ts:178), one silently overwrites the other with no indication either existed.
Suggested fix: change hint to hints: string[] and append at each site. Gate the truncation hint on an actual group count threshold (e.g. >= 20) rather than emitting unconditionally.
There was a problem hiding this comment.
Fixed in 4b4b40c.
- False alarm:
annotateIncreaseTopNHintnow counts the distinct groupBy combinations actually present in the result (countDistinctGroups, bracket-aware split so multi-column groupBy works) and only emits the truncation hint when that count reachesINCREASE_TOP_N_CAP. When the groupBy column cannot be resolved from the row keys (e.g. bracket-syntax map attrs that render asarrayElement(...)aliases) it skips the hint rather than crying wolf. - Clobbering: migrated
hint: string→hints: string[]across all three writers (this function, the single-bucket hint intimeseries.ts, and the empty-result hint informatQueryResult) via a sharedappendHint()helper, so they accumulate instead of overwriting.
Added unit tests for: at/below cap, distinct-groups-vs-raw-rows, multi-column groupBy, unresolvable groupBy, and append-not-overwrite coexistence.
| { tableName, error: e instanceof Error ? e.message : String(e) }, | ||
| 'sampleAttributeValues failed', | ||
| ); | ||
| return {}; |
There was a problem hiding this comment.
suggestion: Failures are silently discarded in several places — describeMetric.ts:301-307, describeMetric.ts:412-418, and per-kind skips in listMetrics.ts:385-395 — causing transient ClickHouse errors to surface as the noSignal hint (describeMetric.ts:584-604):
"No data found — try widening startTime/endTime"
Agents cannot distinguish "genuinely empty" from "fetch failed" and will follow incorrect recovery steps as a result.
Suggested fix: instead of swallowing errors into empty returns, surface a partialFailure indicator in the response shape — recording which kinds failed and why. This gives agents a recoverable, accurate signal ("some metric kinds failed") rather than a misleading empty-data path.
There was a problem hiding this comment.
Fixed in b41d623.
fetchAttributeKeys / sampleAttributeValues now return a typed { ok: true, data } | { ok: false, error } instead of swallowing errors into {}. The errors are sanitized (single line, capped at 200 chars).
describe_metriccollects failures intopartialFailure: [{ stage, error }]on the response and suppresses thenoSignal"No data found — widen startTime/endTime" hint when any stage failed (an emptyattributeKeyscan no longer be trusted as "genuinely empty"). A retry-oriented hint is emitted instead.list_metricscollects per-kind failures intopartialFailure: [{ kind, error }]and likewise suppresses the "No metrics matched" hint when failures explain the empty result.
Tests point a kind's metricTables entry at the logs table (exists, but no MetricName/TimeUnix columns) to force the discovery query to throw, then assert partialFailure is present and the misleading widen-the-window hint is absent. A happy-path test asserts partialFailure stays undefined.
| connectionId, | ||
| clickhouse_settings: { | ||
| max_execution_time: METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, | ||
| timeout_overflow_mode: 'break', |
There was a problem hiding this comment.
suggestion: Callers can't distinguish "not sampled" from "no values". A per-key sampled: boolean/truncated: boolean would allow agents to know to adjust their query
There was a problem hiding this comment.
Fixed in 3b0b570.
Each kindDetail now carries attributeValuesMeta: { sampledKeys, truncatedKeys }:
sampledKeys— display names actually queried for valuestruncatedKeys— display names skipped because theMAX_ATTR_KEYS_TO_SAMPLE(12) cap fired
So a key in truncatedKeys was never queried (query it directly if needed), whereas a key in sampledKeys but missing from attributeValues is genuinely empty in the window. The tool description documents the distinction.
Tests: a 15-key metric → 12 sampled + 3 truncated (disjoint sets covering all keys); a small metric → truncatedKeys empty.
| @@ -92,14 +109,28 @@ | |||
| } | |||
| const { startDate, endDate } = timeRange; | |||
There was a problem hiding this comment.
idea(non-blocking): probably OOS but I waned a better historic graph
It would be helpful if the timeseries tool returned per-bucket rows with a timestamp column for metric sources using increase + groupBy, rather than collapsing to per-group aggregates with no ts column — that way the bucket breakdown is actually visible in the output.
There was a problem hiding this comment.
Deferring this as a follow-up since it's flagged OOS and is an output-shape enhancement rather than a correctness fix.
That said, the granularity fix in ebc8b48 already improves it: now that granularity is threaded through, clickstack_timeseries for increase + groupBy emits the __hdx_time_bucket column and returns per-bucket rows over time (previously it collapsed to one row per group). So the "better historic graph" is largely delivered via the timeseries tool for metric sources. I'll capture any remaining per-bucket output-shape polish in a follow-up.
|
Overall - metrics support works very good! It fell back to sql a few times, everytime it did I asked my agent to explain why (findings above). I tested it on my local instance and a larger internal instance with tons of metrics, it worked 🥳 |
getSource is called from REST routers, external-api v2, and MCP tools alike — the inline comments referenced only the MCP surface. Reword to caller-neutral language per review feedback. Refs: #2437 (comment)
…not row count The single-bucket hint keyed on data.length === 1, conflating three cases: genuine coarse-granularity collapse, a legitimate single group under groupBy, and the no-time-bucket case — where 'try a coarser granularity' is actively backwards advice. Key the hint on the renderer's FIXED_TIME_BUCKET_EXPR_ALIAS (__hdx_time_bucket) column in result.meta instead: - bucket column absent -> 'result is not bucketed over time; pass a granularity' (no coarser-granularity advice). - bucket column present + exactly 1 row -> genuine single-bucket collapse hint (widen range / finer granularity). - bucket column present + many rows -> no hint. Emitted via appendHint so it coexists with the increase top-N hint. After the granularity-threading fix this rarely fires for timeseries (granularity now defaults to auto), but the meta-based signal is the robust guard. Test: gauge timeseries asserts the 'not bucketed over time' hint never fires when the bucket column is present. Refs: #2437 (comment)
…emas Extend the MCP select-item schemas with first-class metric fields so the clickstack_timeseries, clickstack_table, and dashboard builder tile tools can accept metric-source queries: - Add 'increase' to MCP_AGG_FN_OPTIONS (sum-only counter-increase aggFn) - Add optional metricType (gauge/sum/histogram), metricName, and isDelta fields to mcpSelectItemSchema and mcpTileSelectItemSchema - Restrict metricType to the three kinds the renderer supports today; summary and exponential histogram are excluded - Extract cross-field validation into getMetricSelectIssues so both schemas share the same metric rules: metricType<->metricName paired, 'increase' is sum-only, histogram only allows quantile (with level) or count, isDelta is gauge-only, valueExpression is required for non-count aggFns unless metricType is set (defaults to 'Value' for metric sources) - Apply validation imperatively in the timeseries/table handlers via validateMetricSelectItems (works around a Zod 3.x quirk where .superRefine widens optional-field inference and breaks the strict consumer types of mergeWhereIntoSelectItems and resolveOrderBy) - Update timeseries/table tool descriptions with a METRIC SOURCES subsection covering the discovery workflow chain and per-kind aggFn guidance, plus the 20-group top-N cap on sum+increase+groupBy - Exclude 'increase' from AGG_FN_NAMES so resolveOrderBy doesn't try to synthesize a non-existent SQL increase() function Adds 20 unit tests covering the validator branches and the resolveOrderBy handling of 'increase'. No new TypeScript errors above the pre-existing baseline on main.
…er path Make clickstack_timeseries and clickstack_table actually work against metric sources. Before this change the builder branch of runConfigTile fell back to the source's empty tableName and never threaded source.metricTables onto the chart config, so any metric query failed with 'Both table name and UUID are empty' at render time. For metric sources, the builder branch now: - Sets from.tableName = '' (renderer picks the per-kind ClickHouse table from metricTables at render time) - Adds metricTables: source.metricTables to the chart config (mirrors packages/api/src/routers/external-api/v2/charts.ts:261-267) - Defaults each select item's valueExpression to 'Value' when missing, matching the agent-friendly external-API behavior so metric queries can omit valueExpression and have it resolve correctly Also surfaces the renderer's increase+groupBy top-N cap (20 groups) to agents: - Adds annotateIncreaseTopNHint helper that attaches a neutral hint to successful, non-empty tool results when any select item uses aggFn:'increase' alongside a non-empty groupBy - Exposes INCREASE_TOP_N_CAP=20 as a named constant referencing the renderer's INCREASE_MAX_NUM_GROUPS in common-utils - Wires the hint into both clickstack_timeseries and clickstack_table Adds 9 unit tests covering the hint helper across the increase+groupBy condition, empty results, error results, missing groupBy, missing increase aggFn, and malformed/missing content envelopes. No new TypeScript errors above the pre-existing baseline on main.
Replace the early-return for metric sources at describeSource.ts:97-115 with a metric-aware discovery flow so agents can use the existing describe-then-query workflow against metric sources without falling back to clickstack_sql. For metric sources, describe_source now: - Picks a representative metric table (gauge -> sum -> histogram) via pickRepresentativeMetricTable, exposed in meta.discoveryMetricKind so agents know which kind's schema they're seeing - Reuses the existing stages 1-4 (columns, map keys, low-cardinality values, map-attribute values) against the representative table — the OTel Collector metric schemas share Attributes/ResourceAttributes/ ScopeAttributes shape, so one sample is representative enough for the starter view - Samples up to 20 distinct MetricName values per queryable kind in a new stage 5 (sampleMetricNamesForKind), reusing the discovery dateRange and respecting the existing wall-clock timeout / partial-result flag - Defensively enriches each sampled metric with MetricUnit / MetricDescription when those columns exist on the kind's table (getColumns presence check); falls back gracefully on schemas that omit them - Surfaces the new clickstack_list_metrics / clickstack_describe_metric discovery tools via nextSteps.discovery so agents can paginate or drill into per-metric attribute keys when the sample isn't enough Updates the existing locking test in sources.test.ts that asserted columns/lowCardinalityValues/etc. were undefined for metric sources; the test now asserts the rich discovery payload (columns present, discoveryMetricKind set to 'gauge', nextSteps mention the new discovery tools). No new TypeScript errors above the pre-existing baseline on main.
…ribe_metric
Add two new metric-specific discovery tools so agents can find and
characterise metrics on a metric source without falling back to
clickstack_sql or guessing OTel semantic-convention names.
clickstack_list_metrics
- Paginated metric-name catalog scoped to a metric source
- Optional kind filter (gauge/sum/histogram); omit to scan every
populated kind in order
- Optional namePattern (ClickHouse ILIKE) and time window
- Defaults to last 24h to surface only metrics with recent data
- Opaque base64 cursor pagination over (kind, lastName) pairs so
multi-kind scans resume cleanly mid-result
- Default limit 50, max 500
- Defensive MetricUnit/MetricDescription enrichment using getColumns
presence checks (matches the OTel Collector default schema; falls
back gracefully on custom schemas)
- Empty-result hint guides the agent to widen the filter or drop kind
clickstack_describe_metric
- Per-metric drill-down: kind(s), unit, description, attributeKeys per
map column, sampled values (when sampleValues=true, default)
- kind auto-detection across all populated tables when omitted, via
parallel SELECT 1 probes per kind
- Reuses metadata.getAllFields with metricName filter for attribute-key
discovery scoped to the specific metric (existing helper at
packages/common-utils/src/core/metadata.ts:736-816)
- Per-kind usage guidance baked into the response (aggFn hints, level
requirement for histogram quantile, 20-group cap for sum + increase)
- Pre-built clickstack_timeseries call example in nextSteps.query with
the right aggFn / level for the detected kind
- 10s wall-clock timeout matches clickstack_describe_source
Both tools reject non-Metric sources with a friendly hint pointing at
clickstack_list_sources. Tool descriptions document the full discovery
workflow chain: list_sources -> describe_source -> list_metrics ->
describe_metric -> timeseries|table.
Notable implementation detail: the queryable-kind tuple is declared as
plain string literals ('gauge'|'sum'|'histogram') rather than
MetricsDataType enum members because Zod 3.x's z.enum(...) infers an
unknown-widened type when fed an enum-member tuple, which breaks the MCP
SDK callback inference downstream. A compile-time assertion keeps the
string literals in sync with the enum.
No new TypeScript errors above the pre-existing baseline on main.
…idance in prompt
The clickstack_save_dashboard prompt previously told the model to fall
back to raw SQL tiles for any metric source because the MCP select-item
shape did not carry metricType / metricName and builder tiles failed at
render time with 'Both table name and UUID are empty'. Stages 1-3b
fixed both gaps. Update the prompt and its regression tests to teach the
new discovery + builder workflow.
prompts/dashboards/content.ts
- Replace the 'NOTE: Authoring builder tiles on a metric source is not
reliable today...' paragraph with a dedicated '== METRIC SOURCES =='
section that documents:
* the required select-item fields (metricType, metricName) and the
valueExpression='Value' default
* per-kind aggregation guidance for gauge / sum / histogram including
the histogram quantile + level requirement and the 20-group cap on
sum + increase + groupBy
* the five-step discovery workflow (list_sources -> describe_source
-> list_metrics -> describe_metric -> timeseries|table)
* one worked example per supported kind using real OTel metric names
- Drop 'metric tiles with multiple metricTables' and 'builder tiles on
metric sources' from the validation-known-gaps paragraphs at lines
127 and 1247 since both gaps are closed.
__tests__/dashboards/prompts.test.ts
- Rewrite the two regression tests that asserted the old workaround
language. The new tests assert the positive guidance is present
(metricType / metricName / isDelta fields are named, per-kind aggFn
rules and 20-group cap are documented, the five-step workflow chain
is enumerated, one worked example per kind appears) AND assert the
old workaround language is gone (no 'Authoring builder tiles on a
metric source is not reliable', no 'Both table name and UUID are
empty').
…s, cursor Adds unit and integration coverage for the metric pieces shipped in Stages 1-3b. listMetricsCursor.test.ts (NEW, unit) - 12 tests for encodeCursor / decodeCursor round-trips and rejection cases. Exercises malformed base64, malformed JSON, missing fields, wrong types, non-queryable kinds (summary, exponential histogram), empty input, and unicode metric names. Exports cursor helpers from listMetrics.ts as @internal so they can be tested without spinning up ClickHouse. sources.test.ts (integration; expanded) - clickstack_list_metrics: * rejects non-metric sources and unknown source IDs * returns metrics across every populated kind when kind is omitted * narrows to a single kind when kind is set * applies namePattern as a server-side ILIKE filter * paginates via opaque nextCursor and resumes cleanly * rejects malformed cursors with an actionable error * surfaces the empty-result hint - clickstack_describe_metric: * rejects non-metric sources * returns an actionable hint when the metric is unknown * auto-detects kind for a gauge metric, returns attribute keys + sampled values + matching usage example * respects sampleValues:false * returns the increase-shaped query example for sum kind * returns the quantile + level example for histogram kind * rejects an explicit kind when the source has no table for it queryTool.test.ts (integration; expanded) - clickstack_timeseries and clickstack_table: * gauge avg over time * sum + aggFn:'increase' table * histogram + aggFn:'quantile' + level:0.95 timeseries - Schema rejection paths: * aggFn:'increase' on a gauge -> friendly schema error * aggFn:'quantile' on a histogram without level * aggFn:'avg' on a histogram (not supported) * metricType set without metricName * isDelta on a non-gauge metric All unit tests pass (103 total: 63 query + 12 cursor + 28 prompts). Integration tests require make dev-int with the standard Docker stack to run end-to-end (they rely on bulkInsertMetricsGauge/Sum/Histogram seeding). No new TypeScript errors above the pre-existing baseline on main.
Update the user-facing MCP tool reference to reflect the new metric
discovery + querying capabilities shipped in Stages 1-3b.
- Add clickstack_list_metrics and clickstack_describe_metric rows to
the tool table
- Mark clickstack_describe_source / clickstack_timeseries /
clickstack_table as supporting metric sources alongside log/trace
- Add a new 'Metric Sources' section covering:
* the required metricType + metricName fields on each select item
* the valueExpression='Value' default
* per-kind aggregation guidance (gauge/sum/histogram, the increase
top-N cap on sum + groupBy, the level requirement for histogram
quantile)
* the unsupported metric kinds (summary, exponential histogram)
* the five-step discovery workflow chaining list_sources ->
describe_source -> list_metrics -> describe_metric ->
timeseries|table
Closes the documentation acceptance criterion on HDX-4347.
- Extract QUERYABLE_METRIC_KINDS to a shared packages/api/src/mcp/tools/sources/metricKinds.ts so describeMetric, listMetrics, describeSource, and the query / dashboard tile schemas all reference a single source of truth. Adding a new queryable kind (e.g. when the renderer learns 'summary') is now a one-file change (greptile). - Drop the unused 'ms' import in listMetrics.ts that would have failed CI lint (greptile). - Clear the describe_metric wall-clock timeout in 'finally' so a stale controller.abort() does not fire DESCRIBE_TIMEOUT_MS after every successful call, and the setTimeout closure does not stay pinned for the same duration (greptile). - Apply the same clearTimeout fix to describe_source as a drive-by: the bug pre-existed this PR but the pattern is now consistent across both tools (PR comment posted on the line). - Use ?? instead of || for the metric-source valueExpression default in runConfigTile so an explicit empty string is not silently swapped for 'Value' (greptile). - Remove the unused MCP_METRIC_TYPE_OPTIONS export; mcpMetricTypeSchema now builds directly off the shared QUERYABLE_METRIC_KINDS tuple (knip). The dashboard tile mcpTileMetricTypeSchema also switches to the shared constant for consistency. Verified: yarn tsc --noEmit baseline diff = 0, yarn lint clean, yarn knip clean, 105 unit tests pass.
Bug 1: timeseries / table metric queries failed with 'Value expression
is required for non-count aggregation functions'
The Stage 2 default of valueExpression='Value' for metric items ran
inside runConfigTile, AFTER buildTile had already called
externalDashboardTileSchemaWithId.parse(). That schema's superRefine
rejects non-count aggregations with an empty valueExpression, so any
metric query that omitted the field never reached the runtime default.
Add applyMetricSelectDefaults() to tools/query/schemas.ts and call it
in the timeseries / table handlers BEFORE buildTile, mirroring the
agent-friendly default the external API v2 charts path applies on the
REST surface. The runtime default in runConfigTile stays as defensive
coverage for saved dashboards.
Bug 2: clickstack_describe_metric returned an empty attributeKeys map
for metric scopes that DID have data
metadata.getAllFields({ metricName }) was returning empty arrays for
metric-scoped Map-column scans in the CI environment. Replace the call
with fetchAttributeKeys(), a direct query that issues
'arrayDistinct(arrayFlatten(groupArray(mapKeys(col))))' for each Map
column on the kind's table. This matches the proven pattern used by
the web app's useFetchMetricResourceAttrs hook and is more transparent
than the metadata helper for this single-purpose case.
Verified end-to-end against make dev-int:
- queryTool.test.ts: 53/53 pass (was 50/53 before this commit)
- sources.test.ts: 76/76 pass (was 75/76 before this commit)
- query.test.ts unit suite: 69/69 (added 4 new tests for
applyMetricSelectDefaults)
No new TypeScript errors above the pre-existing baseline.
The new applyMetricSelectDefaults tests inferred narrow object shapes from inline literals, so 'valueExpression' was absent from the inferred generic type and TS errored on accessing 'out[0].valueExpression'. Annotate each test input as 'SelectItem' so the optional field stays in the inferred type. Tests behave identically at runtime; this is a TypeScript-only fix.
Calling clickstack_describe_source / clickstack_list_metrics / clickstack_describe_metric with a non-ObjectId sourceId surfaced a raw Mongoose CastError to the MCP client: Cast to ObjectId failed for value "…" (type string) at path "_id" for model "Source" The well-formed-but-missing path was already handled by each tool's "Source X not found" branch, but malformed IDs threw before getSource returned. Pre-check via mongoose.Types.ObjectId.isValid and wrap the findOne in a defensive try/catch so getSource returns null on any malformed input and the existing not-found branch fires cleanly. Add a controller unit test asserting both the malformed and missing ObjectId paths return null.
…ype mismatch clickstack_timeseries and clickstack_table accepted metricType + metricName on any source. When the source was not a metric source, the renderer composed SQL that referenced the metric "Value" column and let ClickHouse fail with: Unknown expression or function identifier `Value` in scope SELECT AVG(toFloat64OrDefault(toString(Value))) … FROM default.otel_traces The reciprocal case (metric source with no metricType) reached the renderer in the same broken state. Add assertSourceKindMatchesSelect() in tools/query/helpers.ts and call it inside runConfigTile right after getSource(). Both clickstack_timeseries and clickstack_table inherit the guard, with agent-actionable error messages that mirror the wording on clickstack_list_metrics / clickstack_describe_metric. Add unit tests covering both mismatch directions plus the no-op paths (raw-string select, empty metricType, non-array select).
clickstack_list_sources and clickstack_describe_source emitted the
embedded metricTables subdoc with a stray Mongoose-generated _id key
mixed in with the legitimate kind keys:
"metricTables": {
"gauge": "otel_metrics_gauge",
"sum": "otel_metrics_sum",
"_id": "6a2ad3b4da94be764e2deed8"
}
The metricTables field was declared as an inline `type: {...}` literal
which Mongoose interprets as an embedded subdoc schema with an
auto-generated _id field. Convert it to a real child Schema with
{ _id: false } so newly persisted documents stop carrying the stray id.
Add a sanitizeMetricTables() helper as belt-and-braces defense for
documents persisted before the schema fix, and use it at the MCP
response boundary in both list_sources and describe_source. The helper
looks up each valid MetricsDataType key via property access so it
handles both plain objects and Mongoose subdoc instances.
Tests: unit tests for sanitizeMetricTables covering the _id strip,
non-string-value drop, unknown-key drop, and Mongoose-subdoc shape.
Extend the existing describe_source integration test to assert the
response payload only contains valid kind keys.
clickstack_describe_metric historically auto-detected the metric kind
when omitted, probing every populated metric table for the given
MetricName and returning one entry per kind that matched. This path
was the consistent source of 10s wall-clock timeouts on multi-kind
metrics (e.g. container.cpu.usage which lives in both gauge and sum
tables), even with sampleValues:false.
The auto-detect path was also redundant: the upstream discovery tools
both surface kind alongside name:
- clickstack_list_metrics returns { name, kind, ... }
- clickstack_describe_source returns metricNames grouped by kind
By the time an agent reaches clickstack_describe_metric it already
has the kind. Requiring it on the input drops the detectKindsForMetric
probe, eliminates the multi-kind discovery path, and removes the only
place the tool ran across more than one ClickHouse table.
Also bound the per-kind attribute-keys discovery so the single-kind
path stays fast on production-scale metric tables. The previous
inline SQL ran an unbounded `WHERE MetricName = ?` scan with no
time filter, which still exceeded the wall-clock timeout on tables
with millions of rows per metric. Wrap the scan in a CTE that
samples at most 100k matching rows (MetricName + time window) and
aggregate map keys from that sample, plus an 8s server-side
max_execution_time as a belt-and-braces safety net.
Changes:
- Make `kind` required in describeMetricSchema (was optional).
- Remove detectKindsForMetric — auto-detect path no longer exists.
- Collapse the if/else branch to the explicit-kind path.
- Always return exactly one kindDetail; emit a top-level hint when
the (metric, kind) pair has no signal so the agent can widen the
window or verify the kind via clickstack_list_metrics.
- Bound fetchAttributeKeys with a CTE-LIMIT sample + max_execution_time
so the discovery query stays within the wall clock on hot metrics.
- Update the tool description and the timeout hint accordingly.
Tests: drop the auto-detect and multi-kind tests; existing tests that
already pinned `kind` are unchanged. Update the unknown-metric test
to reflect the new always-one-kind shape. Add a test asserting the
tool rejects calls with missing kind.
…e_source
clickstack_describe_source consistently timed out on the cold-cache
first call against a populated metric source, then succeeded on retry.
The bottleneck was the value-sampling stages (3 + 4) and the metric-
name sampling stage (5), which all delegate to metadata.getAllKeyValues
/ metadata.getMapKeys.
When the source has no metadataMaterializedViews configured (the case
for every metric source today), those helpers fall back to scanning
the raw metric table. The fallback path only applies a time filter
when timestampValueExpression is provided alongside dateRange — and
describe_source was passing dateRange but not timestampValueExpression.
The result was an unbounded scan per key against the full metric
table on cold cache, easily exceeding the 10s wall-clock timeout.
Pass source.timestampValueExpression through to:
- metadata.getMapKeys (stage 2)
- metadata.getAllKeyValues × 2 (stages 3 + 4)
- sampleMetricNamesForKind (stage 5)
and its inner metadata.getAllKeyValues call
For log / trace sources this is also a small win because the no-MVs
fallback now scopes its scan, but the practical impact is on metric
sources where the rollup MVs are not yet configured.
No new tests: existing sources.test.ts coverage already exercises
every stage on a metric source. The change is observably load-bearing
on the live MCP server — the first call now returns the full payload
in under the 10s budget instead of timing out.
…AttributeKeys
Greptile pointed out that sampleAttributeValues hit the same
(MetricName, time range) filter on the same metric table as
fetchAttributeKeys but lacked any of the scan bounds the earlier
commit added there:
- no inner LIMIT subquery
- no max_execution_time / timeout_overflow_mode
On a hot metric with millions of matching rows in the default 24-hour
window the query could run until the client-side AbortController
fires the wall-clock timeout, after which the ClickHouse server may
still be processing the work.
Apply the same pattern used in fetchAttributeKeys:
- Wrap the FROM/WHERE clause in a subquery that LIMITs at
METRIC_ATTR_KEYS_SAMPLE_SIZE matching rows. The inner SELECT
projects only the distinct set of map columns referenced by the
sampled keys.
- Add clickhouse_settings: { max_execution_time:
METRIC_ATTR_KEYS_MAX_EXEC_SECONDS, timeout_overflow_mode: 'break' }
as a server-side ceiling.
Live retest against the metric source on staging confirms the
attributeValues payload still comes back rich (same k8s.* diversity
on container.cpu.usage) — the 100k-row sample is plenty for value
discovery.
Refs: #2437 (comment)
getSource is called from REST routers, external-api v2, and MCP tools alike — the inline comments referenced only the MCP surface. Reword to caller-neutral language per review feedback. Refs: #2437 (comment)
… to hints[] Two problems with the query-tool hint plumbing: 1. annotateIncreaseTopNHint emitted the 'results may be truncated' disclaimer whenever increase + groupBy returned any data — even 3 groups — making it a false alarm rather than a detection. 2. Both annotateIncreaseTopNHint and the single-bucket hint in timeseries.ts assigned to parsed.hint directly, so whichever ran second silently overwrote the other. Fixes: - Add countDistinctGroups(): resolves comma-separated groupBy segments against the result row keys (bracket-aware split) and counts distinct combinations. The truncation hint now only fires when the distinct group count reaches INCREASE_TOP_N_CAP. When no segment resolves to a row key (e.g. bracket-syntax map attrs render as arrayElement(...) aliases), skip the hint instead of crying wolf. - Replace the single hint: string with hints: string[] across the query-tool response shape (annotateIncreaseTopNHint, the single-bucket hint, and formatQueryResult's empty-result hint) via a shared appendHint() helper, so multiple advisories coexist. Tests: rewrite the annotateIncreaseTopNHint block — at/below cap, distinct-groups-vs-raw-rows, multi-column groupBy combinations, unresolvable groupBy, and append-not-overwrite coexistence. Refs: #2437 (comment)
…ist_metrics
Discovery sub-query failures were silently swallowed into empty
returns (fetchAttributeKeys / sampleAttributeValues catch → {};
listMetrics per-kind catch → continue). Transient ClickHouse errors
then surfaced as the 'No data found — try widening startTime/endTime'
hint, so agents could not distinguish 'genuinely empty' from 'fetch
failed' and followed incorrect recovery steps.
- fetchAttributeKeys / sampleAttributeValues now return a typed
{ ok, data } | { ok: false, error } result. Errors are sanitised
(single line, capped at 200 chars).
- describe_metric collects failures into partialFailure[{stage,error}]
on the response and suppresses the no-data hint when any stage
failed — empty attributeKeys can't be trusted as 'no data' then.
A retry-oriented hint is emitted instead.
- list_metrics collects per-kind fetch failures into
partialFailure[{kind,error}] and suppresses the 'No metrics matched'
hint when failures explain the empty result.
Tests: new cases pointing a kind's metricTables entry at the logs
table (exists, but no MetricName/TimeUnix columns) to force the
discovery query to throw; assert partialFailure is present and the
misleading widen-the-window hints are absent. Happy-path test asserts
partialFailure stays undefined.
Refs: #2437 (comment)
sampleAttributeValues caps value sampling at MAX_ATTR_KEYS_TO_SAMPLE (12) keys and only emits keys with non-empty samples — so a key absent from attributeValues was indistinguishable between 'skipped by the cap' and 'sampled but empty in the window'. Agents could not tell when to re-query a key directly. Add attributeValuesMeta to each kindDetail: - sampledKeys: display names actually queried for values - truncatedKeys: display names skipped because the cap fired A key in truncatedKeys was never queried; a key in sampledKeys but missing from attributeValues is genuinely empty in the window. Tool description documents the semantics so agents know to query truncated keys directly. Tests: 15-key metric → 12 sampled + 3 truncated, disjoint sets covering all keys; small metric → truncatedKeys empty. Refs: #2437 (comment)
… to the renderer clickstack_timeseries put granularity into the tile config, but buildTile parses through externalDashboardTileSchemaWithId whose line/stacked_bar schemas don't declare a granularity field, so Zod's strip-transform dropped it. convertToInternalTileConfig never re-injects it (only the REST charts path defaults granularity). The renderer's isUsingGranularity guard (granularity != null) then omitted __hdx_time_bucket entirely, so every MCP timeseries call collapsed to a single row per group with no time dimension — which pushed agents to fall back to raw SQL. Re-inject granularity in runConfigTile for line/stacked_bar tiles, defaulting to "auto" so the renderer picks a bucket (mirrors external-api/v2/charts.ts:289). Threaded from the tool input via a new optional runConfigTile option; Search tiles keep granularity:undefined. Test: gauge timeseries now asserts __hdx_time_bucket is present in result.meta (regression guard) rather than just 'no renderer error'. Refs: #2437 (comment)
…not row count The single-bucket hint keyed on data.length === 1, conflating three cases: genuine coarse-granularity collapse, a legitimate single group under groupBy, and the no-time-bucket case — where 'try a coarser granularity' is actively backwards advice. Key the hint on the renderer's FIXED_TIME_BUCKET_EXPR_ALIAS (__hdx_time_bucket) column in result.meta instead: - bucket column absent -> 'result is not bucketed over time; pass a granularity' (no coarser-granularity advice). - bucket column present + exactly 1 row -> genuine single-bucket collapse hint (widen range / finer granularity). - bucket column present + many rows -> no hint. Emitted via appendHint so it coexists with the increase top-N hint. After the granularity-threading fix this rarely fires for timeseries (granularity now defaults to auto), but the meta-based signal is the robust guard. Test: gauge timeseries asserts the 'not bucketed over time' hint never fires when the bucket column is present. Refs: #2437 (comment)
5f28571 to
30812d1
Compare
Summary
Why: Metric sources were MVP-only in MCP —
describe_sourceearly-returned with justmetricTables, builder tiles on metric sources saved but failed at render time, the query tools couldn't carrymetricName/metricType, and the prompt told agents to fall back to raw SQL. Closes that gap.What changed:
clickstack_list_metrics— paginated metric-name catalog withkind,namePattern(ILIKE), time-window filters, and opaque cursor pagination.clickstack_describe_metric— per-metric unit, description, attribute keys, sampled values.kindis required (paired withmetricNameby every upstream discovery tool, so requiring it eliminates the multi-kind probe path).clickstack_describe_sourcenow picks a representative metric table (gauge → sum → histogram) and runs the existing column / map-key / value-sampling pipeline against it, plus a per-kind metric-name sample.clickstack_timeseries/clickstack_tableacceptmetricType,metricName,isDeltaon each select item, plusaggFn:"increase"for Sum counters.valueExpressiondefaults to"Value". Emits a hint when the renderer's 20-group cap onincrease + groupBymay apply. Reject up-front whenmetricTypeis paired with a non-metric source (or omitted on a metric source) instead of falling through to a cryptic ClickHouse error.summaryand"exponential histogram"kinds (no renderer support).Live MCP testing findings & fixes
Live testing against a populated metric source surfaced 5 issues. All are fixed in this PR.
Cast to ObjectId failed for value "..." (type string) at path "_id" for model "Source"leaked throughclickstack_describe_source/_list_metrics/_describe_metricon a badsourceIdgetSource()didn't guard against malformed inputObjectId.isValid+ try/catch aroundfindOne; all tools surface their existing "Source not found" branch (19bc4205)clickstack_timeseries/clickstack_tableacceptedmetricType+metricNameon non-metric sources and failed at SQL render withUnknown expression or function identifier "Value"runConfigTileassertSourceKindMatchesSelect()helper called between source-not-found and the connection lookup — both directions of the mismatch return agent-actionable errors (7001a915)metricTablesmap inclickstack_list_sources/_describe_sourceresponses included a stray_idObjectId alongside the valid kind keystype: { … }Mongoose subdoc auto-adds_idSchema(..., { _id: false })formetricTables; defense-in-depthsanitizeMetricTables()at the MCP response boundary (7ef9d9ca)clickstack_describe_metricconsistently timed out at 10 s on multi-kind metrics (e.g.container.cpu.usagelives in both gauge and sum)fetchAttributeKeysran an unboundedWHERE MetricName = ?scan with no time filterkind(every upstream tool already pairs it with name); boundfetchAttributeKeyswith a CTE sample LIMIT + server-sidemax_execution_time(30203f5e)clickstack_describe_sourceon a populated metric source timed out on the cold call, then succeeded on retrytimestampValueExpression, so the helpers' time filter was silently droppedsource.timestampValueExpressionthrough togetMapKeys, bothgetAllKeyValuescalls, andsampleMetricNamesForKind(31bc21b5)Screenshots or video
N/A — backend-only.
How to test on Vercel preview
N/A — non-UI change.
References