feat(examples): add service-localization multi-domain agent demo #36
Open
qiansheng91 wants to merge 2 commits into
A new scenario-driven example pack showing how an AI agent uses UModel to fetch telemetry and localize a bottleneck down a four-layer request stack (product API -> service -> datastore -> infrastructure). It complements incident-investigation: that demo is reactive root-cause analysis; this one is vertical localization with data retrieval as the hero.

Scenario: checkout-api breaches its latency SLO. Walking the critical path one hop at a time (getDirectRelations) and fetching the saturation signal at each layer localizes the cause to orders-db's connection pool (~98%), while the hosting node is healthy and sibling services are fine.

Contents (4 domains, all model files pass make example-validate):
- 6 entity sets: product.api/journey, service.app, data.store, infra.node/pod
- 7 entity_set_links forming the vertical stack (calls / reads_writes / runs_on / hosted_on / scheduled_on / depends_on)
- 4 metric_sets (incl. data.store.metrics.connection_pool_usage, the localization signal) + 1 log_set + prometheus/elasticsearch storage, with 5 data_links and 5 storage_links
- 23 entities / 29 relations encoding the planted bottleneck (md5-hex ids per CMS 2.0)
- bilingual README with a 5-step data-retrieval walkthrough
- MCP-driven test-integration.sh (16/16), using the correct query arg key and the safe PASS counter idiom

Registered in internal/sampledata sampleCatalog as "service-localization" (aliases: bottleneck-localization, examples/service-localization) and linked from the root README (en/cn) and docs index so it is discoverable from day one.

The runbook + standalone agent skills (model-resident and SKILL.md forms) land in a follow-up PR on top of this data pack.

Verified: make example-validate, make ci, test-integration.sh (16/16), and manual get_metrics/get_logs/.topo probes against `make quickstart QUICKSTART_SAMPLE=examples/service-localization`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add two ways to exercise the localization flow as a whole, not just piecemeal:
- examples/service-localization/demo.sh: a narrated, runnable replay of the agent's bottleneck-localization loop against a live server. Prints each SPL, the key result, and the reasoning per hop (symptom → entry → hop 1 calls → service CPU healthy → hop 2 reads_writes → datastore saturation PromQL → hop 3 hosted_on → node healthy → conclusion), and asserts the load-bearing facts (10 checks) so it doubles as a smoke gate. 10/10 against `make quickstart QUICKSTART_SAMPLE=examples/service-localization`.
- internal/bootstrap/localization_test.go (TestServiceLocalizationPath): an in-process gate that imports the sample and walks the same path via Query.Execute, asserting the topology hops, the connection_pool_usage plan rendering with the orders-db id substituted, and dataset discovery. Runs under `make ci`, so the demo path can't silently rot if the sample data, links, or datasets drift.

Both READMEs point at demo.sh and note the CI coverage.

Verified: TestServiceLocalizationPath passes; make ci green; demo.sh 10/10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
A new scenario-driven example pack, `examples/service-localization/`, showing how an AI agent uses UModel to fetch telemetry and localize a bottleneck down a four-layer request stack (product API → service → datastore → infrastructure). Data retrieval is the hero.

It complements Incident Investigation: that demo is reactive root-cause analysis (symptom → cause via a runbook, walking horizontally across callers + business); this one is vertical localization, which walks the request stack down, fetches the signal at each hop, and attributes the latency to a layer.
Scenario
`checkout-api` breaches its 300 ms latency SLO. Walking the critical path one hop at a time and fetching the saturation signal at each layer localizes the cause to the `orders-db` connection pool (~98%).

Sibling services (catalog/search/payment/inventory) and all infra nodes are healthy, so the localization narrows cleanly to the datastore connection pool.
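The hop-by-hop walk in the scenario can be illustrated with a small, self-contained sketch. This is not the UModel API: `get_direct_relations` is a hypothetical stand-in for the real `getDirectRelations` call, the in-memory dictionaries replace the sample data, and the 0.90 threshold and every saturation value other than the ~98% pool usage are assumptions chosen for illustration.

```python
# Hypothetical in-memory stand-in for the planted topology. Entity and
# relation names come from the PR text; the data structure is an assumption.
RELATIONS = {
    "checkout-api": [("calls", "order-svc")],
    "order-svc": [("reads_writes", "orders-db")],
    "orders-db": [("hosted_on", "node-a")],
    "node-a": [],
}

# Saturation signal per entity as a 0..1 ratio. Only the ~0.98 pool usage
# is taken from the PR; the other values are illustrative placeholders.
SATURATION = {
    "order-svc": 0.35,
    "orders-db": 0.98,
    "node-a": 0.40,
}

THRESHOLD = 0.90  # assumed cut-off for calling a layer "saturated"


def get_direct_relations(entity):
    """One-hop lookup; a hypothetical stand-in for getDirectRelations."""
    return RELATIONS.get(entity, [])


def localize(entry):
    """Walk the stack one hop at a time and collect saturated layers."""
    path, saturated = [entry], []
    seen = {entry}
    current = entry
    while True:
        hops = [dst for _, dst in get_direct_relations(current) if dst not in seen]
        if not hops:
            break
        current = hops[0]  # the toy graph has a single critical path
        seen.add(current)
        path.append(current)
        # Fetch the signal for this layer before descending further.
        if SATURATION.get(current, 0.0) >= THRESHOLD:
            saturated.append(current)
    return path, saturated


path, saturated = localize("checkout-api")
print(" -> ".join(path))
print("bottleneck:", saturated)
```

The walk continues past the saturated layer on purpose: seeing a healthy `node-a` below `orders-db` is what lets the latency be attributed to the datastore layer rather than to the infrastructure underneath it.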
What's in the pack (4 domains)
- 6 entity sets: `product.api`, `product.journey`, `service.app`, `data.store`, `infra.node`, `infra.pod`
- 7 entity_set_links forming the vertical stack: `depends_on` / `calls` / `reads_writes` / `runs_on` / `hosted_on` / `scheduled_on`
- 4 metric_sets, including `data.store.metrics.connection_pool_usage` (the localization signal), plus 1 log_set, prometheus + elasticsearch storage, with 5 data_links and 5 storage_links
- 23 entities / 29 relations encoding the planted bottleneck (md5-hex ids per CMS 2.0)
- Bilingual README with a 5-step data-retrieval walkthrough
- MCP-driven `test-integration.sh` (16/16), using the correct `query` arg key and the safe `PASS=$((PASS+1))` counter idiom (the bugs fixed in feat(examples): add telemetry layer to incident-investigation demo #32)

Registered in `internal/sampledata` `sampleCatalog` as `service-localization` (aliases `bottleneck-localization`, `examples/service-localization`) and linked from the root README (en/cn) + docs index, so it is discoverable from day one.

Test plan
- `make example-validate`: all 30 new model YAMLs pass
- `make ci`: green
- `examples/service-localization/test-integration.sh`: 16/16
- Manual probes against `make quickstart QUICKSTART_SAMPLE=examples/service-localization`:
  - `.entity ... query='degraded'` → checkout-api
  - `getDirectRelations` walks checkout-api → order-svc → orders-db → node-a
  - `get_metrics('data','data.store.metrics','connection_pool_usage')` → Prometheus plan with the orders-db id substituted
  - `list_data_set` surfaces the service metric + log sets

Notes for reviewers
- The walk uses `getDirectRelations` per hop rather than a single deep `getNeighborNodes`. In the memory graphstore, `getNeighborNodes` with depth > 1 does not expand transitively (depth 2 == depth 1; depth 3 returns empty), and one-hop stepping is also the more faithful model of how an agent localizes. Flagging in case the multi-hop behavior is worth a separate look.
- … `status` + the planted topology, so the path is fully reproducible offline.

Follow-up
The analysis/localization skills, both the model-resident `runbook_set` and standalone `SKILL.md` files, land in a follow-up PR stacked on this data pack.
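On the traversal-depth note for reviewers: until the multi-hop behavior is looked at, a depth-limited neighborhood can be assembled from repeated one-hop lookups. The sketch below is a plain breadth-first expansion over a hypothetical adjacency map. `one_hop` stands in for a `getDirectRelations`-style call, the edges are assumed to be directed, and nothing here reflects how the memory graphstore implements `getNeighborNodes`.

```python
# Hypothetical directed adjacency. Entity names come from the PR text;
# the structure itself is an assumption made for illustration.
EDGES = {
    "checkout-api": ["order-svc"],
    "order-svc": ["orders-db"],
    "orders-db": ["node-a"],
    "node-a": [],
}


def one_hop(entity):
    """Single-hop lookup; a stand-in for a getDirectRelations-style call."""
    return EDGES.get(entity, [])


def neighbors_within(start, depth):
    """Return {entity: hop distance} for everything within `depth` hops."""
    distance = {start: 0}
    frontier = [start]
    for hop in range(1, depth + 1):
        next_frontier = []
        for entity in frontier:
            for neighbor in one_hop(entity):
                if neighbor not in distance:  # skip already-visited nodes
                    distance[neighbor] = hop
                    next_frontier.append(neighbor)
        frontier = next_frontier
    distance.pop(start)  # report neighbors only, not the start node
    return distance


print(neighbors_within("checkout-api", 1))
print(neighbors_within("checkout-api", 3))
```

With this composition, increasing the depth only ever adds entities to the result, which is the property the reviewer note reports as missing for depth 2 and depth 3.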