feat: add oq pipeline query language for OpenAPI schema graphs#177
vishalg0wda merged 28 commits into main
Implement a domain-specific pipeline query language (oq) that enables agents and humans to construct ad-hoc structural queries over OpenAPI documents. The query engine operates over a pre-computed directed graph materialized from openapi.Index.

New packages:
- graph/: SchemaGraph type with node/edge types, Build() constructor, reachability/ancestor traversal, and pre-computed metrics
- oq/expr/: Predicate expression parser and evaluator supporting ==, !=, >, <, >=, <=, and, or, not, has(), matches()
- oq/: Pipeline parser, AST, executor with source/traversal/filter stages, and table/JSON formatters

New CLI command: openapi spec query <file> '<pipeline>'

Example queries:
    schemas.components | sort depth desc | take 10 | select name, depth
    schemas | where union_width > 0 | sort union_width desc | take 10
    schemas.components | where in_degree == 0 | select name
    operations | sort schema_count desc | take 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The cmd/openapi module needs a replace directive pointing to the root module so that go mod tidy can resolve the new graph/ and oq/ packages that aren't yet published. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use require.Error for error assertions and assert.Positive for count checks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace fmt.Errorf with errors.New where no format args (perfsprint)
- Convert if-else chain to switch statement (gocritic)
- Use assert.Len and assert.Positive in tests (testifylint)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use t.Context() instead of context.Background() in tests
- Replace WriteString(fmt.Sprintf(...)) with fmt.Fprintf
- Remove development replace directive from cmd/openapi/go.mod
- Fix trailing newline for count results in table format

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📊 Test Coverage Report: coverage change 📈 +.1% (improved)
New stages: explain, fields, head (alias), sample, path, top, bottom, format
New operation fields: tag, parameter_count, deprecated, description, summary
New graph method: ShortestPath for BFS pathfinding
New formatter: FormatMarkdown for markdown table output

Restore replace directive in cmd/openapi/go.mod (required for CI)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
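The BFS pathfinding mentioned for ShortestPath can be sketched as follows. This is a minimal illustration and not the PR's implementation: the `Graph` adjacency-map type and the function signature are assumptions made for the example, since the real SchemaGraph API is not shown in this conversation.

```go
package main

import "fmt"

// Graph is a hypothetical adjacency list keyed by node name. It stands in
// for the PR's SchemaGraph, whose real API is not shown here.
type Graph map[string][]string

// ShortestPath returns the nodes on a shortest directed path from src to dst,
// or nil when dst is unreachable. BFS dequeues nodes in order of hop count,
// so the first time dst is dequeued the recorded path is minimal.
func ShortestPath(g Graph, src, dst string) []string {
	prev := map[string]string{}
	seen := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			// Walk predecessor links back to src, then reverse in place.
			path := []string{dst}
			for n := dst; n != src; {
				n = prev[n]
				path = append(path, n)
			}
			for i, j := 0, len(path)-1; i < j; i, j = i+1, j-1 {
				path[i], path[j] = path[j], path[i]
			}
			return path
		}
		for _, next := range g[cur] {
			if !seen[next] {
				seen[next] = true
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	g := Graph{
		"Pet":      {"Owner", "Category"},
		"Owner":    {"Address"},
		"Category": {},
		"Address":  {},
	}
	fmt.Println(ShortestPath(g, "Pet", "Address"))
}
```

Marking nodes as seen when they are enqueued, rather than when they are dequeued, keeps each node in the queue at most once.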
- Fix stdinOrFileArgs(2,2) -> (1,2) so -f flag works with 1 positional arg
- Fix OOB panic in expr tokenizer on unterminated backslash-terminated strings
- Add tests for refs-out, refs-in, items, format groups, field coverage, empty/count edge cases, bringing oq coverage from 72% to 83%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implement FormatToon following the TOON (Token-Oriented Object Notation)
spec: tabular array syntax with header[N]{fields}: and comma-delimited
data rows. Includes proper string escaping per TOON quoting rules.
See https://github.com/toon-format/toon
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
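The tabular shape described in this commit (a `header[N]{fields}:` line followed by comma-delimited rows) might look roughly like the sketch below. The function names are hypothetical, and the quoting rule is a deliberate simplification rather than the full TOON rule set; consult the linked spec for the authoritative rules.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIfNeeded applies a simplified quoting rule (an assumption, not the
// full TOON rule set): quote a value when it is empty, has surrounding
// whitespace, or contains a comma, colon, double quote, backslash, or newline.
func quoteIfNeeded(s string) string {
	if s != "" && s == strings.TrimSpace(s) && !strings.ContainsAny(s, ",:\"\\\n") {
		return s
	}
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return `"` + r.Replace(s) + `"`
}

// formatTabular renders rows as a TOON-style tabular array: a
// "name[N]{f1,f2}:" header followed by one comma-delimited line per row.
func formatTabular(name string, fields []string, rows [][]string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s[%d]{%s}:\n", name, len(rows), strings.Join(fields, ","))
	for _, row := range rows {
		cells := make([]string, len(row))
		for i, c := range row {
			cells[i] = quoteIfNeeded(c)
		}
		b.WriteString("  " + strings.Join(cells, ",") + "\n")
	}
	return b.String()
}

func main() {
	fmt.Print(formatTabular("schemas", []string{"name", "depth"}, [][]string{
		{"Pet", "2"},
		{"Owner, Primary", "1"},
	}))
}
```

Declaring the field names once in the header is what makes the format compact for long result sets: each row carries only values.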
…panic Add `openapi spec query-reference` subcommand that prints the complete oq language reference. Add README.md for the oq package. Fix OOB panic in expr parser's expect() method when tokens are exhausted mid-parse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Edge annotations: 1-hop traversal stages (refs-out, refs-in, properties, union-members, items) now populate edge_kind, edge_label, and edge_from fields on result rows, making relationship types visible in query output.

New traversal stages: connected, blast-radius, neighbors <n>
New analysis stages: orphans, leaves, cycles, clusters, tag-boundary, shared-refs
New schema fields: op_count, tag_count

Graph layer additions: Neighbors (depth-limited bidirectional BFS), StronglyConnectedComponents (Tarjan's SCC), SchemaOpCount.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
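For readers unfamiliar with Tarjan's algorithm, the sketch below shows how strongly connected components can be found in a single depth-first pass. It is illustrative only: the `sccState` type, the `stronglyConnected` function, and the string-keyed adjacency map are assumptions for this example and do not reflect the PR's actual graph types.

```go
package main

import "fmt"

// sccState carries Tarjan's bookkeeping for one run.
type sccState struct {
	graph   map[string][]string
	index   map[string]int // discovery order of each visited node
	lowlink map[string]int // smallest index reachable via the DFS subtree
	onStack map[string]bool
	stack   []string
	next    int
	out     [][]string
}

func (s *sccState) visit(v string) {
	s.index[v], s.lowlink[v] = s.next, s.next
	s.next++
	s.stack = append(s.stack, v)
	s.onStack[v] = true

	for _, w := range s.graph[v] {
		if _, seen := s.index[w]; !seen {
			s.visit(w)
			if s.lowlink[w] < s.lowlink[v] {
				s.lowlink[v] = s.lowlink[w]
			}
		} else if s.onStack[w] && s.index[w] < s.lowlink[v] {
			s.lowlink[v] = s.index[w]
		}
	}

	// v is the root of a component: pop the stack down to and including v.
	if s.lowlink[v] == s.index[v] {
		var comp []string
		for {
			w := s.stack[len(s.stack)-1]
			s.stack = s.stack[:len(s.stack)-1]
			s.onStack[w] = false
			comp = append(comp, w)
			if w == v {
				break
			}
		}
		s.out = append(s.out, comp)
	}
}

// stronglyConnected visits nodes in the given order so results are stable.
func stronglyConnected(nodes []string, graph map[string][]string) [][]string {
	s := &sccState{
		graph:   graph,
		index:   map[string]int{},
		lowlink: map[string]int{},
		onStack: map[string]bool{},
	}
	for _, v := range nodes {
		if _, seen := s.index[v]; !seen {
			s.visit(v)
		}
	}
	return s.out
}

func main() {
	graph := map[string][]string{"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": {}}
	fmt.Println(stronglyConnected([]string{"A", "B", "C", "D"}, graph))
}
```

A component with more than one node, or a single node with a self-edge, corresponds to a reference cycle; that is presumably how a stage like `cycles` would use the result, though the PR's exact definition is not shown here.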
Change `openapi spec query <file> <query>` to `openapi spec query <query> [file]`. The query is the primary argument; the input file is optional and defaults to stdin when omitted. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TristanSpeakEasy
left a comment
LGTM — well-structured, well-tested addition (135 tests pass). Clean pipeline language design with good composability. A few items flagged inline.
TristanSpeakEasy
left a comment
LGTM once you get the checks passing and address the Devin feedback, etc., if needed
…ulti-module workflow
…double newline, and lint issues
    return stages, nil
🟡 Parse returns nil stages with nil error for pipe-only or whitespace-only queries
When Parse is called with a query like "|" or " | ", splitPipeline produces parts that are all empty/whitespace strings. These pass the initial len(parts) == 0 check at oq/parse.go:14, but every part is skipped by the part == "" check at oq/parse.go:22-24, leaving stages as nil. The function then returns (nil, nil) — no stages and no error. Downstream, run at oq/exec.go:17-18 treats zero stages as success and returns an empty Result, silently producing no output instead of reporting a parse error.
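One way to close this gap is to reject a pipeline that yields no stages once empty parts have been skipped. The sketch below is an assumption-laden illustration: `parseStages` is a hypothetical stand-in for `Parse`, stages are plain strings instead of the PR's AST, and the naive `strings.Split` ignores pipes inside quoted strings, which the real `splitPipeline` presumably handles.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseStages is a hypothetical, simplified stand-in for the PR's Parse.
// It skips empty parts like the original, but treats "no stages at all"
// as a parse error instead of returning (nil, nil).
func parseStages(query string) ([]string, error) {
	var stages []string
	for _, part := range strings.Split(query, "|") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		stages = append(stages, part)
	}
	if len(stages) == 0 {
		return nil, errors.New("empty query: no pipeline stages")
	}
	return stages, nil
}

func main() {
	for _, q := range []string{"schemas | take 5", " | ", ""} {
		stages, err := parseStages(q)
		fmt.Printf("%q -> %v, err=%v\n", q, stages, err)
	}
}
```

Checking after the loop, rather than only checking `len(parts) == 0` up front, covers both the empty string and inputs made entirely of separators.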
Modernize oq with jq-style syntax: select(expr) for filtering, pick for field projection, sort_by(field; desc), first/last/length, group_by(), def/include for user-defined functions and modules, let $var for variable binding, // alternative operator, if-then-else-end, and string interpolation \(expr). All legacy syntax remains supported. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix gocritic assignOp, gosec G304, staticcheck punctuation, and testifylint issues. Update README, query help text, and language reference to document new jq-style syntax (select, pick, sort_by, first/last, //, if-then-else, def/include, let). Fix has() to use != 0 instead of > 0 for correct non-zero semantics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    if strings.HasPrefix(tok, "\"") {
        p.next()
        inner := tok[1 : len(tok)-1] // strip quotes
🔴 Expression parser panics on an unterminated string literal (a lone `"` token)
When the expression tokenizer produces a single " token (from an unterminated string literal, e.g. the query schemas | where "), parsePrimary at oq/expr/expr.go:444 executes tok[1:len(tok)-1] which evaluates to tok[1:0]. In Go, a slice expression where low > high causes a runtime panic (slice bounds out of range). The tokenizer at oq/expr/expr.go:590-603 produces a single " token when the input is just " with no closing quote (j starts at i+1 which equals len(input), the scan loop doesn't execute, and the closing-quote increment is skipped). This allows any user to crash the application with a malformed query expression.
Suggested change:

    - if strings.HasPrefix(tok, "\"") {
    -     p.next()
    -     inner := tok[1 : len(tok)-1] // strip quotes
    + if strings.HasPrefix(tok, "\"") {
    +     p.next()
    +     if len(tok) < 2 {
    +         return nil, errors.New("unterminated string literal")
    +     }
    +     inner := tok[1 : len(tok)-1] // strip quotes
🚩 JSON output uses Go %q escaping instead of proper JSON encoding
The jsonValue function at oq/oq.go:793-804 and formatGroupsJSON use fmt.Sprintf("%q", ...) for string quoting. Go's %q (backed by strconv.Quote) produces Go-style escaping which differs from JSON in edge cases: Go may emit \x hex escapes and \a/\v control character escapes that are not valid JSON. For typical OpenAPI schema names (ASCII identifiers), this works fine. But if a schema name or path contains unusual characters, the output would be invalid JSON. Consider using encoding/json.Marshal for string values if strict JSON compliance is needed.
(Refers to lines 329-340)
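The difference is easy to reproduce with a control character. The sketch below compares `strconv.Quote` (what `%q` uses for strings) with `encoding/json.Marshal`; the helper names `goQuoted`, `jsonString`, and `isValidJSON` are made up for this example and are not functions from the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// goQuoted mirrors what fmt.Sprintf("%q", s) produces for a string.
func goQuoted(s string) string {
	return strconv.Quote(s)
}

// jsonString encodes s as a JSON string literal using encoding/json.
func jsonString(s string) string {
	b, err := json.Marshal(s)
	if err != nil {
		// Marshal is not expected to fail for a plain string; fall back
		// to an empty JSON string so callers always get valid JSON.
		return `""`
	}
	return string(b)
}

// isValidJSON reports whether s parses as a JSON value.
func isValidJSON(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	name := "Pet\vStore" // vertical tab: Go escapes it as \v, which JSON does not define
	fmt.Println(goQuoted(name), isValidJSON(goQuoted(name)))
	fmt.Println(jsonString(name), isValidJSON(jsonString(name)))
}
```

Note that `json.Marshal` also escapes `<`, `>`, and `&` by default; if that matters for the output, a `json.Encoder` with `SetEscapeHTML(false)` is the usual alternative.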
    rows := result.Rows
    if stage.Limit < len(rows) {
        rows = rows[:stage.Limit]
    }
🔴 Negative limit values parsed without validation cause runtime panics
The parser accepts negative integers for first(), last(), take, head, sample(), top(), and bottom() limit parameters (via strconv.Atoi which parses negatives). When executed, negative limits cause slice bounds out of range panics. For example, first(-5) parses to Stage{Kind: StageTake, Limit: -5}. In execTake, the check stage.Limit < len(rows) passes (since -5 < N for any N >= 0), and then rows[:stage.Limit] (i.e., rows[:-5]) panics. The same issue affects execLast (rows[len(rows)-(-5):] → rows[len+5:] → panic) at oq/exec.go:194 and execSample (rows[:stage.Limit]) at oq/exec.go:1052.
Panic scenario trace
Query: schemas | first(-5)
1. splitKeywordCall("first(-5)") → keyword="first", args="-5"
2. strconv.Atoi("-5") → -5 (no error)
3. Stage{Kind: StageTake, Limit: -5} created
4. execTake: -5 < len(rows) → true → rows[:-5] → panic: runtime error: slice bounds out of range
Suggested change:

    - rows := result.Rows
    - if stage.Limit < len(rows) {
    -     rows = rows[:stage.Limit]
    - }
    + func execTake(stage Stage, result *Result) (*Result, error) {
    +     rows := result.Rows
    +     if stage.Limit < 0 {
    +         return &Result{Fields: result.Fields, FormatHint: result.FormatHint}, nil
    +     }
    +     if stage.Limit < len(rows) {
    +         rows = rows[:stage.Limit]
    +     }
    func execLast(stage Stage, result *Result) (*Result, error) {
        rows := result.Rows
        if stage.Limit < len(rows) {
            rows = rows[len(rows)-stage.Limit:]
        }
🔴 Negative limit in execLast causes slice bounds out of range panic
Same root cause as BUG-0001 but in execLast. When stage.Limit is negative (e.g., last(-5) with 3 rows), rows[len(rows)-stage.Limit:] becomes rows[3-(-5):] = rows[8:], which panics with slice bounds out of range.
Suggested change:

    - func execLast(stage Stage, result *Result) (*Result, error) {
    -     rows := result.Rows
    -     if stage.Limit < len(rows) {
    -         rows = rows[len(rows)-stage.Limit:]
    -     }
    + func execLast(stage Stage, result *Result) (*Result, error) {
    +     rows := result.Rows
    +     if stage.Limit < 0 {
    +         return &Result{Fields: result.Fields, FormatHint: result.FormatHint}, nil
    +     }
    +     if stage.Limit < len(rows) {
    +         rows = rows[len(rows)-stage.Limit:]
    +     }
    func execSample(stage Stage, result *Result) (*Result, error) {
        if stage.Limit >= len(result.Rows) {
            return result, nil
        }

        // Deterministic shuffle using Fisher-Yates with a fixed seed derived from row count.
        rows := append([]Row{}, result.Rows...)
        rng := rand.New(rand.NewPCG(uint64(len(rows)), 0)) //nolint:gosec // deterministic seed is intentional
        rng.Shuffle(len(rows), func(i, j int) {
            rows[i], rows[j] = rows[j], rows[i]
        })

        out := &Result{Fields: result.Fields}
        out.Rows = rows[:stage.Limit]
        return out, nil
🔴 Negative limit in execSample causes slice bounds out of range panic
Same root cause as BUG-0001 but in execSample. When stage.Limit is negative, the guard stage.Limit >= len(result.Rows) is false (negative < any non-negative), so execution continues to rows[:stage.Limit] which panics.
Suggested change:

    - func execSample(stage Stage, result *Result) (*Result, error) {
    -     if stage.Limit >= len(result.Rows) {
    -         return result, nil
    -     }
    -     // Deterministic shuffle using Fisher-Yates with a fixed seed derived from row count.
    -     rows := append([]Row{}, result.Rows...)
    -     rng := rand.New(rand.NewPCG(uint64(len(rows)), 0)) //nolint:gosec // deterministic seed is intentional
    -     rng.Shuffle(len(rows), func(i, j int) {
    -         rows[i], rows[j] = rows[j], rows[i]
    -     })
    -     out := &Result{Fields: result.Fields}
    -     out.Rows = rows[:stage.Limit]
    -     return out, nil
    + func execSample(stage Stage, result *Result) (*Result, error) {
    +     if stage.Limit <= 0 || stage.Limit >= len(result.Rows) {
    +         return result, nil
    +     }
    +     // Deterministic shuffle using Fisher-Yates with a fixed seed derived from row count.
    +     rows := append([]Row{}, result.Rows...)
    +     rng := rand.New(rand.NewPCG(uint64(len(rows)), 0)) //nolint:gosec // deterministic seed is intentional
    +     rng.Shuffle(len(rows), func(i, j int) {
    +         rows[i], rows[j] = rows[j], rows[i]
    +     })
    +     out := &Result{Fields: result.Fields}
    +     out.Rows = rows[:stage.Limit]
    +     return out, nil
    + }
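The deterministic-seed behavior can be tried in isolation. The sketch below reduces rows to strings and uses a hypothetical `sample` function; it mirrors the guard from the suggested change (a non-positive or oversized limit returns the input unchanged) and seeds a PCG source from the row count, as the reviewed code does. It requires Go 1.22 or later for `math/rand/v2`.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// sample is a simplified, hypothetical version of execSample operating on
// strings. The same input length always produces the same selection because
// the PCG source is seeded from len(rows).
func sample(rows []string, limit int) []string {
	if limit <= 0 || limit >= len(rows) {
		return rows
	}
	// Copy first so the caller's slice is not reordered by the shuffle.
	shuffled := append([]string{}, rows...)
	rng := rand.New(rand.NewPCG(uint64(len(shuffled)), 0))
	rng.Shuffle(len(shuffled), func(i, j int) {
		shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
	})
	return shuffled[:limit]
}

func main() {
	rows := []string{"Pet", "Owner", "Category", "Address", "Tag"}
	fmt.Println(sample(rows, 2))
	fmt.Println(sample(rows, 2)) // same selection as the line above
	fmt.Println(sample(rows, -1))
}
```

Seeding from the row count makes output reproducible across runs, at the cost that two different result sets of equal length are shuffled with the same permutation.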
    func buildIndex(ctx context.Context, doc *openapi.OpenAPI) *openapi.Index {
        resolveOpts := references.ResolveOptions{
            RootDocument:   doc,
            TargetDocument: doc,
            TargetLocation: ".",
        }
        return openapi.BuildIndex(ctx, doc, resolveOpts)
    }
🚩 TargetLocation set to '.' instead of actual file path
In cmd/openapi/commands/openapi/query.go:140, TargetLocation is set to "." rather than the actual input file path. Every other call to BuildIndex in the codebase uses a meaningful file path (e.g., "test.yaml", "testdata/petstore.yaml"). The TargetLocation is used by isFromMainDocument() to compare against the current document stack. For single-file specs this works, but for multi-file specs with external $refs, using "." could cause isFromMainDocument() to behave incorrectly (e.g., classifying external schemas as main-document schemas). The graph test at graph/graph_test.go:29 correctly uses the actual path.
Summary

Implements `oq`, a pipeline query language for semantic traversal and analysis of OpenAPI documents. Agents and humans can construct ad-hoc structural queries over the schema reference graph at runtime.

Docs: oq README · Full reference via `openapi spec query-reference`

Architecture

New packages

- `graph/`: `SchemaGraph` type with `Build()` constructor, node/edge types, BFS traversal, shortest path, SCC (Tarjan's), connected components
- `oq/`: Pipeline query language: parser, AST, executor, formatters (table, JSON, markdown, TOON)
- `oq/expr/`: Predicate expression parser and evaluator for `where` clauses

Pipeline stages

- `schemas`, `schemas.components`, `schemas.inline`, `operations`
- `refs-out`, `refs-in`, `reachable`, `ancestors`, `properties`, `union-members`, `items`, `ops`, `schemas`, `path <from> <to>`, `connected`, `blast-radius`, `neighbors <n>`
- `orphans`, `leaves`, `cycles`, `clusters`, `tag-boundary`, `shared-refs`
- `where <expr>`, `select <fields>`, `sort <field> [asc|desc]`, `take/head <n>`, `sample <n>`, `top <n> <field>`, `bottom <n> <field>`, `unique`, `group-by <field>`, `count`
- `explain`, `fields`, `format <table|json|markdown|toon>`

Edge annotations

1-hop traversals (`refs-out`, `refs-in`, `properties`, `union-members`, `items`) populate `edge_kind`, `edge_label`, and `edge_from` fields on result rows, making relationship types visible:

    openapi spec query 'schemas.components | where name == "Pet" | refs-out | select name, edge_kind, edge_label, edge_from' petstore.yaml

Schema fields

`name`, `type`, `depth`, `in_degree`, `out_degree`, `union_width`, `property_count`, `is_component`, `is_inline`, `is_circular`, `has_ref`, `hash`, `path`, `op_count`, `tag_count`

Operation fields

`name`, `method`, `path`, `operation_id`, `schema_count`, `component_count`, `tag`, `parameter_count`, `deprecated`, `description`, `summary`

Example queries

Scaling verification

Tested against real specs from the Speakeasy specs corpus:

Test plan

🤖 Generated with Claude Code