Skip to content

Implement zero-alloc encoder, unsafe cleanup, and comment API tests, C API and Header Consolidation#1

Open
SOG-web wants to merge 3 commits into
masterfrom
zero-copy
Open

Implement zero-alloc encoder, unsafe cleanup, and comment API tests, C API and Header Consolidation#1
SOG-web wants to merge 3 commits into
masterfrom
zero-copy

Conversation

@SOG-web
Copy link
Copy Markdown
Owner

@SOG-web SOG-web commented May 25, 2026

This pull request makes major improvements to the C API and build system, simplifies the project structure, and updates documentation to reflect these changes and recent performance improvements. The most significant update is the introduction of a new, unified C header (c/pg_query.h) that consolidates all public types and function declarations, eliminating the need for a C bridge layer. The build system and documentation are updated to match this new structure, and outdated or redundant files and instructions are removed.

C API and Header Consolidation:

  • Added a new public C header c/pg_query.h that defines all public types, enums, and function signatures, consolidating the C API and eliminating the need for a separate C bridge layer. ([c/pg_query.hR1-R177](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-1ab809fd954f69433c80d405f413afb9e8f6ba87a135063146f3ef468c4bb45dR1-R177))
  • Added c/postgres_deparse.h for deparse options and comments, referenced by the main header. ([c/postgres_deparse.hR1-R34](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-8a6146cf9c18c05a0ed65a1ea8813bb36658d72505ae81ae06867df28479291cR1-R34))
  • Updated compile_flags.txt to include the new c and c/vendor include directories. ([compile_flags.txtR1-R2](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-c8a7d7f6c0b0a18342f11f0524c216886aa0c4a5fb2a0afcd44cb59b2d531091R1-R2))

Build System and Project Structure:

  • Updated the Makefile to use the new C API layout: the build target now copies the static library to c/libpg_query.a, and the dev-setup target creates a symlink for development. The test target runs tests from the project root. ([MakefileL1-R17](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52L1-R17))
  • Removed the old GitHub Actions CI workflow (.github/workflows/ci.yml) that referenced the previous project structure. ([.github/workflows/ci.ymlL1-L31](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fL1-L31))

Documentation Updates:

  • Updated README.md to remove references to the old C bridge and object files, and to use the new test and build instructions. The architecture diagram and usage notes now reflect the new API layers. ([[1]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L43-R44), [[2]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L191-R189), [[3]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L207-L222))
  • Updated docs/design.md with new benchmarks showing improved performance, revised function timings, and clarified that the C bridge layer has been eliminated. ([[1]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L13-R38), [[2]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L70-R72), [[3]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L96-R102))
  • Removed docs/analysis.md, which contained resolved or outdated findings. ([docs/analysis.mdL1-L67](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-e6ec36e1440ad692c77749c4078726e0e406653b44bebde7d1be7867649b4e65L1-L67))

Summary by CodeRabbit

Release Notes

  • New Features

    • Added comment extraction and preservation in SQL deparsing operations
    • Introduced deparse formatting options for controlling output (pretty-printing, indentation, line length, comma placement)
    • Added throughput benchmarking example for performance analysis
  • Improvements

    • Enhanced protobuf encoding performance with optimized buffer handling
    • Updated build and test instructions for streamlined workflow
    • Expanded example code with more complex SQL query demonstrations

Review Change Stack

SOG-web added 3 commits May 16, 2026 23:27
…PI tests

Core encoder rewrite:
- Replace per-field heap-alloc buffers with zero-allocation _into variants
  that write directly into a caller-provided buf using backpatch varint
- Use padded 4-byte varint for submessage length (backpatch_varint4),
  avoiding the need for intermediate buffers
- Backtrack empty Alias nodes (zero-value default) instead of emitting
  spurious empty submessages

Unsafe cleanup (gen_ast.v):
- Replace unsafe buf.trim -> unsafe buf = buf[..tag_start_]
- Replace self-ref if val.f != unsafe nil check -> if x := val.f pattern
  using the new optional pointer type (?&Type)
- Replace isnil / unsafe nil -> != none / none for optional ref fields
- Replace unsafe cstring_to_vstring with string() cast in pgquery.v
- All changes: v test pg_query/ passes

Comment deparse API:
- Add tests for comment extraction (single-line, block, mixed, multiline, empty)
- Add roundtrip test: parse -> extract comments -> deparse with comments preserved
- Make DeparseOpts.comments field public for external access
- Add section 12 to examples/parse_sql.v with comment extraction + roundtrip demo
- Note: C library preserves inline comments between AST nodes but not trailing
  end-of-statement comments (C lib limitation)

Protobuf helpers (pg_query_protobuf.v):
- Add inline on write_varint_into / write_tag_into
- Add write_u32_placeholder and backpatch_varint4
- Optimize write_string_into/write_string to use unsafe byte copy
… layout)

Move all V source files from pg_query/ subdirectory to project root.
Vendor C library headers and static archive into c/ directory.
Replace C bridge layer with native V @VMODROOT/c/ paths.
Update Makefile, CI, compile_flags.txt, gen_ast.v, and docs for new layout.
- Add examples/throughput.v: real-world pooler benchmark with getrusage CPU/RSS
- Remove stale docs/analysis.md (findings already resolved)
- Remove stale .github/workflows/ci.yml (outdated CI config)
- Fix quote style in examples/bench.v for consistency
@SOG-web SOG-web self-assigned this May 25, 2026
Copilot AI review requested due to automatic review settings May 25, 2026 12:48
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 25, 2026

📝 Walkthrough

Walkthrough

This PR eliminates a C bridge indirection layer and reorganizes the pg_query.v project to expose libpg_query C APIs directly through V FFI. Build artifacts consolidate into a c/ directory, generated AST code moves to the project root, and C struct definitions become fully visible to V rather than wrapped by helper functions. The change reduces FFI overhead and simplifies memory management.

Changes

Bridge elimination and C API reorganization

Layer / File(s) Summary
Build system and project structure
.gitignore, Makefile, compile_flags.txt, v.mod, .github/workflows/ci.yml (deleted)
Makefile simplifies to delegate to libpg_query and copy output to c/; test paths move from pg_query/ to root; object-file ignore patterns broaden; compile flags reference c/ and c/vendor/; module name updated from pg_query.v to pg_query; CI workflow removed.
C API header definitions
c/pg_query.h, c/postgres_deparse.h, c/vendor/protobuf-c/protobuf-c.h, c/vendor/xxhash/xxhash.c
New public C headers expose the complete libpg_query and protobuf-c APIs: result structs for parsing/normalizing/splitting/deparsing, function prototypes for all operations, parse modes, memory cleanup helpers, Postgres version macros, and vendored implementations.
V C FFI bindings for direct struct access
pgquery.c.v
C.PostgresDeparseOpts and statement/utility-result structs transition from opaque voidptr to fully visible struct definitions with typed pointers. Parse flags use hardcoded literals, version constants are exported, and bridge helper declarations are removed.
V wrapper layer FFI refactoring
pgquery.v
deparse_opts_to_c, split_stmts_from_c, and deparse_comments_from_c now manipulate exposed C structs directly instead of calling bridge helpers. DeparseOpts fields become public. Version functions return cached strings.
AST types and protobuf encoding updates
pg_query_ast.v, pg_query_decode.v, pg_query_protobuf.v
SelectStmt recursive pointers change to optional (?&SelectStmt). String encoding uses unsafe vbytes slices for zero-copy. New backpatch helpers enable allocation-free protobuf serialization. Decoder uses none for optional initialization.
AST and protobuf code generator
tools/gen_ast.v
Generator emits pg_query_ast.v, pg_query_decode.v, pg_query_encode.v to project root. Recursive field types switch from &T to ?&T. Encoder is rewritten with backpatch-based "_into" variants for zero-allocation length-delimited message encoding.
Comment extraction and preservation tests
pg_query_test.v
New tests validate deparse_comments_for_query extracts single-line, block, and mixed comments, and confirm inline comments are preserved through deparse_protobuf_opts.
Examples and documentation updates
examples/bench.v, examples/parse_sql.v, examples/throughput.v, README.md, docs/design.md
Benchmark queries expand to heavy multi-nested SQL; parse_sql.v extends to comprehensive example covering AST traversal, roundtrips, rewrites, concurrent parsing, and comments; new throughput.v benchmarks parsing QPS and per-path latency with rusage instrumentation; README updates test commands and adds AST regeneration; docs/design.md updates performance figures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A bridge burned, but the path remains clear,
Direct to the C API we cheer!
Structs exposed, no more veiled away,
Backpatch and inline make encoding faster today

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title comprehensively covers the main changes: zero-alloc encoder optimization, unsafe pointer cleanup, comment API tests, and C API header consolidation—all core objectives of this substantial refactoring PR.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch zero-copy

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the pg_query V wrapper by eliminating the C bridge layer and moving C headers and vendor code into a dedicated c/ directory. The AST generator was overhauled to produce V-native protobuf encoding logic using a zero-allocation pattern with length backpatching, and self-referencing nodes now use optional pointers. Performance benchmarks were updated to show improvements from these changes, and a new throughput tool was introduced. Feedback was provided regarding the hardcoded pointer size in memory allocations, suggesting the use of sizeof(voidptr) for better portability across architectures.

Comment thread pgquery.v
comment.newlines_before_comment, comment.newlines_after_comment, comment.str.str)
unsafe {
n := opts.comments.len
raw := C.calloc(usize(n), usize(8))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pointer size is hardcoded to 8 bytes, which assumes a 64-bit architecture. For better portability across different architectures (e.g., 32-bit systems), use sizeof(voidptr) instead.

			raw := C.calloc(usize(n), usize(sizeof(voidptr)))

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (1)
examples/bench.v (1)

203-210: ⚡ Quick win

Deparse benchmark includes parse time but label says encode+deparse.

This loop benchmarks parse_protobuf_ast + deparse_ast, not just encode+deparse. Either relabel this row or pre-parse outside the timed block to isolate deparse latency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/bench.v` around lines 203 - 210, The benchmark currently measures
parse_protobuf_ast + deparse_ast inside the timed loop (see the loop calling
pg_query.parse_protobuf_ast(q) and pg_query.deparse_ast(...)), but the printed
label says "encode+deparse"; to fix, either change the printed label to reflect
"parse+deparse" or move the parse_protobuf_ast(q) call outside the timing block
(pre-parse into a slice/array of ASTs before starting start/elapsed) so the
timed section only calls deparse_ast (the variable s and the deparse_ast
invocation identify the code to change).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design.md`:
- Line 72: The bullet for PostgresDeparseOpts is inconsistent: replace the
“opaque voidptr” phrasing with wording that reflects the current direct C-struct
exposure model and state that the bridge layer has been removed; update the line
referencing `PostgresDeparseOpts` to say it is exposed as a concrete C struct in
the ABI (not an opaque void*) and note that the bridge layer has been eliminated
accordingly.
- Around line 23-31: Add a language tag to the fenced benchmark block in the
docs/design.md snippet (the triple-backtick block that lists normalize(),
parse(), fingerprint(), parse_protobuf_ast(), parse_json_ast(),
encode_parse_result(), deparse_ast() timings) so the fence is like ```text (or
another appropriate language) to satisfy MD040; update the opening backticks
only, leaving the contents unchanged.

In `@examples/throughput.v`:
- Around line 7-10: The code uses a hardcoded ru_maxrss_offset constant and
reads struct rusage by byte offsets which is non-portable and misreports memory
because Linux reports ru_maxrss in KiB while macOS reports bytes; replace the
byte-offset approach with direct named field access to struct rusage.ru_maxrss
(wherever ru_maxrss_offset is used and in the functions that read/process rusage
values) and normalize units per platform—e.g., detect OS (Linux vs macOS) or use
conditional compilation/build tags to convert Linux ru_maxrss * 1024 to bytes
(or consistently format as KiB) so the reported value is correct across
platforms; remove ru_maxrss_offset and any offset-based reads and update the
formatting code that prints the value accordingly.

In `@pg_query_protobuf.v`:
- Around line 413-419: backpatch_varint4 currently encodes length without
validating preconditions; add guards at the start of backpatch_varint4 to assert
length < (1 << 28) and that pos is non-negative and buf has at least pos+4 bytes
(buf.len >= pos + 4), and fail fast (panic/assert/return error consistent with
project conventions) with a clear message if any invariant is violated so
oversized payloads or out‑of‑bounds writes cannot silently produce invalid
protobuf.

In `@pgquery.v`:
- Line 584: The allocation uses a hardcoded 8-byte element size (raw :=
C.calloc(usize(n), usize(8))) which assumes 64-bit pointers and breaks 32-bit
portability; change the calloc element size to use the pointer size instead
(e.g., C.calloc(usize(n), usize(sizeof(voidptr))) or C.calloc(usize(n),
usize(C.sizeof_voidptr))) so the allocation matches the pointer arithmetic used
later, and verify any pointer-indexing/cleanup that uses sizeof(voidptr) still
matches this allocation.

In `@README.md`:
- Line 207: Update the README reference to the static archive path: replace
occurrences of "libpg_query.a" with the new build output path "c/libpg_query.a"
(the string literal "libpg_query.a" in the sentence describing the bundled
library) so the documentation matches the updated C layout and build artifacts.

---

Nitpick comments:
In `@examples/bench.v`:
- Around line 203-210: The benchmark currently measures parse_protobuf_ast +
deparse_ast inside the timed loop (see the loop calling
pg_query.parse_protobuf_ast(q) and pg_query.deparse_ast(...)), but the printed
label says "encode+deparse"; to fix, either change the printed label to reflect
"parse+deparse" or move the parse_protobuf_ast(q) call outside the timing block
(pre-parse into a slice/array of ASTs before starting start/elapsed) so the
timed section only calls deparse_ast (the variable s and the deparse_ast
invocation identify the code to change).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 23cb0452-eaac-465e-aa11-e5492bc97b8b

📥 Commits

Reviewing files that changed from the base of the PR and between 23319af and ff5a0fe.

📒 Files selected for processing (31)
  • .github/workflows/ci.yml
  • .gitignore
  • Makefile
  • README.md
  • c/libpg_query.a
  • c/pg_query.h
  • c/postgres_deparse.h
  • c/vendor/protobuf-c/protobuf-c.c
  • c/vendor/protobuf-c/protobuf-c.h
  • c/vendor/xxhash/xxhash.c
  • c/vendor/xxhash/xxhash.h
  • compile_flags.txt
  • docs/analysis.md
  • docs/design.md
  • examples/bench.v
  • examples/parse_sql.v
  • examples/throughput.v
  • pg_query/c_bridge.c
  • pg_query/c_bridge.h
  • pg_query/compile_flags.txt
  • pg_query/pg_query_encode.v
  • pg_query_ast.v
  • pg_query_decode.v
  • pg_query_encode.v
  • pg_query_protobuf.v
  • pg_query_test.v
  • pgquery.c.v
  • pgquery.v
  • tools/gen_ast.v
  • v-and-c-integration.md
  • v.mod
💤 Files with no reviewable changes (6)
  • .github/workflows/ci.yml
  • docs/analysis.md
  • pg_query/compile_flags.txt
  • v-and-c-integration.md
  • pg_query/c_bridge.h
  • pg_query/c_bridge.c

Comment thread docs/design.md
Comment on lines 23 to 31
```
normalize() → anonymized SQL 7.5 us/op
parse() → JSON string 13.1 us/op
fingerprint() → hash 16.4 us/op
parse_protobuf_ast() → typed AST 62.3 us/op
parse_json_ast() → typed AST 166.4 us/op
normalize() → anonymized SQL 2.6 us/op
parse() → JSON string 3.2 us/op
fingerprint() → hash 5.6 us/op
parse_protobuf_ast() → typed AST 18.6 us/op
parse_json_ast() → typed AST 53.7 us/op
encode_parse_result() → protobuf 111.0 us/op
deparse_ast() → SQL (encode+deparse) 159.7 us/op
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced benchmark block.

This block is missing a fence language and triggers MD040.

💡 Proposed fix
-```
+```text
 normalize() → anonymized SQL               2.6 us/op
 ...
 deparse_ast() → SQL (encode+deparse)     159.7 us/op
</details>

<!-- suggestion_start -->

<details>
<summary>📝 Committable suggestion</summary>

> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```suggestion

🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design.md` around lines 23 - 31, Add a language tag to the fenced
benchmark block in the docs/design.md snippet (the triple-backtick block that
lists normalize(), parse(), fingerprint(), parse_protobuf_ast(),
parse_json_ast(), encode_parse_result(), deparse_ast() timings) so the fence is
like ```text (or another appropriate language) to satisfy MD040; update the
opening backticks only, leaving the contents unchanged.

Comment thread docs/design.md
- [x] Packed encoding for repeated scalar/enum fields
- [x] `parse_ast()` deprecated in favor of `parse_protobuf_ast()` (~3× faster)
- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr via C bridge setters)
- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr, bridge layer eliminated)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

PostgresDeparseOpts wording is inconsistent with bridge-elimination state.

This still describes an “opaque voidptr” ABI while also saying bridge layer is eliminated. Please align this bullet with the current direct C-struct exposure model.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design.md` at line 72, The bullet for PostgresDeparseOpts is
inconsistent: replace the “opaque voidptr” phrasing with wording that reflects
the current direct C-struct exposure model and state that the bridge layer has
been removed; update the line referencing `PostgresDeparseOpts` to say it is
exposed as a concrete C struct in the ABI (not an opaque void*) and note that
the bridge layer has been eliminated accordingly.

Comment thread examples/throughput.v
Comment on lines +7 to +10
// ru_maxrss offset within struct rusage on macOS arm64.
// timeval = 16 bytes × 2 (ru_utime, ru_stime) = 32 bytes.
const ru_maxrss_offset = 32

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

On Linux and macOS, what are the units of ru_maxrssfromgetrusage, and is reading struct rusage via hardcoded byte offsets considered portable?

💡 Result:

  1. Units of ru_maxrss from getrusage on Linux and macOS - Linux: ru_maxrss is “the maximum resident set size used (in kilobytes)” (since Linux 2.6.32). [1][2] - macOS: ru_maxrss is “the maximum resident set size utilized (in bytes).” [3] So, Linux reports ru_maxrss in KiB, while macOS reports it in bytes. 2) Is reading struct rusage via hardcoded byte offsets portable? No—hardcoding byte offsets into struct rusage is not portable across platforms, architectures, ABIs, compiler/library versions, or even kernel/CRT variations. Reasons (grounded in what the sources show): - The struct rusage layout and field representations are not something you can assume is stable: even on Linux/glibc, ru_maxrss and similar fields may be represented using unions/ABI-specific types to remain kernel-compatible. [4] - The documented unit/meaning of ru_maxrss differs between Linux and macOS (KiB vs bytes), demonstrating that the same field name does not guarantee uniform interpretation across systems. [1][3] Portable approach: - Always use the named field access provided by your platform headers (e.g., rusage.ru_maxrss in C), and avoid byte-offset “struct scraping.” - If you must serialize/transfer across systems, define your own normalized format (including explicit unit conversion based on platform).

Citations:


Fix ru_maxrss portability/accuracy in examples/throughput.v

  • ru_maxrss units differ: Linux reports it in KiB, macOS in bytes—formatting as bytes will misreport values on at least one platform (lines 7-10; also 87-95, 155-156).
  • Reading struct rusage via hardcoded byte offsets is ABI-fragile and not portable—use named struct field access and normalize units per platform.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/throughput.v` around lines 7 - 10, The code uses a hardcoded
ru_maxrss_offset constant and reads struct rusage by byte offsets which is
non-portable and misreports memory because Linux reports ru_maxrss in KiB while
macOS reports bytes; replace the byte-offset approach with direct named field
access to struct rusage.ru_maxrss (wherever ru_maxrss_offset is used and in the
functions that read/process rusage values) and normalize units per
platform—e.g., detect OS (Linux vs macOS) or use conditional compilation/build
tags to convert Linux ru_maxrss * 1024 to bytes (or consistently format as KiB)
so the reported value is correct across platforms; remove ru_maxrss_offset and
any offset-based reads and update the formatting code that prints the value
accordingly.

Comment thread pg_query_protobuf.v
Comment on lines +413 to +419
fn backpatch_varint4(mut buf []u8, pos int, length int) {
v := u32(length)
buf[pos] = u8(v & 0x7F) | 0x80
buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80
buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80
buf[pos + 3] = u8((v >> 21) & 0x7F)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard backpatch_varint4 invariants before writing.

The function documents length < 2^28 but doesn’t enforce it. Oversized payloads will be silently encoded with a truncated 4-byte varint and produce invalid protobuf.

💡 Proposed fix
 fn backpatch_varint4(mut buf []u8, pos int, length int) {
+	if length < 0 || length >= (1 << 28) {
+		panic('backpatch_varint4: length out of range (${length}), expected 0 <= length < 2^28')
+	}
+	if pos < 0 || pos + 3 >= buf.len {
+		panic('backpatch_varint4: invalid patch position ${pos} for buffer len ${buf.len}')
+	}
 	v := u32(length)
 	buf[pos]     = u8(v & 0x7F) | 0x80
 	buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80
 	buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80
 	buf[pos + 3] = u8((v >> 21) & 0x7F)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fn backpatch_varint4(mut buf []u8, pos int, length int) {
v := u32(length)
buf[pos] = u8(v & 0x7F) | 0x80
buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80
buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80
buf[pos + 3] = u8((v >> 21) & 0x7F)
}
fn backpatch_varint4(mut buf []u8, pos int, length int) {
if length < 0 || length >= (1 << 28) {
panic('backpatch_varint4: length out of range (${length}), expected 0 <= length < 2^28')
}
if pos < 0 || pos + 3 >= buf.len {
panic('backpatch_varint4: invalid patch position ${pos} for buffer len ${buf.len}')
}
v := u32(length)
buf[pos] = u8(v & 0x7F) | 0x80
buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80
buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80
buf[pos + 3] = u8((v >> 21) & 0x7F)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pg_query_protobuf.v` around lines 413 - 419, backpatch_varint4 currently
encodes length without validating preconditions; add guards at the start of
backpatch_varint4 to assert length < (1 << 28) and that pos is non-negative and
buf has at least pos+4 bytes (buf.len >= pos + 4), and fail fast
(panic/assert/return error consistent with project conventions) with a clear
message if any invariant is violated so oversized payloads or out‑of‑bounds
writes cannot silently produce invalid protobuf.

Comment thread pgquery.v
comment.newlines_before_comment, comment.newlines_after_comment, comment.str.str)
unsafe {
n := opts.comments.len
raw := C.calloc(usize(n), usize(8))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded pointer size breaks 32-bit portability.

usize(8) assumes 64-bit pointers. On 32-bit systems, this creates a mismatch: allocation uses 8-byte spacing while cleanup iterates with sizeof(voidptr) (4 bytes), causing incorrect pointer arithmetic.

Proposed fix
-			raw := C.calloc(usize(n), usize(8))
+			raw := C.calloc(usize(n), sizeof(voidptr))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pgquery.v` at line 584, The allocation uses a hardcoded 8-byte element size
(raw := C.calloc(usize(n), usize(8))) which assumes 64-bit pointers and breaks
32-bit portability; change the calloc element size to use the pointer size
instead (e.g., C.calloc(usize(n), usize(sizeof(voidptr))) or C.calloc(usize(n),
usize(C.sizeof_voidptr))) so the allocation matches the pointer arithmetic used
later, and verify any pointer-indexing/cleanup that uses sizeof(voidptr) still
matches this allocation.

Comment thread README.md
## How it works

The library bundles [libpg_query](https://github.com/pganalyze/libpg_query) (version 6.2.2, wrapping PostgreSQL 17.7), pre-built as a static archive (`libpg_query.a`). The V wrapper in `pg_query/` has these layers:
The library bundles [libpg_query](https://github.com/pganalyze/libpg_query) (version 6.2.2, wrapping PostgreSQL 17.7), pre-built as a static archive (`libpg_query.a`). The V wrapper has these layers:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Static library path looks stale after the C layout move.

This line still references libpg_query.a; with the new structure it should point to c/libpg_query.a to match the build output.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 207, Update the README reference to the static archive
path: replace occurrences of "libpg_query.a" with the new build output path
"c/libpg_query.a" (the string literal "libpg_query.a" in the sentence describing
the bundled library) so the documentation matches the updated C layout and build
artifacts.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR restructures the project’s C integration by introducing a unified public C header under c/, removing the previous C bridge layer, and updating the V-side bindings/build/docs accordingly. It also updates the V-native protobuf encoder generator to support a “write-into-buffer” (zero-alloc) encoding style using length backpatching, and adds comment-related tests and example coverage.

Changes:

  • Consolidate the public C API into c/pg_query.h (and c/postgres_deparse.h), removing the legacy pg_query/c_bridge.* bridge layer.
  • Update the V protobuf encoder generation to support _into (caller-buffer) variants and a backpatching technique for length-delimited fields.
  • Refresh build/test instructions, examples, and documentation to match the new layout; add tests for comment extraction and roundtrip.

Reviewed changes

Copilot reviewed 23 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
v.mod Renames the module to pg_query.
v-and-c-integration.md Removes now-redundant V/C interop documentation.
tools/gen_ast.v Updates generator outputs and introduces _into encoder generation + optional self-ref changes.
README.md Updates build/test instructions and removes references to the old bridge object.
pgquery.v Removes bridge calls, updates deparse opts handling/freeing, and adjusts split/comments handling.
pgquery.c.v Switches include/lib paths to c/, updates C type declarations, removes bridge declarations, hardcodes option/version constants.
pg_query/compile_flags.txt Removes old compile flags file from the previous layout.
pg_query/c_bridge.h Deletes the legacy bridge header.
pg_query/c_bridge.c Deletes the legacy bridge implementation.
pg_query_test.v Adds tests for comment extraction and comment-preserving deparse roundtrip.
pg_query_protobuf.v Optimizes string writes and adds placeholder/backpatch helpers for encoder.
pg_query_decode.v Updates generated decode defaults for optional self-references (none vs nil).
pg_query_ast.v Updates generated AST self-references to optional pointers (?&T) and adjusts str() output.
Makefile Removes bridge build; copies libpg_query.a into c/ and updates test target + dev symlink.
examples/throughput.v Adds a throughput/microbenchmark example using getrusage.
examples/parse_sql.v Expands example coverage (heavy queries, truncation, comment extraction/roundtrip).
examples/bench.v Updates benchmark queries and output labeling for the new performance focus.
docs/design.md Updates benchmark numbers/date and claims around bridge removal/ABI decoupling.
docs/analysis.md Removes outdated analysis notes.
compile_flags.txt Adds root-level compile flags pointing at c/ and c/vendor.
c/vendor/xxhash/xxhash.c Adds vendored xxHash implementation file.
c/vendor/protobuf-c/protobuf-c.h Adds vendored protobuf-c public header.
c/postgres_deparse.h Adds deparse options/comments public header.
c/pg_query.h Adds the consolidated public C API header.
.gitignore Ignores the dev symlink and broadens object file ignores.
.github/workflows/ci.yml Removes the existing CI workflow.
Comments suppressed due to low confidence (4)

pg_query_ast.v:4210

  • larg/rarg are ?&SelectStmt, but str() interpolates m.larg/m.rarg directly. This likely won’t compile (or will print the option wrapper) because optionals should be unwrapped before use. Suggest if larg := m.larg { parts << "larg: ${larg}" } (same for rarg).
    pgquery.v:586
  • C.calloc(usize(n), usize(8)) hardcodes an 8-byte pointer size for the PostgresDeparseComment** array. This is non-portable and can under/over-allocate on non-64-bit targets. Allocate with sizeof(voidptr) / sizeof(&C.PostgresDeparseComment) instead so the array element size matches the platform pointer size.
    pg_query_protobuf.v:418
  • backpatch_varint4 documents a hard constraint (length < 2^28) but doesn’t enforce it. If a larger submessage is ever encoded, the length varint will be incorrect (and the output protobuf becomes malformed). Add a guard (assert/error) or switch to a placeholder/patching scheme that supports larger lengths (e.g., 5-byte placeholder or fallback to normal varint encoding with buffer adjustment).
    pgquery.c.v:139
  • These parse-mode / parser-option / version constants are duplicated as hardcoded numbers/strings even though pg_query.h defines the authoritative macros/enums. This creates a maintenance hazard on PG/libpg_query bumps. Prefer deriving these from the C header (C.PG_QUERY_*, C.PG_VERSION*) or otherwise centralize them so there’s a single source of truth.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tools/gen_ast.v
out += '\tif m.${vfname}.len > 0 { parts << "${vfname}: \${m.${vfname}}" }\n'
} else if is_recursive {
out += '\tif !isnil(m.${vfname}) { parts << "${vfname}: \${m.${vfname}}" }\n'
out += '\tif m.${vfname} != none { parts << "${vfname}: \${m.${vfname}}" }\n'
Comment thread c/pg_query.h
Comment on lines +4 to +9
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

#include "postgres_deparse.h"

Comment thread c/postgres_deparse.h
#ifndef POSTGRES_DEPARSE_H
#define POSTGRES_DEPARSE_H

#include <stdbool.h>
Comment thread docs/design.md
Comment on lines +72 to 74
- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr, bridge layer eliminated)
- [x] `valid_enum_int_strict()` for rejecting invalid enum values
- [x] Self-validating version tests (no hardcoded PG version strings)
Comment thread examples/parse_sql.v
Comment on lines +65 to +66
if preview.len > 50 { preview = preview[..50] }
return '${preview}... -> ${res.stmts.len} statement(s)'
Comment thread examples/throughput.v
Comment on lines +5 to +11
#include <sys/resource.h>

// ru_maxrss offset within struct rusage on macOS arm64.
// timeval = 16 bytes × 2 (ru_utime, ru_stime) = 32 bytes.
const ru_maxrss_offset = 32

fn C.getrusage(who int, usage voidptr) int
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants