Implement zero-alloc encoder, unsafe cleanup, and comment API tests, C API and Header Consolidation by SOG-web · Pull Request #1 · SOG-web/pg_query.v

SOG-web · 2026-05-25T12:48:30Z

This pull request makes major improvements to the C API and build system, simplifies the project structure, and updates documentation to reflect these changes and recent performance improvements. The most significant update is the introduction of a new, unified C header (c/pg_query.h) that consolidates all public types and function declarations, eliminating the need for a C bridge layer. The build system and documentation are updated to match this new structure, and outdated or redundant files and instructions are removed.

C API and Header Consolidation:

Added a new public C header c/pg_query.h that defines all public types, enums, and function signatures, consolidating the C API and eliminating the need for a separate C bridge layer. ([c/pg_query.hR1-R177](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-1ab809fd954f69433c80d405f413afb9e8f6ba87a135063146f3ef468c4bb45dR1-R177))
Added c/postgres_deparse.h for deparse options and comments, referenced by the main header. ([c/postgres_deparse.hR1-R34](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-8a6146cf9c18c05a0ed65a1ea8813bb36658d72505ae81ae06867df28479291cR1-R34))
Updated compile_flags.txt to include the new c and c/vendor include directories. ([compile_flags.txtR1-R2](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-c8a7d7f6c0b0a18342f11f0524c216886aa0c4a5fb2a0afcd44cb59b2d531091R1-R2))

Build System and Project Structure:

Updated the Makefile to use the new C API layout: the build target now copies the static library to c/libpg_query.a, and the dev-setup target creates a symlink for development. The test target runs tests from the project root. ([MakefileL1-R17](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-76ed074a9305c04054cdebb9e9aad2d818052b07091de1f20cad0bbac34ffb52L1-R17))
Removed the old GitHub Actions CI workflow (.github/workflows/ci.yml) that referenced the previous project structure. ([.github/workflows/ci.ymlL1-L31](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b803fcb7f17ed9235f1e5cb1fcd2f5d3b2838429d4368ae4c57ce4436577f03fL1-L31))

Documentation Updates:

Updated README.md to remove references to the old C bridge and object files, and to use the new test and build instructions. The architecture diagram and usage notes now reflect the new API layers. ([[1]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L43-R44), [[2]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L191-R189), [[3]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L207-L222))
Updated docs/design.md with new benchmarks showing improved performance, revised function timings, and clarified that the C bridge layer has been eliminated. ([[1]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L13-R38), [[2]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L70-R72), [[3]](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-b0929dacd4f1fc192cd308c1a9db2e85154ea742ccec315518e659d7f6a3efc7L96-R102))
Removed docs/analysis.md, which contained resolved or outdated findings. ([docs/analysis.mdL1-L67](https://github.com/SOG-web/pg_query.v/pull/1/files#diff-e6ec36e1440ad692c77749c4078726e0e406653b44bebde7d1be7867649b4e65L1-L67))

Summary by CodeRabbit

Release Notes

New Features
- Added comment extraction and preservation in SQL deparsing operations
- Introduced deparse formatting options for controlling output (pretty-printing, indentation, line length, comma placement)
- Added throughput benchmarking example for performance analysis
Improvements
- Enhanced protobuf encoding performance with optimized buffer handling
- Updated build and test instructions for streamlined workflow
- Expanded example code with more complex SQL query demonstrations

…PI tests Core encoder rewrite: - Replace per-field heap-alloc buffers with zero-allocation _into variants that write directly into a caller-provided buf using backpatch varint - Use padded 4-byte varint for submessage length (backpatch_varint4), avoiding the need for intermediate buffers - Backtrack empty Alias nodes (zero-value default) instead of emitting spurious empty submessages Unsafe cleanup (gen_ast.v): - Replace unsafe buf.trim -> unsafe buf = buf[..tag_start_] - Replace self-ref if val.f != unsafe nil check -> if x := val.f pattern using the new optional pointer type (?&Type) - Replace isnil / unsafe nil -> != none / none for optional ref fields - Replace unsafe cstring_to_vstring with string() cast in pgquery.v - All changes: v test pg_query/ passes Comment deparse API: - Add tests for comment extraction (single-line, block, mixed, multiline, empty) - Add roundtrip test: parse -> extract comments -> deparse with comments preserved - Make DeparseOpts.comments field public for external access - Add section 12 to examples/parse_sql.v with comment extraction + roundtrip demo - Note: C library preserves inline comments between AST nodes but not trailing end-of-statement comments (C lib limitation) Protobuf helpers (pg_query_protobuf.v): - Add inline on write_varint_into / write_tag_into - Add write_u32_placeholder and backpatch_varint4 - Optimize write_string_into/write_string to use unsafe byte copy

… layout) Move all V source files from pg_query/ subdirectory to project root. Vendor C library headers and static archive into c/ directory. Replace C bridge layer with native V @VMODROOT/c/ paths. Update Makefile, CI, compile_flags.txt, gen_ast.v, and docs for new layout.

- Add examples/throughput.v: real-world pooler benchmark with getrusage CPU/RSS - Remove stale docs/analysis.md (findings already resolved) - Remove stale .github/workflows/ci.yml (outdated CI config) - Fix quote style in examples/bench.v for consistency

coderabbitai · 2026-05-25T12:48:42Z

📝 Walkthrough

Walkthrough

This PR eliminates a C bridge indirection layer and reorganizes the pg_query.v project to expose libpg_query C APIs directly through V FFI. Build artifacts consolidate into a c/ directory, generated AST code moves to the project root, and C struct definitions become fully visible to V rather than wrapped by helper functions. The change reduces FFI overhead and simplifies memory management.

Changes

Bridge elimination and C API reorganization

Layer / File(s)	Summary
Build system and project structure `.gitignore`, `Makefile`, `compile_flags.txt`, `v.mod`, `.github/workflows/ci.yml` (deleted)	Makefile simplifies to delegate to libpg_query and copy output to c/; test paths move from pg_query/ to root; object-file ignore patterns broaden; compile flags reference c/ and c/vendor/; module name updated from pg_query.v to pg_query; CI workflow removed.
C API header definitions `c/pg_query.h`, `c/postgres_deparse.h`, `c/vendor/protobuf-c/protobuf-c.h`, `c/vendor/xxhash/xxhash.c`	New public C headers expose the complete libpg_query and protobuf-c APIs: result structs for parsing/normalizing/splitting/deparsing, function prototypes for all operations, parse modes, memory cleanup helpers, Postgres version macros, and vendored implementations.
V C FFI bindings for direct struct access `pgquery.c.v`	C.PostgresDeparseOpts and statement/utility-result structs transition from opaque voidptr to fully visible struct definitions with typed pointers. Parse flags use hardcoded literals, version constants are exported, and bridge helper declarations are removed.
V wrapper layer FFI refactoring `pgquery.v`	deparse_opts_to_c, split_stmts_from_c, and deparse_comments_from_c now manipulate exposed C structs directly instead of calling bridge helpers. DeparseOpts fields become public. Version functions return cached strings.
AST types and protobuf encoding updates `pg_query_ast.v`, `pg_query_decode.v`, `pg_query_protobuf.v`	SelectStmt recursive pointers change to optional (?&SelectStmt). String encoding uses unsafe vbytes slices for zero-copy. New backpatch helpers enable allocation-free protobuf serialization. Decoder uses none for optional initialization.
AST and protobuf code generator `tools/gen_ast.v`	Generator emits pg_query_ast.v, pg_query_decode.v, pg_query_encode.v to project root. Recursive field types switch from &T to ?&T. Encoder is rewritten with backpatch-based "_into" variants for zero-allocation length-delimited message encoding.
Comment extraction and preservation tests `pg_query_test.v`	New tests validate deparse_comments_for_query extracts single-line, block, and mixed comments, and confirm inline comments are preserved through deparse_protobuf_opts.
Examples and documentation updates `examples/bench.v`, `examples/parse_sql.v`, `examples/throughput.v`, `README.md`, `docs/design.md`	Benchmark queries expand to heavy multi-nested SQL; parse_sql.v extends to comprehensive example covering AST traversal, roundtrips, rewrites, concurrent parsing, and comments; new throughput.v benchmarks parsing QPS and per-path latency with rusage instrumentation; README updates test commands and adds AST regeneration; docs/design.md updates performance figures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A bridge burned, but the path remains clear,
Direct to the C API we cheer!
Structs exposed, no more veiled away,
Backpatch and inline make encoding faster today ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title comprehensively covers the main changes: zero-alloc encoder optimization, unsafe pointer cleanup, comment API tests, and C API header consolidation—all core objectives of this substantial refactoring PR.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch zero-copy

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request refactors the pg_query V wrapper by eliminating the C bridge layer and moving C headers and vendor code into a dedicated c/ directory. The AST generator was overhauled to produce V-native protobuf encoding logic using a zero-allocation pattern with length backpatching, and self-referencing nodes now use optional pointers. Performance benchmarks were updated to show improvements from these changes, and a new throughput tool was introduced. Feedback was provided regarding the hardcoded pointer size in memory allocations, suggesting the use of sizeof(voidptr) for better portability across architectures.

gemini-code-assist · 2026-05-25T12:50:04Z

-				comment.newlines_before_comment, comment.newlines_after_comment, comment.str.str)
+		unsafe {
+			n := opts.comments.len
+			raw := C.calloc(usize(n), usize(8))


The pointer size is hardcoded to 8 bytes, which assumes a 64-bit architecture. For better portability across different architectures (e.g., 32-bit systems), use sizeof(voidptr) instead.

raw := C.calloc(usize(n), usize(sizeof(voidptr)))

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (1)

examples/bench.v (1)

203-210: ⚡ Quick win

Deparse benchmark includes parse time but label says encode+deparse.

This loop benchmarks parse_protobuf_ast + deparse_ast, not just encode+deparse. Either relabel this row or pre-parse outside the timed block to isolate deparse latency.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/bench.v` around lines 203 - 210, The benchmark currently measures
parse_protobuf_ast + deparse_ast inside the timed loop (see the loop calling
pg_query.parse_protobuf_ast(q) and pg_query.deparse_ast(...)), but the printed
label says "encode+deparse"; to fix, either change the printed label to reflect
"parse+deparse" or move the parse_protobuf_ast(q) call outside the timing block
(pre-parse into a slice/array of ASTs before starting start/elapsed) so the
timed section only calls deparse_ast (the variable s and the deparse_ast
invocation identify the code to change).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design.md`:
- Line 72: The bullet for PostgresDeparseOpts is inconsistent: replace the
“opaque voidptr” phrasing with wording that reflects the current direct C-struct
exposure model and state that the bridge layer has been removed; update the line
referencing `PostgresDeparseOpts` to say it is exposed as a concrete C struct in
the ABI (not an opaque void*) and note that the bridge layer has been eliminated
accordingly.
- Around line 23-31: Add a language tag to the fenced benchmark block in the
docs/design.md snippet (the triple-backtick block that lists normalize(),
parse(), fingerprint(), parse_protobuf_ast(), parse_json_ast(),
encode_parse_result(), deparse_ast() timings) so the fence is like ```text (or
another appropriate language) to satisfy MD040; update the opening backticks
only, leaving the contents unchanged.

In `@examples/throughput.v`:
- Around line 7-10: The code uses a hardcoded ru_maxrss_offset constant and
reads struct rusage by byte offsets which is non-portable and misreports memory
because Linux reports ru_maxrss in KiB while macOS reports bytes; replace the
byte-offset approach with direct named field access to struct rusage.ru_maxrss
(wherever ru_maxrss_offset is used and in the functions that read/process rusage
values) and normalize units per platform—e.g., detect OS (Linux vs macOS) or use
conditional compilation/build tags to convert Linux ru_maxrss * 1024 to bytes
(or consistently format as KiB) so the reported value is correct across
platforms; remove ru_maxrss_offset and any offset-based reads and update the
formatting code that prints the value accordingly.

In `@pg_query_protobuf.v`:
- Around line 413-419: backpatch_varint4 currently encodes length without
validating preconditions; add guards at the start of backpatch_varint4 to assert
length < (1 << 28) and that pos is non-negative and buf has at least pos+4 bytes
(buf.len >= pos + 4), and fail fast (panic/assert/return error consistent with
project conventions) with a clear message if any invariant is violated so
oversized payloads or out‑of‑bounds writes cannot silently produce invalid
protobuf.

In `@pgquery.v`:
- Line 584: The allocation uses a hardcoded 8-byte element size (raw :=
C.calloc(usize(n), usize(8))) which assumes 64-bit pointers and breaks 32-bit
portability; change the calloc element size to use the pointer size instead
(e.g., C.calloc(usize(n), usize(sizeof(voidptr))) or C.calloc(usize(n),
usize(C.sizeof_voidptr))) so the allocation matches the pointer arithmetic used
later, and verify any pointer-indexing/cleanup that uses sizeof(voidptr) still
matches this allocation.

In `@README.md`:
- Line 207: Update the README reference to the static archive path: replace
occurrences of "libpg_query.a" with the new build output path "c/libpg_query.a"
(the string literal "libpg_query.a" in the sentence describing the bundled
library) so the documentation matches the updated C layout and build artifacts.

---

Nitpick comments:
In `@examples/bench.v`:
- Around line 203-210: The benchmark currently measures parse_protobuf_ast +
deparse_ast inside the timed loop (see the loop calling
pg_query.parse_protobuf_ast(q) and pg_query.deparse_ast(...)), but the printed
label says "encode+deparse"; to fix, either change the printed label to reflect
"parse+deparse" or move the parse_protobuf_ast(q) call outside the timing block
(pre-parse into a slice/array of ASTs before starting start/elapsed) so the
timed section only calls deparse_ast (the variable s and the deparse_ast
invocation identify the code to change).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 23cb0452-eaac-465e-aa11-e5492bc97b8b

📥 Commits

Reviewing files that changed from the base of the PR and between 23319af and ff5a0fe.

📒 Files selected for processing (31)

.github/workflows/ci.yml
.gitignore
Makefile
README.md
c/libpg_query.a
c/pg_query.h
c/postgres_deparse.h
c/vendor/protobuf-c/protobuf-c.c
c/vendor/protobuf-c/protobuf-c.h
c/vendor/xxhash/xxhash.c
c/vendor/xxhash/xxhash.h
compile_flags.txt
docs/analysis.md
docs/design.md
examples/bench.v
examples/parse_sql.v
examples/throughput.v
pg_query/c_bridge.c
pg_query/c_bridge.h
pg_query/compile_flags.txt
pg_query/pg_query_encode.v
pg_query_ast.v
pg_query_decode.v
pg_query_encode.v
pg_query_protobuf.v
pg_query_test.v
pgquery.c.v
pgquery.v
tools/gen_ast.v
v-and-c-integration.md
v.mod

💤 Files with no reviewable changes (6)

.github/workflows/ci.yml
docs/analysis.md
pg_query/compile_flags.txt
v-and-c-integration.md
pg_query/c_bridge.h
pg_query/c_bridge.c

coderabbitai · 2026-05-25T12:55:45Z

 ```
-normalize() → anonymized SQL               7.5 us/op
-parse() → JSON string                     13.1 us/op
-fingerprint() → hash                      16.4 us/op
-parse_protobuf_ast() → typed AST          62.3 us/op
-parse_json_ast() → typed AST             166.4 us/op
+normalize() → anonymized SQL               2.6 us/op
+parse() → JSON string                      3.2 us/op
+fingerprint() → hash                       5.6 us/op
+parse_protobuf_ast() → typed AST          18.6 us/op
+parse_json_ast() → typed AST              53.7 us/op
 encode_parse_result() → protobuf         111.0 us/op
 deparse_ast() → SQL (encode+deparse)     159.7 us/op
 ```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a language tag to the fenced benchmark block.

This block is missing a fence language and triggers MD040.

💡 Proposed fix

-``` +```text normalize() → anonymized SQL 2.6 us/op ... deparse_ast() → SQL (encode+deparse) 159.7 us/op

</details>  <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 23-23: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design.md` around lines 23 - 31, Add a language tag to the fenced benchmark block in the docs/design.md snippet (the triple-backtick block that lists normalize(), parse(), fingerprint(), parse_protobuf_ast(), parse_json_ast(), encode_parse_result(), deparse_ast() timings) so the fence is like ```text (or another appropriate language) to satisfy MD040; update the opening backticks only, leaving the contents unchanged.

coderabbitai · 2026-05-25T12:55:45Z

 - [x] Packed encoding for repeated scalar/enum fields
 - [x] `parse_ast()` deprecated in favor of `parse_protobuf_ast()` (~3× faster)
- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr via C bridge setters)
+- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr, bridge layer eliminated)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

PostgresDeparseOpts wording is inconsistent with bridge-elimination state.

This still describes an “opaque voidptr” ABI while also saying bridge layer is eliminated. Please align this bullet with the current direct C-struct exposure model.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/design.md` at line 72, The bullet for PostgresDeparseOpts is inconsistent: replace the “opaque voidptr” phrasing with wording that reflects the current direct C-struct exposure model and state that the bridge layer has been removed; update the line referencing `PostgresDeparseOpts` to say it is exposed as a concrete C struct in the ABI (not an opaque void*) and note that the bridge layer has been eliminated accordingly.

coderabbitai · 2026-05-25T12:55:46Z

+// ru_maxrss offset within struct rusage on macOS arm64.
+// timeval = 16 bytes × 2 (ru_utime, ru_stime) = 32 bytes.
+const ru_maxrss_offset = 32
+


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

On Linux and macOS, what are the units of ru_maxrssfromgetrusage, and is reading struct rusage via hardcoded byte offsets considered portable?

💡 Result:

Units of ru_maxrss from getrusage on Linux and macOS - Linux: ru_maxrss is “the maximum resident set size used (in kilobytes)” (since Linux 2.6.32). [1][2] - macOS: ru_maxrss is “the maximum resident set size utilized (in bytes).” [3] So, Linux reports ru_maxrss in KiB, while macOS reports it in bytes. 2) Is reading struct rusage via hardcoded byte offsets portable? No—hardcoding byte offsets into struct rusage is not portable across platforms, architectures, ABIs, compiler/library versions, or even kernel/CRT variations. Reasons (grounded in what the sources show): - The struct rusage layout and field representations are not something you can assume is stable: even on Linux/glibc, ru_maxrss and similar fields may be represented using unions/ABI-specific types to remain kernel-compatible. [4] - The documented unit/meaning of ru_maxrss differs between Linux and macOS (KiB vs bytes), demonstrating that the same field name does not guarantee uniform interpretation across systems. [1][3] Portable approach: - Always use the named field access provided by your platform headers (e.g., rusage.ru_maxrss in C), and avoid byte-offset “struct scraping.” - If you must serialize/transfer across systems, define your own normalized format (including explicit unit conversion based on platform).

Citations:

1: https://man.archlinux.org/man/getrusage.2.en.raw

2: https://man7.org/linux/man-pages/man2/getrusage.2.html

3: https://github.com/apple/darwin-xnu/blob/master/bsd/man/man2/getrusage.2

4: https://codebrowser.dev/glibc/glibc/resource/bits/types/struct_rusage.h.html

Fix ru_maxrss portability/accuracy in examples/throughput.v

ru_maxrss units differ: Linux reports it in KiB, macOS in bytes—formatting as bytes will misreport values on at least one platform (lines 7-10; also 87-95, 155-156).

Reading struct rusage via hardcoded byte offsets is ABI-fragile and not portable—use named struct field access and normalize units per platform.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/throughput.v` around lines 7 - 10, The code uses a hardcoded ru_maxrss_offset constant and reads struct rusage by byte offsets which is non-portable and misreports memory because Linux reports ru_maxrss in KiB while macOS reports bytes; replace the byte-offset approach with direct named field access to struct rusage.ru_maxrss (wherever ru_maxrss_offset is used and in the functions that read/process rusage values) and normalize units per platform—e.g., detect OS (Linux vs macOS) or use conditional compilation/build tags to convert Linux ru_maxrss * 1024 to bytes (or consistently format as KiB) so the reported value is correct across platforms; remove ru_maxrss_offset and any offset-based reads and update the formatting code that prints the value accordingly.

coderabbitai · 2026-05-25T12:55:46Z

+fn backpatch_varint4(mut buf []u8, pos int, length int) {
+	v := u32(length)
+	buf[pos]     = u8(v & 0x7F) | 0x80
+	buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80
+	buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80
+	buf[pos + 3] = u8((v >> 21) & 0x7F)
+}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard backpatch_varint4 invariants before writing.

The function documents length < 2^28 but doesn’t enforce it. Oversized payloads will be silently encoded with a truncated 4-byte varint and produce invalid protobuf.

💡 Proposed fix

fn backpatch_varint4(mut buf []u8, pos int, length int) { + if length < 0 || length >= (1 << 28) { + panic('backpatch_varint4: length out of range (${length}), expected 0 <= length < 2^28') + } + if pos < 0 || pos + 3 >= buf.len { + panic('backpatch_varint4: invalid patch position ${pos} for buffer len ${buf.len}') + } v := u32(length) buf[pos] = u8(v & 0x7F) | 0x80 buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80 buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80 buf[pos + 3] = u8((v >> 21) & 0x7F) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn backpatch_varint4(mut buf []u8, pos int, length int) {

v := u32(length)

buf[pos] = u8(v & 0x7F) | 0x80

buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80

buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80

buf[pos + 3] = u8((v >> 21) & 0x7F)

}

fn backpatch_varint4(mut buf []u8, pos int, length int) {

if length < 0 || length >= (1 << 28) {

panic('backpatch_varint4: length out of range (${length}), expected 0 <= length < 2^28')

}

if pos < 0 || pos + 3 >= buf.len {

panic('backpatch_varint4: invalid patch position ${pos} for buffer len ${buf.len}')

}

v := u32(length)

buf[pos] = u8(v & 0x7F) | 0x80

buf[pos + 1] = u8((v >> 7) & 0x7F) | 0x80

buf[pos + 2] = u8((v >> 14) & 0x7F) | 0x80

buf[pos + 3] = u8((v >> 21) & 0x7F)

}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pg_query_protobuf.v` around lines 413 - 419, backpatch_varint4 currently encodes length without validating preconditions; add guards at the start of backpatch_varint4 to assert length < (1 << 28) and that pos is non-negative and buf has at least pos+4 bytes (buf.len >= pos + 4), and fail fast (panic/assert/return error consistent with project conventions) with a clear message if any invariant is violated so oversized payloads or out‑of‑bounds writes cannot silently produce invalid protobuf.

coderabbitai · 2026-05-25T12:55:46Z

-				comment.newlines_before_comment, comment.newlines_after_comment, comment.str.str)
+		unsafe {
+			n := opts.comments.len
+			raw := C.calloc(usize(n), usize(8))


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Hardcoded pointer size breaks 32-bit portability.

usize(8) assumes 64-bit pointers. On 32-bit systems, this creates a mismatch: allocation uses 8-byte spacing while cleanup iterates with sizeof(voidptr) (4 bytes), causing incorrect pointer arithmetic.

Proposed fix

- raw := C.calloc(usize(n), usize(8)) + raw := C.calloc(usize(n), sizeof(voidptr))

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pgquery.v` at line 584, The allocation uses a hardcoded 8-byte element size (raw := C.calloc(usize(n), usize(8))) which assumes 64-bit pointers and breaks 32-bit portability; change the calloc element size to use the pointer size instead (e.g., C.calloc(usize(n), usize(sizeof(voidptr))) or C.calloc(usize(n), usize(C.sizeof_voidptr))) so the allocation matches the pointer arithmetic used later, and verify any pointer-indexing/cleanup that uses sizeof(voidptr) still matches this allocation.

coderabbitai · 2026-05-25T12:55:46Z

 ## How it works

-The library bundles [libpg_query](https://github.com/pganalyze/libpg_query) (version 6.2.2, wrapping PostgreSQL 17.7), pre-built as a static archive (`libpg_query.a`). The V wrapper in `pg_query/` has these layers:
+The library bundles [libpg_query](https://github.com/pganalyze/libpg_query) (version 6.2.2, wrapping PostgreSQL 17.7), pre-built as a static archive (`libpg_query.a`). The V wrapper has these layers:


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Static library path looks stale after the C layout move.

This line still references libpg_query.a; with the new structure it should point to c/libpg_query.a to match the build output.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@README.md` at line 207, Update the README reference to the static archive path: replace occurrences of "libpg_query.a" with the new build output path "c/libpg_query.a" (the string literal "libpg_query.a" in the sentence describing the bundled library) so the documentation matches the updated C layout and build artifacts.

Copilot

Pull request overview

This PR restructures the project’s C integration by introducing a unified public C header under c/, removing the previous C bridge layer, and updating the V-side bindings/build/docs accordingly. It also updates the V-native protobuf encoder generator to support a “write-into-buffer” (zero-alloc) encoding style using length backpatching, and adds comment-related tests and example coverage.

Changes:

Consolidate the public C API into c/pg_query.h (and c/postgres_deparse.h), removing the legacy pg_query/c_bridge.* bridge layer.
Update the V protobuf encoder generation to support _into (caller-buffer) variants and a backpatching technique for length-delimited fields.
Refresh build/test instructions, examples, and documentation to match the new layout; add tests for comment extraction and roundtrip.

Reviewed changes

Copilot reviewed 23 out of 31 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
v.mod	Renames the module to `pg_query`.
v-and-c-integration.md	Removes now-redundant V/C interop documentation.
tools/gen_ast.v	Updates generator outputs and introduces `_into` encoder generation + optional self-ref changes.
README.md	Updates build/test instructions and removes references to the old bridge object.
pgquery.v	Removes bridge calls, updates deparse opts handling/freeing, and adjusts split/comments handling.
pgquery.c.v	Switches include/lib paths to `c/`, updates C type declarations, removes bridge declarations, hardcodes option/version constants.
pg_query/compile_flags.txt	Removes old compile flags file from the previous layout.
pg_query/c_bridge.h	Deletes the legacy bridge header.
pg_query/c_bridge.c	Deletes the legacy bridge implementation.
pg_query_test.v	Adds tests for comment extraction and comment-preserving deparse roundtrip.
pg_query_protobuf.v	Optimizes string writes and adds placeholder/backpatch helpers for encoder.
pg_query_decode.v	Updates generated decode defaults for optional self-references (`none` vs `nil`).
pg_query_ast.v	Updates generated AST self-references to optional pointers (`?&T`) and adjusts `str()` output.
Makefile	Removes bridge build; copies `libpg_query.a` into `c/` and updates test target + dev symlink.
examples/throughput.v	Adds a throughput/microbenchmark example using `getrusage`.
examples/parse_sql.v	Expands example coverage (heavy queries, truncation, comment extraction/roundtrip).
examples/bench.v	Updates benchmark queries and output labeling for the new performance focus.
docs/design.md	Updates benchmark numbers/date and claims around bridge removal/ABI decoupling.
docs/analysis.md	Removes outdated analysis notes.
compile_flags.txt	Adds root-level compile flags pointing at `c/` and `c/vendor`.
c/vendor/xxhash/xxhash.c	Adds vendored xxHash implementation file.
c/vendor/protobuf-c/protobuf-c.h	Adds vendored protobuf-c public header.
c/postgres_deparse.h	Adds deparse options/comments public header.
c/pg_query.h	Adds the consolidated public C API header.
.gitignore	Ignores the dev symlink and broadens object file ignores.
.github/workflows/ci.yml	Removes the existing CI workflow.

Comments suppressed due to low confidence (4)

pg_query_ast.v:4210

larg/rarg are ?&SelectStmt, but str() interpolates m.larg/m.rarg directly. This likely won’t compile (or will print the option wrapper) because optionals should be unwrapped before use. Suggest if larg := m.larg { parts << "larg: ${larg}" } (same for rarg).
pgquery.v:586
C.calloc(usize(n), usize(8)) hardcodes an 8-byte pointer size for the PostgresDeparseComment** array. This is non-portable and can under/over-allocate on non-64-bit targets. Allocate with sizeof(voidptr) / sizeof(&C.PostgresDeparseComment) instead so the array element size matches the platform pointer size.
pg_query_protobuf.v:418
backpatch_varint4 documents a hard constraint (length < 2^28) but doesn’t enforce it. If a larger submessage is ever encoded, the length varint will be incorrect (and the output protobuf becomes malformed). Add a guard (assert/error) or switch to a placeholder/patching scheme that supports larger lengths (e.g., 5-byte placeholder or fallback to normal varint encoding with buffer adjustment).
pgquery.c.v:139
These parse-mode / parser-option / version constants are duplicated as hardcoded numbers/strings even though pg_query.h defines the authoritative macros/enums. This creates a maintenance hazard on PG/libpg_query bumps. Prefer deriving these from the C header (C.PG_QUERY_*, C.PG_VERSION*) or otherwise centralize them so there’s a single source of truth.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

 			out += '\tif m.${vfname}.len > 0 { parts << "${vfname}: \${m.${vfname}}" }\n'
 		} else if is_recursive {
-			out += '\tif !isnil(m.${vfname}) { parts << "${vfname}: \${m.${vfname}}" }\n'
+			out += '\tif m.${vfname} != none { parts << "${vfname}: \${m.${vfname}}" }\n'


+#include <stdbool.h>
+#include <stdint.h>
+#include <sys/types.h>
+
+#include "postgres_deparse.h"
+


+#ifndef POSTGRES_DEPARSE_H
+#define POSTGRES_DEPARSE_H
+
+#include <stdbool.h>


+- [x] `PostgresDeparseOpts` ABI decoupled (opaque voidptr, bridge layer eliminated)
 - [x] `valid_enum_int_strict()` for rejecting invalid enum values
 - [x] Self-validating version tests (no hardcoded PG version strings)


+	if preview.len > 50 { preview = preview[..50] }
+	return '${preview}... -> ${res.stmts.len} statement(s)'


+#include <sys/resource.h>
+
+// ru_maxrss offset within struct rusage on macOS arm64.
+// timeval = 16 bytes × 2 (ru_utime, ru_stime) = 32 bytes.
+const ru_maxrss_offset = 32
+
+fn C.getrusage(who int, usage voidptr) int


SOG-web added 3 commits May 16, 2026 23:27

SOG-web self-assigned this May 25, 2026

Copilot AI review requested due to automatic review settings May 25, 2026 12:48

Copilot started reviewing on behalf of SOG-web May 25, 2026 12:48 View session

gemini-code-assist Bot reviewed May 25, 2026

View reviewed changes

coderabbitai Bot reviewed May 25, 2026

View reviewed changes

Copilot AI reviewed May 25, 2026

View reviewed changes

		if preview.len > 50 { preview = preview[..50] }
		return '${preview}... -> ${res.stmts.len} statement(s)'

Conversation

SOG-web commented May 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 25, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

SOG-web commented May 25, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading