Skip to content

feat: Add ParseMetadata() function to parity BHCE ingest - BED-7790#4

Open
wes-mil wants to merge 13 commits into
mainfrom
BED-7790
Open

feat: Add ParseMetadata() function to parity BHCE ingest - BED-7790#4
wes-mil wants to merge 13 commits into
mainfrom
BED-7790

Conversation

@wes-mil
Copy link
Copy Markdown
Collaborator

@wes-mil wes-mil commented Apr 28, 2026

Description

Add a function that just parses metadata to satisfy a code path in BHCE

Motivation and Context

Resolves BED-7790

Why is this change required? What problem does it solve?

In order to replace BHCE ingest with chow, this function had to be added

How Has This Been Tested?

Tests passing in BHCE as expected!

Screenshots (optional):

Types of changes

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have met the contributing prerequisites
    • Assigned myself to this PR
    • Added the appropriate labels
    • Read the CODE_OF_CONDUCT.md and CONTRIBUTING.md
  • I have ensured that related documentation is up-to-date
    • Code comments (GoDocs)
  • I have followed proper test practices
    • Added/updated tests to cover my changes
    • All new and existing tests passed

Summary by CodeRabbit

  • New Features

    • Added chowbench command for benchmarking validation performance with configurable runs, warmups, and strict-mode exit behavior. Reports include min/max/avg timing and per-file validation status.
  • Bug Fixes

    • Tightened validation rules for node and edge attributes: property names cannot contain uppercase letters, and kind values are restricted from TAG-like patterns.
  • Documentation

    • Updated README with benchmarking workflow and usage documentation.
  • Chores

    • Updated dependencies.

@wes-mil wes-mil self-assigned this Apr 28, 2026
@wes-mil wes-mil added the enhancement New feature or request label Apr 28, 2026
@wes-mil
Copy link
Copy Markdown
Collaborator Author

wes-mil commented Jun 1, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Full review triggered.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

Warning

Review limit reached

@wes-mil, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 4 minutes and 12 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7544ae81-31fa-4c4f-b3be-e3bcb28ed7f6

📥 Commits

Reviewing files that changed from the base of the PR and between 0439fad and e09380c.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • .gitignore
  • README.md
  • cmd/chowbench/main.go
  • cmd/chowbench/main_test.go
  • go.mod
  • main.go
  • pkg/payload/jsonschema/edge.json
  • pkg/payload/jsonschema/metadata.json
  • pkg/payload/jsonschema/node.json
  • pkg/payload/jsonschema/schema.json
  • pkg/payload/schema.go
  • pkg/payload/schema_contract_test.go
  • pkg/payload/schema_test.go
  • pkg/payload/validator.go
  • pkg/payload/validator_test.go
  • pkg/validator/validator_test.go

Walkthrough

This PR refactors payload validation infrastructure by migrating the validator package to a new payload package, enhancing JSON schemas with stricter validation rules, and introducing a benchmarking CLI tool (chowbench) for measuring validation performance across multiple files.

Changes

Payload Package Refactoring & Benchmarking

Layer / File(s) Summary
JSON Schema Strictness Enhancement
pkg/payload/jsonschema/edge.json, pkg/payload/jsonschema/node.json
Node and edge schemas gain tighter constraints: property names restricted to lowercase via propertyNames, property values must not be objects, and kind fields reject TAG-like patterns with alphanumeric/underscore restriction.
Schema Loading Refactoring
pkg/payload/schema.go
Schema loading API renamed from IngestSchema/LoadIngestSchema to Schema/LoadSchema; new loadSchemaFromFS helper abstracts filesystem reads for improved testability.
Schema Contract & Unit Tests
pkg/payload/schema_contract_test.go, pkg/payload/schema_test.go
Contract tests validate representative node, edge, and metadata JSON documents against schemas; unit tests verify LoadSchema() success and loadSchemaFromFS error handling (missing files, unmarshal failures, invalid definitions).
Validator Relocation & Extension
pkg/payload/validator.go
Validator moves to payload package with updated Schema type acceptance; new ParseMetadata() method extracts metadata-only without full validation; ValidationError implements error interface; result aggregation renamed to buildValidatedData().
Validator Test Suite
pkg/payload/validator_test.go
Comprehensive tests cover OpenGraph and legacy payload parsing, node/edge validation constraints (kind patterns, property naming, type restrictions), metadata-only parsing, configuration errors, and ValidationError.Error() string formatting.
Main CLI Migration
main.go
CLI updated to use payload.LoadSchema() and payload.NewValidator(); function signatures for outputReport, formatCriticalError, and formatValidationError accept payload-typed errors.
Benchmarking Tool: chowbench
cmd/chowbench/main.go, cmd/chowbench/main_test.go
New benchmarking command supports -runs, -warmup, and -strict flags; performs warmup and timed validation passes; collects per-file status, error counts, and timing statistics; outputs tab-separated results; includes tests for duration aggregation, status classification, and exit-error logic.
Documentation & Build Configuration
.gitignore, README.md, go.mod
README documents chowbench workflow and updates JSON Schema URL paths to pkg/payload/jsonschema/; .gitignore ignores fixtures/ directory; go.mod adds indirect test dependencies and bumps golang.org/x/text to v0.35.0.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A rabbit hops with glee,
From validator to payload it be,
Schemas now stricter, tests more bright,
And chowbench speeds up the night!
Refactored with care, the structure's quite fair. 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a ParseMetadata() function to achieve parity with BHCE ingest, directly matching the PR's primary objective.
Description check ✅ Passed The description follows the template structure with all major sections completed: description, motivation/context with ticket reference (BED-7790), testing notes, type of change identified, and checklist marked complete.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch BED-7790

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
main.go (1)

79-108: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Surface report write failures to main().

outputReport can fail, but the caller still ignores that result. A short write or broken pipe will currently produce a partial/missing report while the process may still exit successfully. Please propagate this error back to main() and exit non-zero.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.go` around lines 79 - 108, The caller of outputReport currently ignores
its returned error which can hide write failures; modify the call site in main
to check the error returned by outputReport and if non-nil log or print the
error and exit non-zero (e.g., via os.Exit(1)); ensure outputReport continues to
return any write/formatting errors from
formatCriticalError/formatValidationError and w.Write calls so failures
propagate up; update main to handle that returned error path (use the existing
outputReport function name and the main function) and ensure the process exits
with a non-zero code on failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/chowbench/main.go`:
- Around line 72-73: writeResults currently has its errors (broken-pipe/flush)
ignored, so the process can exit successfully despite write failures; update
writeResults to return an error and change all call sites (e.g., the calls in
main where writeResults(w, results) is invoked and the other similar block
around the long-running result emission) to check that error and propagate it as
the command failure (either return the write error directly or combine it with
exitErrorForResults(results, strict) so any writer error wins and results in a
non-zero exit). Ensure callers handle and return the error up the stack so
write/flush failures are not dropped.
- Around line 94-103: The loop currently overwrites result fields each iteration
causing later successful runs to hide earlier failures; modify the loop that
calls validateFile and statusForValidationResult so it preserves the
worst-observed outcome across all runs: after calling
statusForValidationResult(report, err) compare the returned Status and numeric
error counts against the existing result.Status, result.CriticalErrors,
result.ValidationErrors and only update result to the new values if the new
Status is worse (or equal but with higher CriticalErrors/ValidationErrors) and
ensure result.Error is set if any iteration returned a non-nil err; keep
durations collection as-is but ensure the final result reflects the worst
observed validation, so -strict will see any failure.

In `@pkg/payload/jsonschema/node.json`:
- Around line 42-50: The "kinds" array currently allows empty arrays and empty
strings; update the JSON Schema for the "kinds" property to require at least one
non-empty kind by changing "minItems": 0 to "minItems": 1 and strengthen the
item schema (the "items" object) to disallow empty strings (e.g., add
"minLength": 1 or an equivalent non-empty pattern alongside the existing "not"
pattern) so that kinds[0] is always a usable non-empty string.

In `@pkg/payload/validator.go`:
- Around line 453-462: The new "meta" branch in ParseMetadata bypasses legacy
type validation and should preserve the original validation performed by
handleOriginalMetadata; modify the "meta" case so after decoding into
ingest.OriginalMetadata it either calls v.handleOriginalMetadata(metadata) (or
invokes the same type-check logic that raises ErrInvalidDataType) before setting
v.originalData.MetadataFound/Metadata and returning, and ensure any validation
error is returned (e.g., propagate ErrInvalidDataType) instead of silently
accepting invalid legacy meta.type values so ParseMetadata mirrors
ParseAndValidate's behavior.

---

Outside diff comments:
In `@main.go`:
- Around line 79-108: The caller of outputReport currently ignores its returned
error which can hide write failures; modify the call site in main to check the
error returned by outputReport and if non-nil log or print the error and exit
non-zero (e.g., via os.Exit(1)); ensure outputReport continues to return any
write/formatting errors from formatCriticalError/formatValidationError and
w.Write calls so failures propagate up; update main to handle that returned
error path (use the existing outputReport function name and the main function)
and ensure the process exits with a non-zero code on failure.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 834740df-1c3e-47ed-a342-e2f6f00965b6

📥 Commits

Reviewing files that changed from the base of the PR and between 0439fad and 9fef646.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (16)
  • .gitignore
  • README.md
  • cmd/chowbench/main.go
  • cmd/chowbench/main_test.go
  • go.mod
  • main.go
  • pkg/payload/jsonschema/edge.json
  • pkg/payload/jsonschema/metadata.json
  • pkg/payload/jsonschema/node.json
  • pkg/payload/jsonschema/schema.json
  • pkg/payload/schema.go
  • pkg/payload/schema_contract_test.go
  • pkg/payload/schema_test.go
  • pkg/payload/validator.go
  • pkg/payload/validator_test.go
  • pkg/validator/validator_test.go
💤 Files with no reviewable changes (1)
  • pkg/validator/validator_test.go

Comment thread cmd/chowbench/main.go Outdated
Comment thread cmd/chowbench/main.go
Comment thread pkg/payload/jsonschema/node.json
Comment thread pkg/payload/validator.go
@wes-mil
Copy link
Copy Markdown
Collaborator Author

wes-mil commented Jun 1, 2026

@coderabbitai full review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Full review triggered.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant