Detect duplicate JSON keys in flow files #78

Open: sfc-gh-pvillard wants to merge 1 commit into main from feature/duplicate-key-detection


@sfc-gh-pvillard

Collaborator

Closes #77.

Summary

  • Pre-validation before parsing: both flow files are scanned with a streaming JSON parser that has STRICT_DUPLICATE_DETECTION enabled before any POJO deserialization. A dedicated validateNoDuplicateKeys() method handles this using a separate JsonFactory so the existing parsing configuration is untouched.
  • Clear PR comment error: when a duplicate key is found, a [!CAUTION] block is posted with the file path and exact line/column, matching the style of the existing checkstyle violation block.
  • Action exits with code 1: the PR check is blocked until the file is fixed, so a silently wrong diff cannot give reviewers false confidence.
  • snapshotA catch tightened: the overly broad catch (Exception e) for the "no original flow" case is narrowed to catch (IOException e), so duplicate-key errors in flowA are now reported rather than silently treated as a first-version flow.
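
The pre-validation step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: it assumes Jackson 2.x (`jackson-core`) on the classpath, and the class name `DuplicateKeyCheck`, the `Path` parameter, and the exact re-wrapping are assumptions. Only the method name `validateNoDuplicateKeys()`, the separate `JsonFactory`, and `STRICT_DUPLICATE_DETECTION` come from the description.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;

import java.io.IOException;
import java.nio.file.Path;

// Hypothetical holder class; the real method lives elsewhere in the action.
final class DuplicateKeyCheck {

    // Dedicated factory, so strict duplicate detection does not change
    // the configuration used for regular POJO deserialization.
    private static final JsonFactory STRICT_FACTORY =
            new JsonFactory().enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION);

    static void validateNoDuplicateKeys(Path path) throws IOException {
        try (JsonParser parser = STRICT_FACTORY.createParser(path.toFile())) {
            // Stream through every token without building any objects;
            // the parser throws as soon as a key repeats within one JSON object.
            while (parser.nextToken() != null) {
                // nothing to do, we only care about parse errors
            }
        } catch (JsonParseException e) {
            // Re-wrap with the file path so the message is self-contained
            // (the message wording here mirrors the example PR comment).
            throw new JsonParseException(
                    e.getProcessor(),
                    "Flow file `" + path + "` contains duplicate JSON keys "
                            + "(this typically indicates a merge conflict that was not fully resolved): "
                            + e.getOriginalMessage(),
                    e.getLocation());
        }
    }
}
```

Duplicate detection is scoped to a single JSON object, so the same key appearing at different nesting levels does not trigger an error. The line and column for the PR comment would be available from `getLocation()` on the thrown exception.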

Example PR comment on detection

### Executing Snowflake Flow Diff for flow: `submitted-changes/flows/my-flow.json`

> [!CAUTION]
> Flow file `submitted-changes/flows/my-flow.json` contains duplicate JSON keys (this typically indicates a merge conflict that was not fully resolved): Duplicate field 'flowContents'
> Line 37, column 17
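
For illustration, a comment body like the one above could be assembled with a small helper along these lines. The class `CautionBlock` and its `format` method are hypothetical names, not part of this PR; the sketch only reproduces the layout of the callout shown in the example and uses nothing beyond the JDK.

```java
// Hypothetical helper; the PR's actual formatting code may differ.
final class CautionBlock {

    // Builds the three-line GitHub alert shown in the example comment.
    static String format(String flowPath, String detail, int line, int column) {
        return String.join("\n",
                "> [!CAUTION]",
                "> Flow file `" + flowPath + "` contains duplicate JSON keys "
                        + "(this typically indicates a merge conflict that was not fully resolved): "
                        + detail,
                "> Line " + line + ", column " + column);
    }
}
```

Every line of the block needs the leading `> ` for GitHub to render it as a single callout.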

Test plan

  • All 27 tests pass (mvn test)
  • testDuplicateKeyInFlowBThrowsException - getDiff() throws JsonParseException when flowB has a duplicate key
  • testDuplicateKeyInFlowAThrowsException - getDiff() throws JsonParseException when flowA has a duplicate key
  • testDuplicateKeyReturnsFailureExitCode - run() returns exit code 1 when a duplicate key is present
  • testDuplicateKeyOutputContainsCaution - PR comment output contains the CAUTION callout and the duplicate field name

Resolves #77. Before computing a diff, both flow files are now scanned
with a streaming JSON parser that has STRICT_DUPLICATE_DETECTION enabled.
If a duplicate key is found the action posts a CAUTION callout in the PR
comment with the file path and exact line/column, then exits with code 1
so the PR check is blocked until the file is fixed.

- Add validateNoDuplicateKeys() which uses a separate JsonFactory with
  STRICT_DUPLICATE_DETECTION; re-wraps JsonParseException with the file
  path for a self-contained error message.
- Call it at the start of getDiff() for both pathA (skipped when absent)
  and pathB.
- Catch JsonParseException in executeFlowDiffForOneFlow() and print the
  CAUTION block; set jsonParseError flag for run() to track.
- Track jsonParseError per flow in run() and return RETURN_FAILURE if
  any parse error occurred.
- Tighten the catch for snapshotA from Exception to IOException.
- Add test fixture flow_v9_duplicate_key.json and four new tests.
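
The per-flow error tracking in run() might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: `ExitCodeTracking`, `RETURN_SUCCESS`, and the `Predicate` standing in for executeFlowDiffForOneFlow() are hypothetical, and it assumes the remaining flows are still processed after one fails. `RETURN_FAILURE`, the `jsonParseError` flag, and exit code 1 come from the description above.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the action's run() loop.
final class ExitCodeTracking {
    static final int RETURN_SUCCESS = 0; // assumed name and value
    static final int RETURN_FAILURE = 1; // exit code 1, as stated in the PR

    // hasParseError stands in for executeFlowDiffForOneFlow() reporting
    // whether it hit a JsonParseException for the given flow.
    static int run(List<String> flowPaths, Predicate<String> hasParseError) {
        boolean jsonParseError = false;
        for (String flowPath : flowPaths) {
            // Keep going after a failure so every flow gets its own report
            // (an assumption; the PR only says the flag is tracked per flow).
            if (hasParseError.test(flowPath)) {
                jsonParseError = true;
            }
        }
        return jsonParseError ? RETURN_FAILURE : RETURN_SUCCESS;
    }
}
```

With this shape, a single bad flow file is enough to fail the check, while the other flows in the same PR are still diffed and commented on.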