Fix safe-outputs YAML corruption when OTLP header masking is enabled#37068
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a workflow YAML corruption issue in the consolidated safe_outputs job when both OTLP header masking steps and GitHub App token minting are enabled. The corruption was caused by inserting the app-token step at a slice index that could fall inside another step’s multi-line block, producing invalid indentation (e.g., run: content ending up under a with: block).
Changes:
- Adjusts the GitHub App token insertion index calculation to account for OTLP header/attribute masking step injection.
- Adds a step-boundary realignment guard to ensure insertion occurs at the next canonical step start line.
- Adds a regression test covering the GitHub App + OTLP headers composition and asserting the
setup-agent-output-envrun: |block remains intact.
Show a summary per file
| File | Description |
|---|---|
pkg/workflow/compiler_safe_outputs_job.go |
Updates app-token insertion index logic to include OTLP mask steps and realigns insertion to step boundaries to prevent YAML block fragmentation. |
pkg/workflow/compiler_safe_outputs_job_test.go |
Adds a focused regression test that reproduces the OTLP-headers masking + GitHub App insertion scenario and checks for the prior YAML corruption pattern. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 0
|
|
|
|
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #37068 does not have the 'implementation' label (has_implementation_label=false) and has 63 new lines of code in business logic directories, which is at or below the 100-line threshold (requires_adr_by_default_volume=false). |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — commenting with non-blocking suggestions to improve test coverage and make the index arithmetic more resilient.
📋 Key Themes & Highlights
Key Themes
- Test coverage gaps: The regression test covers
OTEL_EXPORTER_OTLP_HEADERSbut notGH_AW_OTLP_ATTRIBUTES, and no test exercises both simultaneously. - Fragile offset arithmetic:
insertIndex++hardcodes a step-count assumption that the while-loop guard compensates for but does not remove. - Silent failure mode: The boundary-scan loop has no log or error when it exhausts the slice, making a miscalculation invisible in production.
Positive Highlights
- ✅ Two-layer defence: explicit offset increments + a while-loop boundary guard is a well-structured approach — the guard provides a real safety net even when the arithmetic drifts.
- ✅ Focused regression test:
TestGitHubAppTokenStepWithOTLPHeaderstargets the exact corruption signature (with:\n echo ...) rather than a generic golden-file comparison. Easy to understand and maintain. - ✅
stepNameLinePrefixas a named constant: elevating the sentinel to a package constant with a doc-comment is the right call; it makes the intent clear and is a single point of change. - ✅ Clear PR description: root cause, fix strategy, and regression coverage are all explained concisely. Great template for future compiler bug fixes.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · sonnet46 1.7M · 245.9 AIC
| workflowData := &WorkflowData{ | ||
| Name: "Test Workflow", | ||
| Env: `"env": | ||
| OTEL_EXPORTER_OTLP_HEADERS: "Authorization=******"`, |
There was a problem hiding this comment.
[/tdd] The regression test only exercises the isOTLPHeadersPresent path (OTEL_EXPORTER_OTLP_HEADERS). The isOTLPAttributesPresent path (GH_AW_OTLP_ATTRIBUTES) has no coverage — but the fix also increments insertIndex for it.
💡 Suggested addition
Add a second test (or extend this one) with GH_AW_OTLP_ATTRIBUTES set in Env, repeating the same ordering assertions. Then add a third scenario with both env vars set to confirm insertIndex is incremented twice correctly.
Env: `"env":
OTEL_EXPORTER_OTLP_HEADERS: "Authorization=******"
GH_AW_OTLP_ATTRIBUTES: "{\"service.name\":\"test\"}"`Without this, a regression in the attributes path would go undetected.
| assert.Less(t, setupEnvEchoIndex, tokenIndex, "app token step must be inserted after setup-agent-output-env run block") | ||
| // Regression guard: app token insertion must not split setup-agent-output-env and leave | ||
| // an `echo ...` run line nested under a previous `with:` block. | ||
| assert.NotContains(t, stepsContent, "with:\n echo \"GH_AW_AGENT_OUTPUT=/tmp/gh-aw/agent_output.json\" >> \"$GITHUB_OUTPUT\"") |
There was a problem hiding this comment.
[/tdd] No test exercises the combined scenario where both OTEL_EXPORTER_OTLP_HEADERS and GH_AW_OTLP_ATTRIBUTES are present at the same time. That is the highest-risk composition — insertIndex is incremented twice — yet it has no regression guard.
💡 Why this matters
If one of the two insertIndex++ increments were accidentally removed or a mask step were to emit more than one entry, the combined path would silently corrupt YAML in a way that the single-variable tests would not detect. A dedicated combined test would catch that class of regression immediately.
| if isOTLPHeadersPresent(data) { | ||
| insertIndex++ | ||
| } | ||
| if isOTLPAttributesPresent(data) { | ||
| insertIndex++ | ||
| } |
There was a problem hiding this comment.
[/diagnose] The insertIndex++ increments are load-bearing: each assumes the corresponding mask helper appends exactly one string entry to the steps slice. The comment documents this ("Each mask helper currently appends one string entry"), but if generateOTLPHeadersMaskStep() or generateOTLPAttributesMaskStep() ever grow to emit two entries (e.g. an added env-echo step), the arithmetic silently undercounts and the while-loop guard must compensate.
💡 More robust alternative
The offset could be derived from the actual number of entries appended, keeping the arithmetic self-consistent regardless of how many lines each helper emits:
if isOTLPHeadersPresent(data) {
mask := generateOTLPHeadersMaskStep()
// len == 1 today, but this survives future growth
insertIndex += strings.Count(mask, " - name: ")
}Or expose a step-count constant alongside each generator. Either approach removes the silent undercounting risk.
| for insertIndex < len(steps) && !strings.HasPrefix(steps[insertIndex], stepNameLinePrefix) { | ||
| insertIndex++ | ||
| } |
There was a problem hiding this comment.
[/diagnose] If the loop exhausts steps without finding a stepNameLinePrefix line, insertIndex equals len(steps) and the app-token block is silently appended at the very end — after checkout and safe-output steps that reference steps.safe-outputs-app-token.outputs.token. The YAML would be structurally valid but semantically broken, and there is no log message or error to surface it.
💡 Suggested guard
A short diagnostic log after the loop would make this case visible:
for insertIndex < len(steps) && !strings.HasPrefix(steps[insertIndex], stepNameLinePrefix) {
insertIndex++
}
if insertIndex == len(steps) {
consolidatedSafeOutputsJobLog.Printf(
"WARN: app-token insertion reached end of steps slice (len=%d); step ordering may be incorrect",
len(steps),
)
}This is a rare edge case today, but becomes more likely if future step sequences are ever added that push the final checkout step past the end of the current emission order.
🧪 Test Quality Sentinel Report✅ Test Quality Score: 80/100 — Excellent
📊 Metrics & Test Classification (1 test analyzed)
Test Classification Details
Language SupportTests analyzed:
|
There was a problem hiding this comment.
Test Quality Sentinel: 80/100. Test quality is acceptable. 0% of new tests are implementation tests (threshold: 30%). The new test TestGitHubAppTokenStepWithOTLPHeaders is a well-structured regression guard for an observable compiler-output contract. Minor note: the final assert.NotContains call is missing a descriptive message argument (see comment above).
|
|
|
Please refresh the branch, rerun checks, and summarize any remaining blockers.
|
|
@copilot apply copilot-revie skill |
…ounting Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Applied the copilot-review skill and addressed all four open review threads. Changes in the latest commit:
|
observability.otlp.headerscould produce invalid workflow YAML insafe_outputswhen GitHub App token minting is also present. The app-token step insertion could land inside another step's multi-line block, causingrun:content to be emitted atwith:indentation.Problem scope
safe_outputs.GH_AW_OTLP_ATTRIBUTES) similarly shifts the insertion point.Compiler changes
safe_outputsapp-token insertion index calculation to account for OTLP mask step injection, deriving the offset dynamically from the actual helper output (strings.Count) so the arithmetic stays self-consistent if either helper grows to emit more than one step.- name:boundary.Regression coverage
safe-outputs.github-app+ OTLP headers path (OTEL_EXPORTER_OTLP_HEADERS).GH_AW_OTLP_ATTRIBUTES-only path.insertIndexincremented twice).runblock remains intact and is not relocated under awith:block.