feat(docs): add markdown-link-check to docs CI (#259)#300
feat(docs): add markdown-link-check to docs CI (#259)#300ClintEastman02 wants to merge 24 commits into
Conversation
Adds a broken-link checker that validates internal cross-references and external URLs across all Markdown sources at PR time, preventing silent link rot in documentation that also serves as agent context.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #300 +/- ##
=======================================
Coverage ? 88.16%
=======================================
Files ? 201
Lines ? 48868
Branches ? 4444
=======================================
Hits ? 43085
Misses ? 5783
Partials ? 0 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
The link #domain--requiresrepo was missing the underscore; the heading renders as #domain--requires_repo.
krokoko
left a comment
There was a problem hiding this comment.
Thanks for picking up CA-12 — this is exactly the kind of cheap fitness function the docs need, and the WORKFLOWS.md anchor fix already proves the gate has real targets. A few findings beyond the existing comments (the suggestions to run the CI step through mise and fold it into drift-prevention would also resolve the command duplication and the concurrency concern, so I've left those out):
1. The pipeline fails open — the gate can pass while checking nothing (docs/mise.toml:31, docs/package.json:12)
find … | xargs markdown-link-check runs without pipefail, so the pipeline's exit code is xargs's alone. If find errors (e.g. a docs directory is renamed) the job still goes green — reproducible with sh -c "find /nonexistent | xargs echo"; echo $? → 0. Suggested hardening:
- run under
bash -o pipefail(orset -euo pipefailin the script), - use
find … -print0 | xargs -0 -r(NUL-safe, and-rskips the run on empty input instead of invoking the checker with no args), - assert a minimum file count before checking (corpus is ~46 today) so a shrunk scan fails loudly rather than silently passing.
2. The aws-cdk bootstrap ignore pattern hides a genuinely dead link (docs/.markdown-link-check.json:9)
The ignored URL is used in docs/decisions/ADR-002-least-privilege-bootstrap-policies.md:71 and currently returns 404 (the bootstrap code moved within the aws-cdk repo). This is the exact rot class the gate exists to catch, suppressed on day one. Suggest repointing ADR-002 to a SHA-pinned permalink (or the published CDK docs) and deleting the ignore entry — allowlist entries are best reserved for categorically-unverifiable URLs.
3. aliveStatusCodes treats 301/302/307/308 as alive (docs/.markdown-link-check.json:23)
A permanently-moved link whose destination 404s will still pass. Suggest dropping the 3xx codes and letting the checker follow redirects to validate the terminal status.
4. Blanket npmjs.com ignore (docs/.markdown-link-check.json:6)
A typo'd package URL passes silently, and retryOn429 already covers the rate-limiting that motivates this. Worth narrowing or removing.
5. -maxdepth 1 will silently drop future subdirectories (docs/mise.toml:31, docs/package.json:12)
guides/, design/, decisions/ are flat today so coverage is currently complete, but the first nested folder vanishes from the check with no signal. Either drop -maxdepth 1 for the named dirs (keep it only on ..), or rely on the min-file-count assertion from point 1. Related: docs/README.md is never scanned (no . start point).
6. The failure path was never demonstrated
The test plan validates only the green path. Since the gate's entire value is in failing, it would be great to link a throwaway commit with a deliberately broken link showing the job go red — that also empirically confirms point 1's exit-code handling.
Nit: the ^link$ ignore is legitimate (it masks the intentional placeholder in ADR-004's example prose) but opaque — a one-line note in the config or Developer Guide would save a future maintainer the archaeology.
One consideration for the drift-prevention suggestion: external-URL checking is non-hermetic (network flakiness, transient 5xx), so folding it into the main build puts that flakiness on every build's critical path. Splitting internal-link checking (hermetic, belongs in build/drift-prevention) from external-URL checking (scheduled or PR-only) might be the sweet spot.
Happy to discuss any of these — points 1 and 2 are the ones I'd consider blocking.
|
Thanks for the thorough review @krokoko — addressed all points: Inline comments (drift-prevention integration):
1. [BLOCKING] Pipeline fails open:
2. [BLOCKING] aws-cdk dead link:
3. aliveStatusCodes:
4. Blanket npmjs.com ignore:
5. -maxdepth 1:
6. Failure path demonstration:
7. Nit (^link$ opaque):
Architecture (internal vs external split):
|
…prevention Remove dedicated link-check CI job from docs.yml and wire it into the existing drift-prevention task so it runs as part of every build. Harden the script with pipefail, temp-file approach, min-file-count assertion, and xargs -r. Fix dead aws-cdk link in ADR-002, narrow npmjs ignore to package paths only, drop 3xx from aliveStatusCodes, and document the ^link$ ignore pattern.
|
Thanks @ClintEastman02 — nice work on this, and thanks for the thorough follow-up on the earlier review. I verified the three fail-open fixes empirically (the Verdict: approve with one small required fix. Required before merge
Non-blocking suggestions
Verified accurate along the way: both |
The check runs on every PR via the build's drift-prevention step, not only on PRs that touch documentation.
|
Fixed — updated the developer guide sentence to: "The same check runs automatically in CI on every pull request, as part of the build's drift-prevention step." Starlight mirror regenerated. |
- Remove dead replacementPatterns block ({{BASEURL}} was never defined)
- Use find -print0 + xargs -0 for space-safe path handling
- Add scope comment to link-check task documenting what is/isn't scanned
- Broaden drift-prevention description to cover documentation links
|
Also addressed the non-blocking suggestions:
|
Extract link-check logic into docs/scripts/link-check.sh and have both package.json and mise.toml delegate to it, removing the circular mise dependency from the npm script.
isadeks
left a comment
There was a problem hiding this comment.
Solid, useful addition — the script is careful (set -euo pipefail, mktemp+trap
cleanup, a ≥10-file sanity floor), the anchor fix is real (I confirmed ###
Domain & \requires_reposlugifies todomain--requires_repo; the old #domain--requiresrepogenuinely 404s), and I verified empirically that the tool exits nonzero on broken links in multi-file mode and that failure propagates throughxargsunderpipefail. No correctness bugs in the script itself.
The findings are about integration design and over-claimed coverage, ranked by
impact:
Worth addressing before merge:
- Every-PR blocking (mise.toml:104) — wiring this into drift-prevention (a
hard dep of mise run build, which runs on all PRs with no paths: filter) means
live external-URL checks gate every PR, not "docs PRs" as the body says. This
is the one most likely to cause unrelated red checks from transient
network/bot-block flakiness. Consider a paths:-filtered docs job (matching the
body's intent) rather than the universal build path, or restricting CI to
internal links. - aliveStatusCodes:[200,206] (config:19) — this overrides the default alive
set; I confirmed a 301 endpoint is marked dead. Legitimate
redirect/permanent-move links will fail the build.
Coverage is narrower than the PR claims (the "validates internal
cross-references" line):
3. Cross-file #anchors are not validated — only the file is (verified).
Same-file anchors are.
4. The ^/sample-autonomous-cloud-coding-agents/ ignore disables checking for
all site-absolute internal links.
5. Hard-coded find guides design decisions silently omits docs/abca-plugin/
and any future subtree.
Nit:
6. ^link$ ignore is broader than the one ADR-004 placeholder it's meant for.
None are crashes; the highest-value fixes are #1 (scope the trigger so it
doesn't gate every PR) and #3/#4 (either tighten the coverage claim in the
docs or acknowledge the anchor/internal-link gap). Want me to post these as
inline PR comments, or draft a single summary comment in the blockers/nits
format?
The doc lives in draft PR #364 and exists on neither this branch nor main, so the three links (disable-bundling.ts, AGENTS.md, review_pr.md) are dead. Drop them and keep the stable #366 reference, removing the merge-order dependency on #364 and the link-checker (#300) risk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
External http(s) URLs are now ignored so network/bot-block flakiness cannot fail an unrelated PR, since the gate runs on every build via drift-prevention. Removes aliveStatusCodes and the now-dead external retry/header options, and updates the Developer Guide to match.
… synth) (aws-samples#366) (aws-samples#371) * perf(test): disable Lambda bundling in CDK unit-test synths (aws-samples#366) Template.fromStack() triggers a full CDK synth that bundles every NodejsFunction via esbuild — ~28s for the 413-resource AgentStack. Unit tests assert on CloudFormation structure, not bundled Lambda code, so the bundling is pure overhead. A jest setupFiles hook sets aws:cdk:bundling-stacks: [] via CDK_CONTEXT_JSON, which a bare new App() picks up with no call-site changes. Measured: single AgentStack synth 28.7s -> 1.9s (~15x); github-tags.test.ts ~135s -> 9s; full //cdk:test 155s -> 25s (8-core local). All 2177 tests pass; coverage thresholds unchanged (lines 92.36 >= 91). Durable regression guards added: AGENTS.md "Common mistakes" entry and a .abca/commands/review_pr.md checklist item — do not re-enable bundling in tests unless asserting on a bundled asset, and minimize full-stack synths (synth once in beforeAll). Both link docs/design/CI_BUILD_PERFORMANCE.md (PR aws-samples#364). Refs aws-samples#366. Part of aws-samples#363. Likely supersedes the sharding approach (aws-samples#365) — to be re-evaluated once this lands. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * docs(test): correct bundling opt-out to postCliContext (aws-samples#366) The documented per-test opt-out used constructor `context`, which CDK silently overwrites with CDK_CONTEXT_JSON in App.loadContext — so `new App({ context: { 'aws:cdk:bundling-stacks': ['**'] } })` does NOT re-enable bundling. Only `postCliContext` (applied last) overrides the global disable. Corrects the opt-out instruction and precedence comment in all three documented locations (disable-bundling.ts, AGENTS.md, review_pr.md). Verified empirically against aws-cdk-lib@2.257.0. Comment/doc-only; runtime logic unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(test): clarify bundling opt-out is for asset output, not synth (aws-samples#366) Disabling bundling does not stop tests from synthesizing — `Template.fromStack()` still runs a full synth; it only skips the esbuild asset-bundling step. The `postCliContext` opt-out exists solely for the rare test that needs real bundled-asset output (asset hash / S3 key), where an unbundled synth would silently yield a placeholder. Makes this explicit in disable-bundling.ts and AGENTS.md. Doc-only. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(test): drop dead CI_BUILD_PERFORMANCE.md links (aws-samples#366) The doc lives in draft PR aws-samples#364 and exists on neither this branch nor main, so the three links (disable-bundling.ts, AGENTS.md, review_pr.md) are dead. Drop them and keep the stable aws-samples#366 reference, removing the merge-order dependency on aws-samples#364 and the link-checker (aws-samples#300) risk. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(test): harden bundling-disable against ambient CDK_CONTEXT_JSON (aws-samples#366) Two robustness fixes to the global bundling-disable setup file, both from PR aws-samples#371 review (finding 1 hardening nits): - Spread BUNDLING_DISABLED_CONTEXT LAST when merging a pre-set CDK_CONTEXT_JSON, so an ambient `aws:cdk:bundling-stacks` value can no longer silently win and re-enable bundling. This is safe precisely because the documented per-test opt-out is `postCliContext` (applied after the env var regardless), so the escape hatch still works. - Guard `JSON.parse` of the pre-set var with try/catch. A malformed value previously threw inside setupFiles and failed every test in every file with an opaque parse error; now the blame is localized to the offending value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(cdk): add executable guard that Lambda bundling is disabled (aws-samples#366) PR aws-samples#371 review finding 1: the ~15× synth speedup rested entirely on the `disable-bundling.ts` setupFiles entry, but the only regression guards were honor-system prose (AGENTS.md) and a review_pr.md checklist item. Dropping the setupFiles wiring, reordering Jest setup, or a CDK rename of the context key would silently revert the suite to full-bundling synths with no failing check. Adds two assertions: the synth context carries `aws:cdk:bundling-stacks: []`, and a bare-App stack reports `bundlingRequired === false`. Verified empirically against aws-cdk-lib@2.259.0 that without the disable a bare App yields `bundlingRequired === true`, so these discriminate a real regression. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * test(cdk): cover the postCliContext bundling opt-out precedence (aws-samples#366) PR aws-samples#371 review finding 2: the documented per-test opt-out (`postCliContext`) and the `loadContext` precedence this PR corrected had zero test coverage, so a future CDK version reordering context application could silently break the one supported escape hatch. Locks the contract with two assertions verified against aws-cdk-lib@2.259.0: `postCliContext: { 'aws:cdk:bundling-stacks': ['**'] }` re-enables bundling (`bundlingRequired === true`), while constructor `context` does NOT (the env var set by the global disable clobbers it). Uses a bare `Stack` rather than a heavy AgentStack since the precedence is a property of CDK context loading, not of any particular stack. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * docs(test): reframe bundling as overhead-plus-smoke-test, not "pure overhead" (aws-samples#366) PR aws-samples#371 review finding 3: "pure overhead" undersold the tradeoff. esbuild bundling also validates that every Lambda `entry` resolves and the handler compiles, so disabling it in unit tests means a broken entry path or handler TS error stops surfacing at `Template.fromStack()` and only fails at real synth/deploy or in the `agent/` build. The tradeoff is still acceptable — real synth/deploy bundles, and handler code has its own compile/test under `agent/` — but the wording now names what unit tests cede. Updated in both places the phrase lived: the disable-bundling.ts header and AGENTS.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: bgagent <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Summary
markdown-link-checkas a devDependency in the docs packagelink-checkCI job runs on pull requests touching documentation, validating internal cross-references and external URLs across source Markdown (docs/guides/,docs/design/,docs/decisions/, root.mdfiles).markdown-link-check.json) handles known-flaky external hosts (npmjs bot-blocking, rate-limited APIs) and intentional patterns (Starlight site paths, placeholder text)mise //docs:link-checktask for local runsdocs/design/WORKFLOWS.md(caught by the new checker)Closes #259.
Test plan