Unify weekly + per-PR security scanning into a single workflow#1460
Unify weekly + per-PR security scanning into a single workflow#1460vikrantpuppala wants to merge 5 commits into
Conversation
Replaces vulnerabilityCatcher.yml (weekly-only) with securityScan.yml, which runs two scanners with complementary coverage across two trigger modes: * On every pull_request to main (pr-scan job): fails the job on any unsuppressed CVSS >= 7 finding. Not yet required-to-merge; once the outstanding libthrift and bouncycastle findings are cleared, flip the check to required. * On the existing weekly schedule + workflow_dispatch (weekly-scan job): emails on findings. Backstops the PR gate by catching CVEs newly filed against unchanged code. Fixes the broken notification chain -- previously the OWASP maven step exited non-zero on findings, which short-circuited the Send Email step so notifications only fired when the scan was clean (i.e. never when they mattered). Scanner stack: * OWASP dependency-check (NVD CPE-based). Existing setup; reuses owasp-suppressions.xml and the failBuildOnCVSS=7 threshold from jdbc-core/pom.xml. * OSV-Scanner v2.3.8 (purl-based via OSV.dev; federates GHSA/NVD/PyPA/RustSec). Catches advisories with no NVD CPE entry that OWASP misses. Verified to catch CVE-2025-66566 in at.yawk.lz4:lz4-java and CVE-2026-5598 (severity 8.9) in bouncycastle, both invisible to OWASP. Reads the cyclonedx aggregate SBOM produced by `mvn package` so it sees the actually-resolved local dependency tree (not deps.dev's stale published-artifact metadata for the project's own GA). Adds: * `.github/workflows/securityScan.yml` -- the new unified workflow. * `osv-scanner.toml` -- OSV suppressions mirroring the OWASP ones. * Eight new entries in `owasp-suppressions.xml` for documented CPE/ecosystem false positives: Arrow R-only (CVE-2024-52338), gRPC-Go-only (CVE-2026-33186), protobuf Python-only (CVE-2026-0994), and five libthrift non-Java-binding CVEs (C_glib/Go/Node/Rust). Each has a justification comment block. * `cyclonedx-maven-plugin` 2.9.1 in the parent pom, bound to the package phase with skipNotDeployed=false so it runs on the non-deployed parent and jdbc-core modules. Removes: * `.github/workflows/vulnerabilityCatcher.yml` -- superseded by securityScan.yml's weekly-scan job. Local verification: * `mvn package -DskipTests -Ddependency-check.skip=true` produces `target/bom.json` with the expected resolved tree. * OWASP scan shows the 4 unsuppressed libthrift CVEs (the only real Java-applicable findings; will be cleared by the libthrift 0.23 bump follow-up). * OSV scan shows 2 unsuppressed CVSS>=7 findings: bouncycastle CVE-2026-5598 (separate small follow-up) and the same libthrift cluster. Both scanners' findings are listed as follow-up work in NEXT_CHANGELOG-adjacent tracking. This PR's own CI is expected to fail until those follow-ups land, which is fine because the gate isn't required-to-merge yet. Co-authored-by: Isaac Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
The previous workflow had two parallel jobs (`pr-scan` and `weekly-scan`)
that duplicated the JFrog OIDC + maven config + OWASP setup. Restructure
into a single `security-scan` job that runs the same scan for all
triggers, then branches at the terminal steps:
- pull_request: exits non-zero on findings; reviewers see the red X.
- schedule (and workflow_dispatch): composes + sends the email
notification, then exits non-zero.
Adds:
- A `Collect findings` step that parses both scanners' JSON output
into a single set of job outputs (owasp_count, osv_count,
has_findings). The terminal steps consume these outputs instead of
re-parsing reports. Also writes a markdown summary to
GITHUB_STEP_SUMMARY so findings are visible in the run UI without
downloading artifacts.
- NVD database caching keyed on date. Cuts ~2 minutes off OWASP's
runtime by avoiding the full ~150 MB NVD feed download on every
run. The restore-key prefix means we always restore the most-recent
cache; OWASP's own auto-update logic keeps the data fresh.
Fixes a bug in my previous version: the OWASP findings check was
grepping the HTML report for "CVSS score >= 7", which is not actually
present in the HTML output. Read the JSON report directly instead and
count vulnerabilities with cvssv3.baseScore >= 7. (The OWASP plugin
suppresses suppressed CVEs from the report entirely, so the count
naturally reflects unsuppressed findings.)
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
|
🔴 Major #1: pr-scan step ordering risk — OSV depends on SBOM but Build step has no if: always().
If the Build step fails (Maven compile error, JFrog flake), target/bom.json won't exist. OSV-Scanner runs with if: always() and tries to scan a missing file → osv-scanner ... target/bom.json || true swallows
🔴 Major #2: OWASP failBuildOnCVSS=7 will fail the pr-scan owasp step short-circuiting osv only if osv lacks if: always(). Good news: osv does have if: always(). But the implication is that on findings, the pr-scan step owasp exits non-zero, the job reports failure, AND osv still runs and may report additional findings. So PR However: the OWASP step exit code is the only signal for OWASP findings in the PR job. The osv step explicitly filters severity≥7 and emits ::error:: + exit 1. There's no symmetric severity filter for OWASP
🟡 Minor #3: set -uo pipefail (no -e) in OSV step is intentional but worth a comment.
The || true swallows OSV's non-zero exit (OSV exits 1 on ANY finding regardless of severity), and the severity≥7 filter does the actual gating. Reasonable. But anyone reading this for the first time will Drop -e: osv-scanner exits 1 on any finding (severity-blind);we do our own severity filter below.set -uo pipefail 🟡 Minor #4: jq severity filter uses tonumber? // 0 — silently treats malformed severity as 0 (no finding). select((.max_severity | tonumber? // 0) >= 7) If OSV ever changes its severity format to a string ("HIGH"), tonumber? returns null → // 0 → severity 0 → finding ignored. Probably fine since OSV uses CVSS numerics, but the silent fail-open is a footgun. TOTAL_FINDINGS=$(jq '[.results[].packages[]? | .groups[]?] | length' /tmp/osv-out.json) This way if 50 findings are detected but the severity filter shows 0, ops people will notice the mismatch. 🟡 Minor #5: osv-scanner.toml is keyed by CVE ID; not scoped by package. [[IgnoredVulns]] Unlike the OWASP regex which scopes the suppression to a specific artifact, OSV's IgnoredVulns is CVE-id global. If a future dependency picks up the same CVE id legitimately (e.g., a non-Java OSV does support [[PackageOverrides]] for package-scoped overrides — worth using: [[PackageOverrides]] Or at minimum, document this trade-off in the file header so future-readers know suppressions are CVE-id global. Coverage Gaps 🟡 #6: No SBOM for databricks-jdbc-thin (uber jar). The cyclonedx aggregate BOM is from mvn package of the multi-module reactor. But the published artifact is the shaded uber jar (databricks-jdbc:3.x) which has different effective dependencies (relocated, Not blocking for this PR — but worth a follow-up: also scan the uber jar via osv-scanner scan source . 🟡 #7: PR scan doesn't run on forks (no JFrog OIDC).
OK on its own — fork PRs would fail OIDC token mint. But then the subsequent Build (generates cyclonedx SBOM) step has no if: guard and will try to run without Maven settings configured. It might still work
🟡 #8: SBOM is committed to artifact but not the repo. The aggregate BOM is generated fresh per CI run. That's fine for the scanner, but: if a customer / security team wants to independently audit the driver's SBOM for a specific release, they have no way to Not a blocker for this PR — separate "publish SBOM" PR worth filing. Maintainability 🟢 Suppression sync — the PR header acknowledges this: ▎ "Keep the two files in sync when adding/removing entries." But there's no automation to enforce it. Worth a follow-up CI check that parses both files and validates that every entry in osv-scanner.toml has a corresponding X in owasp-suppressions.xml and 🟡 #9: Duplication between pr-scan and weekly-scan jobs. The two jobs duplicate ~120 lines (JFrog OIDC + Maven settings + checkout + JDK setup). Use a reusable composite action or a shared setup-and-build job that both depend on. Today the duplication is fine; if |
|
@gopalldb thanks for the thorough review — really helpful framing. Quick map of where each point lands given two changes that happened since you wrote it: Already addressed before your review landed (force-push):
Direction change after your review (next push):
The release pipeline at `databricks/secure-public-registry-releases-eng` already explicitly skips OWASP (`-Ddependency-check.skip=true` in every Maven step) and runs its own artifact-level scan via `databricks/gh-action-scan`. So removing OWASP from this repo's CI doesn't disturb release gating. That mootss your (2) (OWASP silent-success risk) and (9 part 2) (suppression sync between owasp-suppressions.xml and osv-scanner.toml). Will fix in the next push:
Acknowledged as follow-ups (out of scope for this PR):
Pushing the OSV-only + review-fix commit shortly. |
Removes the OWASP step, NVD database cache, and OWASP findings parsing
from securityScan.yml. OSV-Scanner now drives the gate alone.
Motivation:
- OSV's database is a strict superset of NVD's (federates GHSA, NVD,
PyPA, RustSec, Go vuln DB). We've already seen two real CVEs OSV
catches that OWASP misses entirely -- CVE-2025-66566 in lz4 and
CVE-2026-5598 in bouncycastle, both GHSA-only with no NVD CPE.
- The release pipeline at databricks/secure-public-registry-releases-eng
already skips OWASP (every Maven step has -Ddependency-check.skip=true)
and runs its own artifact-level scan via databricks/gh-action-scan.
So removing OWASP from this repo's CI does not disturb release gating.
- OWASP's NVD database (~150-700 MB), the NVD_API_KEY secret, and the
cache machinery added minutes per run for coverage we don't actually
use any more.
The OWASP plugin block stays in jdbc-core/pom.xml and owasp-suppressions.xml
stays in the repo, because the disabled-anyway in-repo release.yml /
release-thin.yml workflows still reference them. Cleaning those up is a
separate change.
Also addresses these from the PR review:
#1 (SBOM existence guard): the OSV step now explicitly checks
target/bom.json exists before scanning, and that osv-scanner
produced a non-empty output. Build failures now surface with a
clear ::error:: instead of a downstream jq error.
#3 (set -uo pipefail without -e): added a comment explaining that -e
is dropped intentionally because osv-scanner exits 1 on any
finding regardless of severity; the severity >= 7 filter below is
the real gate.
#4 (severity filter silently treats malformed scores as 0): the
Collect findings step now logs total_findings alongside
high_count, so a mismatch (e.g. "10 total / 0 high") would be
visible in the step summary and warn ops about OSV format drift.
#5 (suppressions are CVE-id global, not package-scoped): OSV-Scanner
v2.3.8 does not actually support per-package CVE-id ignores
(verified against internal/config/config.go -- PackageOverrides
has no per-CVE field). Documented the trade-off explicitly in
osv-scanner.toml's header so future readers understand the
scoping model and the acceptable-risk reasoning.
#7 (fork PR maven config): added a comment in the workflow explaining
that fork PRs work even with JFrog OIDC skipped, because all of
the driver's direct dependencies are published to public Maven
Central. JFrog is just a mirror, not a source.
Verified locally: OSV reports the same 2 unsuppressed CVSS>=7 findings
(bouncycastle CVE-2026-5598 + libthrift cluster) as before, with no
schema errors against the corrected osv-scanner.toml.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
|
Quick correction on (5) (suppression scoping) from my earlier reply: I intended to use `[[PackageOverrides]]` for per-package CVE-id scoping, but OSV-Scanner v2.3.8 actually doesn't support that intersection. The `PackageOverrideEntry.Vulnerability` struct in `internal/config/config.go` has only an `ignore` boolean field, no per-CVE id. So `[[PackageOverrides]]` blanket-ignores all vulns on the matched packages (way too broad), and `[[IgnoredVulns]]` stays the only viable per-CVE option (which is global across all packages). Kept the `[[IgnoredVulns]]` approach but added a detailed comment block in `osv-scanner.toml`'s header documenting the trade-off and the acceptable-risk reasoning -- if a future Maven dep ever legitimately picks up one of the CVE IDs we've suppressed, the suppression would silently apply to it too. Mitigated because every entry we have is ecosystem-mismatched (the Go/Python/R/C_glib binding of a library, not the Java binding), so a legitimate Java affectation would itself be a notable advisory event we'd want to revisit. If you'd prefer we eat the broader `[[PackageOverrides]]` blanket on `org.apache.thrift:libthrift` (since that's the only package family with multiple suppressed CVEs in our config), I can do that -- but for the cross-family ones (Arrow, gRPC, protobuf) it'd still be one CVE per family so the distinction collapses anyway. |
PR-time and weekly serve different audiences:
- PRs are evaluated by code: false positives are expensive because
every author hits them. Keep CVSS >= 7 as the gate threshold so
we don't block on MEDIUM/LOW noise.
- The weekly is read by humans, not enforced. False positives are
cheap (one person reads the email, ignores most of it). Report
EVERYTHING so emerging MEDIUM risk is visible before it crosses
the gate.
Changes:
- `Collect findings` step now produces an `all_findings` JSON sorted
by severity desc (written to /tmp/all-findings.json -- larger than
the 1 MB GH Actions output cap allows), plus a `high_count` for the
PR gate. `total_findings` covers every finding regardless of
severity.
- PR terminal step gates on `high_count != '0'` (unchanged semantics;
just no longer uses the now-deleted has_findings output).
- Weekly email body now lists ALL findings in a styled HTML table
sorted by severity desc, with HIGH rows highlighted red and
MEDIUM rows highlighted amber. The email and the
Fail-on-findings step trigger on `total_findings != '0'`.
- Step summary in the Actions UI also lists every finding in a
markdown table.
Verified locally against the existing /tmp/osv-out.json: previously the
gate showed 2 findings; with the new logic it surfaces 4 -- the same 2
HIGH (bouncycastle 8.9, libthrift 7.3) plus 2 MEDIUM bouncycastle CVEs
(6.3 and 5.5) that the team would otherwise discover only when one of
them later got re-scored above 7.
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Before: the PR-fail step printed only a count ("2 unsuppressed CVSS>=7
finding(s) in this PR") and pointed the author at the step summary
panel or downloaded artifacts. The Collect-findings step wrote findings
to GITHUB_STEP_SUMMARY but not stdout. Authors clicking into the failing
run's default "Logs" view had to take a second navigation step to see
which CVEs actually triggered the fail.
After: both steps echo the findings list to stdout (sorted by severity
desc, with [severity] pkg ids format). The PR-fail step shows only
HIGH findings (the ones it blocks on); the Collect-findings step shows
all findings. The step summary table is unchanged -- it's still the
rich rendering -- but the job log now has enough information to act
without extra clicks.
Example new output on a HIGH-finding fail:
::error:: 2 unsuppressed CVSS>=7 finding(s) in this PR:
[8.9] org.bouncycastle:bcprov-jdk18on@1.79 GHSA-p93r-85wp-75v3
[7.3] org.apache.thrift:libthrift@0.19.0 GHSA-7pwc-h2j2-rjgj
Fix by either: ...
Full step summary: <URL>
Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
Summary
Consolidates security scanning into one workflow with one job (
securityScan.yml). ReplacesvulnerabilityCatcher.yml. Adds OSV-Scanner alongside the existing OWASP check, plus cyclonedx SBOM generation so OSV scans the actually-resolved local dependency tree.This bundles the work that was originally split across (now-closed) PRs #1458 (suppressions + email-notification fix) and #1459 (per-PR gate). One unified workflow file, one set of suppression rules, one place to look.
Design: single job, terminal steps gated by event
No duplicated steps across jobs. Same scan logic for every trigger; only the notification mechanism differs.
What changed
.github/workflows/vulnerabilityCatcher.ymlsecurityScan.yml..github/workflows/securityScan.ymlowasp-suppressions.xmlosv-scanner.tomlpom.xmlpackagephase,skipNotDeployed=false. Emits an aggregatetarget/bom.jsonfor OSV to read.Why two scanners
owasp-suppressions.xmlandfailBuildOnCVSS=7threshold.Why cyclonedx SBOM
OSV-Scanner's Maven resolver consults deps.dev's published-artifact metadata. For the project's own GA, that means it sees the previously-published 3.3.3 pom — not whatever bumps are sitting on
main. Result: in CI, OSV would keep showing yesterday's released state instead of the PR branch's actual dependency tree. Producing a CycloneDX aggregate SBOM at build time captures the actually-resolved local tree.Fixes from #1458 included here
The weekly scan has been failing every Sunday for 7+ consecutive weeks (since early April 2026) without sending a notification email. Root cause: the OWASP maven plugin step exits non-zero when CVEs are found (because
<failBuildOnCVSS>7</failBuildOnCVSS>is set injdbc-core/pom.xml:357), short-circuiting the rest of the job. The subsequent steps never ran.The new workflow has
continue-on-error: trueon the OWASP step and routes finding-detection through an explicitCollect findingsstep that reads the OWASP JSON report directly. Whether the job ultimately fails (and whether email goes out) is decided after that step, not by maven's exit code.Findings on
maintodayThis workflow, run against current
main, reports unsuppressed findings that will be addressed in separate follow-up PRs:So this PR's own CI is expected to fail on these 2 known findings until the follow-ups land. Aligned with the rollout plan: add the workflow now, mark it required-to-merge in branch protection after the burn-down.
Expected runtime
Each invocation: ~5–7 minutes (vs ~4 min for the old weekly). Breakdown:
mvn package(with cyclonedx SBOM): ~45–90sNVD caching cuts ~2 minutes off OWASP's runtime when the cache is warm. The cache key rotates daily so we still get fresh CVE data.
What this PR is NOT
Test plan
NO_CHANGELOG=true
This pull request and its description were written by Isaac.