STIX Bundle Imports: Speed up import, enforce STIX fidelity, surface ADM validation errors #477

Merged
seansica merged 7 commits into next from beta
May 20, 2026

@seansica seansica commented May 14, 2026

Summary

This PR overhauls the STIX bundle import workflow. The motivating problem was performance — the Enterprise ATT&CK bundle takes long enough to import that most reverse proxies time the request out — but tackling that exposed a few correctness issues in the same code path, and rather than ship a perf-only fix that leaves the bugs in place, this PR addresses them together.

At a high level:

  • Bundle import is dramatically faster. The Enterprise bundle now imports in well under a minute on developer hardware. The old code walked the bundle one object at a time, doing a database read and a database write per object and firing the full lifecycle hook chain serially. The new pipeline groups the bundle's objects by STIX type in dependency order, pre-fetches existing versions for an entire group in a single query, validates and runs hooks in parallel within each group, and persists each group in a single bulk write.
  • A long-standing false-positive on every bundle import is gone. Every well-formed ATT&CK bundle was emitting one "Not in contents" warning per marking-definition. The import path was checking for marking-definitions in x_mitre_contents even though Workbench's own export deliberately omits them (marking-defs go on the collection's object_marking_refs instead). The check now exempts them.
  • ADM validation errors are recorded and surfaced. Previously, ADM validation failures during import were silently attached to each document's workspace.validation and never appeared in the import response. Importing an older bundle (e.g. Mobile ATT&CK v17.1) looked like a clean import even when the bundle had hundreds of schema mismatches. The response's workspace.import_categories.errors now carries one entry per failing object, including the full Zod issue list under a details array. Mongoose schema-validation failures during bulk insert get the same treatment.
  • Bundle imports are now byte-faithful in stix.*. Several lifecycle hooks and event listeners legitimately rewrite STIX fields on user-driven POST/PUT flows. For example, AnalyticsService stamps stix.name from the ATT&CK ID, and the analytics listener rewrites stix.external_references to embed a URL to the parent detection strategy. Those rewrites were also firing during import, so persisted objects silently diverged from the bundle, which is the source of truth. The framework now structurally enforces a contract that stix.* is read-only during import; workspace metadata population still runs.
  • ADM validation is enabled by default. The VALIDATE_WITH_ADM_SCHEMAS env var defaulted to false, so deployments that didn't explicitly set it had been running with no ADM validation at all. Together with the surfacing fix above, this means import responses will now actually reflect what's wrong with imported content. Deployments that need to opt out can still set the var to false.

How the speedup works

The old import was sequential end-to-end. For each STIX object it ran a fresh database query to check for an existing version, ran composition and validation, called beforeCreate, did a single-document save, then ran afterCreate and fired the cross-service event cascade — and only then moved on to the next object.

The new pipeline preserves the same dependency ordering (data sources before data components, analytics before detection strategies, SDOs before relationships, and so on) but processes a whole tier of same-type objects together. For each tier:

  1. One $in query fetches every existing version of every stixId in the tier — replacing N round-trips with one.
  2. Composition, validation, and the beforeCreate hook run in parallel with a small concurrency cap (25).
  3. A single insertMany writes the whole tier.
  4. afterCreate and the post-write event emission run in a second parallel pass over the inserted documents — so the metadata cascades that populate workspace.embedded_relationships (e.g., AnalyticsService updating an analytic when its parent detection strategy lands) still fire, just in parallel instead of strictly serial.

Lifecycle hooks and event listeners continue to run for every imported object. The change is only in how they're scheduled, not whether.
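
The four stages above can be sketched roughly as follows. This is an illustration of the scheduling pattern only, not the code from this PR: `mapWithConcurrency`, `createMemoryRepo`, and the `hooks` object are hypothetical stand-ins, and treating an `id` plus `modified` pair as an existing version is an assumption. The method names `retrieveAllByStixIds` and `saveMany` are taken from the commit message, but their signatures here are guessed.

```javascript
// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// Hypothetical in-memory repository standing in for the real data layer.
function createMemoryRepo() {
  const docs = [];
  return {
    docs,
    // Stage 1: one lookup for the whole tier (the PR describes a single $in query).
    async retrieveAllByStixIds(stixIds) {
      const wanted = new Set(stixIds);
      return docs.filter((d) => wanted.has(d.stix.id));
    },
    // Stage 3: one bulk write for the whole tier.
    async saveMany(newDocs) {
      docs.push(...newDocs);
      return newDocs;
    },
  };
}

async function processTier(tier, repo, hooks, concurrency = 25) {
  const existing = await repo.retrieveAllByStixIds(tier.map((o) => o.id));
  // Assumption: an existing version is identified by id + modified.
  const seen = new Set(existing.map((d) => `${d.stix.id}|${d.stix.modified}`));

  // Stage 2: compose and run beforeCreate in parallel, bounded by `concurrency`.
  const composed = await mapWithConcurrency(tier, concurrency, async (stix) => {
    if (seen.has(`${stix.id}|${stix.modified}`)) return null; // already stored
    const doc = { stix, workspace: {} };
    await hooks.beforeCreate(doc);
    return doc;
  });

  const inserted = await repo.saveMany(composed.filter(Boolean));

  // Stage 4: afterCreate in a second bounded parallel pass.
  await mapWithConcurrency(inserted, concurrency, (doc) => hooks.afterCreate(doc));
  return { inserted: inserted.length, duplicates: tier.length - inserted.length };
}
```

The worker-pool form of `mapWithConcurrency` keeps the cap exact without any external dependency: each worker pulls the next index only after its previous call settles, so no more than `limit` calls are ever pending.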

The new stix-fidelity contract (relevant for future hook/listener work)

Workbench has accumulated a number of hooks and listeners that mutate stix.* as part of their normal work — name normalization, alias deduplication, external-reference rewrites, family/role defaults. These are correct on user-driven flows where Workbench is the authority on the object's display values, but they're incorrect on imports, where the bundle is the source of truth.

Rather than scattering if (!options.import) gates throughout the codebase and hoping nobody forgets one, the framework now freezes the stix subtree of any object handed to a hook or listener during import. Any attempted write to a frozen property throws a TypeError in Node strict mode, pointing at the line that needs gating. Forgetting the gate is no longer a silent bug — the first import test will crash with a clear stack trace at the offending line.

In practice, this turned into a one-line gate at five sites (analytics, campaigns, groups, software beforeCreate hooks, and one analytics event listener that updates external references). For anyone adding new lifecycle code in the future, the rule is documented and mechanically enforced:

Workspace mutations are always allowed. If you need to mutate stix.*, wrap the block in if (!options?.import) { ... }.
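
As a rough illustration of that rule, here is a gated hook next to an ungated one. Both hooks are hypothetical: the `(doc, options)` signature, the `workspace.attack_id` field, and the name format are assumptions made for the example, and the demo document uses a shallow `Object.freeze` in place of the framework's recursive freeze.

```javascript
// Hypothetical hook following the rule: workspace writes are free, stix writes are gated.
function beforeCreate(doc, options) {
  'use strict'; // writes to frozen objects only throw in strict mode
  doc.workspace.touched = true; // workspace mutations are always allowed
  if (!options?.import) {
    doc.stix.name = `Analytic ${doc.workspace.attack_id}`;
  }
  return doc;
}

// The same stix write without the gate, to show the failure mode during import.
function ungatedBeforeCreate(doc) {
  'use strict';
  doc.stix.name = `Analytic ${doc.workspace.attack_id}`;
  return doc;
}

// Stand-in for a document handed to a hook in import mode.
function makeImportedDoc() {
  return {
    stix: Object.freeze({ name: 'From bundle' }),
    workspace: { attack_id: 'AN0001' },
  };
}

try {
  ungatedBeforeCreate(makeImportedDoc());
} catch (err) {
  console.log(err instanceof TypeError); // the ungated write is rejected
}
```

With the gate in place, `beforeCreate(makeImportedDoc(), { import: true })` leaves `stix.name` as the bundle supplied it while still updating `workspace`.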

Documentation

Three new docs ship with this PR:

  • docs/user/stix-bundle-import.md — how to call the endpoint, what validateContents does, how to read the response, the import_categories.errors taxonomy, re-import semantics.
  • docs/developer/stix-bundle-import-pipeline.md — implementer walk-through of the tier-batched pipeline with stage diagrams.
  • docs/developer/import-fidelity-contract.md — the stix-frozen contract, what the framework enforces, and the author rules for hooks, listeners, and any new lifecycle code.

Implications for callers

A couple of behavioral notes that won't break existing clients but are worth being aware of:

  • The fail-open and strict validation modes are now clearly distinguished. By default (validateContents=false or unset) the import is fail-open: a document that fails ADM validation is still persisted, and the failure is recorded both on the document itself (workspace.validation) and in the import response (workspace.import_categories.errors). When validateContents=true, failing documents are dropped from the bulk insert and only the validation entry is written to the response. In neither mode does the import as a whole abort; other objects continue to be processed.
  • The import response now actually reflects content issues. A bundle that used to "import cleanly" with no errors may now report many entries in import_categories.errors. The objects are still being persisted (in fail-open mode); the errors are just visible where they previously were not.
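
The difference between the two modes can be sketched as a partition over already-validated documents. This is a simplified model, not the pipeline's code: `applyValidationMode`, the `object_ref` field, and the exact shape stored under `workspace.validation` are assumptions for illustration.

```javascript
// results: [{ doc, validationErrors }] as a (hypothetical) compose step might return them.
function applyValidationMode(results, validateContents = false) {
  const toPersist = [];
  const errors = [];
  for (const { doc, validationErrors } of results) {
    if (validationErrors.length === 0) {
      toPersist.push(doc);
      continue;
    }
    // Both modes report the failure in the import response.
    errors.push({
      object_ref: doc.stix.id,
      error_type: 'validationError',
      details: validationErrors,
    });
    if (!validateContents) {
      // Fail-open: persist anyway, with the failure recorded on the document.
      toPersist.push({
        ...doc,
        workspace: { ...doc.workspace, validation: { errors: validationErrors } },
      });
    }
    // Strict: the document is left out of the bulk insert.
  }
  return { toPersist, errors };
}
```

In either mode the loop continues past a failing document, which mirrors the point above that the import as a whole never aborts on a validation failure.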

Known follow-ups (not addressed in this PR)

A couple of issues surfaced during this work that deserve their own PRs rather than getting bolted onto this one:

  1. The x_mitre_attack_spec_version gate needs a rethink. The import has a separate spec-version check that throws unless forceImport=attack-spec-version-violations is set; this overlaps conceptually with the validateContents fail-open / strict distinction and the two should probably be unified. While we're there, the system could also opportunistically bump x_mitre_attack_spec_version on imported objects whose value is older than the running ADM version when validation succeeds — that would let well-formed older bundles import cleanly under the latest spec without per-object editing.
  2. Compatibility with older official ATT&CK releases. With ADM validation now on by default, every official ATT&CK STIX bundle released prior to v19.1 will produce some number of validation error entries on import. v19.1 is currently the only release that imports without ADM errors. This doesn't break anything (fail-open mode still persists the objects), but it changes what operators will see in the import response, and a follow-up should: publish a compatibility matrix mapping Workbench versions to the official ATT&CK STIX bundles they validate cleanly against, and decide whether the combination of "stix fidelity contract" + "validation now actually visible" + "validation now on by default" rises to the level of a Workbench major version bump.

Commits

SHA Title
1ba2148 perf(import-bundle): batch and parallelize STIX bundle import
b12ced6 feat(config): enable ADM validation by default
eea4f1e feat(import-bundle): enforce stix-fidelity contract on lifecycle hooks
d69472e fix(import-bundle): surface ADM validation errors in import response
029b0e1 docs(import-bundle): add user guide, pipeline overview, fidelity contract
b86d506 fix(import-bundle): skip x_mitre_contents check for marking-definitions
6785185 fix(import-bundle): surface Mongoose validation errors on bulk insert

I recommend reviewing the commits in this order: each commit stands alone, and the review is easier when read as a progression.

Test plan

  • npm run test:file -- --recursive app/tests/api/collection-bundles/ passes (30/30).
  • Full local test suite passes.
  • Import Enterprise ATT&CK end-to-end and confirm timing (target: under a minute on dev hardware).
  • Import an older official ATT&CK release (e.g. Mobile v17.1) and confirm import_categories.errors carries the expected validation entries with full details.
  • Re-import an Enterprise bundle a second time and confirm import_categories.duplicates matches the bundle size with no spurious errors.
  • ?validateContents=true with a known-malformed bundle: failing objects are dropped from persistence and reported in the response with details.
  • Open an analytic in the frontend after import and confirm both that its workspace.embedded_relationships is populated (inbound metadata still working) and that its stix.external_references is identical to the bundle (fidelity contract holding).

seansica added 7 commits May 14, 2026 11:06
perf(import-bundle): batch and parallelize STIX bundle import

Replaces the per-object sequential loop with a tier-batched pipeline.
Objects are grouped by STIX type (preserving the existing dependency
order); each tier pre-fetches existing versions in one $in query,
composes and validates in parallel with bounded concurrency, persists
via a single insertMany(ordered:false), and runs afterCreate +
emitCreatedEvent in a second parallel pass.

Adds repository.retrieveAllByStixIds and repository.saveMany, factors
composeForImport out of _createFromImport so it can be batched, and
memoizes locally-derived .partial() Zod schemas for WIP objects.

Cuts Enterprise bundle import from 5+ minutes (reverse-proxy timeout
territory) to well under a minute on developer hardware.

feat(config): enable ADM validation by default

Flips VALIDATE_WITH_ADM_SCHEMAS to default true. Deployments that
did not explicitly enable it previously ran no ADM validation at
all on POST, PUT, or bundle import, silently masking malformed
content. Deployments that want to keep validation off can still
set VALIDATE_WITH_ADM_SCHEMAS=false explicitly.
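
A minimal sketch of the default-on behavior, for readers who want to see the opt-out logic spelled out. The function name and the set of accepted false-like spellings are assumptions; how the application's config layer actually parses the variable is not shown in this PR description.

```javascript
// Enabled unless the variable is explicitly set to a false-like value.
function isAdmValidationEnabled(env) {
  const raw = env.VALIDATE_WITH_ADM_SCHEMAS;
  if (raw === undefined || raw === '') return true; // unset now means "on"
  // Assumed spellings for "off"; the real parser may accept a different set.
  const falseLike = ['false', '0', 'no', 'off'];
  return !falseLike.includes(String(raw).trim().toLowerCase());
}

console.log(isAdmValidationEnabled({})); // true
console.log(isAdmValidationEnabled({ VALIDATE_WITH_ADM_SCHEMAS: 'false' })); // false
```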

feat(import-bundle): enforce stix-fidelity contract on lifecycle hooks

Adds an import-fidelity contract that prevents lifecycle hooks and
event listeners from altering bundle `stix.*` content during a STIX
bundle import. Workspace mutations remain permitted; only stix is
frozen.

New app/lib/import-safety.js exports deepFreezeStix(doc), which
recursively freezes stix (including nested arrays and their
elements). The framework calls it before invoking any hook or
listener in import mode — in BaseService._createFromImport and in
the bulk import pipeline. With stix frozen, a missing or misplaced
`if (!options.import)` gate fails closed with a TypeError pointing
at the violating line on the first import test, rather than
silently rewriting bundle content.

Five hooks/listeners that legitimately mutate stix.* on user-driven
flows are gated for import: analytics-service.beforeCreate (the
stix.name stamp), campaigns/groups/software.beforeCreate (alias
normalization and the malware is_family default), and the
analytics-service.handleAnalyticsReferenced listener (the
external_references URL rewrite). The three afterCreate emit sites
that drive metadata cascades now forward `options` in their event
payloads so listeners can see when an import is in progress.
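
For reference, a recursive freeze of the kind described above might look like the sketch below. It is not the contents of app/lib/import-safety.js: the helper `deepFreeze` and its cycle guard are assumptions, and the sketch only handles plain objects and arrays.

```javascript
// Freeze an object graph depth-first; the WeakSet guards against cycles.
function deepFreeze(value, seen = new WeakSet()) {
  if (value === null || typeof value !== 'object' || seen.has(value)) return value;
  seen.add(value);
  for (const key of Object.keys(value)) deepFreeze(value[key], seen);
  return Object.freeze(value);
}

// Freeze only the stix subtree; workspace stays writable.
function deepFreezeStix(doc) {
  if (doc && doc.stix) deepFreeze(doc.stix);
  return doc;
}

const sample = deepFreezeStix({
  stix: { name: 'Example', external_references: [{ url: 'https://example.org' }] },
  workspace: {},
});
sample.workspace.note = 'still writable';
console.log(Object.isFrozen(sample.stix.external_references[0])); // true
```

Freezing nested arrays and their elements matters here because rewrites such as the external_references URL change touch array members rather than top-level properties.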

fix(import-bundle): surface ADM validation errors in import response

The bundle-import pipeline's fail-open branch (the default, when
`?validateContents=true` is not set) attached the ADM error list to
each failing document's `workspace.validation` but never wrote a
matching entry to the import response's
`workspace.import_categories.errors`. A bundle with hundreds of
validation failures could look like a clean import.

composeForImport now also returns the full validationErrors array;
processTier writes one entry per failing object with
`error_type: validationError`, a summary `error_message`, and a
`details` array containing every `{message, path, code}` from the
Zod output. This applies in both branches:

  - validateContents=true (strict): the doc is dropped from the
    bulk insert and the error is recorded with full details.
  - validateContents=false (default): the doc is still persisted
    with workspace.validation attached, but the error is also
    mirrored into import_categories.errors so the response
    surfaces the failure up front.

Also fixes the strict-branch entry's error_type, which was
previously `saveError` despite the failure being a validation
failure.
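
A sketch of how one response entry could be assembled from a Zod issue list. The field names `error_type`, `error_message`, and `details` come from the commit message above; `toImportError`, the `object_ref` field, and the wording of the summary message are assumptions.

```javascript
// Build one import_categories.errors entry from a list of Zod-style issues.
function toImportError(stixId, issues) {
  return {
    object_ref: stixId, // assumed field name for the failing object
    error_type: 'validationError',
    error_message: `${issues.length} ADM validation issue(s)`,
    // Keep only the three fields the response exposes per issue.
    details: issues.map(({ message, path, code }) => ({ message, path, code })),
  };
}

// Issues shaped like Zod's output, written by hand for the example.
const exampleIssues = [
  { code: 'invalid_type', expected: 'string', received: 'undefined', path: ['x_mitre_domains'], message: 'Required' },
  { code: 'too_small', minimum: 1, type: 'array', inclusive: true, path: ['external_references'], message: 'Array must contain at least 1 element(s)' },
];

console.log(JSON.stringify(toImportError('attack-pattern--example', exampleIssues), null, 2));
```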

docs(import-bundle): add user guide, pipeline overview, fidelity contract

Adds three new documents covering the bundle-import work:

  - docs/user/stix-bundle-import.md — how to call the endpoint,
    the two validation modes (fail-open vs strict), the
    import_categories.errors taxonomy, and re-import semantics.

  - docs/developer/stix-bundle-import-pipeline.md — implementer
    walk-through of the tier-batched pipeline with stage diagrams,
    the dependency-tier table, concurrency primitives, and bulk
    persistence helpers.

  - docs/developer/import-fidelity-contract.md — the stix-frozen
    contract for hooks and listeners, what the framework enforces,
    and the author rules for adding new lifecycle code.

fix(import-bundle): skip x_mitre_contents check for marking-definitions

The export side (stix-bundles-service) deliberately omits
marking-definitions from `x_mitre_contents` — they are referenced
on the collection via `object_marking_refs` instead. The import
side did not mirror that convention, so every well-formed bundle
produced one bogus "Not in contents" warning per marking-definition.

Extend the existing `x-mitre-collection` exemption in processTier
to also cover `marking-definition`. No change to map construction
or duplicate-detection — only the false-positive warning is
suppressed.
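
The exemption can be pictured with a small stand-alone check. This is a sketch rather than the logic in processTier: `findNotInContents` and the way objects are matched (by `id` plus `modified` against each entry's `object_ref` and `object_modified`) are assumptions for illustration.

```javascript
// Types that are never expected to appear in x_mitre_contents.
const EXEMPT_FROM_CONTENTS_CHECK = new Set(['x-mitre-collection', 'marking-definition']);

// Return the ids of bundle objects that would trigger a "Not in contents" warning.
function findNotInContents(bundleObjects, collection) {
  const listed = new Set(
    collection.x_mitre_contents.map((c) => `${c.object_ref}|${c.object_modified}`)
  );
  return bundleObjects
    .filter((o) => !EXEMPT_FROM_CONTENTS_CHECK.has(o.type))
    .filter((o) => !listed.has(`${o.id}|${o.modified}`))
    .map((o) => o.id);
}

const collection = {
  type: 'x-mitre-collection',
  id: 'x-mitre-collection--example',
  modified: '2026-01-01T00:00:00.000Z',
  object_marking_refs: ['marking-definition--example'],
  x_mitre_contents: [
    { object_ref: 'attack-pattern--example', object_modified: '2026-01-01T00:00:00.000Z' },
  ],
};
const bundleObjects = [
  collection,
  { type: 'marking-definition', id: 'marking-definition--example' },
  { type: 'attack-pattern', id: 'attack-pattern--example', modified: '2026-01-01T00:00:00.000Z' },
  { type: 'malware', id: 'malware--unlisted', modified: '2026-01-01T00:00:00.000Z' },
];

console.log(findNotInContents(bundleObjects, collection)); // [ 'malware--unlisted' ]
```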

fix(import-bundle): surface Mongoose validation errors on bulk insert

repository.saveMany() delegates to Model.insertMany() with
ordered:false. In that mode Mongoose silently drops docs
that fail schema validation, so malformed documents were vanishing
from the bulk path without producing any entry in
import_categories.errors. The old per-object save() path surfaced
the same failure as a "Save error", which two regression tests
relied on.

Pass throwOnValidationError:true so Mongoose throws
MongooseBulkWriteError after attempting the valid docs, then walk
err.results in order to map each failure back to its source index
and produce a {index, message, code} entry alongside the existing
MongoBulkWriteError handling. The caller in processTier is
unchanged.
@seansica seansica self-assigned this May 14, 2026
@seansica seansica requested a review from clemiller May 14, 2026 17:02

codecov Bot commented May 14, 2026

Codecov Report

❌ Patch coverage is 81.19205% with 142 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.87%. Comparing base (c091f56) to head (6785185).

Files with missing lines Patch % Lines
...s/stix/collection-bundles-service/import-bundle.js 76.92% 78 Missing ⚠️
app/services/meta-classes/base.service.js 72.22% 25 Missing ⚠️
app/repository/_base.repository.js 80.50% 23 Missing ⚠️
app/lib/validation-schemas.js 46.66% 8 Missing ⚠️
app/services/stix/analytics-service.js 94.82% 3 Missing ⚠️
app/services/stix/data-components-service.js 75.00% 2 Missing ⚠️
app/services/stix/detection-strategies-service.js 80.00% 2 Missing ⚠️
app/lib/import-safety.js 98.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             next     #477      +/-   ##
==========================================
+ Coverage   68.64%   68.87%   +0.22%     
==========================================
  Files         219      220       +1     
  Lines       30265    30829     +564     
  Branches     2601     2665      +64     
==========================================
+ Hits        20776    21234     +458     
- Misses       9451     9557     +106     
  Partials       38       38              

@seansica seansica merged commit b78b474 into next May 20, 2026
10 checks passed
@github-actions

🎉 This PR is included in version 4.17.0 🎉

