Add Form 8-K parsing and event storage infrastructure #68
Pull request overview
This PR introduces initial infrastructure to support SEC Form 8‑K processing by adding a minimal 8‑K parser, an event-item storage schema/repo, DI registrations, and extensive test fixtures/coverage using real 8‑K primary documents.
Changes:
- Added `Form_8_K` parsing entrypoint (structured XML via `edgarSubmission`; HTML returns minimal `{}`) and integrated 8‑K processing into the accession document processing task.
- Introduced a `form_8k_events` storage table (schema + repository) and wired it into DefaultDI/TestingDI.
- Added storage logic (`processForm8K`) plus tests and mock filing samples to validate item-code extraction and persistence.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/task/forms/ProcessAccessionDocFormTask.ts | Routes 8‑K/8‑K/A filings into processForm8K and plumbs filing metadata fields (items, report_date). |
| src/sec/forms/miscellaneous-filings/Form_8_K.ts | Adds Form_8_K.parse() supporting edgarSubmission XML; HTML/XHTML returns {}. |
| src/sec/forms/miscellaneous-filings/Form_8_K.schema.ts | Defines TypeBox schemas for structured 8‑K XML submissions/signatures. |
| src/sec/forms/miscellaneous-filings/Form_8_K.storage.ts | Extracts item codes from filing metadata and/or XML, stores per-item events, and stores signature relationships (XML only). |
| src/storage/form-8k-event/Form8KEventSchema.ts | Defines the Form8KEvent table schema and DI token. |
| src/storage/form-8k-event/Form8KEventRepo.ts | Provides repository methods for saving/querying 8‑K events. |
| src/config/DefaultDI.ts | Registers form_8k_events storage in production DI. |
| src/config/TestingDI.ts | Registers in-memory form_8k_events storage for tests. |
| src/storage/form-8k-event/Form8KEventRepo.test.ts | Unit tests for event repository save/query behavior. |
| src/sec/forms/miscellaneous-filings/Form_8_K.test.ts | End-to-end-ish tests for parsing and storing events using mock filings + metadata. |
| src/sec/forms/miscellaneous-filings/mock_data/form-8k/*.htm | Adds real-world 8‑K primary document samples used by tests. |
    if (!cik || !form || !fileName) {
      const filingRepo = globalServiceRegistry.get(FILING_REPOSITORY_TOKEN);
      const filings = await filingRepo.query({ accession_number: accessionNumber });
      const filing = filings?.[0];
      if (!filing) throw new TaskError("Filing not found");
      cik = filing.cik;
      form = filing.form ?? undefined;
      filing_date = filing.filing_date;
      file_number = filing.file_number;
      items = filing.items;
      report_date = filing.report_date;
      fileName = fileName ?? filing.primary_doc;
    }
filing_date, items, and report_date are only populated when cik, form, or fileName are missing. In the main pipeline (FetchAndStoreFormsTask / UpdateAllFormsTask) those three fields are provided, so items/report_date stay undefined and processForm8K will store zero events for HTML 8-Ks (and filing_date becomes an empty string). Consider always loading the filing record (or at least when any of filing_date/items/report_date/file_number are missing) so 8-K event storage has the necessary metadata.
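One way to act on this comment is to separate "do we need the filing record?" from "fill the gaps from it". The sketch below is only an illustration under assumptions: `FilingRecord`, `TaskInput`, `needsFilingLookup`, and `fillFromFiling` are hypothetical names, and the field types (for example `cik` as a number) are guesses rather than the codebase's actual definitions.

```typescript
// Hypothetical shapes for illustration only; the real types live in the PR's codebase.
interface FilingRecord {
  cik: number;
  form: string | null;
  filing_date: string;
  file_number?: string;
  items?: string;
  report_date?: string;
  primary_doc: string;
}

interface TaskInput {
  cik?: number;
  form?: string;
  fileName?: string;
  filing_date?: string;
  file_number?: string;
  items?: string;
  report_date?: string;
}

// True when any field that 8-K event storage relies on is absent,
// not only cik/form/fileName as in the reviewed snippet.
function needsFilingLookup(input: TaskInput): boolean {
  return (
    input.cik === undefined ||
    input.form === undefined ||
    input.fileName === undefined ||
    input.filing_date === undefined ||
    input.file_number === undefined ||
    input.items === undefined ||
    input.report_date === undefined
  );
}

// Keep caller-supplied values and fill only the gaps from the filing record.
function fillFromFiling(input: TaskInput, filing: FilingRecord): TaskInput {
  return {
    cik: input.cik ?? filing.cik,
    form: input.form ?? filing.form ?? undefined,
    fileName: input.fileName ?? filing.primary_doc,
    filing_date: input.filing_date ?? filing.filing_date,
    file_number: input.file_number ?? filing.file_number,
    items: input.items ?? filing.items,
    report_date: input.report_date ?? filing.report_date,
  };
}
```

The caller would still perform the asynchronous repository query and throw when no filing is found, as the reviewed code does. A trade-off to weigh: filings that legitimately have no `items` would trigger a lookup on every run under this check.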
    FORM_8K_EVENT_REPOSITORY_TOKEN,
    createStorage("form_8k_events", Form8KEventSchema, Form8KEventPrimaryKeyNames, [
      ["cik", "filing_date"],
      ["item_code"],
The form_8k_events storage is indexed on ["cik","filing_date"] and ["item_code"], but Form8KEventRepo.getEventsByAccession() queries by { cik, accession_number }. Without an index that includes accession_number this query will likely degrade to a full scan. Add an index such as ["cik","accession_number"] (and optionally ["accession_number"] / ["cik"] depending on expected query patterns).
Suggested change:

    - ["item_code"],
    + ["item_code"],
    + ["cik", "accession_number"],
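To make the indexing concern concrete, here is a small sketch that checks whether a query's columns are covered by the leading columns of any declared index. It is a simplified model, not the behavior of `createStorage` or any particular database: `findCoveringIndex` is a hypothetical helper, and real backends may still use a partial prefix (here, `cik` alone) to narrow the scan.

```typescript
type Index = readonly string[];

// Returns the first index whose leading columns are exactly the queried columns
// (in any order), or undefined when no index covers the query.
// Assumes queryKeys contains no duplicates.
function findCoveringIndex(
  queryKeys: readonly string[],
  indexes: readonly Index[]
): Index | undefined {
  return indexes.find((index) => {
    if (index.length < queryKeys.length) return false;
    const prefix = index.slice(0, queryKeys.length);
    return queryKeys.every((key) => prefix.indexOf(key) !== -1);
  });
}

// Index lists mirroring the diff before and after the suggested change.
const indexesBefore: Index[] = [["cik", "filing_date"], ["item_code"]];
const indexesAfter: Index[] = [...indexesBefore, ["cik", "accession_number"]];

// Columns used by a getEventsByAccession()-style query.
const accessionQuery = ["cik", "accession_number"];

const coveredBefore = findCoveringIndex(accessionQuery, indexesBefore);
const coveredAfter = findCoveringIndex(accessionQuery, indexesAfter);
```

Under this model `coveredBefore` is `undefined`, while `coveredAfter` is the newly added `["cik", "accession_number"]` index.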
    FORM_8K_EVENT_REPOSITORY_TOKEN,
    new InMemoryTabularStorage(Form8KEventSchema, Form8KEventPrimaryKeyNames, [
      ["cik", "filing_date"],
      ["item_code"],
The in-memory Form 8-K event storage is indexed on ["cik","filing_date"] and ["item_code"], but tests/repo APIs query by { cik, accession_number }. Add an index including accession_number (e.g. ["cik","accession_number"]) so getEventsByAccession() doesn't require scanning all rows.
Suggested change:

    - ["item_code"],
    + ["item_code"],
    + ["cik", "accession_number"],
@copilot open a new pull request to apply changes based on the comments in this thread
Adds full Form 8-K (current report) support, rebased onto main's dead-letter/extractor-run/version-registry infrastructure:
- Form_8_K parser detects edgarSubmission XML vs HTML/XHTML primary docs
- Form8KEvent storage schema (one row per item per filing)
- 8-K extractor ID registered in EXTRACTOR_IDS and FORM_TO_EXTRACTOR_ID
- ProcessAccessionDocFormTask routes 8-K/8-K/A through processForm8K
- 15 real SEC EDGAR 8-K filings from Apple, Microsoft, Amazon, Tesla, Meta, and Alphabet as test fixtures
- Comprehensive tests: parsing, storage, cross-entity queries, amendments, edge cases, XML form data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SKG4qTyjPAtmuSipiEiAio
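The "one row per item per filing" layout can be sketched as follows. This is a minimal illustration, assuming the filing metadata carries item codes as a comma-separated string such as `"2.02,9.01"`. `Form8KEventRow`, `parseItemCodes`, and `toEventRows` are hypothetical names, and the row fields are limited to the columns mentioned in the review (`cik`, `accession_number`, `filing_date`, `item_code`).

```typescript
// Hypothetical row shape; the real Form8KEvent schema may carry more columns.
interface Form8KEventRow {
  cik: number;
  accession_number: string;
  filing_date: string;
  item_code: string;
}

// Split a comma-separated items string into trimmed, de-duplicated codes.
function parseItemCodes(items: string | undefined): string[] {
  if (!items) return [];
  const codes = items
    .split(",")
    .map((code) => code.trim())
    .filter((code) => code.length > 0);
  // Keep the first occurrence of each code, preserving order.
  return codes.filter((code, i) => codes.indexOf(code) === i);
}

// Produce one event row per item code for a single filing.
function toEventRows(
  filing: { cik: number; accession_number: string; filing_date: string },
  items: string | undefined
): Form8KEventRow[] {
  return parseItemCodes(items).map((item_code) => ({
    cik: filing.cik,
    accession_number: filing.accession_number,
    filing_date: filing.filing_date,
    item_code,
  }));
}
```

With this shape, a filing whose metadata has no `items` yields zero rows, which is the failure mode the first review comment warns about for HTML 8-Ks.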
4d5eed1 to 8be7708
- Add missing setupDatabase()/deleteAll() calls for FORM_8K_EVENT_REPOSITORY_TOKEN (would have crashed in production with "no such table")
- Add Value.Convert() after XML parsing in Form_8_K.parse() for type coercion
- Remove unused ENTITY_NAME_TYPE import from Form_8_K.schema.ts
- Remove redundant `as` cast in Form_8_K.ts parse()
- Fix import type for Form8KEvent in Form_8_K.storage.ts
- Add readonly to processForm8K parameter properties
- Fix accession_number maxLength 20 → 25 to match codebase convention
- Use storageArgs spread pattern in ProcessAccessionDocFormTask 8-K case
- Use || instead of ?? for report_date fallback (empty strings from XML)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01SKG4qTyjPAtmuSipiEiAio
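The last item turns on a language detail: `??` falls back only for `null` and `undefined`, whereas `||` also falls back for empty strings. A brief sketch of the difference, with hypothetical function and parameter names (the commit does not say which value the XML one falls back to):

```typescript
// Falls back only when the XML value is null or undefined,
// so an empty string from the XML is kept as the report date.
function reportDateNullish(xmlValue: string | undefined, fallback: string | undefined) {
  return xmlValue ?? fallback;
}

// Falls back for any falsy value, including the empty string.
function reportDateFalsy(xmlValue: string | undefined, fallback: string | undefined) {
  return xmlValue || fallback;
}

const keptEmpty = reportDateNullish("", "2024-05-02");
const usedFallback = reportDateFalsy("", "2024-05-02");
```

Here `keptEmpty` is `""` and `usedFallback` is `"2024-05-02"`. When the XML supplies a real date, both functions return it unchanged.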
Summary
This PR adds comprehensive support for parsing SEC Form 8-K filings and storing extracted events in a dedicated repository. It includes the schema definitions, parsing logic, storage layer, and extensive test coverage with real SEC EDGAR filing samples.
Key Changes
- `Form_8_K.schema.ts`: Defined TypeBox schemas for Form 8-K submissions, signatures, and related metadata
- `Form_8_K.ts`: Implemented parsing logic to extract structured data from 8-K HTML/XML documents
- `Form8KEventSchema.ts`, `Form8KEventRepo.ts`: Created dedicated storage layer for Form 8-K events with repository pattern
- `Form_8_K.storage.ts`: Added `processForm8K` function to extract and persist 8-K events, handling item codes, signatures, and company relationships
- Registered the Form 8-K event storage in the `DefaultDI.ts` and `TestingDI.ts` configurations
- `ProcessAccessionDocFormTask.ts`: Integrated Form 8-K processing into the document form processing pipeline
- `Form_8_K.test.ts`, `Form8KEventRepo.test.ts`: Added comprehensive unit tests with 14 real SEC EDGAR filing samples covering various 8-K item types (2.02, 5.02, 5.03, 5.07, 7.01, 8.01, 9.01)

Implementation Details
https://claude.ai/code/session_01SKG4qTyjPAtmuSipiEiAio