One-click collection-intelligence setup#1982
Conversation
New corpora landed with unreadable document indexes (raw import metadata as descriptions, 0% summary coverage) because nothing composed the existing enrichment machinery at corpus setup: the reference-web CTA installed only the deterministic half, and the LLM action templates (descriptions, summaries) sat in the Action Library waiting to be manually added and batch-run. CorpusIntelligenceSetupService composes the default bundle in one idempotent call: installs the reference-enrichment add_document action and starts the first weave, clones the Document Description Updater + Document Summary Generator templates, and batch-runs each over every document already present. Exposed as the setupCorpusIntelligence mutation + corpusIntelligenceSetupStatus query; createCorpus now returns objId so the New Corpus modal's default-on opt-in can chain setup after creation. An IntelligenceSetupBanner inside IntelligencePanel offers setup on both the intelligence overview and the insight-panel CAML embed, and disappears once the bundle is installed. Live-proven on the dev stack: status flips not-set-up -> fully-set-up, the weave starts, and the description/summary agents drain over existing docs (the Fervo demo corpus index is now human-readable).
Code Review — PR #1982: One-click collection-intelligence setupOverall: Solid feature addition. The service architecture is clean, idempotency is handled correctly at every layer, and permission semantics follow project conventions. A few items worth addressing before merge. What it does (summary)
Issues to Address1. Backend
|
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
…note - intelligence_setup.py: count active documents through CorpusDocumentService.get_corpus_documents (corpus-as-gate; the setup user holds UPDATE→READ) instead of reaching into the private Corpus._get_active_documents — same set, include_caml=False on both. - corpus_mutations.py: rate-limit SetupCorpusIntelligence with WRITE_HEAVY, matching StartCorpusActionBatchRun (it likewise fans out batch agent runs). - intelligence_setup.py: document the deliberate non-atomic in-flight check / analysis start (duplicate weave is recoverable; the writer is idempotent). createCorpus already emits obj_id via DRFMutation (base.py:197); no change needed. log_action accepts **extra, so the kwargs call is safe.
|
Code Review: One-click collection-intelligence setup Overview: This PR adds a well-structured orchestration layer composing existing enrichment machinery (reference-web analyzer, description/summary templates) into a single idempotent setupCorpusIntelligence call. No blockers found, but a few items worth addressing. Strengths
Issues to address before merge 1. Incomplete TypeScript output type (mutations.ts) SetupCorpusIntelligenceOutputs.summary is missing referenceActionAlreadyInstalled, which is required=True in the backend CorpusIntelligenceSetupSummaryType. The frontend does not consume it today but the type should mirror the schema. 2. Misleading toast when the batch-run cap is hit (IntelligenceSetupBanner.tsx) When batch_run_on_corpus hits BATCH_RUN_MAX_DOCS, outcome.queued_count stays 0 but ok=True. The toast is built from sum(t.queuedCount), so when the cap fires on all templates the user sees 'Collection intelligence is set up.' even though nothing was queued. Consider checking payload.summary.templates.some(t => t.error) and showing a toast.warn instead. Optional cleanup
Summary: Orchestration design, idempotency, permission gating, and service layer usage are all sound. Items 1 and 2 are worth fixing before merge. The rest is optional cleanup. |
- mutations.ts: add referenceActionAlreadyInstalled to SetupCorpusIntelligenceOutputs.summary (interface + query selection) to mirror the required backend field. - IntelligenceSetupBanner: when ok=True but nothing queued AND a template carried an error (e.g. BATCH_RUN_MAX_DOCS cap), show toast.warning instead of 'Collection intelligence is set up.' - corpus_actions.py: INTELLIGENCE_SETUP_TEMPLATE_NAMES list -> tuple (module constant is never mutated); test compares against list(...) accordingly.
Code Review: One-click collection-intelligence setupOverviewSolid orchestration layer that composes existing enrichment machinery into a single idempotent call. The design is clean — service layer owns the logic, GraphQL is a thin adapter, frontend stays simple. The idempotency story is well-reasoned and the IDOR-safe messaging convention is followed correctly throughout. Issues to Address1.
try:
with transaction.atomic():
action = template.clone_to_corpus(corpus, creator=user)
except IntegrityError:
...If Suggested fix — add a broader except after IntegrityError: except IntegrityError:
action = CorpusAction.objects.filter(corpus=corpus, source_template=template).first()
outcome.already_installed = action is not None
except Exception as exc:
outcome.error = f"Failed to install template: {exc}"
logger.exception("Intelligence setup: clone failed for %r on corpus %s", name, corpus.pk)
continue2.
Please confirm the existing Minor Issues3. CT test mock missing
summary: {
referenceAvailable: true,
referenceActionInstalledNow: true,
+ referenceActionAlreadyInstalled: false,
referenceAnalysisStarted: true,4.
What's Well Done
SummaryTwo items before merge: the unhandled-exception gap in |
Root cause of the codecov backend-patch miss: test_enrichment_backfill.py and test_enrichment_writer.py imported config.graphql.schema at MODULE level. Under --cov instrumentation that builds the graphene schema at collection time and errors (graphene-django CustomField field-resolution), so the whole file is dropped and its coverage — including the bootstrap_authority management command and the enrichment services — never reaches the upload. That is why a well-tested command showed 0% patch. Defer the schema import into the _execute helpers (the pattern already used by test_enrichment_tools.py / test_governance_graph.py), so the build happens at runtime and the files' coverage is always captured. Also close the genuinely-uncovered bootstrap_authority branches: unknown creator, unreadable spec file, and missing 'sections' list (now 100%), plus relink_corpora_for_keys per-corpus failure isolation and the empty-keys short-circuit.
…eature/corpus-intelligence-setup
Review fixes: - _setup_templates now contains non-IntegrityError clone failures per template (broad except + log + continue) instead of letting them abort the loop and return a 500 with earlier templates half-installed — honoring the bundle's graceful partial-success contract. - status(): documented the three-query cost (fine at one-per-page-load; revisit if ever polled or rendered per corpus-list row). - IntelligenceSetupBanner.ct.tsx setupMock now includes referenceActionAlreadyInstalled, matching the real SETUP_CORPUS_INTELLIGENCE selection set. Coverage (codecov/patch/Backend + /Frontend both below target): - Backend: cover intelligence_setup error/edge branches — analyzer not registered, in-flight analysis suppresses a duplicate start, failed analysis start, inactive template, and the new contained clone-failure path. - Frontend: cover the banner's error-toast, soft-warning (nothing queued + template error), clean set-up, and catch (network error) branches. Confirmed (no change needed): createCorpus.objId is exposed server-side via DRFMutation (config/graphql/base.py:197), so the post-create intelligence setup chain in Corpuses.tsx receives a real id.
|
test |
…n + perf fixes, demo walkthroughs Squash of six commits addressing the PR #1982 review and the defects uncovered while verifying the feature live. Intelligence setup (review fixes): - batch_run_action(allow_partial=True) queues up to BATCH_RUN_MAX_DOCS instead of refusing on large corpora; per-template remainingCount surfaces the deferred remainder in the banner toast. - Setup status gains canSetup + referenceAvailable; is_fully_set_up no longer demands deployment-unavailable pieces (zombie-CTA fix); the banner hides for viewers who cannot run setup. - setupCorpusIntelligence requires CRUD, matching AddTemplateToCorpus / CreateCorpusAction; shared CorpusActionService.install_template replaces the duplicated clone recipe; reference action dedupe via get_or_create + GovernanceGraphLive consults setup status before installing; post-create chain surfaces ok=false; lookup-only EnrichmentService.get_analyzer; redundant queries trimmed. PDF inline citations: - Enrichment mentions on PDF documents are projected onto PAWLs token bounding boxes via PlasmaPDF (shared opencontractserver/utils/ span_projection.py, also used by datacell grounding) — TOKEN_LABEL with real page numbers instead of unpaintable char-offset spans. Whitespace-insensitive ordinal-occurrence remapping absorbs txt-extract/PAWLs drift; legacy span mentions upgrade in place on re-enrichment. useReferenceMentions fixed (scoped analyzedCorpusId discovery, client.query instead of a hanging looped useLazyQuery, lean GET_REFERENCE_MENTIONS_FOR_ANALYSIS selection — ~176s -> ~0s). GraphQL spec validation + performance: - validation_rules now extend (not replace) graphql-core's spec rules — unknown-argument/field and variable-type validation had been silently disabled in every environment. All 26 invalid frontend documents repaired, including silently-broken features: new DeleteMetadataColumn and UpdateFieldset mutations (BaseService pattern, IDOR-unified messages), corpus chat history shape fix, tokenAuth unified on the WithUser payload, redirect corpus context sourced from route slugs. CI sweep: tests/architecture/test_frontend_graphql_documents.py + scripts/validate_frontend_graphql.py. - Presigned-URL cache lifetime now derives from AWS_QUERYSTRING_EXPIRE (was the 7-day CacheControl max-age) and is clamped to half the signature lifetime — cached file links could 403 for hours. - UserFeedbackQuerySet.visible_to_user inherited-visibility rewritten as a correlated Exists (was an uncorrelated IN materializing the full annotations table per evaluation): GetAnnotationsForAnalysis measured 176s -> 2.3s for a 108-mention document. Branding + docs: - Navbar wordmark [cite] -> [OpenContracts] (serif treatment kept, 19px, aria-label aligned). - README leads with two captioned demo walkthrough GIFs (docs/assets/images/gifs/demo-{1,2}-*.gif): one-click corpus intelligence setup and the explore-and-ask tour with inline statutory citations.
|
|
||
| @login_required | ||
| @graphql_ratelimit(rate=RateLimits.WRITE_HEAVY) | ||
| def mutate(root, info, corpus_id: str) -> "SetupCorpusIntelligence": |
| message = graphene.String() | ||
|
|
||
| @login_required | ||
| def mutate(root, info, column_id) -> "DeleteMetadataColumn": |
Code ReviewOverviewThis is a well-scoped PR that adds one-click collection-intelligence setup while also fixing several pre-existing issues: GraphQL spec validation being silently disabled, the AWS presigned URL cache TTL deriving from the wrong constant, a 176s→2.3s query performance regression, and 26 invalid frontend GraphQL documents. The core new service is clean and the idempotency story is well thought-out. Critical / Correctness
If Potential Bugs / Regressions
The semantics are correct —
No Design / Style
Corpus creation opt-in defaults to
Positive Highlights
SummaryThe PR is well-structured and addresses real user-facing pain points. The two items most worth addressing before merge are the missing error boundary in |
…n + perf fixes, demo walkthroughs Squash of six commits addressing the PR #1982 review and the defects uncovered while verifying the feature live. Intelligence setup (review fixes): - batch_run_action(allow_partial=True) queues up to BATCH_RUN_MAX_DOCS instead of refusing on large corpora; per-template remainingCount surfaces the deferred remainder in the banner toast. - Setup status gains canSetup + referenceAvailable; is_fully_set_up no longer demands deployment-unavailable pieces (zombie-CTA fix); the banner hides for viewers who cannot run setup. - setupCorpusIntelligence requires CRUD, matching AddTemplateToCorpus / CreateCorpusAction; shared CorpusActionService.install_template replaces the duplicated clone recipe; reference action dedupe via get_or_create + GovernanceGraphLive consults setup status before installing; post-create chain surfaces ok=false; lookup-only EnrichmentService.get_analyzer; redundant queries trimmed. PDF inline citations: - Enrichment mentions on PDF documents are projected onto PAWLs token bounding boxes via PlasmaPDF (shared opencontractserver/utils/ span_projection.py, also used by datacell grounding) — TOKEN_LABEL with real page numbers instead of unpaintable char-offset spans. Whitespace-insensitive ordinal-occurrence remapping absorbs txt-extract/PAWLs drift; legacy span mentions upgrade in place on re-enrichment. useReferenceMentions fixed (scoped analyzedCorpusId discovery, client.query instead of a hanging looped useLazyQuery, lean GET_REFERENCE_MENTIONS_FOR_ANALYSIS selection — ~176s -> ~0s). GraphQL spec validation + performance: - validation_rules now extend (not replace) graphql-core's spec rules — unknown-argument/field and variable-type validation had been silently disabled in every environment. All 26 invalid frontend documents repaired, including silently-broken features: new DeleteMetadataColumn and UpdateFieldset mutations (BaseService pattern, IDOR-unified messages), corpus chat history shape fix, tokenAuth unified on the WithUser payload, redirect corpus context sourced from route slugs. CI sweep: tests/architecture/test_frontend_graphql_documents.py + scripts/validate_frontend_graphql.py. - Presigned-URL cache lifetime now derives from AWS_QUERYSTRING_EXPIRE (was the 7-day CacheControl max-age) and is clamped to half the signature lifetime — cached file links could 403 for hours. - UserFeedbackQuerySet.visible_to_user inherited-visibility rewritten as a correlated Exists (was an uncorrelated IN materializing the full annotations table per evaluation): GetAnnotationsForAnalysis measured 176s -> 2.3s for a 108-mention document. Branding + docs: - Navbar wordmark [cite] -> [OpenContracts] (serif treatment kept, 19px, aria-label aligned). - README leads with two captioned demo walkthrough GIFs (docs/assets/images/gifs/demo-{1,2}-*.gif): one-click corpus intelligence setup and the explore-and-ask tour with inline statutory citations.
baaf6e8 to
a38ca64
Compare
Code Review: One-click collection-intelligence setup (#1982)OverviewThis PR adds Critical Bug Fixes1. GraphQL spec validation was silently disabled (
The fix ( 2. AWS presigned URL TTL was derived from the wrong variable (
3. N+1 / performance regression in The uncorrelated Core FeatureArchitecture and permissions: Well-structured. Permission gating at CRUD (the same tier Idempotency design: The combination of fast-path dedup -> savepoint-wrapped clone ->
Issues / Concerns1.
if column is None or BaseService.require_permission(
column, user, PermissionTypes.DELETE, request=info.context
):
return DeleteMetadataColumn(ok=False, message=not_found_msg)
error = BaseService.require_permission(column, user, PermissionTypes.DELETE, request=info.context)
if column is None or error:
return DeleteMetadataColumn(ok=False, message=not_found_msg)2.
3.
except IntegrityError:
return (
CorpusAction.objects.filter(corpus=corpus, source_template=template).first(),
False,
)Practically impossible (the racing insert would have to be rolled back between the 4. No test for The tests cover offer->run->hide and silent-when-fully-set-up. Missing: 5.
Minor Nits
Test Coverage
SummaryWell-crafted PR. The three bundled bug fixes (spec validation disabled, AWS URL cache TTL wrong variable, feedback queryset N+1) are all real production correctness/performance issues. The main feature is cleanly layered, properly permission-gated, and idempotent by design. Main actionable items before merge: (1) align the |
…template race-recovery None path - corpus_queries.py: document why resolve_corpus_intelligence_setup_status deliberately omits @login_required (read-only status for anonymous viewers of public corpora; can_setup is always False without CRUD, no escalation). - corpus_actions.py: explain why install_template's IntegrityError recovery can theoretically return (None, False) and that callers skip on action is None.
…tions; explicit PAGES concatenation
| obj = graphene.Field(FieldsetType) | ||
|
|
||
| @login_required | ||
| def mutate(root, info, id, name=None, description=None) -> "UpdateFieldset": |
…n intelligence, and agent tool grounding
Code ReviewThis is a high-quality PR with a well-designed core feature. The architecture is solid, the idempotency story is well thought through, and several significant bug fixes are bundled in. Below are my findings ordered by severity. Critical / Must-fixNothing blocking. The permission model, IDOR prevention, and service layer conventions are all followed correctly. Significant / Worth discussing1.
action, created = CorpusAction.objects.get_or_create(
corpus=corpus,
trigger=CorpusActionTrigger.ADD_DOCUMENT.value,
analyzer=analyzer, # ← converged to one row by EnrichmentService
defaults={"name": REFERENCE_ENRICHMENT_ACTION_NAME, "creator": user},
)The code's own comment correctly notes: "No DB constraint backs this up (source_template is NULL) — a duplicate from a photo-finish race is wasted work, not corruption." Since Suggestion: This is survivable (idempotent writer), but if the schema doesn't already have a partial unique constraint on 2.
Suggestion: Either rename to Minor / Quality3.
Suggestion: Either add a one-liner 4.
const { data, refetch } = useQuery<...>(GET_CORPUS_INTELLIGENCE_SETUP_STATUS, ...);
// ...
if (!status || status.isFullySetUp || !status.canSetup) return null;The Suggestion: Render a fixed-height placeholder (matching the banner's height, ~52px) while 5. These two mutations ( Suggestion: Add a short 6.
-query analysesForCorpusEnrichment($corpusId: ID) {
- analyses(corpusId: $corpusId) {
+query analysesForCorpusEnrichment($corpusId: String) {
+ analyses(analyzedCorpusId: $corpusId) {
Praise / HighlightsThese are genuinely good changes worth calling out:
Nit
Overall this is merge-ready. The race-condition note (#1 above) is the only item I'd consider pursuing in a follow-up issue; the rest are suggestions. The bundle of concurrent fixes (spec validation, AWS TTL, N+1 queryset, 26 broken GraphQL documents) significantly improves the baseline even before the new feature is considered. |
Code Review: One-click Collection Intelligence SetupOverall this is a well-structured PR. The new feature is coherent, the service layer follows project conventions, and the three bug fixes bundled here fix real production issues. Bundled Bug Fixes1. GraphQL spec validation was silently disabled (schema.py) This is the most important fix in the PR. Passing only custom rules to validate() replaced (not extended) the spec rule set, so 26 invalid frontend documents were executing silently. The fix -- [*specified_rules, DepthLimitValidationRule] -- is correct, and the three new security-hardening tests plus the architectural test_all_literal_documents_are_schema_valid properly pin the invariant. Worth calling out: the removal of 2. AWS presigned URL cache TTL fix (settings/base.py) _signed_url_lifetime_seconds was derived from _AWS_EXPIRY (7-day HTTP CacheControl max-age) instead of AWS_QUERYSTRING_EXPIRE (1-hour presign expiry), allowing the shared URL cache to serve dead 403 links for up to 5 hours. Fixed correctly. Minor note: settings/base.py now imports from opencontractserver.utils.files at module level. Safe today (clamp_shared_url_cache_ttl is a pure stdlib function), but it sets a precedent -- if files.py ever imports a Django model, the settings module will break at startup. Consider inlining the two-liner in settings or extracting it to a separate stdlib-only module. 3. UserFeedbackQuerySet.visible_to_user IN -> correlated EXISTS (QuerySets.py) The correlated Exists() fix is correct. The comment explaining the measured 0.8s evaluation cost on the old IN subquery is exactly the right documentation for a non-obvious performance change. New Feature: CorpusIntelligenceSetupServiceService Layer Follows project conventions: get_or_none + require_permission for the CRUD gate, deferred model imports, ServiceResult envelope, log_action for audit trail. Permission tier at CRUD (not UPDATE) is correct -- AddTemplateToCorpus and CreateCorpusAction already gate the identical writes at CRUD. install_template extraction into CorpusActionService is a good DRY fix -- both AddTemplateToCorpus and the bundle now share the same fast-path dedupe, savepoint clone, IntegrityError recovery, and CRUD grant recipe. The allow_partial=True mode on batch_run_action is clean. Idempotency Thorough: action rows deduplicated, already-run documents skipped, in-flight reference analysis not duplicated, get_or_create on the reference enrichment action. The race-window comment on the non-atomic check+start for the analysis is honest and correct (enrichment writer is idempotent -- duplicate is wasted work, not corruption). The None guard on install_template return value in _setup_templates is present and correct. remaining_count calculation max(0, batch_total - skipped_already_run_count - queued_count) is correct. batch_total is the full active-document count (not the capped eligible set), so the result correctly represents documents deferred for a future call. Minor: is_fully_set_up is a @Property, not a dataclass field IntelligenceSetupStatus.is_fully_set_up is a @Property on the dataclass, but CorpusIntelligenceSetupStatusType exposes it as a plain graphene.Boolean. Graphene reads it via getattr which works for properties -- functional. Worth noting for anyone who adds dataclass-field introspection later. GraphQL SurfacecorpusIntelligenceSetupStatus (no @login_required) The deliberate omission is well-justified in the docstring: anonymous users can reach the intelligence overview for public corpora, and can_setup mirrors the CRUD gate so the CTA never renders for users whose click would fail. setupCorpusIntelligence mutation Rate-limited at WRITE_HEAVY. try/except on from_global_id decoding returning the same opaque failure message is correct IDOR prevention. createCorpus.objId DRFMutation already exposes obj_id = graphene.ID() (camelCased to objId), so selecting it is safe. FrontendIntelligenceSetupBanner Visibility logic is correct: hide on load, on error, when fully set up, and when canSetup is false. Toast messages correctly distinguish success-with-docs-queued, warning-with-template-errors, and generic success. await refetch() after the mutation correctly hides the banner once status flips. Minor: the banner renders null during the initial status query -- a visible layout shift if the query takes >200ms. Not blocking. GovernanceGraphLive bootstrap -- extra round-trip The new fetchSetupStatus call before installing the reference action is correct (prevents double-installation), but adds a sequential network request. network-only means it cannot be served from cache even when IntelligenceSetupBanner already fetched it. Consider cache-first: the bootstrap fires right after the page renders when the banner query has likely populated the cache, and a stale hit that incorrectly skips installation is equivalent to pre-PR behaviour for users who already ran one-click setup. useReferenceMentions -- client.query over useLazyQuery Well-documented fix (observed live hang on the first awaited lazy query re-execution). client.query for an imperative await-in-a-loop is the right call. Mutation type fixes (ID! -> String!, conversation -> obj, etc.) All verified: PinThread/UnpinThread returns obj (confirmed via moderation_mutations.py); UpdateFieldset returns message (not msg); UpvoteMessage/DownvoteMessage/RemoveVote argument type is String server-side. Test CoverageGood overall:
One gap: no test for the no-enrichment-analyzer path through setup() (where reference_available stays False and only the LLM half runs). IntelligenceSetupStatusWithoutEnrichmentTestCase covers the status query but not this setup path. Low risk -- fallback is log + continue -- but worth a follow-up. SummaryThe three bundled bug fixes are important (spec validation regression, AWS TTL bug, IN-subquery perf). The new feature is well-designed and follows project conventions. No blockers. Most actionable suggestions: the settings import precedent and switching the GovernanceGraph status fetch to cache-first. |
Code Review — PR #1982: One-click collection-intelligence setupThe PR is well-structured with a clean service-layer composite. Idempotency design and fault-tolerance are solid. Below are findings ordered by severity. Security[MODERATE] Double-permission check in
corpus = cls.get_or_none(Corpus, corpus_pk, user, PermissionTypes.CRUD, request=request)
if corpus is None:
raise PermissionError(...)This eliminates the separate [LOW]
Architecture / Pattern Adherence[MODERATE]
active_doc_ids = set(corpus._get_active_documents().values_list("id", flat=True))
[MINOR]
Performance[MODERATE]
[MINOR]
Testing[MODERATE] No
[MODERATE] No GraphQL-layer IDOR test for
[MINOR] Magic string
User.objects.get_or_create(username="migration_admin", defaults={"is_superuser": True, ...})CLAUDE.md: no magic strings. Extract to a named constant. [MINOR] Fails are ambiguous — is the mutation broken or the status query? Split into [MINOR] The component renders Frontend[LOW]
[LOW]
Actionable Recommendations (priority order)
No issues found in: core IDOR protection (CRUD gate always enforces), idempotency design, partial-batch fault tolerance, two changelog fragments for two change categories, |
NavMenu branding tests asserted the old [cite] wordmark/aria-label; update
them to [OpenContracts] to match the rebrand in NavMenu.tsx.
DocumentReferencesPanel inbound-row navigation test expected a corpus-scoped
path; GET_DOCUMENT_BY_ID_FOR_REDIRECT no longer carries corpus context
(DocumentType has no corpus field), so useNavigateToDocumentById now builds
the standalone /d/{creator}/{doc} path. Update the route + drop the stale
corpus block from the redirect mock.
…eave, GraphQL IDOR test - setup(): replace get_or_none(READ)+require_permission(CRUD) with a single IDOR-safe get_or_none(Corpus, ..., PermissionTypes.CRUD) — the canonical pattern status() already uses. No bypass change; removes the divergent double-check. - _setup_reference_enrichment: when an enrichment analysis is already QUEUED/RUNNING, report reference_analysis_started=True. The reference web is being built; the summary/toast no longer misleadingly omits the weaving note. - Add test_setup_mutation_idor_for_stranger: a user without corpus access fires setupCorpusIntelligence and gets ok=False + the indistinguishable message and installs no CorpusAction. - Extract the test seed-admin username to a named constant.
Code Review — PR #1982 «One-click collection-intelligence setup»OverviewThis is a large, multi-concern PR (3 759 additions / 426 deletions, 61 files). It ships the headlining
The bundling is understandable since the fixes were discovered during the feature work, but it makes the review surface area very wide. Calling that out — not a blocker. Security / CriticalGraphQL spec validation was disabled on the live endpoint — fixed here. The change in AWS presigned-URL cache TTL was derived from Service ArchitectureThe
Issues to Address1. In 2.
3. Before creating the add_document corpus action, the component now fetches 4. Missing test: partial-batch through a full
5. Default-on intelligence setup in const [setupIntelligence, setSetupIntelligence] = useState(true);Defaulting to Minor / Polish
Test CoverageCoverage is solid for a feature this size:
Gaps: no integration test for the post-create setup chain in SummaryThe core changes are well-executed. Three things worth resolving before merge:
Everything else is polish or acknowledged trade-offs. The security fix (spec validation) and the 3-minute query fix alone justify landing this quickly. |
Problem
A freshly created corpus lands with an unreadable document index: titles all share the same truncated prefix and
Document.descriptionholds whatever the importer wrote (raw metadata dumps like**filing_type:** S-1 **doc_role:** exhibit…), and summary coverage sits at 0%. All the machinery to fix this already exists — the reference-web analyzer, and the Document Description Updater / Document Summary Generator action templates from the Action Library — but nothing composed it at corpus setup, and nothing batch-ran the LLM actions over documents already in the corpus.**filing_type:** S-1 **doc_role:** exhibit **exhibit_number:** 10.(2)(…What this adds
CorpusIntelligenceSetupService(opencontractserver/corpuses/services/intelligence_setup.py) — one idempotent composite:add_documentCorpusAction (same row the governance-graph CTA creates) and starts the first weave immediately.INTELLIGENCE_SETUP_TEMPLATE_NAMES) and batch-runs each over every existing document.Re-running converges: existing rows are reused, already-run documents are skipped, an in-flight weave isn't duplicated. Per-template failures (e.g. the
BATCH_RUN_MAX_DOCScap) surface in the summary without failing the whole call.GraphQL:
setupCorpusIntelligencemutation +corpusIntelligenceSetupStatusquery;createCorpusnow returnsobjIdso follow-ups can chain off creation.Frontend (both entry points):
IntelligenceSetupBannermounted insideIntelligencePanel— appears on the intelligence overview and theinsight-panelCAML embed, offers "Set up", reports the queued fan-out in a toast, and hides once the bundle is installed.Verification
test_intelligence_setup.py: service install/idempotence/permission-gating/status + GraphQL smoke (7 tests).IntelligenceSetupBanner.ct.tsx: offer → run → hide, and silent-when-set-up (2 CT tests, doc screenshot).Stacked on #1977.