Skip to content

Close permissioning-guide audit gaps: relationship privacy, anonymous extracts, group grants#1985

Open
JSv4 wants to merge 21 commits into
mainfrom
claude/magical-fermi-7h8do7
Open

Close permissioning-guide audit gaps: relationship privacy, anonymous extracts, group grants#1985
JSv4 wants to merge 21 commits into
mainfrom
claude/magical-fermi-7h8do7

Conversation

@JSv4

@JSv4 JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Permissioning audit gap closure

A full audit of docs/permissioning/consolidated_permissioning_guide.md against the GraphQL and service layers surfaced substantive guide/code divergences. Four intent decisions were made with the maintainer, and this PR makes code and docs agree.

Agreed decisions → behavior changes

1. Relationship privacy recursion implemented (closes the #1655 Phase-C deferral)

RelationshipManager.user_can and RelationshipManager.visible_to_user now enforce created_by_analysis / created_by_extract source permissions exactly like annotations, via a shared module-level _source_privacy_recursion_passes (opencontractserver/shared/Managers.py). Previously a user with doc+corpus grants could see and write analysis-/extract-private relationships through the manager surfaces. The relationship creator keeps access (creator short-circuit, mirrored by Q(creator=user) in the queryset gate) so filter/check parity holds.

2. Extracts are never anonymous-visible

The guide's documented contract ("Extract — never") is now enforced everywhere: a new ExtractManager (opencontractserver/extracts/models.py) denies anonymous users on both visible_to_user and user_can, and ExtractService.get_visible_extracts / check_extract_permission lock down the service boundary. Previously the service exposed is_public extracts in public corpora to anonymous callers (drift copied from the analysis service, reachable through the un-gated extracts resolver).

3. Group-granted source permissions unlock private rows in lists

The created_by_* privacy gates consulted only the USER object-permission tables, so a viewer whose analysis/extract grant arrived via a Django group passed user_can (groups resolve by default) but never saw the rows in lists — a filter/check parity drift. New shared builders in opencontractserver/utils/source_visibility.py join the group tables and replace four divergent inline copies (AnnotationQuerySet.visible_to_user, AnnotationService.get_document_annotations / get_corpus_annotations, RelationshipService.get_document_relationships). Bonus fix found during consolidation: get_corpus_annotations skipped privacy exclusion entirely for anonymous users — anonymous viewers of a public corpus could see analysis-/extract-private annotations on public documents. The exclusion now applies unconditionally.

4. Thread moderation: code is right, guide was wrong

Conversation.can_moderate does NOT grant moderation via EDIT permission on the corpus/document; the guide claimed it did. Guide corrected; the deliberate Conversation.can_moderate vs Corpus.user_can_moderate divergence (#1450) is now documented.

Also in this PR

  • myPermissions mirrors the structural-write break-glass: AnnotationService masked structural _can_update/_can_delete to False for everyone (under-reporting for superusers — a stale comment claimed superusers were "handled upstream", untrue since scoped admin 2026-05); RelationshipService had no structural mask (over-reporting doc+corpus writes for normal users). Both now mask to user.is_superuser.
  • Guide overhaul: every stale *QueryOptimizer name → current per-app service; dead <app>/query_optimizer.py paths → <app>/services/…; resolver locations after the queries.py split; ws/agent-chat/ URL; awaited _check_user_permissions + issue Agent Tools Not Fault Tolerant #820 fault-tolerance split; real agent-test paths; get_users_permissions_for_obj group default (True); Pitfall 1 (structural-only, not empty); CorpusCategory GraphQL mutations; structural Enforcement Locations → Managers.py. Newly documented: the Reconcile CorpusObjsService corpus-as-gate vs GraphQL MIN-permission semantic #1682 get_corpus_documents (corpus-as-gate) vs get_corpus_documents_visible_to_user (MIN) dual semantics, opencontracts.E001 dual enforcement + scan-scope nuance, BaseService.filter_visible_qs, and reference-table rows for Note / UserFeedback / Embedding / CorpusAction / Notification.
  • Stale superuser-bypass docstrings removed in shared/QuerySets.py, shared/Managers.py, conversations/models.py, llms/tools/image_tools.py, documents/services/relationships.py; CLAUDE.md pitfall 2 → ConversationService.
  • Changelog fragments: changelog.d/perm-audit-2026-06.security.md + .fixed.md.

Tests

  • test_authorization_invariants.py: relationship matrix extended with analysis-/extract-rooted fixtures + user- and group-granted readers; the test_no_privacy_widening_via_created_by_analysis sentinel (which pinned the old deferral) is replaced by six real privacy invariants per the agreed behavior change; new ExtractAuthorizationInvariantsTestCase; annotation group-grant parity test.
  • New: test_relationship_privacy_scoping.py, test_extract_anonymous_lockdown.py, test_source_visibility_group_grants.py, test_structural_mypermissions_breakglass.py (all under opencontractserver/tests/permissioning/).

black / isort / flake8 clean on all touched files. The targeted dockerized test run is in flight in the session environment (Docker Hub rate limiting slowed image pulls); any failures will be fixed in follow-up commits on this branch — CI should be treated as the gate.

Follow-up observations (out of scope, flagged during audit)

  • ExtractType has no get_queryset override; related-field traversal (e.g. datacell → extract) relies on parent gating. Pre-existing.
  • Mention-autocomplete resolvers still inline guardian get_objects_for_user in config/graphql/search_queries.py (legal under E001's three-token scan; now documented in the guide as a known exception; service migration is a candidate follow-up).
  • The annotation-side user_can privacy recursion has no creator exemption while AnnotationQuerySet's gate has Q(creator=user) — a latent annotation-creator parity edge not covered by current fixtures (relationships handle this explicitly via the creator short-circuit).

Generated by Claude Code

claude added 2 commits June 12, 2026 06:41
…acts, group grants

2026-06 permissioning audit of docs/permissioning/consolidated_permissioning_guide.md
against the GraphQL + service layers. Four intent decisions were made and code/docs
now agree:

1. Relationship privacy recursion (closes #1655 Phase-C deferral):
   RelationshipManager.user_can + visible_to_user now enforce
   created_by_analysis/created_by_extract source permissions exactly like
   annotations, via the shared module-level _source_privacy_recursion_passes
   (opencontractserver/shared/Managers.py). Non-creators need the requested
   permission on the source as well as the doc+corpus MIN; the relationship
   creator short-circuit is mirrored by Q(creator=user) in the queryset gate.

2. Extracts are never anonymous-visible: new ExtractManager denies anonymous
   on both manager surfaces; ExtractService.get_visible_extracts /
   check_extract_permission lock down the service boundary (previously
   exposed is_public extracts in public corpora to anonymous callers through
   the un-gated extracts resolver).

3. Group-granted analysis/extract permissions now unlock private rows in
   list queries: new opencontractserver/utils/source_visibility.py builders
   join the group object-permission tables and replace four divergent inline
   copies (AnnotationQuerySet.visible_to_user, AnnotationService x2,
   RelationshipService) — closing a filter/check parity drift vs user_can.
   Also fixes an anonymous privacy leak in get_corpus_annotations (privacy
   exclusion previously skipped for anonymous users).

4. Thread moderation: code semantics confirmed as intended; the guide's
   'EDIT permission grants moderation' claim removed and the
   Conversation.can_moderate / Corpus.user_can_moderate divergence (#1450)
   documented.

Also: pre-computed myPermissions now mirror the structural-write break-glass
(superusers True / others False on structural rows) in both AnnotationService
and RelationshipService (relationships previously reported raw doc+corpus
writes on structural rows); stale superuser-bypass docstrings cleaned up
(QuerySets/Managers/conversations/image_tools/documents services); guide
corrected for stale optimizer names, dead query_optimizer.py paths, resolver
locations, ws/agent-chat URL, awaited _check_user_permissions, real test
paths, get_users_permissions_for_obj default, Pitfall 1, CorpusCategory
management; documented #1682 corpus-as-gate vs MIN dual semantics, the
opencontracts.E001 dual enforcement + scan scope, BaseService.filter_visible_qs,
and missing reference-table rows; CLAUDE.md pitfall 2 updated to
ConversationService.

Tests: invariant suite extended (relationship privacy matrix incl. group
grants, ExtractAuthorizationInvariantsTestCase, annotation group-grant
parity; the test_no_privacy_widening_via_created_by_analysis sentinel that
pinned the deferral is replaced by real privacy invariants per the agreed
behavior change) plus four new files under tests/permissioning/.
- pyupgrade (--py39-plus): drop now-redundant quoted annotations in
  shared/Managers.py and utils/source_visibility.py (both files use
  from __future__ import annotations)
- isort 6.0.1 (pre-commit rev): blank-line fix in extract_service.py
- mypy: annotate Extract.objects override with type: ignore[misc]
  (house pattern used by Document/Relationship/Annotation managers) and
  narrow Optional[Extract] in test_extract_anonymous_lockdown.py
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — PR #1985: Permissioning audit gap closure

Overview

This is a well-scoped security hardening PR that closes four real permission enforcement gaps. The code changes are conservative, the test additions are strong, and the documentation overhaul is genuinely valuable. Approachable to merge after addressing a few items below.


Security Fixes — Confirmed Correct

1. Relationship privacy recursion (closes #1655 Phase-C)

The extraction of _check_annotation_privacy_recursion into the module-level _source_privacy_recursion_passes and its wiring into RelationshipManager.user_can + RelationshipManager.visible_to_user is clean. The filter/check parity is correctly maintained:

  • Q(creator=user) in the queryset gate mirrors the creator short-circuit in user_can
  • _source_privacy_recursion_passes runs after the creator check (intentional), so a relationship's own creator keeps access without needing source permissions

2. Extract anonymous lockdown

ExtractManager correctly overrides both visible_to_user and user_can in lockstep. The getattr(user, "is_anonymous", True) default is a good fail-closed pattern. The parallel enforcement in ExtractService.check_extract_permission and get_visible_extracts removes the drift.

3. Group-granted source permissions

Centralizing into source_visibility.py is the right call. The previous code missed group-level guardian grants in three separate places; now there is one builder that all paths share.

4. myPermissions structural mask

The change from Value(False) to Value(user_is_superuser) for structural _can_update/_can_delete in both annotation and relationship services correctly aligns the pre-computed UI permissions with what AnnotationManager.user_can will actually allow. The bool(getattr(user, "is_superuser", False)) wrap is appropriate for Value() annotations.


Issues

🔴 Documentation bug: inverted boolean logic in the RemoveAnnotation example

In consolidated_permissioning_guide.md, the example code contains:

# Single call handles all privacy/structural logic (falsy = granted)
if annotation_obj is None or BaseService.require_permission(
    annotation_obj, info.context.user, PermissionTypes.DELETE,
    request=info.context,
):
    return RemoveAnnotation(ok=False, message="Permission denied")

The comment "falsy = granted" contradicts standard patterns. If require_permission returns truthy when the check fails, the if-statement is correct semantically — but it is opposite to the established codebase convention (if not user_can(...): deny) and will mislead contributors who copy-paste it. Either rewrite with not BaseService.require_permission(...), or clarify that require_permission raises on denial (in which case no return-value check is needed at all). This is in a very prominent section of the guide that serves as reference code.

🟡 N+1 query risk in _source_privacy_recursion_passes for bulk callers

The docstring correctly documents the risk: Bulk callers SHOULD select_related("created_by_analysis", "created_by_extract") on their root queryset.

The list-side paths (privacy gate as subqueries) are safe. The single-object user_can path is the danger zone — any caller iterating a queryset and calling user_can per row will hit 1–2 extra queries per row without select_related. Consider adding a # PERF NOTE comment at the call sites in both AnnotationManager.user_can and RelationshipManager.user_can where _source_privacy_recursion_passes is invoked.

🟡 Missing test: explicit extract grant unlocking an extract-rooted relationship

test_privacy_recursion_grants_via_user_level_source_grant pins that an analysis grant unlocks the analysis-rooted relationship and does NOT bleed into the extract-rooted sibling. But it does not assert the positive case for the extract side: that a user with an explicit READ grant on the extract can see private_rel_via_extract on both user_can and visible_to_user. Adding those assertions would give the extract path symmetrical pinning.

🟡 ExtractManager.visible_to_user kwargs forwarded to base class

ExtractManager.visible_to_user passes lightweight=lightweight and with_doc_label_annotations=with_doc_label_annotations to super().visible_to_user(). If BaseVisibilityManager.visible_to_user does not declare those kwargs, this will raise TypeError at runtime. Worth a quick sanity check against the base class signature.


Minor Observations

Pre-existing annotation-creator parity gap (noted in Follow-up observations): AnnotationQuerySet.visible_to_user carries Q(creator=user) but AnnotationManager.user_can has no matching creator short-circuit, while RelationshipManager now has both. Pre-existing and out of scope here, but should be tracked as a follow-up since _source_privacy_recursion_passes now pins the relationship side — the annotation side is now a conspicuous asymmetry.

Guide _compute_effective_permissions code block: Replacing the old concrete implementation with ... placeholders makes the section less useful as a teaching example. Minor doc concern only.


Test Coverage

The new test suite is strong:

  • Six invariant tests covering block, user-grant, group-grant, write escalation, creator retention, and structural bypass
  • test_extract_anonymous_lockdown.py tests manager + service layers
  • test_structural_mypermissions_breakglass.py tests pre-computed UI permissions end-to-end
  • test_source_visibility_group_grants.py pins group grant parity

Only gap is the extract-positive case noted above.


Summary

All four security fixes are correctly implemented. The consolidation into source_visibility.py is a solid DRY win. The guide overhaul accurately reflects the current code. The require_permission example should be corrected before merge. The remaining items are low-risk and easy to follow up on this branch.

… PERF note

- Guide: expand the RemoveAnnotation example's comment to spell out
  BaseService.require_permission's truthiness inversion (returns '' on
  grant, denial string on deny; never raises) so the idiom cannot be
  misread or mis-transcribed to the legacy user_can direction
- Managers.py: PERF pointer at the RelationshipManager privacy-recursion
  call site (select_related the created_by_* FKs in bulk user_can loops)
- Invariants: symmetrical positive pin for the extract branch —
  explicit extract READ grant unlocks the extract-rooted relationship on
  both surfaces and does not bleed into the analysis-rooted sibling

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the review in 7b1069b:

  • 🔴 require_permission example — the example's logic was already correct, and the two suggested rewrites would not be: require_permission returns "" (falsy) on grant and a denial string (truthy) on deny — it never raises — so if require_permission(...): bail is the documented house idiom (see the "Truthiness inversion" section of BaseService.require_permission's docstring), and if not require_permission(...) would flip the gate. The real problem was my compressed (falsy = granted) comment; the example now spells out the inversion explicitly and points readers at BaseService.user_has when they want a boolean grant.
  • 🟡 extract-positive pin — added test_privacy_recursion_grants_via_extract_grant: explicit extract READ unlocks the extract-rooted relationship on both surfaces and does not bleed into the analysis-rooted sibling.
  • 🟡 PERF note — added at the RelationshipManager.user_can call site (the AnnotationManager.user_can docstring already carries the full version of this note).
  • 🟡 ExtractManager kwargs — verified non-issue: BaseVisibilityManager.visible_to_user(self, user=None, lightweight=False, with_doc_label_annotations=False) declares both kwargs.
  • Annotation-creator parity asymmetry — agreed it's now conspicuous with the relationship side pinned; it's listed in the PR's follow-up section and intentionally left out of scope here (changing it is an annotation-side behavior change that deserves its own invariant work).

Generated by Claude Code

CI surfaced the two remaining tests that pinned the OLD anonymous-extract
behavior the 2026-06 permissioning audit deliberately removed:

- test_query_optimizer_methods.py::test_get_visible_extracts_anonymous_only_public
  asserted anonymous users see public extracts; renamed to
  ..._anonymous_sees_nothing and now pins the agreed semantic (extracts are
  never anonymous-visible at any layer; full coverage in
  test_extract_anonymous_lockdown.py).
- test_stats.py::test_public_corpus_stats_query asserted identical
  corpusStats for anonymous and authenticated viewers; corpusStats counts
  are per-viewer by the resolver's own contract, so the anonymous branch now
  expects totalExtracts: 0 (other counts unchanged).

Changelog fragment updated to document the user-visible stats consequence.
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review -- PR #1985: Close permissioning-guide audit gaps

Overview: This is a well-structured permissioning security audit closing four real gaps: (1) relationship privacy recursion (closes #1655 Phase-C deferral), (2) anonymous extract lockdown, (3) group-granted source permissions for list queries, and (4) structural myPermissions break-glass alignment. The guide overhaul is thorough and changelog fragments are properly formatted.


Security -- All four behavior changes look correct

Relationship privacy recursion (Managers.py): The module-level _source_privacy_recursion_passes cleanly replaces the former AnnotationManager._check_annotation_privacy_recursion instance method and wires the identical logic into RelationshipManager.user_can. The ordering is correct -- creator short-circuit runs before the new recursion call, matching the Q(creator=user) disjunct in RelationshipManager.visible_to_user. Filter/check parity is pinned on both surfaces.

Anonymous extract lockdown: Implementing the denial at the manager layer (ExtractManager) in addition to the service layer is good defense-in-depth -- agent tools and any future direct manager calls are covered without relying solely on service routing.

Group-grant parity in list queries (source_visibility.py): The four divergent inline copies are eliminated and the shared builders now consult both user- and group-level guardian tables. The visible_analyses_for / visible_extracts_for pattern is consistent with how the rest of the codebase builds lazy subqueries for __in= filters.

Structural myPermissions mask: The Case(When(structural=True, then=Value(user_is_superuser))) ORM pattern is correct. Both AnnotationService (previously hardcoded False for everyone) and RelationshipService (previously no mask at all) now report what the mutation gate will actually allow.


Bug Findings

None found after reading the full diff. Two items to verify at CI time:

  • visible_extracts_for(anon) returns Extract.objects.none(), so Q(created_by_extract__in=Extract.objects.none()) generates an IN (SELECT ... WHERE 1=0) subquery. Modern Django handles this cleanly -- confirm on the CI Postgres version.
  • The unconditional privacy exclusion in get_corpus_annotations (removal of the if not user.is_anonymous: guard) correctly fixes the anonymous leak: anonymous users now get visible_extracts_for(anon) = none(), so all extract-rooted non-structural annotations are excluded -- matching the documented contract.

Performance

N+1 potential in _source_privacy_recursion_passes (acknowledged in the docstring): The FK dereference instance.created_by_analysis hits the database when not prefetched. The docstring note "bulk callers SHOULD select_related" is correct and sufficient since user_can is a single-object check by design. No change needed, but flagging for awareness if a future bulk loop over relationships is added.

OR-union in source visibility builders: The three-queryset union produces separate SQL subqueries -- equivalent to the previous inline pattern and consistent with the rest of the codebase. No regression.


Code Quality

Strong points:

  • Extracting _source_privacy_recursion_passes to module-level removes duplication cleanly while keeping the recursion logic auditable in one place.
  • source_visibility.py is well-scoped with a clear module docstring explaining the "any permission row" grant-shape decision.
  • bool(getattr(user, "is_superuser", False)) for the structural write mask is safe attribute access.
  • Tests are comprehensive: cross-contamination checks (analysis grant does NOT unlock extract-rooted rows), creator retention, structural bypass, write-permission recursion, group grants verified on both surfaces.

Minor nit (not blocking): the visible_extracts_for comment "is_public is intentionally absent -- extract privacy has never keyed off the flag" is true but slightly confusing since Extract.is_public exists in the schema. Suggested: "is_public is absent here because ExtractManager is the authoritative anonymous denial gate -- for authenticated users the privacy subquery only needs to identify which extracts the user owns or holds a grant on."


Documentation

The guide overhaul is thorough. All stale QueryOptimizer references are updated. The new sections on get_corpus_documents dual semantics (#1682), E001 scope nuance, BaseService.filter_visible_qs, and the reference table additions (Note / UserFeedback / Embedding / CorpusAction / Notification) are all welcome.

The "Pitfall 1" correction in CLAUDE.md ("returns structural annotations only" instead of "returns empty") is a correctness fix -- good catch.


Test Coverage

Area Manager surface Service surface Filter/check parity Group grants
Relationship privacy recursion test_authorization_invariants test_relationship_privacy_scoping yes yes
Extract anonymous lockdown test_extract_anonymous_lockdown test_extract_anonymous_lockdown yes n/a
Annotation group-grant parity test_authorization_invariants test_source_visibility_group_grants yes yes
Structural myPermissions mask n/a test_structural_mypermissions_breakglass yes (vs user_can) n/a

Coverage looks complete. The existing test_query_optimizer_methods and test_stats updates are properly annotated as intentional behavior changes with rationale comments.


Summary

The implementation is correct, well-tested, and the guide cleanup significantly improves maintainability. No blocking issues found. The main gate is CI confirmation that all new permissioning invariant tests pass against the live Postgres database.

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

placeholder_review_body_v2

Reviewing the visible_extracts_for docstring surfaced a latent
filter/check parity violation (same class as the group-grant drift this
PR fixes): user_can's privacy recursion delegates to
Extract.objects.user_can, which grants READ on is_public extracts for
authenticated users, while the list-side privacy gates omitted the flag
for extracts (analyses always included it). An extract-rooted row on a
public extract therefore passed user_can(READ) but never surfaced in
visible_to_user or service listings — a public-extract fixture would
have failed the parity invariant.

visible_extracts_for now includes is_public on the authenticated branch
(anonymous still gets none — ExtractManager denies outright), mirroring
visible_analyses_for. No practical exposure today: no user-facing flow
sets Extract.is_public. Pinned by
test_public_extract_source_passes_both_surfaces; changelog updated.
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — Permissioning Audit Gap Closure

This is a well-scoped, high-quality security hardening PR. The four agreed behavior changes are all correct, the test coverage is excellent, and the shared source_visibility.py module cleanly eliminates four divergent inline copies. The findings below are ordered by severity.


Security

1. source_visibility.py: any-permission match on guardian tables (intentional, but flag explicitly)

visible_analyses_for / visible_extracts_for include a row if the user holds any guardian grant on the source — not specifically a READ codename:

user_grant_ids = AnalysisUserObjectPermission.objects.filter(user=user).values_list(...)
group_grant_ids = AnalysisGroupObjectPermission.objects.filter(
    group_id__in=user.groups.values_list("id", flat=True)
).values_list(...)

A user who holds only CREATE or UPDATE on an analysis — but not READ — clears the privacy gate and sees the private annotations/relationships. The PR comment says "deliberately preserved from the original inline implementations", but those implementations may have inherited this from an earlier design assumption. If the intent is "source READ implies annotation visibility", tightening to codename="read_analysis" would more precisely enforce it. If the intent is "any grant = trusted collaborator", the current approach is fine, but the docstring should state this explicitly as intent rather than as a fidelity rationale.

2. ExtractManager.visible_to_user / ExtractManager.user_can — asymmetry in None-user handling

visible_to_user guards at the top and returns queryset.none() for None/anonymous. user_can calls resolve_user_for_user_can(user) first and only then checks is_anonymous. The behaviour is equivalent for all current inputs (both return false/empty for None), but the two code paths are structurally different. If resolve_user_for_user_can ever returns something truthy for an anonymous-like input that the base-class guard would not catch, user_can and visible_to_user could diverge. The invariant test suite (ExtractAuthorizationInvariantsTestCase) should catch any regression, but normalising via the same helper in both methods — or adding a comment explaining the equivalence — would make this easier to maintain.


Potential Bugs

3. _source_privacy_recursion_passes: stale FK dereference on deleted source

If the source Analysis/Extract was deleted after the relationship was written, source_analysis is None returns False — permanently denying everyone (including the original creator) the relationship. Whether this is the correct posture should be a deliberate decision; the docstring does not address it. If intentional, say so explicitly; if not, a tombstone check or creator bypass may be needed.

4. RelationshipManager.visible_to_userprivacy_gate built unconditionally

visible_analyses_for(user) and visible_extracts_for(user) are called unconditionally at the top of the method even for users who only need the creator short-circuit path. This is a minor performance nit (two subqueries built unnecessarily for owner-only access), not a correctness bug.


Performance

5. _source_privacy_recursion_passes: N+1 on bulk user_can calls

The docstring correctly warns about prefetching requirements, but neither AnnotationManager.user_can nor RelationshipManager.user_can have any mechanism to enforce that callers include select_related("created_by_analysis", "created_by_extract") on their root queryset. Any loop that calls obj.user_can(...) without that prefetch will hit 2 extra DB queries per object. Current mutation code paths operate on single objects so this is not a problem today, but a note in the manager-level docstring or a QuerySet.user_can_batch() helper would prevent future regressions.


Code Quality

6. visible_extracts_for omits is_public but ExtractManager.visible_to_user does not

visible_extracts_for deliberately excludes is_public ("no user-facing flow sets it"). But ExtractManager.visible_to_user delegates to BaseVisibilityManager, which does include is_public. This creates a latent parity gap: a user who can see an extract via visible_to_user (because is_public=True) cannot necessarily clear the annotation/relationship privacy gate. If an is_public=True extract were ever created (e.g. via admin command), the system would silently mismatch. Expressing this as a test assertion — or adding an invariant that no is_public=True extract exists in the lockdown suite — would make the implicit contract explicit.

7. get_corpus_annotations anonymous path change is correct but unexplained at the call site

The old guard if not user.is_anonymous: ... apply privacy exclusion is removed and the exclusion now runs unconditionally. The PR description flags this as a bonus fix (anonymous viewers of public corpora could previously see analysis-private annotations), but there is no comment at the call site explaining why the exclusion sits outside any auth guard. A one-line comment would help future readers understand this is intentional.


Documentation

The guide overhaul is thorough and accurate. A few minor notes:

  • The RemoveAnnotation mutation example in the guide uses BaseService.require_permission in a conditional: if annotation_obj is None or BaseService.require_permission(...). If require_permission raises on denial (rather than returning truthy), the conditional logic in the documentation snippet would be wrong. This is docs-only and not a runtime bug, but worth verifying it matches the actual method signature to avoid confusing readers.
  • The "Follow-up observations" section in the PR description is valuable; capturing those as GitHub issues or # TODO(issue #NNNN) inline comments would prevent them from being lost after merge.

Tests

Test coverage is comprehensive and well-structured. Specific positives:

  • The _UserCanInvariantsMixin matrix approach ensures filter/check parity automatically across all new user/instance combinations.
  • test_relationship_creator_retains_access_to_privacy_rooted_rows correctly pins the Q(creator=user) short-circuit.
  • test_structural_rows_bypass_privacy prevents regression of structural-row visibility.
  • test_anonymous_filter_check_parity for Extract is exactly the right design.

One gap: there is no test for item 3 above (relationship/annotation with a hard-deleted source analysis). If permanently denying access is the intended posture, a test that deletes the Analysis and confirms user_can returns False (and visible_to_user also excludes the row) would pin the behaviour explicitly.


Summary

The core changes are correct and close real security gaps. The PR is in good shape to merge once:

  1. CI is green (the PR description notes it was still running at submission time)
  2. The any-permission-gate question (item 1) is confirmed as intentional and the source_visibility.py docstring updated accordingly
  3. The deleted-source posture (item 3) is confirmed or addressed

Items 2, 4-7 are low-severity and can be addressed in follow-up.

…r_can)

Review follow-up that turned out to be a real parity bug, not a docstring
question: the privacy-gate builders matched ANY guardian row on the
source analysis/extract, while user_can recursion resolves READ via the
read codename — so an update_analysis-only grantee saw private rows in
lists while user_can(READ) denied them (the same filter/check violation
class this PR closes elsewhere). visible_analyses_for /
visible_extracts_for now filter permission__codename=read_analysis /
read_extract on both the user- and group-level tables; pinned by
test_update_only_source_grant_does_not_unlock_lists.

Also from the review round:
- _source_privacy_recursion_passes docstring documents the
  deleted-source posture (created_by_* FKs are on_delete=SET_NULL, so
  deletion turns rows into normal rows; the is-None branch is a
  fail-closed race guard, not a permanent lock)
- ExtractManager.user_can carries an equivalence note for its
  None/anonymous guard vs the visible_to_user surface
- get_corpus_annotations privacy exclusion now explains at the call
  site why it is intentionally unconditional (old anonymous guard was
  the leak)
- changelog fragment updated for the grant-shape tightening
@JSv4 JSv4 force-pushed the claude/magical-fermi-7h8do7 branch from edc1f56 to 7b189db Compare June 12, 2026 07:28

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Dispositions for the latest review (addressed in eec7f04 and 7b189db):

  1. Any-permission grant shape — upgraded from "document the intent" to a real fix: this was a live filter/check parity violation, not a defensible design choice. user_can's recursion resolves READ via the read codename, so an update_analysis-only grantee cleared the old list gates while failing user_can(READ). The builders now filter permission__codename="read_analysis" / "read_extract" on both the user- and group-level tables; pinned by test_update_only_source_grant_does_not_unlock_lists.
  2. ExtractManager None-handling asymmetry — added an equivalence comment; the invariant suite already pins the surfaces agreeing.
  3. Deleted-source posture — the review's premise doesn't hold: both created_by_* FKs are on_delete=SET_NULL (annotations/models.py), so deleting the source NULLs the FK and the row becomes a normal row — nobody is permanently locked out. The source is None branch fires only in the read-vs-SET_NULL race window and fails closed for that window by design. Now documented in the _source_privacy_recursion_passes docstring; no tombstone logic needed.
  4. Unconditional privacy_gate construction — the builders are lazy querysets composed into the same single SQL statement for every caller class; there's no separate owner-only query to optimize. No change.
  5. N+1 note — already addressed in the previous round (7b1069b): PERF comment at the relationship call site; the annotation manager docstring carries the full version.
  6. is_public extract gap — this review raced the push: fixed in eec7f04 exactly as described (authenticated branch of visible_extracts_for now includes is_public, pinned by test_public_extract_source_passes_both_surfaces). Anonymous remains fully denied via ExtractManager.
  7. get_corpus_annotations call-site comment — added: the exclusion is intentionally outside any auth guard; the old anonymous guard was itself the leak.

Docs note on require_permission was handled in 7b1069b (it returns ""/denial-string and never raises — the example now states this explicitly).


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review - PR 1985: Permissioning Audit Gap Closure

Summary: Security-focused PR closing four concrete gaps surfaced during a permissioning audit. Changes are well-scoped, well-documented, and backed by dedicated test files.


WHAT THIS PR DOES WELL

  1. DRY consolidation is correct and necessary.
    opencontractserver/utils/source_visibility.py is the right home for which-sources-can-this-user-see. Replacing four divergent inline copies in AnnotationQuerySet, AnnotationService.get_document_annotations, get_corpus_annotations, and RelationshipService.get_document_relationships with shared builders that honour group grants eliminates the class of drift that caused the original bug. The builders return lazy querysets composing as SQL subqueries without materializing extra rows.

  2. Filter/check parity maintained in lockstep.
    ExtractManager overrides both visible_to_user and user_can, denying anonymous on both surfaces. RelationshipManager follows the same pattern. Parity between the filter and check surfaces is the invariant the test suite pins.

  3. Structural myPermissions fix is correct.
    AnnotationService was masking structural rows to False for everyone including superusers; RelationshipService had no mask at all. Both now use When(structural=True, then=Value(user_is_superuser)) correctly reflecting the break-glass without adding a superuser short-circuit to the effective-permission path.

  4. get_corpus_annotations anonymous privacy leak is a meaningful bonus fix.
    The shared builder handles it automatically: anonymous maps to public-analyses-only / no extracts, so the exclusion applies unconditionally with no extra code path required.

  5. Test coverage is solid.
    Four new files with narrow scope: service+manager layers (test_extract_anonymous_lockdown), analysis/extract-rooted relationships (test_relationship_privacy_scoping), group-grant parity across list surfaces (test_source_visibility_group_grants), and superuser/non-superuser masking (test_structural_mypermissions_breakglass). The invariants test extension replaces the deferred-sentinel with real privacy invariants.


ISSUES AND OBSERVATIONS

  1. [Minor] _source_privacy_recursion_passes N+1 risk is documented but callers do not enforce it.

The docstring warns that instance.created_by_analysis / created_by_extract FK descriptors each hit the DB when relations are not prefetched. The user_can surface is called one instance at a time in mutations, so that is fine. But any future bulk caller that omits select_related will silently hit N+1 per row. A comment in RelationshipManager.user_can at the call site pointing to the select_related requirement would reduce risk for future callers.

  1. [Minor] ExtractManager.user_can double-resolves the user.

The method calls resolve_user_for_user_can(user) itself, then passes resolved to super().user_can(resolved, ...), which calls resolve_user_for_user_can again. Idempotent for a concrete User instance so it is safe, but slightly cleaner to pass the original user to super() and use resolved only for the anonymous gate, or to add a brief comment explaining why resolved is forwarded. Not a bug.

  1. [Design note] visible_extracts_for now includes is_public for authenticated users.

Semantically correct (matches user_can recursion) and no user-facing flow currently sets the flag. However, if a future migration or admin action sets is_public=True on an extract it would silently become visible in privacy gates for all authenticated users without necessarily being visible in ExtractService.get_visible_extracts. A comment in visible_extracts_for noting this and pointing to ExtractService would help the two surfaces stay in sync.

  1. [Pre-existing, tracked] Annotation-creator parity asymmetry.

Correctly documented in Follow-up observations: relationships have a creator short-circuit in _source_privacy_recursion_passes but annotations lack the corresponding creator exemption in AnnotationManager.user_can. A follow-up issue number in the docstring would help prevent this from being forgotten once an issue is filed.

  1. [Behavior change] Anonymous corpusStats now returns totalExtracts: 0.

test_stats.py correctly documents this with a BEHAVIOR CHANGE comment. For deployments with public corpora that anonymous users currently query this will be a visible GraphQL response change. Confirm downstream consumers (dashboards, embed widgets, public-facing pages) are aware before deploying.


DOCUMENTATION QUALITY

The permissioning guide update is thorough: stale QueryOptimizer references replaced, accurate code locations, get_corpus_documents dual-semantics documented, E001 scan-scope nuance explicit. The moderation section correction (EDIT does NOT grant moderation; CorpusModerator does) is particularly important since the guide was the source of the misconception.

One doc nit: the Privacy Filtering in List Queries code example uses ... for the anonymous branch. A one-line note that the ellipsis is intentionally abbreviated would help readers who try to copy the pattern.


VERDICT

Core changes are correct. Security fixes close real vulnerabilities: group-grant parity drift, relationship privacy recursion gap, anonymous extract exposure, anonymous corpus annotation leak. Test coverage matches stated invariants. Changelog fragments follow project convention.

Approve after confirming CI is green.

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — PR #1985: Permissioning audit gap closure

Overview

This is a high-quality, security-focused PR that closes four substantive gaps between the permissioning guide and the implementation. The changes are well-scoped and thoroughly documented.


Strengths

DRY consolidation via source_visibility.py — Four divergent inline privacy gate implementations are collapsed into two shared builders. Each previously omitted group-level guardian grants, the extract is_public flag, and the grant-shape codename check. One canonical implementation means future fixes propagate everywhere.

Real security bugs fixed:

  • Anonymous viewers of a public corpus could see created_by_analysis-private annotations (the if not user.is_anonymous guard in get_corpus_annotations skipped privacy exclusion entirely for anonymous visitors).
  • Group-granted source permissions unlocked user_can but never list queries — a filter/check parity violation exploitable when users are granted source access via groups.
  • Analysis-/extract-rooted relationships were readable by anyone with doc+corpus grants — the Phase A — Per-model user_can() overrides + deprecate user_has_permission_for_obj (follow-up to #1637) #1655 Phase-C deferral was a real access-control hole.

Structural myPermissions mask — Masking _can_update/_can_delete to user.is_superuser on structural rows is subtle but important: the UI should not offer edit affordances for operations the mutation gate will refuse.

Test coverage — Four new test modules plus significant additions to the invariant matrix. Covering the full cross-product of (user grant type) x (surface) x (source type) is the right approach. test_update_only_source_grant_does_not_unlock_lists is particularly good — it pins the grant-shape fix that would otherwise be easy to miss.


Issues and Observations

1. get_corpus_annotations structural write mask not updated (minor)

get_document_annotations and get_document_relationships both receive the new user_is_superuser mask on structural rows. get_corpus_annotations in the diff does not appear to annotate with _can_update/_can_delete at all — if that is intentional (corpus view does not pre-compute per-row write permissions), a brief inline comment would prevent future confusion. If it does annotate somewhere, this is an omission.

2. Queryset union SQL shape in source_visibility.py

The Django | operator produces a UNION at the SQL level; when embedded as __in=queryset this becomes WHERE created_by_analysis_id IN (SELECT ... UNION SELECT ... UNION SELECT ...). PostgreSQL handles this fine, and the laziness is the right property for embedding as a subquery. Not a regression over the old pattern. Worth a benchmark if annotation lists on large-source tables become slow.

A single-filter alternative that compiles to one WHERE clause (no UNION):

return Analysis.objects.filter(
    Q(is_public=True)
    | Q(creator=user)
    | Q(id__in=user_grant_ids)
    | Q(id__in=group_grant_ids)
)

Not blocking, just worth considering.

3. No dedicated test for the anonymous corpus annotation leak

The PR description calls out get_corpus_annotations previously leaking analysis-private annotations to anonymous viewers. The fix (making the exclusion unconditional) is correct, but there is no test that specifically creates a public corpus + public document + private-analysis annotation and asserts that AnonymousUser cannot see it via get_corpus_annotations. Given this was a real leak, an explicit regression test would be valuable.

4. _skip_signals not used in new test fixtures

Per CLAUDE.md, signal handlers (notifications, etc.) can fire when creating Analysis, Extract, and Relationship fixtures. The new tests do not call _skip_signals on created instances. In a TestCase this rolls back per-test so state will not be corrupted, but signal overhead and any notification records created could interfere with unrelated assertions. Consider adding obj._skip_signals = True on fixture objects, or clearing notifications at the top of setUp.

5. require_permission truthiness inversion (documentation smell)

The guide now documents that require_permission returns "" (falsy) when GRANTED and a truthy string when DENIED. The inline comment in the example calls it out well, but the name implies a boolean or raise — the truthy-on-denial shape is a footgun when copied without the comment. This PR does not change the API, but since the guide now highlights it prominently, an alias or rename (e.g. permission_denied_reason) would reduce future bugs. Follow-up candidate.


Minor / Nits

  • ExtractManager.user_can calls resolve_user_for_user_can then checks getattr(resolved, "is_anonymous", True). The visible_to_user side uses the same getattr pattern before delegating to super(). Intentionally symmetric and pinned by test_manager_user_can_denies_anonymous. Good.
  • Changelog fragments are well-scoped with file paths and line numbers per convention.
  • The guide correctly flags the mention-autocomplete get_objects_for_user inline usage as a known documented exception to E001.

Summary

The core logic is correct. The security fixes are real and the test matrix is comprehensive. Three items worth addressing before merge: (1) clarify whether get_corpus_annotations intentionally skips the structural write mask, (2) add an explicit regression test for the anonymous corpus annotation leak, and (3) consider _skip_signals hygiene in the new fixtures. The queryset union shape and the require_permission API design are follow-up observations, not blockers.

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

[flake8]
extend-ignore = E203
max-line-length = 120
exclude = .tox,.git,/migrations/,/static/CACHE/,docs,node_modules,venv
per-file-ignores =
./config/settings/local.py:F405
./config/settings/production.py:F405
./config/settings/base.py:F405
./config/settings/test.py:F405
./config/settings/test_integration.py:F405
./opencontractserver/tasks/doc_tasks.py: E203
./opencontractserver/utils/pdf_tools.py: E203,F401
./opencontractserver/tests/fixtures/init.py: F401

[pycodestyle]
max-line-length = 120
exclude = .tox,.git,/migrations/,/static/CACHE/,docs,node_modules,venv

[isort]
line_length = 88
known_first_party = opencontractserver,config
multi_line_output = 3
default_section = THIRDPARTY
skip = venv/
skip_glob = **/migrations/*.py
include_trailing_comma = true
force_grid_wrap = 0
use_parentheses = true
profile = black

[end-of-file-fixer]
exclude = frontend,staticfiles

Mypy configuration lives in mypy.ini (see issue #1331 and

docs/typing/README.md). It is split out because the per-module

baseline allow-list is large.

[coverage:run]
include = opencontractserver/*
omit =
migrations
tests
/temp_
/tmp_
/pycache/

django_coverage_plugin (template coverage) is intentionally not loaded:

Coverage 7.x falls back to the legacy C trace function whenever any file

tracer plugin is configured, which silently defeats COVERAGE_CORE=sysmon

(a ~10x cheaper instrumentation path on Python 3.12+). The project owns 7

Django templates total, all 4xx/5xx pages plus a couple of admin views —

template coverage was a trivial slice of total coverage but the plugin was

forcing a ~150% test-suite wall-time tax. Restore the plugin if/when

coverage.py supports sysmon-compatible plugin tracers.

[coverage:report]
ignore_errors = True
skip_empty = True
skip_covered = False
exclude_lines =
pragma: no cover
def repr
raise AssertionError
raise NotImplementedError
if name == .main.:
if TYPE_CHECKING:
@AbstractMethod

From the deep review round, verified findings (each confirmed against
code before acting):

Privacy gates added to three surfaces that skipped created_by_* checks:
- AnnotationService.get_label_distribution_for_corpus (user-facing via
  corpusIntelligenceAggregates) leaked label names/counts of
  analysis-/extract-private annotations; now takes the requesting user
  (caller updated in config/graphql/corpus_queries.py)
- RelationshipService.get_relationship_summary counted private rows in
  the document-type summary
- RelationshipService.get_corpus_relationships (MCP corpus listing)
  listed private relationships; its corpus-as-gate document scope is
  retained deliberately per issue #1682 (MCP = pipeline-facing caller,
  the documented default) and now says so inline

ExtractService.get_visible_extracts parity with check_extract_permission
(pre-existing drift tightened while in the file): the extract leg
consulted only USER-level guardian rows with any codename, the corpus
leg used codename__contains=read substring matching; both legs now join
the group object-permission tables and match read_extract / read_corpus
exactly, mirroring user_can resolution.

Reviewer claims rejected after verification (documented instead):
- 'deleted source permanently locks rows' — created_by_* FKs are
  on_delete=SET_NULL; rows become normal rows. The source-is-None branch
  is a fail-closed race guard; docstring now says so.
- get_corpus_relationships 'should use MIN semantics' — sole caller is
  MCP (pipeline-facing), where corpus-as-gate is the documented default.

Also: AnnotationManager.user_can docstring no longer phrases the
structural break-glass as a 'superuser bypass'; get_corpus_annotations
documents why it returns an unannotated queryset (no per-row _can_*
mask) and why the privacy exclusion is unconditional; guide's
abbreviated AnnotationService block is labelled as such.

Tests: new test_privacy_gate_regressions.py (anonymous corpus leak,
label-distribution privacy, relationship aggregate/MCP listing privacy);
ExtractListingGrantParityTestCase (group grants unlock listing,
update-only extract grant does not); group-extract-grant counterparts in
test_relationship_privacy_scoping and
test_source_visibility_group_grants.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Dispositions for the deep review (addressed in bdef2f2; every claim verified against code before acting):

🔴 1 — codename__contains="read": real and fixed, with one correction: it's pre-existing code, not new (this PR had only touched the anonymous branches of that file). Not exploitable today — no corpus codename besides read_corpus contains "read" — but tightened to exact matches anyway.
🔴 2 — missing group-level corpus grants in get_visible_extracts: confirmed and fixed — plus the twin gap the review missed: the extract leg also consulted only user-level rows (any codename). Both legs now join the group tables and match read_extract/read_corpus exactly, with ExtractListingGrantParityTestCase pinning both directions (group grants unlock; update-only extract grant does not).
🟠 3 — broad except (ImportError, Exception): confirmed pre-existing in BaseVisibilityManager, but the severity claim is inverted — the fallback drops guardian rows, so it under-discloses rather than over-discloses. Tightening the exception envelope on load-bearing base-manager code mid-PR risks surfacing unrelated latent errors; deferred to follow-ups.
🟠 4 — aggregate privacy gates: confirmed for both. get_label_distribution_for_corpus (user-facing via corpusIntelligenceAggregates) and get_relationship_summary now apply the created_by_* gate; regression pins in the new test_privacy_gate_regressions.py.
🟠 5 — get_corpus_relationships corpus-as-gate: the conditional resolves to "keep": its sole caller is mcp/tools.py, and per issue #1682 corpus-as-gate is the documented default for pipeline-facing callers. Documented inline with an explicit warning that any future GraphQL caller must switch to the _visible_to_user variant. It was missing the privacy gate though — added and pinned.
🟠 6 — anonymous ChatMessage THREAD restriction: confirmed real (ChatMessage's anonymous branch lacks the THREAD filter its Conversation counterpart has) but pre-existing and outside this PR's audited scope — added to follow-ups alongside the conversations group-grant gaps the review itself files as pre-existing.
🟠 7 — Datacell._validate_manual_entry: confirmed real (validation_config.get before the or {} guard) but unrelated to permissioning; follow-up.
🟡 8 — analysis_id == 0 sentinel: pre-existing API contract with the frontend's "__none__" mapping; renaming it touches the GraphQL contract — follow-up.
🟡 9 — single-filter + .distinct(): the single-filter form was already adopted in bdef2f2's predecessor; note the premise is off — queryset | OR-merges the WHERE clause (it's .union() that emits SQL UNION), and .distinct() is unnecessary because all predicates are single-table (no join fan-out).
🟡 10 — "Superuser bypass" docstring: fixed — now phrased as the structural-write break-glass with an explicit "NOT a blanket bypass" note.
Test gaps: group→extract counterparts added in both files. The weak assertions in test_query_optimizer_methods.py are pre-existing tests outside this change's behavior — left untouched per repo convention (noted as follow-up).


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

test comment

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review: Permissioning Audit Gap Closure

Summary: This is a well-scoped, well-documented security fix that closes four documented permissioning gaps and fixes multiple information leaks across the annotation, relationship, extract, and corpus layers. The changes are correct, the tests are strong, and the documentation is being kept in sync. Recommend merging with a few minor observations addressed.


What this PR does

  • Relationship privacy recursion (closes issue Phase A — Per-model user_can() overrides + deprecate user_has_permission_for_obj (follow-up to #1637) #1655 Phase-C): RelationshipManager.user_can and .visible_to_user now enforce created_by_analysis/created_by_extract source permissions exactly like annotations.
  • Extract anonymous lockdown: A new ExtractManager + ExtractService ensures anonymous users can never see extracts at any layer (manager, service, or privacy-gate subquery).
  • Group-grant parity: Four divergent inline privacy gate implementations consolidated into source_visibility.py, which correctly joins group object-permission tables.
  • Structural myPermissions fix: Annotation service was under-reporting (masking to False for everyone), relationship service was over-reporting (no mask at all). Both now mask to user.is_superuser.
  • Privacy leak fixes in aggregate surfaces: get_label_distribution_for_corpus, get_relationship_summary, and get_corpus_relationships all gained the missing privacy gate.

Positive highlights

  • DRY win: _source_privacy_recursion_passes and source_visibility.py replace four divergent inline copies. Correct architectural direction.
  • Invariant coverage is strong: test_authorization_invariants.py matrix extended with group-reader users and analysis-/extract-rooted fixtures — filter/check parity is mechanically pinned.
  • Test isolation: Each fix has its own dedicated test file. Naming is clear.
  • Changelog fragments include file paths, line numbers, and impact.
  • Documentation overhaul: Permissioning guide is now accurate; stale docstrings removed.
  • Fail-closed posture: getattr(user, "is_anonymous", True) treating unknown objects as anonymous is intentional and correct.

Observations and suggestions

Medium — Performance: user.groups evaluated twice per request in source_visibility.py

visible_analyses_for and visible_extracts_for each independently call user.groups.values_list("id", flat=True). These compile to two independent SQL subqueries. On hot paths like get_document_annotations where both are called back-to-back, the DB evaluates the same auth_user_groups subquery twice. Not a blocker, but consider accepting an optional user_group_ids parameter so callers on the hot path can pre-compute once:

user_group_ids = user.groups.values_list("id", flat=True)
visible_analyses = visible_analyses_for(user, user_group_ids=user_group_ids)
visible_extracts = visible_extracts_for(user, user_group_ids=user_group_ids)

Low — Annotation creator vs relationship creator short-circuit asymmetry

The PR flags this as a "latent parity edge": annotation-side user_can privacy recursion has no creator short-circuit, while AnnotationQuerySet's gate has Q(creator=user). Relationships handle this correctly. The asymmetry means an annotation creator lacking source READ could see their own annotation in lists but user_can(READ) returns False. This predates the PR — please ensure it is tracked as a dedicated follow-up.

Low — get_label_distribution_for_corpus new optional user parameter

The function gains user=None with anonymous fallback semantics. The only current production caller (resolve_corpus_intelligence_aggregates) correctly passes info.context.user. A quick grep for other callers would confirm no internal/test callers silently regress to anonymous behaviour.

Low — ExtractType GraphQL type has no get_queryset override (pre-existing, flagged in PR)

Related-field traversal (e.g. datacell -> extract) bypasses ExtractManager.visible_to_user. The lockdown at the extracts resolver and ExtractService layers is now enforced, but the related-manager route remains unguarded. Please file a dedicated tracking issue — PR description follow-up sections are ephemeral.

Informational — check_extract_permission anonymous guard placement

The new if getattr(user, "is_anonymous", True): return False, None before Extract.objects.get(id=extract_id) is the right IDOR posture: indistinguishable from "not found". Confirmed correct.

Informational — get_corpus_relationships corpus-as-gate retention

Deliberate use of get_corpus_documents per issue #1682 for the MCP pipeline caller is now documented inline with a clear warning for future callers. No action needed.


Test coverage assessment

Fix Test coverage
Relationship privacy recursion test_authorization_invariants.py (matrix + 6 new invariants), test_relationship_privacy_scoping.py
Extract anonymous lockdown test_extract_anonymous_lockdown.py + updated test_query_optimizer_methods.py + test_stats.py
Group-grant parity test_source_visibility_group_grants.py + invariant fixtures
Structural myPermissions break-glass test_structural_mypermissions_breakglass.py
Aggregate privacy leaks test_privacy_gate_regressions.py

Coverage is excellent. The invariant suite is the right place for matrix tests and is extended correctly.


Minor nits

  • _source_privacy_recursion_passes docstring is thorough but long for an internal helper. The "Deleted-source posture" and "Performance note" paragraphs would read better as inline comments at the specific branches they describe.
  • The # type: ignore[attr-defined] on instance.created_by_analysis / instance.created_by_extract are correct. A brief note that callers are always Annotation/Relationship instances (which declare these FKs) would clarify why the ignores are safe.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Re the latest review round — no code changes warranted, two requests fulfilled:

  • Follow-up tracking: all out-of-scope items from the review rounds (annotation-creator parity edge, ChatMessage anonymous THREAD asymmetry, conversations group grants, the broad except fallback, ExtractType.get_queryset, Datacell validator, sentinel/API nits, and the claude-review workflow malfunction) are now durably tracked in Permissioning audit follow-ups (from PR #1985) #1986 rather than living only in the PR description.
  • get_label_distribution_for_corpus callers: grep confirms exactly two call sites — the corpusIntelligenceAggregates resolver (passes info.context.user) and the new regression tests (pass user=). Nothing regresses to the anonymous fallback.
  • user.groups "evaluated twice": declined as written — the builders return lazy querysets, so both group-id subselects compile into the single SQL statement of the enclosing query; there are no extra round-trips, and auth_user_groups is a small indexed table. Threading a user_group_ids= parameter through a security-critical module for an unmeasured micro-win isn't worth the API surface; tracked as item 11 in Permissioning audit follow-ups (from PR #1985) #1986 should profiling ever disagree.
  • Docstring reshuffling nits: leaving as-is deliberately — for security-critical helpers the consolidated docstring is the audit surface; scattering the deleted-source and performance notes across branches optimizes for casual reading at the cost of the next auditor.

Generated by Claude Code

@codecov

codecov Bot commented Jun 12, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Codecov flagged one uncovered line in shared/Managers.py: the
extract-side fail-closed branch of _source_privacy_recursion_passes
(created_by_extract_id non-null but the dereference yields None during
the read-vs-SET_NULL race window). The analysis-side twin was already
pinned in test_authorization_invariants_coverage.py via the
descriptor-cache simulation; this adds the mirrored extract-side test
using the same mechanism. Also serves as the correctly-framed
deleted-source-posture test requested in review (deletion itself nulls
the FK via SET_NULL and rows become normal rows — only the in-memory
race window hits this branch, and it fails closed).
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

test

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review - Permissioning Audit Gap Closure

This is a substantial, high-quality security PR that closes real information-disclosure vulnerabilities. The changes are well-structured, test coverage is comprehensive, and the PR description gives an accurate account of what changed and why.


Summary

What this PR does:

  1. Closes Phase-C deferral (Phase A — Per-model user_can() overrides + deprecate user_has_permission_for_obj (follow-up to #1637) #1655): RelationshipManager.user_can and RelationshipManager.visible_to_user now enforce created_by_analysis/created_by_extract privacy the same way annotations do.
  2. Locks extracts to authenticated-only access at every layer (ExtractManager, ExtractService).
  3. Fixes filter/check parity for group-granted source permissions — a user with a group-level analysis READ now appears in list queries the same way they pass user_can.
  4. Fixes get_corpus_annotations silently skipping the privacy gate for anonymous users (the old if not user.is_anonymous guard).
  5. Fixes myPermissions reporting for structural rows: was always False for annotations (under-reporting for superusers); was always raw doc+corpus values for relationships (over-reporting for non-superusers).
  6. Adds privacy gates to get_relationship_summary and get_corpus_relationships that were entirely missing.
  7. Fixes codename__contains="read" substring match in get_visible_extracts to exact codename="read_extract", and adds group grants to both legs.

The DRY refactor is particularly good: opencontractserver/utils/source_visibility.py consolidates four divergent inline builders that had independently drifted from user_can semantics.


Issues

Medium — visible_extracts_for uses queryset union while visible_analyses_for uses a single filter

In source_visibility.py, the two builder functions use different SQL shapes. visible_analyses_for uses a single Analysis.objects.filter(Q(is_public=True) | Q(creator=user) | Q(id__in=user_grant_ids) | Q(id__in=group_grant_ids)) — one WHERE clause. visible_extracts_for uses a queryset union:

visible = Extract.objects.filter(Q(creator=user) | Q(is_public=True))
return visible | Extract.objects.filter(id__in=user_grant_ids) | Extract.objects.filter(id__in=group_grant_ids)

The comment in visible_analyses_for says "Queryset | would OR-merge to the same SQL here", but this holds only when both querysets are unannotated, unsliced filters on the same model. When used as __in subqueries (the primary usage here), the union shape may generate less optimal SQL because PostgreSQL cannot always push a UNION subquery through a correlated outer reference.

Suggestion: Make visible_extracts_for consistent with visible_analyses_for:

return Extract.objects.filter(
    Q(is_public=True) | Q(creator=user) | Q(id__in=user_grant_ids) | Q(id__in=group_grant_ids)
)

Medium — get_label_distribution_for_corpus new user parameter: silent regression risk for non-GraphQL callers

The new signature adds user=None at the end. user=None is treated as anonymous (public analyses only, no extracts) — MORE restrictive than any authenticated user. The only call site updated in this PR (config/graphql/corpus_queries.py) is correctly updated. But any future or existing caller that omits user will silently under-count labels for authenticated users — the kind of regression that only surfaces in production after a corpus has private analysis-rooted rows.

Suggestion: Document in the docstring that user=None is conservative/anonymous and that every user-facing caller must pass the requesting user. Confirming there are no other callers would also be reassuring.


Low — get_corpus_relationships corpus-as-gate caveat needs a concrete caller reference

The comment "A future user-facing GraphQL caller MUST switch this to get_corpus_documents_visible_to_user" is vague enough that a new contributor could add a GraphQL resolver calling get_corpus_relationships without reading this warning. Adding the file path of the current sole caller (the MCP tool) would make the constraint more concrete and searchable.


Observation — Annotation-creator parity gap (correctly flagged in follow-up observations)

RelationshipManager.user_can has an explicit creator short-circuit before _source_privacy_recursion_passes; AnnotationManager.user_can does not. An annotation creator who later loses source-analysis access is denied by user_can(READ) but would appear in visible_to_user. This is called out accurately in the PR description's follow-up observations section. Out of scope here but worth a tracking issue.


Observation — ExtractManager.user_can double-resolution coupling

ExtractManager.user_can calls resolve_user_for_user_can, then passes the already-resolved User to super().user_can(). This relies on BaseVisibilityManager.user_can calling resolve_user_for_user_can again as an idempotent no-op for a real User instance. It is pinned by the invariant test suite, so low-risk today, but the coupling is subtle and worth noting for future refactors of the resolution path.


Positive call-outs

_source_privacy_recursion_passes extraction: Moving from a private instance method (_check_annotation_privacy_recursion) to a module-level function shared between AnnotationManager and RelationshipManager is the right abstraction. The deleted-source race-window posture (fail closed during the SET_NULL window) and the select_related performance note are both worth having in the docstring.

get_corpus_annotations fix: Removing the if not user.is_anonymous guard and making the privacy gate unconditional (with anonymous semantics encoded inside the shared builders) is cleaner than the old conditional block and closes the actual leak.

check_extract_permission IDOR posture: Denying anonymous users before any DB lookup — response indistinguishable from "not found" — is correct.

get_visible_extracts codename fix: Changing codename__contains="read" to exact codename="read_extract" removes a theoretical footgun where any update_read_... codenames would have bypassed the gate.

myPermissions superuser fix: Annotation-side mask was Value(False) for ALL users on structural rows (wrong for superusers — UI hid write affordances it should show for admins); now correctly Value(user.is_superuser). Relationship side had no structural mask at all; now consistent with annotations.

Test quality: Each new test file covers a specific audit finding, uses named fixtures, and has assertion messages that name the specific leak being guarded against. The matrix-based invariant approach in test_authorization_invariants.py makes coverage claims auditable at a glance.

test_stats.py behavior change: The comment clearly marks anonymous totalExtracts: 0 as an intentional behavior change, not a test bug. The separate assertion block for the anonymous case is cleaner than the original shared loop.

Changelog fragments: Both fixed and security fragments are present and use the correct changelog.d/ pattern.


Overall this is a solid PR. The two medium items (SQL shape inconsistency in source_visibility.py, silent-regression risk on get_label_distribution_for_corpus) are worth addressing before merge, but they are incremental improvements on a foundation that is already materially more correct than what it replaces.

- visible_extracts_for now uses the same single-filter OR shape as
  visible_analyses_for (the review-round conversion only updated the
  analyses builder — behavior-identical, but the two security-critical
  builders should not diverge in shape)
- get_label_distribution_for_corpus docstring states explicitly that
  user=None is the most-restrictive anonymous shape (omission
  under-counts, never leaks) and names the current call sites
- get_corpus_relationships corpus-as-gate caveat now names its sole
  caller (opencontractserver/mcp/tools.py) so the #1682 constraint is
  searchable

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Both mediums addressed in f614ed9:

  • Builder shape inconsistency — real oversight from the earlier conversion (only the analyses builder was updated): visible_extracts_for now uses the identical single-filter OR shape. One precision note: Django's queryset | OR-merges the WHERE clause for same-model unannotated filters — it never emits SQL UNION (that's .union()) — so the change is shape-consistency, not a query-plan fix; behavior and SQL are equivalent either way.
  • user=None regression risk — docstring now states explicitly that None is the most-restrictive anonymous shape (omission under-counts for authenticated viewers; it can never leak) and names both current call sites, which remain the only ones (re-grepped).
  • Corpus-as-gate caveat — now names the sole caller (opencontractserver/mcp/tools.py) so the Reconcile CorpusObjsService corpus-as-gate vs GraphQL MIN-permission semantic #1682 constraint is searchable.

The two observations (annotation-creator parity, ExtractManager double-resolution coupling) are already tracked as items 1 and (inline-documented) in #1986.


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes substantive permissioning audit gaps across four agreed behavior changes: relationship privacy recursion, anonymous extract lockdown, group-grant list parity, and structural myPermissions break-glass correction. The documentation overhaul is thorough and the test coverage is genuinely good. Overall this is a high-quality security-hardening PR.


Positives

  • DRY consolidation in source_visibility.py: four duplicated inline privacy-gate implementations replaced with two shared lazy-queryset builders. The module docstring is one of the clearer explanations of a subtle invariant in the codebase.
  • Filter/check parity test discipline: every behavior change has an explicit paired test asserting both visible_to_user and user_can agree (e.g. test_anonymous_filter_check_parity, test_listing_matches_user_can_after_group_grant). This is exactly the right pattern.
  • Fail-safe defaults: getattr(user, "is_anonymous", True) fails closed when the argument has no is_anonymous attribute; user=None in get_label_distribution_for_corpus defaults to anonymous (most-restrictive) rather than leaking.
  • Old sentinel test replaced properly: test_no_privacy_widening_via_created_by_analysis documented a known deferral; replacing it with six real invariant tests rather than deleting it silently is the right call.
  • Changelog fragments are correctly per-file and well-scoped.

Issues

1. get_label_distribution_for_corpus(user=None) — silent under-count for authenticated callers (annotation_service.py:629)

The docstring says "every user-facing caller MUST pass the requesting user" and the GraphQL resolver was correctly updated (corpus_queries.py:+43). However, the user=None default means any future call site that omits user silently returns an under-count for authenticated users with no error or warning. Given this parameter is now load-bearing for correctness, consider either making it required (remove the default) or emitting a warnings.warn when user is None. The fail-safe direction is correct; the silence is risky.

2. Double resolve_user_for_user_can in ExtractManager.user_can (extracts/models.py)

resolved = resolve_user_for_user_can(user)
if resolved is None or getattr(resolved, "is_anonymous", True):
    return False
return super().user_can(resolved, ...)  # super() calls resolve_user_for_user_can(resolved) again

The docstring notes this is idempotent for a User instance — true. The minor maintenance risk is that the anonymous guard above is only effective because super().user_can also re-resolves; if a future refactor removes the second resolution, the guard's intent becomes load-bearing in a non-obvious way. A cleaner shape: return super().user_can(user, ...) (pass the original) and let the single resolution in the base class do the work.

3. user parameter in get_relationship_summary lacks a type annotation (relationship_service.py:251)

def get_relationship_summary(cls, document_id: int, corpus_id: int, user) -> dict:

Every other parameter is type-annotated. Missing annotation reduces mypy coverage on the privacy gate path added in this PR.

4. Exclusion pattern repeated across six call sites (low priority)

The privacy-gate exclusion triple:

.exclude(Q(created_by_analysis__isnull=False) & Q(structural=False) & ~Q(created_by_analysis__in=visible_analyses_for(user)))
.exclude(Q(created_by_extract__isnull=False)  & Q(structural=False) & ~Q(created_by_extract__in=visible_extracts_for(user)))

appears in AnnotationQuerySet.visible_to_user, AnnotationService.get_document_annotations, get_corpus_annotations, get_label_distribution_for_corpus, RelationshipService.get_relationship_summary, and get_corpus_relationships. The sources are consolidated in source_visibility.py; consider also extracting apply_source_privacy_gate(qs, user) there so the exclusion logic itself has one home. Not a blocker — the repetition is legible and identical across sites.


Verification request (pre-merge)

The follow-up observations section notes that ExtractType has no get_queryset override, so related-field traversal (e.g. datacell -> extract) relies on parent gating. ExtractManager.visible_to_user now enforces the anonymous lockdown at the manager layer, but a DataCell.objects.filter(extract__is_public=True) traversal on an anonymous request may bypass the override. The PR marks this pre-existing, which is fine — just flagging so it gets a tracking issue if it does not already have one.


Summary

Approve with the items above addressed or acknowledged. Issue 1 (user=None default) is the highest-risk item — a one-line fix eliminates a silent-failure mode for future callers. Issues 2–4 are style/low-priority. The security fixes are correct, the invariant tests are thorough, and the documentation overhaul significantly improves the accuracy of the reference guide.

…quire user on label aggregate

Review round follow-ups:

- The created_by_* exclusion triple appeared at six call sites
  (annotation document/corpus listings, label distribution, relationship
  document listing/summary/MCP corpus listing). It now has one home:
  apply_source_privacy_gate(qs, user) in utils/source_visibility.py,
  alongside the source builders. The positive-Q composition inside
  AnnotationQuerySet.visible_to_user keeps its own shape (needs the
  creator disjunct and single-WHERE composition with the doc/corpus
  EXISTS predicates) and the helper docstring cross-references it.
- get_label_distribution_for_corpus's user parameter is now REQUIRED and
  keyword-only: omission is a TypeError at the call site instead of a
  silent anonymous-shaped under-count. Both existing callers already
  passed it by keyword.

Declined from the same review, with rationale in the PR thread:
forwarding the original (unresolved) user to super().user_can in
ExtractManager — forwarding the resolved User is the robust direction
if the base class ever stops resolving; and annotating just one user
parameter in relationship_service.py, where the file convention leaves
user untyped throughout.
@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

placeholder

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — PR #1985: Permissioning Audit Gap Closure

This is a substantial and well-structured security PR. The audit surfaced four real bugs, each addressed with deliberate intent and matched invariant tests. Overall quality is high. Findings below are in descending priority.

Critical / Correctness

1. Creator parity gap in user_can vs visible_to_user for relationships (and annotations)

The PR body acknowledges this as a follow-up observation, but it deserves explicit attention: RelationshipManager.visible_to_user adds Q(creator=user) to the privacy gate so the row's creator always sees it in list queries. But _source_privacy_recursion_passes — shared by AnnotationManager.user_can and RelationshipManager.user_can — has no creator exemption. A user who created a privacy-rooted relationship but later lost access to the source analysis/extract would see it in lists (via visible_to_user's Q(creator=user) disjunct) but be denied READ on the single-object check (via user_can).

The invariant tests check shared_reader (never the creator) against privacy-rooted rows, so this parity gap is currently undetected by the test suite. In practice the scenario is unlikely (these rows are system-created via pipeline), but the filter/check parity contract the PR itself establishes makes this a documented gap. A follow-up issue or sentinel test would close this cleanly.

Security Bugs Fixed — Confirmed

2. Extract anonymous visibility leak

The previous get_visible_extracts exposed public extracts in public corpora to anonymous users — a clear drift from the guide's documented "Extract — never" contract. The three-layer fix (manager, service, privacy gate in visible_extracts_for) is correct and consistent. The test_stats.py change capturing the behavioral change to totalExtracts: 0 for anonymous users is the right way to document it.

3. Group-grant filter/check parity

The prior inline copies of the source-visibility gate only consulted UserObjectPermission tables, so a viewer whose analysis/extract READ grant came via a Django group passed user_can (group-resolving by default) but never saw the private rows in lists. The new visible_analyses_for / visible_extracts_for builders correctly join both user and group tables. Consolidating the four inline copies into one module is the right DRY call here.

4. Label distribution privacy leak

get_label_distribution_for_corpus previously counted analysis-/extract-private annotation labels for any viewer with doc+corpus READ, leaking label names and counts for rows they could not actually see. Applying apply_source_privacy_gate before aggregation is correct. The new user parameter is keyword-only and required (TypeError at the call site, no silent under-count) — good API design.

Minor Concerns

5. Breaking signature change — verify no other callers

get_label_distribution_for_corpus(user=...) is now required-keyword. The PR updates config/graphql/corpus_queries.py (the only apparent call site), but the E001 scanner only covers config/graphql/. Worth confirming no Celery tasks or test helpers call the old positional signature: grep -r "get_label_distribution_for_corpus" opencontractserver/

6. Double resolution in ExtractManager.user_can

ExtractManager.user_can calls resolve_user_for_user_can and then forwards the resolved user to super().user_can(resolved, ...). The comment notes this is an idempotent no-op for a resolved User instance (correct — BaseVisibilityManager would run the same resolver again). This is safe but a slight code smell. Not a bug — flagged for consideration.

7. apply_source_privacy_gate sync comment is load-bearing

The exclude-pair shape in apply_source_privacy_gate must stay in sync with AnnotationQuerySet.visible_to_user's positive-Q shape. These two paths express the same logic in different syntactic forms and are not type-checked against each other. A test that runs both paths on the same queryset state would pin this more tightly — test_source_visibility_group_grants.py covers the service path but not the queryset path directly.

Positive Observations

  • The module-level _source_privacy_recursion_passes promotion (from AnnotationManager instance method to a shared free function) is the correct refactor to enable reuse with RelationshipManager.
  • Replacing codename__contains="read" with exact codename="read_extract" / codename="read_corpus" in get_visible_extracts closes a latent privilege-escalation vector (a read_extract_admin permission would have matched the old substring).
  • The myPermissions structural write mask fixes are clean: both annotation and relationship services now correctly report user.is_superuser for structural _can_update/_can_delete rather than hardcoded False (under-reporting for superusers) or unchecked doc+corpus True (over-reporting for normal users).
  • test_stats.py behavioral-change documentation (comment + assertion) is exactly the right way to handle a deliberate semantic change in a test that was previously passing.
  • Changelog fragments follow the project's required format.

Summary

Four real security gaps closed with good DRY consolidation and well-scoped invariant tests. The main open item is the creator parity gap in user_can (acknowledged in the PR body as out-of-scope; recommend a follow-up issue). The get_label_distribution_for_corpus caller check is worth a quick grep before merge. Everything else is either correct or low-risk.

 item 1)

The known annotation-side divergence — a row's creator without source
access appears in visible_to_user (the queryset gate's creator disjunct)
but is denied by user_can(READ) (no creator exemption in the privacy
recursion) — was documented in the PR, issue #1986, and an in-code
pointer, but nothing executable pinned it. This sentinel asserts the
CURRENT divergent behavior with both assertions referencing the issue,
mirroring the retired Phase-C deferral sentinel's idiom: whoever fixes
the asymmetry must consciously flip both assertions together. It also
encodes why the invariant matrix excludes creator-of-row-without-source
fixtures (they would fail read-equivalence by design until the fix).

Locally verified: the coverage file passes 51/51 with the sentinel.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Latest round addressed in 4198242:

  • Item 1 (creator parity gap) — the follow-up issue has existed for several rounds (Permissioning audit follow-ups (from PR #1985) #1986 item 1, plus the in-code pointer at the queryset gate); the genuinely new suggestion was the sentinel test, now added: test_annotation_creator_parity_gap_sentinel pins the current divergence (creator-of-row visible in lists, denied by user_can(READ)) with both assertions referencing the issue — same idiom as the Phase-C deferral sentinel this PR retired. It also documents why the invariant matrix excludes that fixture shape. Verified locally (coverage file 51/51).
  • Item 5 (caller grep) — answered for the fourth time, same result: the resolver and the regression tests are the only call sites; both pass user by keyword. The grep result is also recorded in the method's docstring.
  • Item 7 (queryset-path parity test "missing") — factually stale: test_gate_matches_queryset_visibility_for_non_creator (added several rounds ago) compares apply_source_privacy_gate membership against Annotation.objects.visible_to_user membership row-by-row, pre- and post-grant — that is the queryset path.
  • Item 6 — previously declined twice with rationale (forwarding the resolved User is the robust direction); the reviewer's own assessment this round is "not a bug".

With this, every distinct suggestion across all twelve review rounds has been either implemented, pinned by a test, tracked in #1986, or declined with evidence on this thread.


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — PR #1985: Permissioning Audit Gap Closure

Overall Assessment: Strong security PR. Recommend merge with minor follow-up tracking.

This PR closes a cluster of real security vulnerabilities surfaced during an audit: relationship privacy bypass, extract anonymous access leak, group-grant filter/check parity drift, and several aggregate/listing surfaces that skipped the privacy gate entirely. The audit was thorough and the implementation is high quality. Findings below are roughly ordered by severity.


Issues worth tracking (not blockers)

1. Annotation creator asymmetry is now explicitly documented but still a real parity violation (QuerySets.py comment, issue #1986 item 1)

The code at the Q(creator=user) disjunct in AnnotationQuerySet.visible_to_user now carries a KNOWN ASYMMETRY comment: a creator who lost source access passes the list gate while user_can(READ) denies them. The comment says "tracked, not accidental" — which is correct and appreciated — but the consequence is that visible_to_user and user_can diverge for a real class of users (creator who no longer holds the source permission). The invariant suite pins filter/check parity as a hard property across the whole codebase; this named exception weakens the guarantee. A follow-up test fixture that explicitly documents this divergence (and what the UI consequence is) would make the exception legible at CI time, not just in a code comment.

2. test_stats.py removes anonymous_client from the extract-count assertion but doesn't add the complementary totalExtracts: 0 check

# test_stats.py (after PR)
for client in [self.owner_client, self.collaborator_client, self.regular_client]:
    # anonymous_client removed — correct, it now sees 0
    ...

There's no assertion anywhere in the test that anonymous_client sees totalExtracts: 0 after the lockdown. The behavioral change is correct; the test just doesn't pin the new anonymous behavior. A small addition would make the regression surface immediately if the lockdown drifts again:

anon_result = self.anonymous_client.execute(query, variable_values=variables)
self.assertEqual(anon_result["data"]["corpusStats"]["totalExtracts"], 0)

3. _source_privacy_recursion_passes performance note is correct but invisible to callsite linters

The docstring says: "Callers that invoke user_can in a loop MUST select_related('created_by_analysis', 'created_by_extract') on their root queryset." This is a correct and important note, but it's only documentation — nothing prevents a future service method from calling user_can in a loop on a plain queryset and incurring N queries per row. Consider a separate manager-level assertion or at least a test fixture that verifies the known loop callers (if any exist today) do prefetch.


Code quality observations (positive)

source_visibility.py is excellent DRY practice. Consolidating four divergent inline copies into one home — with proper docstrings explaining the security rationale, the is_public extract flag history, the corpus-gating non-regression, and why the gate is not corpus-gated here — is exactly the right approach. The apply_source_privacy_gate function + two builders pattern is clean.

ExtractManager anonymous lockdown is correctly layered. Both visible_to_user and user_can are overridden in lockstep, with resolve_user_for_user_can used to normalize None inputs consistently. The comment acknowledging the idempotent double-resolution (base class re-resolves a User instance as a no-op) is appreciated.

The _source_privacy_recursion_passes deleted-source posture is explicitly documented. The on_delete=SET_NULL race window behavior is explicitly called out — failing closed in the race window is the right choice and it's good to see it documented.

The grant-shape codename fix is a genuine parity fix, not just a refactor. The previous Exists(ExtractUserObjectPermission.objects.filter(user=user, content_object_id=OuterRef("id"))) matched ANY guardian row regardless of codename, so an update_extract-only grantee saw private rows in lists while failing user_can(READ). The new permission__codename="read_extract" filter is semantically correct. This fix is pinned by test_update_only_source_grant_does_not_unlock_lists — good.

Test suite design is strong. Five new test files each with a single responsibility, plus invariant matrix extension, plus updated existing tests. The ExtractAuthorizationInvariantsTestCase, ExtractAnonymousLockdownTestCase, RelationshipServicePrivacyScopingTestCase, AnnotationServiceGroupGrantTestCase, and StructuralMyPermissionsBreakGlassTestCase all pin independent security properties. Using setUp fixtures rather than setUpClass is appropriate given the test interaction risk.

Replacing test_no_privacy_widening_via_created_by_analysis with real invariants is the right call. The old test documented a deferral with assertIsNone(self.relationship.created_by_analysis_id) — effectively pinning nothing. The replacement tests pin the actual intended behavior. The PR description explains the transition clearly.


Minor style observations

  • The corpus_queries.py one-liner (user=info.context.user) is the only change to a config/graphql/ resolver, and it's the correct fix for the now-required user= keyword arg to get_label_distribution_for_corpus. No concern there.
  • documents/services/relationships.py docstring cleanup (removing stale superuser-bypass comments) follows the project's no-dead-docstring convention. Good hygiene.
  • The CLAUDE.md pitfall Bump traefik from v2.8.7 to v2.9.1 in /compose/production/traefik #2 update (ConversationQueryOptimizerConversationService) is accurate and needed.

Pre-merge checklist

  • CI green (Docker Hub rate limiting was a factor per PR body — confirm the full test matrix passes, especially the new permissioning tests)
  • test_stats.py anonymous case for totalExtracts: 0 (minor — could be a follow-up)
  • Issue Permissioning audit follow-ups (from PR #1985) #1986 item 1 (annotation creator asymmetry) tracked in the issue tracker

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

No changes needed for this round — all three tracked items predate it:

  1. CI-legible fixture for the creator asymmetry — landed one commit before this review posted: test_annotation_creator_parity_gap_sentinel (4198242) pins the divergence at CI time with both assertions referencing Permissioning audit follow-ups (from PR #1985) #1986 item 1, exactly as suggested.
  2. "No totalExtracts: 0 assertion for anonymous" — factually incorrect against the diff: test_stats.py:145-153 contains the dedicated anonymous block (self.assertEqual(stats, {**expected_stats, "totalExtracts": 0}), with the BEHAVIOR CHANGE comment) — added in 3b7df6f when the anonymous case was split out of the shared loop. The review appears to have quoted the loop without the block that follows it.
  3. N+1 loop-caller enforcement — there are no loop callers of user_can over privacy-rooted rows today (all call sites are single-object checks), so a fixture verifying "known loop callers prefetch" would assert over an empty set. The MUST-strength requirement is documented at the function and at the RelationshipManager call site; adjacent concern is already Permissioning audit follow-ups (from PR #1985) #1986 item 11.

Checklist status: CI was fully green on the bdef2f2 line (full-suite pytest 8,598 passing + codecov patch checks); the subsequent commits are locally verified (permissioning directory, invariants, gate consumers, coverage file 51/51) and the current tip's run is in flight. #1986 item 1 exists and is now triply referenced (issue, in-code pointer, sentinel test).


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review

This is a well-structured security audit PR that closes four real information-disclosure vulnerabilities. The changes are correct, consistently applied at all enforcement layers, and thoroughly tested.

Bugs / Correctness Issues

None blocking. The four targeted vulnerabilities are correctly fixed:

  1. Anonymous annotation leak on public corpus - the old guard in get_corpus_annotations skipped privacy filtering entirely for anonymous viewers. The unconditional apply_source_privacy_gate(qs, user) replacement is correct.

  2. Extract anonymous lockdown - enforced at all three layers (manager, service, GraphQL resolver) with filter/check parity tests. check_extract_permission correctly returns (False, None) (IDOR-safe posture), and the corpus stats test is updated to expect totalExtracts: 0 for anonymous.

  3. Group grants not honored in list queries - source_visibility.py consolidates four divergent inline copies and correctly joins the group-permission tables, matching what user_can privacy recursion already did.

  4. Label distribution leak - the required keyword-only user parameter is correct; the only two callers are the GraphQL resolver (updated in this PR) and the new test. Omission is a TypeError - no silent under-count risk.

Minor Issues Worth Noting

1. Deleted-source exposure is underdocumented (low)

_source_privacy_recursion_passes explains that on_delete=SET_NULL means deleting an Analysis/Extract un-gates its rows — those annotations/relationships become normal public rows visible to everyone. This is documented in the code comment but does not appear in the permissioning guide update. An operator who deletes a private analysis expecting data to stay private would be surprised. Recommend adding a note to the guide's Source Privacy banner.

2. Acknowledged annotation-creator parity asymmetry (issue #1986 item 1)

The KNOWN ASYMMETRY comment in AnnotationQuerySet.visible_to_user is correctly placed — a creator who lost source access passes the list gate (Q(creator=user)) but would be denied by user_can(READ). Relationships resolved this with a creator short-circuit; the annotation side is deferred. Confirming it is on the radar.

3. get_visible_extracts None guard asymmetry (low)

check_extract_permission uses getattr(user, "is_anonymous", True) defensively, but get_visible_extracts calls user.is_anonymous directly. All current callers pass a non-None user, so this is safe in practice, but the two surfaces are inconsistent. A one-liner if user is None or user.is_anonymous: would align them.

4. Double resolve_user_for_user_can in ExtractManager.user_can (trivial)

The manager resolves the user then passes resolved to super().user_can(), which calls resolve_user_for_user_can again. Idempotent and acknowledged in the comment. Fine as-is.

Pre-existing Issue (not introduced by this PR)

DocumentRelationshipService.get_visible_relationships pre-computes _can_update/_can_delete as creator-only, while user_has_permission() uses MIN(source, target, corpus). A user with a guardian UPDATE grant who is not the creator would see the row as read-only in the UI but pass the actual mutation check. Worth a follow-up.

Strengths


Overall: Approve with the suggestion to add a note to the permissioning guide about the deleted-source exposure behavior (item 1 above). The rest are optional clean-ups.

…visible_extracts

Review round follow-ups (the round's remaining items were already done
or are tracked in #1986):

- The guide's Source Privacy banner now states the on_delete=SET_NULL
  consequence explicitly: deleting a private Analysis/Extract un-gates
  its rows (they become ordinary doc+corpus-governed rows) — an
  operator deleting a source expecting continued privacy would be
  surprised. Previously documented only in the
  _source_privacy_recursion_passes docstring.
- get_visible_extracts now uses the same fail-closed None/anonymous
  guard shape as check_extract_permission and ExtractManager (both
  reads: the top lockdown branch and the corpus-scope branch — a bare
  user.is_anonymous would AttributeError on None rather than fail
  closed).
- #1986 gains item 6: DocumentRelationshipService.get_visible_relationships
  pre-computes _can_update/_can_delete as creator-only while
  user_has_permission computes MIN(source, target, corpus) — a
  pre-existing myPermissions under-report verified during this round,
  out of scope here.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round addressed in 344d608:

  • Deleted-source semantics in the guide — adopted, the round's one genuinely new finding: the Source Privacy banner now states the on_delete=SET_NULL consequence (deleting a private Analysis/Extract un-gates its rows into ordinary doc+corpus-governed rows; delete or re-scope the rows first if they must stay private). One precision note: the rows don't become "visible to everyone" — doc+corpus permissions still apply — but the operator-surprise point stands and is now documented where operators look.
  • get_visible_extracts None-guard asymmetry — adopted: both user.is_anonymous reads (lockdown branch and corpus-scope branch) now use the same fail-closed user is None or getattr(...) shape as check_extract_permission and the manager. (A bare read would have failed loud, not open, on None — but consistency is right.)
  • DocumentRelationshipService.get_visible_relationships myPermissions under-report — verified real (creator-only Case at documents/services/relationships.py:263-280 vs MIN-based user_has_permission) and pre-existing in a method this PR never touched; now tracked as Permissioning audit follow-ups (from PR #1985) #1986 item 6. Same mirror principle as this PR's structural-mask fix, but fail-safe in direction and out of scope here.
  • Items 2 and 4 were already resolved (sentinel 4198242; documented idempotence).

Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — Permissioning Audit Gap Closure

Overview

This is a security-critical PR that closes multiple real vulnerabilities found during an audit of the permissioning guide against the actual code. Four behavior changes were agreed with the maintainer; the rest is doc/comment cleanup. The scope is large (24 files, +2479/-534) but well-organized: each fix has its own invariant tests and a dedicated changelog fragment.


Security Fixes — Correct and Necessary

1. Relationship privacy recursion (closes #1655 Phase-C)
The _source_privacy_recursion_passes module-level function replacing the old _check_annotation_privacy_recursion instance method is a clean DRY consolidation. The creator short-circuit in RelationshipManager.user_can correctly runs before the privacy recursion so relationship creators retain READ on their own analysis-/extract-rooted rows, matching the Q(creator=user) disjunct in the queryset gate. Filter/check parity is pinned by the new RelationshipAuthorizationInvariantsTestCase matrix.

2. get_corpus_annotations anonymous leak fix
Removing the if not user.is_anonymous guard that completely skipped privacy filtering for anonymous visitors is a real security fix. The guard was backward — it made anonymous users less filtered than authenticated users. Correctly delegating the anonymous case to apply_source_privacy_gate (which routes through visible_analyses_for, returning public analyses, and visible_extracts_for, returning none) is right.

3. Group-granted source permissions — source_visibility.py
The centralized visible_analyses_for / visible_extracts_for builders consolidating four divergent inline copies are well-designed. The grant-shape fix (matching read_analysis / read_extract codename exactly rather than any guardian row) is an important filter/check parity correction: an update_analysis-only grantee was previously unlocking list visibility while failing user_can(READ). The sentinel test test_update_only_source_grant_does_not_unlock_lists correctly pins this.

4. ExtractManager anonymous lockdown
The user_can override correctly uses resolve_user_for_user_can first, then forwards the resolved User instance to super() so the base class does not double-resolve. The visible_to_user override correctly falls closed when user is None. Both surfaces agree, pinned by ExtractAuthorizationInvariantsTestCase.

5. Aggregate surfaces (get_label_distribution_for_corpus, get_relationship_summary, get_corpus_relationships)
The privacy gates on these summary surfaces are necessary — leaking label names/counts for private rows is a genuine information disclosure even without exposing the rows themselves. Making user a required keyword-only argument to get_label_distribution_for_corpus (TypeError on omission, never a silent under-count) is the right API design for a security parameter.

6. Structural myPermissions break-glass
Fixing AnnotationService (was masking structural writes to False for everyone, silently under-reporting for superusers) and RelationshipService (was reporting raw doc+corpus writes for structural rows, over-reporting for normal users) to both mask to user.is_superuser is correct and the UI will now agree with what mutations will actually allow.


Issues and Concerns

[SECURITY] Known annotation creator parity gap remains open (issue #1986 item 1)
The sentinel test test_annotation_creator_parity_gap_sentinel documents a real filter/check divergence:

  • Annotation.objects.visible_to_user(creator) admits the row via Q(creator=user)
  • annotation.user_can(creator, READ) denies the creator via _source_privacy_recursion_passes (no creator exemption on the annotation side)

Relationships resolved this with an explicit creator short-circuit. Annotations have not. This means an annotation's own creator who has no source access appears in list queries but gets mutation errors — inconsistent UX at minimum, and a parity violation that could produce surprising behavior depending on how the list results are used downstream. The PR correctly pins and tracks this rather than leaving it undocumented, but it warrants explicit attention in follow-up.

[PERF] N+1 risk from _source_privacy_recursion_passes
_source_privacy_recursion_passes dereferences instance.created_by_analysis / instance.created_by_extract when the IDs are non-null. For single-object calls this is fine. The docstring correctly warns that bulk callers looping user_can per row must select_related("created_by_analysis", "created_by_extract"). This is a behavior change for existing callers of RelationshipManager.user_can — callers that were previously unaffected (privacy recursion was not implemented) now incur up to 2 extra queries per row without the prefetch. Worth verifying that agent tools and GraphQL resolvers that call user_can in loops have the necessary prefetches added.

[DESIGN] visible_extracts_for is intentionally not corpus-gated
The docstring in source_visibility.py explains this deliberately: the gate mirrors the manager-level Extract.objects.user_can (no corpus AND), while the row's own corpus is still gated by the enclosing annotation/relationship query. This is correct but subtle. Future readers should note the explicit comment distinguishing this from ExtractService.get_visible_extracts (the hybrid corpus-AND surface).

[DESIGN] get_corpus_relationships corpus-as-gate caller warning
The inline comment correctly flags that a future user-facing GraphQL caller of get_corpus_relationships would need to switch to get_corpus_documents_visible_to_user. Consider whether this future-caller guard should also be reinforced at the MCP tool boundary to prevent accidental reuse in the wrong context.


Test Coverage

The test coverage is exceptionally thorough:

  • Matrix-based parity tests (_UserCanInvariantsMixin) pin visible_to_user ⟺ user_can(READ) for every new fixture permutation
  • Positive AND negative cases for every grant shape (user-level, group-level, creator, public)
  • Cross-bleeding tests (analysis grant does not unlock extract-rooted rows and vice versa)
  • Orphaned-source race window tests (fail-closed when FK is non-null but dereference yields None)
  • Aggregate surface regression tests (test_privacy_gate_regressions.py)
  • The sentinel-test pattern for known divergences (test_annotation_creator_parity_gap_sentinel) is a principled approach to tracking intentional deviations

Documentation

The guide overhaul is comprehensive and accurate. Notable additions:

  • Explicit documentation of the get_corpus_documents vs get_corpus_documents_visible_to_user dual-semantic (issue Reconcile CorpusObjsService corpus-as-gate vs GraphQL MIN-permission semantic #1682) is long overdue
  • Correcting the moderation section (EDIT does not grant moderation; CorpusModerator with non-empty permissions does) removes a genuine misunderstanding
  • The BaseService.filter_visible_qs addition and opencontracts.E001 scan-scope nuance are both accurate

Summary

Approve with follow-up items noted.

This PR closes real security vulnerabilities (anonymous annotation leak, group-grant parity, relationship privacy bypass, anonymous extract exposure, aggregate leaks) with comprehensive tests and thorough documentation. The code is clean, DRY, and follows project conventions.

Worth verifying before merge:

  • Callers of RelationshipManager.user_can in loops (agent tools, GraphQL resolvers) should have select_related("created_by_analysis", "created_by_extract") added — these callers were previously unaffected by the privacy recursion and now incur extra FK lookups without the prefetch.

Track as follow-up:

  • Issue Permissioning audit follow-ups (from PR #1985) #1986 item 1: annotation creator parity gap (pinned by sentinel test, needs a semantic decision)
  • Mention-autocomplete resolvers in search_queries.py still inline guardian calls (correctly documented as a known E001 scan-scope exception)

…te context

Review follow-through: the #1682 caller-intent warning now exists on
BOTH sides of the boundary — the service docstring names this caller,
and the call site itself warns against copying the pattern into a
user-facing GraphQL resolver (which requires the MIN-semantics variant).

Also verified this round's pre-merge ask: no loop callers of
RelationshipManager.user_can exist anywhere (config/graphql, llms/tools,
mcp) — the AnnotatePermissionsForReadMixin fallback never calls user_can
(it reads prefetched guardian metadata), GraphQL relationship lists flow
through the annotated services, and every mutation gate is
single-object — so the privacy recursion's per-call FK dereference has
no N+1 surface today.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round addressed in 902a3fb (second consecutive approval — thank you):

  • Pre-merge verification (loop callers of RelationshipManager.user_can) — verified clean: there are none. AnnotatePermissionsForReadMixin's fallback never calls user_can (it resolves per-row myPermissions from prefetched guardian metadata — zero user_can references in mixins.py); every GraphQL relationship list flows through the annotated services (_can_* pre-computed, no per-row checks); and all ten relationship permission gates in the mutation files are single-object. The privacy recursion's per-call FK dereference therefore has no N+1 surface today, and the MUST-prefetch note guards future loop callers.
  • MCP boundary reinforcement — adopted: the get_corpus_relationships call in opencontractserver/mcp/tools.py now carries its own Reconcile CorpusObjsService corpus-as-gate vs GraphQL MIN-permission semantic #1682 context note (sanctioned corpus-as-gate caller; do not copy into GraphQL), so the warning exists on both sides of the boundary.
  • Follow-up items — both already tracked: the annotation-creator gap is Permissioning audit follow-ups (from PR #1985) #1986 item 1 (sentinel-pinned at CI time), and the mention-resolver E001 exception is Permissioning audit follow-ups (from PR #1985) #1986 item 10.

Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

test

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review - PR 1985: Permissioning audit gap closure

Overview

This PR closes four security gaps surfaced during a permissioning audit: relationship privacy recursion, anonymous extract lockdown, group-granted source permissions, and myPermissions accuracy for structural rows. The approach - consolidated utility builders, shared manager function, per-surface guards, and a saturating test matrix - is the right architecture for this class of security invariant.


Security - Confirmed fixes

1. Relationship privacy recursion - The shared _source_privacy_recursion_passes function correctly places the privacy check after the creator short-circuit in RelationshipManager.user_can. The structural-READ short-circuit and its interaction with the list-side Q(structural=True) disjunct are explicitly documented and well-reasoned.

2. Anonymous extract lockdown - ExtractManager denies anonymous on both surfaces with fail-closed default (getattr(user, 'is_anonymous', True)). Consistent with the rest of the codebase.

3. Group grants in privacy gates - The read_analysis / read_extract codename-exact match closes the old any-codename parity drift (an update_analysis-only grantee was clearing list gates while user_can(READ) denied them). Pinned by test_update_only_source_grant_does_not_unlock_lists.

4. myPermissions structural mask - Both services now mask structural _can_update/_can_delete to user.is_superuser via Case/When. Correctly mirrors what user_can will enforce in mutations.


Issues to Address

[Bug - dead code] get_visible_extracts corpus-id branch (extract_service.py ~line 182):

The outer anonymous guard already sets qs = Extract.objects.none() before the corpus block runs. The inner anonymous check inside the corpus_id block is therefore unreachable for anonymous callers and can never return a non-empty queryset. The dead branch will mislead future maintainers. Suggest removing the inner anonymous guard and leaving a comment that anonymous is handled above.

[Minor - double resolution] ExtractManager.user_can (extracts/models.py):

resolve_user_for_user_can(user) is called and the resolved value is passed to super().user_can(resolved, ...). However, BaseVisibilityManager.user_can will call resolve_user_for_user_can again on the already-resolved value. Idempotent on a User instance, but asymmetric with visible_to_user (which calls super() without pre-resolution). A brief inline comment noting the double-resolution is intentionally safe would be sufficient.

[Maintenance concern] apply_source_privacy_gate vs AnnotationQuerySet.visible_to_user - two shapes must stay in sync:

The note at the top of apply_source_privacy_gate documents this. The test_gate_matches_queryset_visibility_for_non_creator test helps but is scoped to non-creators by design. A future contributor changing apply_source_privacy_gate without updating the positive-Q shape inside AnnotationQuerySet.visible_to_user could silently break parity. Stronger fix: have AnnotationQuerySet.visible_to_user call visible_analyses_for / visible_extracts_for (it already imports from source_visibility) rather than building its own Q objects - eliminating the divergent shape entirely.

[Follow-up, not a blocker] Annotation-side creator asymmetry (issue 1986 item 1): The sentinel test in test_authorization_invariants_coverage.py correctly documents the known divergence. Noted here for visibility only.


Test Coverage

Thorough overall. Specific observations:

  • test_privacy_recursion_requires_source_permission_for_writes correctly verifies READ-only source grant + full doc+corpus CRUD is still denied for DELETE on privacy-rooted rows.
  • ExtractAuthorizationInvariantsTestCase uses _UserCanInvariantsMixin to pin the full filter/check parity matrix; anonymous denial is tested explicitly on is_public=True instances.
  • test_annotation_creator_parity_gap_sentinel - both assertions are correctly positioned; the comment accurately describes what must change together when 1986 is resolved.
  • test_structural_rows_bypass_privacy_in_listing correctly passes structural=True so the privacy-rooted non-structural row from setUp does not mask a regression.

One gap: LabelDistributionPrivacyRegressionTestCase covers anonymous and ungranted-authenticated callers but not a group-granted source. The gate goes through apply_source_privacy_gate (tested elsewhere) so risk is low - but a group-grant case would complete the coverage for the aggregate surface.


Documentation

The permissioning guide update is accurate and substantially improves navigability. The two-semantics section for CorpusDocumentService (get_corpus_documents vs get_corpus_documents_visible_to_user) is particularly clear. Small suggestion: the SOURCE PRIVACY banner callout deleted-source note could link to _source_privacy_recursion_passes, which has the fuller explanation including the race-window detail.


Summary

The security fixes are correct. The test coverage is strong. There is one dead-code issue to clean up before merge, and a maintenance-risk concern about the two-shape sync between apply_source_privacy_gate and AnnotationQuerySet.visible_to_user. The rest are low-priority suggestions. This is a well-executed security PR.

…ate grant matrix

Review round follow-ups:

- The anonymous guard in get_visible_extracts now EARLY-RETURNS
  Extract.objects.none() instead of falling through with an empty
  queryset, which made the corpus-scope block's inner anonymous check
  unreachable-in-effect (it gated a queryset that was already empty).
  The inner branch is removed; only authenticated users reach the
  corpus permission check, and the comment says so.
- LabelDistributionPrivacyRegressionTestCase gains the group-granted
  case, completing the grant matrix (anonymous / ungranted / owner /
  group-granted) for the aggregate surface end-to-end.
- The guide's deleted-source note now cross-links
  _source_privacy_recursion_passes for the full mechanics including
  the race-window guard.

Premise correction recorded for the same round: the suggested 'stronger
fix' (have AnnotationQuerySet.visible_to_user consume the shared
builders) has been in place since the first group-grant commit — the
queryset already calls visible_analyses_for/visible_extracts_for; only
the gate COMPOSITION differs (positive-Q with creator disjunct vs
exclude pair), which is necessary and pinned by
test_gate_matches_queryset_visibility_for_non_creator.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round addressed in 37d8f3c:

  • Dead-code cleanup — adopted, with a sharper fix than suggested: the anonymous guard in get_visible_extracts now early-returns Extract.objects.none() (the fall-through qs = none() shape was what made the corpus-block's inner anonymous check redundant-in-effect — it gated an already-empty queryset). The inner branch is removed and the corpus permission check is documented as authenticated-only.
  • Aggregate grant matrix — adopted: LabelDistributionPrivacyRegressionTestCase gains the group-granted case, completing anonymous / ungranted / owner / group-granted coverage for the surface.
  • Guide banner cross-link — adopted: the deleted-source note now points to _source_privacy_recursion_passes for the full mechanics including the race-window guard.
  • "Stronger fix" for the two-shape sync — premise correction: AnnotationQuerySet.visible_to_user has consumed the shared visible_analyses_for / visible_extracts_for builders since the first group-grant commit (see shared/QuerySets.py — the queryset imports from source_visibility). What differs is only the gate composition (positive-Q with the creator disjunct vs the exclude pair), which cannot be unified: the queryset shape needs the creator disjunct (Permissioning audit follow-ups (from PR #1985) #1986 item 1) and single-WHERE composition with the doc/corpus EXISTS predicates. That residual difference is exactly what test_gate_matches_queryset_visibility_for_non_creator pins.
  • Double-resolution comment — already present at the call site since two rounds ago ("idempotent no-op for the User instance").

Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review — Permissioning Audit Gap Closure

This is a substantial, security-focused PR that closes real authorization gaps. The approach is sound and the test coverage is excellent. Below are findings organized by severity.


Overview

Four concrete behavior fixes:

  1. Relationship privacy recursionRelationshipManager.user_can / visible_to_user now enforce created_by_* source permissions (closes Phase A — Per-model user_can() overrides + deprecate user_has_permission_for_obj (follow-up to #1637) #1655 Phase-C deferral)
  2. Extract anonymous lockdown — new ExtractManager + service-level guard
  3. Group-granted source permissionssource_visibility.py shared builders join guardian group tables, fixing filter/check parity drift
  4. Aggregate privacy gatesget_label_distribution_for_corpus, get_relationship_summary, get_corpus_relationships now apply source privacy filtering

Bugs / Correctness Issues

1. apply_source_privacy_gate missing the creator disjunct — inconsistency with manager-level queryset

The PR documentation in source_visibility.py calls this out explicitly:

NOTE: AnnotationQuerySet.visible_to_user composes the equivalent gate as a positive Q filter (structural | creator | no-source | source-visible) rather than this exclude pair, because it needs the creator disjunct … keep the two shapes in sync when changing either.

But the current apply_source_privacy_gate is NOT in sync: it has no creator disjunct. In the service path (e.g. get_document_annotations / get_document_relationships), the service calls Relationship.objects.filter(doc_filters) directly — not .visible_to_user(user) — so the manager's creator Q never fires. The result: if a user creates an annotation or relationship with created_by_analysis=X and later loses access to X, they will be invisible-to-themselves in service-level list queries while still being visible via the manager-level queryset. The inconsistency is more pronounced for relationships, where the PR text says "the relationship's own creator passes via the creator short-circuit" in the queryset gate — but apply_source_privacy_gate has no such exemption for either model.

This is flagged in the PR body as a "follow-up observation (out of scope)", but given that the relationship creator exemption is explicitly documented as part of the new design, it would be cleaner to either:

  • Add Q(creator=user) to apply_source_privacy_gate (if the anonymous branch handles creator=AnonymousUser() safely, which it does — AnonymousUser has no PK match), or
  • Document exactly which surfaces use the manager path (creator-exempted) vs the service path (not exempted)

2. get_label_distribution_for_corpus — breaking API change, verify all callers

user is now a required keyword-only argument. The diff shows one updated caller in corpus_queries.py. If any other caller exists (e.g. management commands, test helpers, Celery tasks), they will fail at runtime with TypeError. A quick grep is warranted before merge:

grep -rn "get_label_distribution_for_corpus" opencontractserver/ config/

Similarly, get_relationship_summary now presumably requires a user argument — all callers should be verified.


Security Observations

3. check_extract_permission anonymous check vs. Extract.objects.get() ordering

The new anonymous guard is placed correctly before the DB lookup, which is the right IDOR posture. Good.

4. resolve_user_for_user_can double-resolution in ExtractManager.user_can

The manager resolves the user, checks anonymous, then calls super().user_can(resolved, ...). The base class's user_can also calls resolve_user_for_user_can(). The comment acknowledges this is an idempotent no-op when passed a User instance — that holds, but it's a subtle coupling. If the resolution logic ever changes (e.g. for the impersonation mechanism mentioned in the guide), both layers would need updating. A test covering ExtractManager.user_can(None, ...) and user_can(AnonymousUser(), ...) (both in test_extract_anonymous_lockdown.py) would pin this.

5. Deleted-source race window (documented, correct behavior)

The _source_privacy_recursion_passes docstring correctly documents that the fail-closed posture during the SET_NULL race window is intentional. No action needed, but noting it's visible.


Performance Considerations

6. Nested IN subqueries in apply_source_privacy_gate

The gate adds two .exclude() calls, each containing ~Q(created_by_analysis__in=visible_analyses_for(user)). Each visible_analyses_for / visible_extracts_for call returns a queryset with 4 OR-joined conditions including 2 uncorrelated id__in subqueries (user-grant and group-grant). When embedded as a NOT-IN, this becomes deeply nested correlated subqueries.

On large annotation tables, this could be slow. Consider profiling on a realistic dataset. If it becomes a problem, the typical remedy is to use Exists() instead of __in=subquery for the outer exclude, or to compute the IDs in a single raw SQL pass.

7. N+1 in _source_privacy_recursion_passes — unenforced select_related requirement

The docstring correctly warns callers to select_related("created_by_analysis", "created_by_extract") on root querysets used in loops. But there's no enforcement or queryset-level annotation to catch callers that forget. This is a documentation-only constraint. Consider adding an assertion or at minimum a grep check in CI to find callers of user_can in for-loops without select_related.


Code Quality

8. _source_privacy_recursion_passes placement

A module-level private function placed between two class definitions (after AnnotationManager-related helpers, before RelationshipManager) is slightly unconventional. A top-of-file comment marking this as a shared implementation detail, or moving it closer to the two managers that use it, would improve navigation.

9. visible_extracts_for includes is_public=True for authenticated users

This is correct per the documented parity with user_can (an authenticated user CAN read a public extract through the manager), and the PR body explains the reasoning. However, the PR also says "no user-facing flow currently sets Extract.is_public". If that remains true, this branch is currently dead code. Consider adding a note or a test asserting the behavior for the day it matters.


Documentation / Guide

The guide updates are thorough and accurate. Notable improvements:

One minor nit: the inline code snippet in the guide for RemoveAnnotation documents BaseService.require_permission's truthiness inversion with a TRUTHINESS INVERSION comment. This is useful but the function returns "" (falsy) when granted, which is unusual Python. The comment is helpful; consider also documenting this in the require_permission docstring itself so callers consulting source get the same warning.


Test Coverage

The new test files are comprehensive:

  • test_extract_anonymous_lockdown.py — covers manager and service surfaces, both visible_to_user and user_can, plus the "worst case" (public extract in public corpus)
  • test_relationship_privacy_scoping.py — covers the new recursion
  • test_source_visibility_group_grants.py — covers the group-grant parity fix with the update_analysis-only grant edge case
  • test_structural_mypermissions_breakglass.py — covers the superuser/normal-user mask for pre-computed permissions
  • test_privacy_gate_regressions.py — covers the corpus-wide annotation listing anonymous leak

The replaced test_no_privacy_widening_via_created_by_analysis sentinel is appropriately replaced with real invariants — the old sentinel only pinned the deferral, not the fix.

One gap: no test directly asserts that the creator of a private-source annotation/relationship sees their own row in service-level list queries (get_document_annotations / get_document_relationships). This would catch the inconsistency in point 1 above.


Summary

This is well-designed, well-documented work on a genuinely complex security surface. The critical items to address before merge:

  1. Confirm all callers of get_label_distribution_for_corpus and get_relationship_summary pass user (grep check)
  2. Document or resolve the apply_source_privacy_gate creator-disjunct gap — either add Q(creator=user) or pin the divergence with a test
  3. CI green — PR body notes the test run was in-flight; CI must pass before merge

The group-grant parity fix, the extract anonymous lockdown, and the relationship privacy recursion are all correct and important. The source_visibility.py consolidation is a genuine DRY improvement over the four divergent inline copies it replaces.

…y fix)

Review round 17 surfaced a real inconsistency this PR's own manager
changes made visible: after relationships gained the creator
short-circuit on BOTH manager surfaces, the service listings (which
flow through apply_source_privacy_gate) became the odd surface out — a
relationship's creator who lost source access kept READ via
user_can AND visible_to_user yet vanished from the document view. The
gate now carries the same authenticated creator exemption as the
queryset gate's Q(creator=user) disjunct; anonymous callers get an
always-false predicate instead (Q(creator=<anonymous>) is not a valid
lookup, and anonymous users can never be a row's creator).

Surface alignment after this change:
- Relationship: all three surfaces (user_can, manager visible_to_user,
  service listings/aggregates) honour the row's creator.
- Annotation: both LIST surfaces honour the creator; user_can remains
  the single tracked divergence (issue #1986 item 1, sentinel-pinned).

Tests: creator-retention pins for both models
(test_creator_retains_own_privacy_rooted_row_in_listing,
test_row_creator_exempt_from_gate_in_document_listing); the scoped
cross-shape parity test's rationale updated (its non-creator scope is
now a focus choice, not a necessity). Changelog updated.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round 17 addressed in f8a2b6e — its item 1 was the most substantive finding in many rounds, and it was right:

  • Creator-disjunct gap in apply_source_privacy_gate — confirmed real and fixed. The deeper framing: this PR's own relationship manager changes created a 2-vs-1 surface disagreement (both manager surfaces honoured a creator who lost source access; the service listings hid the row). The gate now carries the same authenticated-only creator exemption as the queryset disjunct — anonymous callers get an always-false predicate, since Q(creator=AnonymousUser()) is not a valid lookup and anonymous users can never be a row's creator (the safety caveat in the review was on the right track but the mechanism differs: Django rejects the unsaved-instance lookup rather than null-matching it). After the fix: relationships agree across all three surfaces; annotations agree across both list surfaces, with user_can remaining the single tracked divergence (Permissioning audit follow-ups (from PR #1985) #1986 item 1, sentinel-pinned). The missing test the review named exists now for both models.
  • Caller grepsget_label_distribution_for_corpus: two call sites, both pass user (fourth grep, same answer). get_relationship_summary: its user parameter predates this PR (positional, unchanged signature) — sole caller is document_types.py:1201, which has always passed it; no API break occurred there.
  • Double-resolution pin — the requested tests already exist: test_manager_user_can_denies_anonymous covers both user_can(None, …) and user_can(AnonymousUser(), …), and the parity test covers the surfaces agreeing.
  • require_permission docstring — the inversion warning already lives there: the guide's TRUTHINESS INVERSION comment was copied from BaseService.require_permission's own "Truthiness inversion" docstring section, which includes the same do-not-transcribe warning.
  • Remaining nits (gate-shape placement, NOT-IN profiling, N+1 CI grep, is_public dead-branch note) — placement is already adjacent to both consumers; the is_public branch is not untested (pinned by test_public_extract_source_passes_both_surfaces); profiling/CI-grep enforcement are Permissioning audit follow-ups (from PR #1985) #1986 territory (items 11/12 adjacent).

Full tests/permissioning/ suite running against the change locally; CI runs the same head.


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review: Permissioning Audit Gap Closure

This is a well-executed, high-quality security-hardening PR. The changes are correct, DRY, well-documented, and thoroughly tested. Here are my observations.


What this does well

Architecture / DRY

  • opencontractserver/utils/source_visibility.py is a clean extraction of the privacy-gate subquery builders. The four previously-divergent inline copies are all gone; a single apply_source_privacy_gate(qs, user) is the one place to change if the gate shape ever needs updating. The module-level _source_privacy_recursion_passes in Managers.py does the same for the single-object check surface.
  • The get_label_distribution_for_corpus user as keyword-only required parameter is exactly the right defensive pattern — a future caller that forgets it will get a TypeError at the call site, not a silent under-filter.

Correctness of the anonymous always-false predicate
The Q(pk__in=[]) used as the anonymous creator_exempt sentinel is correct: Django generates a SQL WHERE FALSE for an empty Python list, so ~Q(pk__in=[]) is always true (no row exempted), which is the intended semantics for anonymous users. Clear and correct.

Fail-closed race window
The deleted-source handling in _source_privacy_recursion_passes (lines 708-710 in Managers.py) is correctly fail-closed: if the FK was set but the source is None (SET_NULL landed between reading analysis_id and dereferencing the relation), the function returns False rather than granting access. This is the right posture.

Guard-shape tightening
Changing the inline gates from "any-permission row" to read_analysis / read_extract codename specifically, and adding group-table joins, are both genuine correctness fixes — an update_analysis-only grantee would previously pass list gates while failing user_can(READ). The test_update_only_source_grant_does_not_unlock_lists invariant pins this.


Issues and observations

1. Known annotation-creator parity gap (issue #1986, acknowledged)

The docstring on apply_source_privacy_gate notes that user_can for annotations does NOT have a creator short-circuit before privacy recursion, while the queryset disjunct and the service gate both have Q(creator=user). The sentinel test test_annotation_creator_parity_gap_sentinel pins this. This is fine as a tracked follow-up but reviewers should be aware it means the annotation user_can surface is strictly more conservative than the list surface for creators who lost source access. Worth confirming the sentinel test is present in the final commit.

2. _source_privacy_recursion_passes: prefer is not None over truthiness check

analysis_id = getattr(instance, "created_by_analysis_id", None)
if analysis_id:          # <-- falsy for 0 or None

Django PKs are always positive integers so if analysis_id: is safe in practice, but if analysis_id is not None: is the standard idiom for FK id fields and avoids a subtle footgun if a migration ever changes the PK type. Same for extract_id. Low priority, just a style nit.

3. Nested subquery depth in visible_analyses_for / visible_extracts_for

Each call to apply_source_privacy_gate generates a total of ~5 correlated SQL subqueries per row filter (user grants, group grants for analyses; user grants, group grants, + groups membership for extracts). For the queryset path this is fine since PostgreSQL's planner inlines them. For bulk callers that invoke user_can per row in Python, the docstring correctly mandates select_related("created_by_analysis", "created_by_extract"), but there is no enforcement mechanism. Consider a class-level assertion or prefetch_related_objects guard for the highest-traffic callers (e.g., the GraphQL documentRelationships resolver) if profiling shows this to be a bottleneck.

4. Behavior change for anonymous users on extracts is a breaking change for running apps

test_query_optimizer_methods.py::test_get_visible_extracts_anonymous_only_public and test_stats.py::test_public_corpus_stats_query were updated from the old semantic (anonymous sees public extracts) to "anonymous sees nothing". Any application that was relying on the old undocumented behavior (e.g. an unauthenticated audit trail reader) will silently see empty extract lists after this ships. The CHANGELOG security fragment calls this out but it may be worth adding a migration note or a prominent note in release communication.

5. get_corpus_relationships corpus-as-gate semantics should be re-verified

The comment in relationship_service.py says:

A future user-facing GraphQL caller MUST switch this to get_corpus_documents_visible_to_user

This is the corpus-as-gate vs. MIN semantics from issue #1682 documented in CLAUDE.md. The comment is good, but I'd suggest promoting it to a # TODO(#1682): ... tag so it surfaces in a grep TODO sweep and doesn't get lost.

6. get_relationship_summary privacy gate placement

Looking at the service:

qs = apply_source_privacy_gate(
    apply_source_privacy_gate(...), user
)

If the gate is applied twice in some code path (one in a helper, once in the caller), there's no functional harm but extra query complexity. Worth a quick scan to confirm this isn't accidentally double-applied via get_document_relationships + the summary's own application.

7. ExtractManager.user_canresolved variable handling

if resolved is None or getattr(resolved, "is_anonymous", True):
    return False

The resolved variable comes from the permission resolution path. A None result here falls through to return False, which is correct. But the docstring says both visible_to_user and user_can deny anonymous. It would be worth a short comment explaining that resolved is None covers the None user input rather than just the anonymous path, to clarify for future readers why both conditions are needed.


Test coverage assessment

The test coverage is strong:

  • test_relationship_privacy_scoping.py — relationship-specific scoping
  • test_extract_anonymous_lockdown.py — anonymous extract, plus ExtractListingGrantParityTestCase
  • test_source_visibility_group_grants.py — group-grant parity on list surfaces
  • test_structural_mypermissions_breakglass.pymyPermissions reporting
  • test_privacy_gate_regressions.py — explicit regression pins for the three aggregate surfaces
  • Extended test_authorization_invariants.py with RelationshipAuthorizationInvariantsTestCase and ExtractAuthorizationInvariantsTestCase

One gap: there is no test verifying that passing user=AnonymousUser() (as opposed to user=None) to get_label_distribution_for_corpus also returns zero private rows. The test_privacy_gate_regressions.py tests cover the authenticated stranger case and may already cover anonymous, but if they don't, it's worth adding.


Documentation

The guide overhaul is excellent. The dual-semantics table for get_corpus_documents vs get_corpus_documents_visible_to_user and the E001 scan-scope caveat are valuable additions. The pitfall list and stale comment cleanup reduce future confusion.


Summary

Approve with minor follow-ups. The four security issues being closed (relationship privacy, anonymous extract lockdown, group-grant list parity, anonymous annotation leak) are real and the fixes are correct. Test coverage is thorough. The main actionable items from this review are small: the is not None nit in _source_privacy_recursion_passes, promoting the get_corpus_relationships TODO, and verifying the annotation-creator sentinel test is present.

Style follow-up from review: the truthiness checks were moved verbatim
from the original annotation-side implementation; Django PKs are
positive so behavior is identical, but the explicit form is the
standard idiom for FK id fields.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round 18 (third consecutive approval) — one nit adopted in 68679d2, the rest closes with cites:

  • is not None idiom — adopted (the truthiness checks were moved verbatim from the original annotation implementation; behavior-identical for positive PKs, but the explicit form is right for FK id fields).
  • Sentinel present? — yes: test_annotation_creator_parity_gap_sentinel landed in 4198242 and is on the branch.
  • Anonymous label-distribution test — already exists by intent and name: test_private_labels_hidden_from_anonymous in test_privacy_gate_regressions.py passes AnonymousUser() (not None) and asserts the private label is absent — present since the file was created.
  • Double-application scan — confirmed clean: six call sites, each applies apply_source_privacy_gate exactly once (3 in annotation_service.py, 3 in relationship_service.py); the summary builds its own queryset rather than layering on get_document_relationships.
  • resolved is None comment — the call-site comment already explains it ("the resolver maps None / unresolvable ids to None and passes AnonymousUser through, so both surfaces deny exactly the same caller set").
  • # TODO(#1682) promotion — declined: the corpus-as-gate scope is a permanent semantic boundary, not pending work; tagging it TODO would misrepresent it as unfinished, and the #1682 reference is already greppable on both sides of the boundary (service docstring + MCP call site).
  • Release communication for the anonymous-extract change — the security changelog fragment carries the user-visible consequence prominently; per the repo's release process those fragments ARE the release-notes feeder (collated into CHANGELOG.md at release), so the note will surface exactly where release readers look.
  • N+1 enforcement mechanism — declined as in prior rounds (no loop callers exist; documented MUST at the function and call site; Permissioning audit follow-ups (from PR #1985) #1986-adjacent if one ever appears).

Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review: Permissioning Audit Gap Closure

This is a substantial, security-focused PR. The overall approach is sound — DRYing up duplicated privacy gate logic, enforcing a consistent anonymous-extract lockdown, and closing the relationship privacy deferral from #1655. Here is a structured review.


Overview

The PR makes four agreed behavior changes:

  1. Relationship privacy recursion now enforced (created_by_analysis / created_by_extract) — closes Phase A — Per-model user_can() overrides + deprecate user_has_permission_for_obj (follow-up to #1637) #1655 Phase-C
  2. Extracts are never anonymous-visible at any layer (ExtractManager + ExtractService + annotation/relationship gates)
  3. Group-granted source permissions now unlock private rows in list queries (previously only user object-permission tables were consulted)
  4. myPermissions structural-write mask corrected for both annotation and relationship services

Strengths

  • Solid DRY consolidation. The new opencontractserver/utils/source_visibility.py replaces four independent inline copies of the privacy gate, all of which had the same group-grant omission. One canonical location is the right call per CLAUDE.md §6.
  • Filter/check parity maintained throughout. Every manager surface override is paired with a matching queryset override, and both are tested by the invariant suite — exactly the right discipline for this class of bug.
  • Changelog fragments correctly used (changelog.d/perm-audit-2026-06.{fixed,security}.md) rather than editing CHANGELOG.md directly.
  • Known gaps are explicitly documented. issue #1986 item 1 (annotation-side creator parity gap in user_can) and the mention-autocomplete inline-guardian carveout are named in both the guide and code comments, preventing future confusion.
  • Breaking behavior change is tested and explained. test_stats.py correctly updates the anonymous extract count assertion with a comment naming the decision.

Issues and Suggestions

1. Potential double-resolution in ExtractManager.user_can — verify idempotence

ExtractManager.user_can calls resolve_user_for_user_can itself, then passes resolved to super().user_can(). The comment says the base class resolution call is "an idempotent no-op for the User instance." This should be verified in BaseVisibilityManager.user_can / UserCanMixin: if the base class calls resolve_user_for_user_can unconditionally rather than checking isinstance(user, AbstractBaseUser), the annotation on resolved may be stale (e.g. if the resolver stamps something on the instance). This is unlikely to be a bug today, but the assumption should be pinned by a quick comment citing the relevant line in UserCanMixin, or the double-call eliminated by forwarding early only when anonymous.

2. apply_source_privacy_gate calls visible_analyses_for / visible_extracts_for once per exclude() — document or memoize

Each exclude() call in apply_source_privacy_gate creates a fresh queryset from visible_analyses_for(user) / visible_extracts_for(user). These are lazy, so the SQL is fine when the result is used in a single queryset evaluation. However, if apply_source_privacy_gate is ever called with a queryset that is .union()-ed or re-evaluated, the subqueries will be re-computed. A brief note in the function's docstring clarifying the "call-once-per-gate" assumption would prevent future surprises.

3. Q(pk__in=[]) as always-false predicate — acceptable, but brittle

creator_exempt = Q(pk__in=[])

This is a common Django pattern for an always-false Q object and PostgreSQL handles IN () correctly, but it is not immediately obvious to a reader. Wrapping it in a named local or adding an inline comment clarifying the intent ("always-false — never matches any row") would help reviewers.

4. _source_privacy_recursion_passes carries an N+1 risk for looped user_can calls

The docstring correctly warns: "Callers that invoke user_can in a loop MUST select_related('created_by_analysis', 'created_by_extract')." This is good documentation, but it is easy to miss. Consider adding the same note to AnnotationManager.user_can and RelationshipManager.user_can at the call site (not just in the helper's docstring), so future callers see it where they will actually read the code.

5. Comment density exceeds CLAUDE.md guideline in places

CLAUDE.md says "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max." Many of the new functions (apply_source_privacy_gate, _source_privacy_recursion_passes, visible_extracts_for) have docstrings running 10–20 lines. Given the security-critical nature and the audit-trail value, the extra documentation is defensible, but it does diverge from the project style guide. The team may want to decide explicitly whether permissioning code is an exception to that rule.

6. is_public on visible_extracts_for (authenticated branch) — document user-facing impact path

The PR correctly includes is_public=True in visible_extracts_for to restore filter/check parity, and the docstring notes "No user-facing flow currently sets Extract.is_public". However, the ExtractManager.visible_to_user authenticated path delegates to BaseVisibilityManager, which also checks is_public. So if a future migration sets is_public on an extract, the manager surface and the source-visibility gate are consistent — that is correct. The only risk is if someone later reads the is_public omission in the old inline code as "intentional" and re-introduces it. The existing comment is clear enough; a cross-reference to ExtractManager would add belt-and-suspenders.

7. get_corpus_annotations anonymous privacy leak fix — confirm test covers pre-fix behavior

The PR description says "get_corpus_annotations skipped privacy exclusion entirely for anonymous users." The new test test_privacy_gate_regressions.py should reproduce the pre-fix behavior as a regression sentinel. Please confirm that at least one test in that file explicitly asserts that an anonymous viewer of a public corpus cannot see analysis-/extract-private annotations on public documents (not just that the authenticated path works correctly).

8. corpus_queries.py one-liner — confirm the pre-existing user parameter gap is intentional

# before
visible_doc_ids=visible_doc_ids,
top_n=...,
exclude_label_prefix=...,

# after
visible_doc_ids=visible_doc_ids,
top_n=...,
exclude_label_prefix=...,
user=info.context.user,        # new

This looks like a bug fix (the user context wasn't being passed to resolve_corpus_intelligence_aggregates, which means the label distribution was ignoring the caller's permission scope). The PR description mentions it under "Also in this PR" but doesn't call it out explicitly. It should be noted in the changelog fragments since it changes the observable output of corpusIntelligenceAggregates for non-owner authenticated users.


Test Coverage

The four new test files cover the main behavioral invariants:

  • test_extract_anonymous_lockdown.py — all three surfaces (service, manager filter, manager check), including parity
  • test_source_visibility_group_grants.py — group-granted analysis and extract grants unlocking annotation listings
  • test_relationship_privacy_scoping.py — relationship listing privacy at the service layer
  • test_structural_mypermissions_breakglass.py — superuser vs normal-user annotation/relationship write masks

The coverage is solid. Two things to confirm:

  • Does test_privacy_gate_regressions.py include a case that would have FAILED before the get_corpus_annotations anonymous-exclusion fix? (i.e., does it exercise the pre-fix code path, not just the corrected one?)
  • Are there tests for check_extract_permission called with user=None (not just AnonymousUser())? The service now guards getattr(user, 'is_anonymous', True) which covers None, but a test would pin it.

Minor Nits

  • _source_privacy_recursion_passes is placed between unique_blob_paths_for_many and AnnotationManager in Managers.py. A brief separator comment or blank line indicating "shared helper for AnnotationManager + RelationshipManager" would help orientation.
  • The PR description says "The targeted dockerized test run is in flight." CI should be the gate before merge — the review here is based on static analysis only.

Summary

This is a well-reasoned, thoroughly documented security PR. The DRY consolidation into source_visibility.py is the right architectural move, the parity invariants are pinned in the right places, and the breaking change (extract anonymity) is correctly documented and tested. The items above are mostly documentation/clarity improvements and two verification asks (idempotent double-resolution and anonymous regression sentinel). No blocking issues found in the logic.

One-test follow-up from review round 19: the service's fail-closed
getattr(user, 'is_anonymous', True) default for a literal None user is
now pinned directly (the manager surfaces' None cases were already
covered).

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round 19 — one test adopted in f4374fd (check_extract_permission(None, …) now pinned at the service layer; the manager None cases were already covered). Everything else closes with cites:

1/4. Double-resolution & call-site N+1 notes — both comments already exist at the call sites (rounds 10–14); resolve_user_for_user_can is a pure resolver and stamps nothing.
2. Gate re-evaluation note — the gate is applied exactly once at each of its six call sites (verified by grep last round) and the builders embed as subqueries in a single statement; standard queryset laziness applies.
3. Q(pk__in=[]) comment — the requested comment exists verbatim at that line: "Always-false predicate: anonymous callers get no exemption…".
5. Comment density — the quoted CLAUDE.md rule ("never write multi-paragraph docstrings… one short line max") does not appear in CLAUDE.md; the repo's actual house style for security-critical permission code is dense audit-trail docstrings (Managers.py, QuerySets.py, BaseService are the precedents these new docstrings follow).
6. is_public cross-referencevisible_extracts_for's docstring already references ExtractManager twice, including the manager-parity rationale.
7. Anonymous regression sentineltest_anonymous_sees_only_non_private_rows is the first test in test_privacy_gate_regressions.py and asserts exactly the requested case; it fails on pre-fix code by construction (the old guard skipped the exclusion entirely for anonymous viewers).
8. Changelog coverage of the resolver user= change — it is the headline of the security fragment's aggregate-surfaces bullet (label-name/count leak in corpusIntelligenceAggregates + the required keyword-only user), not an unflagged drive-by.

With three consecutive approvals and the deltas reduced to single-test/single-idiom items, treating the review loop as converged; the gate to merge is the routine CI pass on the tip.


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR closes four substantive permissioning gaps found during a guide/code audit: relationship privacy recursion (Phase-C deferral from #1655), anonymous-visible extracts, group-granted source permissions not unlocking list queries, and myPermissions misreporting structural-write access. The approach is sound — consolidating divergent inline copies into shared source_visibility.py builders, then wiring every affected surface through them. Test coverage is comprehensive and the known asymmetries are explicitly flagged and pinned.


Security Analysis

Group-grant fix (item 3) is the most impactful security correction. The prior inline implementations used only user-level guardian tables, so a group-granted analysis/extract viewer passed user_can checks but never appeared in list results — a filter/check parity drift that source_visibility.py now closes. The fix is correct: visible_analyses_for / visible_extracts_for join both the user and group permission tables via subqueries, and apply_source_privacy_gate routes every affected queryset through them.

Codename exactness improvement. Changing codename__contains="read" to codename="read_analysis" / codename="read_corpus" / codename="read_extract" in get_visible_extracts and source_visibility.py is a security tightening — previously an update_analysis-only grantee cleared the list gate while failing user_can(READ). The new codename-exact matching closes this parity drift.

apply_source_privacy_gate anonymous case logic. The Q(pk__in=[]) always-false predicate for anonymous creators is correct Django — compiles to WHERE id IN (), always false. The double .exclude() with ~creator_exempt (which becomes always-true for anonymous) correctly excludes all non-structural extract-rooted rows.

get_corpus_annotations anonymous path fix. Previously the if not user.is_anonymous: guard skipped the privacy exclusion entirely for anonymous users, exposing analysis-/extract-private annotations on public corpora. The unconditional apply_source_privacy_gate(qs, user) call closes this correctly.


Specific Observations

1. AnalysisService.get_visible_analyses still lacks group grants (pre-existing, not introduced here)

source_visibility.py's visible_analyses_for() correctly adds group-level grants for the privacy-gate subquery path. However, AnalysisService.get_visible_analyses (line ~120 in analysis_service.py) — the function that drives the analysis list resolver — still only checks AnalysisUserObjectPermission without a AnalysisGroupObjectPermission leg. A group-granted analysis reader therefore passes the annotation/relationship privacy gates in this PR but still cannot see the analysis row itself in the analysis list view.

This is pre-existing and out of this PR's scope, but given the audit context it seems worth tracking as a follow-up (the same filter/check parity class as what was fixed here).

2. _source_privacy_recursion_passes N+1 is documented but callers may not notice

The docstring correctly warns that bulk user_can loops require select_related("created_by_analysis", "created_by_extract"). This is fine. One concrete follow-up to consider: AnnotationManager.user_can currently has the same footprint. The get_document_relationships service is safe because it drives the _can_* annotations via a DB-side Case/When (no per-row user_can loop). The risk is future code that loops user_can on rows fetched without select_related — the docstring warning is the right guard.

3. Annotation-side creator asymmetry (issue #1986 item 1) — sentinel is the right choice

The test_annotation_creator_parity_gap_sentinel approach is correct: pins the divergence explicitly so any future reconciliation must update both assertions consciously. No action needed here; just confirming this is the right call.

4. get_label_distribution_for_corpus signature change is safe

Verified: only two callers exist — corpus_queries.py (updated in this PR) and test_privacy_gate_regressions.py (new in this PR). The TypeError-at-callsite posture for user is the right tradeoff: silent omission would have been a silent under-count, harder to catch.

5. ExtractManager.user_can double-resolution of user is benign but worth documenting

The user_can override calls resolve_user_for_user_can(user), then forwards resolved to super().user_can(resolved, ...). The base class calls resolve_user_for_user_can again internally. This is idempotent (a User instance resolves to itself) as the comment states, but a reader unfamiliar with the mixin might wonder if there's a double-dispatch bug. The comment is good; no change needed.

6. visible_extracts_for includes is_public=True for authenticated users — confirm intent

The docstring notes this was a latent parity drift (original inline omitted is_public for extracts while including it for analyses). The fix aligns with Extract.objects.user_can which grants READ on is_public rows. This is consistent, but worth a quick confirmation: public extracts are now privacy-gate-unlocked for any authenticated user (not just the creator or explicit grantee). That seems correct per the guide, but is a behavior expansion worth calling out explicitly in the release notes.

7. test_stats.py behavior change

The modification is correctly handled — anonymous clients now expect totalExtracts: 0 with a clear comment. The test logic is sound.


Minor Nits

  • apply_source_privacy_gate: the Q(pk__in=[]) always-false predicate is correct but a _NEVER_MATCH = Q(pk__in=[]) module-level constant would make intent more obvious to future readers. (Very minor.)
  • In RelationshipManager.visible_to_user, the privacy gate builds privacy_gate = ... | Q(creator=user) but then immediately re-assigns with privacy_gate = privacy_gate | Q(creator=user) — the first assignment already includes the disjunct. Verify this is not an accidental double-OR. Looking at the code: the initial privacy_gate does NOT include Q(creator=user) (it only has structural/no-source/source-visible). The | Q(creator=user) is added only in the else: branch for authenticated users. Correct.

Summary

This is a well-scoped security PR with clean implementation, good test coverage, and honest documentation of known asymmetries. The core changes (shared source_visibility.py builders, relationship privacy recursion, anonymous extract lockdown, structural myPermissions mask) are all correct. The one pre-existing gap worth a follow-up issue is AnalysisService.get_visible_analyses missing group grants on the analysis listing surface.

Review round 20 caught the asymmetry left by round 6: the extract
listing received group-grant legs and exact read_* codenames, but its
sibling get_visible_analyses kept the identical pre-fix drift —
user-table-only guardian Exists (any codename) on the analysis leg and
a codename__contains='read' substring match on the corpus leg, while
check_analysis_permission resolves both through user_can (group grants
+ exact codename). Both legs now join AnalysisGroupObjectPermission /
CorpusGroupObjectPermission and match read_analysis / read_corpus
exactly. A group-granted analysis reader previously passed this PR's
privacy gates yet could not see the analysis row itself in the list
view.

Anonymous semantics unchanged (public analyses in public corpora remain
visible — the documented contract, unlike extracts).

Tests: test_analysis_listing_grant_parity.py (group grants on both legs
unlock the listing + single-object surface agreement; update-only
analysis grant does not unlock). Changelog bullet extended to cover
both siblings.

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round 20 addressed in 1c88b6b — its item 1 was a genuine catch, and better than "track as follow-up": fixed in-PR. AnalysisService.get_visible_analyses carried the identical drift its sibling get_visible_extracts had before round 6 (user-table-only guardian Exists with any codename on the analysis leg; codename__contains="read" substring on the corpus leg) — leaving it unfixed after this PR's gate changes would have produced exactly the wedge the review describes: a group-granted reader passing the privacy gates while unable to see the analysis row in the list view. Both legs now join the group tables and match read_analysis / read_corpus exactly, with test_analysis_listing_grant_parity.py pinning both directions (group grants unlock + single-object agreement; update-only grant does not unlock). Anonymous semantics deliberately unchanged (public analyses stay visible — the documented contract, unlike extracts). Changelog bullet extended to cover both siblings.

Remaining items were confirmations (2, 3, 4, 5, 7 — agreed on all counts) or already covered: item 6's release-notes ask is carried by the security fragment's is_public sentence (fragments are the release-notes feeder), and the _NEVER_MATCH constant nit is served by the existing inline always-false comment. Verification of the change running locally against the new test plus the legacy analysis-visibility suites; CI runs the same head.


Generated by Claude Code

@claude

claude Bot commented Jun 12, 2026

Copy link
Copy Markdown

Code Review: Permissioning Audit Gap Closure

Overview

This is a well-executed, security-critical PR that closes several real vulnerabilities uncovered during a systematic audit of the consolidated permissioning guide. The changes are substantial (27 files, +2741/-564) but tightly scoped: every behavioral change is tied to a documented decision, and the guide, code, and tests are all updated together.


Security Findings Addressed

All four are genuine vulnerabilities, well-scoped and correctly fixed:

  1. Anonymous annotation leak (Critical): get_corpus_annotations previously skipped the created_by_* privacy gate entirely for anonymous users (old if not user.is_anonymous guard), meaning analysis-/extract-private annotations on public corpora were visible to unauthenticated callers. Fixed unconditionally in apply_source_privacy_gate.

  2. Anonymous extract exposure (Critical): Anonymous users could access is_public=True extracts in public corpora through the service layer, contradicting the documented "Extract — never" anonymous contract. The new ExtractManager denies both surfaces at the manager layer, and the service mirrors it.

  3. Group-grant filter/check parity (Medium): All four inline privacy-gate implementations queried only *UserObjectPermission tables, so group-granted source access passed user_can but never appeared in list queries. The shared source_visibility.py builders fix this across all call sites at once.

  4. Structural myPermissions over-reporting (Medium): RelationshipService had no structural mask on pre-computed permissions, over-reporting write access for normal users on structural rows. Fixed to mirror the annotation-side user.is_superuser mask.


Code Quality: Strengths

  • DRY consolidation is correct: Four divergent inline copies of the privacy gate collapsed into one apply_source_privacy_gate in source_visibility.py. The code in QuerySets.py, annotation_service.py, and relationship_service.py are all measurably cleaner.

  • _source_privacy_recursion_passes extraction: Moving _check_annotation_privacy_recursion to a module-level function shared between AnnotationManager and RelationshipManager is the right call — the logic is identical and the private-method boundary was the only thing preventing sharing.

  • Sentinel test pattern: test_annotation_creator_parity_gap_sentinel is exactly the right approach for the known AnnotationQuerySet/user_can asymmetry (issue Permissioning audit follow-ups (from PR #1985) #1986 item 1). It pins the current divergent state so any future fix must consciously update both assertions together.

  • Grant-shape correction: Tightening from any-guardian-row to read_* codename-exact matching in visible_analyses_for / visible_extracts_for and get_visible_extracts / get_visible_analyses is the correct filter/check parity fix. The test_update_only_source_grant_does_not_unlock_lists tests pin both surfaces.

  • Fail-closed race window: The SET_NULL race in _source_privacy_recursion_passes (where created_by_*_id is non-null but the FK dereference yields None) correctly returns False. The docstring explains the design decision clearly.


Issues and Concerns

1. Annotation-creator parity gap (issue #1986 item 1) — UX consequence

The sentinel test documents this clearly, but the concrete UX consequence is worth flagging: an annotation's creator who loses source access will still see their annotation appear in list results (via Q(creator=user) in visible_to_user) but any single-object fetch will fail (user_can(READ) denies via _source_privacy_recursion_passes). This can manifest as a list that renders rows that then error on detail-view load. The sentinel is the right mechanism, but the follow-up should be scheduled before any release that adds creator-authored analysis/extract rows to main UI paths.

2. get_label_distribution_for_corpus is a breaking API change

The user parameter is added as a required keyword-only arg (*, user). The only call site updated in this PR is the GraphQL resolver. Any other caller will get a TypeError at runtime — not caught by linting or type checkers. Recommend confirming before merge:

grep -rn "get_label_distribution_for_corpus" --include="*.py"

3. Performance: subqueries now run on every annotation/relationship list path

apply_source_privacy_gate adds two NOT IN subquery clauses. The four inline implementations it replaces had the same cost, so this is not a per-call regression. However, those four only ran on some paths; the shared gate now runs on all paths, including get_label_distribution_for_corpus which aggregates across a full corpus. Worth verifying query plans on high-annotation corpora in staging.

4. ExtractManager.user_can resolves the user twice

resolved = resolve_user_for_user_can(user)
...
return super().user_can(resolved, ...)  # base class calls resolve again (no-op)

The base-class second call is idempotent on an already-resolved User instance, as the comment explains. Minor; noted for future readers.

5. CI status

The PR description explicitly notes CI was not complete at submission time. Given the scope (4 new security behaviors, 6 new test files, changes to shared manager/queryset infrastructure), CI is the real gate here. Do not merge until the full suite passes.


Documentation Review

The consolidated_permissioning_guide.md overhaul is excellent:

  • Thread moderation correction (EDIT does not grant moderation) is now documented correctly, with the #1450 reconciliation noted inline.
  • The #1682 dual-semantics section (corpus-as-gate vs MIN) is new and important. The matching inline comment in mcp/tools.py is exactly the right cross-reference pattern — callers of the corpus-as-gate variant are labeled so no one silently swaps them.
  • All stale QueryOptimizer path/class references corrected throughout.
  • E001 scan-scope nuance ("green E001 means no inline Tier-0 in config/graphql/", not "none anywhere") is now explicit — prevents future contributors from assuming blanket service-layer coverage.
  • The abbreviated code block in the annotations section (prose sketches instead of verbatim code that was already drifting) is cleaner and more honest.

One minor note: BaseService.filter_visible_qs is described as "fails closed — raises TypeError rather than passing unfiltered rows through." Worth confirming the implementation actually raises on misuse rather than silently returning an unfiltered queryset.


Test Coverage Assessment

Strong overall.

New files:

  • test_extract_anonymous_lockdown.py — service, manager, filter/check parity, authenticated-path regression, group grants, update-only grants. Thorough.
  • test_analysis_listing_grant_parity.py — covers group grant and update-only grant on both legs of get_visible_analyses.
  • test_privacy_gate_regressions.py — regression pins for the three audit-surfaced aggregate/listing leaks.

Extended test_authorization_invariants.py relationship matrix is thorough: user grant, group grant, extract grant, creator exemption, structural bypass, and write-requires-source-permission.

One gap to verify: the changelog lists get_relationship_summary as a fixed leak surface. Confirm test_privacy_gate_regressions.py has explicit coverage for that method.


Pre-merge Checklist

  • CI passes on all new test files (not yet run per PR description)
  • Confirm no other callers of get_label_distribution_for_corpus without user=: grep -rn "get_label_distribution_for_corpus" --include="*.py"
  • Confirm corpus.get_documents() deprecation warning was added in a prior PR (referenced in docs but absent from this diff)
  • Schedule Permissioning audit follow-ups (from PR #1985) #1986 item 1 (annotation creator parity) before next release to avoid list-vs-detail UX inconsistency for annotation creators who lose source access

JSv4 commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator Author

Round 21 — no changes warranted; every item verifies against the branch:

  • Caller grep (5th request): two call sites, both pass user= — also stated in the function's docstring itself since round 8.
  • get_relationship_summary coverage: explicit — RelationshipAggregatePrivacyRegressionTestCase.test_summary_excludes_private_rows_for_ungranted_viewer / ..._includes_private_rows_for_owner in test_privacy_gate_regressions.py.
  • filter_visible_qs fail-closed: confirmed in source — BaseService.filter_visible_qs raises TypeError on inputs lacking visible_to_user (shared/services/base.py, the SECURITY-commented branch) rather than returning unfiltered rows.
  • corpus.get_documents() deprecation: present on main before this PR — DeprecationWarning at corpuses/models.py (verified during the original audit; correctly absent from this diff).
  • Label-distribution "now runs on all paths" perf framing: the gate previously ran on no path there — that absence was the leak being fixed; the subquery cost is the fix working. Staging plans worth watching as with any aggregate, agreed.
  • Creator-gap UX scheduling and CI as the merge gate: both are maintainer-process items already tracked (Permissioning audit follow-ups (from PR #1985) #1986 item 1 with sentinel; CI cycling the tip).

With rounds 14–21 producing three approvals and otherwise only items that already exist on the branch by name, I'll respond to future automated rounds only if they contain new, verifiable findings — the disposition ledger above and #1986 cover everything else.


Generated by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants