Short-circuit mprotect on matching tracker prot #71

Open
jserv wants to merge 1 commit into main from mem
jserv commented Jun 5, 2026

Repeated mprotect calls with an unchanged prot on RELRO, JIT, and GC-style ranges no longer walk the page tables. A pre-check confirms that every overlapping region already records the requested prot and is not MAP_NORESERVE; if so, the call returns 0 without touching the tracker or the PTEs. The skip is gated on a per-request safety helper that forces the slow path for pure PROT_READ: sys_mmap installs MEM_PERM_RW PTEs for non-exec mappings (including PROT_READ requests), and only the explicit guest_update_perms call inside mprotect tightens them to MEM_PERM_R. A regression test in tests/test-negative pins this behavior: it mprotects PROT_READ onto a PROT_READ mmap and confirms that a subsequent write traps with SIGSEGV.
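A minimal sketch of the pre-check and the PROT_READ gate, assuming a sorted, non-overlapping array of tracked regions. `struct region`, `range_prot_uniform`, `prot_safe_to_skip`, and `can_skip_mprotect` are hypothetical stand-ins, not the tracker's real layout or helpers, and treating an untracked gap as a mismatch is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical tracker entry covering [start, end). */
struct region {
    uint64_t start, end;
    int prot;        /* prot most recently recorded for this range */
    bool noreserve;  /* mapped with MAP_NORESERVE */
};

/* True if [lo, hi) is fully covered by tracked regions that all record
 * `prot` and none of which is MAP_NORESERVE. */
static bool range_prot_uniform(const struct region *r, size_t n,
                               uint64_t lo, uint64_t hi, int prot)
{
    assert(lo < hi);
    uint64_t cursor = lo;                 /* first byte not yet verified */
    for (size_t i = 0; i < n && cursor < hi; i++) {
        if (r[i].end <= cursor)
            continue;                     /* region lies below the range */
        if (r[i].start > cursor)
            return false;                 /* untracked gap: fail closed */
        if (r[i].prot != prot || r[i].noreserve)
            return false;
        cursor = r[i].end;
    }
    return cursor >= hi;
}

/* Per-request safety gate: pure PROT_READ always takes the slow path,
 * because mmap left RW PTEs that only mprotect tightens to read-only. */
static bool prot_safe_to_skip(int prot)
{
    return prot != PROT_READ;
}

static bool can_skip_mprotect(const struct region *r, size_t n,
                              uint64_t lo, uint64_t hi, int prot)
{
    return prot_safe_to_skip(prot) && range_prot_uniform(r, n, lo, hi, prot);
}
```

Failing closed on gaps and on PROT_READ keeps the fast path conservative: a wrong "no skip" only costs a page-table walk, while a wrong "skip" leaves stale permissions in place.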

The fast-path check runs before any mutation. When PTE work is required, it runs BEFORE guest_region_set_prot, so a -ENOMEM from guest_{update_perms,extend_page_tables,invalidate_ptes} leaves the tracker at the OLD prot; the next retry then sees the mismatch and re-attempts instead of silently no-op'ing on stale tracker state. The low-IPA branch was also missing return-value checks on guest_update_perms and guest_invalidate_ptes; both failures now propagate -LINUX_ENOMEM.
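A toy model of why the ordering matters, assuming a single tracked range and a simulated page-table update that can fail. `struct sim`, `update_ptes`, and `do_mprotect` are hypothetical, the PROT_READ gate is omitted, and plain `-ENOMEM` stands in for `-LINUX_ENOMEM`:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical model of one tracked range. */
struct sim {
    int tracker_prot;  /* prot recorded by the region tracker */
    int pte_prot;      /* prot actually installed in the PTEs */
    bool pte_oom;      /* when set, the page-table update fails */
};

static int update_ptes(struct sim *s, int prot)
{
    if (s->pte_oom)
        return -ENOMEM;
    s->pte_prot = prot;
    return 0;
}

static int do_mprotect(struct sim *s, int prot)
{
    assert(s != NULL);
    if (s->tracker_prot == prot)
        return 0;                 /* fast path: tracker already matches */
    int err = update_ptes(s, prot);
    if (err)
        return err;               /* tracker still holds the OLD prot */
    s->tracker_prot = prot;       /* commit only after the PTEs succeed */
    return 0;
}
```

Had the tracker been updated first, the failed call would leave the tracker at the new prot while the PTEs still held the old one, and the retry would hit the fast path and return 0 without ever fixing them.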

guest_region_set_prot's two table-full split paths now set a sticky guest_t.regions_tracker_stale flag. The fast path checks this flag: once the tracker has lied even once, mprotect falls back to unconditional PTE work for the lifetime of the process. The flag is not propagated through fork IPC v10, so a child that inherits a stale tracker re-arms the flag the next time it hits the same condition; the only window of incorrect skipping is the very first matching mprotect after such a fork.
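A sketch of the sticky flag's lifecycle, under the assumption that a split failing on a full table is the only way the tracker diverges. The fixed-size table, `set_prot_split`, `fast_path_allowed`, and `child_after_fork` are hypothetical; the last one only models the stated caveat that the flag is not sent over fork IPC:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 4  /* hypothetical, tiny so the table fills quickly */

struct tracker {
    size_t count;  /* regions in use */
    bool stale;    /* sticky: once set, never cleared in this process */
};

/* A prot change that splits a region needs one more slot. If the table is
 * full, the new prot cannot be recorded, so mark the tracker untrustworthy. */
static void set_prot_split(struct tracker *t)
{
    assert(t->count <= MAX_REGIONS);
    if (t->count == MAX_REGIONS) {
        t->stale = true;
        return;
    }
    t->count++;
}

/* The fast path is only allowed while the tracker has never diverged. */
static bool fast_path_allowed(const struct tracker *t, bool prot_matches)
{
    return !t->stale && prot_matches;
}

/* Models the caveat: fork IPC copies the regions, not the flag. */
static struct tracker child_after_fork(const struct tracker *parent)
{
    struct tracker child = { parent->count, false };
    return child;
}
```

`child_after_fork` returning a clean flag is the window described above: the child trusts its inherited tracker until it hits a table-full split itself.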

guest_region_remove is restructured as a single in/out compaction pass. The interior-split branch snapshots the source region, shifts the suffix to make room, writes the left and right halves, and returns. The previous in-place layout aliased the source slot when out == in: it overwrote *r with the left half and then read the right half from that clobbered slot, corrupting both halves on the only growth path. The trim-only paths are consolidated into a single survivor block.
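A minimal sketch of a single-pass in/out compaction with a snapshotted interior split, assuming a sorted, non-overlapping, fixed-capacity array. `struct region` and `region_remove` are hypothetical and do not reproduce the real guest_region_remove signature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct region { uint64_t start, end; int prot; };  /* hypothetical, [start, end) */

/* Remove [lo, hi). Returns the new count, or -1 if an interior split
 * needs a slot the table does not have. */
static long region_remove(struct region *a, size_t n, size_t cap,
                          uint64_t lo, uint64_t hi)
{
    assert(lo < hi);
    size_t out = 0;
    for (size_t in = 0; in < n; in++) {
        struct region r = a[in];          /* snapshot before any write to a[] */
        if (r.end <= lo || r.start >= hi) {
            a[out++] = r;                 /* no overlap: keep as-is */
            continue;
        }
        if (r.start < lo && r.end > hi) {
            /* Interior split: [lo, hi) lies strictly inside r, so no earlier
             * region was dropped and out == in. */
            assert(out == in);
            if (n == cap)
                return -1;
            if (in + 1 < n)               /* shift the suffix right by one */
                memmove(&a[in + 2], &a[in + 1], (n - in - 1) * sizeof a[0]);
            a[in] = (struct region){ r.start, lo, r.prot };      /* left  */
            a[in + 1] = (struct region){ hi, r.end, r.prot };    /* right */
            return (long)n + 1;
        }
        if (r.start < lo)
            r.end = lo;                   /* keep the head */
        else if (r.end > hi)
            r.start = hi;                 /* keep the tail */
        else
            continue;                     /* fully covered: drop */
        a[out++] = r;                     /* single survivor write */
    }
    return (long)out;
}
```

Because `r` is copied out of `a[in]` before anything is written, both halves derive from the original bounds even though `out == in` at the split. The interior split is the only case that grows the array, and it can only occur when no earlier region was dropped, which is why returning immediately is safe.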

guest_region_add_ex_owned_gpa replaces the O(n) bubble-insert with a binary search plus a single memmove. The two query helpers, guest_region_range_prot_uniform and guest_region_range_has_noreserve, use the same binary-search prefix skip.
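A sketch of the binary-search insert, assuming regions are sorted and non-overlapping so that `end` increases monotonically. `first_end_above` is modeled on the guest_region_first_end_above helper named in the cubic summary, but its exact semantics here are an assumption; `region_insert` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct region { uint64_t start, end; int prot; };  /* hypothetical, [start, end) */

/* Index of the first region whose end is above addr (lower bound on end). */
static size_t first_end_above(const struct region *a, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid].end <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert r keeping the array sorted: binary search plus one memmove. */
static int region_insert(struct region *a, size_t *n, size_t cap, struct region r)
{
    if (*n == cap)
        return -1;
    size_t pos = first_end_above(a, *n, r.start);
    assert(pos == *n || a[pos].start >= r.end);  /* caller guarantees no overlap */
    if (pos < *n)
        memmove(&a[pos + 1], &a[pos], (*n - pos) * sizeof a[0]);
    a[pos] = r;
    (*n)++;
    return 0;
}
```

The search takes O(log n) comparisons; the memmove is still O(n) in the worst case, but as one contiguous copy rather than a swap per element. The same lookup gives range queries their prefix skip: scanning starts at the first region that can overlap the range instead of at index 0.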

find_free_gap_inner stops iterating once a region's start crosses max_addr; later regions cannot affect a candidate gap inside the search window.
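A first-fit gap search with the early exit, assuming a sorted, non-overlapping array and a half-open window `[min_addr, max_addr)`. `find_free_gap` is a hypothetical simplification of find_free_gap_inner, with no alignment or hint-address handling:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t start, end; };  /* hypothetical, [start, end) */

/* Returns the start of the first gap of at least len bytes inside
 * [min_addr, max_addr), or UINT64_MAX if none fits. */
static uint64_t find_free_gap(const struct region *a, size_t n, uint64_t len,
                              uint64_t min_addr, uint64_t max_addr)
{
    assert(min_addr <= max_addr);
    uint64_t cand = min_addr;             /* lowest address still possible */
    for (size_t i = 0; i < n; i++) {
        if (a[i].start >= max_addr)
            break;                        /* this and all later regions lie past the window */
        if (a[i].end <= cand)
            continue;                     /* entirely below the candidate */
        if (a[i].start >= cand && a[i].start - cand >= len)
            return cand;                  /* gap before this region fits */
        cand = a[i].end;                  /* move past this region */
    }
    if (cand < max_addr && max_addr - cand >= len)
        return cand;                      /* tail gap up to max_addr */
    return UINT64_MAX;
}
```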


Summary by cubic

Short-circuits same-prot mprotect to skip page-table work when safe, and hardens region tracking to avoid stale or incorrect skips. This speeds up RELRO/JIT/GC workloads and fixes a region-split corruption bug.

  • New Features

    • Add safe fast path for mprotect when all overlapping regions already have the target prot and none are MAP_NORESERVE; skips tracker and PTE work. Forces slow path for pure PROT_READ. Gated by a sticky regions_tracker_stale flag.
    • Do PTE changes before tracker updates and return -LINUX_ENOMEM on failures, preventing silent no-ops on retries.
  • Refactors

    • Rewrite guest_region_remove as a single in/out compaction pass to fix interior-split corruption and simplify trim-only paths.
    • Speed up region ops: binary-search insert in guest_region_add_ex_owned_gpa, new guest_region_first_end_above, guest_region_range_prot_uniform, guest_region_range_has_noreserve, and early-exit in gap search.
    • Add regression test: mprotect(PROT_READ) onto PROT_READ mapping keeps it read-only; a write traps SIGSEGV.
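A standalone sketch of what such a regression test can look like, using only POSIX calls on Linux. It is not the actual contents of tests/test-negative, and `readonly_write_traps` is a hypothetical name. The SIGSEGV handler jumps back out with `siglongjmp` so the test can report the fault instead of dying:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;

static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(jump_env, 1);              /* resume after the faulting write */
}

/* Returns 1 if a write to a PROT_READ mapping faults after a same-prot
 * mprotect, 0 if the write went through, -1 on setup failure. */
static int readonly_write_traps(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = mmap(NULL, page, PROT_READ,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    /* Same prot as the mapping: the call a fast path must not skip. */
    if (mprotect((void *)p, page, PROT_READ) != 0) {
        munmap((void *)p, page);
        return -1;
    }
    struct sigaction sa = {0}, old;
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile int trapped = 0;
    if (sigsetjmp(jump_env, 1) == 0)
        p[0] = 1;                         /* must fault */
    else
        trapped = 1;

    sigaction(SIGSEGV, &old, NULL);       /* restore the previous handler */
    munmap((void *)p, page);
    return trapped;
}
```

A correct implementation returns 1. If the same-prot mprotect were skipped, the RW PTEs installed by mmap would stay in place and the write would succeed, returning 0.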

Written for commit d509397. Summary will update on new commits.

jserv requested a review from Max042004 on June 5, 2026 07:15

cubic-dev-ai[bot] commented (comment marked as resolved).
