Repeat mprotect with the same prot on RELRO, JIT, and GC-style ranges no
longer walks page tables: a pre-check confirms every overlapping region
already records the requested prot and is not MAP_NORESERVE, in which
case the call returns 0 without touching the tracker or PTEs. The skip
is gated on a per-request safety helper that forces the slow path for
pure PROT_READ, because sys_mmap installs MEM_PERM_RW PTEs for non-exec
mappings (including PROT_READ requests) and only the explicit
guest_update_perms call inside mprotect tightens them to MEM_PERM_R.
tests/test-negative pins this behavior with a regression test that
mprotects PROT_READ onto a PROT_READ mmap and confirms a subsequent
write traps SIGSEGV.
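
The pre-check described above might look roughly like the sketch below. Every name here (sk_region_t, sk_skip_is_safe, sk_can_skip, the SK_* constants) is a hypothetical stand-in, not the project's actual definition, and the sketch assumes a region array sorted by start address. It covers only the conditions named in the description; handling of unmapped gaps is out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real prot bits and region flags. */
enum { SK_PROT_READ = 1, SK_PROT_WRITE = 2, SK_PROT_EXEC = 4 };
enum { SK_REGION_NORESERVE = 1 };

typedef struct {
    uint64_t start, end;  /* half-open range [start, end) */
    int prot;             /* prot recorded by the tracker */
    unsigned flags;
} sk_region_t;

/* Pure PROT_READ must take the slow path: per the description, mmap
 * installs RW PTEs for non-exec mappings, so a tracker entry that says
 * "read-only" does not prove the PTEs are read-only yet. */
static bool sk_skip_is_safe(int prot)
{
    return prot != SK_PROT_READ;
}

/* Returns true when mprotect(start, end, prot) may return 0 without
 * touching the tracker or the PTEs. Assumes regions[] is sorted. */
static bool sk_can_skip(const sk_region_t *regions, size_t n,
                        uint64_t start, uint64_t end, int prot)
{
    assert(start < end);
    if (!sk_skip_is_safe(prot))
        return false;
    for (size_t i = 0; i < n; i++) {
        if (regions[i].end <= start)
            continue;               /* entirely below the request */
        if (regions[i].start >= end)
            break;                  /* sorted: nothing later overlaps */
        if (regions[i].prot != prot)
            return false;           /* tracker disagrees: do the work */
        if (regions[i].flags & SK_REGION_NORESERVE)
            return false;           /* NORESERVE always goes slow */
    }
    return true;
}
```

A caller would consult sk_can_skip first and fall through to the existing PTE and tracker update whenever it returns false.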

The fast path runs before any mutation. When PTE work is required, that
work runs BEFORE guest_region_set_prot so a -ENOMEM from
guest_{update_perms,extend_page_tables,invalidate_ptes} leaves the
tracker at the OLD prot;
the next retry sees the mismatch and re-attempts instead of silently
no-op'ing on stale tracker state. The low-IPA branch was also missing
return-value checks on guest_update_perms and guest_invalidate_ptes;
both now propagate -LINUX_ENOMEM.
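
The ordering argument can be illustrated with a small model. This is a sketch under stated assumptions: ord_state_t, ord_mprotect, and the two injected update functions are hypothetical, a single range stands in for the whole tracker, and the host's ENOMEM stands in for LINUX_ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical one-range model: what the tracker records vs. what the
 * page tables actually hold. */
typedef struct {
    int tracked_prot;
    int pte_prot;
} ord_state_t;

typedef int (*ord_pte_update_fn)(ord_state_t *s, int prot);

/* Injected PTE updaters: one succeeds, one simulates allocation failure. */
static int ord_pte_update_ok(ord_state_t *s, int prot)
{
    s->pte_prot = prot;
    return 0;
}

static int ord_pte_update_nomem(ord_state_t *s, int prot)
{
    (void)s;
    (void)prot;
    return -ENOMEM;
}

static int ord_mprotect(ord_state_t *s, int prot, ord_pte_update_fn update)
{
    assert(s != NULL && update != NULL);
    if (s->tracked_prot == prot)
        return 0;               /* fast path: nothing is mutated */
    int rc = update(s, prot);   /* PTE work first */
    if (rc < 0)
        return rc;              /* tracker still holds the old prot */
    s->tracked_prot = prot;     /* commit only after the PTEs changed */
    return 0;
}
```

Because the tracker is written last, a failed call leaves tracked_prot different from the requested prot, so a retry cannot be mistaken for a same-prot repeat.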

guest_region_set_prot's two table-full split paths now set a sticky
guest_t.regions_tracker_stale flag. The fast path checks this flag and
falls back to unconditional PTE work for the lifetime of the process
once the tracker has lied even once. The flag is not propagated through
fork IPC v10, so a child that inherits a stale tracker re-arms the flag
the next time it hits the same condition; the only window of incorrect
skipping is the very first matching mprotect after such a fork.
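
A minimal sketch of the sticky-flag idea follows. Only the field name regions_tracker_stale comes from the description; the struct, the capacity model, and both functions are hypothetical illustrations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slice of guest state: just enough to model a full table. */
typedef struct {
    size_t nregions;
    size_t capacity;
    bool regions_tracker_stale;  /* set once, never cleared */
} stale_guest_t;

/* Models a split that needs one extra slot. When the table is full the
 * split cannot be recorded, so the tracker no longer matches the PTEs
 * and the flag is raised. */
static bool stale_record_split(stale_guest_t *g)
{
    assert(g->nregions <= g->capacity);
    if (g->nregions == g->capacity) {
        g->regions_tracker_stale = true;
        return false;
    }
    g->nregions++;
    return true;
}

/* The fast path is only trusted while the tracker has never been stale. */
static bool stale_fast_path_allowed(const stale_guest_t *g)
{
    return !g->regions_tracker_stale;
}
```

Nothing in the sketch ever clears the flag, which is what makes it sticky: a later successful split does not re-enable the fast path.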

guest_region_remove is restructured as a single in/out compaction pass.
The interior-split branch snapshots the source region, shifts the suffix
to make room, writes left and right halves, and returns. The previous
in-place layout aliased the source slot when out == in: it overwrote *r
with the left half and then read the right half from that clobbered slot,
corrupting both halves on the only growth path. The trim-only paths
consolidate to a single survivor block.
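
The interior-split sequence (snapshot, shift, write both halves) can be sketched as below. The type split_region_t and the function split_interior are hypothetical and cover only the interior case, assuming a sorted array with spare capacity; they are not the project's guest_region_remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t start, end;  /* half-open range [start, end) */
    int prot;
} split_region_t;

/* Remove [lo, hi) from regions[i], where the hole lies strictly inside
 * that region. Returns 0 on success, -1 when the table has no free slot. */
static int split_interior(split_region_t *regions, size_t *n, size_t cap,
                          size_t i, uint64_t lo, uint64_t hi)
{
    assert(i < *n && lo < hi);
    assert(regions[i].start < lo && hi < regions[i].end);
    if (*n == cap)
        return -1;

    /* Snapshot first: both halves are derived from this copy, never
     * from a slot that has already been overwritten. */
    split_region_t src = regions[i];

    /* Shift the suffix up by one slot to make room for the right half. */
    memmove(&regions[i + 2], &regions[i + 1],
            (*n - i - 1) * sizeof *regions);

    regions[i] = src;
    regions[i].end = lo;        /* left half:  [src.start, lo) */
    regions[i + 1] = src;
    regions[i + 1].start = hi;  /* right half: [hi, src.end)   */
    (*n)++;
    return 0;
}
```

Reading the right half from src rather than from regions[i] is the step that avoids the aliasing described above.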

guest_region_add_ex_owned_gpa replaces the O(n) bubble-insert with a
binary search plus single memmove. The two query helpers
guest_region_range_prot_uniform and guest_region_range_has_noreserve use
the same binary-search prefix-skip.
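
A sketch of binary search plus a single memmove, assuming a sorted array of non-overlapping regions. The names ins_region_t, ins_first_end_above, and ins_insert_sorted are hypothetical; the real helpers may differ in signature and in how they handle overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint64_t start, end;  /* half-open range [start, end) */
} ins_region_t;

/* Index of the first region whose end is above addr, or n if none.
 * Every region before that index lies entirely at or below addr, so
 * range queries can skip that prefix. */
static size_t ins_first_end_above(const ins_region_t *regions, size_t n,
                                  uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (regions[mid].end <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Insert item, which must not overlap an existing region, keeping the
 * array sorted. Returns 0 on success, -1 when the table is full. */
static int ins_insert_sorted(ins_region_t *regions, size_t *n, size_t cap,
                             ins_region_t item)
{
    assert(item.start < item.end);
    if (*n == cap)
        return -1;
    size_t pos = ins_first_end_above(regions, *n, item.start);
    /* One block move replaces the element-by-element bubble. */
    memmove(&regions[pos + 1], &regions[pos],
            (*n - pos) * sizeof *regions);
    regions[pos] = item;
    (*n)++;
    return 0;
}
```

The move is still O(n) in the worst case, but it is a single memmove rather than a sequence of swaps, and the position lookup drops to O(log n).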

find_free_gap_inner stops iterating once a region's start crosses max_addr;
later regions cannot affect a candidate gap inside the search window.
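
The early exit can be shown with a simple lowest-address first-fit search. This is an assumption-laden sketch: gap_region_t and gap_find are hypothetical, and the real find_free_gap_inner may use a different search direction or alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t start, end;  /* half-open range [start, end) */
} gap_region_t;

/* Lowest address in [min_addr, max_addr) where len free bytes fit, or
 * UINT64_MAX if there is no such gap. regions[] must be sorted. */
static uint64_t gap_find(const gap_region_t *regions, size_t n,
                         uint64_t min_addr, uint64_t max_addr,
                         uint64_t len)
{
    assert(min_addr < max_addr && len > 0);
    uint64_t cursor = min_addr;  /* lowest address not yet ruled out */
    for (size_t i = 0; i < n; i++) {
        if (regions[i].start >= max_addr)
            break;     /* early exit: later regions are past the window */
        if (regions[i].end <= cursor)
            continue;  /* region lies entirely below the cursor */
        if (regions[i].start > cursor &&
            regions[i].start - cursor >= len)
            return cursor;
        cursor = regions[i].end;
    }
    if (cursor < max_addr && max_addr - cursor >= len)
        return cursor;
    return UINT64_MAX;
}
```

Since the array is sorted by start, the first region at or beyond max_addr guarantees that every later region is also outside the window, so the loop can stop there.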

Summary by cubic

Short-circuits same-prot mprotect to skip page-table work when safe, and hardens region tracking to avoid stale or incorrect skips. This speeds up RELRO/JIT/GC workloads and fixes a region-split corruption bug.

New Features
- mprotect, when all overlapping regions already have the target prot and none are MAP_NORESERVE, skips tracker and PTE work. Forces slow path for pure PROT_READ. Gated by a sticky regions_tracker_stale flag.
- Propagates -LINUX_ENOMEM on failures, preventing silent no-ops on retries.

Refactors
- Restructures guest_region_remove as a single in/out compaction pass to fix interior-split corruption and simplify trim-only paths.
- Binary search in guest_region_add_ex_owned_gpa, new guest_region_first_end_above, guest_region_range_prot_uniform, and guest_region_range_has_noreserve; early exit in gap search.
- Regression test: mprotect(PROT_READ) onto a PROT_READ mapping keeps it read-only; a write traps SIGSEGV.

Written for commit d509397.