Compaction pr updates #680

Open
jshook wants to merge 2 commits into compaction-pr from compaction-pr-updates
Conversation

@jshook jshook commented Jun 15, 2026

Contributor

This branch PR makes two key changes:

  • Cherry-picks the robust physical core count change, which has also been submitted as a separate PR against main.
  • Makes the compaction worker pools simply re-use the core thread pool that nearly all other jvector internal tasks already use, as controlled by the parameter jvector.physical_core_count.
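As a rough illustration of the second change, the sketch below shows one way a single shared pool can be sized from a system property and then reused by compaction-style work instead of creating a dedicated pool. Only the property name jvector.physical_core_count comes from this PR; the class SharedPoolSketch, its methods, and the fallback to availableProcessors() are hypothetical and are not taken from the jvector code.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;

// Hypothetical sketch: names here are illustrative, not jvector's actual API.
final class SharedPoolSketch {
    // Property name as given in the PR description.
    static final String CORE_COUNT_PROPERTY = "jvector.physical_core_count";

    // A positive integer property value wins; anything else uses the fallback (at least 1).
    static int resolveParallelism(String propertyValue, int fallback) {
        if (propertyValue != null) {
            try {
                int parsed = Integer.parseInt(propertyValue.trim());
                if (parsed > 0) {
                    return parsed;
                }
            } catch (NumberFormatException ignored) {
                // Not a number: fall through to the fallback.
            }
        }
        return Math.max(1, fallback);
    }

    // One pool shared by all internal tasks. The availableProcessors() fallback
    // is an assumption of this sketch; it counts logical, not physical, cores.
    static final ForkJoinPool SHARED_POOL = new ForkJoinPool(
            resolveParallelism(System.getProperty(CORE_COUNT_PROPERTY),
                               Runtime.getRuntime().availableProcessors()));

    // Stand-in for a unit of compaction work: it is submitted to the shared
    // pool rather than to a pool of its own.
    static long sumOfSquares(int n) {
        Callable<Long> task = () -> {
            long total = 0;
            for (int i = 1; i <= n; i++) {
                total += (long) i * i;
            }
            return total;
        };
        return SHARED_POOL.submit(task).join();
    }
}
```

With this shape, the total number of worker threads stays bounded by the one configured value regardless of how many subsystems submit work.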

Given the thread resource changes here, this will likely have an impact in two key areas:

  1. Compaction will consume fewer resources and will not saturate a system as much as it did, which protects front-end serving resources in a typical embedding scenario.
  2. If the results from a couple of comprehensive tests are accurate, this is also a net improvement to compaction wall-clock time and CPU efficiency (work done per cycle). The likely cause is less contention: the previous multi-pool setup effectively allocated 1.5x or more threads relative to system availability, not counting the core thread pool.
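To make the oversubscription point concrete, here is a small arithmetic sketch. The 16-core machine and the exact pool sizes are made-up illustration values; only the "1.5x, not counting the core thread pool" ratio comes from the description above, and OversubscriptionSketch is a hypothetical name.

```java
// Hypothetical sketch: illustrative arithmetic only, not measured data.
final class OversubscriptionSketch {
    // Ratio of worker threads to cores; 1.0 means one thread per core.
    static double threadsPerCore(int cores, int corePoolThreads, int extraPoolThreads) {
        return (corePoolThreads + extraPoolThreads) / (double) cores;
    }

    public static void main(String[] args) {
        int cores = 16; // assumed machine size
        // Before: core pool (16) plus dedicated compaction pools at 1.5x cores (24).
        System.out.println(threadsPerCore(cores, 16, 24)); // prints 2.5
        // After: compaction re-uses the core pool, so no extra threads.
        System.out.println(threadsPerCore(cores, 16, 0));  // prints 1.0
    }
}
```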

These earlier tests also showed robust recall across several scenarios, including different datasets and sizes up to 10M, so recall was effectively unchanged compared to previous results with this compaction branch. What did change was the operational envelope, as described above.

Nonetheless, this is a non-trivial adjustment to the operational profile near a release, so it needs sufficient testing.

I feel very strongly that we should not merge the upstream compaction-pr without first merging this PR into it, because the system saturation that would occur with the current thread pool configuration would certainly impact front-end operations.

github-actions Bot commented Jun 15, 2026

Contributor

Before you submit for review:

  • Does your PR follow guidelines from CONTRIBUTIONS.md?
  • Did you summarize what this PR does clearly and concisely?
  • Did you include performance data for changes which may be performance impacting?
  • Did you include useful docs for any user-facing changes or features?
  • Did you include useful javadocs for developer oriented changes, explaining new concepts or key changes?
  • Did you trigger and review regression testing results against the base branch via Run Bench Main?
  • Did you adhere to the code formatting guidelines (TBD)?
  • Did you group your changes for easy review, providing meaningful descriptions for each commit?
  • Did you ensure that all files contain the correct copyright header?

If you did not complete any of these, then please explain below.
