
Improve crossover dual simplex #948

Open

rg20 wants to merge 2 commits into NVIDIA:main from rg20:improve_crossover_dual_simplex

Conversation

rg20 (Contributor) commented Mar 10, 2026

Description

This PR includes two performance improvements.

  1. Take advantage of hyper-sparsity while computing reduced costs during dual pushes in the crossover code
  2. Optimize the right-looking LU factorization by replacing the linear degree-bucket search with an O(1) search

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

rg20 added 2 commits March 10, 2026 11:42
…Replace linear degree-bucket search with O(1) swap-with-last removal using col_pos/row_pos position arrays, and eliminate O(row_degree) pre-traversal in schur_complement via a persistent last_in_row[] array
@rg20 rg20 requested a review from a team as a code owner March 10, 2026 18:47
@rg20 rg20 requested review from aliceb-nv and chris-maes March 10, 2026 18:47
copy-pr-bot bot commented Mar 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

rg20 added the labels improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) Mar 10, 2026
rg20 added this to the 26.04 milestone Mar 10, 2026
rg20 (Contributor, Author) commented Mar 10, 2026

/ok to test 5f377a2

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Enhanced dual simplex optimization by introducing sparse delta_y computation in crossover operations and adding persistent position-tracking state (col_pos, row_pos, last_in_row) to replace linear searches with constant-time operations in LU factorization routines.

Changes

Cohort / File(s) Summary
Sparse Dual Computation
cpp/src/dual_simplex/crossover.cpp
Added Arow parameter to dual_push function signature; introduced sparse delta_y representation with explicit expansion using Arow; updated dual y-update logic to apply changes only to nonzero entries.
LU Factorization Position Tracking
cpp/src/dual_simplex/right_looking_lu.cpp
Introduced col_pos, row_pos, and last_in_row state arrays to enable O(1) degree updates and element removals; propagated these parameters through load_elements, update_Cdegree_and_col_count, update_Rdegree_and_row_count, schur_complement, and remove_pivot_col function signatures; added initialization of position trackers after computing initial degree and bucket structures.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Title check — ✅ Passed: The title 'Improve crossover dual simplex' is directly related to the main changes in the PR, which focus on performance improvements in the dual simplex crossover code.
Description check — ✅ Passed: The pull request description accurately describes the two performance improvements implemented: hyper-sparsity optimization in dual pushes and O(1) bucket search in right-looking LU.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
cpp/src/dual_simplex/crossover.cpp (1)

418-436: Consider hoisting delta_expanded allocation outside the while loop.

Currently delta_expanded is allocated as a new std::vector<f_t>(n, 0.) on every iteration of the while (superbasic_list.size() > 0) loop (starting at line 391). For large problems with many superbasic variables, this results in repeated O(n) allocations and deallocations.

Moving the allocation before the loop and using std::fill to reset would avoid this overhead.

♻️ Suggested refactor
+  std::vector<f_t> delta_expanded(n);
   while (superbasic_list.size() > 0) {
     // ... existing code ...
     
     // delta_zN = -N^T delta_y
     std::vector<f_t> delta_zN(n - m);
-    std::vector<f_t> delta_expanded(n, 0.);
+    std::fill(delta_expanded.begin(), delta_expanded.end(), 0.);
     
     // Iterate directly over sparse delta_y instead of checking zeros
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/crossover.cpp` around lines 418 - 436, Hoist the
allocation of delta_expanded (and optionally delta_zN) out of the while
(superbasic_list.size() > 0) loop: create std::vector<f_t> delta_expanded(n)
once before the loop and inside each iteration reset its contents with
std::fill(delta_expanded.begin(), delta_expanded.end(), 0.0) (and similarly
reset delta_zN or ensure it is resized once and overwritten). Update the loop
body that accumulates into delta_expanded and the subsequent assignment to
delta_zN[k] = -delta_expanded[nonbasic_list[k]] to rely on the reused buffers;
if n can change, ensure you call delta_expanded.resize(n) before the fill.
cpp/src/dual_simplex/right_looking_lu.cpp (1)

649-661: Extract the bucket-position bootstrap into one helper.

The col_pos / row_pos initialization is now duplicated in both factorization entry points, with only the bucket bounds differing. This is subtle invariant code; a small helper next to initialize_degree_data() would reduce drift the next time the bucket bookkeeping changes.

Also applies to: 1049-1062


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a904205-beca-40f2-bce0-a3dcdb15be1e

📥 Commits

Reviewing files that changed from the base of the PR and between 2b21118 and 5f377a2.

📒 Files selected for processing (2)
  • cpp/src/dual_simplex/crossover.cpp
  • cpp/src/dual_simplex/right_looking_lu.cpp

Comment on lines +405 to 409
// Initialize row_last_workspace from last_in_row (O(1) per row, no full row traversal)
for (i_t p1 = first_in_col[pivot_j]; p1 != kNone; p1 = elements[p1].next_in_column) {
element_t<i_t, f_t>* e = &elements[p1];
const i_t i = e->i;
i_t row_last = kNone;
for (i_t p3 = first_in_row[i]; p3 != kNone; p3 = elements[p3].next_in_row) {
row_last = p3;
}
work_estimate += 2 * Rdegree[i];
row_last_workspace[i] = row_last;
row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
}
work_estimate += 4 * Cdegree[pivot_j];

⚠️ Potential issue | 🟡 Minor

Fix work_estimate accounting after pivot elimination.

Line 409, Line 499, and Line 509 still read Cdegree[pivot_j] / Rdegree[pivot_i], but those were already set to -1 by update_Cdegree_and_col_count() and update_Rdegree_and_row_count() before this helper is called. That makes work_estimate decrease on every pivot instead of approximating the actual loop cost.

Also applies to: 499-509

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/dual_simplex/right_looking_lu.cpp` around lines 405 - 409,
work_estimate is using Cdegree[pivot_j] and Rdegree[pivot_i] after those entries
have been set to -1 by update_Cdegree_and_col_count() /
update_Rdegree_and_row_count(), causing work_estimate to decrease incorrectly;
fix by using the actual degree before it is clobbered — either capture a local
variable like int deg_j = Cdegree[pivot_j] and/or deg_i = Rdegree[pivot_i]
before calling the update functions, or compute the degree from the
corresponding count arrays (e.g. col_count[pivot_j] / row_count[pivot_i]) or by
traversing first_in_col/first_in_row if counts are not available — then use
deg_j/deg_i when updating work_estimate instead of reading Cdegree/Rdegree after
they were set to -1.

elements[nz].x = A.x[p];
elements[nz].next_in_column = kNone;
if (p > col_start) { elements[nz - 1].next_in_column = nz; }
elements[nz].next_in_row = kNone; // set the current next in row to None (since we don't know
Contributor:

Would you mind restoring the comments here? They are helpful.

}
work_estimate += 2 * Rdegree[i];
row_last_workspace[i] = row_last;
row_last_workspace[elements[p1].i] = last_in_row[elements[p1].i];
Contributor:

Nit: I think this would read more clearly as

const i_t i = elements[p1].i;
row_last_workspace[i] = last_in_row[i]; 

initialize_degree_data(A, column_list, Cdegree, Rdegree, col_count, row_count, work_estimate);

// Position arrays for O(1) degree-bucket removal
std::vector<i_t> col_pos(n);
Contributor:

We should document what col_pos stores, as we do with other variables. I think it is

std::vector<i_t> col_pos(n);  // if Cdegree[j] = nz, then j is in col_count[nz][col_pos[j]]

col_pos[col_count[d][pos]] = pos;
}
}
std::vector<i_t> row_pos(n);
Contributor:

Could we add the comment

std::vector<i_t> row_pos(n); // If Rdegree[i] = nz, then i is in row_count[nz][row_pos[i]]

std::vector<i_t>& Cdegree,
std::vector<std::vector<i_t>>& row_count,
std::vector<std::vector<i_t>>& col_count,
std::vector<i_t>& last_in_row,
Contributor:

What's the difference between row_last_workspace and last_in_row?


Labels

improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants