
Scaling test #906

Open
akifcorduk wants to merge 65 commits into NVIDIA:main from akifcorduk:scaling_test

Conversation

@akifcorduk
Contributor

@akifcorduk akifcorduk commented Feb 25, 2026

This PR adds row scaling to MIP and uses PDLP scaling for both barrier and PDLP.

Summary by CodeRabbit

  • New Features

    • MIP row-scaling enabled by default for improved numeric robustness.
    • Solver returns partial results early when time limits are reached; time-limit enforcement is more proactive.
    • New selectable MIP scaling configurations with logging of the selected scaling setup.
  • Bug Fixes

    • Fixed inconsistent unscaling and feasibility handling to prevent mismatched solution states.
  • Chores

    • Removed a deprecated MIP configuration option.
    • Consolidated and replaced the scaling strategy; tests and build updated accordingly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
cpp/src/mip_heuristics/mip_scaling_strategy.cu (1)

260-263: row_skip_scaling is now correctly zero-initialized before use.

thrust::fill at Lines 261-262 ensures row_skip_scaling is deterministic before compute_big_m_skip_rows populates it. This resolves the prior uninitialized-read risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`, around lines 260-263: the
prior uninitialized-read risk for row_skip_scaling is fixed by explicitly
zero-initializing it; ensure you keep the thrust::fill call that uses
handle_ptr_->get_thrust_policy() to fill row_skip_scaling.begin()..end() with
i_t(0) before calling compute_big_m_skip_rows so compute_big_m_skip_rows only
reads initialized values from row_skip_scaling; confirm the allocation of
iteration_scaling remains unchanged and placed after row_skip_scaling is
initialized.
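The fix being verified here can be illustrated on the host. Below is a minimal CPU analogue (names and threshold hypothetical): `std::fill` stands in for `thrust::fill`, and a read-modify-write populate loop stands in for `compute_big_m_skip_rows`. Note that `rmm::device_uvector`, unlike `std::vector`, leaves its memory uninitialized, which is why the explicit fill matters on the device side.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CPU sketch of the zero-init-before-populate pattern. The populate step
// ORs into each flag, so it reads the existing value; on the device, where
// the buffer starts uninitialized, the up-front fill makes every read
// deterministic.
std::vector<int> build_skip_flags(const std::vector<double>& row_max_abs,
                                  double threshold)
{
  std::vector<int> row_skip_scaling(row_max_abs.size());
  // Analogue of thrust::fill(policy, begin(), end(), i_t(0)).
  std::fill(row_skip_scaling.begin(), row_skip_scaling.end(), 0);
  // Analogue of compute_big_m_skip_rows: mark rows whose max-abs
  // coefficient exceeds the big-M threshold.
  for (std::size_t i = 0; i < row_max_abs.size(); ++i) {
    row_skip_scaling[i] |= (row_max_abs[i] > threshold) ? 1 : 0;
  }
  return row_skip_scaling;
}
```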
🧹 Nitpick comments (2)
cpp/src/mip_heuristics/mip_scaling_strategy.cu (2)

66-80: Prefer thrust::maximum<T> / thrust::minimum<T> over custom max_op_t / min_op_t.

Both structs are direct re-implementations of standard Thrust functors that are already __host__ __device__-compatible and accepted by CUB's DeviceSegmentedReduce::Reduce.

♻️ Proposed refactor
-template <typename t_t>
-struct max_op_t {
-  __host__ __device__ t_t operator()(const t_t& lhs, const t_t& rhs) const
-  {
-    return lhs > rhs ? lhs : rhs;
-  }
-};
-
-template <typename t_t>
-struct min_op_t {
-  __host__ __device__ t_t operator()(const t_t& lhs, const t_t& rhs) const
-  {
-    return lhs < rhs ? lhs : rhs;
-  }
-};

Then replace usages with thrust::maximum<f_t>{} and thrust::minimum<f_t>{} at call sites.

As per coding guidelines: "Avoid reinventing functionality already available in Thrust, CCCL, or RMM libraries."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`, around lines 66-80: replace
the custom functors max_op_t and min_op_t with the standard Thrust functors:
remove or stop using max_op_t and min_op_t and update all call sites to use
thrust::maximum<f_t>{} and thrust::minimum<f_t>{} (or the appropriate template
type instead of f_t) so the code leverages the existing __host__
__device__-compatible Thrust implementations accepted by
CUB::DeviceSegmentedReduce::Reduce.
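The same guideline holds on the host: prefer standard functors over hand-rolled ones. A CPU sketch of a segmented max-reduce over CSR-style row offsets, using `std::max` directly where the device call would pass `thrust::maximum<f_t>{}` to `cub::DeviceSegmentedReduce::Reduce` (the offsets layout mirrors CUB's begin/end offsets):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Segmented max-reduce: one output per segment [offsets[r], offsets[r+1]).
// std::max plays the role of thrust::maximum<f_t>{} -- no custom functor
// re-implementing the comparison is needed.
std::vector<double> segmented_max(const std::vector<double>& values,
                                  const std::vector<int>& offsets)
{
  std::vector<double> out;
  for (std::size_t r = 0; r + 1 < offsets.size(); ++r) {
    double m = values[offsets[r]];
    for (int j = offsets[r] + 1; j < offsets[r + 1]; ++j) {
      m = std::max(m, values[j]);  // standard library functor/algorithm
    }
    out.push_back(m);
  }
  return out;
}
```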

104-177: compute_big_m_skip_rows recomputes row_inf_norm redundantly, and ignores the passed temp_storage stream.

Two issues:

  1. The function hard-codes op_problem.handle_ptr->get_stream() for its CUB calls (unlike compute_row_inf_norm which accepts a stream_view parameter). This inconsistency would silently break if a caller ever passes a different stream.
  2. The max-abs segmented reduction on Lines 121-130 recomputes row_inf_norm that is already computed in the first call to compute_row_inf_norm on loop iteration 0 (Line 301). Since compute_big_m_skip_rows is called once before the loop, row_inf_norm gets computed twice before any scaling is applied.

Consider adding a stream_view parameter to compute_big_m_skip_rows consistent with compute_row_inf_norm, and passing the already-computed row_inf_norm from a preceding compute_row_inf_norm call instead of re-running the max reduce.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`, around lines 104-177:
compute_big_m_skip_rows currently reruns the max-abs segmented reduction into
row_inf_norm and hard-codes op_problem.handle_ptr->get_stream(), causing
redundant work and a stream mismatch; change the signature of
compute_big_m_skip_rows to accept a raft::stream_view (or stream_view alias used
elsewhere) and remove the initial max-abs cub::DeviceSegmentedReduce call so the
function uses the row_inf_norm that was produced by compute_row_inf_norm; also
replace all uses of op_problem.handle_ptr->get_stream() in
compute_big_m_skip_rows with the new stream_view and ensure
temp_storage/temp_storage_bytes and other CUB/Thrust calls use that stream_view
when invoked; finally update callers to compute_big_m_skip_rows to pass the
stream_view and to call compute_row_inf_norm once beforehand so row_inf_norm is
not recomputed.
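The shape of the suggested refactor can be sketched on the host (signatures hypothetical): the caller runs the expensive max-abs reduction once via `compute_row_inf_norm`, then big-M detection consumes the precomputed norms instead of re-reducing. In the device version the second function would additionally take the `stream_view` parameter.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One max-abs pass per row (the expensive reduction), run exactly once.
std::vector<double> compute_row_inf_norm(const std::vector<double>& values,
                                         const std::vector<int>& offsets)
{
  std::vector<double> norm(offsets.size() - 1, 0.0);
  for (std::size_t r = 0; r + 1 < offsets.size(); ++r) {
    for (int j = offsets[r]; j < offsets[r + 1]; ++j) {
      norm[r] = std::max(norm[r], std::fabs(values[j]));
    }
  }
  return norm;
}

// Big-M detection reuses the precomputed norms instead of redoing the
// segmented reduce (the device version would also take a stream_view).
std::vector<int> big_m_skip_rows(const std::vector<double>& row_inf_norm,
                                 double threshold)
{
  std::vector<int> skip(row_inf_norm.size(), 0);
  for (std::size_t r = 0; r < row_inf_norm.size(); ++r) {
    if (row_inf_norm[r] > threshold) { skip[r] = 1; }
  }
  return skip;
}
```

Caller pattern: one reduction, two consumers (`auto norms = compute_row_inf_norm(...); auto skip = big_m_skip_rows(norms, threshold);`), so the norm is available for the first scaling iteration without recomputation.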
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`:
- Around line 327-370: You apply per-iteration scaling (iteration_scaling) to
coefficients and bounds but never record the accumulated per-row scaling
factors, so later dual unscaling can't be performed; add and update a per-row
accumulator in op_problem_scaled_ (e.g., op_problem_scaled_.row_scaling_factors
initialized to 1) and multiply it by the iteration scaling for each row each
iteration. Concretely: locate iteration_scaling, coefficient_row_index, and the
three thrust::transform calls that multiply op_problem_scaled_.coefficients,
.constraint_lower_bounds, and .constraint_upper_bounds; add a thrust::transform
to update op_problem_scaled_.row_scaling_factors by applying iteration_scaling
(use a permutation iterator with coefficient_row_index when mapping scalings
through coefficient indices to row indices for coefficients, and a direct
iterator for constraint bounds) so the per-row product is maintained for
post-solve dual unscaling.
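A host-side sketch of the proposed accumulator (all names hypothetical, mirroring the ones suggested above): each iteration scales the nonzeros through a row-index indirection, the CPU analogue of a permutation iterator over `coefficient_row_index`, and simultaneously folds the iteration's factor into a running per-row product so duals can be unscaled after the solve.

```cpp
#include <cassert>
#include <vector>

// Apply one scaling iteration and record it. row_scaling_factors starts at
// 1.0 for every row; after N iterations it holds the product of all N
// per-row factors, which is what post-solve dual unscaling needs.
void apply_iteration_scaling(std::vector<double>& coefficients,
                             const std::vector<int>& coefficient_row_index,
                             std::vector<double>& row_scaling_factors,
                             const std::vector<double>& iteration_scaling)
{
  // Scale each nonzero by its row's factor (permutation-iterator analogue).
  for (std::size_t nz = 0; nz < coefficients.size(); ++nz) {
    coefficients[nz] *= iteration_scaling[coefficient_row_index[nz]];
  }
  // Accumulate the per-row product for later dual unscaling.
  for (std::size_t r = 0; r < row_scaling_factors.size(); ++r) {
    row_scaling_factors[r] *= iteration_scaling[r];
  }
}
```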


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a1c672 and 80ac99d.

📒 Files selected for processing (1)
  • cpp/src/mip_heuristics/mip_scaling_strategy.cu

@anandhkb anandhkb added this to the 26.04 milestone Feb 25, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/mip_heuristics/mip_scaling_strategy.cu`:
- Around line 146-150: Thrust kernel launches in mip_scaling_strategy.cu (e.g.,
the thrust::transform that uses handle_ptr->get_thrust_policy() and
min_nonzero_bounds_abs_transform_t<f_t>{}) are missing explicit CUDA error
checks; after each Thrust call (all thrust::transform / thrust::copy /
thrust::fill invocations at the reported locations) add a CUDA
device-synchronization error check using the same pattern as CUB calls (e.g.,
wrap a cudaDeviceSynchronize() or thrust::detail::backend::cuda::sync call
followed by RAFT_CUDA_TRY or
CUDA_CHECK(cudaGetLastError())/CUDA_CHECK(cudaDeviceSynchronize()) ) so any
launch or runtime error is caught and propagated—apply this change to each
affected symbol/site in the file (the transform using
min_nonzero_bounds_abs_transform_t, other thrust::transform/copy/fill
invocations around lines reported) to ensure consistent error handling.
- Around line 300-309: Clamp the user-provided
row_scaling_k_min/row_scaling_k_max to a safe exponent range before calling
ldexp and before passing into the kernel: compute safe_k_min/safe_k_max by
clamping row_scaling_k_min/row_scaling_k_max to a predetermined safe exponent
window (e.g., the sensible defaults like -20..20 or the numeric limits for f_t's
exponent range), then use safe_k_min/safe_k_max when computing
min_scaling/max_scaling (instead of raw row_scaling_k_min/row_scaling_k_max),
include those safe values in the kernel lambda capture, and replace the per-row
k and required_k clamps with clamps against safe_k_min/safe_k_max so ldexp(1,k)
cannot underflow/overflow.
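The suggested clamp is straightforward to sketch in plain C++ (the safe window below is illustrative, matching the -20..20 example in the comment): restrict the exponent to the window before calling `ldexp`, so `ldexp(1.0, k)` can neither underflow to zero nor overflow to infinity.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative safe exponent window (the real bounds would be chosen from
// f_t's exponent range or sensible solver defaults).
constexpr int kSafeKMin = -20;
constexpr int kSafeKMax = 20;

// Compute 2^k with the exponent clamped to the safe window. ldexp on a
// clamped exponent is exact (a pure power of two, no rounding).
double safe_pow2(int k)
{
  const int clamped = std::clamp(k, kSafeKMin, kSafeKMax);
  return std::ldexp(1.0, clamped);
}
```

The same clamped bounds would then be captured by the kernel lambda, so the per-row `k` and `required_k` are clamped against `kSafeKMin`/`kSafeKMax` rather than the raw user-provided `row_scaling_k_min`/`row_scaling_k_max`.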

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6b2a631 and e2b3aea.

📒 Files selected for processing (4)
  • cpp/src/mip_heuristics/diversity/diversity_manager.cu
  • cpp/src/mip_heuristics/diversity/diversity_manager.cuh
  • cpp/src/mip_heuristics/mip_scaling_strategy.cu
  • cpp/src/mip_heuristics/mip_scaling_strategy.cuh

Comment on lines +146 to +150
thrust::transform(handle_ptr->get_thrust_policy(),
                  row_bounds_begin,
                  row_bounds_end,
                  row_min_nonzero_bounds.begin(),
                  min_nonzero_bounds_abs_transform_t<f_t>{});

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the file structure and the specific lines mentioned
wc -l cpp/src/mip_heuristics/mip_scaling_strategy.cu

Repository: NVIDIA/cuopt



🏁 Script executed:

# Read the section around lines 146-150 to see the context
sed -n '140,160p' cpp/src/mip_heuristics/mip_scaling_strategy.cu



🏁 Script executed:

# Search for all thrust:: launches in the file and check if they are followed by RAFT_CUDA_TRY
cat cpp/src/mip_heuristics/mip_scaling_strategy.cu | grep -n "thrust::" | head -20



🏁 Script executed:

# More detailed check: look for the pattern of thrust launches and nearby error checks
python3 <<'EOF'
import re

with open('cpp/src/mip_heuristics/mip_scaling_strategy.cu', 'r') as f:
    lines = f.readlines()

# Find all thrust:: calls with context
results = []
for i, line in enumerate(lines):
    if 'thrust::' in line and not line.strip().startswith('//'):
        # Look at the next few lines to see if there's a RAFT_CUDA_TRY
        context_start = max(0, i - 1)
        context_end = min(len(lines), i + 5)
        context = ''.join(lines[context_start:context_end])
        
        # Check if RAFT_CUDA_TRY appears in the next few lines
        has_check = 'RAFT_CUDA_TRY' in context
        results.append({
            'line': i + 1,
            'code': line.strip(),
            'has_check': has_check,
            'context': context.strip()
        })

# Print summary
print("Thrust launches found:")
print("=" * 80)
for r in results:
    print(f"Line {r['line']}: {r['code'][:70]}")
    print(f"  Has error check: {r['has_check']}")
    print()
EOF



🏁 Script executed:

# Also check what RAFT_CUDA_TRY pattern looks like in this file
grep -n "RAFT_CUDA_TRY" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -10



🏁 Script executed:

# Check the specific ranges mentioned in the comment
echo "=== Line 146-150 ==="
sed -n '146,150p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 207-225 ==="
sed -n '207,225p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 320-321 ==="
sed -n '320,321p' cpp/src/mip_heuristics/mip_scaling_strategy.cu

echo ""
echo "=== Line 325-336 ==="
sed -n '325,336p' cpp/src/mip_heuristics/mip_scaling_strategy.cu



🏁 Script executed:

# Check whether this file uses a specific pattern - perhaps thrust operations are wrapped at a higher level
grep -B2 -A2 "thrust::transform\|thrust::upper_bound\|thrust::count\|thrust::fill" cpp/src/mip_heuristics/mip_scaling_strategy.cu | head -50



🏁 Script executed:

# Check if there's any wrapper or special handling in thrust_policy
find . -type f -name "*.hpp" -o -name "*.h" | xargs grep -l "thrust_policy\|get_thrust_policy" | head -5



🏁 Script executed:

# Check the RAFT handle implementation to see if get_thrust_policy includes error checking
rg "get_thrust_policy|thrust_policy" -A 3 -B 1 --max-count=10



🏁 Script executed:

# Look for any error checking patterns after thrust operations in the broader codebase
rg "thrust::(transform|fill|upper_bound|count|transform_reduce)" -A 2 | grep -E "CUDA|cuda|error|check" | head -20



🏁 Script executed:

# Specifically check other scaling/heuristics files in the same directory for comparison
ls -la cpp/src/mip_heuristics/



🏁 Script executed:

# Check how other files in the same directory handle Thrust launches
find cpp/src/mip_heuristics -name "*.cu" -o -name "*.cuh" | xargs grep -l "thrust::" | head -3



🏁 Script executed:

# Sample one other file to see if this is a systemic pattern
head -250 cpp/src/mip_heuristics/mip_scaling_strategy.cu | tail -50 | rg "thrust::" -A 1



🏁 Script executed:

# More comprehensive check: do ANY thrust calls in this repo have post-call error checks?
rg "thrust::(transform|fill|upper_bound|count)" -A 1 | grep -B1 "RAFT_CUDA_TRY\|cudaPeekAtLastError" | head -20



Add explicit CUDA error checks after Thrust launches.

CUB calls are checked with RAFT_CUDA_TRY, but Thrust kernel launches lack explicit error verification. This inconsistency can hide launch and runtime failures, weakening error propagation.

🔧 Minimal fix pattern
   thrust::transform(handle_ptr->get_thrust_policy(),
                     row_bounds_begin,
                     row_bounds_end,
                     row_min_nonzero_bounds.begin(),
                     min_nonzero_bounds_abs_transform_t<f_t>{});
+  RAFT_CUDA_TRY(cudaPeekAtLastError());

Apply the same pattern after each Thrust launch.

Affected locations: 146–150, 207–225, 320–321, 325–336, 331–336, 352–353, 374–387, 396–402, 462–468, 470–475, 476–481.

Per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification" and "Verify error propagation from CUDA to user-facing APIs is complete."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   thrust::transform(handle_ptr->get_thrust_policy(),
                     row_bounds_begin,
                     row_bounds_end,
                     row_min_nonzero_bounds.begin(),
                     min_nonzero_bounds_abs_transform_t<f_t>{});
+  RAFT_CUDA_TRY(cudaPeekAtLastError());


Labels

improvement: Improves an existing functionality
non-breaking: Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants