Conversation
…o modify to make reproducible
…hysics particles as fission progeny, causing failure of scan-sort as there were missing fission progeny)
Resolved conflicts in 5 files:
- include/openmc/particle_data.h: kept local_secondary_bank rename, restored n_secondaries/secondary_bank_index accessors needed by new ParticleProductionFilter from develop
- src/eigenvalue.cpp: trivial formatting conflicts
- src/particle.cpp: kept refactored event_revive_from_secondary(SourceSite&)
- src/physics.cpp, src/physics_mg.cpp: kept local_secondary_bank usage, adapted new secondary_bank_index tracking from develop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove debug validation functions (debug_validate_local_bank_ordering, debug_validate_global_bank_ordering) and their call sites
- Remove TODO-marked debug sanity check in sort_bank()
- Remove commented-out sort and parent_id debugging code
- Remove unused global_secondary_bank variable
- Remove commented-out current_work assignment in from_source()
- Fix simulation_particles_completed not being reset between batches
- Add event-based mode guard for shared secondary bank
- Update weightwindows and particle_restart_fixed test baselines
- Remove accidentally tracked helper scripts and test artifacts from merge

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ed secondary Bug 9: When a secondary particle failed during event_revive_from_secondary (e.g., exhaustive_find_cell failure), the alive() check skipped transport_history_based_single_particle, so event_death() was never called. This meant global keff tallies, pulse-height scoring, track finalization, and progeny recording were all skipped. Fix by calling transport_history_based_single_particle unconditionally — if the particle is dead, the while loop doesn't iterate and event_death is still called, matching the event-based path behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug 10: Both CE and MG create_fission_sites did not set site.n_split, leaving it at the struct default of 0. This reset fission-born secondaries' split budget, allowing excessive particle population growth when weight windows are active with fissionable material. Propagate the parent's n_split count, matching create_secondary() and split(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug 11: The inner loop variable 'site' in Phase 2 of transport_history_based_shared_secondary shadowed the outer 'site' reference, which would trigger -Wshadow warnings. Rename inner loop variables to 'secondary_site'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug 12: n_split was set from site.n_split after from_source(&site) already copied it (from_source line 187: n_split() = src->n_split). Remove the redundant assignment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Covers the interaction between weight window splitting, photon transport, pulse-height tallies, and shared secondary bank mode. This combination exercises the Bug 8 fix (pulse-height energy subtraction in create_secondary) alongside weight window splits (which correctly skip the subtraction since split particles are clones, not new secondaries). Parametrized with local/shared subdirectories. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- S1: Add {0} default to current_work_ to prevent uninitialized reads
- S3: Add {0} defaults to SourceSite::parent_id and progeny_id
- S4: Guard n_tracks()++ in event_revive_from_secondary for shared mode
since the counter is never consumed in shared secondary transport
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ondary The survival_biasing test was silently running in shared secondary mode due to auto-enable when weight windows are active. Parametrize with explicit shared_secondary_bank setting to test both paths, restoring the original pre-shared-secondary baseline for the local variant. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR introduces a “shared secondary bank” transport mode aimed at making fixed-source weight-window runs more reproducible and scalable (sorting and MPI load-balancing secondary tracks), along with new/updated regression coverage for shared-vs-local behavior.
Changes:
- Add shared-secondary-bank transport paths for both history-based and event-based kernels, including new sorting/load-balancing logic for secondary generations.
- Extend SourceSite with additional metadata (e.g., born weights / split count) and plumb it through MPI + Python bindings.
- Update/add regression tests to run in both “local” and “shared” modes and add new shared-secondary-focused test cases.
Reviewed changes
Copilot reviewed 56 out of 62 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/regression_tests/weightwindows/test.py | Parametrize WW regression to run in local/shared subdirs; make WW input file loading path-stable. |
| tests/regression_tests/weightwindows/survival_biasing/test.py | Same local/shared parametrization + path-stable WW file loading; update source API usage. |
| tests/regression_tests/weightwindows/survival_biasing/shared/results_true.dat | Add golden results for shared mode. |
| tests/regression_tests/weightwindows/survival_biasing/shared/inputs_true.dat | Add golden inputs for shared mode (includes shared_secondary_bank). |
| tests/regression_tests/weightwindows/survival_biasing/local/results_true.dat | Add golden results for local mode. |
| tests/regression_tests/weightwindows/survival_biasing/local/inputs_true.dat | Add golden inputs for local mode. |
| tests/regression_tests/weightwindows/shared/results_true.dat | Add golden results for shared mode. |
| tests/regression_tests/weightwindows/shared/inputs_true.dat | Add golden inputs for shared mode (includes shared_secondary_bank). |
| tests/regression_tests/weightwindows/results_true.dat | Remove single golden result in favor of per-subdir results. |
| tests/regression_tests/weightwindows/local/results_true.dat | Add golden results for local mode. |
| tests/regression_tests/weightwindows/local/inputs_true.dat | Add golden inputs for local mode. |
| tests/regression_tests/weightwindows_pulse_height/test.py | New regression test combining WW + pulse-height tally under local/shared subdirs. |
| tests/regression_tests/weightwindows_pulse_height/shared/results_true.dat | Golden results for shared subdir. |
| tests/regression_tests/weightwindows_pulse_height/shared/inputs_true.dat | Golden inputs for shared subdir. |
| tests/regression_tests/weightwindows_pulse_height/local/results_true.dat | Golden results for local subdir. |
| tests/regression_tests/weightwindows_pulse_height/local/inputs_true.dat | Golden inputs for local subdir. |
| tests/regression_tests/weightwindows_pulse_height/init.py | Package marker for new regression test directory. |
| tests/regression_tests/pulse_height/test.py | Parametrize existing pulse-height regression for local/shared subdirs. |
| tests/regression_tests/pulse_height/shared/results_true.dat | Golden results for shared subdir. |
| tests/regression_tests/pulse_height/shared/inputs_true.dat | Golden inputs for shared subdir. |
| tests/regression_tests/pulse_height/local/results_true.dat | Golden results for local subdir. |
| tests/regression_tests/pulse_height/local/inputs_true.dat | Golden inputs for local subdir. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/test.py | New particle-restart regression targeting fixed-source + shared secondary bank. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/settings.xml | Settings enabling fixed-source + shared secondary bank. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/results_true.dat | Golden restart-output text for the new restart regression. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/materials.xml | Materials for new restart regression. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/geometry.xml | Geometry for new restart regression. |
| tests/regression_tests/particle_restart_fixed_shared_secondary/init.py | Package marker for new restart regression directory. |
| tests/regression_tests/particle_production_fission/test.py | New regression verifying ParticleProductionFilter behavior under local/shared modes. |
| tests/regression_tests/particle_production_fission/shared/results_true.dat | Golden results for shared mode. |
| tests/regression_tests/particle_production_fission/shared/inputs_true.dat | Golden inputs for shared mode. |
| tests/regression_tests/particle_production_fission/local/results_true.dat | Golden results for local mode. |
| tests/regression_tests/particle_production_fission/local/inputs_true.dat | Golden inputs for local mode. |
| tests/regression_tests/particle_production_fission/init.py | Package marker for new particle-production regression directory. |
| tests/regression_tests/mg_fixed_source_ww_fission_shared_secondary/test.py | New MG fixed-source WW regression with fission neutrons + shared secondary bank. |
| tests/regression_tests/mg_fixed_source_ww_fission_shared_secondary/results_true.dat | Golden results for new MG shared-secondary regression. |
| tests/regression_tests/mg_fixed_source_ww_fission_shared_secondary/inputs_true.dat | Golden inputs for new MG shared-secondary regression. |
| tests/regression_tests/mg_fixed_source_ww_fission_shared_secondary/init.py | Package marker for new MG regression directory. |
| src/tallies/tally_scoring.cpp | Adjust IFP source-bank indexing to align with new 0-based current_work(). |
| src/tallies/filter_particle_production.cpp | ParticleProductionFilter now reads from the local-secondary bank accessor. |
| src/simulation.cpp | Add shared-secondary transport algorithms; refactor seeding/work partitioning; 0-based current_work() semantics. |
| src/settings.cpp | Parse <shared_secondary_bank> and auto-enable it for fixed-source WW runs when not explicitly set. |
| src/physics.cpp | Route secondaries into local-secondary bank and carry extra metadata needed for shared-secondary transport. |
| src/physics_mg.cpp | Same as CE physics: local-secondary bank + metadata for shared-secondary transport. |
| src/particle.cpp | Refactor secondary creation/revival for shared-secondary mode; track-counting changes. |
| src/particle_restart.cpp | Update RNG seeding logic and shared-secondary bookkeeping for restart runs. |
| src/output.cpp | Print “Track Rate” when weight windows are enabled. |
| src/initialize.cpp | Add compatibility guard: disable shared-secondary bank when pulse-height tallies are present. |
| src/ifp.cpp | Adjust IFP source-bank indexing to align with new 0-based current_work(). |
| src/finalize.cpp | Reset shared-secondary-related settings/state on finalize/reset. |
| src/event.cpp | Factor event-kernel loop into helper and add init path for shared-secondary event transport. |
| src/bank.cpp | Add shared secondary banks + sorting + MPI redistribution helper for secondary generations. |
| openmc/settings.py | Add Python-side Settings.shared_secondary_bank with XML read/write support. |
| openmc/lib/core.py | Extend C-API _SourceSite struct mapping to include added SourceSite fields. |
| include/openmc/simulation.h | Update simulation API for new work partitioning, seeding helpers, and shared-secondary transport entry points. |
| include/openmc/shared_array.h | Change SharedArray constructor semantics + add thread_unsafe_append. |
| include/openmc/settings.h | Add use_shared_secondary_bank setting. |
| include/openmc/particle.h | Update particle API for revised secondary revival flow. |
| include/openmc/particle_data.h | Extend SourceSite; rename secondary bank storage to local-secondary bank; add track counter. |
| include/openmc/event.h | Declare shared-secondary event init + common transport loop helper. |
| include/openmc/bank.h | Declare shared-secondary banks, generalized sort function, and MPI redistribution helper. |
| docs/source/io_formats/settings.rst | Document <shared_secondary_bank> settings XML element. |
```cpp
double speed_tracks =
  simulation::simulation_tracks_completed / time_active.elapsed();
fmt::print(
  " {:<33} = {:.6} tracks/second\n", "Track Rate (active)", speed_tracks);
```
simulation::simulation_tracks_completed is used here to compute a "Track Rate", but in non-shared-secondary mode the counter is only incremented locally (per MPI rank) while print_runtime() runs on the master rank. This will under-report track rate in MPI runs and is also inconsistent with shared-secondary mode where the counter is updated using global totals. Consider either reducing/summing the counter across ranks before printing, or clearly separating per-rank vs global track counters.
Suggested change:

```diff
-  " {:<33} = {:.6} tracks/second\n", "Track Rate (active)", speed_tracks);
+  " {:<33} = {:.6} tracks/second (local to this rank)\n",
+  "Track Rate (active, local)", speed_tracks);
```
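The fix the reviewer proposes amounts to summing the per-rank counters before the master rank computes the rate. A minimal sketch of that semantics, with plain Python standing in for the MPI collective (function name and values are illustrative, not OpenMC code):

```python
# Sketch of the reviewer's suggestion: reduce per-rank track counters to a
# global total before the master rank prints the rate. In C++ this would be
# an MPI_Reduce(..., MPI_SUM, root=0) over simulation_tracks_completed.
def global_track_rate(per_rank_tracks, elapsed_seconds):
    total_tracks = sum(per_rank_tracks)  # stands in for the MPI reduction
    return total_tracks / elapsed_seconds

# Four hypothetical ranks, each with its own local counter:
print(global_track_rate([1000, 950, 1025, 975], 2.0))  # → 1975.0
```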
- Fix missing final track state write in event_death() when the secondary bank is empty (lost during event_revive refactor)
- Parametrize test_weightwindows with shared_secondary [False, True]
- Pin test_photon_heating to local mode to work around Compton relaxation negative heating bug (fix in separate PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Description
This PR introduces a shared secondary bank mode, so as to provide better load balancing across MPI ranks and OpenMP threads when weight windows are in use and thus mitigate the "long history" problem with weight windows. The long history problem naturally arises as weight windows can cause certain rare yet high importance particles to split heavily. In practice, this can often mean many threads are sitting idle waiting for a single thread to finish a very long history (potentially having millions of secondary particles stemming from a single source particle).
When enabled, the shared secondary bank allows for MPI ranks and OpenMP threads to load balance secondary particles. To accomplish this, the transport loop is run following alternative logic.
Current Behavior (Non-Shared Secondary Bank)
Under normal operation with the traditional secondary bank, transport runs in a "depth first" mode, where each particle appends its secondaries to a bank that the particle itself owns. When that particle finishes, it then loads a secondary from its local bank and processes it, continuing this loop until all secondaries for that particle are finished before potentially sourcing another. This means that no coordination needs to happen between threads.
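The depth-first scheme can be sketched as a stack-based loop, with each history owning its own bank (names and the `simulate` callback are schematic, not OpenMC's actual API; the real logic lives in src/particle.cpp and src/simulation.cpp):

```python
# Schematic of depth-first secondary handling: each history owns its own
# secondary bank, so no cross-thread coordination is needed.
def transport_history(source_site, simulate):
    """simulate(site) transports one track and returns any secondaries it made."""
    local_bank = [source_site]
    tracks_run = 0
    while local_bank:
        site = local_bank.pop()            # revive the most recent secondary
        local_bank.extend(simulate(site))  # append this track's secondaries
        tracks_run += 1
    return tracks_run

# Example: a source particle producing two secondaries, each producing none.
progeny = {"p0": ["s1", "s2"], "s1": [], "s2": []}
print(transport_history("p0", lambda s: progeny[s]))  # → 3
```

All of a source particle's secondaries finish before the thread sources another particle, which is exactly what makes a heavily-splitting history serialize onto one thread.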
One could in theory create a greedy algorithm where threads that have run out of work steal from the banks of other threads. This isn't too hard to implement, but it has the downside of breaking reproducibility, as the stolen particles are no longer processed in the original PRNG stream (which is particle-owned).
New Mode: Shared Secondary Bank
The shared secondary bank is enabled automatically whenever weight windows are in use or when settings.shared_secondary_bank is set. When enabled, an alternative "breadth first" transport loop is used. In spirit, this mode operates in a very similar manner to eigenvalue mode, where fission particles are banked and simulated in the following batch. In this mode, the transport loop is broken into a series of "secondary generations" that execute sequentially within a normal OpenMC batch. Any secondary particles produced in one secondary generation are banked and then simulated in the following secondary generation. This strategy is already in use by the Celeritas MC code.

Reproducibility is provided by essentially using the same strategy we use now for the fission bank, which is taken from:
The Brown paper developed a fast algorithm for sorting the fission bank, and we do the same here: between each secondary generation, particles are sorted by which particle produced them and by their progeny number within that parent. In this manner, we get a consistent ordering of secondaries across secondary generations, allowing for a consistent seeding scheme.
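The ordering itself reduces to a sort on a (parent id, progeny number) key. A minimal sketch, with a schematic `Site` class rather than OpenMC's actual SourceSite layout:

```python
# Illustrative sort of one secondary generation by (parent_id, progeny_id).
# Field names are schematic, not OpenMC's exact SourceSite members.
from dataclasses import dataclass

@dataclass
class Site:
    parent_id: int   # which particle produced this secondary
    progeny_id: int  # which of that parent's progeny this is

def sort_generation(bank):
    # Deterministic order regardless of which thread banked each site first.
    return sorted(bank, key=lambda s: (s.parent_id, s.progeny_id))

# Sites arrive in an arbitrary, thread-dependent order...
shuffled = [Site(7, 1), Site(3, 0), Site(7, 0), Site(3, 1)]
ordered = sort_generation(shuffled)
print([(s.parent_id, s.progeny_id) for s in ordered])
# → [(3, 0), (3, 1), (7, 0), (7, 1)]
```

Because the key is unique per secondary, the resulting order is independent of thread and rank interleaving, which is what permits a consistent seeding scheme.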
Once sorted, particles are load balanced between MPI ranks. Shared-memory load balancing is not a problem in this mode, as there is no "long history" problem within a secondary generation. While particles may still produce different numbers of secondaries, those secondaries aren't simulated until the next secondary generation. As such, all tasks are (relatively speaking) much more uniform in work cost, and the long history problem is eliminated. Typically, long histories are not caused by secondary chains that are a million generations deep; they are caused by single particles wanting to split a million times over the course of just a few generations as they enter a rare high-importance pathway (e.g., a beam port through a bioshield with a detector on the other side).
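Since the bank is already in a deterministic order, load balancing is conceptually just an even partition of contiguous slices across ranks. A sketch of that partition arithmetic (the real code additionally moves sites between ranks with MPI):

```python
# Even partition of a sorted shared bank across MPI ranks (conceptual only;
# the actual implementation redistributes sites with MPI sends/receives).
def rank_slice(n_sites, n_ranks, rank):
    """Return the [start, end) slice of the sorted bank owned by `rank`."""
    base, rem = divmod(n_sites, n_ranks)
    start = rank * base + min(rank, rem)
    end = start + base + (1 if rank < rem else 0)
    return start, end

# 10 sites over 3 ranks: contiguous slices of sizes 4, 3, 3.
print([rank_slice(10, 3, r) for r in range(3)])  # → [(0, 4), (4, 7), (7, 10)]
```

Contiguous slices preserve the sorted order, so the seeding scheme sees the same global sequence of secondaries no matter how many ranks participate.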
The seeding scheme used for the shared secondary mode is fully reproducible. That said, as far as I can see, there is no good way to have it match the results of the traditional non-shared mode. The two modes are statistically similar, of course, but will not produce bitwise-identical answers, as they use different PRNG handling schemes.
The shared secondary mode handles consistent particle seeding by assigning seeds based on how many "tracks" have been simulated so far in the overall simulation. This is done by simply incrementing a global counter at the start of each batch and secondary generation by the number of tracks run. Each "track" is a particle simulated from birth to death, exclusive of any secondaries. This is different from what we have historically dubbed a "particle" or "particle history" in OpenMC, which is a simulated particle from birth to death inclusive of all secondaries. Thus, # tracks >= # of particle histories.
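A toy illustration of the counter-based scheme (the seed function here is purely illustrative and is not OpenMC's actual PRNG initialization):

```python
# Schematic track-indexed seeding: each track's RNG stream is keyed by its
# position in the global track count, so seeding is independent of which
# rank/thread happens to run it. The hash below is a stand-in seed function.
def track_seed(master_seed, global_track_index):
    return hash((master_seed, global_track_index)) & 0xFFFFFFFFFFFFFFFF

tracks_completed = 0
# Source batch, then two hypothetical secondary generations:
for generation_size in [500, 1200, 80]:
    seeds = [track_seed(1, tracks_completed + i) for i in range(generation_size)]
    # Advance the global counter once per generation, by the tracks just run.
    tracks_completed += generation_size

print(tracks_completed)  # → 1780
```

Because the bank is sorted before each generation, the i-th track of a generation is the same physical secondary on every run, so indexing seeds by the global track count reproduces the same streams.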
Changes to stdout
When the shared secondary bank is enabled, we now print more info to stdout that lets the user see the secondary generations progressing and whether aggressive weight window splitting is causing huge numbers of secondaries to be born. Previously, a user just observed potentially extremely long run times and low particles/sec tracking rates when heavy splitting was underway, with no indication of whether this was caused by code inefficiency or by large numbers of secondary particles. The new output for a weight window simulation with 500 particles/batch looks like:
We also report another piece of timing data when the shared secondary bank is enabled: the number of "tracks/second". This allows one to better gauge the performance of the code when there is significant splitting. This value should be much more stable across changes to weight window parameters than the particles/sec metric.
Implementation and Preservation of the Non-Shared Secondary Mode
The new shared secondary mode does not replace the original particle-owned secondary mode. As such, users who are not using weight windows will not observe any changes in their existing results. The downside is that we have to maintain two different secondary handling pathways. The upsides though are:
The existing particle-owned bank is actually still used to store particles temporarily in the shared mode, meaning that the logic in the physics areas of the code didn't really need to change. Secondaries are generated and stored the same way in both cases. The main difference is that these local banks are combined into a larger shared bank at the end of each particle's lifetime. This allows a lot of logical processes (like the particle production filter) to work as normal without modification.
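The end-of-history merge can be sketched as a locked append of the particle-owned bank into the shared one (names are schematic, not OpenMC's actual members):

```python
# Conceptual flush of a particle-owned local bank into the shared bank at the
# end of each history. Physics routines keep appending to the local bank
# exactly as before; only this final merge step is new in shared mode.
import threading

shared_bank = []
bank_lock = threading.Lock()

def finish_history(local_secondary_bank):
    with bank_lock:
        shared_bank.extend(local_secondary_bank)
    local_secondary_bank.clear()

finish_history(["s1", "s2"])
finish_history(["s3"])
print(shared_bank)  # → ['s1', 's2', 's3']
```

The merge order across threads is nondeterministic in general, but that does not matter: the subsequent (parent id, progeny number) sort restores a deterministic ordering before the next secondary generation runs.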
In cases where splitting is very uniform between particle histories, it may be faster to run without the shared bank mode. The shared bank does ensure good load balancing, though at the cost of more buffering and some sorting operations. I have observed a few use cases where the original mode is faster, though often by a fairly small margin. Conversely, on cases where long histories are present, the shared bank can offer a speedup that asymptotically approaches the total number of processors in the simulation (e.g., 100x speedup on a single node is possible, or millions of times faster for large distributed jobs).
Performance Evaluation
I tested the JET model using random-ray-generated weight windows. In a single-node CPU test case (8 MPI ranks x 24 cores (48 threads) per rank, for 192 cores (384 threads) total), I observed 304 particles/sec with the traditional mode and 2,376 particles/sec with the shared secondary mode. This is a 7.8x speedup, and the weight window parameters performed very well for the problem and resulted in good uncertainties. On problems like JETSON 2D with some parameters I have observed up to a 53x speedup, though for other sets of weight windows on that problem I have also observed a ~2x slowdown.
Generally, the shared secondary mode seems like a lifesaver in a lot of cases and should allow performance to be maintained across a much wider variety of weight window parameters, as the long history problem is greatly mitigated. This should reduce the need for weight window tuning significantly and provide better out-of-the-box weight window effectiveness. In some cases it may be worthwhile to disable the shared mode to see whether better performance is possible, but I would recommend doing most of your weight window tuning with the default shared secondary mode on, as it is much more flexible and performance doesn't drop off a cliff in the same manner that it does without the shared bank.
Testing
I added a number of additional tests. In some cases I simply added a
@parametrize to test out the new mode when it made sense. I originally only tested it for weight window use cases, but I discovered there were some pretty significant potential error modes with the new logic, and as such developed a few different tests to cover those cases (e.g., what happens when fission neutrons are added to the shared bank).

The pulse height tests do have two different cases for shared/regular secondaries, even though the shared mode is disabled. I thought I'd leave these tests in so they will be ready to go when that PR comes along.
Limitations & Future Work
One cop-out is that I disable the shared secondary mode (and give a warning) when pulse height tallies are in use. These add some complexity that doesn't naturally map to the shared secondary mode context. There is a path forward here, but I'll leave that capability for another PR. I've added a TODO in the code at the sentinel point. For now, people using pulse height tallies will simply get the performance they currently get in the code, so no harm done at least.
Checklist