Skip to content

fix: add missing @:DEALLOCATE in simulation modules#1476

Merged
sbryngelson merged 16 commits into
MFlowCode:masterfrom
SVS87:fix/deallocate-missing-arrays
Jun 25, 2026
Merged

fix: add missing @:DEALLOCATE in simulation modules#1476
sbryngelson merged 16 commits into
MFlowCode:masterfrom
SVS87:fix/deallocate-missing-arrays

Conversation

@SVS87

@SVS87 SVS87 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Description

Part of #1459

Add matching @:DEALLOCATE calls in:

  • m_derived_variables
  • m_global_parameters
  • m_hyperelastic
  • m_rhs
  • m_riemann_solvers
  • m_time_steppers

Remaining files not yet addressed:

  • src/common/m_boundary_common.fpp, m_helper.fpp, m_model.fpp, m_mpi_common.fpp, m_variables_conversion.fpp
  • src/post_process/m_start_up.fpp
  • src/simulation/m_acoustic_src.fpp, m_bubbles_EE.fpp, m_ibm.fpp (pending PR Local Aware IBM #1378), m_qbmm.fpp

Type of change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other: describe

Testing

Ran ./mfc.sh test -j 4 locally on Ubuntu/WSL with GNU compiler and MPI.
578 tests passed, 0 failed. Spelling and lint checks passed.

Checklist

  • I added or updated tests for new behavior
  • I updated documentation if user-facing behavior changed

See the developer guide for full coding standards.

GPU changes (expand if you modified src/simulation/)
  • GPU results match CPU results
  • Tested on NVIDIA GPU or AMD GPU

AI code reviews

Reviews are not triggered automatically. To request a review, comment on the PR:

  • @coderabbitai review — incremental review (new changes only)
  • @coderabbitai full review — full review from scratch
  • /review — Qodo review
  • /improve — Qodo code suggestions
  • @claude full review — Claude full review (also triggers on PR open/reopen/ready)
  • Add label claude-full-review — Claude full review via label

@SVS87 SVS87 requested a review from sbryngelson as a code owner June 1, 2026 07:31
@codecov

codecov Bot commented Jun 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 25.00000% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.45%. Comparing base (05eda2d) to head (c4414f3).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
src/simulation/m_derived_variables.fpp 0.00% 7 Missing and 3 partials ⚠️
src/common/m_boundary_common.fpp 0.00% 7 Missing ⚠️
src/simulation/m_rhs.fpp 46.15% 5 Missing and 2 partials ⚠️
src/simulation/m_time_steppers.fpp 40.00% 1 Missing and 2 partials ⚠️
src/simulation/m_ibm.fpp 33.33% 1 Missing and 1 partial ⚠️
src/simulation/m_hyperelastic.fpp 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1476      +/-   ##
==========================================
- Coverage   60.48%   60.45%   -0.04%     
==========================================
  Files          83       83              
  Lines       19888    19907      +19     
  Branches     2951     2954       +3     
==========================================
+ Hits        12030    12035       +5     
- Misses       5864     5871       +7     
- Partials     1994     2001       +7     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sbryngelson

Copy link
Copy Markdown
Member

Code Review

Reviewed the diff by checking each newly-added @:DEALLOCATE against its allocation site (guard conditions, allocation rank, and whether it was allocated with @:ALLOCATE).

Overview

This PR (part of #1459) replaces a few bare deallocate calls with the @:DEALLOCATE macro and adds matching deallocations for arrays that were previously leaked at finalization across six simulation modules. The motivation is sound: @:DEALLOCATE pairs with @:ALLOCATE so that device (OpenACC/OpenMP) mappings are released alongside the host array, whereas a bare deallocate leaves the device copy mapped.

Correctness — verified symmetric ✅

For these, the new deallocation matches the allocation guard and structure exactly:

  • m_derived_variables.fppaccel_mag, x_accel, fd_coeff_{x,y,z} are all allocated under if (probe_wrt) (with the n>0/p>0 nesting). Moving the fd_coeff deallocation inside the probe_wrt block and switching from if (allocated(...)) to the n>0/p>0 guards is correct and behavior-preserving, and it now also releases the device copies (these are in GPU_DECLARE(create=...)). Good cleanup.
  • m_global_parameters.fppptil: alloc and dealloc both under if (bubbles_euler). ✅
  • m_hyperelastic.fppGs_hyper(1:num_fluids): allocated unconditionally in init, deallocated unconditionally in finalize. ✅
  • m_riemann_solvers.fppGs_rs unconditional both sides; Res_gs under if (viscous) both sides. ✅
  • m_time_steppers.fpp — checked each: q_T_sf%sf (chemistry), pb_ts(1:2)/pb_ts(1,2)%sf/rhs_pb and the mv_ts set (unconditional — one of the three alloc branches always runs), max_dt (cfl_dt), bc_type (loop i=1,num_dims matches the n>0/p>0 allocation since num_dims tracks dimensionality), rk_coef (any(time_stepper==[1,2,3])). All symmetric. ✅
  • m_rhs.fppalf_sum%sf (mpp_lim .and. bubbles_euler); the qbmm block (mom_3d(0:2,0:2,1:nb)%sf, mom_sp(1:nmomsp)%sf, then containers) matches the allocation exactly; blkmod1 (alt_soundspeed). The replacement of the explicit $:GPU_EXIT_DATA(delete='[alf_sum%sf]') + deallocate with @:DEALLOCATE(alf_sum%sf) is equivalent and cleaner. ✅

Issue — incomplete deallocation of qL_prim / qR_prim (m_rhs.fpp) ⚠️

This is the one spot I'd flag. The allocation (≈ lines 264–284) builds a nested structure:

@:ALLOCATE(qL_prim(1:num_dims))
do i = 1, num_dims
    @:ALLOCATE(qL_prim(i)%vf(1:sys_size))
    do l = eqn_idx%mom%beg, eqn_idx%mom%end
        @:ALLOCATE(qL_prim(i)%vf(l)%sf(...))   ! GPU create + @:ACC_SETUP_VFs
    end do
    @:ACC_SETUP_VFs(qL_prim(i), qR_prim(i))
end do

but the new finalization only does:

@:DEALLOCATE(qL_prim)
@:DEALLOCATE(qR_prim)

The inner %vf(l)%sf arrays were created on the device via @:ALLOCATE/@:ACC_SETUP_VFs. @:DEALLOCATE(qL_prim) only issues an exit-data delete for the top-level container, so those inner device mappings are not released — a device-memory leak. On the host side Fortran recursively frees the allocatable components, so there's no host leak, but the GPU side is left incomplete. This is exactly the case the PR is trying to fix elsewhere.

The flux_n/flux_src_n/flux_gsrc_n cleanup immediately above is the template to follow — it walks the structure and deallocates each %vf(l)%sf and %vf before the container. Suggest mirroring that, e.g.:

do i = 1, num_dims
    do l = eqn_idx%mom%beg, eqn_idx%mom%end
        @:DEALLOCATE(qL_prim(i)%vf(l)%sf)
        @:DEALLOCATE(qR_prim(i)%vf(l)%sf)
    end do
    @:DEALLOCATE(qL_prim(i)%vf, qR_prim(i)%vf)
end do
@:DEALLOCATE(qL_prim, qR_prim)

(Confirm whether any of those %sf are pointer-associated rather than owned before deallocating — flux_src_n above nullifys some entries, so the same care may apply here.)

Testing note

The reported "578 tests passed" on a GNU+MPI CPU build is good for confirming no host-side regression, but it won't exercise the device-side exit-data paths that are the whole point of switching to @:DEALLOCATE. Since these all occur at module finalization (program teardown), a leak won't surface as a test failure even on GPU. A GPU build/run (and ideally a memory check) would give more confidence that the device mappings are actually balanced.

Summary

Solid, well-scoped refactor; 5 of 6 files are clean and symmetric. The only substantive item is the incomplete qL_prim/qR_prim teardown in m_rhs.fpp, which leaves device sub-arrays mapped — worth completing to match the allocation and the adjacent flux_n pattern.

SVS87 added 3 commits June 15, 2026 00:28
Add matching @:DEALLOCATE calls in:
- m_derived_variables
- m_global_parameters
- m_hyperelastic
- m_rhs
- m_riemann_solvers
- m_time_steppers

Remaining files not yet addressed:
- src/common/m_boundary_common.fpp, m_helper.fpp, m_model.fpp,
  m_mpi_common.fpp, m_variables_conversion.fpp
- src/post_process/m_start_up.fpp
- src/simulation/m_acoustic_src.fpp, m_bubbles_EE.fpp,
  m_ibm.fpp (pending PR MFlowCode#1378), m_qbmm.fpp

Part of MFlowCode#1459
- m_rhs: fix incomplete qL_prim/qR_prim teardown by walking
  nested %vf/%sf structure before deallocating containers
- m_boundary_common: replace plain deallocate with @:DEALLOCATE
- m_mpi_common: replace plain deallocate with @:DEALLOCATE,
  mirror unified memory path in finalization
- m_variables_conversion: add missing @:DEALLOCATE for Res_vc
- m_start_up (post_process): annotate FFT arrays as program-lifetime
- m_helper, m_model, m_acoustic_src, m_bubbles_EE, m_qbmm:
  annotate allocations as program-lifetime (no finalize subroutine)

Part of MFlowCode#1459
- m_ibm: add @:DEALLOCATE for ghost_points, simplify models
  deallocation, replace plain deallocate for recv_* arrays
- lint_source.py: add check_allocate_deallocate_pairing() to
  flag @:ALLOCATE without matching @:DEALLOCATE anywhere in src/
  Known program-lifetime exemptions are listed explicitly

Closes MFlowCode#1459
@SVS87 SVS87 force-pushed the fix/deallocate-missing-arrays branch from 31573f7 to 9c819d4 Compare June 15, 2026 05:02
@SVS87

SVS87 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Added check_allocate_deallocate_pairing() to lint_source.py to catch any future @:ALLOCATE without a matching @:DEALLOCATE. Currently using a PROGRAM_LIFETIME_EXEMPTIONS set to suppress known pre-existing violations in modules without a finalize subroutine. Open to suggestions on a better design — e.g. an inline comment tag like ! @:PROGRAM_LIFETIME near the allocation — if the hardcoded list is too fragile long-term.

@SVS87 SVS87 force-pushed the fix/deallocate-missing-arrays branch from 81de5d4 to 3b3a25e Compare June 16, 2026 01:47
…e pairing

Replace global exemption list with per-file initialize/finalize
subroutine pairing. Only checks modules with both s_initialize_*
and s_finalize_* subroutines. Known pre-existing missing
deallocations are listed separately.
@SVS87 SVS87 force-pushed the fix/deallocate-missing-arrays branch from 14a5eb8 to 5b4f846 Compare June 18, 2026 20:59
@SVS87 SVS87 marked this pull request as ready for review June 19, 2026 06:50
SVS87 added 2 commits June 20, 2026 01:04
recv_forces_snap, recv_torques_snap, recv_ids, and recv_ft are allocated
with plain allocate and only touched on the GPU via transient copyin/copy
clauses, so they never enter the persistent present table. Using
@:DEALLOCATE made the OpenACC runtime attempt an exit-data delete on
arrays not in the present table, causing find_in_present_table failures
on Frontier. Reverted to plain deallocate to match their allocation.
@sbryngelson sbryngelson marked this pull request as draft June 20, 2026 05:59
@SVS87 SVS87 marked this pull request as ready for review June 20, 2026 20:20
@sbryngelson sbryngelson marked this pull request as draft June 22, 2026 15:27
@sbryngelson sbryngelson marked this pull request as ready for review June 24, 2026 18:32
Copilot AI review requested due to automatic review settings June 24, 2026 18:32

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.

This PR aims to prevent memory leaks by ensuring @:ALLOCATE calls in module initialization paths have matching @:DEALLOCATE calls in finalize paths, and adds a linter rule to detect missing pairings.

Changes:

  • Added a new linter check to flag @:ALLOCATE in s_initialize_* without corresponding @:DEALLOCATE in s_finalize_*.
  • Added/standardized @:DEALLOCATE usage across multiple Fortran modules (simulation/common/post_process) and replaced some raw deallocate / GPU directives with the macro.
  • Added “program-lifetime allocation” notes in modules that intentionally do not implement s_finalize_*.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
toolchain/mfc/lint_source.py Adds linter rule to enforce ALLOCATE/DEALLOCATE pairing across init/finalize subroutines
src/simulation/m_time_steppers.fpp Adds missing deallocations in module finalize routine
src/simulation/m_rhs.fpp Replaces manual deallocation with macro and adds missing cleanup for allocated arrays
src/simulation/m_qbmm.fpp Documents program-lifetime allocations where no finalize exists
src/simulation/m_ibm.fpp Simplifies deallocation logic and adds missing deallocation for ghost_points
src/simulation/m_hyperelastic.fpp Adds missing deallocation for Gs_hyper
src/simulation/m_global_parameters.fpp Adds missing deallocation for ptil under bubbles_euler
src/simulation/m_derived_variables.fpp Switches to macro deallocation; changes where FD coefficient arrays are deallocated
src/simulation/m_bubbles_EE.fpp Documents program-lifetime allocations where no finalize exists
src/simulation/m_acoustic_src.fpp Documents program-lifetime allocations where no finalize exists
src/post_process/m_start_up.fpp Adds note about program-lifetime allocations (but currently shows duplicated allocation line)
src/common/m_variables_conversion.fpp Adds missing Res_vc deallocation in viscous path
src/common/m_mpi_common.fpp Uses @:DEALLOCATE for MPI buffers when not in NV unified memory path
src/common/m_model.fpp Documents program-lifetime GPU allocations where no finalize exists
src/common/m_helper.fpp Documents program-lifetime allocations where no finalize exists
src/common/m_boundary_common.fpp Replaces raw deallocate calls with @:DEALLOCATE

Comment thread src/post_process/m_start_up.fpp Outdated
Comment on lines 881 to 882
@:ALLOCATE(data_in(Nx*Nyloc*Nzloc))
@:ALLOCATE(data_in(Nx*Nyloc*Nzloc))
Comment thread src/simulation/m_derived_variables.fpp Outdated
Comment on lines 511 to 527
if (probe_wrt) then
deallocate (accel_mag, x_accel)
@:DEALLOCATE(accel_mag)
@:DEALLOCATE(x_accel)
if (n > 0) then
deallocate (y_accel)
@:DEALLOCATE(y_accel)
if (p > 0) then
deallocate (z_accel)
@:DEALLOCATE(z_accel)
end if
end if
@:DEALLOCATE(fd_coeff_x)
if (n > 0) then
@:DEALLOCATE(fd_coeff_y)
end if
if (p > 0) then
@:DEALLOCATE(fd_coeff_z)
end if
end if
Comment thread toolchain/mfc/lint_source.py Outdated
Comment on lines +325 to +326
allocate_re = re.compile(r"@:ALLOCATE\((\w[\w%]*)")
deallocate_re = re.compile(r"@:DEALLOCATE\((\w[\w%]*)")
Comment thread toolchain/mfc/lint_source.py Outdated
Comment on lines +423 to +425
for name, lineno in allocated.items():
if name not in deallocated and name not in KNOWN_MISSING:
errors.append(f" {rel}:{lineno} @:ALLOCATE({name}...) in s_initialize_* has no matching " f"@:DEALLOCATE in s_finalize_*. Fix: add @:DEALLOCATE in the finalize subroutine")
Comment thread toolchain/mfc/lint_source.py Outdated
all_errors.extend(check_fypp_list_duplicates(repo_root))
all_errors.extend(check_duplicate_lines(repo_root))
all_errors.extend(check_hardcoded_byte_size(repo_root))
all_errors.extend(check_allocate_deallocate_pairing(repo_root))
@github-actions

Copy link
Copy Markdown

Claude Code Review

Head SHA: 8308a00

Files changed:

  • 9
  • src/common/m_boundary_common.fpp
  • src/common/m_mpi_common.fpp
  • src/common/m_variables_conversion.fpp
  • src/simulation/m_derived_variables.fpp
  • src/simulation/m_global_parameters.fpp
  • src/simulation/m_hyperelastic.fpp
  • src/simulation/m_ibm.fpp
  • src/simulation/m_rhs.fpp
  • src/simulation/m_time_steppers.fpp

Findings

Memory leak in m_derived_variables.fpp when ib=.true., probe_wrt=.false.

fd_coeff_x/y/z (and accel_mag, x_accel, y_accel, z_accel) are allocated under probe_wrt .or. ib in s_initialize_derived_variables_module:

if (probe_wrt .or. ib) then
    @:ALLOCATE(fd_coeff_x(-fd_number:fd_number, 0:m))
    ...
    @:ALLOCATE(accel_mag(0:m, 0:n, 0:p))
    @:ALLOCATE(x_accel(0:m, 0:n, 0:p))
    ...
end if

The old finalization correctly handled both allocation paths: fd_coeff_* were freed via if (allocated(fd_coeff_x)) deallocate (fd_coeff_x) outside any probe_wrt guard. The PR replaces this with @:DEALLOCATE calls all moved inside if (probe_wrt). When ib=.true., probe_wrt=.false., the entire deallocation block is skipped, leaking fd_coeff_x/y/z, accel_mag, x_accel, y_accel, and z_accel.

Fix: gate the fd_coeff deallocation on probe_wrt .or. ib to match the allocation guard, and do the same for the accel arrays.

Raw deallocate left in m_mpi_common.fpp unified-memory path

The #else branch of #ifndef __NVCOMPILER_GPU_UNIFIED_MEM introduced by this PR uses a bare deallocate:

$:GPU_EXIT_DATA(delete='[buff_send, buff_recv]')
deallocate (buff_send, buff_recv)

Every other conversion in this PR replaces raw deallocate with @:DEALLOCATE. The bare call here may be intentional (to avoid @:DEALLOCATE emitting its own GPU_EXIT_DATA after the manual one), but it is inconsistent with the project convention and is likely to be caught by lint_source.py. If the intent is to prevent double GPU_EXIT_DATA, this should be documented with a comment explaining why @:DEALLOCATE cannot be used here.

@sbryngelson sbryngelson merged commit 839a4b6 into MFlowCode:master Jun 25, 2026
82 checks passed
sbryngelson added a commit that referenced this pull request Jun 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants