18 changes: 8 additions & 10 deletions .claude/rules/common-pitfalls.md
@@ -4,7 +4,9 @@
- Grid dimensions: `m`, `n`, `p` (cells in x, y, z). 1D: n=p=0. 2D: p=0.
- Interior domain: `0:m`, `0:n`, `0:p`
- Buffer/ghost region: `-buff_size:m+buff_size` (similar for n, p in multi-D)
- `buff_size` depends on WENO order and features (typically `2*weno_polyn + 2`)
- `buff_size` is **not** a single formula: it's set per reconstruction scheme (WENO/MUSCL/IGR) in
`s_configure_coordinate_bounds` (`m_helper_basic.fpp`) and floored higher for Lagrange bubbles and IB.
Read that routine for the current value rather than assuming one.
- Domain bounds: `idwint(1:3)` (interior `0:m`), `idwbuff(1:3)` (with ghost cells)
- Cell-center coords: `x_cc(-buff_size:m+buff_size)`, `y_cc(...)`, `z_cc(...)`
- Cell-boundary coords: `x_cb(-1-buff_size:m+buff_size)`
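
The index ranges above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper name `index_extents` and the sample value `buff_size=4` are assumptions made for the example, not MFC code or an MFC default. The real `buff_size` is whatever `s_configure_coordinate_bounds` sets.

```python
def index_extents(m: int, buff_size: int) -> dict:
    """Inclusive (lower, upper, count) bounds along x for an interior of 0:m."""

    def span(lo: int, hi: int) -> tuple:
        # Fortran-style inclusive range lo:hi holds hi - lo + 1 entries.
        return (lo, hi, hi - lo + 1)

    return {
        "interior": span(0, m),                       # 0:m
        "x_cc": span(-buff_size, m + buff_size),      # cell centers incl. ghost cells
        "x_cb": span(-1 - buff_size, m + buff_size),  # cell boundaries (one extra entry)
    }


# Hypothetical sizes, chosen only for the example.
ext = index_extents(m=99, buff_size=4)
print(ext)
```

With these sample values `x_cb` holds exactly one more entry than `x_cc`, which follows from its lower bound starting at `-1-buff_size`.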
@@ -38,8 +40,8 @@
- Boundary condition symmetry requirements must be maintained

## Compiler-Specific Issues
- CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, and Intel ifx
- AMD flang is additionally supported for `--gpu mp` builds but not in the CI matrix
- See the compiler-backend matrix in `.claude/rules/gpu-and-mpi.md` for which compilers
are CI-gated and which backends each supports.
- Each compiler has different strictness levels and warning behavior
- Fypp macros must expand correctly for both GPU and CPU builds

@@ -54,12 +56,8 @@
- Do not regenerate ALL golden files unless you understand every output change

## PR Checklist
Before submitting a PR:
- [ ] `./mfc.sh format -j 8` (auto-format)
- [ ] `./mfc.sh precheck -j 8` (5 CI lint checks)
- [ ] `./mfc.sh build -j 8` (compiles)
- [ ] `./mfc.sh test --only <relevant> -j 8` (tests pass)
The base loop (format → precheck → build → test → one logical commit) is the
Development Workflow Contract in `CLAUDE.md`. Beyond it, watch for:
- [ ] If adding parameters: definitions.py (_r + _nv) updated; cmake reconfigured; case_validator.py if constraints
- [ ] If modifying `src/common/`: all three targets tested
- [ ] If changing output: golden files regenerated for affected tests
- [ ] One logical change per commit
- [ ] If changing output: golden files regenerated for affected tests (`./mfc.sh test --generate --only <tests>`)
12 changes: 3 additions & 9 deletions .claude/rules/fortran-conventions.md
@@ -15,11 +15,7 @@ Every Fortran module follows this pattern:
- Finalization subroutine: `s_finalize_<feature>_module`

## Naming
- Modules: `m_<feature>`
- Public subroutines: `s_<verb>_<noun>`
- Public functions: `f_<verb>_<noun>`
- Private/local variables: no prefix required
- Constants: descriptive names, not ALL_CAPS
See "Naming Conventions" in `CLAUDE.md`.

## Forbidden Patterns

@@ -57,10 +53,8 @@ Enforced by convention/code review (not automated):
- Fortran-side runtime validation also exists in `m_checker*.fpp` files using `@:PROHIBIT`

## Precision Types
- `wp` (working precision): used for computation. Double by default.
- `stp` (storage precision): used for field data arrays and I/O. Double by default.
- In single-precision mode (`--single`): both become single.
- In mixed-precision mode (`--mixed`): wp=double, stp=half.
`wp`/`stp` are defined in `CLAUDE.md` (`wp` = computation, `stp` = field-data storage + I/O). Detail:
- Modes: default both double; `--single` → both single; `--mixed` → wp=double, stp=half.
- MPI type matching: `mpi_p` must match `wp`, `mpi_io_p` must match `stp`.
- Always use generic intrinsics: `sqrt` not `dsqrt`, `abs` not `dabs`.
- Cast with `real(..., wp)` or `real(..., stp)`, never `dble(...)`.
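
As a quick reference, the mode table and the MPI matching rule above can be written out as a small Python sketch. The dictionary and the helper `mpi_kinds` are hypothetical names used only for illustration; they are not part of the MFC toolchain, and the kind labels are plain strings rather than real Fortran or MPI types.

```python
# Precision kinds per build mode, as described in the list above.
PRECISION_MODES = {
    "default": {"wp": "double", "stp": "double"},
    "--single": {"wp": "single", "stp": "single"},
    "--mixed": {"wp": "double", "stp": "half"},
}


def mpi_kinds(mode: str) -> dict:
    """Return the kinds the MPI types must match: mpi_p follows wp, mpi_io_p follows stp."""
    kinds = PRECISION_MODES[mode]
    return {"mpi_p": kinds["wp"], "mpi_io_p": kinds["stp"]}


for mode in PRECISION_MODES:
    print(mode, PRECISION_MODES[mode], mpi_kinds(mode))
```

The point of the sketch is that mixed mode is the only case where `mpi_p` and `mpi_io_p` differ, so it is the mode most likely to expose a mismatched MPI type.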
45 changes: 15 additions & 30 deletions .claude/rules/gpu-and-mpi.md
@@ -23,26 +23,17 @@ compiles to either OpenACC or OpenMP target offload depending on the build flag:

### Key GPU Macros (always use the `GPU_*` prefix)

Inline macros (use `$:` prefix):
- `$:GPU_PARALLEL_LOOP(collapse=N, private=[...], reduction=[...], reductionOp='+')` —
Parallel loop over GPU threads. Most common GPU macro.
- `$:END_GPU_PARALLEL_LOOP()` — Required closing for GPU_PARALLEL_LOOP.
- `$:GPU_LOOP(collapse=N, ...)` — Inner loop within a GPU parallel region.
- `$:GPU_ENTER_DATA(create=[...])` — Allocate device memory (unscoped).
- `$:GPU_EXIT_DATA(delete=[...])` — Free device memory.
- `$:GPU_UPDATE(host=[...])` — Copy device → host (before MPI send).
- `$:GPU_UPDATE(device=[...])` — Copy host → device (after MPI receive).
- `$:GPU_ROUTINE(parallelism='[seq]')` — Mark routine for device compilation.
- `$:GPU_DECLARE(create=[...])` — Declare device-resident data.
- `$:GPU_ATOMIC(atomic='update')` — Atomic operation on device.
- `$:GPU_WAIT()` — Synchronization barrier.

Block macros (use `#:call`/`#:endcall`):
- `GPU_PARALLEL(...)` — GPU parallel region (used for scalar reductions like `maxval`/`minval`).
- `GPU_DATA(copy=..., create=..., ...)` — Scoped data region.
- `GPU_HOST_DATA(use_device_addr=[...])` — Host code with device pointers.

Typical GPU loop pattern (used 750+ times in the codebase):
Full set with signatures in `parallel_macros.fpp`. The ones you reach for most:
- `$:GPU_PARALLEL_LOOP(collapse=N, private=[...], reduction=[...], reductionOp='+')`
+ `$:END_GPU_PARALLEL_LOOP()` — parallel spatial loop; by far the most common (see pattern below).
- `$:GPU_LOOP(collapse=N, ...)` — inner loop *within* a parallel region.
- `$:GPU_UPDATE(host=[...])` / `$:GPU_UPDATE(device=[...])` — device↔host copies (around MPI; see below).
- `#:call GPU_PARALLEL(...)` — block region for scalar reductions (`maxval`/`minval`).

Others in `parallel_macros.fpp`: `GPU_ENTER_DATA`/`GPU_EXIT_DATA`, `GPU_DECLARE`, `GPU_ROUTINE`,
`GPU_ATOMIC`, `GPU_WAIT`, and the block macros `GPU_DATA`, `GPU_HOST_DATA`.

Typical GPU loop pattern (the dominant spatial-loop idiom):
```
$:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)
do l = idwbuff(3)%beg, idwbuff(3)%end
@@ -108,16 +99,10 @@ Use `#ifdef` for feature, target, compiler, and library gating:
- `MFC_POST_PROCESS` — Only in post_process builds

### Compiler gating (for compiler-specific workarounds)
- `_CRAYFTN` — Cray Fortran compiler
- `__NVCOMPILER_GPU_UNIFIED_MEM` — NVIDIA unified memory (GH-200 / `--unified`)
- `__PGI` — Legacy PGI/NVIDIA compiler
- `__INTEL_COMPILER` — Intel compiler
- `FRONTIER_UNIFIED` — Frontier HPC unified memory

### Library-specific code
- FFTW (`m_fftw.fpp`) uses heavy `#ifdef` gating for `MFC_GPU` and `__PGI`
- CUDA Fortran (`cudafor` module) is gated behind `__NVCOMPILER_GPU_UNIFIED_MEM`
- SILO/HDF5 interfaces may have conditional paths
Compiler/feature macros: `_CRAYFTN`, `__NVCOMPILER_GPU_UNIFIED_MEM` (NVIDIA unified mem, GH-200 /
`--unified`), `__PGI` (legacy PGI/NVIDIA), `__INTEL_COMPILER`, `FRONTIER_UNIFIED`. Library code is
similarly gated (FFTW in `m_fftw.fpp` on `MFC_GPU`/`__PGI`; CUDA Fortran `cudafor` on
`__NVCOMPILER_GPU_UNIFIED_MEM`; SILO/HDF5 paths). Grep the relevant file for exact usage.

When adding new `#ifdef` blocks, always provide an `#else` or `#endif` path so
the code compiles in all configurations (CPU-only, GPU-ACC, GPU-OMP, with/without MPI).
24 changes: 18 additions & 6 deletions .claude/rules/parameter-system.md
@@ -1,7 +1,7 @@
# Parameter System

## Overview
MFC has ~3,400 simulation parameters defined in Python and read by Fortran via namelist files.
MFC's simulation parameters are defined in Python and read by Fortran via namelist files.

## Parameter Flow: Python → Fortran

@@ -37,7 +37,10 @@ at CMake configure time — no manual Fortran edits needed for simple scalar par
**Exceptions — still require manual Fortran edits:**
- Array variables (e.g. `logical, dimension(num_fluids_max)`) → declare in `src/*/m_global_parameters.fpp`
- Derived-type members (`fluid_pp%attr`, `patch_icpp(i)%attr`) → declare in the relevant derived type
- Case-optimization parameters → add to `CASE_OPT_PARAMS` and the `#:else` block in `src/simulation/m_global_parameters.fpp`
- Case-optimization parameters → add to `CASE_OPT_PARAMS` and the `#:else` block in `src/simulation/m_global_parameters.fpp`.
Gotcha: under `--case-optimization` these are baked into the binary and dropped from the simulation namelist
(`case_dicts.py` filters them), so changing one needs a *rebuild*, not just a case edit — and building without
the flag makes them read from `.inp` again.

## Case Files
- Case files are Python scripts (`.py`) that define a dict of parameters
@@ -47,15 +50,24 @@ at CMake configure time — no manual Fortran edits needed for simple scalar par
- Search parameters with `./mfc.sh params <query>`

## Fortran-Side Runtime Validation
Each target has `m_checker*.fpp` files (e.g., `src/simulation/m_checker.fpp`,
`src/common/m_checker_common.fpp`) containing runtime parameter validation using
`@:PROHIBIT(condition, message)`. When adding parameters with physics constraints,
add Fortran-side checks here in addition to `case_validator.py`.
Runtime parameter validation uses `@:PROHIBIT(condition, message)`. Put a check where it runs:
- **Shared across all three targets** → `src/common/m_checker_common.fpp` (`s_check_inputs_common`,
with `#ifndef MFC_*` gates for target-specific exclusions). This holds most checks.
- **Simulation-only** → `src/simulation/m_checker.fpp` (WENO/MUSCL/IGR/time-stepping/compiler checks).
- **Pre/post-only** → `src/{pre,post}_process/m_checker.fpp`. Note: their `s_check_inputs` are
currently empty — that's the right place for a pre/post-only constraint, not `m_checker_common.fpp`.

Add Fortran-side checks in addition to `case_validator.py`.

## Analytical Initial Conditions
String expressions in parameters become Fortran code via `case.py.__get_analytic_ic_fpp()`.
These are compiled into the binary, so syntax errors cause build failures, not runtime errors.

Gotcha: each IC variable (`alpha_rho`, `vel`, `pres`, `alpha`, `Y`, `Bx`...) maps to an `eqn_idx%…`
expression in `QPVF_IDX_VARS` (`case.py`). Adding a conserved variable that patches can set means
updating that map *and* the Fortran `eqn_idx` builder to agree — a mismatch is a silent wrong-index, not
an error. (This is also why `Bx`/`By`/`Bz` use `eqn_idx%B%end-1/%end`, to stay valid in 1D/2D.)

Available variables in analytical IC expressions:
- `x`, `y`, `z` — cell-center coordinates (mapped to `x_cc(i)`, `y_cc(j)`, `z_cc(k)`)
- `xc`, `yc`, `zc` — patch centroid coordinates
115 changes: 42 additions & 73 deletions CLAUDE.md
@@ -6,6 +6,13 @@ toolchain for building/running/testing, and supports GPU acceleration via OpenAC
OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx (CI-gated).
AMD flang is additionally supported for OpenMP target offload GPU builds.

## Working Style

Make surgical changes: every changed line should trace to the request. Don't refactor,
reformat, or "improve" adjacent code — in a four-compiler, golden-file-gated codebase,
incidental edits are how regressions slip in. For general behavioral guidance (simplicity,
surfacing assumptions, verifiable success criteria), invoke the `karpathy-guidelines` skill.

## Commands

Prefer using `./mfc.sh` as the entry point for building, running, testing, formatting,
@@ -15,81 +22,41 @@ compilers directly unless you have a specific reason.

All commands run from the repo root via `./mfc.sh`.

```bash
# Building
./mfc.sh build -j 8 # Build all 3 targets (pre_process, simulation, post_process)
./mfc.sh build -t simulation -j 8 # Build only simulation
./mfc.sh build --gpu acc -j 8 # Build with OpenACC GPU support
./mfc.sh build --gpu mp -j 8 # Build with OpenMP target offload GPU support
./mfc.sh build --debug -j 8 # Debug build
./mfc.sh build -i case.py --case-optimization -j 8 # Case-optimized build (10x speedup)

# Running
./mfc.sh run case.py -n 4 # Run case with 4 MPI ranks
./mfc.sh run case.py --no-build # Run without rebuilding
./mfc.sh run case.py -e batch -N 2 -n 4 -c phoenix -a ACCOUNT # Batch submit on Phoenix

# Testing
./mfc.sh test -j 8 # Run full test suite (560+ tests)
./mfc.sh test --only 1D -j 8 # Only 1D tests
./mfc.sh test --only 2D Bubbles -j 8 # Only 2D bubble tests
./mfc.sh test --only <UUID> -j 8 # Run one specific test by UUID
./mfc.sh test -l # List all tests with UUIDs and traces
./mfc.sh test -% 10 -j 8 # Run 10% random sample
./mfc.sh test --generate --only <feature> # Regenerate golden files after intentional output change

# Verification (pre-commit CI checks)
./mfc.sh precheck -j 8 # Run all 6 lint checks (same as CI gate)
./mfc.sh format -j 8 # Auto-format Fortran (.fpp/.f90) + Python
./mfc.sh lint # Ruff lint + Python unit tests
./mfc.sh spelling # Spell check

# Module loading (HPC clusters only — must use `source`)
source ./mfc.sh load -c p -m g # Load Phoenix GPU modules
source ./mfc.sh load -c f -m g # Load Frontier GPU modules
source ./mfc.sh load -c p -m c # Load Phoenix CPU modules

# Other
./mfc.sh validate case.py # Validate case file without running
./mfc.sh params <query> # Search 3,400 case parameters
./mfc.sh clean # Remove build artifacts
./mfc.sh new <name> # Create new case from template
```

## System Identification and Module Loading

MFC targets HPC clusters. Before building on a cluster, load the correct modules
via `source ./mfc.sh load -c <slug> -m <mode>`.

To identify the current system, check multiple signals — hostname alone is not always
sufficient (compute nodes may differ from login nodes):
Run `./mfc.sh <command> --help` for the full flag set; the most-used invocations:

```bash
hostname # e.g., login-phoenix-gnr-2.pace.gatech.edu
echo $LMOD_SYSHOST # e.g., "phoenix" (most reliable when set)
echo $CRAY_LD_LIBRARY_PATH # Non-empty → Cray system (Frontier, Carpenter Cray)
echo $MODULESHOME # Confirms module system is available
# Build / run / test (-j N = parallel jobs)
./mfc.sh build -j 8 # all 3 targets; flags: -t <target>, --gpu acc|mp, --debug,
# -i case.py --case-optimization (10x speedup)
./mfc.sh run case.py -n 4 # run with 4 MPI ranks; --no-build; -e batch (toolchain/templates/)
./mfc.sh test -j 8 # full suite; --only <1D|Bubbles|UUID>, -l, -% N (sample),
# --generate (regenerate golden files after an intended output change)

# Verify before committing
./mfc.sh precheck -j 8 # all CI lint checks
./mfc.sh format -j 8 # auto-format Fortran (.fpp/.f90) + Python
./mfc.sh lint # ruff lint + Python unit tests (spelling: ./mfc.sh spelling)

# Case files
./mfc.sh validate case.py # validate without running
./mfc.sh params <query> # search case parameters
./mfc.sh new <name> # new case from template (clean: ./mfc.sh clean)
```

Supported systems and their slugs (full list in `toolchain/modules`):
Module loading (`source ./mfc.sh load -c <slug> -m <mode>`) is covered under System Identification below.

| Slug | System | GPU Backend | Example |
|------|--------|-------------|---------|
| `p` | GT Phoenix | OpenACC (nvfortran) | `source ./mfc.sh load -c p -m g` |
| `f` | OLCF Frontier | OpenACC/OpenMP (Cray ftn) | `source ./mfc.sh load -c f -m g` |
| `tuo` | LLNL Tuolumne | OpenMP (Cray ftn) | `source ./mfc.sh load -c tuo -m g` |
| `d` | NCSA Delta | OpenACC (nvfortran) | `source ./mfc.sh load -c d -m g` |
| `b` | PSC Bridges2 | OpenACC (nvfortran) | `source ./mfc.sh load -c b -m g` |
| `cc` | DoD Carpenter (Cray) | CPU only | `source ./mfc.sh load -c cc -m c` |
| `c` | DoD Carpenter (GNU) | CPU only | `source ./mfc.sh load -c c -m c` |
| `o` | Brown Oscar | OpenACC (nvfortran) | `source ./mfc.sh load -c o -m g` |
| `h` | UF HiPerGator | OpenACC (nvfortran) | `source ./mfc.sh load -c h -m g` |
## System Identification and Module Loading

The `-m` flag selects mode: `g`/`gpu` for GPU builds, `c`/`cpu` for CPU-only.
Batch job templates for `./mfc.sh run -e batch -c <system>` are in `toolchain/templates/`.
On an HPC cluster, load modules before building: `source ./mfc.sh load -c <slug> -m <mode>`
(`-m g`/`gpu` or `c`/`cpu`). The `source` is required — plain `./mfc.sh load` errors, since
the command sets environment variables in the current shell.

IMPORTANT: `source` (or `.`) is required for `load` — it sets environment variables
in the current shell. Using `./mfc.sh load` without `source` will error.
Slugs live in `toolchain/modules` (e.g. `p` Phoenix, `f` Frontier, `tuo` Tuolumne, `d` Delta,
`b` Bridges2, `c`/`cc` Carpenter GNU/Cray, `o` Oscar, `h` HiPerGator; GPU backend per system
is defined there). To identify the current system, check `$LMOD_SYSHOST` (most reliable),
then a non-empty `$CRAY_LD_LIBRARY_PATH` (→ Cray: Frontier / Carpenter-Cray), then `hostname`
— login and compute nodes may differ. Batch templates for `./mfc.sh run -e batch -c <system>`
are in `toolchain/templates/`.
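
The detection order described above could be sketched as follows. This is a minimal illustration rather than toolchain code: the function name `identify_system` and its return shape are assumptions, and it only reports which signal fired. Mapping the result to a slug is still done by looking it up in `toolchain/modules`.

```python
import os
import socket


def identify_system(env=None, hostname=None):
    """Return (signal, value) using the priority order described above."""
    env = os.environ if env is None else env

    # 1. LMOD_SYSHOST is the most reliable signal when it is set.
    syshost = env.get("LMOD_SYSHOST", "")
    if syshost:
        return ("LMOD_SYSHOST", syshost)

    # 2. A non-empty CRAY_LD_LIBRARY_PATH indicates a Cray system.
    if env.get("CRAY_LD_LIBRARY_PATH", ""):
        return ("CRAY_LD_LIBRARY_PATH", "cray")

    # 3. Fall back to the hostname; login and compute nodes may differ.
    name = socket.gethostname() if hostname is None else hostname
    return ("hostname", name)


print(identify_system())
```

Passing `env` and `hostname` explicitly keeps the function testable off-cluster, for example `identify_system({"LMOD_SYSHOST": "phoenix"}, "node01")`.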

## Development Workflow Contract

@@ -99,7 +66,7 @@ IMPORTANT: Follow this loop for ALL code changes. Do not skip steps.
2. **Plan** — For multi-file changes, outline your approach before implementing.
3. **Implement** — Make small, focused changes. One logical change per commit.
4. **Format** — Run `./mfc.sh format -j 8` to auto-format.
5. **Verify** — Run `./mfc.sh precheck -j 8` (same 6 checks as CI lint gate).
5. **Verify** — Run `./mfc.sh precheck -j 8` (same checks as the CI lint gate).
6. **Build** — Run `./mfc.sh build -j 8` to verify compilation.
7. **Test** — Run relevant tests: `./mfc.sh test --only <feature> -j 8`.
For changes to `src/common/`, test ALL three targets: `./mfc.sh test -j 8`.
@@ -119,11 +86,11 @@ src/
simulation/ # CFD solver (GPU-accelerated via OpenACC / OpenMP target offload)
post_process/ # Data output and visualization
toolchain/ # Python CLI, build system, testing, parameter management
mfc/params/definitions.py # ~3,400 parameter definitions (source of truth)
mfc/params/definitions.py # parameter definitions (source of truth)
mfc/case_validator.py # Physics constraint validation
mfc/test/ # Test runner and case generation
examples/ # Example simulation cases (case.py files)
tests/ # 560+ regression test golden files
tests/ # regression test golden files
```

Source files are `.fpp` (Fortran + Fypp macros), preprocessed to `.f90` by CMake.
@@ -137,6 +104,7 @@ NEVER use double-precision intrinsics: `dsqrt`, `dexp`, `dlog`, `dble`, `dabs`,
NEVER use `d` exponent literals (`1.0d0`). Use `1.0_wp` instead.
NEVER use `stop` or `error stop`. Use `call s_mpi_abort()` or `@:PROHIBIT()`/`@:ASSERT()`.
NEVER use `goto`, `COMMON` blocks, or global `save` variables.
(Headline subset; full lint-enforced list — incl. Python/shell rules — in `.claude/rules/fortran-conventions.md`.)

Every `@:ALLOCATE(...)` MUST have a matching `@:DEALLOCATE(...)`.
Every new parameter MUST be added in at least 2 places (3 if it has constraints):
@@ -153,13 +121,14 @@ Changes to `src/common/` affect ALL three executables. Test comprehensively.
- Modules: `m_<feature>` (e.g., `m_bubbles`)
- Public subroutines: `s_<verb>_<noun>` (e.g., `s_compute_pressure`)
- Public functions: `f_<verb>_<noun>`
- Private/local variables: no prefix required. Constants: descriptive names, not ALL_CAPS.
- 2-space indentation, lowercase keywords, explicit `intent` on all arguments

## Precision System

- `wp` = working precision (computation). `stp` = storage precision (field data arrays and I/O).
- Default: both double. Single mode: both single. Mixed: wp=double, stp=half.
- MPI types must match: `mpi_p` ↔ `wp`, `mpi_io_p` ↔ `stp`.
- Both double by default. See `.claude/rules/fortran-conventions.md` for single/mixed
modes, casting rules, and MPI type matching (`mpi_p` ↔ `wp`, `mpi_io_p` ↔ `stp`).

## Code Review Priorities
