MFlowCode · sbryngelson · May 29, 2026
@@ -4,7 +4,9 @@
 - Grid dimensions: `m`, `n`, `p` (cells in x, y, z). 1D: n=p=0. 2D: p=0.
 - Interior domain: `0:m`, `0:n`, `0:p`
 - Buffer/ghost region: `-buff_size:m+buff_size` (similar for n, p in multi-D)
-- `buff_size` depends on WENO order and features (typically `2*weno_polyn + 2`)
+- `buff_size` is **not** a single formula: it is set per reconstruction scheme (WENO/MUSCL/IGR) in
+  `s_configure_coordinate_bounds` (`m_helper_basic.fpp`) and raised to a larger minimum for Lagrange
+  bubbles and immersed boundaries (IB). Read that routine for the current value rather than assuming one.
 - Domain bounds: `idwint(1:3)` (interior `0:m`), `idwbuff(1:3)` (with ghost cells)
 - Cell-center coords: `x_cc(-buff_size:m+buff_size)`, `y_cc(...)`, `z_cc(...)`
 - Cell-boundary coords: `x_cb(-1-buff_size:m+buff_size)`
@@ -38,8 +40,8 @@
 - Boundary condition symmetry requirements must be maintained
 
 ## Compiler-Specific Issues
-- CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, and Intel ifx
-- AMD flang is additionally supported for `--gpu mp` builds but not in the CI matrix
+- See the compiler-backend matrix in `.claude/rules/gpu-and-mpi.md` for which compilers
+  are CI-gated and which backends each supports.
 - Each compiler has different strictness levels and warning behavior
 - Fypp macros must expand correctly for both GPU and CPU builds
 
@@ -54,12 +56,8 @@
 - Do not regenerate ALL golden files unless you understand every output change
 
 ## PR Checklist
-Before submitting a PR:
-- [ ] `./mfc.sh format -j 8` (auto-format)
-- [ ] `./mfc.sh precheck -j 8` (5 CI lint checks)
-- [ ] `./mfc.sh build -j 8` (compiles)
-- [ ] `./mfc.sh test --only <relevant> -j 8` (tests pass)
+The base loop (format → precheck → build → test → one logical commit) is the
+Development Workflow Contract in `CLAUDE.md`. Beyond it, watch for:
 - [ ] If adding parameters: definitions.py (_r + _nv) updated; cmake reconfigured; case_validator.py if constraints
 - [ ] If modifying `src/common/`: all three targets tested
-- [ ] If changing output: golden files regenerated for affected tests
-- [ ] One logical change per commit
+- [ ] If changing output: golden files regenerated for affected tests (`./mfc.sh test --generate --only <tests>`)
@@ -15,11 +15,7 @@ Every Fortran module follows this pattern:
 - Finalization subroutine: `s_finalize_<feature>_module`
 
 ## Naming
-- Modules: `m_<feature>`
-- Public subroutines: `s_<verb>_<noun>`
-- Public functions: `f_<verb>_<noun>`
-- Private/local variables: no prefix required
-- Constants: descriptive names, not ALL_CAPS
+See "Naming Conventions" in `CLAUDE.md`.
 
 ## Forbidden Patterns
 
@@ -57,10 +53,8 @@ Enforced by convention/code review (not automated):
 - Fortran-side runtime validation also exists in `m_checker*.fpp` files using `@:PROHIBIT`
 
 ## Precision Types
-- `wp` (working precision): used for computation. Double by default.
-- `stp` (storage precision): used for field data arrays and I/O. Double by default.
-- In single-precision mode (`--single`): both become single.
-- In mixed-precision mode (`--mixed`): wp=double, stp=half.
+`wp`/`stp` are defined in `CLAUDE.md` (`wp` = computation, `stp` = field-data storage and I/O). Details:
+- Modes: default both double; `--single` → both single; `--mixed` → wp=double, stp=half.
 - MPI type matching: `mpi_p` must match `wp`, `mpi_io_p` must match `stp`.
 - Always use generic intrinsics: `sqrt` not `dsqrt`, `abs` not `dabs`.
 - Cast with `real(..., wp)` or `real(..., stp)`, never `dble(...)`.
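
As a rough illustration of the precision modes listed above, the following Python sketch maps each build mode to its `wp`/`stp` pair and to the MPI types that must match them. The names `PRECISION_MODES` and `precision_for` are hypothetical and not part of the MFC toolchain; only the mode-to-precision mapping and the `mpi_p`/`mpi_io_p` matching rule come from the text.

```python
# Hypothetical sketch: these names are illustrative, not MFC toolchain APIs.
# Each mode maps to (working precision, storage precision), per the list above.
PRECISION_MODES = {
    "default": ("double", "double"),
    "single": ("single", "single"),  # --single
    "mixed": ("double", "half"),     # --mixed
}


def precision_for(mode: str) -> dict:
    """Return the precision of wp, stp, and the MPI types that must match them."""
    wp, stp = PRECISION_MODES[mode]
    # mpi_p must match wp; mpi_io_p must match stp.
    return {"wp": wp, "stp": stp, "mpi_p": wp, "mpi_io_p": stp}


if __name__ == "__main__":
    for mode in PRECISION_MODES:
        print(mode, precision_for(mode))
```

The point of the table form is that `mpi_p` and `mpi_io_p` are derived from `wp` and `stp` rather than chosen independently, which is the invariant the rule above asks reviewers to check.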

@@ -23,26 +23,17 @@ compiles to either OpenACC or OpenMP target offload depending on the build flag:
 
 ### Key GPU Macros (always use the `GPU_*` prefix)
 
-Inline macros (use `$:` prefix):
-- `$:GPU_PARALLEL_LOOP(collapse=N, private=[...], reduction=[...], reductionOp='+')` —
-  Parallel loop over GPU threads. Most common GPU macro.
-- `$:END_GPU_PARALLEL_LOOP()` — Required closing for GPU_PARALLEL_LOOP.
-- `$:GPU_LOOP(collapse=N, ...)` — Inner loop within a GPU parallel region.
-- `$:GPU_ENTER_DATA(create=[...])` — Allocate device memory (unscoped).
-- `$:GPU_EXIT_DATA(delete=[...])` — Free device memory.
-- `$:GPU_UPDATE(host=[...])` — Copy device → host (before MPI send).
-- `$:GPU_UPDATE(device=[...])` — Copy host → device (after MPI receive).
-- `$:GPU_ROUTINE(parallelism='[seq]')` — Mark routine for device compilation.
-- `$:GPU_DECLARE(create=[...])` — Declare device-resident data.
-- `$:GPU_ATOMIC(atomic='update')` — Atomic operation on device.
-- `$:GPU_WAIT()` — Synchronization barrier.
-
-Block macros (use `#:call`/`#:endcall`):
-- `GPU_PARALLEL(...)` — GPU parallel region (used for scalar reductions like `maxval`/`minval`).
-- `GPU_DATA(copy=..., create=..., ...)` — Scoped data region.
-- `GPU_HOST_DATA(use_device_addr=[...])` — Host code with device pointers.
-
-Typical GPU loop pattern (used 750+ times in the codebase):
+The full set, with signatures, is in `parallel_macros.fpp`. The ones you reach for most:
+- `$:GPU_PARALLEL_LOOP(collapse=N, private=[...], reduction=[...], reductionOp='+')`,
+  closed by `$:END_GPU_PARALLEL_LOOP()` — parallel spatial loop; by far the most common (see pattern below).
+- `$:GPU_LOOP(collapse=N, ...)` — inner loop *within* a parallel region.
+- `$:GPU_UPDATE(host=[...])` / `$:GPU_UPDATE(device=[...])` — device↔host copies (around MPI; see below).
+- `#:call GPU_PARALLEL(...)` / `#:endcall` — block region for scalar reductions (`maxval`/`minval`).
+
+Others in `parallel_macros.fpp`: `GPU_ENTER_DATA`/`GPU_EXIT_DATA`, `GPU_DECLARE`, `GPU_ROUTINE`,
+`GPU_ATOMIC`, `GPU_WAIT`, and the block macros `GPU_DATA`, `GPU_HOST_DATA`.
+
+Typical GPU loop pattern (the dominant spatial-loop idiom):
 ```
 $:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)
 do l = idwbuff(3)%beg, idwbuff(3)%end
@@ -108,16 +99,10 @@ Use `#ifdef` for feature, target, compiler, and library gating:
 - `MFC_POST_PROCESS` — Only in post_process builds
 
 ### Compiler gating (for compiler-specific workarounds)
-- `_CRAYFTN` — Cray Fortran compiler
-- `__NVCOMPILER_GPU_UNIFIED_MEM` — NVIDIA unified memory (GH-200 / `--unified`)
-- `__PGI` — Legacy PGI/NVIDIA compiler
-- `__INTEL_COMPILER` — Intel compiler
-- `FRONTIER_UNIFIED` — Frontier HPC unified memory
-
-### Library-specific code
-- FFTW (`m_fftw.fpp`) uses heavy `#ifdef` gating for `MFC_GPU` and `__PGI`
-- CUDA Fortran (`cudafor` module) is gated behind `__NVCOMPILER_GPU_UNIFIED_MEM`
-- SILO/HDF5 interfaces may have conditional paths
+Compiler/feature macros: `_CRAYFTN`, `__NVCOMPILER_GPU_UNIFIED_MEM` (NVIDIA unified memory, GH-200 /
+`--unified`), `__PGI` (legacy PGI/NVIDIA), `__INTEL_COMPILER`, `FRONTIER_UNIFIED` (Frontier unified memory).
+Library code is similarly gated: FFTW in `m_fftw.fpp` on `MFC_GPU`/`__PGI`, CUDA Fortran (`cudafor`) on
+`__NVCOMPILER_GPU_UNIFIED_MEM`, and SILO/HDF5 interfaces may have conditional paths. Grep the relevant file for exact usage.
 
 When adding new `#ifdef` blocks, always provide an `#else` or `#endif` path so
 the code compiles in all configurations (CPU-only, GPU-ACC, GPU-OMP, with/without MPI).

@@ -1,7 +1,7 @@
 # Parameter System
 
 ## Overview
-MFC has ~3,400 simulation parameters defined in Python and read by Fortran via namelist files.
+MFC's simulation parameters are defined in Python and read by Fortran via namelist files.
 
 ## Parameter Flow: Python → Fortran
 
@@ -37,7 +37,10 @@ at CMake configure time — no manual Fortran edits needed for simple scalar par
 **Exceptions — still require manual Fortran edits:**
 - Array variables (e.g. `logical, dimension(num_fluids_max)`) → declare in `src/*/m_global_parameters.fpp`
 - Derived-type members (`fluid_pp%attr`, `patch_icpp(i)%attr`) → declare in the relevant derived type
-- Case-optimization parameters → add to `CASE_OPT_PARAMS` and the `#:else` block in `src/simulation/m_global_parameters.fpp`
+- Case-optimization parameters → add to `CASE_OPT_PARAMS` and the `#:else` block in `src/simulation/m_global_parameters.fpp`.
+  Gotcha: under `--case-optimization` these are baked into the binary and dropped from the simulation namelist
+  (`case_dicts.py` filters them), so changing one needs a *rebuild*, not just a case edit — and building without
+  the flag makes them read from `.inp` again.
 
 ## Case Files
 - Case files are Python scripts (`.py`) that define a dict of parameters
@@ -47,15 +50,24 @@ at CMake configure time — no manual Fortran edits needed for simple scalar par
 - Search parameters with `./mfc.sh params <query>`
 
 ## Fortran-Side Runtime Validation
-Each target has `m_checker*.fpp` files (e.g., `src/simulation/m_checker.fpp`,
-`src/common/m_checker_common.fpp`) containing runtime parameter validation using
-`@:PROHIBIT(condition, message)`. When adding parameters with physics constraints,
-add Fortran-side checks here in addition to `case_validator.py`.
+Runtime parameter validation uses `@:PROHIBIT(condition, message)`. Place each check in the file for the target(s) it applies to:
+- **Shared across all three targets** → `src/common/m_checker_common.fpp` (`s_check_inputs_common`,
+  with `#ifndef MFC_*` gates for target-specific exclusions). This holds most checks.
+- **Simulation-only** → `src/simulation/m_checker.fpp` (WENO/MUSCL/IGR/time-stepping/compiler checks).
+- **Pre/post-only** → `src/{pre,post}_process/m_checker.fpp`. Their `s_check_inputs` routines are
+  currently empty, but a pre/post-only constraint still belongs there, not in `m_checker_common.fpp`.
+
+Add Fortran-side checks in addition to `case_validator.py`.
 
 ## Analytical Initial Conditions
 String expressions in parameters become Fortran code via `case.py.__get_analytic_ic_fpp()`.
 These are compiled into the binary, so syntax errors cause build failures, not runtime errors.
 
+Gotcha: each IC variable (`alpha_rho`, `vel`, `pres`, `alpha`, `Y`, `Bx`...) maps to an `eqn_idx%…`
+expression in `QPVF_IDX_VARS` (`case.py`). Adding a conserved variable that patches can set means
+updating that map *and* the Fortran `eqn_idx` builder to agree — a mismatch is a silent wrong-index, not
+an error. (This is also why `Bx`/`By`/`Bz` use `eqn_idx%B%end-1/%end`, to stay valid in 1D/2D.)
+
 Available variables in analytical IC expressions:
 - `x`, `y`, `z` — cell-center coordinates (mapped to `x_cc(i)`, `y_cc(j)`, `z_cc(k)`)
 - `xc`, `yc`, `zc` — patch centroid coordinates
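
The case-optimization gotcha in the parameter-system changes above can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolchain's implementation: the function `namelist_params` is hypothetical, and the two entries in `CASE_OPT_PARAMS` are placeholders standing in for the real list. It only shows the behaviour the text describes: with the flag set, case-optimized parameters are dropped from the simulation namelist; without it, they are read from the input file again.

```python
# Hypothetical sketch: the function name and set contents are assumptions.
# The real list lives in CASE_OPT_PARAMS; the filtering is done in case_dicts.py.
CASE_OPT_PARAMS = {"weno_order", "num_fluids"}  # placeholder entries


def namelist_params(case: dict, case_optimization: bool) -> dict:
    """Return the parameters that would be written to the simulation namelist."""
    if not case_optimization:
        # Built without the flag: every parameter is read from the .inp file.
        return dict(case)
    # Built with the flag: case-optimized parameters are compiled into the
    # binary, so they are filtered out of the namelist.
    return {k: v for k, v in case.items() if k not in CASE_OPT_PARAMS}


if __name__ == "__main__":
    case = {"weno_order": 5, "num_fluids": 2, "dt": 1.0e-6}
    print(namelist_params(case, case_optimization=True))
    print(namelist_params(case, case_optimization=False))
```

Under this model, editing a filtered parameter in the case file has no effect on an already optimized binary, which is why a rebuild is needed.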

@@ -6,6 +6,13 @@ toolchain for building/running/testing, and supports GPU acceleration via OpenAC
 OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx (CI-gated).
 AMD flang is additionally supported for OpenMP target offload GPU builds.
 
+## Working Style
+
+Make surgical changes: every changed line should trace to the request. Don't refactor,
+reformat, or "improve" adjacent code — in a four-compiler, golden-file-gated codebase,
+incidental edits are how regressions slip in. For general behavioral guidance (simplicity,
+surfacing assumptions, verifiable success criteria), invoke the `karpathy-guidelines` skill.
+
 ## Commands
 
 Prefer using `./mfc.sh` as the entry point for building, running, testing, formatting,
@@ -15,81 +22,41 @@ compilers directly unless you have a specific reason.
 
 All commands run from the repo root via `./mfc.sh`.
 
-```bash
-# Building
-./mfc.sh build -j 8                        # Build all 3 targets (pre_process, simulation, post_process)
-./mfc.sh build -t simulation -j 8          # Build only simulation
-./mfc.sh build --gpu acc -j 8              # Build with OpenACC GPU support
-./mfc.sh build --gpu mp -j 8              # Build with OpenMP target offload GPU support
-./mfc.sh build --debug -j 8                # Debug build
-./mfc.sh build -i case.py --case-optimization -j 8  # Case-optimized build (10x speedup)
-
-# Running
-./mfc.sh run case.py -n 4                  # Run case with 4 MPI ranks
-./mfc.sh run case.py --no-build            # Run without rebuilding
-./mfc.sh run case.py -e batch -N 2 -n 4 -c phoenix -a ACCOUNT  # Batch submit on Phoenix
-
-# Testing
-./mfc.sh test -j 8                         # Run full test suite (560+ tests)
-./mfc.sh test --only 1D -j 8              # Only 1D tests
-./mfc.sh test --only 2D Bubbles -j 8      # Only 2D bubble tests
-./mfc.sh test --only <UUID> -j 8          # Run one specific test by UUID
-./mfc.sh test -l                           # List all tests with UUIDs and traces
-./mfc.sh test -% 10 -j 8                  # Run 10% random sample
-./mfc.sh test --generate --only <feature>  # Regenerate golden files after intentional output change
-
-# Verification (pre-commit CI checks)
-./mfc.sh precheck -j 8                     # Run all 6 lint checks (same as CI gate)
-./mfc.sh format -j 8                       # Auto-format Fortran (.fpp/.f90) + Python
-./mfc.sh lint                              # Ruff lint + Python unit tests
-./mfc.sh spelling                          # Spell check
-
-# Module loading (HPC clusters only — must use `source`)
-source ./mfc.sh load -c p -m g             # Load Phoenix GPU modules
-source ./mfc.sh load -c f -m g             # Load Frontier GPU modules
-source ./mfc.sh load -c p -m c             # Load Phoenix CPU modules
-
-# Other
-./mfc.sh validate case.py                  # Validate case file without running
-./mfc.sh params <query>                    # Search 3,400 case parameters
-./mfc.sh clean                             # Remove build artifacts
-./mfc.sh new <name>                        # Create new case from template
-```
-
-## System Identification and Module Loading
-
-MFC targets HPC clusters. Before building on a cluster, load the correct modules
-via `source ./mfc.sh load -c <slug> -m <mode>`.
-
-To identify the current system, check multiple signals — hostname alone is not always
-sufficient (compute nodes may differ from login nodes):
+Run `./mfc.sh <command> --help` for the full flag set; the most-used invocations:
 
 ```bash
-hostname                    # e.g., login-phoenix-gnr-2.pace.gatech.edu
-echo $LMOD_SYSHOST          # e.g., "phoenix" (most reliable when set)
-echo $CRAY_LD_LIBRARY_PATH  # Non-empty → Cray system (Frontier, Carpenter Cray)
-echo $MODULESHOME           # Confirms module system is available
+# Build / run / test  (-j N = parallel jobs)
+./mfc.sh build -j 8                 # all 3 targets; flags: -t <target>, --gpu acc|mp, --debug,
+                                    #   -i case.py --case-optimization (10x speedup)
+./mfc.sh run case.py -n 4           # run with 4 MPI ranks; --no-build; -e batch (toolchain/templates/)
+./mfc.sh test -j 8                  # full suite; --only <1D|Bubbles|UUID>, -l, -% N (sample),
+                                    #   --generate (regenerate golden files after an intended output change)
+
+# Verify before committing
+./mfc.sh precheck -j 8              # all CI lint checks
+./mfc.sh format -j 8                # auto-format Fortran (.fpp/.f90) + Python
+./mfc.sh lint                       # ruff lint + Python unit tests   (spelling: ./mfc.sh spelling)
+
+# Case files
+./mfc.sh validate case.py           # validate without running
+./mfc.sh params <query>             # search case parameters
+./mfc.sh new <name>                 # new case from template       (clean: ./mfc.sh clean)
 ```
 
-Supported systems and their slugs (full list in `toolchain/modules`):
+Module loading (`source ./mfc.sh load -c <slug> -m <mode>`) is covered under System Identification below.
 
-| Slug | System | GPU Backend | Example |
-|------|--------|-------------|---------|
-| `p` | GT Phoenix | OpenACC (nvfortran) | `source ./mfc.sh load -c p -m g` |
-| `f` | OLCF Frontier | OpenACC/OpenMP (Cray ftn) | `source ./mfc.sh load -c f -m g` |
-| `tuo` | LLNL Tuolumne | OpenMP (Cray ftn) | `source ./mfc.sh load -c tuo -m g` |
-| `d` | NCSA Delta | OpenACC (nvfortran) | `source ./mfc.sh load -c d -m g` |
-| `b` | PSC Bridges2 | OpenACC (nvfortran) | `source ./mfc.sh load -c b -m g` |
-| `cc` | DoD Carpenter (Cray) | CPU only | `source ./mfc.sh load -c cc -m c` |
-| `c` | DoD Carpenter (GNU) | CPU only | `source ./mfc.sh load -c c -m c` |
-| `o` | Brown Oscar | OpenACC (nvfortran) | `source ./mfc.sh load -c o -m g` |
-| `h` | UF HiPerGator | OpenACC (nvfortran) | `source ./mfc.sh load -c h -m g` |
+## System Identification and Module Loading
 
-The `-m` flag selects mode: `g`/`gpu` for GPU builds, `c`/`cpu` for CPU-only.
-Batch job templates for `./mfc.sh run -e batch -c <system>` are in `toolchain/templates/`.
+On an HPC cluster, load modules before building: `source ./mfc.sh load -c <slug> -m <mode>`
+(`-m g`/`gpu` or `c`/`cpu`). The `source` is required — plain `./mfc.sh load` errors, since
+the command sets environment variables in the current shell.
 
-IMPORTANT: `source` (or `.`) is required for `load` — it sets environment variables
-in the current shell. Using `./mfc.sh load` without `source` will error.
+Slugs live in `toolchain/modules` (e.g. `p` Phoenix, `f` Frontier, `tuo` Tuolumne, `d` Delta,
+`b` Bridges2, `c`/`cc` Carpenter GNU/Cray, `o` Oscar, `h` HiPerGator; GPU backend per system
+is defined there). To identify the current system, check `$LMOD_SYSHOST` (most reliable when set),
+then a non-empty `$CRAY_LD_LIBRARY_PATH` (→ Cray: Frontier / Carpenter-Cray), then `hostname`,
+which alone is not always sufficient because login and compute nodes may differ. Batch templates
+for `./mfc.sh run -e batch -c <system>` are in `toolchain/templates/`.
 
 ## Development Workflow Contract
 
@@ -99,7 +66,7 @@ IMPORTANT: Follow this loop for ALL code changes. Do not skip steps.
 2. **Plan** — For multi-file changes, outline your approach before implementing.
 3. **Implement** — Make small, focused changes. One logical change per commit.
 4. **Format** — Run `./mfc.sh format -j 8` to auto-format.
-5. **Verify** — Run `./mfc.sh precheck -j 8` (same 6 checks as CI lint gate).
+5. **Verify** — Run `./mfc.sh precheck -j 8` (same checks as the CI lint gate).
 6. **Build** — Run `./mfc.sh build -j 8` to verify compilation.
 7. **Test** — Run relevant tests: `./mfc.sh test --only <feature> -j 8`.
    For changes to `src/common/`, test ALL three targets: `./mfc.sh test -j 8`.
@@ -119,11 +86,11 @@ src/
   simulation/     # CFD solver (GPU-accelerated via OpenACC / OpenMP target offload)
   post_process/   # Data output and visualization
 toolchain/        # Python CLI, build system, testing, parameter management
-  mfc/params/definitions.py   # ~3,400 parameter definitions (source of truth)
+  mfc/params/definitions.py   # parameter definitions (source of truth)
   mfc/case_validator.py       # Physics constraint validation
   mfc/test/                   # Test runner and case generation
 examples/         # Example simulation cases (case.py files)
-tests/            # 560+ regression test golden files
+tests/            # regression test golden files
 ```
 
 Source files are `.fpp` (Fortran + Fypp macros), preprocessed to `.f90` by CMake.
@@ -137,6 +104,7 @@ NEVER use double-precision intrinsics: `dsqrt`, `dexp`, `dlog`, `dble`, `dabs`,
 NEVER use `d` exponent literals (`1.0d0`). Use `1.0_wp` instead.
 NEVER use `stop` or `error stop`. Use `call s_mpi_abort()` or `@:PROHIBIT()`/`@:ASSERT()`.
 NEVER use `goto`, `COMMON` blocks, or global `save` variables.
+  (Headline subset; full lint-enforced list — incl. Python/shell rules — in `.claude/rules/fortran-conventions.md`.)
 
 Every `@:ALLOCATE(...)` MUST have a matching `@:DEALLOCATE(...)`.
 Every new parameter MUST be added in at least 2 places (3 if it has constraints):
@@ -153,13 +121,14 @@ Changes to `src/common/` affect ALL three executables. Test comprehensively.
 - Modules: `m_<feature>` (e.g., `m_bubbles`)
 - Public subroutines: `s_<verb>_<noun>` (e.g., `s_compute_pressure`)
 - Public functions: `f_<verb>_<noun>`
+- Private/local variables: no prefix required. Constants: descriptive names, not ALL_CAPS.
 - 2-space indentation, lowercase keywords, explicit `intent` on all arguments
 
 ## Precision System
 
 - `wp` = working precision (computation). `stp` = storage precision (field data arrays and I/O).
-- Default: both double. Single mode: both single. Mixed: wp=double, stp=half.
-- MPI types must match: `mpi_p` ↔ `wp`, `mpi_io_p` ↔ `stp`.
+- Both double by default. See `.claude/rules/fortran-conventions.md` for single/mixed
+  modes, casting rules, and MPI type matching (`mpi_p` ↔ `wp`, `mpi_io_p` ↔ `stp`).
 
 ## Code Review Priorities