
Combine PETSc.jl with KernelAbstractions/CUDA.jl#237

Open
boriskaus wants to merge 40 commits into main from bk/ex19

Conversation

@boriskaus
Collaborator

@boriskaus boriskaus commented Apr 28, 2026

PETSc has had GPU support for a few years: matrices and vectors can be placed on the GPU through command-line options, provided that PETSc is configured with CUDA support. To get the best performance, however, the residual kernels also have to be rewritten for the GPU, which means either writing native CUDA code (on NVIDIA systems) or using Kokkos ("dependency hell", as someone told me).

Julia, on the other hand, is well known for its excellent GPU support, for example through packages such as KernelAbstractions, which allow the same code to be compiled for NVIDIA, AMD, or Mac hardware in a straightforward manner.
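
To illustrate the "same code, different hardware" point, here is a minimal KernelAbstractions sketch. It is not taken from this PR: the kernel name `axpy_kernel!` and the array sizes are made up for illustration, and it only runs on the CPU backend. The assumption is that allocating `x` and `y` as GPU arrays (for example `CuArray` from CUDA.jl) would let `get_backend` pick the GPU backend without changing the kernel.

```julia
using KernelAbstractions

# Hypothetical example kernel: y[i] += a * x[i] for every index i.
@kernel function axpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

x = ones(Float64, 1024)
y = fill(2.0, 1024)

# The backend is derived from the array type; plain Arrays give the CPU backend.
backend = get_backend(y)

# Instantiate the kernel for this backend with a workgroup size of 64,
# then launch it over all entries of y.
axpy_kernel!(backend, 64)(y, 3.0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)
```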

Here, the PETSc ex19 example (typically used to test PETSc installations) was translated from C to Julia and combined with KernelAbstractions such that residual routines also run on the GPU. It uses coloring and finite differences to approximate the Jacobian and works with multigrid preconditioners.
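
The idea behind a colored finite-difference Jacobian can be sketched in plain Julia on a toy problem. This is not the ex19 code and does not use PETSc: the 1D residual, the function names `residual` and `colored_fd_jacobian`, and the fixed 3-coloring are all assumptions chosen for illustration. Columns whose nonzero rows do not overlap share a color, so one residual evaluation per color recovers all of their entries at once.

```julia
using LinearAlgebra

# Toy 1D residual with a tridiagonal Jacobian (hypothetical, not ex19):
# F_i = 2u_i - u_{i-1} - u_{i+1} + u_i^3, with zero values outside the domain.
function residual(u)
    n = length(u)
    F = similar(u)
    for i in 1:n
        left  = i > 1 ? u[i-1] : zero(eltype(u))
        right = i < n ? u[i+1] : zero(eltype(u))
        F[i] = 2u[i] - left - right + u[i]^3
    end
    return F
end

# Forward-difference Jacobian using a 3-coloring of the columns.
# Columns j, j+3, j+6, ... touch disjoint rows, so they are perturbed together.
function colored_fd_jacobian(f, u; h = 1e-7, ncolors = 3)
    n = length(u)
    J = zeros(n, n)
    F0 = f(u)
    for c in 1:ncolors
        cols = c:ncolors:n
        up = copy(u)
        up[cols] .+= h                      # perturb every column of this color
        dF = (f(up) .- F0) ./ h
        for j in cols, i in max(1, j - 1):min(n, j + 1)
            J[i, j] = dF[i]                 # row i is influenced by column j only
        end
    end
    return J
end

n = 10
u = collect(range(0.1, 1.0; length = n))
J_fd = colored_fd_jacobian(residual, u)

# Analytic Jacobian of the toy residual, for comparison.
J_exact = Matrix(Tridiagonal(fill(-1.0, n - 1), 2 .+ 3 .* u .^ 2, fill(-1.0, n - 1)))
err = maximum(abs.(J_fd .- J_exact))
```

Here the full Jacobian costs four residual evaluations (one base evaluation plus one per color) regardless of `n`, instead of `n + 1` for a column-by-column finite difference.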

The documentation has been updated to show scalability tests on GPU vs. CPU (on 1 or 32 cores).

As a quick summary, it works and already shows great potential but can perhaps be further improved:

KSPSolve:

| Resolution | GPU time (s) | GPU (GFlop/s) | CPU-1 time (s) | CPU-1 (GFlop/s) | CPU-32 time (s) | CPU-32 (GFlop/s) |
|---|---|---|---|---|---|---|
| 513² | 0.116 | 144.3 | 4.337 | 4.1 | 0.297 | 61.0 |
| 1025² | 0.299 | 249.5 | 19.10 | 3.9 | 1.196 | 61.7 |
| 2049² | 1.118 | 295.5 | 89.76 | 3.6 | 5.757 | 56.6 |
| 4097² | 4.540 | 312.4 | 422.2 | 3.3 | 28.30 | 49.4 |

SNESSolve:

| Resolution | GPU time (s) | GPU (GFlop/s) | CPU-1 time (s) | CPU-1 (GFlop/s) | CPU-32 time (s) | CPU-32 (GFlop/s) |
|---|---|---|---|---|---|---|
| 513² | 3.645 | 6.7 | 8.041 | 3.2 | 2.118 | 12.3 |
| 1025² | 5.127 | 20.5 | 32.75 | 3.2 | 3.698 | 28.2 |
| 2049² | 11.37 | 39.7 | 144.1 | 3.1 | 10.57 | 42.3 |
| 4097² | 36.88 | 47.7 | 658.3 | 2.9 | 43.58 | 43.2 |
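
For readers who want the speedups rather than raw timings, the ratios can be computed directly from the numbers reported above. This is only a small arithmetic sketch; the variable names are illustrative, and the timings are copied from the KSPSolve and SNESSolve results (GPU and CPU-32 columns).

```julia
resolutions = [513, 1025, 2049, 4097]

# Timings in seconds, copied from the reported results.
ksp_gpu    = [0.116, 0.299, 1.118, 4.540]
ksp_cpu32  = [0.297, 1.196, 5.757, 28.30]
snes_gpu   = [3.645, 5.127, 11.37, 36.88]
snes_cpu32 = [2.118, 3.698, 10.57, 43.58]

# Speedup of the GPU run relative to 32 CPU cores (values > 1 mean the GPU is faster).
ksp_speedup  = ksp_cpu32 ./ ksp_gpu
snes_speedup = snes_cpu32 ./ snes_gpu

for (r, k, s) in zip(resolutions, ksp_speedup, snes_speedup)
    println(r, "²: KSPSolve ", round(k; digits = 2), "x, SNESSolve ", round(s; digits = 2), "x")
end
```

In these measurements the GPU is faster than 32 CPU cores for KSPSolve at every resolution, while for SNESSolve it only overtakes them at 4097².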

Disclaimer: Claude Sonnet 4.6 was used in preparing this; Valentin helped steer it back in place...

boriskaus and others added 22 commits April 24, 2026 22:32
Co-authored-by: Copilot <copilot@github.com>
add get_petsc_arrays/restore_petsc_arrays along with multiple dispatch for GPU

Co-authored-by: Copilot <copilot@github.com>
@boriskaus boriskaus changed the title Combine PETSc with KernelAbstractions/CUDA.jl Combine PETSc.jl with KernelAbstractions/CUDA.jl Apr 28, 2026
Comment thread src/dmda.jl
Comment thread src/PETSc.jl Outdated
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
Comment thread src/string_wrappers.jl
Comment thread src/vec.jl Outdated
Comment thread ext/PETScCUDAExt.jl Outdated
Co-authored-by: Copilot <copilot@github.com>
Comment thread src/vec.jl
Comment thread src/vec.jl Outdated
Member

@vchuravy vchuravy left a comment


There should also be tests for withlocalarray(Array, ...) and co

Comment thread src/vec.jl Outdated
boriskaus and others added 4 commits April 28, 2026 18:32
…ry/finally cleanup

- determine_memtype now collects raw PetscMemType enum values and uses
  Val{MT} dispatch (_array_type) instead of backend singleton types,
  as requested by Valentin's review.
- Add _as_petsc_vec helper that converts any AbstractPetscVec to PetscVec
  (non-owning, wraps .ptr), fixing MethodError when VecPtr is passed to
  auto-generated *AndMemType ccall wrappers typed ::PetscVec.
  Applied in vec.jl and PETScCUDAExt.jl at every *AndMemType call site.
- withlocalarray! uses try/finally (no Base.finalize) to avoid double-
  execution of VecRestore* when Julia 1.12 concurrent GC races with
  explicit finalize calls.
  do-block compatibility.
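
The try/finally cleanup described in this commit message follows a common Julia pattern, sketched below with stand-in types. Nothing here is PETSc.jl API: `FakeVec`, `get_array`, `restore_array!`, and `with_array` are hypothetical names used only to show how a do-block wrapper can guarantee that the restore step runs exactly once, even when the user's function throws.

```julia
# Hypothetical stand-in for a vector whose storage must be borrowed and returned.
mutable struct FakeVec
    data::Vector{Float64}
    borrowed::Bool
end

# Hypothetical borrow/restore pair, playing the role of a Get/Restore call pair.
function get_array(v::FakeVec)
    v.borrowed && error("array is already borrowed")
    v.borrowed = true
    return v.data
end

function restore_array!(v::FakeVec)
    v.borrowed = false
    return nothing
end

# Taking the function as the first argument makes `with_array(v) do a ... end` work.
function with_array(f, v::FakeVec)
    arr = get_array(v)
    try
        return f(arr)
    finally
        restore_array!(v)   # runs on normal return and on exceptions, exactly once
    end
end

v = FakeVec(zeros(4), false)
with_array(v) do a
    a .= 1.0
end
```

Because the restore happens in `finally` rather than in a finalizer, its timing does not depend on the garbage collector.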
Comment thread docs/src/man/gpu.md Outdated
Comment thread docs/src/man/gpu.md Outdated
Comment thread docs/src/man/gpu.md
Comment thread docs/src/man/gpu.md Outdated
Comment thread examples/ex19.jl Outdated
Comment thread ext/PETScCUDAExt.jl Outdated
Comment thread ext/PETScCUDAExt.jl Outdated
Comment thread src/ts.jl
Comment thread src/vec.jl
Comment thread test/runtests.jl Outdated
Comment thread test/runtests.jl Outdated
@boriskaus boriskaus requested a review from vchuravy May 15, 2026 13:16

2 participants