[Feature]: Restructure frame-parallel execution to include heavier frame-dependent stages #358

@harryswift01

Feature Request

Problem / Motivation

The current frame-parallel implementation parallelises the FrameGraph/covariance stage, but the recent scaling results suggest that this stage may account for less of the full CodeEntropy workflow runtime than expected, especially for the smaller benchmark systems.

Some expensive frame-dependent work still appears to happen before the current frame-parallel section, particularly in the static stage. This may include the dihedral/conformational analysis and neighbour calculation. As a result, the overall workflow scaling may be limited by serial work outside the current Dask frame execution path.

It also means each parallel task currently carries a relatively small unit of work, because workers mainly process only the covariance pathway for a frame. A larger frame-based unit of work may reduce per-task overhead and improve scaling.

Proposed Solution

Add clearer profiling/timing around the main LevelDAG stages to identify which parts of the workflow are still dominating runtime. This should include timings for:

  • Static setup/stage execution
  • Dihedral/conformational analysis
  • Neighbour calculation
  • FrameGraph/covariance execution
  • Frame reduction/finalisation
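As a starting point for this timing, a minimal sketch of a per-stage timer is shown below. It uses only the Python standard library; the names `timed_stage`, `stage_seconds`, `stage_calls` and `report`, and the stage labels, are hypothetical and not part of CodeEntropy. The `time.sleep` calls stand in for the real stage bodies.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per stage label.
# The labels used below are illustrative, not real CodeEntropy names.
stage_seconds = defaultdict(float)
stage_calls = defaultdict(int)


@contextmanager
def timed_stage(name):
    """Add the wall-clock time spent inside the block to stage_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[name] += time.perf_counter() - start
        stage_calls[name] += 1


def report():
    """Return (stage, seconds, fraction_of_summed_time) rows, slowest first."""
    total = sum(stage_seconds.values()) or 1.0
    rows = [(name, secs, secs / total) for name, secs in stage_seconds.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder workloads standing in for the real stages.
with timed_stage("static"):
    time.sleep(0.01)
with timed_stage("frame_covariance"):
    time.sleep(0.02)

for name, secs, frac in report():
    print(f"{name:20s} {secs:8.4f} s  {frac:6.1%}")
```

If stages are nested (for example, dihedral analysis timed inside the static stage), their times overlap, so the fractions should be read per stage rather than summed.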

If profiling confirms that frame-dependent work in the static stage is a significant bottleneck, investigate restructuring the workflow so more of this work is moved into the frame-parallel path.

The longer-term structure would be closer to:

for frame or frame_chunk in selected_frames:
    compute covariance contribution
    compute neighbour contribution
    compute heavy frame-dependent dihedral/conformational contributions
    return compact partial results

rather than the current structure where only the covariance path is handled by the frame-parallel FrameGraph.
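The outer frame/chunk loop above could be sketched roughly as follows. This is an illustration under stated assumptions, not the CodeEntropy implementation: `ThreadPoolExecutor` from the standard library stands in for the Dask execution path, and the per-frame arithmetic in `process_chunk` is a placeholder for the real covariance, neighbour and dihedral contributions. All function and field names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def make_chunks(frame_indices, chunk_size):
    """Split a list of frame indices into consecutive chunks."""
    return [frame_indices[i:i + chunk_size]
            for i in range(0, len(frame_indices), chunk_size)]


def empty_partial():
    """Identity element for merge(): a partial result covering zero frames."""
    return {"n_frames": 0, "covariance": 0.0, "neighbours": 0, "dihedral": 0.0}


def process_chunk(frames):
    """Do all frame-dependent work for one chunk and return a compact result.

    The three per-frame expressions are placeholders only.
    """
    partial = empty_partial()
    for frame in frames:
        partial["n_frames"] += 1
        partial["covariance"] += float(frame) ** 2   # placeholder contribution
        partial["neighbours"] += frame % 3           # placeholder contribution
        partial["dihedral"] += 0.5 * frame           # placeholder contribution
    return partial


def merge(a, b):
    """Combine two partial results by summing each field."""
    return {key: a[key] + b[key] for key in a}


def run(frame_indices, chunk_size=4, max_workers=2):
    """Map process_chunk over chunks in a worker pool, then reduce."""
    chunks = make_chunks(frame_indices, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    return reduce(merge, partials, empty_partial())


result = run(list(range(10)))
print(result)
```

Each task now covers several frames and several kinds of work, and only a small dictionary travels back to the caller. Because `merge` is a plain sum, the result does not depend on how the frames are chunked.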

For dihedral/conformational analysis, this may require a map-reduce style approach because some parts depend on trajectory-wide information, such as peak/state assignment. For example:

Pass 1:
    workers compute partial dihedral angle/histogram data per frame chunk

Reduce:
    combine partial histograms and identify global peaks/states

Pass 2:
    workers assign conformational states using the global peak/state data

Reduce:
    combine final state counts/populations
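The two passes above can be sketched on toy data as follows. This is a simplified illustration: the 10-degree binning, the local-maximum peak rule in `find_peaks`, and nearest-peak state assignment are assumptions made for the example and are not taken from CodeEntropy's actual dihedral analysis. The map steps are run serially here; in practice each would be one worker task per frame chunk.

```python
from collections import Counter

N_BINS = 36                      # assumed 10-degree bins over [-180, 180)
BIN_WIDTH = 360.0 / N_BINS


def bin_index(angle):
    """Map an angle in degrees to a histogram bin, wrapping periodically."""
    return int(((angle + 180.0) % 360.0) // BIN_WIDTH) % N_BINS


def partial_histogram(angles):
    """Pass 1 (map): histogram of one chunk's dihedral angles."""
    hist = [0] * N_BINS
    for angle in angles:
        hist[bin_index(angle)] += 1
    return hist


def combine_histograms(hists):
    """Reduce 1: element-wise sum of the partial histograms."""
    return [sum(column) for column in zip(*hists)]


def find_peaks(hist, min_count=1):
    """Bin centres that are local maxima of the periodic histogram.

    A simplified stand-in for the real peak/state identification.
    """
    peaks = []
    for i, count in enumerate(hist):
        left, right = hist[i - 1], hist[(i + 1) % N_BINS]
        if count >= min_count and count > left and count >= right:
            peaks.append(-180.0 + (i + 0.5) * BIN_WIDTH)
    return peaks


def circular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def assign_states(angles, peaks):
    """Pass 2 (map): count this chunk's angles per nearest global peak."""
    counts = Counter()
    for angle in angles:
        state = min(range(len(peaks)),
                    key=lambda k: circular_distance(angle, peaks[k]))
        counts[state] += 1
    return counts


def combine_counts(partials):
    """Reduce 2: sum per-state counts across chunks."""
    total = Counter()
    for counts in partials:
        total.update(counts)
    return total


# Toy data: two frame chunks with angles clustered near -60 and 180 degrees.
chunks = [[-62.0, -58.0, -61.0, 178.0], [-179.0, 179.0, -57.0, -63.0]]
hist = combine_histograms([partial_histogram(c) for c in chunks])
peaks = find_peaks(hist)
state_counts = combine_counts([assign_states(c, peaks) for c in chunks])
n_total = sum(state_counts.values())
populations = {peaks[s]: n / n_total for s, n in state_counts.items()}
print(peaks, dict(state_counts), populations)
```

Only the fixed-size histogram and the per-state counts cross the worker boundary, so the data returned per chunk does not grow with the number of frames. The cost is that the trajectory is visited twice.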

Neighbour calculation may be a simpler first candidate, as it already appears to follow a frame-based structure.

Alternatives Considered

  • Keep the current implementation as covariance-only frame parallelism.

    • This is useful and provides the initial Dask/HPC infrastructure, but may not give the strongest whole-workflow scaling if other serial stages dominate.
  • Only optimise individual functions within the static stage.

    • This may improve runtime locally, but would not address the larger issue that expensive frame-dependent work remains outside the frame-parallel execution path.
  • Increase the number of Dask workers without changing the task structure.

    • This is unlikely to fully solve the issue if the parallel task size remains small and significant serial work remains outside the parallel path.
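The limit on adding workers can be made concrete with Amdahl's law, which bounds the speedup when only a fraction of the runtime is parallelised. The fractions below are illustrative values, not measurements from CodeEntropy benchmarks, and the formula ignores scheduling and communication overhead, so real speedups would be lower.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup when only parallel_fraction of the runtime is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative fractions only: 50% vs 90% of runtime inside the parallel path.
for p in (0.5, 0.9):
    for n in (4, 16, 64):
        print(f"parallel fraction {p:.0%}, {n:3d} workers: "
              f"speedup {amdahl_speedup(p, n):5.2f}x")
```

With half of the runtime left serial, the speedup can never exceed 2x however many workers are added, which is why moving more frame-dependent work into the parallel path matters more than increasing the worker count.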

Expected Impact

  • Clearer understanding of where CodeEntropy runtime is spent after the initial frame-parallel implementation.
  • Better evidence for whether dihedral/conformational analysis, neighbour calculation, or another stage is limiting scaling.
  • Potentially stronger Dask/HPC scaling by increasing the amount of useful work done per worker.
  • Cleaner long-term parallel structure, closer to an outer frame/chunk loop where all frame-dependent work is grouped together.
  • Potential memory improvements by returning compact partial sums, histograms, or counts instead of building larger all-frame objects where possible.
  • Better benchmark evidence for future paper edits and performance discussion.
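To illustrate the point about compact partial sums, the sketch below accumulates a covariance matrix from per-chunk sums instead of holding every frame's vector in memory. It is a generic example in pure Python: whether the mean is subtracted, the normalisation used, and the function names are all assumptions here, not CodeEntropy's covariance definition. The naive sum-of-products formula can also lose precision for data with a large mean.

```python
def chunk_sums(vectors):
    """Map step: reduce a chunk of per-frame vectors to (n, sum_x, sum_outer)."""
    dim = len(vectors[0])
    n = 0
    sum_x = [0.0] * dim
    sum_outer = [[0.0] * dim for _ in range(dim)]
    for x in vectors:
        n += 1
        for i in range(dim):
            sum_x[i] += x[i]
            for j in range(dim):
                sum_outer[i][j] += x[i] * x[j]
    return n, sum_x, sum_outer


def merge_sums(a, b):
    """Reduce step: add two partial results field by field."""
    n_a, sx_a, so_a = a
    n_b, sx_b, so_b = b
    sum_x = [p + q for p, q in zip(sx_a, sx_b)]
    sum_outer = [[p + q for p, q in zip(row_a, row_b)]
                 for row_a, row_b in zip(so_a, so_b)]
    return n_a + n_b, sum_x, sum_outer


def covariance(sums):
    """Population covariance E[x x^T] - E[x] E[x]^T from merged sums."""
    n, sum_x, sum_outer = sums
    mean = [s / n for s in sum_x]
    dim = len(mean)
    return [[sum_outer[i][j] / n - mean[i] * mean[j] for j in range(dim)]
            for i in range(dim)]


# Two chunks of toy 2-component vectors, reduced independently then merged.
chunk_a = [[1.0, 2.0], [3.0, 4.0]]
chunk_b = [[5.0, 6.0]]
merged = merge_sums(chunk_sums(chunk_a), chunk_sums(chunk_b))
cov = covariance(merged)
print(cov)
```

Each worker returns one count, one vector and one matrix regardless of how many frames its chunk contains, so the reduced result has a fixed size.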

Additional Context

The current frame-parallel implementation is an important first step because it introduces the explicit frame-local boundary and Dask/HPC execution infrastructure.

Initial profiling with SnakeViz suggested that the FrameGraph/covariance pathway was the main runtime cost, which motivated parallelising that section first. However, benchmark scaling suggests that other workflow stages may still be contributing enough serial runtime to limit overall speedup.

This issue is intended as a follow-up investigation and possible restructuring step, rather than a replacement for the current implementation.
