Conversation

@quantumsteve
Collaborator

First attempt at replacing cpu-benchmark with a nearly identical test in stress-ng.

Differences:

  • cpu-benchmark samples a full sphere, while stress-ng samples a single quadrant (see the sketch after this list).
  • cpu-benchmark uses a "terrible" but efficient RNG, while stress-ng offers many options; I chose "lcg" to start with.
  • cpu-benchmark batches cpu-work into 1000000 samples, while stress-ng uses ~16384 samples.
  • stress-ng defines samples and ops as int32_t, while cpu-benchmark defines samples as int64_t.
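
For reference, here is a minimal Python sketch of the quadrant-style estimate driven by a simple LCG; the LCG constants, seed, and 16384-sample batch are illustrative assumptions, not the exact values used by stress-ng or cpu-benchmark.

```python
# Sketch of a quadrant-based Monte Carlo estimate of pi driven by a
# linear congruential generator (LCG). Constants and batch size are
# illustrative, not the exact stress-ng/cpu-benchmark values.

LCG_A = 6364136223846793005   # example 64-bit LCG multiplier (Knuth MMIX)
LCG_C = 1442695040888963407   # example increment
LCG_MASK = (1 << 64) - 1


def lcg_floats(seed):
    """Yield pseudo-random floats in [0, 1) from a 64-bit LCG."""
    state = seed & LCG_MASK
    while True:
        state = (LCG_A * state + LCG_C) & LCG_MASK
        yield state / float(1 << 64)


def estimate_pi(samples=16384, seed=12345):
    """Estimate pi by sampling the unit square and counting hits in the quarter circle."""
    rng = lcg_floats(seed)
    inside = 0
    for _ in range(samples):
        x, y = next(rng), next(rng)
        if x * x + y * y <= 1.0:
            inside += 1
    # quarter-circle area / unit-square area = pi / 4
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi())
```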

stress-ng launches multiple processes, which changes the logic for stopping parent and child processes. A quick search recommended psutil.
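
A minimal sketch, assuming psutil, of how the parent/child cleanup could look; the helper name terminate_process_tree is mine for illustration, not existing wfbench code.

```python
import psutil


def terminate_process_tree(pid, timeout=3):
    """Terminate a process and all of its children (e.g. the stress-ng hogs).

    Sends SIGTERM to the whole tree first, then SIGKILL to anything still
    alive after `timeout` seconds.
    """
    try:
        parent = psutil.Process(pid)
    except psutil.NoSuchProcess:
        return
    procs = parent.children(recursive=True) + [parent]
    for proc in procs:
        try:
            proc.terminate()
        except psutil.NoSuchProcess:
            pass
    _, alive = psutil.wait_procs(procs, timeout=timeout)
    for proc in alive:
        proc.kill()
```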

@quantumsteve quantumsteve marked this pull request as draft February 9, 2026 20:10
@rafaelfsilva rafaelfsilva added this to the v1.5 milestone Feb 10, 2026
@rafaelfsilva
Member

Hi @quantumsteve, it seems the tests for Dask are hanging. Could you please take a look at them? Thanks!

@henricasanova
Contributor

henricasanova commented Feb 10, 2026

Confirming that tests hang locally (not only on the GitHub runner). Let me know if you need help/guidance regarding this, as I implemented the testing infrastructure for translators/loggers. Oh, and other tests hang, so it's not specific to Dask, which is good as it means it should be easier to diagnose/fix. AND, some tests fail, like for the Bash translator, which should also be relatively easy to diagnose. All the testing code is in tests/test_helpers.py and tests/translators_loggers/.

@quantumsteve
Collaborator Author

Yes, I'm struggling to run some tests locally. It looks like the containers are unable to write in my /tmp directory. Do I need to change permissions?

@henricasanova
Contributor

The permissions should be fine, as I have dealt with that as well. I'll take a look today and let you know what I find out. Testing with Docker is pretty finicky due to users/permissions.

@henricasanova
Contributor

One thing I noticed is that psutil wasn't listed in pyproject.toml. That "fixed" the bash executor test, in that now it hangs like the others :)

@henricasanova
Contributor

henricasanova commented Feb 10, 2026

Ok, news. Connecting to the container and running wfbench by hand, not involving wfcommons or any of my test infrastructure, hangs:

bin/wfbench --name split_fasta_00000001 --percent-cpu 1.0 --cpu-work 1 
[WfBench][09:57:07][INFO] Starting split_fasta_00000001 Benchmark
[WfBench][09:57:07][INFO] Starting CPU and Memory Benchmarks for split_fasta_00000001...
stress-ng: info:  [311] defaulting to a 1 day, 0 secs run per stressor
stress-ng: info:  [311] dispatching hogs: 10 monte-carlo
stress-ng: info:  [312] monte-carlo: pi   ~ 2.6666666666667 vs 3.1415926535898 using lcg (average of 1 runs)
stress-ng: info:  [311] skipped: 0
stress-ng: info:  [311] passed: 10: monte-carlo (10)
stress-ng: info:  [311] failed: 0
stress-ng: info:  [311] metrics untrustworthy: 0
stress-ng: info:  [311] successful run completed in 0.06 secs

That should be easy to diagnose, and I'll look at it soon-ish.

@henricasanova
Contributor

Ok, so the culprit is io_proc.join(), which hangs. Also, I am noticing 76 stress-ng processes, and 3 stress-ng zombie processes while this hangs. I assume that's fine/intended, but thought I'd mention it.

@henricasanova
Contributor

@quantumsteve A fix is to call io_proc.kill() right before the io_proc.join(), because I don't believe the I/O process can actually terminate on its own. I see you had a commented-out io_proc.terminate() before the join, so perhaps you had thought of that. With that fix, the execution does complete. BUT, it leaves behind tons of zombie stress-ng-vm processes, which is of course not good. With this information, you can likely fix your code now. What do you think?
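
Something like the following sketch is what I have in mind (io_proc as in your code; the reaping helper, the name matching, and the timeout are my own illustration, not a tested patch):

```python
import psutil


def stop_io_and_reap(io_proc):
    """Stop the I/O benchmark process and reap leftover stress-ng children.

    io_proc is assumed to be a multiprocessing.Process. kill() before join()
    avoids the hang; explicitly waiting on the stress-ng descendants keeps
    them from lingering as zombies.
    """
    io_proc.kill()   # the I/O loop never terminates on its own
    io_proc.join()

    # Kill any stress-ng processes spawned under this benchmark, then wait
    # on them so they are reaped rather than left as zombies.
    stressors = []
    for child in psutil.Process().children(recursive=True):
        try:
            if "stress-ng" in child.name():
                child.kill()
                stressors.append(child)
        except psutil.NoSuchProcess:
            pass
    psutil.wait_procs(stressors, timeout=3)
```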
