CRE-4396: workflows caching soak-test (CI) #22529
…ching soak test and workflow
I see you updated files related to
✅ No conflicts with other open PRs targeting |
…ainer was restarted/OOMed during the test
Pull request overview
Risk Rating: MEDIUM (adds new long-running soak test + introduces/changes CI workflows and runner configuration)
Adds a CRE workflow module-cache soak test (with metrics export) and wires it into CI, plus helper utilities to inspect container restarts and scan container logs for expected startup messages. This supports CRE-4396 by validating module cache behavior at scale and collecting Prometheus/cAdvisor data as artifacts.
Changes:
- Introduces a new `Test_V2_CRE_CacheSoak` soak test that deploys many cron workflows, waits for execution, validates module cache enablement, and exports Prometheus + cAdvisor metrics to JSON artifacts.
- Adds reusable test helpers for docker container restart stability checks and streaming/decoding container logs.
- Adds a new GitHub Actions workflow to run the caching soak test post-docker-build; bumps `chainlink-testing-framework` to `v0.16.2`.
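As a rough illustration of what a restart stability check of this kind might look like, the sketch below compares two snapshots of container state taken before and after a test. The `ContainerState` type, the `CheckStable` function, and the container names are hypothetical and are not taken from the PR's `container_restart.go`; a real helper would populate the snapshots from the Docker API.

```go
package main

import (
	"fmt"
	"sort"
)

// ContainerState is a hypothetical snapshot of the fields a restart check
// cares about. The real helper in the PR may track different fields.
type ContainerState struct {
	RestartCount int
	OOMKilled    bool
	Running      bool
}

// CheckStable compares before/after snapshots keyed by container name and
// returns one human-readable problem per unstable container, sorted by name.
func CheckStable(before, after map[string]ContainerState) []string {
	var problems []string
	for name, b := range before {
		a, ok := after[name]
		switch {
		case !ok:
			problems = append(problems, fmt.Sprintf("%s: container disappeared during the test", name))
		case a.OOMKilled:
			problems = append(problems, fmt.Sprintf("%s: OOM-killed during the test", name))
		case a.RestartCount > b.RestartCount:
			problems = append(problems, fmt.Sprintf("%s: restarted %d time(s) during the test", name, a.RestartCount-b.RestartCount))
		case !a.Running:
			problems = append(problems, fmt.Sprintf("%s: not running at the end of the test", name))
		}
	}
	// Map iteration order is random, so sort for stable failure messages.
	sort.Strings(problems)
	return problems
}

func main() {
	// Hypothetical container names and values, for illustration only.
	before := map[string]ContainerState{
		"node0": {RestartCount: 0, Running: true},
		"node1": {RestartCount: 0, Running: true},
	}
	after := map[string]ContainerState{
		"node0": {RestartCount: 0, Running: true},
		"node1": {RestartCount: 2, Running: true},
	}
	for _, p := range CheckStable(before, after) {
		fmt.Println(p)
	}
}
```

Returning a list of problems rather than failing on the first one lets a long soak run report every unstable container in a single failure message.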
Scrupulous human review recommended (before merge):
- `.github/workflows/cre-wf-caching-test.yml` runner label (`runs-on`) formatting and overall workflow correctness (job scheduling is critical to CI reliability).
- PromQL queries in `workflow_caching_test.go` (ensure gauge vs counter usage matches the metrics’ instrument types).
- Resource/file handling in `resource_consumption_test.go` (long soak tests should avoid leaking FDs and should have clear failure messages).
Reviewed changes
Copilot reviewed 11 out of 15 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| `system-tests/tests/test-helpers/container_restart.go` | New helper to snapshot and assert container restart/OOM/running state for node containers. |
| `system-tests/tests/test-helpers/container_logs.go` | New helper to list node container names and stream/decode docker multiplexed logs to assert log content. |
| `system-tests/tests/soak/cre/workflow_caching_test.go` | New caching soak test that deploys workflows, validates cache enablement, and exports metrics. |
| `system-tests/tests/soak/cre/resource_consumption_test.go` | Adds a PoR resource/memory leak soak test with a fake price provider and leak detector checks. |
| `system-tests/tests/smoke/cre/v2_module_cache_test.go` | Switches module cache log assertion to the new shared helper (removes docker logs exec path). |
| `system-tests/tests/go.mod` / `system-tests/tests/go.sum` | Bumps CTF framework to v0.16.2; adds direct dependency on `github.com/moby/moby/client`. |
| `system-tests/tests/.gitignore` | Ignores `metrics/` artifact directory. |
| `system-tests/lib/go.mod` / `system-tests/lib/go.sum` | Bumps CTF framework to v0.16.2. |
| `core/scripts/go.mod` / `core/scripts/go.sum` | Bumps CTF framework to v0.16.2. |
| `core/scripts/cre/environment/configs/workflow-gateway-don-cache-soak-test.toml` | New CRE env config enabling workflow module cache with MaxLoaded/Idle eviction settings. |
| `.github/workflows/post-docker-build.yml` | Adds a call to the new caching workflow after docker build. |
| `.github/workflows/cre-wf-caching-test.yml` | New CI workflow to start obs + CRE env and run the caching soak test, uploading metrics/logs. |
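For readers unfamiliar with docker multiplexed logs: when a container runs without a TTY, Docker frames its log stream with an 8-byte header per chunk (byte 0 is the stream type, bytes 4 to 7 are the big-endian payload length). The sketch below decodes that framing by hand and searches the result for a message. It is a minimal illustration, not the PR's `container_logs.go`; the `demux`, `frame`, and `logsContain` names and the log messages are hypothetical, and a real helper could use Docker's own `stdcopy` package instead.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"strings"
)

// demux strips Docker's 8-byte frame headers and returns the concatenated
// stdout/stderr payloads as one string.
func demux(r io.Reader) (string, error) {
	var out strings.Builder
	header := make([]byte, 8)
	for {
		_, err := io.ReadFull(r, header)
		if errors.Is(err, io.EOF) {
			return out.String(), nil // clean end of stream between frames
		}
		if err != nil {
			return "", fmt.Errorf("read frame header: %w", err)
		}
		size := binary.BigEndian.Uint32(header[4:8])
		if _, err := io.CopyN(&out, r, int64(size)); err != nil {
			return "", fmt.Errorf("read frame payload: %w", err)
		}
	}
}

// frame builds one multiplexed frame; it exists only to fabricate test input.
// stream is 1 for stdout and 2 for stderr.
func frame(stream byte, payload string) []byte {
	h := make([]byte, 8, 8+len(payload))
	h[0] = stream
	binary.BigEndian.PutUint32(h[4:8], uint32(len(payload)))
	return append(h, payload...)
}

// logsContain reports whether the decoded log text contains want.
func logsContain(raw []byte, want string) (bool, error) {
	text, err := demux(bytes.NewReader(raw))
	if err != nil {
		return false, err
	}
	return strings.Contains(text, want), nil
}

func main() {
	// Hypothetical log lines; the real startup message may differ.
	raw := append(frame(1, "starting node\n"), frame(2, "module cache enabled\n")...)
	found, err := logsContain(raw, "module cache enabled")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("found:", found)
}
```

Decoding the frames before matching matters because a naive substring search over the raw bytes can miss a message that is split across two frames.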
Comments suppressed due to low confidence (1)
system-tests/tests/soak/cre/workflow_caching_test.go:170
`platform_workflow_module_cache_memory_saved_bytes` is emitted as a gauge (see `core/services/workflows/syncer/v2/metrics.go`), so `increase(...)` is not an appropriate PromQL function here. Use the raw gauge or an over-time aggregation (e.g., `avg_over_time`/`max_over_time`) to capture the value trend.
{
metric: "platform_workflow_module_cache_memory_saved_bytes",
query: fmt.Sprintf("increase(platform_workflow_module_cache_memory_saved_bytes{node_don=\"%%s\", node_index=\"%%d\"}[%s])", cachePrometheusRange),
filename: "metrics/cache_memory_saved_bytes.json",
step: defaultMetricStep,
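One possible shape for the suggested fix, sketched under assumptions: the query template wraps the gauge in `max_over_time` instead of `increase`, and keeps the escaped `%%s`/`%%d` placeholders so the per-node labels can be filled in by a second formatting pass, as the snippet above appears to do. The `gaugeQueryTemplate` and `nodeQuery` helpers, the `30m` range, and the `workflow` DON name are hypothetical and do not come from the PR.

```go
package main

import "fmt"

// gaugeQueryTemplate builds a PromQL template for a gauge metric. The %%s and
// %%d verbs survive this first Sprintf as %s and %d, to be filled per node.
func gaugeQueryTemplate(metric, promRange string) string {
	return fmt.Sprintf("max_over_time(%s{node_don=\"%%s\", node_index=\"%%d\"}[%s])", metric, promRange)
}

// nodeQuery fills the per-node placeholders left by gaugeQueryTemplate.
func nodeQuery(tmpl, don string, index int) string {
	return fmt.Sprintf(tmpl, don, index)
}

func main() {
	// "30m" and "workflow" are placeholder values for illustration.
	tmpl := gaugeQueryTemplate("platform_workflow_module_cache_memory_saved_bytes", "30m")
	fmt.Println(nodeQuery(tmpl, "workflow", 0))
}
```

`max_over_time` reports the peak value in the window; `avg_over_time` would be the choice if the typical value matters more than the peak.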

To do:
`pull_request` trigger from CI and adjust timeouts & test duration (@Tofel)