
CRE-4396: workflows caching soak-test (CI)#22529

Merged
Tofel merged 15 commits into
developfrom
cache_test_ci
May 20, 2026


@Tofel
Contributor

@Tofel Tofel commented May 19, 2026

To do:

@github-actions
Contributor

I see you updated files related to core. Please run make gocs in the root directory to add a changeset, and include at least one of the following tags in the changeset text:

  • #added For any new functionality added.
  • #breaking_change For any functionality that requires manual action for the node to boot.
  • #bugfix For bug fixes.
  • #changed For any change to the existing functionality.
  • #db_update For any feature that introduces updates to database schema.
  • #deprecation_notice For any upcoming deprecation functionality.
  • #internal For changesets that need to be excluded from the final changelog.
  • #nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
  • #removed For any functionality/config that is removed.
  • #updated For any functionality that is updated.
  • #wip For any change that is not ready yet and external communication about it should be held off until it is feature complete.

@github-actions
Contributor

github-actions Bot commented May 19, 2026

✅ No conflicts with other open PRs targeting develop

@trunk-io

trunk-io Bot commented May 19, 2026


@mchain0 mchain0 changed the title from "WF Caching CI Test" to "CRE-4396: workflows caching soak-test (CI)" May 19, 2026
mchain0 previously approved these changes May 19, 2026
@Tofel Tofel marked this pull request as ready for review May 20, 2026 05:32
@Tofel Tofel requested review from a team as code owners May 20, 2026 05:32
Copilot AI review requested due to automatic review settings May 20, 2026 05:32
Contributor

Copilot AI left a comment


Pull request overview

Risk Rating: MEDIUM (adds new long-running soak test + introduces/changes CI workflows and runner configuration)

Adds a CRE workflow module-cache soak test (with metrics export) and wires it into CI, plus helper utilities to inspect container restarts and scan container logs for expected startup messages. This supports CRE-4396 by validating module cache behavior at scale and collecting Prometheus/cAdvisor data as artifacts.

Changes:

  • Introduces a new Test_V2_CRE_CacheSoak soak test that deploys many cron workflows, waits for execution, validates module cache enablement, and exports Prometheus + cAdvisor metrics to JSON artifacts.
  • Adds reusable test helpers for docker container restart stability checks and streaming/decoding container logs.
  • Adds a new GitHub Actions workflow to run the caching soak test post-docker-build; bumps chainlink-testing-framework to v0.16.2.

Scrupulous human review recommended (before merge):

  • .github/workflows/cre-wf-caching-test.yml runner label (runs-on) formatting and overall workflow correctness (job scheduling is critical to CI reliability).
  • PromQL queries in workflow_caching_test.go (ensure gauge vs counter usage matches the metrics’ instrument types).
  • Resource/file handling in resource_consumption_test.go (long soak tests should avoid leaking FDs and should have clear failure messages).

Reviewed changes

Copilot reviewed 11 out of 15 changed files in this pull request and generated 2 comments.

Summary per file:

  • system-tests/tests/test-helpers/container_restart.go: New helper to snapshot and assert container restart/OOM/running state for node containers.
  • system-tests/tests/test-helpers/container_logs.go: New helper to list node container names and stream/decode Docker multiplexed logs to assert log content.
  • system-tests/tests/soak/cre/workflow_caching_test.go: New caching soak test that deploys workflows, validates cache enablement, and exports metrics.
  • system-tests/tests/soak/cre/resource_consumption_test.go: Adds a PoR resource/memory leak soak test with a fake price provider and leak detector checks.
  • system-tests/tests/smoke/cre/v2_module_cache_test.go: Switches the module cache log assertion to the new shared helper (removes the docker logs exec path).
  • system-tests/tests/go.mod / system-tests/tests/go.sum: Bumps CTF framework to v0.16.2; adds a direct dependency on github.com/moby/moby/client.
  • system-tests/tests/.gitignore: Ignores the metrics/ artifact directory.
  • system-tests/lib/go.mod / system-tests/lib/go.sum: Bumps CTF framework to v0.16.2.
  • core/scripts/go.mod / core/scripts/go.sum: Bumps CTF framework to v0.16.2.
  • core/scripts/cre/environment/configs/workflow-gateway-don-cache-soak-test.toml: New CRE env config enabling the workflow module cache with MaxLoaded/Idle eviction settings.
  • .github/workflows/post-docker-build.yml: Adds a call to the new caching workflow after docker build.
  • .github/workflows/cre-wf-caching-test.yml: New CI workflow to start obs + CRE env and run the caching soak test, uploading metrics/logs.
Comments suppressed due to low confidence (1)

system-tests/tests/soak/cre/workflow_caching_test.go:170

  • platform_workflow_module_cache_memory_saved_bytes is emitted as a gauge (see core/services/workflows/syncer/v2/metrics.go), so increase(...) is not an appropriate PromQL function here. Use the raw gauge or an over-time aggregation (e.g., avg_over_time/max_over_time) to capture the value trend.
		{
			metric:   "platform_workflow_module_cache_memory_saved_bytes",
			query:    fmt.Sprintf("increase(platform_workflow_module_cache_memory_saved_bytes{node_don=\"%%s\", node_index=\"%%d\"}[%s])", cachePrometheusRange),
			filename: "metrics/cache_memory_saved_bytes.json",
			step:     defaultMetricStep,

Comment thread system-tests/tests/soak/cre/workflow_caching_test.go Outdated
Comment thread .github/workflows/cre-wf-caching-test.yml
@Tofel Tofel enabled auto-merge May 20, 2026 10:16
@cl-sonarqube-production

@Tofel Tofel added this pull request to the merge queue May 20, 2026
Merged via the queue into develop with commit 8c875e3 May 20, 2026
217 checks passed
@Tofel Tofel deleted the cache_test_ci branch May 20, 2026 14:14

4 participants