Skip to content

fix: align otel-collector image tag with appVersion#233

Closed
mfroembgen wants to merge 3 commits into
ClickHouse:mainfrom
mfroembgen:codex/219-align-otel-collector-2-28
Closed

fix: align otel-collector image tag with appVersion#233
mfroembgen wants to merge 3 commits into
ClickHouse:mainfrom
mfroembgen:codex/219-align-otel-collector-2-28

Conversation

@mfroembgen

Copy link
Copy Markdown
Contributor

Problem

The clickstack chart on current main has appVersion: 2.28.0, but the default bundled collector image is still pinned to docker.clickhouse.com/clickhouse/clickstack-otel-collector:2.19.0.

As reported in #219, the older collector rejects the ClickHouse exporter json key pushed by newer ClickStack app versions:

'clickhouseexporter.Config' has invalid keys: json

That causes the default collector pod to crash loop.

Changes

  • Bump the default otel-collector.image.tag from 2.19.0 to 2.28.0, matching the current chart appVersion.
  • Update the appVersion workflow so future appVersion bumps also update the collector image tag.
  • Add a patch changeset for the chart fix.

Verification

helm dependency build charts/clickstack
helm lint charts/clickstack
helm template smoke charts/clickstack --set clickhouse.persistence.enabled=false --set mongodb.persistence.enabled=false | rg 'clickstack-otel-collector|hyperdx:2\.28\.0|2\.19\.0|2\.28\.0'

The rendered default collector image is now:

image: "docker.clickhouse.com/clickhouse/clickstack-otel-collector:2.28.0"

and the HyperDX app image also renders as 2.28.0.

Fixes #219.

Related to #220, which has the same root-cause fix but is currently behind main and still targets the previous 2.27.0 appVersion.

@changeset-bot

changeset-bot Bot commented Jun 29, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: a9bc79c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
helm-charts Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@CLAassistant

CLAassistant commented Jun 29, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@mfroembgen mfroembgen marked this pull request as ready for review June 29, 2026 08:57
@mfroembgen mfroembgen requested a review from a team as a code owner June 29, 2026 08:57
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Deep Review

No critical issues found. This PR aligns the bundled otel-collector image tag with appVersion (2.19.02.28.0), fixing the documented crash loop, and adds CI drift guards. The values change is a fix (not a regression), the new validation step passes against the current files (appVersion: 2.28.0 vs tag: "2.28.0" compare equal after quote-stripping), and the drift guards fail loudly rather than silently corrupting state. Recommendations below are robustness and coverage improvements, none blocking.

🟡 P2 -- recommended

  • charts/clickstack/tests/otel-collector_test.yaml:1 -- No helm-unittest asserts the rendered collector Deployment image equals the chart appVersion; drift is guarded only by a raw-text awk check in CI, which would not catch a rendering-path regression (e.g. a subchart image override).
    • Fix: Add a unit test that renders the collector Deployment and asserts the container image ends with :2.28.0 so the rendered output is covered, not just the literal values text.
    • testing, project-standards, maintainability
  • .github/workflows/update-app-version.yml:36 -- The sed '/clickstack-otel-collector/{n;...}' substitution and the awk extractor in helm-test.yaml parse values.yaml by line adjacency and whitespace field-splitting, which break if a key or comment is inserted between repository: and tag: or if quote/indent style changes.
    • Fix: Replace the sed/awk pair with structural yq key-path access (yq -i '.["otel-collector"].image.tag = strenv(TAG)' and yq '.appVersion') so the logic is immune to formatting drift.
    • correctness, reliability, maintainability, testing, project-standards
  • charts/clickstack/values.yaml:374 -- The collector tag now duplicates appVersion as a second source of truth, kept in sync only by the new workflow sed and the CI guard rather than a single authoritative field.
    • Fix: Drive otel-collector.image.tag from appVersion at render time via a helper, or extend scripts/update-chart-versions.js (the documented version-sync script) to own this sync instead of a separate workflow path.
    • maintainability
🔵 P3 nitpicks (5)
  • .github/workflows/update-app-version.yml:36 -- The workflow_dispatch TAG input is interpolated into sed expressions (including the appVersion rewrite at line 29, which has no verification guard) without semver validation, so a tag containing / or & could corrupt the substitution.
    • Fix: Validate TAG against a strict semver regex at the top of the step and fail fast before any sed.
  • .github/workflows/helm-test.yaml:45 -- The repository string docker.clickhouse.com/clickhouse/clickstack-otel-collector is hardcoded verbatim in both helm-test.yaml and update-app-version.yml.
    • Fix: Hoist it to a shared workflow-level env value or composite action so a registry rename is a single edit.
  • scripts/update-chart-versions.js:11 -- AGENTS.md describes this as the "Version sync script," but it syncs only the chart version, not appVersion or the collector tag, which this PR now syncs elsewhere — making the description misleading.
    • Fix: Narrow the AGENTS.md description to "chart version sync" or extend the script to cover appVersion/values.yaml.
  • .github/workflows/helm-test.yaml:46 -- The awk extractor strips only double quotes from the tag and does not strip quotes from appVersion, so a future quoted appVersion or single-quoted tag would produce a false mismatch.
    • Fix: Strip all quote styles symmetrically from both extracted values (or use yq, per the P2 above).
  • .github/workflows/update-app-version.yml:1 -- The branch codex/219-align-otel-collector-2-28 does not follow the warren/ prefix convention documented in AGENTS.md.
    • Fix: Rename the branch to match the documented prefix before merge.

⚪ Pre-existing -- not introduced by this diff, surfaced for awareness

  • .github/workflows/update-app-version.yml:49 -- The Create changeset step still interpolates raw ${{ github.event.inputs.tag }} directly into a run: shell (a script-injection sink) even though this PR hardened the sibling sed steps to the env: TAG pattern. Exploitability is limited because workflow_dispatch is gated to write-access actors, but the inconsistency is worth closing in a follow-up. (security)
  • charts/clickstack/values.yaml:374 -- The collector image is pinned by mutable tag rather than digest. This matches repo-wide convention for all chart images and is out of scope here. (security)

Reviewers (8): correctness, testing, maintainability, project-standards, reliability, security, agent-native, learnings-researcher.

Testing gaps:

  • No helm-unittest renders the collector Deployment and asserts its image tag; drift is caught only at push/PR time by the awk guard, not in local unit tests.
  • No test exercises the update-app-version.yml mutation path (e.g. an inserted line between repository:/tag:, or an adversarial TAG) to confirm the awk verification fails the job rather than silently no-ops.

@mfroembgen mfroembgen closed this Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Deep Review

✅ No critical issues found. The core change — bumping otel-collector.image.tag from 2.19.0 to 2.28.0 to match appVersion — is correct, renders cleanly, and directly addresses the reported crashloop. The findings below are robustness and maintainability recommendations on the surrounding workflow/CI machinery; none block merge.

🟡 P2 -- recommended

  • .github/workflows/helm-test.yaml:64 -- The validation only string-matches the tag and renders the template; it never confirms a clickstack-otel-collector:<appVersion> image is actually published, so a future appVersion bump to an unpublished collector tag passes CI green and fails at deploy time with ImagePullBackOff on the default (otel-collector.enabled: true) path.
    • Fix: Add a registry existence check (e.g. docker manifest inspect / crane manifest) for the resolved collector image to the workflow so a missing image fails CI instead of shipping an unpullable default.
    • reliability
  • .github/workflows/helm-test.yaml:64 -- grep -F hardcodes the exact rendered line image: "…:<tag>" (double-quoted, single space), but that formatting is produced by the upstream opentelemetry-collector subchart this repo does not control; an upstream quoting/spacing change would fail the whole helm-test job under set -euo pipefail even when the tag is correct.
    • Fix: Match a format-tolerant pattern such as grep -E 'clickstack-otel-collector:'"${APP_VERSION}" rather than the fully-quoted fixed string.
    • correctness, testing
  • charts/clickstack/values.yaml:374 -- The new collector-tag alignment behavior is guarded only by the shell/grep step in helm-test.yaml; no helm-unittest test asserts the rendered collector image tag, despite AGENTS.md calling for default-value coverage per template.
    • Fix: Add a helm-unittest assertion (or explicitly document that alignment is intentionally CI-guarded because the collector is an aliased subchart) so the coverage gap is deliberate and visible.
    • testing
  • .github/workflows/update-app-version.yml:44 -- The "find collector image block, then its tag" logic is re-encoded in three awk programs (rewrite, verify, and the helm-test.yaml reader) with divergent regexes and is coupled to the exact 2-space/4-space indentation via the ^[[:space:]]{0,2} block-boundary guard; a values.yaml reindent or reorder could make the scripts silently disagree.
    • Fix: Consolidate to a single shared helper, or address the value by YAML path with yq (e.g. yq -i '."otel-collector".image.tag = strenv(TAG)'), eliminating the indentation coupling.
    • maintainability, correctness
🔵 P3 nitpicks (4)
  • .github/workflows/update-app-version.yml:62 -- The verify awk matches the repository line without the trailing [[:space:]]*$ anchor used by the rewrite awk, so the two passes are not guaranteed to target the same line if a second matching repository entry is ever added.
    • Fix: Use an identical (or shared) regex in both the rewrite and verify awk blocks.
  • .github/workflows/helm-test.yaml:42 -- The appVersion read awk does not strip surrounding quotes (gsub(/"/, "", $2)) while the collector-tag read does, so if Chart.yaml appVersion were ever quoted the equality check would falsely fail.
    • Fix: Strip quotes symmetrically when reading appVersion.
  • .github/workflows/update-app-version.yml:18 -- The workflow_dispatch tag input is written verbatim into the collector image tag with no format validation, so a v-prefixed or typo'd value yields a non-existent image reference that only fails at pull time.
    • Fix: Validate the input against a semver regex before the rewrite steps so malformed tags fail fast.
  • AGENTS.md:153 -- The PR updated the update-app-version.yml CI-table row for the new tag-alignment behavior but left the helm-test.yaml row as "Unit tests + example validation," omitting the newly added appVersion/collector-tag validation step.
    • Fix: Note the new drift-validation step in the helm-test.yaml row for doc completeness.

Reviewers (5): correctness, testing, maintainability, project-standards, reliability.

Testing gaps:

  • No helm-unittest test asserts the rendered collector image tag or appVersion == tag equality; only the new CI grep guards it.
  • The update-app-version.yml awk rewrite/verify path is exercised only at workflow_dispatch runtime — no test covers regex/indentation drift or edge-case tag values, so a broken rewrite would surface only on a release dispatch.

@mfroembgen mfroembgen deleted the codex/219-align-otel-collector-2-28 branch June 29, 2026 09:20
wrn14897 added a commit that referenced this pull request Jun 30, 2026
## Problem

The `clickstack` chart on current `main` has `appVersion: 2.28.0`, but
the default bundled collector image is still pinned to
`docker.clickhouse.com/clickhouse/clickstack-otel-collector:2.19.0`.

As reported in #219, the older collector rejects the ClickHouse exporter
`json` key pushed by newer ClickStack app versions:

```console
'clickhouseexporter.Config' has invalid keys: json
```

That causes the default collector pod to crash loop.

## Changes

- Bump the default `otel-collector.image.tag` from `2.19.0` to `2.28.0`,
matching the current chart `appVersion`.
- Update the appVersion workflow so future appVersion bumps also update
the collector image tag.
- Add a patch changeset for the chart fix.

## Verification

```console
helm lint charts/clickstack
helm template smoke charts/clickstack --set clickhouse.persistence.enabled=false --set mongodb.persistence.enabled=false | rg 'clickstack-otel-collector|hyperdx:2\.28\.0|2\.19\.0|2\.28\.0'
```

The rendered default collector image is now:

```yaml
image: "docker.clickhouse.com/clickhouse/clickstack-otel-collector:2.28.0"
```

Fixes #219.

Related to #220. Replaces closed #233 with only the original first
commit.

Co-authored-by: Warren Lee <5959690+wrn14897@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

clickstack 3.0.0 defaults collector image tag to 2.19.0 while appVersion is 2.27.0

2 participants