5 changes: 5 additions & 0 deletions .changeset/bump-clickhouse-operator-v0.0.6.md
@@ -0,0 +1,5 @@
---
"helm-charts": patch
---

chore(deps): bump clickhouse-operator-helm to v0.0.6
21 changes: 21 additions & 0 deletions .changeset/clickhouse-explicit-resources.md
@@ -0,0 +1,21 @@
---
"helm-charts": patch
---

fix(clickhouse): harden ClickHouse defaults for clickhouse-operator v0.0.6

Two operator-default changes in v0.0.6 broke the single-replica ClickHouse
deployment; the chart now overrides both:

- Explicit `containerTemplate.resources` (2Gi memory, 500m CPU request). The
operator otherwise applies a 512Mi default (request == limit as of v0.0.6),
which is too low for the full ClickStack schema and OOMKills the server
(exit 137) under ingestion plus background merges.

- `settings.enableDatabaseSync: false`. The operator now defaults this to true,
which creates the `default` database with the Replicated (DatabaseReplicated)
engine so table metadata lives in Keeper. That feature targets multi-replica
clusters; in a single-replica deployment a transient Keeper hiccup during
startup desyncs the Replicated database and silently drops all seeded tables,
which never come back. Keeping `default` Atomic stores tables on the
persistent data volume so they survive restarts.
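For reference, the two overrides should surface in the rendered ClickHouseCluster resource roughly as below. This is a partial sketch reconstructed from the helm-unittest assertions in `charts/clickstack/tests/clickhouse-deployment_test.yaml` and the `values.yaml` defaults in this PR; `apiVersion`, `kind`, metadata, and every other field are omitted, so treat it as illustrative rather than the exact template output.

```yaml
# Partial ClickHouseCluster spec (all other fields omitted).
spec:
  containerTemplate:
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        memory: 2Gi             # memory limit == request; no CPU limit is set
  settings:
    enableDatabaseSync: false   # keep the `default` database Atomic
```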
6 changes: 3 additions & 3 deletions charts/clickstack-operators/Chart.lock
@@ -4,6 +4,6 @@ dependencies:
 version: 1.7.0
 - name: clickhouse-operator-helm
 repository: oci://ghcr.io/clickhouse
-version: 0.0.2
-digest: sha256:1daf572004da83b1836c8867f11198530652fee6905d4786a2d5eef87bc611cd
-generated: "2026-03-04T16:52:51.068188-06:00"
+version: 0.0.6
+digest: sha256:5afcb0d78e0ceecf1a18f3f7dfb52ee2627b7acea2621ffa411ee7bfb530adf7
+generated: "2026-06-19T17:41:30.214406+02:00"
2 changes: 1 addition & 1 deletion charts/clickstack-operators/Chart.yaml
@@ -13,6 +13,6 @@ dependencies:
 repository: https://mongodb.github.io/helm-charts
 alias: mongodb-operator
 - name: clickhouse-operator-helm
-version: "~0.0.2"
+version: "~0.0.6"
 repository: oci://ghcr.io/clickhouse
 alias: clickhouse-operator
6 changes: 3 additions & 3 deletions charts/clickstack-operators/values.yaml
@@ -9,8 +9,8 @@ mongodb-operator:
 # See https://clickhouse.com/docs/clickhouse-operator/overview for all options
 clickhouse-operator:
 webhook:
-enable: false
+enabled: false
 certManager:
-enable: false
+enabled: false
 crd:
-enable: true
+enabled: true
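Because the subchart's toggles are now spelled `enabled`, anyone who overrides them in their own values for `clickstack-operators` presumably needs the new spelling as well; a leftover `enable:` key would most likely be ignored silently rather than rejected. A minimal override, assuming the standard two-space nesting under the `clickhouse-operator` alias (the file name is hypothetical, and the values shown are simply the chart defaults from this diff):

```yaml
# my-operators-values.yaml (hypothetical file name)
clickhouse-operator:
  webhook:
    enabled: false
  certManager:
    enabled: false
  crd:
    enabled: true   # install the ClickHouse operator CRDs
```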
22 changes: 22 additions & 0 deletions charts/clickstack/tests/clickhouse-deployment_test.yaml
@@ -68,6 +68,20 @@ tests:
 path: spec.dataVolumeClaimSpec.resources.requests.storage
 value: 10Gi

+- it: should set explicit container resources so the operator default does not OOMKill ClickHouse
+templates:
+- clickhouse/cluster.yaml
+asserts:
+- equal:
+path: spec.containerTemplate.resources.requests.memory
+value: 2Gi
+- equal:
+path: spec.containerTemplate.resources.limits.memory
+value: 2Gi
+- equal:
+path: spec.containerTemplate.resources.requests.cpu
+value: 500m
+

Minimum CPU requests are definitely good to have. Overprovisioning a node is a much bigger problem than pod OOMs, so I like this. Glad we don't set CPU limits.


 - it: should resolve keeperClusterRef template expression
 templates:
 - clickhouse/cluster.yaml
@@ -152,3 +166,11 @@ tests:
 path: spec.settings.extraUsersConfig.users.app
 - isNotNull:
 path: spec.settings.extraUsersConfig.users.otelcollector
+
+- it: should disable operator databaseSync so the default DB stays Atomic
+templates:
+- clickhouse/cluster.yaml
+asserts:
+- equal:
+path: spec.settings.enableDatabaseSync
+value: false
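The reviewer's point that the chart sets no CPU limit could be pinned down with one more case in the same helm-unittest style, so that a later default cannot add a limit unnoticed. This is a hypothetical addition, not part of the PR, and it assumes the helm-unittest version used in CI provides the `notExists` assertion:

```yaml
# Hypothetical extra entry for the `tests:` list in this file.
- it: should not set a CPU limit on the ClickHouse container
  templates:
    - clickhouse/cluster.yaml
  asserts:
    - notExists:
        path: spec.containerTemplate.resources.limits.cpu
```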
9 changes: 9 additions & 0 deletions charts/clickstack/values.yaml
@@ -328,6 +328,12 @@ clickhouse:
 image:
 repository: clickhouse/clickhouse-server
 tag: "25.7-alpine"
+resources:
+requests:
+cpu: 500m
+memory: 2Gi
+limits:
+memory: 2Gi
 replicas: 1
 shards: 1
 keeperClusterRef:
@@ -339,6 +345,9 @@ clickhouse:
 requests:
 storage: 10Gi
 settings:
+# Keep the `default` database Atomic; the Replicated engine the operator
+# selects when this is true drops seeded tables on Keeper desync.
+enableDatabaseSync: false
Comment on lines +348 to +350

Member

`enableDatabaseSync` has existed since the first release and has always been enabled by default. Nothing about the Atomic-to-Replicated recreation has changed since then.

It makes sense even for single-node clusters: it makes scaling out easier later and keeps the database engine the same across topologies.

> drops seeded tables on Keeper desync

This should not happen. The operator never drops tables, and it does not recreate the default database as Replicated once any tables have been created (code).

If something really does get dropped by the operator somehow, send a repro and I'll fix it.

Collaborator Author

@GrigoryPervakov, thanks for following up on this PR. I dug into the issue and found the root cause. The collector (clickhouseexporter) creates the database with the Atomic engine (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/277638be46c8653cc6574f2f9a6d2597bd5c12da/exporter/clickhouseexporter/internal/clickhouse.go#L60-L73). Later, the operator drops that database and recreates it as a Replicated database (https://github.com/ClickHouse/clickhouse-operator/blob/ecd71949243d71f37a717d21acbba20424af03a1/internal/controller/clickhouse/commands.go#L274-L290), which causes the integration test to fail.

What do you suggest as the best way to proceed here?

Collaborator Author

@GrigoryPervakov, I've opened a draft PR, ClickHouse/clickhouse-operator#255. Let me know what you think. For the ClickStack Helm chart, we should use the Replicated engine going forward. That will be done in a follow-up PR.

 extraUsersConfig:
 users:
 app:
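Installations with heavier ingestion than these defaults assume can raise the numbers from their own values file. The sketch below is hypothetical: the sizes are placeholders, and it assumes `resources` sits directly under the top-level `clickhouse` key, which the hunk context suggests but does not show in full, so check the chart's `values.yaml` for the exact nesting before relying on it.

```yaml
# Hypothetical user override (placeholder sizes; verify key nesting first).
clickhouse:
  resources:
    requests:
      cpu: "1"
      memory: 4Gi
    limits:
      memory: 4Gi   # keep limit == request, as the chart default does
```

Keeping the memory limit equal to the request mirrors the chart default, so the container's memory ceiling is exactly what the scheduler reserved for it.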
4 changes: 2 additions & 2 deletions integration-tests/full-stack/assert.sh
@@ -12,7 +12,7 @@ echo "Waiting for services to initialize..."
 sleep 30

 echo "Waiting for all pods to be ready..."
-kubectl wait --for=condition=Ready pods --all --timeout=600s || true
+kubectl wait --for=condition=Ready pods --all --field-selector=status.phase!=Succeeded --timeout=600s || true

 echo "Pod status:"
 kubectl get pods -o wide
@@ -24,7 +24,7 @@ echo "Checking ClickHouseCluster CR..."
 kubectl get clickhousecluster -o wide || true

 echo "Waiting for all pods to be ready (final check)..."
-kubectl wait --for=condition=Ready pods --all --timeout=600s
+kubectl wait --for=condition=Ready pods --all --field-selector=status.phase!=Succeeded --timeout=600s

 echo "Final pod status:"
 kubectl get pods -o wide
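The `--field-selector=status.phase!=Succeeded` filter matters because a pod that has run to completion stays in phase `Succeeded` with its `Ready` condition false, so an unfiltered `kubectl wait --for=condition=Ready pods --all` would presumably block on it until the timeout. Any one-shot Job in the namespace produces such a pod; the manifest below is a hypothetical example (name, image, and command are placeholders), not something this PR deploys.

```yaml
# Hypothetical one-shot Job. Once its container exits 0, the pod's phase
# becomes Succeeded and its Ready condition stays False, which is exactly
# what the unfiltered `kubectl wait ... pods --all` would wait on.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-one-shot        # placeholder name
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: main
          image: busybox:1.36   # placeholder image
          command: ["sh", "-c", "echo done"]
```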