Skip to content

Monitoring API: remove maximum retention days prometheus#2851

Open
marioferh wants to merge 4 commits into
openshift:masterfrom
marioferh:fix_prometheus_retention_days
Open

Monitoring API: remove maximum retention days prometheus#2851
marioferh wants to merge 4 commits into
openshift:masterfrom
marioferh:fix_prometheus_retention_days

Conversation

@marioferh
Copy link
Copy Markdown
Contributor

@marioferh marioferh commented May 20, 2026

Remove Maximum validation and documentation for durationInDays (365)
and sizeInGiB (16384) on Prometheus Retention. We do not have enough
customer or operational data to pick defensible API-level ceilings;
arbitrary caps could block valid large-cluster configurations or
misalign with what CMO/Prometheus can actually support.

@openshift-merge-bot
Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

Hello @marioferh! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR converts Prometheus retention to an hours-based field in the API: the Go type Renamed DurationInDaysDurationInHours (int32) while preserving the JSON name durationInDays for wire compatibility, updates the PrometheusConfig.Retention comment to state an omitted default of 360 hours, and sets a minimum of 1 hour (the prior days-based maximum was removed). The generated ClusterMonitoring CRD schema updates the retention field’s documentation/validation to hours and increases retention.sizeInGiB maximum from 16384 to 2147483647.

Suggested reviewers

  • JoelSpeed
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Title check ✅ Passed The title describes the main change: removing maximum retention validation for Prometheus, though 'days' phrasing is slightly imprecise given the broader scope.
Description check ✅ Passed The description clearly explains the rationale for removing maximum validation constraints on both durationInDays and sizeInGiB, directly addressing the PR's core intent.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All 137 Ginkgo test names are stable, descriptive, and static with no dynamic content like generated IDs or timestamps.
Test Structure And Quality ✅ Passed PR contains no Ginkgo test code—only API type definitions and CRD schema changes. Custom check for test structure is not applicable.
Microshift Test Compatibility ✅ Passed This PR does not add any new Ginkgo e2e tests; it only modifies API type definitions and CRD manifests for cluster monitoring configuration. The check is not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. The changes only modify API type definitions and CRD schemas for Prometheus retention configuration. The SNO compatibility check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR only modifies API type definitions and CRD schemas for Prometheus retention; no deployment manifests, operator code, or scheduling constraints are added or modified.
Ote Binary Stdout Contract ✅ Passed PR modifies only Kubernetes API type definitions and CRD YAML manifest; no executable binary code, test binaries, or stdout-writing process-level functions are present.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. Changes are limited to Go type definitions and CRD YAML files for Prometheus monitoring configuration, making this check inapplicable.
No-Weak-Crypto ✅ Passed PR only modifies Prometheus retention configuration schema; contains no cryptographic code, crypto imports, or security operations subject to the weak-crypto check.
Container-Privileges ✅ Passed Check is not applicable to this PR; no container/pod security configurations exist in the monitoring API schema changes reviewed.
No-Sensitive-Data-In-Logs ✅ Passed The PR modifies only Prometheus retention configuration types and CRD schemas. No logging code is introduced, and no sensitive data (passwords, tokens, keys, PII) is logged or exposed in the changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label May 20, 2026
@openshift-ci openshift-ci Bot requested review from JoelSpeed and everettraven May 20, 2026 14:35
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 20, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch 2 times, most recently from 80e0c7d to 03a1de9 Compare May 20, 2026 14:57
@marioferh
Copy link
Copy Markdown
Contributor Author

@danielmellado
Copy link
Copy Markdown
Contributor

danielmellado commented May 20, 2026

While this should be ok not having a tombstone as we're just loosening limits, the openapi.json diff includes unrelated changes (NetworkObservabilitySpec, registries descriptions, softirqs collector, KMS default) which are likely from other PRs that merged into master since the branch was created. Could you rebase on latest master so the openapi.json diff only shows the PR changes? This would avoid confusion during review and prevent accidentally shipping stale generated data.

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 03a1de9 to 854e0ad Compare May 20, 2026 15:23
@danielmellado
Copy link
Copy Markdown
Contributor

danielmellado commented May 20, 2026

Hey @marioferh, as commented before we go with just removing the max constraints, I think we should also address the granularity issue that Ayoub and Simon raised.

Right now durationInDays: int32 can't express sub-day retention like 15h or 24h (which is the default for UWM), and sizeInGiB: int32 can't express sub-GiB sizes like 500MiB. The CMO ConfigMap and the Prometheus Operator CRD both accept freeform Prometheus duration/size strings, so customers migrating from retention: "15h" would have no equivalent in the new CRD.

I think we should push back on the integer-only convention for these two fields and switch to string fields with the same Prometheus pattern validation that the Prometheus Operator CRD uses:

duration: string with pattern ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$
size: string with pattern ^(0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$
This way CMO can pass the values straight through to the Prometheus CR without any lossy conversion, and all existing ConfigMap values remain valid. The patterns are well-defined and fully validatable at admission time, so we're not losing any safety vs. int32.

@everettraven I know we discussed avoiding duration strings back in #2653, but this comes from the Prometheus Operator CR itself, which validates retention with ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ and size with (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ (https://github.com/openshift/cluster-monitoring-operator/blob/0ce8e4c0bce00381802c25d92ed238775139ee82/assets/monitoring-operator/00_0prometheus-custom-resource-definition.yaml). CMO passes these values straight through to the Prometheus CR, so using durationInDays: int32 and sizeInGiB: int32 introduces a lossy conversion layer that breaks existing configs like retention: "15h" or retentionSize: "500MiB" that customers can set today in the ConfigMap. Would you be open to using string fields with these same Prometheus patterns for this specific case? They're fully validatable at admission time via the regex, and since this is v1alpha1 behind a feature gate we can still change it.

Worst case, as a fallback, we may set hours, but wanted to get back your thoughts on this! Thanks!

Minimum value is 1 day.
Maximum value is 365 days (1 year).
format: int32
maximum: 365
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, a maximum of 2,147,483,647 here is equivalent to ~58835 centuries. Seems like an unreasonably high number that would never be set.

Without more context, I would think it to be reasonable to increase this to something like 3650 to account for ~10 years of retention. That seems pretty high to me for metrics, but 🤷

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@everettraven
Copy link
Copy Markdown
Contributor

Hey @marioferh, as commented before we go with just removing the max constraints, I think we should also address the granularity issue that Ayoub and Simon raised.

Right now durationInDays: int32 can't express sub-day retention like 15h or 24h (which is the default for UWM), and sizeInGiB: int32 can't express sub-GiB sizes like 500MiB. The CMO ConfigMap and the Prometheus Operator CRD both accept freeform Prometheus duration/size strings, so customers migrating from retention: "15h" would have no equivalent in the new CRD.

I think we should push back on the integer-only convention for these two fields and switch to string fields with the same Prometheus pattern validation that the Prometheus Operator CRD uses:

duration: string with pattern ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ size: string with pattern ^(0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ This way CMO can pass the values straight through to the Prometheus CR without any lossy conversion, and all existing ConfigMap values remain valid. The patterns are well-defined and fully validatable at admission time, so we're not losing any safety vs. int32.

@everettraven I know we discussed avoiding duration strings back in #2653, but this comes from the Prometheus Operator CR itself, which validates retention with ^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$ and size with (^0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$ (https://github.com/openshift/cluster-monitoring-operator/blob/0ce8e4c0bce00381802c25d92ed238775139ee82/assets/monitoring-operator/00_0prometheus-custom-resource-definition.yaml). CMO passes these values straight through to the Prometheus CR, so using durationInDays: int32 and sizeInGiB: int32 introduces a lossy conversion layer that breaks existing configs like retention: "15h" or retentionSize: "500MiB" that customers can set today in the ConfigMap. Would you be open to using string fields with these same Prometheus patterns for this specific case? They're fully validatable at admission time via the regex, and since this is v1alpha1 behind a feature gate we can still change it.

Worst case, as a fallback, we may set hours, but wanted to get back your thoughts on this! Thanks!

We strongly encourage not using durations in favor of a unit-based integer, but if you folks feel strongly that this is the best UX for your end-users I'm not going to prevent you from doing so. I'm not the expert on your user base here, you folks are :).

That being said, if the concern is solely that you cannot reasonably express smaller unit values that are likely to occur, you can change the unit you are using for the integer field to the smallest reasonable unit. For example, durationInSeconds and sizeInBytes.

Even if this is in v1alpha1 we still have to abide by tombstoning rules if we change the serialization of the fields unless we want to move this to a v1alpha2.

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 854e0ad to 71e6477 Compare May 28, 2026 11:23
@openshift-ci openshift-ci Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 28, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@config/v1alpha1/types_cluster_monitoring.go`:
- Around line 2238-2247: The public JSON name was changed and will break
existing manifests; restore wire compatibility by keeping the original JSON
field name for the struct field DurationInHours (e.g. change the tag to
`json:"durationInDays,omitempty"` so existing persisted objects continue to
round-trip) and add a deprecation/kubebuilder comment on the field explaining it
represents hours (or add a new explicitly named field for hours in a new API
version/conversion path if you intend to support both); update the struct field
DurationInHours and its kubebuilder tags accordingly and ensure any API
docs/comments reflect the restored wire name and units.
- Around line 2243-2255: Remove the hard upper-bound metadata and docs for
retention and storage fields: delete the `+kubebuilder:validation:Maximum=87600`
and the "Maximum value is 87600 hours..." doc line for the DurationInHours field
(symbol: DurationInHours) and delete the
`+kubebuilder:validation:Maximum=2147483647` and the "Maximum value is
2147483647 GiB..." doc line for the sizeInGiB field (symbol: sizeInGiB); keep
the Minimum validations and the rest of the comments intact so the API no longer
advertises or enforces an upper bound.

In
`@payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml`:
- Around line 4136-4147: The CRD change renamed the serialized retention field
in v1alpha1 from spec.prometheusConfig.retention.durationInDays to
durationInHours which will break existing stored objects and clients; revert the
rename in the v1alpha1 CRD so the field remains durationInDays, restore its
original description/format/minimum/maximum, and instead introduce any new
representation by adding a new API version (e.g., v1beta1) with durationInHours
plus proper conversion webhooks or conversion funcs and tombstoning logic to
translate between durationInDays and durationInHours during reads/writes.
- Around line 4143-4145: The CRD still enforces a 10-year ceiling via the schema
entry "maximum: 87600"; remove that maximum constraint so the API no longer
rejects larger retention values at admission time. Locate the clustermonitorings
CRD schema where "format: int32" and "maximum: 87600" appear (the
retention/hours field) and delete the "maximum: 87600" line (and any equivalent
numeric maximums on related retention fields) so the property is unconstrained
by the CRD schema. Leave the type/format (e.g., "type: integer" / "format:
int32") intact if needed, and run schema validation tests to ensure no other
admission-side maximums remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 003efd47-3c8c-4d49-aac1-a3b47834cee3

📥 Commits

Reviewing files that changed from the base of the PR and between 03a1de9 and 71e6477.

⛔ Files ignored due to path filters (5)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • config/v1alpha1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
  • openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
  • openapi/openapi.json is excluded by !openapi/**
📒 Files selected for processing (2)
  • config/v1alpha1/types_cluster_monitoring.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml

Comment thread config/v1alpha1/types_cluster_monitoring.go Outdated
Comment thread config/v1alpha1/types_cluster_monitoring.go Outdated
Comment thread payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml Outdated
Comment thread payload-manifests/crds/0000_10_config-operator_01_clustermonitorings.crd.yaml Outdated
@danielmellado
Copy link
Copy Markdown
Contributor

@everettraven besides the override on the CI, would we need to tombstone this due to the filed removal? (change, technically, but anyway). Thanks!

@simonpasquier
Copy link
Copy Markdown

We do not have enough customer or operational data to pick defensible API-level ceilings

We should collect data via telemetry in the first place.

@simonpasquier
Copy link
Copy Markdown

We strongly encourage not using durations in favor of a unit-based integer, but if you folks feel strongly that this is the best UX for your end-users I'm not going to prevent you from doing so. I'm not the expert on your user base here, you folks are :).

What was proposed is to adopt duration format already supported by the upstream Prometheus operator CRDs. These CRDs aren't perfect but they have been there for many years and I bet that our users are familiar with the format.

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 71e6477 to fcd3ab5 Compare May 28, 2026 14:39
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

// +optional
DurationInDays int32 `json:"durationInDays,omitempty"`
DurationInHours int32 `json:"durationInDays,omitempty"`
// sizeInGiB specifies the maximum storage size in gibibytes (GiB) that Prometheus
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is gibibytes a small enough unit? Is there a reasonable use case to limit this to smaller than a gibibyte?

// +kubebuilder:validation:Maximum=365
// +optional
DurationInDays int32 `json:"durationInDays,omitempty"`
DurationInHours int32 `json:"durationInDays,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't change the semantics of an existing field name either. The correct way to do this is to tombstone the durationInDays field by commenting it out entirely (and adding a notice that the field has been tombstoned) and adding the new durationInHours field to replace it.

We aren't concerned about existing instances because this API will only ever be present in a TPNU cluster, but we are concerned about potential clients that may serialize/deserialize based on the existing field name which is why we tombstone here (to make sure we never introduce a field with the same name but different serialization to prevent breaking clients). The tombstone can be removed when the API version is increased.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tombstone

@marioferh
Copy link
Copy Markdown
Contributor Author

after the discussion changed to string to maintain compatibly with Prometheus upstream even is string are not recommended for API

@danielmellado
Copy link
Copy Markdown
Contributor

We strongly encourage not using durations in favor of a unit-based integer, but if you folks feel strongly that this is the best UX for your end-users I'm not going to prevent you from doing so. I'm not the expert on your user base here, you folks are :).

What was proposed is to adopt duration format already supported by the upstream Prometheus operator CRDs. These CRDs aren't perfect but they have been there for many years and I bet that our users are familiar with the format.

I agree here with Simon and Mario in that the Promotheus CRD should serve as inspiration for this to maintain the base user familiarity with them. @everettraven could we make a compromise and keep that for now? besides your comment there, could you PTAL? Thanks!

@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from 30547f6 to dda9e79 Compare May 29, 2026 11:28
@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2026
marioferh and others added 2 commits May 29, 2026 13:37
Express Prometheus retention duration in hours under the existing
durationInDays JSON field for wire compatibility. Remove operational
maximums (365 days, 16384 GiB, and the interim 87600-hour cap); keep
sizeInGiB bounded by the int32 maximum at admission time.

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Replace int32 durationInDays and sizeInGiB with duration and size string
fields validated by the upstream Prometheus Operator CRD patterns so CMO
can pass values through without lossy conversion. Tombstone the previous
fields and add integration tests for valid and invalid retention values.

Co-authored-by: Cursor <cursoragent@cursor.com>
@marioferh marioferh force-pushed the fix_prometheus_retention_days branch from dda9e79 to 0486146 Compare May 29, 2026 11:43
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 29, 2026
Comment on lines +2238 to +2242
// durationInDays is tombstoned since the field was replaced by duration.
// DurationInDays int32 `json:"durationInDays,omitempty"`

// sizeInGiB is tombstoned since the field was replaced by size.
// SizeInGiB int32 `json:"sizeInGiB,omitempty"`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please maintain the entire original comment for context.

You can add something like:

// TOMBSTONE: This field was tombstoned in favor of `duration`.
// ---

Above the existing field comments when tombstoning.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// The current default value is `15d`.
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=64
// +kubebuilder:validation:XValidation:rule=`self.matches('^(0|(([0-9]+)y)?(([0-9]+)w)?(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?(([0-9]+)ms)?)$')`,message="must be a valid Prometheus duration string"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is setting this field to 0 any different than setting it to 0y or 0d?

Should we require that if you are not disabling retention all of your units must specify a value greater than 0?

For example, 0y10m5d1h30m doesn't really make sense to have the 0y to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0", "0d", and "0y" are semantically equivalent (zero duration) in Prometheus parsing. We intentionally use the same validation pattern as the Prometheus Operator retention field so CMO can pass values through unchanged.

Upstream does not reject zero-valued components like 0y in composite durations, so we are not adding stricter admission rules here. We document "0" as the canonical way to disable time-based retention and recommend single-unit forms like "15d" or "24h" in field comments, but validation matches upstream.

Note: composite durations must follow the fixed unit order (y→w→d→h→m→s→ms), so values like 0y10m5d are invalid; valid examples with zero components include 0y5d1h30m or 0d15h.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"0", "0d", and "0y" are semantically equivalent (zero duration) in Prometheus parsing. I intentionally use the same validation pattern as the Prometheus Operator retention field so CMO can pass values through unchanged.
I can add a comment if it is neccesary

// +kubebuilder:validation:Maximum=16384
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=32
// +kubebuilder:validation:XValidation:rule=`self.matches('^(0|([0-9]*[.])?[0-9]+((K|M|G|T|E|P)i?)?B)$')`,message="must be a valid Prometheus byte-size string"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here, if you aren't setting this to 0, should you be required to specify a non-zero value for a unit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prometheus Operator’s ByteSize pattern admits 500B ,the scale part is optional, you just need the trailing B. I copied that on purpose so retention.size can go straight through to retentionSize on the Prometheus CR without conversion.
If it is needed to be more strict I can change the CEL expression even if diverge from the other CR

Comment on lines +2245 to +2246
// The format mimics the Prometheus Operator CRD retention field validation pattern so values
// can be passed through to the Prometheus custom resource without conversion.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary. End users configuring this API won't care why this field is formatted the way it is like this. This godoc is used to generate end-user facing documentation so we should avoid leaking developer-specific documentation here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +2262 to +2263
// The format mimics the Prometheus Operator CRD retentionSize field validation pattern so values
// can be passed through to the Prometheus custom resource without conversion.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment on dev-specific documentation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@everettraven
Copy link
Copy Markdown
Contributor

I agree here with Simon and Mario in that the Promotheus CRD should serve as inspiration for this to maintain the base user familiarity with them. @everettraven could we make a compromise and keep that for now? besides your comment there, could you PTAL? Thanks!

I'm OK with compromising here. I've left some additional comments related to the current changeset

marioferh and others added 2 commits May 29, 2026 16:08
Document that "0", "0d", and "0y" are equivalent zero durations while "0"
remains canonical for disabling time-based retention. Recommend single-unit
forms without adding stricter validation than the Prometheus Operator CRD.
Expand tombstones with full original field comments using Former marker lines
to avoid confusing kubebuilder code generation.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep deprecated durationInDays and sizeInGiB alongside the new Prometheus
duration/size fields with mutual exclusion validation to satisfy CRD schema
checks. Revert transitMount to optional, trim user-facing godoc, and align
integration test expectations with CEL validation error messages.

Co-authored-by: Cursor <cursoragent@cursor.com>
@openshift-ci openshift-ci Bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 29, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented May 29, 2026

@marioferh: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants