-
Notifications
You must be signed in to change notification settings - Fork 70
✨ Support ClusterExtension progress deadline detection
#2447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
4dbbdd5 to
98f42da
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2447 +/- ##
==========================================
+ Coverage 69.48% 73.02% +3.53%
==========================================
Files 101 101
Lines 7701 7729 +28
==========================================
+ Hits 5351 5644 +293
+ Misses 1914 1638 -276
- Partials 436 447 +11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
98f42da to
5ed4b38
Compare
ClusterExtension progress deadline detection
| // +kubebuilder:validation:Maximum:=720 | ||
| // +optional | ||
| // <opcon:experimental> | ||
| ProgressDeadlineMinutes int32 `json:"progressDeadlineMinutes,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add tests for this as well or do you feel it's unnecessary? Or if we could somehow run the same set of tests against the fields in both APIs it would be nice; particularly because we really want these to always match and that could serve as protection against one changing and not the other. As unlikely as it is that would actually happen 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done.
4eaad24 to
9788589
Compare
|
|
||
| if pd := existingRev.Spec.ProgressDeadlineMinutes; pd > 0 { | ||
| cnd := meta.FindStatusCondition(reconciledRev.Status.Conditions, ocv1.ClusterExtensionRevisionTypeProgressing) | ||
| isStillProgressing := cnd != nil && cnd.Status == metav1.ConditionTrue && cnd.Reason != ocv1.ReasonSucceeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that some disruption after the revision reached Progressing/Succeeded could cause the timeout to be triggered since we are comparing to the creation timestamp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible that some disruption after the revision reached Progressing/Succeeded could cause the timeout to be triggered since we are comparing to the creation timestamp?
hm, can you sketch a scenario? are we ever flipping from Progressing/Succeeded to Progressing/Retrying, i.e. what could be a cause for that?
| markAsNotProgressing(reconciledRev, ocv1.ReasonProgressDeadlineExceeded, fmt.Sprintf("Revision has not rolled out for %d minutes.", pd)) | ||
| reconcileErr = nil | ||
| res = ctrl.Result{} | ||
| } else if _, found := c.progressDeadlineCheckInFlight.Load(existingRev.GetUID()); !found && reconcileErr == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble understanding what we are exactly doing here. Is this to make sure that that if nothing else happens, at least we come back at timeout to set the condition?
| timeout := time.Duration(pd) * time.Minute | ||
| if time.Since(existingRev.CreationTimestamp.Time) > timeout { | ||
| markAsNotProgressing(reconciledRev, ocv1.ReasonProgressDeadlineExceeded, fmt.Sprintf("Revision has not rolled out for %d minutes.", pd)) | ||
| reconcileErr = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth calling out why we are setting the error to nil. I guess its no not re-enqueue?
Adds optional `.spec.progressDeadlineMinutes` field to `ClusterExtension` and `ClusterExtensionRevision` that defines the maximum time an extension version can take to roll out before being marked as failed. When configured, if a `ClusterExtensionRevision` fails to roll out within the specified duration, the `Progressing` condition is set to `False` with reason `ProgressDeadlineExceeded`. This signals that manual intervention is required and stops automatic retry attempts. Added unit and e2e test asserting the added behavior.
9788589 to
c0a66ab
Compare
Description
Adds optional
.spec.progressDeadlineMinutesfield toClusterExtensionand
ClusterExtensionRevisionthat definesthe maximum time an extension version can take to roll out before being marked as failed.
When configured, if a
ClusterExtensionRevisionfails to roll out withinthe specified duration, the
Progressingcondition is set toFalsewith reasonProgressDeadlineExceeded. This signals that manual intervention is requiredand stops automatic retry attempts.
Added unit and e2e test asserting the added behavior.
Reviewer Checklist