
Add template and scaling logic#6414

Open
vofish wants to merge 2 commits into GoogleCloudPlatform:master from kiryl-filatau:azure-5k

Conversation


@vofish vofish commented Jan 30, 2026

  • Add deployment template
  • Add basic scale up and down logic

kubernetes_scale_benchmark.Cleanup(bm_spec)
container_service.RunRetryableKubectlCommand(
    ['delete', 'deployment', 'app'],
    timeout=kubernetes_scale_benchmark._GetScaleTimeout(),

We could consider sharing the command, but the timeout needs to differ (here you are sharing one, but it scales off pod count) and so does the name ('app' instead of 'scaleup'). I'll leave it to you whether to refactor into a function taking two arguments, timeout and name, or just duplicate.
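One way to read this suggestion is a small helper parameterized on both the deployment name and the timeout. This is a hypothetical sketch: the helper name `delete_deployment` is made up for illustration, and the kubectl runner is stubbed so the example is self-contained (in the PR it would be `container_service.RunRetryableKubectlCommand`).

```python
def run_kubectl(args, timeout):
    """Stub standing in for container_service.RunRetryableKubectlCommand."""
    return {'args': args, 'timeout': timeout}


def delete_deployment(name: str, timeout: int):
    """Delete one deployment; each call site picks its own name and timeout."""
    return run_kubectl(['delete', 'deployment', name], timeout=timeout)


# Cleanup deletes 'app'; the other call site can pass 'scaleup' with its own
# timeout instead of duplicating the command list.
result = delete_deployment('app', timeout=600)
```

Whether this is worth it over simple duplication depends on how many call sites end up needing the delete.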

samples = kubernetes_scale_benchmark.ParseStatusChanges(
    'node',
    start_time,
    resources_to_ignore=set(),

Should the initial node(s) be ignored?
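If the initial nodes should indeed be excluded, one option is to pass their names into `resources_to_ignore` instead of `set()`. The sketch below is illustrative only: `parse_status_changes` is a simplified stand-in for the real `ParseStatusChanges`, and the event shape is assumed.

```python
def parse_status_changes(events, resources_to_ignore):
    """Return (resource, status) events, skipping ignored resources."""
    return [e for e in events if e[0] not in resources_to_ignore]


# Nodes that existed before the benchmark started; only nodes created by
# the scale-up should produce samples.
initial_nodes = {'node-0'}
events = [('node-0', 'Ready'), ('node-1', 'Ready'), ('node-2', 'Ready')]
scaled = parse_status_changes(events, resources_to_ignore=initial_nodes)
```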

])
kubernetes_commands.WaitForRollout(
    'deployment/app',
    timeout=kubernetes_scale_benchmark._GetScaleTimeout(),

Need to swap the timeout usage, since that one scales with the kubernetes_scale_benchmark flags. Possibly have that function take a variable like "number of nodes/pods" that defaults to the kubernetes_scale_benchmark flag values. Also remove the leading underscore, since you're making it public. Otherwise, make a new function in this benchmark.



def _WaitForScaledNodesDeletion(initial_node_count: int) -> bool:
  timeout = 20 * 60 + kubernetes_scale_benchmark._GetScaleTimeout()
This seems to take a while, huh? We should quiz Piotr (who's now in the chat) about this. As annoying as it would be, it's possible we're OK with waiting for an hour or more.

)
_ScaleDeploymentReplicas(0)
if _WaitForScaledNodesDeletion(initial_node_count):
  _ScaleDeploymentReplicas(NUM_NODES.value)

We also need to collect samples for the second deployment and for the scale-down. Add more ParseStatusChanges calls.

It also looks like that function already takes start_time, great! Expand that function to ignore timestamps older than the start time. Add a unit test to that effect as well.
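The requested filtering could be as simple as dropping events that predate `start_time`. This is a sketch under assumptions: `filter_status_changes` is a hypothetical helper, and events are modeled as `(timestamp, resource, status)` tuples, which may not match the real event shape in `ParseStatusChanges`.

```python
def filter_status_changes(events, start_time):
    """Keep only status changes at or after the current phase's start_time.

    Without this, the second scale-up's samples would include node events
    left over from the first scale-up.
    """
    return [e for e in events if e[0] >= start_time]


events = [
    (100.0, 'node-1', 'Ready'),  # from the first scale-up
    (250.0, 'node-2', 'Ready'),  # from the second scale-up
]
second_phase = filter_status_changes(events, start_time=200.0)
```

A unit test for this would feed in events on both sides of `start_time` and assert only the newer ones survive.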

# Do one scale up, scale down, then scale up again.
return []
_ScaleDeploymentReplicas(NUM_NODES.value)
samples = kubernetes_scale_benchmark.ParseStatusChanges(

Add some metadata to each set of samples (scale up 1, scale down, scale up 2) indicating which phase it was in, e.g. for these add metadata like "phase": "scaleup1".
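One minimal way to tag each phase's samples, sketched with plain dicts (the `tag_phase` helper is hypothetical, and PKB's actual Sample type carries metadata differently):

```python
def tag_phase(samples, phase):
    """Attach the phase name to every sample's metadata in place."""
    for s in samples:
        s.setdefault('metadata', {})['phase'] = phase
    return samples


# Each ParseStatusChanges result would get its own tag: 'scaleup1',
# 'scaledown', 'scaleup2'.
scaleup1 = tag_phase(
    [{'metric': 'node_ready_time', 'value': 42.0}], phase='scaleup1'
)
```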
