-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Fix hang in cloudstack-sysvmadm script #12355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## 4.20 #12355 +/- ##
============================================
- Coverage 16.23% 16.23% -0.01%
Complexity 13378 13378
============================================
Files 5657 5657
Lines 498866 498875 +9
Branches 60545 60546 +1
============================================
- Hits 81011 81008 -3
- Misses 408821 408831 +10
- Partials 9034 9036 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16231 |
DaanHoogland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clgtm (though I think some de-duplication can be done)
shwstppr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical hang issue in the cloudstack-sysvmadm script where background process management was broken due to incorrect bash syntax. The fix corrects array variable assignments and simplifies the parallel job execution logic.
- Fixed incorrect bash array syntax that was causing variables to be arrays instead of scalars, leading to infinite loops
- Replaced complex and buggy manual process-checking logic with simple
waitcommands - Improved efficiency by moving thread limit adjustments outside loops
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Results Summary
| Test | Name | Threads | Time | Status |
|---|---|---|---|---|
| 1 | Minimum sequential | -t 1 |
18m 22s | PASS |
| 2 | Non-divisible | -t 7 |
5m 48s | PASS |
| 3 | Default (original bug) | -t 10 |
4m 38s | PASS |
| 4 | Boundary (count - 1) | -t 14 |
5m 23s | PASS |
| 5 | Boundary (exact match) | -t 15 |
4m 02s | PASS |
| 6 | Boundary (exact match repeat) | -t 15 |
4m 12s | PASS |
| 7 | Boundary (count + 1) | -t 16 |
4m 02s | PASS |
| 8 | Much larger | -t 20 |
4m 13s | PASS |
Key Observations
1. Bug is fixed: All tests completed without hanging, including the original bug scenario (Test 3: 15 routers with -t 10).
2. Performance scales as expected:
| Threads | Time | Batches | Analysis |
|---|---|---|---|
| 1 | 18m 22s | 15 | ~73 sec/router sequentially |
| 7 | 5m 48s | 3 | (7+7+1) |
| 10 | 4m 38s | 2 | (10+5) |
| 14 | 5m 23s | 2 | (14+1) - second batch waits for just 1 |
| 15+ | ~4m | 1 | All parallel - fastest |
3. Optimal performance: -t 15 or higher gives best results (~4 min) since all routers restart in parallel.
4. Interesting observation: -t 14 (5m 23s) is slower than -t 10 (4m 38s) because the second batch only has 1 router but still waits for full cycle.
Detailed Test Restuls:
Test 1: Minimum sequential - one VR at a time
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 1
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 18m22.504s
user 0m2.444s
sys 0m3.900s
Test 2: t=7 - Non-divisible
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p <password> -r -t 7
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 5m48.742s
user 0m3.155s
sys 0m5.210s
Test 3: Default/original bug (t=10) - 15 not divisible by 10
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p <password> -r
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 4m38.045s
user 0m3.518s
sys 0m5.757s
Test 4: Boundary (count - 1) / t=14
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 14
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 5m23.235s
user 0m4.968s
sys 0m8.096s
Test 5 Boundary (exact match) t=15 - exactly equals router count
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 15
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 4m2.676s
user 0m5.016s
sys 0m8.023s
Test 6: Boundary (exact match) t=15
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 15
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 4m12.920s
user 0m5.087s
sys 0m8.252s
Test 7: Boundary (count + 1) t=16
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 16
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 4m2.942s
user 0m5.023s
sys 0m8.066s
Test 8: Much larger t=20 - more threads than routers
[root@ref-trl-10580-k-Mol9-rositsa-kyuchukova-mgmt1 ~]# time cloudstack-sysvmadm -d localhost -u root -p P@ssword123 -r -t 20
mysql: [Warning] Using a password on the command line interface can be insecure.
mysql: [Warning] Using a password on the command line interface can be insecure.
Stopping and starting 15 running routing vm(s)...
Done restarting router(s).
real 4m13.027s
user 0m5.047s
sys 0m8.166s
Description
This PR fixes #12067
(see #12067 (comment) for description)
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?