Harden deploy migration flow during local and Azure deploys #148
Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/a286b823-c9cc-41e5-9805-7ad02eb0248b Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Pull request overview
This PR hardens the DevLake database migration step used by the GitHub CLI extension during deploy local and deploy azure, making the trigger phase retryable and ensuring the flow always waits for migration completion.
Changes:
- Adds a shared `triggerAndWaitForMigration` helper that retries the migration trigger and then polls `/ping` until migration completes.
- Updates local and Azure deploy flows to use the shared resilient migration helper.
- Improves `Client.TriggerMigration()` to fail on non-2xx responses and (when present) surface response body details; adds targeted tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| internal/devlake/client.go | Makes TriggerMigration return errors on non-2xx responses and include response body details. |
| internal/devlake/client_test.go | Adds table-driven tests for TriggerMigration status handling. |
| cmd/helpers.go | Enhances migration waiting messages and introduces the shared trigger+wait helper with retry logic. |
| cmd/helpers_migration_test.go | Adds focused tests for retry/timeout behavior around the shared migration helper. |
| cmd/deploy_local.go | Switches local deploy migration behavior to the new shared helper and improved warnings. |
| cmd/deploy_azure.go | Switches Azure deploy migration behavior to the new shared helper and improved warnings. |
Comments suppressed due to low confidence (1)
cmd/helpers.go:278
`lastErr` is never cleared when a later trigger attempt succeeds. If attempt 1 fails and attempt 2 succeeds, `lastErr` stays non-nil, so the function prints "Continuing to monitor…" incorrectly and may return a misleading combined error if waiting later fails. Track a `triggerSucceeded` bool (or reset `lastErr = nil` on success) and only treat the trigger as failed if all attempts fail.
```go
	var lastErr error
	for attempt := 1; attempt <= triggerAttempts; attempt++ {
		err := devlakeClient.TriggerMigration()
		if err == nil {
			fmt.Println(" ✅ Migration triggered")
			break
		}
		lastErr = err
		fmt.Printf(" ⚠️ Trigger attempt %d/%d failed: %v\n", attempt, triggerAttempts, err)
		if attempt < triggerAttempts {
			fmt.Println(" DevLake may still be starting or migration may already be running — retrying...")
			time.Sleep(triggerInterval)
		}
	}
	fmt.Println("\n⏳ Waiting for migration to complete...")
	if lastErr != nil {
		fmt.Println(" Continuing to monitor migration status anyway...")
	}
	if err := waitForMigration(baseURL, waitAttempts, waitInterval); err != nil {
		if lastErr != nil {
			return fmt.Errorf("migration trigger failed earlier (%v) and waiting for migration completion also failed: %w", lastErr, err)
		}
```
```go
func triggerAndWaitForMigration(baseURL string) error {
	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
}

func triggerAndWaitForMigrationWithClient(baseURL string, devlakeClient *devlake.Client, triggerAttempts int, triggerInterval time.Duration, waitAttempts int, waitInterval time.Duration) error {
	fmt.Println("\n🔄 Triggering database migration...")
```
triggerAndWaitForMigrationWithClient takes both baseURL and a devlakeClient that already carries a BaseURL. If these ever diverge, migration trigger and migration wait will hit different instances. Consider deriving the wait URL from devlakeClient.BaseURL (or validating they match) to avoid accidental mismatches.
This issue also appears on line 256 of the same file.
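One way to rule out the divergence described above is to resolve a single URL before either phase runs. The sketch below is only an illustration: `Client` is a stand-in for `devlake.Client` that carries nothing but `BaseURL`, and `resolveWaitURL` is a hypothetical name, not something in the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// Client is a stand-in for devlake.Client; only BaseURL matters here.
type Client struct {
	BaseURL string
}

// resolveWaitURL (hypothetical) returns the URL the wait phase should poll.
// It prefers the client's BaseURL and rejects a non-empty baseURL that
// disagrees with it. Trailing slashes are ignored for the comparison.
func resolveWaitURL(baseURL string, c *Client) (string, error) {
	clientURL := strings.TrimRight(c.BaseURL, "/")
	given := strings.TrimRight(baseURL, "/")
	if given != "" && given != clientURL {
		return "", fmt.Errorf("base URL mismatch: helper got %q, client uses %q", given, clientURL)
	}
	return clientURL, nil
}

func main() {
	c := &Client{BaseURL: "http://localhost:8080/"}

	u, err := resolveWaitURL("http://localhost:8080", c)
	fmt.Println(u, err)

	_, err = resolveWaitURL("http://localhost:9090", c)
	fmt.Println(err)
}
```

Validating instead of silently preferring one value keeps a misconfigured caller visible; dropping the `baseURL` parameter entirely would be the stricter alternative.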
internal/devlake/client.go
Outdated
```diff
 // TriggerMigration triggers the DevLake database migration endpoint.
 func (c *Client) TriggerMigration() error {
 	resp, err := c.HTTPClient.Get(c.BaseURL + "/proceed-db-migration")
-	if err != nil {
-		return err
-	}
-	resp.Body.Close()
-	return nil
-}
+	if err != nil {
+		return fmt.Errorf("triggering migration: %w", err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
+		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+		bodyText := strings.TrimSpace(string(body))
+		if bodyText != "" {
+			return fmt.Errorf("DevLake returned status %d: %s", resp.StatusCode, bodyText)
+		}
+		return fmt.Errorf("DevLake returned status %d", resp.StatusCode)
+	}
```
TriggerMigration error messages drop the request context (endpoint/path), which makes logs harder to interpret compared to other client helpers that include the HTTP method and path. Consider including /proceed-db-migration (and ideally resp.Status) in the returned error so callers can quickly identify the failing call.
See below for a potential fix:
```go
	path := "/proceed-db-migration"
	resp, err := c.HTTPClient.Get(c.BaseURL + path)
	if err != nil {
		return fmt.Errorf("GET %s: triggering migration: %w", path, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		bodyText := strings.TrimSpace(string(body))
		if bodyText != "" {
			return fmt.Errorf("GET %s: DevLake returned %s: %s", path, resp.Status, bodyText)
		}
		return fmt.Errorf("GET %s: DevLake returned %s", path, resp.Status)
```
```go
func TestTriggerAndWaitForMigrationWithClient_RetriesBeforeWaiting(t *testing.T) {
	triggerCalls := 0
	pingCalls := 0

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/proceed-db-migration":
			triggerCalls++
			if triggerCalls == 1 {
				w.WriteHeader(http.StatusServiceUnavailable)
				return
			}
			w.WriteHeader(http.StatusOK)
		case "/ping":
			pingCalls++
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()

	client := devlake.NewClient(srv.URL)

	err := triggerAndWaitForMigrationWithClient(srv.URL, client, 2, time.Millisecond, 2, time.Millisecond)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if triggerCalls != 2 {
		t.Fatalf("trigger calls = %d, want 2", triggerCalls)
	}
	if pingCalls != 1 {
		t.Fatalf("ping calls = %d, want 1", pingCalls)
	}
}
```
The new migration helper has an edge case where an early trigger failure followed by a later success should not be treated as a trigger failure (and should not produce the combined "trigger failed earlier" error). Adding a focused test for "first trigger fails, later succeeds, then wait fails" would lock this behavior in and prevent regressions.
```go
	var lastErr error
	for attempt := 1; attempt <= triggerAttempts; attempt++ {
		err := devlakeClient.TriggerMigration()
		if err == nil {
			fmt.Println(" ✅ Migration triggered")
			break
		}
		lastErr = err
		fmt.Printf(" ⚠️ Trigger attempt %d/%d failed: %v\n", attempt, triggerAttempts, err)
		if attempt < triggerAttempts {
			fmt.Println(" DevLake may still be starting or migration may already be running — retrying...")
			time.Sleep(triggerInterval)
		}
	}

	fmt.Println("\n⏳ Waiting for migration to complete...")
	if lastErr != nil {
		fmt.Println(" Continuing to monitor migration status anyway...")
	}
```
In triggerAndWaitForMigrationWithClient, if an early trigger attempt fails and a later attempt succeeds, lastErr is never cleared. This leads to misleading output ("Continuing to monitor… anyway") and can produce an incorrect combined error claiming the trigger failed even when it eventually succeeded. Consider resetting lastErr to nil on success or tracking success with a separate boolean.
cmd/helpers.go
Outdated
```go
}

func triggerAndWaitForMigration(baseURL string) error {
	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
```
triggerAndWaitForMigration creates a devlake.NewClient(baseURL), whose default HTTP timeout is 90s. With multiple trigger attempts, a single hung request could significantly extend deploy time (e.g., 3×90s + intervals) before even starting the migration wait loop. Consider using a shorter timeout specifically for TriggerMigration (or passing in an HTTP client/timeout) so retries remain responsive.
```diff
-	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
+	client := devlake.NewClientWithTimeout(baseURL, 10*time.Second)
+	return triggerAndWaitForMigrationWithClient(baseURL, client, 3, 10*time.Second, 60, 5*time.Second)
```
internal/devlake/client.go
Outdated
```go
	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		bodyText := strings.TrimSpace(string(body))
		if bodyText != "" {
			return fmt.Errorf("DevLake returned status %d: %s", resp.StatusCode, bodyText)
		}
		return fmt.Errorf("DevLake returned status %d", resp.StatusCode)
	}
```
TriggerMigration’s non-2xx error message omits which endpoint/method failed (it only reports the status code and optional body). Including the request context (e.g., GET /proceed-db-migration or full URL) would make failures easier to diagnose, especially when multiple DevLake instances are involved.
```go
func TestTriggerMigration(t *testing.T) {
	tests := []struct {
		name       string
		statusCode int
		wantErr    bool
	}{
		{
			name:       "success",
			statusCode: http.StatusOK,
		},
		{
			name:       "no content",
			statusCode: http.StatusNoContent,
		},
		{
			name:       "server error",
			statusCode: http.StatusServiceUnavailable,
			wantErr:    true,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				if r.URL.Path != "/proceed-db-migration" {
					t.Errorf("path = %s, want /proceed-db-migration", r.URL.Path)
				}
				w.WriteHeader(tt.statusCode)
			}))
			defer srv.Close()

			client := NewClient(srv.URL)
			err := client.TriggerMigration()

			if tt.wantErr {
				if err == nil {
					t.Fatal("expected error, got nil")
				}
				return
			}
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
		})
```
TestTriggerMigration currently only checks error vs. no error by status code. Since TriggerMigration was changed to include response details when available, it would be valuable to add a failing case that returns a body and assert the error string includes the status code and body content (and that body is omitted when empty).
```go
			time.Sleep(25 * time.Millisecond)
			w.WriteHeader(http.StatusOK)
		case "/ping":
			pingCalls++
			if pingCalls == 1 {
				w.WriteHeader(http.StatusPreconditionRequired)
				return
			}
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()

	client := &devlake.Client{
		BaseURL: srv.URL,
		HTTPClient: &http.Client{
			Timeout: 5 * time.Millisecond,
		},
	}

	err := triggerAndWaitForMigrationWithClient(srv.URL, client, 1, time.Millisecond, 3, time.Millisecond)
	if err != nil {
```
The migration helper tests use extremely small timeouts/intervals (1–5ms). These can be flaky on slower/loaded CI environments due to scheduler and timer granularity. Consider increasing the durations (while keeping the tests fast) to reduce nondeterminism, e.g., using tens of milliseconds and slightly larger retry intervals.
@copilot Please address these review comments:
Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/f7542969-e4d0-4329-bfa1-0cb12581ab4b Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Addressed in 1efb3a3. The helper now clears stale trigger errors after a successful retry, waits against |
```diff
 // TriggerMigration triggers the DevLake database migration endpoint.
 func (c *Client) TriggerMigration() error {
-	resp, err := c.HTTPClient.Get(c.BaseURL + "/proceed-db-migration")
+	const path = "/proceed-db-migration"
+
+	resp, err := c.HTTPClient.Get(c.BaseURL + path)
 	if err != nil {
-		return err
+		return fmt.Errorf("GET %s: triggering migration: %w", path, err)
 	}
+	defer resp.Body.Close()
+	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
+		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+		bodyText := strings.TrimSpace(string(body))
+		if bodyText != "" {
+			return fmt.Errorf("GET %s: DevLake returned %s: %s", path, resp.Status, bodyText)
+		}
+		return fmt.Errorf("GET %s: DevLake returned %s", path, resp.Status)
```
TriggerMigration is still doing a raw HTTPClient.Get and custom status/body handling instead of using the existing doGet/doPost request helpers that centralize error formatting and response handling. To keep API calls consistent and avoid duplicating response parsing logic, consider adding a small shared helper for non-JSON/empty-body endpoints (e.g., a doGetNoBody/doRequest that checks for 2xx and optionally captures a limited response body) and use it here.
```diff
+	lastStatus := 0
 	for attempt := 1; attempt <= maxAttempts; attempt++ {
 		resp, err := httpClient.Get(baseURL + "/ping")
 		if err == nil {
+			lastStatus = resp.StatusCode
 			resp.Body.Close()
 			if resp.StatusCode == http.StatusOK {
 				fmt.Println(" ✅ Migration complete!")
 				return nil
 			}
 		}
-		fmt.Printf(" Migrating... (%d/%d)\n", attempt, maxAttempts)
+		statusSuffix := ""
+		if lastStatus != 0 {
+			statusSuffix = fmt.Sprintf(", status=%d", lastStatus)
+		}
+		fmt.Printf(" Migrating... (%d/%d%s)\n", attempt, maxAttempts, statusSuffix)
```
waitForMigration keeps lastStatus from the last successful HTTP response; if a later attempt errors (timeout/DNS/etc.), the progress output can still show a stale status=... even though the current attempt didn’t get a response. Consider resetting lastStatus to 0 on request errors (or tracking/printing the last error separately) so the status suffix always reflects an actual response.
Summary
- Harden the migration flow for `deploy local` and `deploy azure`
- Make `TriggerMigration` fail on non-2xx responses and include response details when available

Validation
- `go build ./...`
- `go test ./...`
- `go vet ./...`