Harden deploy migration flow during local and Azure deploys #148

Open
Copilot wants to merge 9 commits into main from copilot/investigate-db-migration-error

Conversation

Contributor

Copilot AI commented Mar 25, 2026

Summary

  • add a shared migration helper that retries the trigger endpoint and always waits for migration completion afterward
  • use the resilient migration flow in both deploy local and deploy azure
  • make TriggerMigration fail on non-2xx responses and include response details when available
  • add focused tests covering timeout, retry, and trigger status handling

Validation

  • go build ./...
  • go test ./...
  • go vet ./...

Copilot AI and others added 7 commits March 25, 2026 20:52
Agent-Logs-Url: https://github.com/DevExpGbb/gh-devlake/sessions/a286b823-c9cc-41e5-9805-7ad02eb0248b

Co-authored-by: ewega <26189114+ewega@users.noreply.github.com>
Copilot AI requested a review from ewega March 25, 2026 21:01
@ewega ewega marked this pull request as ready for review March 25, 2026 22:26
Copilot AI review requested due to automatic review settings March 25, 2026 22:26
Contributor

Copilot AI left a comment

Pull request overview

This PR hardens the DevLake database migration step used by the GitHub CLI extension during deploy local and deploy azure, making the trigger phase retryable and ensuring the flow always waits for migration completion.

Changes:

  • Adds a shared triggerAndWaitForMigration helper that retries the migration trigger and then polls /ping until migration completes.
  • Updates local and Azure deploy flows to use the shared resilient migration helper.
  • Improves Client.TriggerMigration() to fail on non-2xx responses and (when present) surface response body details; adds targeted tests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| internal/devlake/client.go | Makes TriggerMigration return errors on non-2xx responses and include response body details. |
| internal/devlake/client_test.go | Adds table-driven tests for TriggerMigration status handling. |
| cmd/helpers.go | Enhances migration waiting messages and introduces the shared trigger+wait helper with retry logic. |
| cmd/helpers_migration_test.go | Adds focused tests for retry/timeout behavior around the shared migration helper. |
| cmd/deploy_local.go | Switches local deploy migration behavior to the new shared helper and improved warnings. |
| cmd/deploy_azure.go | Switches Azure deploy migration behavior to the new shared helper and improved warnings. |
Comments suppressed due to low confidence (1)

cmd/helpers.go:278

  • lastErr is never cleared when a later trigger attempt succeeds. If attempt 1 fails and attempt 2 succeeds, lastErr stays non-nil, so the function prints "Continuing to monitor…" incorrectly and may return a misleading combined error if waiting later fails. Track a triggerSucceeded bool (or reset lastErr = nil on success) and only treat trigger as failed if all attempts fail.
	var lastErr error
	for attempt := 1; attempt <= triggerAttempts; attempt++ {
		err := devlakeClient.TriggerMigration()
		if err == nil {
			fmt.Println("   ✅ Migration triggered")
			break
		}
		lastErr = err
		fmt.Printf("   ⚠️  Trigger attempt %d/%d failed: %v\n", attempt, triggerAttempts, err)
		if attempt < triggerAttempts {
			fmt.Println("   DevLake may still be starting or migration may already be running — retrying...")
			time.Sleep(triggerInterval)
		}
	}

	fmt.Println("\n⏳ Waiting for migration to complete...")
	if lastErr != nil {
		fmt.Println("   Continuing to monitor migration status anyway...")
	}
	if err := waitForMigration(baseURL, waitAttempts, waitInterval); err != nil {
		if lastErr != nil {
			return fmt.Errorf("migration trigger failed earlier (%v) and waiting for migration completion also failed: %w", lastErr, err)
		}

Comment on lines +249 to +255
func triggerAndWaitForMigration(baseURL string) error {
	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
}

func triggerAndWaitForMigrationWithClient(baseURL string, devlakeClient *devlake.Client, triggerAttempts int, triggerInterval time.Duration, waitAttempts int, waitInterval time.Duration) error {
	fmt.Println("\n🔄 Triggering database migration...")

Copilot AI Mar 25, 2026

triggerAndWaitForMigrationWithClient takes both baseURL and a devlakeClient that already carries a BaseURL. If these ever diverge, migration trigger and migration wait will hit different instances. Consider deriving the wait URL from devlakeClient.BaseURL (or validating they match) to avoid accidental mismatches.

This issue also appears on line 256 of the same file.
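
A minimal sketch of the first option, assuming a stand-in `apiClient` type (hypothetical; this thread only shows that the real `devlake.Client` has a `BaseURL` field) and a hypothetical `pingURL` helper. Building the wait URL from the client itself means the trigger and the wait cannot point at different hosts.

```go
package main

import (
	"fmt"
	"strings"
)

// apiClient is a hypothetical stand-in for devlake.Client; only BaseURL matters here.
type apiClient struct {
	BaseURL string
}

// pingURL derives the wait endpoint from the client's own BaseURL, so callers
// never pass a second, possibly different, base URL. A trailing slash on
// BaseURL is trimmed to avoid producing "//ping".
func pingURL(c *apiClient) string {
	return strings.TrimRight(c.BaseURL, "/") + "/ping"
}

func main() {
	c := &apiClient{BaseURL: "http://localhost:8080/"}
	fmt.Println(pingURL(c)) // http://localhost:8080/ping
}
```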

Comment on lines +507 to +521
  // TriggerMigration triggers the DevLake database migration endpoint.
  func (c *Client) TriggerMigration() error {
  	resp, err := c.HTTPClient.Get(c.BaseURL + "/proceed-db-migration")
- 	if err != nil {
- 		return err
- 	}
- 	resp.Body.Close()
- 	return nil
- }
+ 	if err != nil {
+ 		return fmt.Errorf("triggering migration: %w", err)
+ 	}
+ 	defer resp.Body.Close()
+ 	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
+ 		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+ 		bodyText := strings.TrimSpace(string(body))
+ 		if bodyText != "" {
+ 			return fmt.Errorf("DevLake returned status %d: %s", resp.StatusCode, bodyText)
+ 		}
+ 		return fmt.Errorf("DevLake returned status %d", resp.StatusCode)
+ 	}
Copilot AI Mar 25, 2026

TriggerMigration error messages drop the request context (endpoint/path), which makes logs harder to interpret compared to other client helpers that include the HTTP method and path. Consider including /proceed-db-migration (and ideally resp.Status) in the returned error so callers can quickly identify the failing call.

See below for a potential fix:

	path := "/proceed-db-migration"
	resp, err := c.HTTPClient.Get(c.BaseURL + path)
	if err != nil {
		return fmt.Errorf("GET %s: triggering migration: %w", path, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		bodyText := strings.TrimSpace(string(body))
		if bodyText != "" {
			return fmt.Errorf("GET %s: DevLake returned %s: %s", path, resp.Status, bodyText)
		}
		return fmt.Errorf("GET %s: DevLake returned %s", path, resp.Status)

Comment on lines +54 to +88
func TestTriggerAndWaitForMigrationWithClient_RetriesBeforeWaiting(t *testing.T) {
	triggerCalls := 0
	pingCalls := 0

	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.URL.Path {
		case "/proceed-db-migration":
			triggerCalls++
			if triggerCalls == 1 {
				w.WriteHeader(http.StatusServiceUnavailable)
				return
			}
			w.WriteHeader(http.StatusOK)
		case "/ping":
			pingCalls++
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()

	client := devlake.NewClient(srv.URL)

	err := triggerAndWaitForMigrationWithClient(srv.URL, client, 2, time.Millisecond, 2, time.Millisecond)
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if triggerCalls != 2 {
		t.Fatalf("trigger calls = %d, want 2", triggerCalls)
	}
	if pingCalls != 1 {
		t.Fatalf("ping calls = %d, want 1", pingCalls)
	}
}
Copilot AI Mar 25, 2026

The new migration helper has an edge case where an early trigger failure followed by a later success should not be treated as a trigger failure (and should not produce the combined "trigger failed earlier" error). Adding a focused test for "first trigger fails, later succeeds, then wait fails" would lock this behavior in and prevent regressions.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comment on lines +256 to +274
	var lastErr error
	for attempt := 1; attempt <= triggerAttempts; attempt++ {
		err := devlakeClient.TriggerMigration()
		if err == nil {
			fmt.Println("   ✅ Migration triggered")
			break
		}
		lastErr = err
		fmt.Printf("   ⚠️  Trigger attempt %d/%d failed: %v\n", attempt, triggerAttempts, err)
		if attempt < triggerAttempts {
			fmt.Println("   DevLake may still be starting or migration may already be running — retrying...")
			time.Sleep(triggerInterval)
		}
	}

	fmt.Println("\n⏳ Waiting for migration to complete...")
	if lastErr != nil {
		fmt.Println("   Continuing to monitor migration status anyway...")
	}
Copilot AI Mar 30, 2026

In triggerAndWaitForMigrationWithClient, if an early trigger attempt fails and a later attempt succeeds, lastErr is never cleared. This leads to misleading output ("Continuing to monitor… anyway") and can produce an incorrect combined error claiming the trigger failed even when it eventually succeeded. Consider resetting lastErr to nil on success or tracking success with a separate boolean.

cmd/helpers.go Outdated
}

func triggerAndWaitForMigration(baseURL string) error {
	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
Copilot AI Mar 30, 2026

triggerAndWaitForMigration creates a devlake.NewClient(baseURL), whose default HTTP timeout is 90s. With multiple trigger attempts, a single hung request could significantly extend deploy time (e.g., 3×90s + intervals) before even starting the migration wait loop. Consider using a shorter timeout specifically for TriggerMigration (or passing in an HTTP client/timeout) so retries remain responsive.

Suggested change
- 	return triggerAndWaitForMigrationWithClient(baseURL, devlake.NewClient(baseURL), 3, 10*time.Second, 60, 5*time.Second)
+ 	client := devlake.NewClientWithTimeout(baseURL, 10*time.Second)
+ 	return triggerAndWaitForMigrationWithClient(baseURL, client, 3, 10*time.Second, 60, 5*time.Second)

Comment on lines +514 to +521
	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
		bodyText := strings.TrimSpace(string(body))
		if bodyText != "" {
			return fmt.Errorf("DevLake returned status %d: %s", resp.StatusCode, bodyText)
		}
		return fmt.Errorf("DevLake returned status %d", resp.StatusCode)
	}
Copilot AI Mar 30, 2026

TriggerMigration’s non-2xx error message omits which endpoint/method failed (it only reports the status code and optional body). Including the request context (e.g., GET /proceed-db-migration or full URL) would make failures easier to diagnose, especially when multiple DevLake instances are involved.

Comment on lines +912 to +955
func TestTriggerMigration(t *testing.T) {
	tests := []struct {
		name       string
		statusCode int
		wantErr    bool
	}{
		{
			name:       "success",
			statusCode: http.StatusOK,
		},
		{
			name:       "no content",
			statusCode: http.StatusNoContent,
		},
		{
			name:       "server error",
			statusCode: http.StatusServiceUnavailable,
			wantErr:    true,
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
				if r.URL.Path != "/proceed-db-migration" {
					t.Errorf("path = %s, want /proceed-db-migration", r.URL.Path)
				}
				w.WriteHeader(tt.statusCode)
			}))
			defer srv.Close()

			client := NewClient(srv.URL)
			err := client.TriggerMigration()

			if tt.wantErr {
				if err == nil {
					t.Fatal("expected error, got nil")
				}
				return
			}
			if err != nil {
				t.Fatalf("unexpected error: %v", err)
			}
		})
Copilot AI Mar 30, 2026

TestTriggerMigration currently only checks error vs. no error by status code. Since TriggerMigration was changed to include response details when available, it would be valuable to add a failing case that returns a body and assert the error string includes the status code and body content (and that body is omitted when empty).

Comment on lines +20 to +43
			time.Sleep(25 * time.Millisecond)
			w.WriteHeader(http.StatusOK)
		case "/ping":
			pingCalls++
			if pingCalls == 1 {
				w.WriteHeader(http.StatusPreconditionRequired)
				return
			}
			w.WriteHeader(http.StatusOK)
		default:
			http.NotFound(w, r)
		}
	}))
	defer srv.Close()

	client := &devlake.Client{
		BaseURL: srv.URL,
		HTTPClient: &http.Client{
			Timeout: 5 * time.Millisecond,
		},
	}

	err := triggerAndWaitForMigrationWithClient(srv.URL, client, 1, time.Millisecond, 3, time.Millisecond)
	if err != nil {
Copilot AI Mar 30, 2026

The migration helper tests use extremely small timeouts/intervals (1–5ms). These can be flaky on slower/loaded CI environments due to scheduler and timer granularity. Consider increasing the durations (while keeping the tests fast) to reduce nondeterminism, e.g., using tens of milliseconds and slightly larger retry intervals.

Contributor

ewega commented Mar 30, 2026

@copilot Please address these review comments:

  1. cmd/helpers.go:274: In triggerAndWaitForMigrationWithClient, lastErr is never cleared when a later trigger attempt succeeds. If attempt 1 fails and attempt 2 succeeds, lastErr stays non-nil, so the function incorrectly prints "Continuing to monitor…" and may return a misleading combined error. Reset lastErr = nil on success (inside the if err == nil block, before the break).

  2. cmd/helpers.go:255: triggerAndWaitForMigrationWithClient takes both baseURL and a devlakeClient that already carries a BaseURL. If these diverge, the trigger and wait hit different instances. Derive the wait URL from devlakeClient.BaseURL instead of taking a separate baseURL parameter.

  3. internal/devlake/client.go:521: TriggerMigration error messages drop the request context (endpoint/path). Other client helpers include the HTTP method and path. Include GET /proceed-db-migration and resp.Status in the returned errors for diagnostic consistency.

  4. cmd/helpers_migration_test.go:88: Add a focused test for the edge case "first trigger fails, later trigger succeeds, then wait fails". The error should NOT include "migration trigger failed earlier", since the trigger ultimately succeeded.

Copilot AI and others added 2 commits March 30, 2026 16:24
Contributor Author

Copilot AI commented Mar 30, 2026

@copilot Please address these review comments:

  1. cmd/helpers.go:274 — In triggerAndWaitForMigrationWithClient, lastErr is never cleared when a later trigger attempt succeeds. If attempt ...

Addressed in 1efb3a3. The helper now clears stale trigger errors after a successful retry, waits against devlakeClient.BaseURL, TriggerMigration includes GET /proceed-db-migration plus resp.Status in failures, and there’s a regression test covering fail → succeed trigger followed by wait failure.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comment on lines 507 to +522
  // TriggerMigration triggers the DevLake database migration endpoint.
  func (c *Client) TriggerMigration() error {
- 	resp, err := c.HTTPClient.Get(c.BaseURL + "/proceed-db-migration")
+ 	const path = "/proceed-db-migration"
+
+ 	resp, err := c.HTTPClient.Get(c.BaseURL + path)
  	if err != nil {
- 		return err
+ 		return fmt.Errorf("GET %s: triggering migration: %w", path, err)
  	}
+ 	defer resp.Body.Close()
+ 	if resp.StatusCode < http.StatusOK || resp.StatusCode >= http.StatusMultipleChoices {
+ 		body, _ := io.ReadAll(io.LimitReader(resp.Body, 512))
+ 		bodyText := strings.TrimSpace(string(body))
+ 		if bodyText != "" {
+ 			return fmt.Errorf("GET %s: DevLake returned %s: %s", path, resp.Status, bodyText)
+ 		}
+ 		return fmt.Errorf("GET %s: DevLake returned %s", path, resp.Status)
Copilot AI Mar 30, 2026

TriggerMigration is still doing a raw HTTPClient.Get and custom status/body handling instead of using the existing doGet/doPost request helpers that centralize error formatting and response handling. To keep API calls consistent and avoid duplicating response parsing logic, consider adding a small shared helper for non-JSON/empty-body endpoints (e.g., a doGetNoBody/doRequest that checks for 2xx and optionally captures a limited response body) and use it here.

Comment on lines +224 to +239
+ 	lastStatus := 0
  	for attempt := 1; attempt <= maxAttempts; attempt++ {
  		resp, err := httpClient.Get(baseURL + "/ping")
  		if err == nil {
+ 			lastStatus = resp.StatusCode
  			resp.Body.Close()
  			if resp.StatusCode == http.StatusOK {
  				fmt.Println(" ✅ Migration complete!")
  				return nil
  			}
  		}
- 		fmt.Printf(" Migrating... (%d/%d)\n", attempt, maxAttempts)
+ 		statusSuffix := ""
+ 		if lastStatus != 0 {
+ 			statusSuffix = fmt.Sprintf(", status=%d", lastStatus)
+ 		}
+ 		fmt.Printf(" Migrating... (%d/%d%s)\n", attempt, maxAttempts, statusSuffix)
Copilot AI Mar 30, 2026

waitForMigration keeps lastStatus from the last successful HTTP response; if a later attempt errors (timeout/DNS/etc.), the progress output can still show a stale status=... even though the current attempt didn’t get a response. Consider resetting lastStatus to 0 on request errors (or tracking/printing the last error separately) so the status suffix always reflects an actual response.

3 participants