Flaky test: publication_manager_test race condition in relation tracker restart

## Summary

The test `"handles relation tracker restart"` in `publication_manager_test.exs:503` has a race condition that causes intermittent CI failures.

Observed in run [22354924262](https://github.com/electric-sql/electric/actions/runs/22354924262) on `main` (2026-02-24).

## Error

```
** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:...", {Electric.Replication.PublicationManager.RelationTracker, nil}}}, {:remove_shape, "36215155-..."}, 5000)
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name
```

## Root Cause

The test at `test/electric/replication/publication_manager_test.exs:503`:

1. **Line 515**: `GenServer.stop(relation_tracker_name)` kills the RelationTracker
2. **Line 519**: `assert_pub_tables(ctx, [ctx.relation], 2_000)` polls **Postgres publication tables** until they match
3. **Line 522**: `PublicationManager.remove_shape(ctx.stack_id, shape_handle)` does a `GenServer.call` to the RelationTracker

The problem is that `assert_pub_tables` checks Postgres state, not whether the RelationTracker GenServer has been re-registered by the supervisor. There's a window where publication tables are correct (from the previous state) but the new RelationTracker process isn't yet alive or hasn't finished `handle_continue(:restore_relations, ...)`.
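For reference, the `no process` exit shown in the Error section is what `GenServer.call/3` produces when a `:via` name currently resolves to no process. A minimal standalone sketch, not Electric's code (the `NoProcDemo.Registry` name, the `:missing` key and the `:ping` message are hypothetical):

```elixir
# Start an empty registry; nothing is ever registered under :missing.
{:ok, _registry} = Registry.start_link(keys: :unique, name: NoProcDemo.Registry)

name = {:via, Registry, {NoProcDemo.Registry, :missing}}

# Calling a name with no registered process exits with {:noproc, _}
# in the caller, which is the same exit the flaky test hits.
result =
  try do
    GenServer.call(name, :ping, 5_000)
  catch
    :exit, {:noproc, _details} -> :noproc
  end

IO.inspect(result, label: "call result")
```

In the test, the same exit occurs if `remove_shape` runs in the gap between the old RelationTracker terminating and the supervisor registering its replacement.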

## Suggested Fix

Call `RelationTracker.wait_for_restore(ctx.stack_id)` before the `remove_shape` call on line 522. This function already exists (lines 79-82 of `relation_tracker.ex`) and blocks until `handle_continue(:restore_relations)` completes, which guarantees the process is registered and ready.
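The general pattern can be sketched in isolation. The example below is a generic illustration under stated assumptions, not Electric's implementation: `RestartDemo.Tracker`, `RestartDemo.wait_until_registered/2` and the `:wait_for_restore` message are hypothetical stand-ins, and the 50 ms sleep simulates restore work. It stops a supervised GenServer, polls `GenServer.whereis/1` until the restarted process is registered, and then issues a call that is only answered after `handle_continue` has finished.

```elixir
defmodule RestartDemo.Tracker do
  use GenServer

  def name, do: {:via, Registry, {RestartDemo.Registry, :tracker}}

  def start_link(_opts), do: GenServer.start_link(__MODULE__, :ok, name: name())

  @impl true
  def init(:ok), do: {:ok, %{restored?: false}, {:continue, :restore}}

  @impl true
  def handle_continue(:restore, state) do
    # Simulated restore work; calls queue up behind this callback.
    Process.sleep(50)
    {:noreply, %{state | restored?: true}}
  end

  @impl true
  def handle_call(:wait_for_restore, _from, state), do: {:reply, state.restored?, state}
end

defmodule RestartDemo do
  # Hypothetical helper: poll until the name resolves to a pid or the deadline passes.
  def wait_until_registered(name, timeout_ms \\ 2_000) do
    do_wait(name, System.monotonic_time(:millisecond) + timeout_ms)
  end

  defp do_wait(name, deadline) do
    case GenServer.whereis(name) do
      pid when is_pid(pid) ->
        {:ok, pid}

      nil ->
        if System.monotonic_time(:millisecond) >= deadline do
          {:error, :timeout}
        else
          Process.sleep(10)
          do_wait(name, deadline)
        end
    end
  end
end

{:ok, _registry} = Registry.start_link(keys: :unique, name: RestartDemo.Registry)
{:ok, _sup} = Supervisor.start_link([RestartDemo.Tracker], strategy: :one_for_one)

old_pid = GenServer.whereis(RestartDemo.Tracker.name())
:ok = GenServer.stop(RestartDemo.Tracker.name())

# Without this wait, the call below can exit with :noproc.
{:ok, new_pid} = RestartDemo.wait_until_registered(RestartDemo.Tracker.name())

# The reply arrives only after handle_continue(:restore, ...) has run.
restored? = GenServer.call(RestartDemo.Tracker.name(), :wait_for_restore)
IO.inspect({new_pid != old_pid, restored?}, label: "restarted and restored")
```

One caveat worth checking when applying the fix: if `wait_for_restore` is itself a `GenServer.call` to the registered name (an assumption, not verified here), it may need a registration wait like the one above, or a retry, to be safe against the same window.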

## Context: Broader CI Flakiness

While investigating, I looked at all `sync-service` workflow failures from the last 2 days: **12 failures vs 14 successes (~46% failure rate)**. The failures are spread across many test files, and only 1 of the 12 was this `publication_manager_test`:

| Test file | Failures |
|-----------|----------|
| `shape_cache_test.exs:501` | 4 |
| `request_batcher_test.exs:100` | 2 |
| `publication_manager_test.exs:503` | 1 |
| `api_test.exs:925` | 1 |
| `delete_shape_plug_test.exs:100` | 1 |
| `shape_db_test.exs:553` | 1 |
| `shape_cache_test.exs:877` | 1 |
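
As a quick sanity check on these numbers, the sketch below recomputes the failure rate and sums the per-test counts, using only values copied from this issue. The listed counts add up to 11, so one of the 12 failed runs is not attributed to a test in the table.

```elixir
failures = 12
successes = 14

# Failure rate as a percentage of all runs, rounded to one decimal place.
rate = Float.round(failures / (failures + successes) * 100, 1)

# Per-test failure counts, copied from the table above.
per_test = [
  {"shape_cache_test.exs:501", 4},
  {"request_batcher_test.exs:100", 2},
  {"publication_manager_test.exs:503", 1},
  {"api_test.exs:925", 1},
  {"delete_shape_plug_test.exs:100", 1},
  {"shape_db_test.exs:553", 1},
  {"shape_cache_test.exs:877", 1}
]

listed = per_test |> Enum.map(fn {_test, count} -> count end) |> Enum.sum()

IO.puts("failure rate: #{rate}%")
IO.puts("failures listed in table: #{listed} of #{failures}")
```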
