Add timeouts, and test restore with network partition #2247

Open
aredridel wants to merge 2 commits into tursodatabase:main from spice-labs-inc:as/timeouts

@aredridel

I'm happy to discuss and/or rework this, but I found that sqld did not recover when object storage was non-responsive.

Adding timeouts to the S3 library fixes this for me: failures are now detected.

aredridel added 2 commits June 1, 2026 11:59
Add integration tests for libsql-server bottomless replication restore
behavior when interrupted by various failure modes.

Tests verify sqld can resume and complete an interrupted restore from
S3-compatible object storage (minio) without requiring a restart.

Test cases:
- basic_restore: Sanity check that sqld restores from minio
- sqld_interrupted: sqld killed mid-restore, restarted, completes
- minio_interrupted: minio stopped mid-restore, restarted, sqld retries
- network_partition: sqld disconnected from network mid-restore, reconnected
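
Each of these cases has to wait for a restore to finish after the failure is lifted. A minimal sketch of a polling helper such a test could use is below. The name `wait_until` and its signature are hypothetical and are not taken from this PR's fixtures.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `check` repeatedly until it returns true or
/// `timeout` elapses. Returns true on success and false on timeout.
pub fn wait_until<F: FnMut() -> bool>(
    mut check: F,
    timeout: Duration,
    interval: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // Check first so a condition that already holds returns immediately.
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

In a test, `check` would presumably query sqld for the restored data, and the timeout would bound how long the test waits after minio or the network comes back.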

Infrastructure:
- Docker-based fixtures with isolated networks per test
- Unique container/network names and ports via atomic counters
- Port mapping (not host networking) for isolation
- Automatic cleanup of Docker resources after each test
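
The atomic-counter naming scheme can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in tests/bottomless/fixtures.rs: the struct, the function name, the name format, and the base port 19000 are all hypothetical.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Process-wide counter shared by every test in the test binary.
static NEXT_ID: AtomicU16 = AtomicU16::new(0);

// Hypothetical base port; the real fixtures may choose ports differently.
const BASE_PORT: u16 = 19000;

/// Names and host port reserved for one test's Docker resources.
#[derive(Debug, Clone, PartialEq)]
pub struct FixtureNames {
    pub network: String,
    pub minio_container: String,
    pub sqld_container: String,
    pub minio_host_port: u16,
}

/// Reserve a fresh set of names and a host port for the calling test.
pub fn next_fixture_names(test_name: &str) -> FixtureNames {
    // fetch_add returns the previous value atomically, so two tests running
    // in parallel threads never receive the same id.
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    // The process id keeps names distinct across separate test binaries.
    let pid = std::process::id();
    FixtureNames {
        network: format!("bottomless-{test_name}-{pid}-{id}-net"),
        minio_container: format!("bottomless-{test_name}-{pid}-{id}-minio"),
        sqld_container: format!("bottomless-{test_name}-{pid}-{id}-sqld"),
        // Assumes far fewer fixtures per process than the remaining port range.
        minio_host_port: BASE_PORT + id,
    }
}
```

Because each test gets its own network and a mapped host port rather than host networking, parallel tests do not collide on names or ports.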

Files added:
- tests/bottomless/mod.rs
- tests/bottomless/fixtures.rs
- tests/bottomless/basic_restore.rs
- tests/bottomless/sqld_interrupted.rs
- tests/bottomless/minio_interrupted.rs
- tests/bottomless/network_partition.rs
- tests/bottomless/README.md

Files modified:
- tests/tests.rs: Add bottomless module
- Cargo.toml: Add reqwest dev-dependency, remove duplicate hex

- Add LIBSQL_BOTTOMLESS_S3_READ_TIMEOUT_SECS (default 5s)
- Add LIBSQL_BOTTOMLESS_S3_CONNECT_TIMEOUT_SECS (default 5s)
- Add LIBSQL_BOTTOMLESS_S3_OPERATION_ATTEMPT_TIMEOUT_SECS (default 10s)
- Configure TimeoutConfig on aws_sdk_s3::Config in bottomless::replicator::Options::client_config()
- Update meta_store.rs Options construction to include new timeout fields
- Remove #[ignore] from network_partition test
- Fix test fixtures: endpoint timing, image caching, mut minio
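
A minimal sketch of how the three timeout variables and their defaults could be read is below. It uses only the standard library. The `S3Timeouts` struct and the function names are hypothetical, and falling back to the default on an unparsable value is an assumption; the PR may handle invalid input differently.

```rust
use std::time::Duration;

/// Hypothetical container for the three timeouts named in the list above.
#[derive(Debug, Clone, PartialEq)]
pub struct S3Timeouts {
    pub read: Duration,
    pub connect: Duration,
    pub operation_attempt: Duration,
}

/// Parse a whole number of seconds. Assumption: a missing or invalid value
/// falls back to the default instead of producing an error.
pub fn parse_timeout_secs(raw: Option<&str>, default_secs: u64) -> Duration {
    let secs = raw
        .and_then(|s| s.trim().parse::<u64>().ok())
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}

/// Read one variable from the process environment.
fn timeout_from_env(name: &str, default_secs: u64) -> Duration {
    let raw = std::env::var(name).ok();
    parse_timeout_secs(raw.as_deref(), default_secs)
}

/// Collect all three timeouts, using the defaults stated in the list above.
pub fn s3_timeouts_from_env() -> S3Timeouts {
    S3Timeouts {
        read: timeout_from_env("LIBSQL_BOTTOMLESS_S3_READ_TIMEOUT_SECS", 5),
        connect: timeout_from_env("LIBSQL_BOTTOMLESS_S3_CONNECT_TIMEOUT_SECS", 5),
        operation_attempt: timeout_from_env(
            "LIBSQL_BOTTOMLESS_S3_OPERATION_ATTEMPT_TIMEOUT_SECS",
            10,
        ),
    }
}
```

The resulting durations would then be passed to the AWS SDK's `TimeoutConfig` builder, which exposes `read_timeout`, `connect_timeout`, and `operation_attempt_timeout` setters, before building the `aws_sdk_s3::Config`.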