fix(backup): handle S3-style object stores in config restore#4182
fix(backup): handle S3-style object stores in config restore#4182elangelo wants to merge 4 commits intoapache:mainfrom
Conversation
When restoring a backup, use both `exists()` and `listAll()` to check for the config directory. Object stores like S3 may not have an explicit directory marker object, but the config files are still present under that prefix. The previous check using only `exists()` would incorrectly throw an error in such cases.
…configset When a backup is copied through a local filesystem and re-uploaded to S3 (e.g. via `aws s3 sync`), the zero-byte directory marker objects created by S3BackupRepository are typically lost. This caused restore to fail with an error even though all configset files were present under the expected key prefix. Change BackupManager#uploadConfigDir to fall back to listing objects by key prefix when the directory marker is absent, so restore succeeds as long as the configset files themselves are present. A clear IllegalArgumentException is still thrown when neither marker nor files can be found. - Add unit tests covering the no-marker, truly-absent, and normal cases - Add documentation in the S3 backup/restore reference guide explaining the directory-marker behaviour and the graceful fallback Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| @@ -672,6 +672,14 @@ The location must already exist within your S3 bucket before you can perform any | |||
| Empty Location:: | |||
| If you do not want to use a sub-directory within your bucket to store your backup, you can use any of the following location options: `/`, `s3:/`, `s3://`. | |||
| However the location option is mandatory and you will receive an error when trying to perform backup operations without it. | |||
|
|
|||
| Directory Markers:: | |||
| S3 has no native concept of directories. | |||
There was a problem hiding this comment.
Is there some better approach that we SHOULD be using instead of this zero-byte marker objects appraoch? If you were doing this today, is this how you would fix it? It feels like you are doing a bug fix because of a more fundamentally flawed approach? Or, is this the way we do these typs of things?
There was a problem hiding this comment.
Well... I got sucked into this without indeed looking at the bigger picture. I'm not saying it's absolutely wrong to work with PREMARKER blobs, but it would certainly not how I would implement it.
The whole 'check if a file exists' is an anti-pattern anyway when working with 'cloud storage'. As even to check this you actually have to pay for that requests. It's way better to just try and deal with the error in case there is one.
I'm gonna look if it's possible to implement the way the backups themselves are taken and basically completely forget about the whole PREMARKER stuff. This should still be backwards compatibly with current backups that people rely on. The only restriction is that I will have to look into how the whole storage abstraction and see if I can make it work for both local and cloud storage without violating their way of working.
| ZkStateReader mockZkStateReader = mock(ZkStateReader.class); | ||
| ConfigSetService mockConfigSetService = mock(ConfigSetService.class); | ||
|
|
||
| // Simulate S3: no marker object → exists() is false, but listAll() finds the files |
There was a problem hiding this comment.
maybe just me, but I find mocks hard to understand. We have S3Mock as a dependency, could we be using it for these tests?
There was a problem hiding this comment.
I'll have a look into this. I missed that.
S3StorageClient.isDirectory: when no explicit directory-marker object exists (NoSuchKeyException), fall back to listing up to one object under the path prefix. This handles "virtual directories" that arise after an `aws s3 sync` round-trip, which silently drops zero-byte directory marker keys, causing restore to fail with NoSuchKey 404. CloudSolrClient: before computeIfAbsent for collectionRefreshes, evict any already-completed future via computeIfPresent. Without this, a stale done future is reused instead of submitting a fresh refresh, so callers (e.g. stale-state retry) never observe an updated ref. Also adds a unit test covering the virtual-directory fallback path and a changelog entry for the S3 fix.
Description
When restoring a backup, use both
exists()andlistAll()to check for the config directory. Object stores like S3 may not have an explicit directory marker object, but the config files are still present under that prefix. The previous check using onlyexists()would incorrectly throw an error in such cases.Solution
See description
For transparency Github Copilot was used to identify and fix the issue we were encountering with this functionality.
Tests
Manually verified on our solrcloud clusters
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.