fix: fall back to Cores API for delete-collection on standalone Solr (#33) #37

Closed
MohammadYusif wants to merge 1 commit into apache:main from MohammadYusif:fix/issue-33

@MohammadYusif

Summary

  • delete-collection crashes with HTTP 500 ("Aliases don't exist in a non-cloud context") when targeting standalone Solr, because the Collections API unconditionally checks ZooKeeper-managed aliases
  • delete_collection() now detects this specific 500 and transparently falls back to the V1 Cores API, which works in both standalone and SolrCloud deployments

Changes

  • osbenchmark/client.py: added non-cloud detection in delete_collection() and a new _delete_core() helper calling /solr/admin/cores?action=UNLOAD with full cleanup flags
  • tests/unit/solr/test_client.py: 4 new tests covering the fallback path, missing-core error propagation, the 400 path, and unrelated 500 errors

Fixes #33

…alone Solr (apache#33)

Standalone (non-SolrCloud) Solr returns HTTP 500 with "Aliases don't exist
in a non-cloud context" when the Collections API's DELETE endpoint tries to
inspect ZooKeeper-managed aliases.

SolrAdminClient.delete_collection() now detects this response and
transparently falls back to the V1 Cores API (UNLOAD action with
deleteIndex/deleteDataDir/deleteInstanceDir=true), which works in both
standalone and SolrCloud deployments.

A new private helper _delete_core() encapsulates the UNLOAD call and maps
404/400-not-found responses to CollectionNotFoundError for consistency with
the existing error contract.
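
The fallback described above could look roughly like the sketch below. It is not the code from this PR: the `send` transport callable, the `SolrAdminError` class, the V1 Collections API path, and the not-found message hints are all assumptions made for illustration. Only the non-cloud error string, the UNLOAD action, and its three delete flags come from the description.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: (path, query_params) -> (status_code, body_text).
# A real client would wrap an HTTP library behind this signature.
Transport = Callable[[str, Dict[str, str]], Tuple[int, str]]

# Error text quoted in the PR description for standalone Solr.
NON_CLOUD_MARKER = "Aliases don't exist in a non-cloud context"

# Assumed substrings that indicate a missing collection or core in a 400 body.
NOT_FOUND_HINTS = ("could not find", "non-existent")


class CollectionNotFoundError(Exception):
    """The target collection (or core) does not exist."""


class SolrAdminError(Exception):
    """Any other failed admin request (hypothetical name)."""


def _raise_for_status(status: int, body: str, name: str) -> None:
    """Map a response to the error contract; return quietly on 2xx."""
    if 200 <= status < 300:
        return
    text = body.lower()
    if status == 404 or (status == 400 and any(h in text for h in NOT_FOUND_HINTS)):
        raise CollectionNotFoundError(name)
    raise SolrAdminError(f"HTTP {status}: {body}")


def _delete_core(send: Transport, name: str) -> None:
    """Unload a core via the V1 Cores API and remove its files on disk."""
    status, body = send("/solr/admin/cores", {
        "action": "UNLOAD",
        "core": name,
        "deleteIndex": "true",
        "deleteDataDir": "true",
        "deleteInstanceDir": "true",
    })
    _raise_for_status(status, body, name)


def delete_collection(send: Transport, name: str) -> None:
    """Try the Collections API first; fall back to the Cores API on standalone."""
    status, body = send("/solr/admin/collections", {"action": "DELETE", "name": name})
    if status == 500 and NON_CLOUD_MARKER in body:
        # Only this specific 500 triggers the fallback; other 500s still raise.
        _delete_core(send, name)
        return
    _raise_for_status(status, body, name)
```

Matching on the exact error string keeps the fallback narrow, so an unrelated server error is never silently retried against a different API.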

Tests added:
- test_delete_collection_400_could_not_find: verifies 400 path raises CollectionNotFoundError
- test_delete_collection_standalone_falls_back_to_cores_api: verifies the fallback
- test_delete_collection_standalone_cores_api_not_found: verifies error propagation in fallback
- test_delete_collection_other_500_raises: verifies non-cloud-context 500s still raise
@janhoy
Contributor

janhoy commented May 28, 2026

Thanks for this. No question the port so far is SolrCloud focused. Before we commit a fix, perhaps we should have a discussion about whether and how to make sure all workloads can run on both cloud and user-managed. Some Solr features and APIs only work with cloud.

Should each workload declare whether it is compatible with standalone? Or do we have auto fallback solutions for every mismatch?

A workload may specify a collection with 2 shards and 2 replicas. What should happen in standalone mode? Is it better to exit with an error?

A workload may benchmark backup/restore, which have different apis…

Another workload may benchmark splitshard…

@MohammadYusif
Author

Client-layer primitives (create_collection, delete_collection, etc.) are pure infrastructure operations. The caller just wants the resource to exist or not; they don't care which API path was used. Here, transparent fallback is correct because the semantics are identical. That's what this PR does, and it's the right call for this narrow case.

Workload-level features — multi-shard/replica topology, splitshard, the SolrCloud backup API — have no meaningful standalone equivalent. A fallback here would silently change what's being benchmarked, which is worse than failing. For these, workloads should declare their requirements and orbit should fail fast with a clear error message before any work begins.

Concretely I'd suggest something like a requires_cloud: true field in the workload spec (or an incompatible_modes list), validated at startup against the detected deployment mode. The detection itself can reuse the same "is this ZooKeeper-aware?" signal this PR already handles.
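
As a rough illustration of that startup check, here is a minimal sketch. The function name, the `IncompatibleModeError` class, the mode strings `"cloud"` and `"standalone"`, and the dict-shaped workload spec are all hypothetical; only the `requires_cloud` and `incompatible_modes` field names come from the suggestion above. How the deployment mode gets detected is left to the caller.

```python
from typing import Any, Mapping

# Assumed mode names; the real tool may use different identifiers.
KNOWN_MODES = ("cloud", "standalone")


class IncompatibleModeError(Exception):
    """The workload cannot run against the detected deployment mode."""


def validate_workload_mode(spec: Mapping[str, Any], detected_mode: str) -> None:
    """Fail fast, before any work begins, if the workload rejects this mode."""
    if detected_mode not in KNOWN_MODES:
        raise ValueError(f"unknown deployment mode: {detected_mode!r}")

    # Collect every mode the workload declares itself incompatible with.
    incompatible = set(spec.get("incompatible_modes", []))
    if spec.get("requires_cloud", False):
        incompatible.add("standalone")

    if detected_mode in incompatible:
        workload = spec.get("name", "<unnamed>")
        raise IncompatibleModeError(
            f"workload {workload!r} does not support {detected_mode} mode"
        )


# Example: a workload with a multi-shard topology declares that it needs cloud.
spec = {"name": "sharded-example", "requires_cloud": True}
validate_workload_mode(spec, "cloud")  # passes silently
```

A workload that declares nothing is treated as compatible with every mode, which keeps existing workload specs valid without changes.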

So: keep this PR's fallback, it's appropriate here. But the broader policy should be: transparent fallback only where semantics are genuinely equivalent; declared incompatibility + early exit for everything else. That's unambiguous to workload authors and to users trying to understand why a run failed.

@janhoy
Contributor

janhoy commented May 28, 2026

Question: How can I reproduce this? Can I run one of the geonames/nyc-taxis workloads against a standalone local solr and see the same? I'd expect those to fall over during configset upload or collection creation, since those are fairly different?

@epugh
Contributor

epugh commented May 28, 2026

So let me ask a slightly provocative question... SHOULD WE SUPPORT STANDALONE? We already are suffering with this in the main Solr project. And in Solr 10, when you fire up a single node, it's in cloud anyway if you don't set special values. Plus, if you are testing a single node, well maybe other tools like our https://github.com/apache/solr/tree/main/solr/benchmark might be more useful. The niche I see that solr-orbit really hits is our Solr cloud macro benchmarking.

My user base: cloud. Kevin's: cloud. Jan's: insert here how many standalone folks you plan to test with solr-orbit.

My suspicion is that we'll spend a bunch of effort to test standalone for making the code work, but no one will actually use it for anything meaningful.

Let's be realistic about our ability to test things, and accept that this project requires solr cloud.

@janhoy
Contributor

janhoy commented May 29, 2026

I'm open to being pragmatic and documenting that Solr Orbit currently only supports Solr Cloud. In the future we could extend it if there is enough demand and someone is willing to do the job.

@janhoy
Contributor

janhoy commented May 29, 2026

@MohammadYusif Again, thanks for your contribution, but I'm afraid we will focus on Cloud mode in the beginning. See #43 for a PR that documents this limitation, changes the quickstart example to start solr 9 in cloud mode, and also adds a detection to print an error if attempting to run with user-managed.

This is not to disqualify your code in any way. And it may be that we re-visit this decision at a later time. If so, we'll do a thorough analysis of how to support user managed solr fully in a good way.

Closing

@janhoy janhoy closed this May 29, 2026
@epugh
Contributor

epugh commented May 30, 2026

Likewise @MohammadYusif I was happy to see your PR pop up! Please do keep an eye on the tickets and I'd love to review more PRs.


Development

Successfully merging this pull request may close these issues.

[Bug]: delete-collection fails with non cloud running Solr
