SOLR-17550: preferred Overseer might not work#4175
SOLR-17550: preferred Overseer might not work#4175r4mercur wants to merge 5 commits intoapache:mainfrom
Conversation
add testcase to identify wrong method behaviour of setPreferredOverseer() in ZkController
fix import after ./gradlew check & ./gradlew tidy
fix import after ./gradlew check & ./gradlew tidy & change test
./gradlew tidy
|
I triggered the tests. I wish I knew more about the Overseer to review this pr.. |
There was a problem hiding this comment.
How i understand the bug and the description of the ticket given from the creator, i would assume the behaviour currently is like when setPreferredOverseer() is called in the ZkController the values are set wrong. (happens when "ADDROLE" message is built)
For example with the buggy behaviour the values which are set like this:
{
"operation": "addrole",
"localhost:8983_solr": "localhost:8983_solr",
"role": "overseer",
"persist": "false"
}
but the key and values which are expected should look like this with the right key for the "node" part:
{
"operation": "addrole",
"node": "localhost:8983_solr",
"role": "overseer",
"persist": "false"
}
And since i'am also new in this topic the Overseer class is the main class which manages cluster state changes like collection creation, shards, roles etc. . This seems to be done with commands from Zookeeper-backed task queues. The class it self acts as the cluster coordinator as i understood it, but i could be wrong here.
dsmiley
left a comment
There was a problem hiding this comment.
Thanks for doing this.
The new test is a unit test. It's something.
I'm not sure what tests exists for the feature. I don't put much faith in unit tests to tell me much; I put faith in integration tests (aka behavior / functional tests).
|
I added a new teststep with some behaviour test in the OverseerRolesTest class. Would you consider this combination of the two testcases now (unittestcase + integration test) sufficient for this feature to be tested right or would you prefer a stricter integration assertion arround the overseer election behaviour ? Then i would delete the unittest part and would try to rewrite the integration test with more stricter "feature" based assertion. |
|
Before going further on tests... do you understand the bug enough to know what it's impact was? We'd use that to fill out the changelog for one thing but also I think it's kind of important to just know anyways. I see OverseerRolesTest; perhaps the feature wasn't totally broken; maybe it didn't work right in certain circumstances? |
|
You are right and i tried now to understand the real impact of this bug. With looking at the code: Users probably are using this rarely (or never) and it seems like even when using it the buggy behaviour would only lead to silent failure of the preferred overseer role assignment. So the question would be: Then it would be wise to close this PR and create a new ticket for removing the existance of this method? |
|
My reading of the code (aided by a few runs of the debugger in NodeRolesTest and OverseerRolesTest) is that if a node is started with system properties reflecting that it's the preferred Overseer (this is the new/recommended way), then, it will call |
|
BTW, sorry if my choice of |
add testcase to identify wrong method behaviour of setPreferredOverseer() in ZkController
https://issues.apache.org/jira/browse/SOLR-17550
Description
Like described in the ticket from the reporter: "By code inspection, ZkController.setPreferredOverseer creates a command message improperly. The key "node" should exist with the node name as value but instead the key is itself the node name – a bug introduced by a refactoring in 2022.
Ideally we understand the ramifications; like shouldn't this be found by a test?"
I only added the test case currently, not sure if the bug should also be fixed by using "node" as static string as key ?
Solution
My approach is a new unittest-case to identify the wrong behaviour of the setter method in the ZkController. I used the LLM "Claude Sonet 4.5" for problem description and solution proposals.
Tests
The new unittest-case written should identify problems, when using "getNodeName()" to set the key of the properties.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.