Skip to content

OAK-12221 Cost estimation should be lower if more conditions are indexed#2912

Merged
thomasmueller merged 5 commits into
trunkfrom
OAK-12221
May 22, 2026
Merged

OAK-12221 Cost estimation should be lower if more conditions are indexed#2912
thomasmueller merged 5 commits into
trunkfrom
OAK-12221

Conversation

@thomasmueller
Copy link
Copy Markdown
Member

No description provided.

@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
71.2% Coverage on New Code (required ≥ 80%)
3.8% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link
Copy Markdown
Contributor

@bhabegger bhabegger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll trust you with the math. A bit challenging to verify in detail.

* once at the end against the most restrictive field's doc count. This makes the
* estimate independent of iteration order.
*/
private int getMaxPossibleNumDocsBySelectivity(Map<String, PropertyDefinition> propDefns, Filter filter) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is quite long and likely has many cases. May it is worth a unit test ?

Copy link
Copy Markdown
Member Author

@thomasmueller thomasmueller May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, all the new tests are actually testing this new method; if the feature toggle is enabled.

*/
private int getMaxPossibleNumDocsBySelectivity(Map<String, PropertyDefinition> propDefns, Filter filter) {
IndexStatistics indexStatistics = indexNode.getIndexStatistics();
if (indexStatistics == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these index statistics exist already ? If not wouldn't we be sending this warning systematically ? A bit noisy no ?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, getMaxPossibleNumDocs first checks the feature toggle, and then if enabled calls this method and returns the result... the idea is that once the feature toggle is enabled for some time, we can replace the old method completely with the new method.

The old and the new method do share some code (copy & pasted) - which would be a disadvantage if we have to keep it for a longer time - but it's an advantage if we later want to remove the old code.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well there is testing the method itself (aka what situations lead to what number of docs) and the impact that it entails (what index gets selected). They are close but not exactly the same either. But if you are satisfied with the coverage, I'm fine.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to have unit tests, so that it is not needed to add indexes, I agree... but that would require quite a big refactoring, which I'm not sure if it's worth it currently...

// Since, the index has no entries for "bar", estimated entry count for plan2 would be 0
assertEquals(0, plan2.getEstimatedEntryCount());
assertThat(plan2.getEstimatedEntryCount(), lessThan(plan1.getEstimatedEntryCount()));
assertTrue(plan2.getEstimatedEntryCount() < plan1.getEstimatedEntryCount());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, interesting. I do have a preference for assertThat(x).isLessThan(y) (but the assertj version hamcrest I don't know so well).

Why ? Because in case of failure of the test you get "expected value below 5 but got 4" and not "expected true but got false". This makes debugging the test way easier ;)

Copy link
Copy Markdown
Member Author

@thomasmueller thomasmueller May 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assertThat is deprecated., that's why I removed it. But if you have an alternative that is not deprecated then that's fine for me! Maybe it's just a different import?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would require a new dependency: assertJ, But I believe it is worthwhile for readability. And it is widely used (so reliable)

<!-- Source: https://mvnrepository.com/artifact/org.assertj/assertj-core -->
<dependency>
    <groupId>org.assertj</groupId>
    <artifactId>assertj-core</artifactId>
    <version>3.27.7</version>
    <scope>test</scope>
</dependency>


private int getNumDocs() {
IndexStatistics indexStatistics = indexNode.getIndexStatistics();
if (indexStatistics == null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we fail faster in the absence if indexStatistics (higher up) rather than check multiple times ?

Basically a FullTextIndexPlannerWithStats and a FullTextIndexPlannerNoStats ? You toggle could then also be used to selectively choose which class instance to use rather than have to be repeated.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be an unrelated change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sorry, it's existing code.

@thomasmueller thomasmueller merged commit 94a0a8c into trunk May 22, 2026
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants