OAK-12221 Cost estimation should be lower if more conditions are indexed#2912
Conversation
|
bhabegger
left a comment
There was a problem hiding this comment.
I'll trust you with the math. A bit challenging to verify in detail.
| * once at the end against the most restrictive field's doc count. This makes the | ||
| * estimate independent of iteration order. | ||
| */ | ||
| private int getMaxPossibleNumDocsBySelectivity(Map<String, PropertyDefinition> propDefns, Filter filter) { |
There was a problem hiding this comment.
This method is quite long and likely has many cases. May it is worth a unit test ?
There was a problem hiding this comment.
Well, all the new tests are actually testing this new method; if the feature toggle is enabled.
| */ | ||
| private int getMaxPossibleNumDocsBySelectivity(Map<String, PropertyDefinition> propDefns, Filter filter) { | ||
| IndexStatistics indexStatistics = indexNode.getIndexStatistics(); | ||
| if (indexStatistics == null) { |
There was a problem hiding this comment.
Do these index statistics exist already ? If not wouldn't we be sending this warning systematically ? A bit noisy no ?
There was a problem hiding this comment.
So, getMaxPossibleNumDocs first checks the feature toggle, and then if enabled calls this method and returns the result... the idea is that once the feature toggle is enabled for some time, we can replace the old method completely with the new method.
The old and the new method do share some code (copy & pasted) - which would be a disadvantage if we have to keep it for a longer time - but it's an advantage if we later want to remove the old code.
There was a problem hiding this comment.
Well there is testing the method itself (aka what situations lead to what number of docs) and the impact that it entails (what index gets selected). They are close but not exactly the same either. But if you are satisfied with the coverage, I'm fine.
There was a problem hiding this comment.
it would be better to have unit tests, so that it is not needed to add indexes, I agree... but that would require quite a big refactoring, which I'm not sure if it's worth it currently...
| // Since, the index has no entries for "bar", estimated entry count for plan2 would be 0 | ||
| assertEquals(0, plan2.getEstimatedEntryCount()); | ||
| assertThat(plan2.getEstimatedEntryCount(), lessThan(plan1.getEstimatedEntryCount())); | ||
| assertTrue(plan2.getEstimatedEntryCount() < plan1.getEstimatedEntryCount()); |
There was a problem hiding this comment.
Hmmm, interesting. I do have a preference for assertThat(x).isLessThan(y) (but the assertj version hamcrest I don't know so well).
Why ? Because in case of failure of the test you get "expected value below 5 but got 4" and not "expected true but got false". This makes debugging the test way easier ;)
There was a problem hiding this comment.
assertThat is deprecated., that's why I removed it. But if you have an alternative that is not deprecated then that's fine for me! Maybe it's just a different import?
There was a problem hiding this comment.
It would require a new dependency: assertJ, But I believe it is worthwhile for readability. And it is widely used (so reliable)
<!-- Source: https://mvnrepository.com/artifact/org.assertj/assertj-core -->
<dependency>
<groupId>org.assertj</groupId>
<artifactId>assertj-core</artifactId>
<version>3.27.7</version>
<scope>test</scope>
</dependency>
|
|
||
| private int getNumDocs() { | ||
| IndexStatistics indexStatistics = indexNode.getIndexStatistics(); | ||
| if (indexStatistics == null) { |
There was a problem hiding this comment.
Can't we fail faster in the absence if indexStatistics (higher up) rather than check multiple times ?
Basically a FullTextIndexPlannerWithStats and a FullTextIndexPlannerNoStats ? You toggle could then also be used to selectively choose which class instance to use rather than have to be repeated.
There was a problem hiding this comment.
This would be an unrelated change.
There was a problem hiding this comment.
OK, sorry, it's existing code.


No description provided.