
Upgrade to Lucene 10.4 #4195

Open
rahulgoswami wants to merge 5 commits into apache:main from rahulgoswami:lucene1040

@rahulgoswami
Contributor

@rahulgoswami rahulgoswami commented Mar 7, 2026

https://issues.apache.org/jira/browse/SOLR-18143

Description

Upgrade Lucene dependency to 10.4

Solution

Followed instructions in dev-docs/lucene-upgrade.md and resolved compilation/test failures. Also made changes to documentation and upgrade guide wherever applicable.

@rahulgoswami
Contributor Author

WIP... fixing failures in the DenseVectorField tests.

Rahul Goswami added 3 commits March 7, 2026 19:26
…to changes in Lucene104ScalarQuantizedVectorField format in Lucene 10.4
@rahulgoswami
Contributor Author

rahulgoswami commented Mar 9, 2026

I put Claude (Opus 4.6) and Codex (GPT 5.4) to work on the compilation errors and on understanding the test failures. The main test failures were in ScalarQuantizedDenseVectorField and BinaryQuantizedDenseVectorField, caused by breaking changes in Lucene104ScalarQuantizedVectorsFormat.

Major changes:

  • Lucene104HnswScalarQuantizedVectorsFormat has moved to an encoding-based API and no longer accepts the "confidenceInterval" or "compression" params. I therefore made those params no-ops in ScalarQuantizedDenseVectorField and removed them from the tests and documentation.

  • Added a note to the documentation saying that schemas from older Solr 10.x versions may still contain those params, but they should not be used going forward. Existing schemas will continue to be supported.

  • There is no separate binary quantization format in Lucene 10.4. Binary quantization is now just another encoding of Lucene104ScalarQuantizedVectorsFormat (encoding=ScalarEncoding.SINGLE_BIT_QUERY_NIBBLE). For back compatibility, though, we still need to expose it at the Solr level as a separate type, BinaryQuantizedDenseVectorField.
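A minimal sketch of the "accept but ignore" handling for the removed params, assuming the field type receives its init args as a `Map<String, String>` and rejects any args it does not consume. The class and method names are hypothetical, not Solr's actual code; only the two param names come from this PR, and the example values are placeholders.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Solr's actual implementation.
class LegacyQuantizationParams {

  // Params the Lucene 10.4 format no longer accepts (names as quoted in the PR).
  static final List<String> NO_OP_PARAMS = List.of("confidenceInterval", "compression");

  // Removes the legacy params from the field type's init args so older schemas
  // still load, and returns the names that were dropped (e.g. for a warning log).
  static List<String> stripNoOpParams(Map<String, String> args) {
    List<String> ignored = new ArrayList<>();
    for (String name : NO_OP_PARAMS) {
      if (args.containsKey(name)) {
        args.remove(name);
        ignored.add(name);
      }
    }
    return ignored;
  }

  public static void main(String[] argv) {
    Map<String, String> fieldArgs = new HashMap<>();
    fieldArgs.put("vectorDimension", "4");
    fieldArgs.put("confidenceInterval", "0.9");
    fieldArgs.put("compression", "true");

    List<String> ignored = stripNoOpParams(fieldArgs);
    System.out.println("Ignored: " + ignored);     // Ignored: [confidenceInterval, compression]
    System.out.println("Remaining: " + fieldArgs); // Remaining: {vectorDimension=4}
  }
}
```

Stripping the params before any leftover-args check is what lets an old schema load unchanged; logging the returned names gives users a hint to clean their schema up.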

@rahulgoswami rahulgoswami marked this pull request as ready for review March 9, 2026 06:21
@rahulgoswami rahulgoswami requested a review from dsmiley March 9, 2026 06:21
@rahulgoswami
Contributor Author

rahulgoswami commented Mar 9, 2026

Additionally, Lucene104ScalarQuantizedVectorsFormat now supports 1, 2, 4, 7, and 8 bits, whereas only 4 and 7 were supported before. That should probably be scoped to a separate PR with its own test and documentation changes, rather than squashed into this upgrade PR.
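To show what the follow-up PR's scope might look like, here is a hypothetical sketch of validating a bit-width param against the old and new sets of supported widths. The class, method, and constant names are illustrative assumptions, not Solr's or Lucene's API; only the bit widths come from this PR.

```java
import java.util.Arrays;

// Hypothetical sketch; names are illustrative, not Solr's actual code.
class BitsValidation {

  // Widths supported before Lucene 10.4, per the PR.
  static final int[] LEGACY_BITS = {4, 7};

  // Widths the Lucene 10.4 format supports, per the PR.
  static final int[] LUCENE_104_BITS = {1, 2, 4, 7, 8};

  // Returns bits unchanged if it is in the allowed set, otherwise throws.
  static int validateBits(int bits, int[] allowed) {
    for (int b : allowed) {
      if (b == bits) {
        return bits;
      }
    }
    throw new IllegalArgumentException(
        "bits must be one of " + Arrays.toString(allowed) + " but was " + bits);
  }

  public static void main(String[] argv) {
    System.out.println(validateBits(7, LEGACY_BITS));     // 7
    System.out.println(validateBits(2, LUCENE_104_BITS)); // 2
    try {
      validateBits(2, LEGACY_BITS);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // bits must be one of [4, 7] but was 2
    }
  }
}
```

Widening the allowed set is a one-line change in a sketch like this. The real work for a separate PR would be the tests and documentation for each newly exposed width.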

@rahulgoswami
Contributor Author

@dsmiley @alessandrobenedetti Requesting a review please.

@rahulgoswami
Contributor Author

rahulgoswami commented Mar 9, 2026

To-Do: Lucene104HnswScalarQuantizedVectorsFormat now defaults to 8 bits (ScalarEncoding.UNSIGNED_BYTE) instead of the 7 bits used by Lucene99HnswScalarQuantizedVectorsFormat. Should we make 8 bits the new default for ScalarQuantizedDenseVectorField too (ScalarQuantizedDenseVectorField.DEFAULT_BITS)?
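The trade-off behind this To-Do can be sketched briefly. Changing DEFAULT_BITS only affects schemas that omit the bit width, while schemas that set it explicitly keep their value. This is a hypothetical illustration: the `bits` param name and the helper method are assumptions, not confirmed Solr code. Only the 7 and 8 defaults come from this PR.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the "bits" param name and this helper are assumptions.
class DefaultBitsResolution {

  static final int OLD_DEFAULT_BITS = 7; // Lucene99HnswScalarQuantizedVectorsFormat default, per the PR
  static final int NEW_DEFAULT_BITS = 8; // Lucene 10.4 default (ScalarEncoding.UNSIGNED_BYTE), per the PR

  // An explicit "bits" value always wins; only schemas that omit it follow the default.
  static int resolveBits(Map<String, String> args, int defaultBits) {
    String raw = args.get("bits");
    return raw == null ? defaultBits : Integer.parseInt(raw);
  }

  public static void main(String[] argv) {
    Map<String, String> implicit = new HashMap<>();
    Map<String, String> pinned = new HashMap<>();
    pinned.put("bits", "7");

    System.out.println(resolveBits(implicit, OLD_DEFAULT_BITS)); // 7
    System.out.println(resolveBits(implicit, NEW_DEFAULT_BITS)); // 8
    System.out.println(resolveBits(pinned, NEW_DEFAULT_BITS));   // 7
  }
}
```

The design question therefore mainly concerns schemas that relied on the implicit default, since those would silently switch bit widths.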
