Skip to content

[TEXT-103] Add support for weighted Levenshtein distance#737

Closed
ron-ladin6 wants to merge 3 commits intoapache:masterfrom
ron-ladin6:TEXT-103
Closed

[TEXT-103] Add support for weighted Levenshtein distance#737
ron-ladin6 wants to merge 3 commits intoapache:masterfrom
ron-ladin6:TEXT-103

Conversation

@ron-ladin6
Copy link

This PR adds support for weighted Levenshtein distance by allowing custom costs for insertion, deletion, and substitution operations.
Key Changes:

  • Added a new constructor to LevenshteinDistance for custom costs.
  • Implemented limitedCompareCustomCost to handle weights while maintaining memory efficiency ($O(\min(n, m))$).
  • Added comprehensive unit tests (reaching 99%+ code coverage).
  • Guaranteed 100% backward compatibility with existing logic.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ron-ladin6

Thank you for the PR.

Please address my comment.

You did not run mvn by itself per the instructions in the PR template or you'd have found the current build check failures.

TY!

*/
private final int replaceCost;

// -------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these // comments please, they don't help maintain the code.

* @throws IllegalArgumentException if threshold is negative, or any cost is negative.
* @since 1.13.0
*/
public LevenshteinDistance(final Integer threshold, final int insertCost, final int deleteCost,
Copy link
Member

@garydgregory garydgregory Mar 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a builder instead, this will avoid constructor inflation and deprecations in the future. You should end upt with a private constructor that takes the Builder as its input. For an example in this component, see RandomStringGenerator but here the builder wouldn't need to extend the deprecated Builder interface.

this.replaceCost = replaceCost;
}

// -------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these // comments please, they don't help maintain the code.

return unlimitedCompare(left, right, insertCost, deleteCost, replaceCost);
}

// -------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these // comments please, they don't help maintain the code.

* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* https://www.apache.org/licenses/LICENSE-2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do NOT edit license headers.

}

}
// -------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these // comments please, they don't help maintain the code.
Do use Javadoc if you want to call something out.

assertEquals(4, dist.getReplaceCost());
}

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't delete the empty line at the end of files.

@ron-ladin6
Copy link
Author

hello @garydgregory ,
Thank you for the review and the detailed feedback. I appreciate the guidance on how to align this with the project's standards.
I’m on it.

@ron-ladin6
Copy link
Author

.Hi @garydgregory ,

Thanks for the feedback. I've addressed your comments

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ron-ladin6

Thank you for your update.

  • Fix the new Javadoc since tag since1.13.0 was released in 2024; the next feature version will be 1.16.0
  • Use the set prefix for builder setters.

TY!

@ron-ladin6
Copy link
Author

ron-ladin6 commented Mar 15, 2026

@garydgregory
Done! Updated the since tags to 1.16.0 and added the set prefix to all builder methods.

@chtompki
Copy link
Member

We're making changes on an algorithm that we've cited as having been pulled from a book...this is questionable. I wonder if we need a new file all toghether.

@ron-ladin6
Copy link
Author

I used AI tools to help structure the code rather than copying it from a textbook. However, if the implementation ends up looking like a direct copy from a specific book, I agree that it's better not to include it as-is.

@chtompki
Copy link
Member

chtompki commented Mar 16, 2026

That flat doesn't fly at all. We can't say the code is from a book and have it be from AI

@ron-ladin6 ron-ladin6 closed this Mar 16, 2026
@ron-ladin6 ron-ladin6 deleted the TEXT-103 branch March 16, 2026 18:28
@chtompki
Copy link
Member

Note @ron-ladin6 I'm happy to help work this in as it's own class WeightedLevensteinDistance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants