Introduction
This page shows example usage of the apply method of org.apache.commons.text.similarity.LevenshteinDistance.
Prototype
public Integer apply(final CharSequence left, final CharSequence right)
Document
<p>Find the Levenshtein distance between two Strings.</p>
<p>A higher score indicates a greater distance.</p>
<p>The previous implementation of the Levenshtein distance algorithm was from <a href="http://www.merriampark.com/ld.htm">http://www.merriampark.com/ld.htm</a></p>
<p>Chas Emerick has written an implementation in Java, which avoids an OutOfMemoryError which can occur when my Java implementation is used with very large strings.<br> This implementation of the Levenshtein distance algorithm is from <a href="http://www.merriampark.com/ldjava.htm">http://www.merriampark.com/ldjava.htm</a></p>
<pre>
distance.apply(null, *)             = IllegalArgumentException
distance.apply(*, null)             = IllegalArgumentException
distance.apply("","")               = 0
distance.apply("","a")              = 1
distance.apply("aaapppp", "")       = 7
distance.apply("frog", "fog")       = 1
distance.apply("fly", "ant")        = 3
distance.apply("elephant", "hippo") = 7
distance.apply("hippo", "elephant") = 7
distance.apply("hippo", "zzzzzzzz") = 8
distance.apply("hello", "hallo")    = 1
</pre>
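The documented values can be reproduced with a plain two-row dynamic-programming sketch. This is an illustration only and is not the Commons Text source: the class name LevenshteinSketch and its distance method are hypothetical, and the exception message is an assumption. It counts single-character insertions, deletions, and substitutions, each with cost 1.

```java
// Hypothetical illustration; NOT the Apache Commons Text implementation.
final class LevenshteinSketch {

    // Returns the minimum number of single-character edits turning left into right.
    static int distance(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            // Mirrors the documented behavior for null arguments.
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        int n = left.length();
        int m = right.length();

        // prev[j] holds the distance between the first i-1 chars of left
        // and the first j chars of right; curr[j] is the row being filled.
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // empty prefix of left -> j insertions
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i; // empty prefix of right -> i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Swap rows so the row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));    // prints 1
        System.out.println(distance("hello", "hallo")); // prints 1
    }
}
```

Keeping only two rows limits memory to the length of the second argument, which is one common way to avoid the large-matrix allocation that the documentation mentions.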
Usage
From source file: org.talend.utils.string.Levenshtein.java
import org.apache.commons.text.similarity.LevenshteinDistance;

public static double getLevenshteinScore(String inputStr, String outputColumnName) {
    // Length of the longer string, used to normalize the distance.
    double maxLength = Math.max(inputStr.length(), outputColumnName.length());
    LevenshteinDistance ld = new LevenshteinDistance();
    double distance = ld.apply(outputColumnName, inputStr);
    double score;
    // This scoring rule can be replaced with a custom one.
    if (inputStr.contains(outputColumnName) || outputColumnName.contains(inputStr)) {
        // One string contains the other: use a smoothed ratio.
        score = (maxLength - distance + 1) / (maxLength + 1);
    } else {
        // Otherwise: 1 minus the distance normalized by the longer length.
        score = 1 - (distance / maxLength);
    }
    return score;
}
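To see how the two branches of the scoring rule behave, the sketch below isolates the arithmetic from the distance computation. The class ScoreSketch and its score method are hypothetical and are not part of the Talend source; the distance is passed in as a parameter (here taken from the documented apply results) so the example runs without the Commons Text dependency.

```java
// Hypothetical helper that reproduces only the scoring arithmetic shown above.
final class ScoreSketch {

    // distance is assumed to be the Levenshtein distance between a and b.
    static double score(String a, String b, int distance) {
        double maxLength = Math.max(a.length(), b.length());
        if (a.contains(b) || b.contains(a)) {
            // Containment branch: smoothed ratio, also safe for two empty strings.
            return (maxLength - distance + 1) / (maxLength + 1);
        }
        // General branch: 1 minus the normalized distance.
        return 1 - (distance / maxLength);
    }

    public static void main(String[] args) {
        // "frog" vs "fog": distance 1, no containment -> 1 - 1/4
        System.out.println(score("frog", "fog", 1));   // prints 0.75
        // "fly" vs "ant": distance 3, no containment -> 1 - 3/3
        System.out.println(score("fly", "ant", 3));    // prints 0.0
        // identical strings: distance 0, containment -> (3 - 0 + 1) / (3 + 1)
        System.out.println(score("abc", "abc", 0));    // prints 1.0
    }
}
```

Because String.contains returns true for an empty argument, any pair with an empty string takes the containment branch, so the division by maxLength in the general branch is never reached with a zero denominator.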