Example usage for org.apache.commons.text.similarity LevenshteinDistance apply

Introduction

On this page you can find example usage for org.apache.commons.text.similarity LevenshteinDistance apply.

Prototype


public Integer apply(final CharSequence left, final CharSequence right)

Document

Find the Levenshtein distance between two Strings. A higher score indicates a greater distance.

The previous implementation of the Levenshtein distance algorithm was from http://www.merriampark.com/ld.htm

Chas Emerick has written an implementation in Java that avoids an OutOfMemoryError which can occur when the previous Java implementation is used with very large strings. This implementation of the Levenshtein distance algorithm is from http://www.merriampark.com/ldjava.htm

distance.apply(null, *)             = IllegalArgumentException
distance.apply(*, null)             = IllegalArgumentException
distance.apply("","")               = 0
distance.apply("","a")              = 1
distance.apply("aaapppp", "")       = 7
distance.apply("frog", "fog")       = 1
distance.apply("fly", "ant")        = 3
distance.apply("elephant", "hippo") = 7
distance.apply("hippo", "elephant") = 7
distance.apply("hippo", "zzzzzzzz") = 8
distance.apply("hello", "hallo")    = 1
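
To make the documented values easier to check by hand, the following is a minimal sketch of a two-row dynamic-programming Levenshtein distance. It is not the Commons Text source; the class name LevenshteinSketch and the method distance are hypothetical names chosen for illustration, and the null handling simply mirrors the behaviour described above.

```java
// Illustrative sketch only: NOT the Commons Text implementation.
// LevenshteinSketch and distance(...) are hypothetical names.
final class LevenshteinSketch {

    static int distance(CharSequence left, CharSequence right) {
        if (left == null || right == null) {
            // Mirrors the documented behaviour of apply(null, *) and apply(*, null).
            throw new IllegalArgumentException("CharSequences must not be null");
        }
        int n = left.length();
        int m = right.length();

        // prev[j] holds the distance between left[0..i-1) and right[0..j).
        int[] prev = new int[m + 1];
        int[] curr = new int[m + 1];
        for (int j = 0; j <= m; j++) {
            prev[j] = j; // turning an empty prefix into j characters takes j insertions
        }

        for (int i = 1; i <= n; i++) {
            curr[0] = i; // turning i characters into an empty prefix takes i deletions
            for (int j = 1; j <= m; j++) {
                int cost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two arrays instead of allocating a full n-by-m matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[m];
    }

    public static void main(String[] args) {
        System.out.println(distance("frog", "fog"));    // 1, as in the table above
        System.out.println(distance("fly", "ant"));     // 3
        System.out.println(distance("hello", "hallo")); // 1
    }
}
```

Keeping only two rows bounds memory by the length of the second argument, which is the kind of saving the note about very large strings refers to.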

Usage

From source file: org.talend.utils.string.Levenshtein.java

import org.apache.commons.text.similarity.LevenshteinDistance;

// Returns a similarity score derived from the edit distance; identical strings score 1.0.
public static double getLevenshteinScore(String inputStr, String outputColumnName) {
    double levenshteinScore;

    double maxLength = Math.max(inputStr.length(), outputColumnName.length());
    LevenshteinDistance ld = new LevenshteinDistance();
    // apply returns an Integer; it is widened to double so the divisions below are not truncated
    double distance = ld.apply(outputColumnName, inputStr);

    // this scoring rule can be replaced with your own variant
    if (inputStr.contains(outputColumnName) || outputColumnName.contains(inputStr)) {
        // one string contains the other; the +1 keeps the denominator non-zero for two empty strings
        levenshteinScore = (maxLength - distance + 1) / (maxLength + 1);
    } else {
        levenshteinScore = 1 - (distance / maxLength);
    }
    return levenshteinScore;
}
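
To make the two branches of getLevenshteinScore concrete, here is a small sketch that applies the same formula with the edit distance supplied by the caller, so it runs without the Commons Text dependency. The class name ScoreSketch and the method score are hypothetical and not part of the Talend source; the distances passed in are worked out by hand.

```java
// Illustrative sketch only: ScoreSketch and score(...) are hypothetical names.
final class ScoreSketch {

    // Same formula as getLevenshteinScore, with the edit distance passed in by the caller.
    static double score(String a, String b, int distance) {
        double maxLength = Math.max(a.length(), b.length());
        if (a.contains(b) || b.contains(a)) {
            // Containment branch: smoothed ratio.
            return (maxLength - distance + 1) / (maxLength + 1);
        }
        // General branch: one minus the distance relative to the longer string.
        return 1 - (distance / maxLength);
    }

    public static void main(String[] args) {
        // "frog" vs "fog": distance 1, neither contains the other, so 1 - 1/4
        System.out.println(score("frog", "fog", 1));  // 0.75
        // "fog" vs "foggy": distance 2, "foggy" contains "fog", so (5 - 2 + 1) / (5 + 1)
        System.out.println(score("fog", "foggy", 2));
        // identical strings: distance 0, containment branch, so (3 + 1) / (3 + 1)
        System.out.println(score("abc", "abc", 0));   // 1.0
    }
}
```

In the containment branch the score reduces to (shorter length + 1) / (longer length + 1), because the distance between a string and one that contains it is the difference in their lengths.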