Use StringUtils.difference(), StringUtils.indexOfDifference(), and StringUtils.getLevenshteinDistance().
StringUtils.difference() returns the portion of the second string that differs from the first, StringUtils.indexOfDifference() returns the index at which two strings begin to differ, and StringUtils.getLevenshteinDistance() returns the "edit distance" between two strings. The following example demonstrates all three of these methods:
import org.apache.commons.lang.StringUtils;

int dist = StringUtils.getLevenshteinDistance( "Word", "World" );
String diff = StringUtils.difference( "Word", "World" );
int index = StringUtils.indexOfDifference( "Word", "World" );

System.out.println( "Edit Distance: " + dist );
System.out.println( "Difference: " + diff );
System.out.println( "Diff Index: " + index );
This code compares the strings "Word" and "World," producing the following output:
Edit Distance: 1
Difference: ld
Diff Index: 3
StringUtils.difference() returns the portion of the second string that begins where it first differs from the first string. StringUtils.indexOfDifference() returns the index at which the second string starts to diverge from the first. The difference between "ABC" and "ABE" is "E", and the index of the difference is 2. Here's a more complex example:
String a = "Strategy";
String b = "Strategic";

String difference = StringUtils.difference( a, b );
int differenceIndex = StringUtils.indexOfDifference( a, b );
System.out.println( "difference(Strategy, Strategic) = " + difference );
System.out.println( "index(Strategy, Strategic) = " + differenceIndex );

a = "The Secretary of the UN is Kofi Annan.";
b = "The Secretary of State is Colin Powell.";

difference = StringUtils.difference( a, b );
differenceIndex = StringUtils.indexOfDifference( a, b );
System.out.println( "difference(..., ...) = " + difference );
System.out.println( "index(..., ...) = " + differenceIndex );
This produces the following output, showing the differences between two strings:
difference(Strategy, Strategic) = ic
index(Strategy, Strategic) = 7
difference(..., ...) = State is Colin Powell.
index(..., ...) = 17
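To make the semantics of these two methods concrete, here is a minimal plain-Java sketch that needs nothing beyond the JDK. The class name DiffSketch is hypothetical, null handling is omitted, and this is not the Commons Lang source; it only reproduces the behavior described above, plus the convention of returning -1 (and an empty difference) when the two strings are identical.

```java
// Hypothetical helper class: a JDK-only sketch of what
// StringUtils.indexOfDifference() and StringUtils.difference() compute.
// Null inputs are not handled here.
class DiffSketch {

    // Returns the index of the first character at which a and b differ,
    // or -1 if the two strings are identical.
    static int indexOfDifference(String a, String b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        // Walk forward while both strings agree.
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        // Reaching the end of both strings means they are equal.
        if (i == a.length() && i == b.length()) {
            return -1;
        }
        return i;
    }

    // Returns the part of b starting where it first differs from a,
    // or "" if the strings are identical.
    static String difference(String a, String b) {
        int at = indexOfDifference(a, b);
        return at == -1 ? "" : b.substring(at);
    }

    public static void main(String[] args) {
        System.out.println(indexOfDifference("ABC", "ABE"));        // 2
        System.out.println(difference("ABC", "ABE"));               // E
        System.out.println(difference("Strategy", "Strategic"));    // ic
        System.out.println(indexOfDifference("Word", "Word"));      // -1
    }
}
```

Note the asymmetry: the index is shared by both strings, but the difference is always taken from the second argument, so when the second string is a prefix of the first, the difference is empty even though the index is not -1.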
The Levenshtein distance is the minimum number of single-character insertions, deletions, and replacements needed to turn one string into another. The distance between "Boat" and "Coat" is 1, a single-letter replacement, and the distance between "Remember" and "Alamo" is 7: four letter replacements and three deletions, keeping the letter "m" that the two strings share. Levenshtein distance is also known as the edit distance, the number of changes needed to get from string A to string B. The following example demonstrates the getLevenshteinDistance() method:
int distance1 = StringUtils.getLevenshteinDistance( "Boat", "Coat" );
int distance2 = StringUtils.getLevenshteinDistance( "Remember", "Alamo" );
int distance3 = StringUtils.getLevenshteinDistance( "Steve", "Stereo" );

System.out.println( "distance(Boat, Coat): " + distance1 );
System.out.println( "distance(Remember, Alamo): " + distance2 );
System.out.println( "distance(Steve, Stereo): " + distance3 );
This produces the following output, showing the Levenshtein (or edit) distance between various strings:
distance(Boat, Coat): 1
distance(Remember, Alamo): 7
distance(Steve, Stereo): 2
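The recipe does not show how the distance is computed. The textbook approach is dynamic programming: fill a table in which each cell holds the distance between a prefix of one string and a prefix of the other. The following is a minimal JDK-only sketch of that standard algorithm, keeping only two rows of the table. The class name LevenshteinSketch is hypothetical and this is not the Commons Lang source, though for non-null input it computes the same quantity.

```java
// Hypothetical helper class: classic dynamic-programming Levenshtein
// distance, storing only the previous and current rows of the table.
class LevenshteinSketch {

    static int distance(String s, String t) {
        // prev[j] = distance between the first i-1 chars of s and the
        // first j chars of t; curr is the row currently being filled.
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j; // turning "" into t's first j chars takes j insertions
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // turning s's first i chars into "" takes i deletions
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,      // delete s[i-1]
                                 curr[j - 1] + 1), // insert t[j-1]
                        prev[j - 1] + cost);       // replace, or keep a match
            }
            // The row just filled becomes the previous row.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Boat", "Coat"));        // 1
        System.out.println(distance("Remember", "Alamo"));   // 7
        System.out.println(distance("Steve", "Stereo"));     // 2
    }
}
```

Keeping two rows instead of the full table reduces memory from O(nm) to O(m) while the running time stays O(nm).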
The Levenshtein distance has a number of different applications, including pattern recognition and correcting spelling mistakes. For more information about the Levenshtein distance, see http://www.merriampark.com/ld.htm, which explains the algorithm and provides links to implementations of this algorithm in 15 different languages.
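As an illustration of the spelling-correction use, one simple approach is to suggest the dictionary word with the smallest edit distance from the misspelled input. The sketch below assumes a small in-memory word list; the class name, the dictionary, and the tie-breaking rule (first word in list order wins) are all hypothetical choices, and a real spell checker would need a much larger dictionary and a smarter search.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: suggest the closest dictionary word for a
// misspelling by choosing the entry with the smallest edit distance.
class SpellingSuggester {

    // Full-table Levenshtein distance: d[i][j] is the distance between
    // the first i chars of s and the first j chars of t.
    static int editDistance(String s, String t) {
        int[][] d = new int[s.length() + 1][t.length() + 1];
        for (int i = 0; i <= s.length(); i++) d[i][0] = i;
        for (int j = 0; j <= t.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[s.length()][t.length()];
    }

    // Returns the closest dictionary word; ties go to the word that
    // appears first in the list. Returns null for an empty dictionary.
    static String suggest(String input, List<String> dictionary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String word : dictionary) {
            int d = editDistance(input, word);
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("strategy", "strategic", "secretary", "world");
        System.out.println(suggest("stratgey", dictionary)); // strategy
        System.out.println(suggest("wrold", dictionary));    // world
    }
}
```

Plain Levenshtein distance counts a swap of two adjacent letters ("gy" typed as "yg") as two edits; variants such as Damerau-Levenshtein count it as one, which often suits typo correction better.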