Use variations of substring() from StringUtils. This next example parses a string that contains six numbers delimited by an asterisk, parentheses, brackets, and a pipe symbol (N0 * (N1,N2) [N3,N4] | N5):
    String formatted = " 25 * (30,40) [50,60] | 30";

    PrintStream out = System.out;
    // N0 and N5 carry surrounding spaces, so trim them
    out.print("N0: " + StringUtils.trim( StringUtils.substringBeforeLast( formatted, "*" ) ) );
    out.print(", N1: " + StringUtils.substringBetween( formatted, "(", "," ) );
    out.print(", N2: " + StringUtils.substringBetween( formatted, ",", ")" ) );
    out.print(", N3: " + StringUtils.substringBetween( formatted, "[", "," ) );
    // substringBetween( ) starts at the first ",", so narrow the input to the text after "[" first
    out.print(", N4: " + StringUtils.substringBetween( StringUtils.substringAfter( formatted, "[" ), ",", "]" ) );
    out.print(", N5: " + StringUtils.trim( StringUtils.substringAfterLast( formatted, "|" ) ) );
This parses the formatted text and prints the following output:
N0: 25, N1: 30, N2: 40, N3: 50, N4: 60, N5: 30
The following public static methods come in handy when trying to extract information from a formatted string:
StringUtils.substringBetween( )
    Captures content between two strings
StringUtils.substringAfter( )
    Captures content that occurs after the first occurrence of a specified string
StringUtils.substringBefore( )
    Captures content that occurs before the first occurrence of a specified string
StringUtils.substringBeforeLast( )
    Captures content before the last occurrence of a specified string
StringUtils.substringAfterLast( )
    Captures content after the last occurrence of a specified string
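The difference between the first-occurrence and last-occurrence variants is easiest to see side by side. The sketch below is not the Commons Lang source; it approximates the four before/after methods with plain JDK calls. The class and method names (SubstringVariantsSketch, before, after, beforeLast, afterLast) are made up for illustration, and the code assumes a non-null, non-empty separator.

```java
// Hypothetical helpers for illustration only, not the Commons Lang implementation.
class SubstringVariantsSketch {

    // Text before the first occurrence of sep; the whole string if sep is absent.
    static String before(String s, String sep) {
        if (s == null) return null;
        int i = s.indexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    // Text after the first occurrence of sep; an empty string if sep is absent.
    static String after(String s, String sep) {
        if (s == null) return null;
        int i = s.indexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length());
    }

    // Text before the last occurrence of sep; the whole string if sep is absent.
    static String beforeLast(String s, String sep) {
        if (s == null) return null;
        int i = s.lastIndexOf(sep);
        return i < 0 ? s : s.substring(0, i);
    }

    // Text after the last occurrence of sep; an empty string if sep is absent.
    static String afterLast(String s, String sep) {
        if (s == null) return null;
        int i = s.lastIndexOf(sep);
        return i < 0 ? "" : s.substring(i + sep.length());
    }

    public static void main(String[] args) {
        String name = "org.apache.commons.lang";
        System.out.println(before(name, "."));      // org
        System.out.println(after(name, "."));       // apache.commons.lang
        System.out.println(beforeLast(name, "."));  // org.apache.commons
        System.out.println(afterLast(name, "."));   // lang
    }
}
```

With a separator that appears only once, the first and last variants return the same result; they diverge only when the separator repeats, as it does in the dotted name above.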
To illustrate the use of these methods, here is an example of a feed of sports scores. Each record in the feed has a defined format, which resembles this feed description:
    \(SOT)<sport>[<team1>,<team2>] (<score1>,<score2>)\(ETX)

    Notes:
      \(SOT) is ASCII character 2 "Start of Text",
      \(ETX) is ASCII character 4 "End of Transmission".

    Example:
      \(SOT)Baseball[BOS,SEA] (24,22)\(ETX)
      \(SOT)Basketball[CHI,NYC] (29,5)\(ETX)
The following example parses this feed using the StringUtils methods trim( ), substringBetween( ), and substringBefore( ). The boxScore variable holds a test string to parse, and, once parsed, this code prints out the game score:
    // Create a formatted string to parse - get this from a feed
    char SOT = '\u0002';
    char ETX = '\u0004';
    String boxScore = SOT + "Basketball[CHI,BOS](69,75)\r\n" + ETX;

    // Get rid of the archaic control characters
    boxScore = StringUtils.trim( boxScore );

    // Parse the score into component parts
    String sport = StringUtils.substringBefore( boxScore, "[" );
    String team1 = StringUtils.substringBetween( boxScore, "[", "," );
    String team2 = StringUtils.substringBetween( boxScore, ",", "]" );
    String score1 = StringUtils.substringBetween( boxScore, "(", "," );
    // The first "," falls between the teams, so anchor on the text that precedes score2
    String score2 = StringUtils.substringBetween( boxScore, "(" + score1 + ",", ")" );

    PrintStream out = System.out;
    out.println( "**** " + sport + " Score" );
    out.println( "\t" + team1 + "\t" + score1 );
    out.println( "\t" + team2 + "\t" + score2 );
This code parses a score, and prints the following output:
    **** Basketball Score
            CHI     69
            BOS     75
In the previous example, StringUtils.trim( ) rids the text of the SOT and ETX control characters. StringUtils.substringBefore( ) then reads the sport name ("Basketball"), and substringBetween( ) is used to retrieve the teams and scores.
At first glance, the value of these substring( ) variations is not obvious. The previous example parsed this simple formatted string using three static methods on StringUtils, but how difficult would it have been to implement this parsing without the aid of Commons Lang? The following example parses the same string using only methods available in J2SE 1.4:
    // Find the sport name without using StringUtils
    boxScore = boxScore.trim( );

    int firstBracket = boxScore.indexOf( "[" );
    String sport = boxScore.substring( 0, firstBracket );

    int firstComma = boxScore.indexOf( "," );
    String team1 = boxScore.substring( firstBracket + 1, firstComma );

    int secondBracket = boxScore.indexOf( "]" );
    String team2 = boxScore.substring( firstComma + 1, secondBracket );

    int firstParen = boxScore.indexOf( "(" );
    int secondComma = boxScore.indexOf( ",", firstParen );
    String score1 = boxScore.substring( firstParen + 1, secondComma );

    int secondParen = boxScore.indexOf( ")" );
    String score2 = boxScore.substring( secondComma + 1, secondParen );
This parses the string in a similar number of lines, but the code is less straightforward and much more difficult to maintain. Instead of simply calling a substringBetween( ) method, the previous example calls String.indexOf( ) and performs arithmetic with an index while calling String.substring( ). Additionally, the substring( ) methods on StringUtils are null-safe; the Java 1.4 example would throw a NullPointerException if boxScore were null.
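To make the null-safety point concrete, here is a rough plain-JDK approximation of a null-safe "between" helper. The names NullSafeBetweenSketch and between are hypothetical, and the code sketches the idea rather than reproducing the Commons Lang source. It looks for the closing string only after the first occurrence of the opening string, and it returns null instead of throwing when the input is null or a delimiter is missing.

```java
// Hypothetical helper for illustration only, not the Commons Lang implementation.
class NullSafeBetweenSketch {

    // Returns the text between the first occurrence of open and the next
    // occurrence of close, or null if the input is null or no match exists.
    static String between(String s, String open, String close) {
        if (s == null || open == null || close == null) {
            return null;
        }
        int start = s.indexOf(open);
        if (start < 0) {
            return null;
        }
        int from = start + open.length();
        int end = s.indexOf(close, from);
        return end < 0 ? null : s.substring(from, end);
    }

    public static void main(String[] args) {
        String boxScore = "Basketball[CHI,BOS](69,75)";
        System.out.println(between(boxScore, "[", ","));  // CHI
        System.out.println(between(boxScore, "(", ","));  // 69
        System.out.println(between(null, "[", ","));      // null, no exception
    }
}
```

The caller still has to decide what a null result means, but that check sits in one place instead of around every indexOf( ) call.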
String.trim( ) has the same behavior as StringUtils.trim( ), stripping leading and trailing whitespace and ASCII control characters (any character with a code of 32 or less) from the string. StringUtils.trim( ) is simply a wrapper for the String.trim( ) method, but it can gracefully handle a null input: if a null value is passed to StringUtils.trim( ), a null value is returned.
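A null-safe wrapper of this kind takes only a line. The sketch below uses hypothetical names (NullSafeTrimSketch, trim) and is an illustration, not code copied from Commons Lang. It also shows String.trim( ) removing the control characters that surround the box score record used earlier.

```java
// Hypothetical wrapper for illustration only, not the Commons Lang implementation.
class NullSafeTrimSketch {

    // Delegates to String.trim(), but passes a null input straight through.
    static String trim(String s) {
        return s == null ? null : s.trim();
    }

    public static void main(String[] args) {
        char SOT = '\u0002';
        char ETX = '\u0004';
        String raw = SOT + "Basketball[CHI,BOS](69,75)\r\n" + ETX;

        System.out.println(trim(raw));   // Basketball[CHI,BOS](69,75)
        System.out.println(trim(null));  // null
    }
}
```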