Common Java Cookbook

Edition: 0.19

2.5. Finding Nested Strings

2.5.1. Problem

You want to locate strings nested within other strings.

2.5.2. Solution

Use StringUtils.substringBetween( ). This method returns the text found between two delimiter strings, which are supplied as parameters. The following example uses it to extract content from HTML:

String htmlContent = "<html>\n" +
                     "  <head>\n" +
                     "    <title>Test Page</title>\n" +
                     "  </head>\n" +
                     "  <body>\n" +
                     "    <p>This is a TEST!</p>\n" +
                     "  </body>\n" +
                     "</html>";
// Extract the title from this HTML content
String title = StringUtils.substringBetween( htmlContent, "<title>", "</title>" );
System.out.println( "Title: " + title );

This code extracts the title from this HTML document and prints the following:

Title: Test Page
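
To make the lookup concrete, here is a minimal plain-Java sketch of the same idea, written with String.indexOf( ) so that it runs without Commons Lang on the classpath. The class NestedStringSketch and the helper between( ) are hypothetical names used only for illustration, not the library's implementation. The sketch assumes the convention that a missing delimiter yields null rather than an exception.

```java
// Illustrative only: NestedStringSketch and between( ) are hypothetical names,
// not part of Commons Lang.
class NestedStringSketch {

    // Returns the text between the first 'open' and the next 'close',
    // or null when the input or either delimiter is missing.
    static String between( String str, String open, String close ) {
        if( str == null || open == null || close == null ) {
            return null;
        }
        int start = str.indexOf( open );
        if( start < 0 ) {
            return null;
        }
        start += open.length( );
        int end = str.indexOf( close, start );
        if( end < 0 ) {
            return null;
        }
        return str.substring( start, end );
    }

    public static void main( String[] args ) {
        String htmlContent = "<html><head><title>Test Page</title></head></html>";
        System.out.println( "Title: " + between( htmlContent, "<title>", "</title>" ) );
        // A delimiter that never appears produces null
        System.out.println( "Missing: " + between( htmlContent, "<h1>", "</h1>" ) );
    }
}
```

Checking the result for null before using it is a sensible habit whenever the delimiters may be absent from the input.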

2.5.3. Discussion

In the Solution section, substringBetween( ) returns the first string found between the open and close strings, in this case the title of an HTML document. That example contained only one nested element, but what happens when a string contains several elements delimited by the same pair of strings? In the following example, three numbers are extracted from a string by calling substringBetween( ) repeatedly:

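// Assumes StringUtils and ArrayUtils from Commons Lang, StatUtils from Commons Math,
// and java.util.List / java.util.ArrayList are imported
String variables = "{45}, {35}, {120}";
List<Double> numbers = new ArrayList<Double>( );
String variablesTemp = variables;
String numberStr;
// Keep extracting the first {...} value until none remain
while( ( numberStr = StringUtils.substringBetween( variablesTemp, "{", "}" ) ) != null ) {
    numbers.add( Double.valueOf( numberStr ) );
    // Discard everything up to and including the next comma;
    // substringAfter( ) returns "" when no comma is left, which ends the loop
    variablesTemp = StringUtils.substringAfter( variablesTemp, "," );
}
double[] values = ArrayUtils.toPrimitive( numbers.toArray( new Double[0] ) );
double sum = StatUtils.sum( values );
System.out.println( "Variables: " + variables + ", Sum: " + sum );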

The output of this example is:

Variables: {45}, {35}, {120}, Sum: 200.0

After each number is extracted from the curly braces, the code discards everything up to and including the next comma, so that the following call to StringUtils.substringBetween( ) searches only the remainder of the string.
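
The same left-to-right scan can be sketched in plain Java by advancing a search index instead of shortening the string. The class BraceValuesSketch and the helpers allBetween( ) and sumOf( ) are hypothetical names introduced for illustration, and the sketch assumes every delimited value parses as a number. Commons Lang 2.3 and later also offer StringUtils.substringsBetween( ), which returns all matches as an array, if your version provides it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: BraceValuesSketch, allBetween( ), and sumOf( ) are
// hypothetical names, not part of Commons Lang or Commons Math.
class BraceValuesSketch {

    // Collects every substring enclosed by 'open' and 'close', left to right.
    static List<String> allBetween( String str, String open, String close ) {
        List<String> found = new ArrayList<String>( );
        int from = 0;
        while( true ) {
            int start = str.indexOf( open, from );
            if( start < 0 ) {
                break;
            }
            start += open.length( );
            int end = str.indexOf( close, start );
            if( end < 0 ) {
                break;
            }
            found.add( str.substring( start, end ) );
            // Resume the search just past the closing delimiter
            from = end + close.length( );
        }
        return found;
    }

    // Parses each value as a double and adds them up.
    static double sumOf( List<String> values ) {
        double sum = 0.0;
        for( String value : values ) {
            sum += Double.parseDouble( value );
        }
        return sum;
    }

    public static void main( String[] args ) {
        String variables = "{45}, {35}, {120}";
        double sum = sumOf( allBetween( variables, "{", "}" ) );
        System.out.println( "Variables: " + variables + ", Sum: " + sum );
    }
}
```

Tracking an index avoids creating a new string on every pass, which matters little for three values but keeps the loop simple when the input is long.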

StringUtils.substringBetween( ) can also find text that is delimited by the same character:

String message = "|TESTING| BOUNDARYExampleBOUNDARY";
String first = StringUtils.substringBetween( message, "|"); 
String second = StringUtils.substringBetween( message, "BOUNDARY");

The variable first contains "TESTING", the text between the two | characters, and second contains "Example", the text between the two occurrences of BOUNDARY.
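
As a rough plain-Java illustration of the single-delimiter case, the sketch below looks for the first two occurrences of one tag and returns what lies between them. The class SameTagSketch and the helper betweenTag( ) are hypothetical names, and the code is an assumption about equivalent logic rather than the library's own source.

```java
// Illustrative only: SameTagSketch and betweenTag( ) are hypothetical names,
// not part of Commons Lang.
class SameTagSketch {

    // Returns the text between the first two occurrences of 'tag',
    // or null when the tag appears fewer than two times.
    static String betweenTag( String str, String tag ) {
        int start = str.indexOf( tag );
        if( start < 0 ) {
            return null;
        }
        start += tag.length( );
        int end = str.indexOf( tag, start );
        return ( end < 0 ) ? null : str.substring( start, end );
    }

    public static void main( String[] args ) {
        String message = "|TESTING| BOUNDARYExampleBOUNDARY";
        System.out.println( betweenTag( message, "|" ) );
        System.out.println( betweenTag( message, "BOUNDARY" ) );
    }
}
```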


Creative Commons License
Common Java Cookbook by Tim O'Brien is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Permissions beyond the scope of this license may be available at http://www.discursive.com/books/cjcook/reference/jakartackbk-PREFACE-1.html. Copyright 2009. Common Java Cookbook Chunked HTML Output. Some Rights Reserved.