A simple but powerful Java library for analysing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognised or invalid HTML verbatim. It also provides high-level HTML form manipulation functions.

The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.

For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/

For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://www.htmlparser.net

Modifying an HTML Document

The typical method for modifying a document is as follows. See the description of the {@link au.id.jericho.lib.html.OutputDocument} class for sample code.

  1. Create a {@link au.id.jericho.lib.html.Source} object from the source text
  2. Use the tag search methods to find the required segments
  3. Create an {@link au.id.jericho.lib.html.OutputDocument} object from the source text
  4. Add an {@link au.id.jericho.lib.html.OutputSegment} to the OutputDocument for each segment of the document that is to be replaced with other content
  5. Call the {@link au.id.jericho.lib.html.OutputDocument#toString()} method to get the final output
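
The idea behind the steps above can be sketched with the standard Java library alone. This is only an illustration of the technique (replace selected segments by character position and copy everything else through verbatim); it does not use or reproduce this library's API. The names OutputDocumentSketch, Replacement and boldToStrong are hypothetical, and a regular expression stands in for the tag search methods. For real sample code, see the {@link au.id.jericho.lib.html.OutputDocument} class description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for an output document: NOT the library's class.
class OutputDocumentSketch {

    // Hypothetical stand-in for an output segment: replaces the source
    // characters in the range [begin, end) with the given text.
    static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String sourceText;
    private final List<Replacement> replacements = new ArrayList<>();

    // Step 3: create the output document from the source text.
    OutputDocumentSketch(String sourceText) {
        this.sourceText = sourceText;
    }

    // Step 4: register a segment that is to be replaced.
    // Assumption: registered segments do not overlap.
    void add(Replacement replacement) {
        replacements.add(replacement);
    }

    // Step 5: build the final output. Text between replacements is copied
    // unchanged, so unrecognised or invalid markup survives as written.
    @Override
    public String toString() {
        List<Replacement> sorted = new ArrayList<>(replacements);
        sorted.sort(Comparator.comparingInt(r -> r.begin));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Replacement r : sorted) {
            out.append(sourceText, pos, r.begin); // untouched source text
            out.append(r.text);                   // replacement content
            pos = r.end;
        }
        out.append(sourceText, pos, sourceText.length());
        return out.toString();
    }

    // Steps 1 and 2, simplified: a regular expression locates <b> and </b>
    // tags in place of a real tag search.
    static String boldToStrong(String html) {
        OutputDocumentSketch doc = new OutputDocumentSketch(html);
        Matcher m = Pattern.compile("</?b>", Pattern.CASE_INSENSITIVE).matcher(html);
        while (m.find()) {
            String tag = m.group().startsWith("</") ? "</strong>" : "<strong>";
            doc.add(new Replacement(m.start(), m.end(), tag));
        }
        return doc.toString();
    }

    public static void main(String[] args) {
        // The trailing "<unknown" is invalid HTML and passes through untouched.
        System.out.println(boldToStrong("<p>Some <b>bold</b> text <unknown"));
    }
}
```

Recording replacements by position, rather than editing the text in place, means the offsets found during the search stay valid until the output is generated.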