Interface Summary | |
IOutputSegment | Defines the interface for an output segment, which is used in an OutputDocument to replace segments of the source document with other text. |
Class Summary | |
Attribute | Represents a single attribute name/value segment within a StartTag. |
Attributes | Represents the list of Attribute objects present within a particular StartTag. |
AttributesOutputSegment | Implements an IOutputSegment whose content is a list of attribute name/value pairs. |
BlankOutputSegment | Implements an IOutputSegment whose content is a string of spaces with the same length as the segment. |
CharacterEntityReference | Represents an HTML Character Entity Reference. |
CharacterReference | Represents either a CharacterEntityReference or NumericCharacterReference. |
CharOutputSegment | Implements an IOutputSegment whose content is a character constant. |
Element | Represents an HTML element, which encompasses the StartTag, an optional EndTag and all content in between. |
EndTag | Represents the end tag of an Element. |
FormControl | form controls |
FormControlOutputStyle | ************* |
FormControlOutputStyle.DisplayValueConfig | ************* must not be null |
FormControlType | Represents one of the HTML control types in a form which have the potential to be successful. |
FormField | Represents a field in an HTML form, a field being defined as the combination of all form controls having the same name. |
FormFields | Represents a collection of FormField objects. |
NumericCharacterReference | Represents an HTML Numeric Character Reference. |
OutputDocument | Represents a modified version of an original source text. |
Segment | Represents a segment of a Source document. |
Source | Represents a source HTML document. |
StartTag | Represents the start tag of an Element. |
StringOutputSegment | Implements an IOutputSegment whose content is a CharSequence. |
Tag | Represents either a StartTag or EndTag. |
Util | This class contains miscellaneous utility methods not directly associated with the HTML Parser library. |
Exception Summary | |
OverlappingOutputSegmentsException | Signals that overlapping output segments have been detected in the OutputDocument. |
A simple but powerful Java library for parsing and modifying HTML documents, including analysis of arbitrary HTML forms to determine the structure of submitted data.
The Jericho HTML Parser is an open source library released under the GNU Lesser General Public License (LGPL). You are therefore free to use it in commercial applications subject to the terms detailed in the licence document.
For downloads, support and updates visit the SourceForge.net project page at http://sourceforge.net/projects/jerichohtml/
For a summary of features and a comparison with some other Java HTML parsers, visit the homepage at http://jerichohtml.sourceforge.net
The typical method for modifying a document is as follows. See the description of the OutputDocument class for sample code.
1. Create a Source object from the source text.
2. Create an OutputDocument object from the source text.
3. Add an IOutputSegment to the OutputDocument for each segment of the document that is to be replaced with other text.
4. Call the OutputDocument.toString() method to get the final output.
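The idea behind the steps above can be sketched in plain Java without the library. The sketch below does not use the Jericho API: Replacement and SketchOutputDocument are hypothetical stand-ins for an output segment and an output document, assumed here only to show how registered replacements over character ranges of the source text might be merged into the final output, and why overlapping segments have to be rejected. For real usage, refer to the sample code in the OutputDocument class description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OutputSegmentSketch {

    /** Hypothetical stand-in for an output segment: replaces source[begin, end) with text. */
    static final class Replacement {
        final int begin;
        final int end;
        final String text;

        Replacement(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for an output document built over the original source text. */
    static final class SketchOutputDocument {
        private final String sourceText;
        private final List<Replacement> replacements = new ArrayList<>();

        SketchOutputDocument(String sourceText) {
            this.sourceText = sourceText;
        }

        /** Registers a replacement; the source text itself is never modified. */
        void add(Replacement replacement) {
            replacements.add(replacement);
        }

        /** Copies the source text, substituting each registered range in order. */
        @Override
        public String toString() {
            List<Replacement> sorted = new ArrayList<>(replacements);
            sorted.sort(Comparator.comparingInt(r -> r.begin));
            StringBuilder out = new StringBuilder();
            int pos = 0;
            for (Replacement r : sorted) {
                if (r.begin < pos) {
                    // Two replacements claim the same characters.
                    throw new IllegalStateException("overlapping output segments");
                }
                out.append(sourceText, pos, r.begin); // untouched text before the range
                out.append(r.text);                   // replacement text
                pos = r.end;
            }
            out.append(sourceText, pos, sourceText.length()); // untouched tail
            return out.toString();
        }
    }

    /** Rewrites the href value in a small HTML fragment. */
    static String demo() {
        String html = "<a href=\"old.html\">link</a>";
        SketchOutputDocument doc = new SketchOutputDocument(html);
        int begin = html.indexOf("old.html");
        doc.add(new Replacement(begin, begin + "old.html".length(), "new.html"));
        return doc.toString();
    }
}
```

Keeping the source text immutable and recording replacements by position means every range refers to the original character offsets, so replacements can be added in any order before the output is produced.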
If the document only needs to be analysed instead of modified, only the first two steps listed above are required.
See the description of the FormFields class for sample code.