java.lang.Object
  au.id.jericho.lib.html.Segment
Represents a segment of a Source document.
The "span" of a segment is defined by the combination of its begin and end character positions.
Constructor Summary | |
Segment(Source source,
int begin,
int end)
Constructs a new Segment with the specified Source and the specified begin and end character positions. |
Method Summary | |
char |
charAt(int index)
Returns the character at the specified index. |
int |
compareTo(java.lang.Object o)
Compares this Segment object to another object. |
boolean |
encloses(int pos)
Indicates whether this segment encloses the specified character position in the Source document. |
boolean |
encloses(Segment segment)
Indicates whether this Segment encloses the specified Segment. |
boolean |
equals(java.lang.Object object)
Compares the specified object with this Segment for equality. |
java.util.List |
findAllCharacterReferences()
Returns a list of all CharacterReference objects enclosed by this segment. |
java.util.List |
findAllComments()
Returns a list of all Segment objects enclosed by this segment that represent HTML comments. |
java.util.List |
findAllElements()
Returns a list of all Element objects enclosed by this segment. |
java.util.List |
findAllElements(java.lang.String name)
Returns a list of all Element objects with the specified name enclosed by this segment. |
java.util.List |
findAllStartTags()
Returns a list of all StartTag objects enclosed by this segment. |
java.util.List |
findAllStartTags(java.lang.String name)
Returns a list of all StartTag objects with the specified name enclosed by this segment. |
java.util.List |
findAllStartTags(java.lang.String attributeName,
java.lang.String value,
boolean valueCaseSensitive)
Returns a list of all StartTag objects with the specified attribute name/value pair enclosed by this segment. |
java.util.List |
findFormControls()
Returns a list of the FormControl objects enclosed by this segment. |
FormFields |
findFormFields()
Returns the FormFields object representing all form fields enclosed by this segment. |
java.util.List |
findWords()
Deprecated. no replacement |
int |
getBegin()
Returns the character position in the Source where this segment begins. |
java.lang.String |
getDebugInfo()
Returns a string representation of this object useful for debugging purposes. |
int |
getEnd()
Returns the character position in the Source where this segment ends. |
java.lang.String |
getSourceText()
Deprecated. Use the toString() method instead |
java.lang.String |
getSourceTextNoWhitespace()
Deprecated. Use the more useful CharacterReference.decodeCollapseWhiteSpace(CharSequence) method instead. |
int |
hashCode()
Returns a hash code value for the segment. |
void |
ignoreWhenParsing()
Causes this segment to be ignored when parsing. |
boolean |
isComment()
Indicates whether this Segment represents an HTML comment. |
static boolean |
isWhiteSpace(char ch)
Indicates whether the specified character is white space. |
int |
length()
Returns the length of the segment. |
Attributes |
parseAttributes()
Parses any Attributes within this segment. |
java.lang.CharSequence |
subSequence(int beginIndex,
int endIndex)
Returns a new character sequence that is a subsequence of this sequence. |
java.lang.String |
toString()
Returns the source text of this segment as a String. |
Methods inherited from class java.lang.Object |
getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
public Segment(Source source, int begin, int end)
Constructs a new Segment with the specified Source and the specified begin and end character positions.
Parameters:
source - the source document.
begin - the character position in the source where this segment begins.
end - the character position in the source where this segment ends.
Method Detail |
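public final int getBegin()
Returns the character position in the Source where this segment begins.
public final int getEnd()
Returns the character position in the Source where this segment ends.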
public final boolean equals(java.lang.Object object)
Compares the specified object with this Segment for equality.
Returns true if and only if the specified object is also a Segment, and both segments have the same Source, and the same begin and end positions.
Parameters:
object - the object to be compared for equality with this Segment.
Returns:
true if the specified object is equal to this Segment, otherwise false.
public int hashCode()
Returns a hash code value for the segment.
The current implementation returns the sum of the begin and end positions, although this is not guaranteed in future versions.
public final int length()
Returns the length of the segment.
Specified by:
length in interface java.lang.CharSequence
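The span convention used throughout this class is half-open: encloses(int) is documented as the test getBegin() <= pos < getEnd(), so the end position is exclusive. The following is a minimal sketch of that arithmetic. SpanSketch is a hypothetical stand-in written for illustration, not the library's Segment class, and the length calculation (end minus begin) is an assumption that this page does not state explicitly.

```java
// Hypothetical stand-in for illustration only; not the library's Segment class.
final class SpanSketch {
    final int begin; // first character position of the span (inclusive)
    final int end;   // character position just past the span (exclusive)

    SpanSketch(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    // Assumption: the length of a span is its end position minus its begin position.
    int length() {
        return end - begin;
    }

    // Half-open test as documented for encloses(int): begin <= pos < end.
    boolean encloses(int pos) {
        return begin <= pos && pos < end;
    }

    public static void main(String[] args) {
        SpanSketch span = new SpanSketch(3, 7);
        System.out.println(span.length());    // prints 4
        System.out.println(span.encloses(3)); // prints true (begin is inclusive)
        System.out.println(span.encloses(7)); // prints false (end is exclusive)
    }
}
```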
public final boolean encloses(Segment segment)
Indicates whether this Segment encloses the specified Segment.
Parameters:
segment - the segment to be tested for being enclosed by this segment.
Returns:
true if this Segment encloses the specified Segment, otherwise false.
public final boolean encloses(int pos)
Indicates whether this segment encloses the specified character position in the Source document.
This is the case if getBegin() <= pos < getEnd().
Parameters:
pos - the position in the source document to be tested.
Returns:
true if this segment encloses the specified position, otherwise false.
public boolean isComment()
Indicates whether this Segment represents an HTML comment.
An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.
The HTML 4.01 Specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.
Returns:
true if this Segment represents an HTML comment, otherwise false.
public java.lang.String toString()
Returns the source text of this segment as a String.
The returned String is newly created with every call to this method, unless this segment is itself a Source object.
Note that before version 1.5 this returned a representation of this object useful for debugging purposes, which can now be obtained via the getDebugInfo() method.
Specified by:
toString in interface java.lang.CharSequence
Returns:
the source text of this segment as a String.
public java.util.List findAllStartTags()
Returns a list of all StartTag objects enclosed by this segment.
Returns:
a list of all StartTag objects enclosed by this segment.
public java.util.List findAllStartTags(java.lang.String name)
Returns a list of all StartTag objects with the specified name enclosed by this segment.
If the name argument is null, all StartTags are returned.
Parameters:
name - the name of the StartTags to find.
public java.util.List findAllStartTags(java.lang.String attributeName, java.lang.String value, boolean valueCaseSensitive)
Returns a list of all StartTag objects with the specified attribute name/value pair enclosed by this segment.
Parameters:
attributeName - the attribute name (case insensitive) to search for, must not be null.
value - the value of the specified attribute to search for, must not be null.
valueCaseSensitive - specifies whether the attribute value matching is case sensitive.
Returns:
a list of all StartTag objects with the specified attribute name/value pair enclosed by this segment.
public java.util.List findAllComments()
Returns a list of all Segment objects enclosed by this segment that represent HTML comments.
Returns:
a list of all Segment objects enclosed by this segment that represent HTML comments.
public java.util.List findAllElements()
Returns a list of all Element objects enclosed by this segment.
Returns:
a list of all Element objects enclosed by this segment.
public java.util.List findAllElements(java.lang.String name)
Returns a list of all Element objects with the specified name enclosed by this segment.
If the name argument is null, all Elements are returned.
Parameters:
name - the name of the Elements to find.
Returns:
a list of all Element objects with the specified name enclosed by this segment.
public java.util.List findAllCharacterReferences()
Returns a list of all CharacterReference objects enclosed by this segment.
Returns:
a list of all CharacterReference objects enclosed by this segment.
public java.util.List findFormControls()
Returns a list of the FormControl objects enclosed by this segment.
Returns:
a list of the FormControl objects enclosed by this segment.
public FormFields findFormFields()
Returns the FormFields object representing all form fields enclosed by this segment.
This is equivalent to FormFields.constructFrom(findFormControls()).
Returns:
the FormFields object representing all form fields enclosed by this segment.
See Also:
findFormControls()
public Attributes parseAttributes()
Parses any Attributes within this segment.
This method is only used in the unusual situation where attributes exist outside of a start tag. The StartTag.getAttributes() method should be used in normal situations.
This is equivalent to source.parseAttributes(this.getBegin(),this.getEnd()).
Returns:
the Attributes within this segment, or null if too many errors occur while parsing.
public void ignoreWhenParsing()
Causes this segment to be ignored when parsing.
This is equivalent to source.ignoreWhenParsing(segment.getBegin(),segment.getEnd()).
See Also:
Source.ignoreWhenParsing(int begin, int end), Source.ignoreWhenParsing(Collection segments)
public int compareTo(java.lang.Object o)
Compares this Segment object to another object.
If the argument is not a Segment, a ClassCastException is thrown.
A segment is considered to be before another segment if its begin position is earlier, or in the case that both segments begin at the same position, its end position is earlier.
Segments that begin and end at the same position are considered equal for the purposes of this comparison, even if they relate to different source documents.
Note: this class has a natural ordering that is inconsistent with equals. This means that this method may return zero in some cases where calling the equals(Object) method with the same argument returns false.
Specified by:
compareTo in interface java.lang.Comparable
Parameters:
o - the segment to be compared
Throws:
java.lang.ClassCastException - if the argument is not a Segment
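The practical effect of an ordering that is inconsistent with equals can be seen with sorted and hashed collections. The sketch below uses OrderedSpan, a hypothetical stand-in rather than the library's Segment class: a plain String stands in for the Source, generics replace the raw Comparable signature, and the hash code copies the "sum of begin and end" behaviour that the documentation describes as not guaranteed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for illustration only; not the library's Segment class.
final class OrderedSpan implements Comparable<OrderedSpan> {
    final String source; // stands in for the Source document
    final int begin;
    final int end;

    OrderedSpan(String source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    // Earlier begin sorts first; for equal begins, earlier end sorts first.
    // The source is deliberately ignored, as in the documented ordering.
    @Override
    public int compareTo(OrderedSpan other) {
        if (begin != other.begin) return Integer.compare(begin, other.begin);
        return Integer.compare(end, other.end);
    }

    // Equality additionally requires the same source.
    @Override
    public boolean equals(Object object) {
        if (!(object instanceof OrderedSpan)) return false;
        OrderedSpan other = (OrderedSpan) object;
        return source.equals(other.source) && begin == other.begin && end == other.end;
    }

    // Mirrors the documented current behaviour (sum of begin and end).
    @Override
    public int hashCode() {
        return begin + end;
    }

    public static void main(String[] args) {
        OrderedSpan a = new OrderedSpan("docA", 0, 5);
        OrderedSpan b = new OrderedSpan("docB", 0, 5);
        System.out.println(a.compareTo(b)); // prints 0
        System.out.println(a.equals(b));    // prints false

        // A TreeSet relies on compareTo, so it keeps only one of the two spans;
        // a HashSet relies on equals and hashCode, so it keeps both.
        Set<OrderedSpan> sorted = new TreeSet<>(Arrays.asList(a, b));
        Set<OrderedSpan> hashed = new HashSet<>(Arrays.asList(a, b));
        System.out.println(sorted.size()); // prints 1
        System.out.println(hashed.size()); // prints 2
    }
}
```

If segments from more than one source document are mixed in a sorted collection, a custom Comparator that also distinguishes the source may be the safer choice.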
public static final boolean isWhiteSpace(char ch)
Indicates whether the specified character is white space.
The HTML 4.01 Specification section 9.1 specifies the following white space characters:
space (U+0020), tab (U+0009), form feed (U+000C), line feed (U+000A), carriage return (U+000D), zero-width space (U+200B)
Despite the explicit inclusion of the zero-width space in the HTML specification, Microsoft IE6 does not recognise it as white space and renders it as an unprintable character (empty square). Even zero-width spaces included using the numeric character reference &#x200B; are rendered this way.
Note that in versions prior to 1.5, this method did not recognise form feeds or zero-width spaces as white space.
Parameters:
ch - the character to test.
Returns:
true if the specified character is white space, otherwise false.
public java.lang.String getDebugInfo()
Returns a string representation of this object useful for debugging purposes.
public char charAt(int index)
Returns the character at the specified index.
This is logically equivalent to toString().charAt(index) for valid argument values 0 <= index < length().
However, because this implementation works directly on the underlying document source string, it should not be assumed that an IndexOutOfBoundsException will be thrown for an invalid argument value.
Specified by:
charAt in interface java.lang.CharSequence
Parameters:
index - the index of the character.
public final java.lang.CharSequence subSequence(int beginIndex, int endIndex)
Returns a new character sequence that is a subsequence of this sequence.
This is logically equivalent to toString().subSequence(beginIndex,endIndex) for valid values of beginIndex and endIndex.
However, because this implementation works directly on the underlying document source string, it should not be assumed that an IndexOutOfBoundsException will be thrown for invalid argument values as described in the String.subSequence(int,int) method.
Specified by:
subSequence in interface java.lang.CharSequence
Parameters:
beginIndex - the begin index, inclusive.
endIndex - the end index, exclusive.
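The caveat about IndexOutOfBoundsException in charAt and subSequence follows from reading the underlying source text directly. The sketch below shows one way such a view could behave. SourceView is a hypothetical stand-in, and the offset arithmetic is an assumption about how a segment might delegate to its source, not the library's actual code.

```java
// Hypothetical stand-in for illustration only; not the library's Segment class.
final class SourceView implements CharSequence {
    private final String source; // the whole underlying document text
    private final int begin;     // inclusive
    private final int end;       // exclusive

    SourceView(String source, int begin, int end) {
        this.source = source;
        this.begin = begin;
        this.end = end;
    }

    @Override
    public int length() {
        return end - begin;
    }

    // Reads straight from the source text; index is not checked against length().
    @Override
    public char charAt(int index) {
        return source.charAt(begin + index);
    }

    // Same idea: the indices are only shifted by begin, not validated against the span.
    @Override
    public CharSequence subSequence(int beginIndex, int endIndex) {
        return source.subSequence(begin + beginIndex, begin + endIndex);
    }

    @Override
    public String toString() {
        return source.substring(begin, end);
    }

    public static void main(String[] args) {
        SourceView view = new SourceView("<p>Hello</p>", 3, 8);
        System.out.println(view);                    // prints Hello
        System.out.println(view.charAt(0));          // prints H
        System.out.println(view.subSequence(1, 3));  // prints el
        // Index 5 is outside the view (its length is 5), yet no exception is thrown:
        // the character just past the span is returned instead.
        System.out.println(view.charAt(5));          // prints <
    }
}
```

Callers that need strict bounds checking can validate indices against length() themselves, or call toString() first and index into the resulting String.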
public java.lang.String getSourceText()
Deprecated. Use the toString() method instead.
This method has been deprecated as of version 1.5 as it now duplicates the functionality of the toString() method.
public final java.lang.String getSourceTextNoWhitespace()
Deprecated. Use the more useful CharacterReference.decodeCollapseWhiteSpace(CharSequence) method instead.
All leading and trailing white space is omitted, and any sections of internal white space are replaced by a single space.
This method has been deprecated as of version 1.5 as it is no longer used internally and was never very useful as a public method. It is similar to the new CharacterReference.decodeCollapseWhiteSpace(CharSequence) method, but does not decode the text after collapsing the white space.
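The collapsing rule described above (drop leading and trailing white space, replace each internal run with a single space) can be sketched without the library. WhiteSpaceSketch and its methods are hypothetical names. The white space set used here (space, tab, form feed, line feed, carriage return, zero-width space) is taken from HTML 4.01 section 9.1 and is an assumption about what the library treats as white space.

```java
// Hypothetical stand-in for illustration only; not part of the library.
final class WhiteSpaceSketch {
    // Assumed white space set, based on HTML 4.01 section 9.1.
    static boolean isWhiteSpace(char ch) {
        switch (ch) {
            case ' ':
            case '\t':
            case '\f':
            case '\n':
            case '\r':
            case '\u200B': // zero-width space
                return true;
            default:
                return false;
        }
    }

    // Omits leading and trailing white space and replaces each internal run with one space.
    static String collapse(CharSequence text) {
        StringBuilder out = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (isWhiteSpace(ch)) {
                // Remember the gap, but only if some text has already been written.
                pendingSpace = out.length() > 0;
            } else {
                // Emit a single space for the whole preceding run, then the character.
                if (pendingSpace) out.append(' ');
                pendingSpace = false;
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  a \t\n b  ") + "]"); // prints [a b]
    }
}
```

Because a trailing run never has text after it, the pending space is simply never written, which avoids a separate trimming pass.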
public final java.util.List findWords()
Deprecated. no replacement
Returns a list of Segment objects representing every word in this segment separated by white space.
Note that any markup contained in this segment will be regarded as normal text for the purposes of this method.
This method has been deprecated as of version 1.5 as it has no discernible use.
Returns:
a list of Segment objects representing every word in this segment separated by white space.