java.lang.Object
au.id.jericho.lib.html.Segment
au.id.jericho.lib.html.Tag
au.id.jericho.lib.html.StartTag
Represents the start tag of an Element.
Created using one of the following methods:
Element.getStartTag()
Source.findPreviousStartTag(int pos)
Source.findPreviousStartTag(int pos, String name)
Source.findNextComment(int pos)
Source.findNextStartTag(int pos)
Source.findNextStartTag(int pos, String name)
Source.findEnclosingComment(int pos)
Source.findEnclosingStartTag(int pos)
Segment.findAllStartTags(String name)
Segment.findAllComments()
The name argument in the above methods can be a literal text string specifying the name of a start tag to search for.
Specifying a name parameter ending in a colon (:) searches for all start tags in the specified XML namespace.
The constants defined in the Tag class can be used as name arguments.
For example:
source.findAllStartTags(Tag.A) or source.findAllStartTags("a") - finds all hyperlink start tags
source.findAllStartTags(Tag.PROCESSING_INSTRUCTION) - finds all processing instructions <? ... ?>
source.findAllStartTags(Tag.SERVER_COMMON) - finds all common server tags <% ... %>
Note however that the end of a PHP tag cannot be reliably found without the use of a PHP parser, meaning any PHP tag found by this library is not guaranteed to have the correct end position.
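The name-matching rule described above can be modelled without the library. The following is a minimal sketch, not the library's implementation: the class NameMatchSketch and its methods are hypothetical, and the case-insensitive comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the library: models how a search name
// selects start tag names, including the trailing-colon namespace form.
class NameMatchSketch {
    // Returns true if tagName is selected by searchName.
    // Assumption: names are compared case-insensitively.
    static boolean matches(String tagName, String searchName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        String search = searchName.toLowerCase(Locale.ROOT);
        if (search.endsWith(":")) {
            // A name ending in a colon selects every tag in that namespace.
            return tag.startsWith(search);
        }
        return tag.equals(search);
    }

    // Keeps the tag names selected by searchName, preserving their order.
    static List<String> filter(List<String> tagNames, String searchName) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            if (matches(name, searchName)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

With this model, a search name of "o:" selects "o:p" and "o:lock" but not "option", because the colon is part of the prefix being compared.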
See also the XML 1.0 specification for start tags.
See Also: Element, EndTag
Field Summary |
Fields inherited from class au.id.jericho.lib.html.Tag |
A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, B, BASE, BASEFONT, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CAPTION, CENTER, CITE, CODE, COL, COLGROUP, DD, DEL, DFN, DIR, DIV, DL, DOCTYPE_DECLARATION, DT, EM, FIELDSET, FONT, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, LABEL, LEGEND, LI, LINK, MAP, MENU, META, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PARAM, PRE, PROCESSING_INSTRUCTION, Q, S, SAMP, SCRIPT, SELECT, SERVER_COMMON, SERVER_MASON_COMPONENT_CALL, SERVER_MASON_COMPONENT_CALLED_WITH_CONTENT, SERVER_MASON_NAMED_BLOCK, SERVER_PHP, SMALL, SPAN, STRIKE, STRONG, STYLE, SUB, SUP, TABLE, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, U, UL, VAR, XML_DECLARATION |
Method Summary
EndTag findEndTag()
    Returns the end tag that corresponds to this start tag.
static java.lang.String generateHTML(java.lang.String tagName, java.util.Map attributesMap, boolean emptyElementTag)
    Generates the HTML text of a start tag with the specified tag name and attributes map.
Attributes getAttributes()
    Returns the attributes specified in this start tag.
java.lang.String getDebugInfo()
    Returns a string representation of this object useful for debugging purposes.
Element getElement()
    Returns the element that corresponds to this start tag.
Segment getFollowingTextSegment()
    Deprecated. Use new Segment(source,this.getEnd(),source.findNextTag(this.getEnd())) instead.
FormControl getFormControl()
    Returns the FormControl defined by this start tag.
FormControlType getFormControlType()
    Deprecated. Use getFormControl().getFormControlType() instead.
boolean isComment()
    Indicates whether this Segment represents an HTML comment.
boolean isCommonServerTag()
    Indicates whether the start tag is a common server tag (<% ... %>).
boolean isDocTypeDeclaration()
    Indicates whether the start tag is a document type declaration.
boolean isEmptyElementTag()
    Indicates whether the start tag is an empty element tag.
boolean isEndTagForbidden()
    Indicates whether the corresponding end tag is forbidden.
boolean isEndTagOptional()
    Indicates whether the corresponding end tag is optional according to the HTML specification.
boolean isEndTagRequired()
    Indicates whether the corresponding end tag is required according to the HTML specification.
boolean isMasonComponentCall()
    Indicates whether the start tag is a Mason component call (<& ... &>).
boolean isMasonComponentCalledWithContent()
    Indicates whether the start tag is a Mason component called with content.
boolean isMasonNamedBlock()
    Indicates whether the start tag is a Mason named block.
boolean isMasonTag()
    Indicates whether the start tag is any type of Mason tag.
boolean isPHPTag()
    Indicates whether the start tag is a standard PHP tag (<?php ... ?>).
boolean isProcessingInstruction()
    Indicates whether the start tag is a processing instruction.
boolean isServerTag()
    Indicates whether the start tag is a server tag.
boolean isXMLDeclaration()
    Indicates whether the start tag is an XML declaration.
Attributes parseAttributes()
    Parses the attributes specified in this start tag, regardless of the type of start tag.
Attributes parseAttributes(int maxErrorCount)
    Parses the attributes specified in this start tag, regardless of the type of start tag.
java.lang.String regenerateHTML()
    Regenerates the HTML text of this start tag.
Methods inherited from class au.id.jericho.lib.html.Tag |
getName |
Methods inherited from class au.id.jericho.lib.html.Segment |
charAt, compareTo, encloses, encloses, equals, findAllCharacterReferences, findAllComments, findAllElements, findAllElements, findAllStartTags, findAllStartTags, findAllStartTags, findFormControls, findFormFields, findWords, getBegin, getEnd, getSourceText, getSourceTextNoWhitespace, hashCode, ignoreWhenParsing, isWhiteSpace, length, subSequence, toString |
Methods inherited from class java.lang.Object |
getClass, notify, notifyAll, wait, wait, wait |
Method Detail |
public boolean isEndTagForbidden()
Indicates whether the corresponding end tag is forbidden.
This is the case if one of isComment(), isEmptyElementTag(), isProcessingInstruction(), isDocTypeDeclaration(), EndTag.isForbidden(getName()), or isServerTag() is true, unless isMasonNamedBlock() or isMasonComponentCalledWithContent() is true.
Note that as of version 1.1, this method also takes the name of the tag into account by checking whether the HTML specification forbids an end tag of this name.
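The rule above can be read as a single boolean expression. The sketch below is only an illustration of that reading: EndTagForbiddenSketch is a hypothetical class whose fields stand in for the results of the listed methods, and treating the "unless" clause as an override that is checked first is an assumption.

```java
// Hypothetical model of the flags described above; not a library class.
class EndTagForbiddenSketch {
    boolean comment;
    boolean emptyElementTag;
    boolean processingInstruction;
    boolean docTypeDeclaration;
    boolean nameForbidsEndTag; // stands in for EndTag.isForbidden(getName())
    boolean serverTag;
    boolean masonNamedBlock;
    boolean masonComponentCalledWithContent;

    boolean isEndTagForbidden() {
        // Assumption: the two Mason forms that take an end tag override
        // every other condition, including serverTag.
        if (masonNamedBlock || masonComponentCalledWithContent) {
            return false;
        }
        return comment || emptyElementTag || processingInstruction
                || docTypeDeclaration || nameForbidsEndTag || serverTag;
    }
}
```

Under this reading, a tag whose name forbids an end tag reports true, while a Mason named block reports false even though it is also a server tag.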
Returns: true if the corresponding end tag is forbidden, otherwise false.
public boolean isEndTagOptional()
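Indicates whether the corresponding end tag is optional according to the HTML specification.
This is equivalent to EndTag.isOptional(getName()).
Returns: true if the corresponding end tag is optional, otherwise false.
public boolean isEndTagRequired()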
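Indicates whether the corresponding end tag is required according to the HTML specification.
This is equivalent to EndTag.isRequired(getName()).
Returns: true if the corresponding end tag is required, otherwise false.
public boolean isEmptyElementTag()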
Indicates whether the start tag is an empty element tag.
This is signified by the characters "/>" at the end of the start tag.
Returns: true if the StartTag is an empty element tag, otherwise false.
public boolean isComment()
Description copied from class: Segment
Indicates whether this Segment represents an HTML comment.
An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.
The HTML 4.01 Specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.
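The delimiter rule above can be illustrated with a plain string search. This is a sketch of the described behaviour only, not the library's code; CommentSketch and findCommentEnd are hypothetical names.

```java
// Hypothetical helper, not part of the library: locates the end of an HTML
// comment using only the exact "-->" delimiter, as described above.
class CommentSketch {
    // Returns the index just past the comment that starts at pos, or -1 if
    // the text at pos is not a comment or no "-->" terminator follows.
    static int findCommentEnd(String source, int pos) {
        if (!source.startsWith("<!--", pos)) {
            return -1;
        }
        // A delimiter such as "-- >" is deliberately not accepted here.
        int close = source.indexOf("-->", pos + 4);
        return close == -1 ? -1 : close + 3;
    }
}
```

In the text "<!-- a -- >x-->" the sequence "-- >" is skipped, so the comment only ends at the later "-->".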
Overrides: isComment in class Segment
Returns: true if this Segment represents an HTML comment, otherwise false.
public boolean isProcessingInstruction()
Although an XML processing instruction technically requires a PITarget (essentially a name), this library considers any tag starting with a question mark (?) to be a processing instruction, including standard and short-form PHP tags.
An XML declaration is a special type of processing instruction with the reserved PITarget name of "xml".
The following code is an example of a processing instruction:
<?xml version="1.0" encoding="UTF-8"?>
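The distinction between a processing instruction and an XML declaration can be sketched on the raw tag text. The class and method names below are hypothetical, and matching the target "xml" in lower case only is an assumption; the library's own checks may differ.

```java
// Hypothetical helper, not part of the library: classifies tag text by its
// opening characters, following the description above.
class ProcessingInstructionSketch {
    // Any tag starting with "<?" counts, including PHP tags.
    static boolean isProcessingInstruction(String tagText) {
        return tagText.startsWith("<?");
    }

    // True only when the PITarget is exactly "xml".
    // Assumption: the target is matched in lower case.
    static boolean isXmlDeclaration(String tagText) {
        if (!tagText.startsWith("<?xml")) {
            return false;
        }
        if (tagText.length() == 5) {
            return true;
        }
        // The character after "xml" must end the target name, which rules
        // out targets such as "xml-stylesheet".
        char next = tagText.charAt(5);
        return Character.isWhitespace(next) || next == '?';
    }
}
```

Short-form PHP tags such as "<? echo 1; ?>" satisfy the first check but not the second.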
Returns: true if the start tag is a processing instruction, otherwise false.
public boolean isXMLDeclaration()
The following code is an example of an XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
Returns: true if the start tag is an XML declaration, otherwise false.
See Also: isProcessingInstruction()
public boolean isDocTypeDeclaration()
The following code is an example of a document type declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Returns: true if the start tag is a document type declaration, otherwise false.
public boolean isServerTag()
Indicates whether the start tag is a server tag.
Recognised server tags include ASP, JSP, PSP, PHP and Mason.
<script> tags are never regarded as server tags, regardless of whether they have a runat="server" or equivalent attribute.
Short-form PHP tags are not recognised as server tags, but as processing instructions.
Returns true if one of isCommonServerTag(), isPHPTag() or isMasonTag() is true.
Returns: true if the start tag is a server tag, otherwise false.
public boolean isPHPTag()
Indicates whether the start tag is a standard PHP tag (<?php ... ?>).
Short-form and ASP-style PHP tags will return false, but can be recognised using the isProcessingInstruction() and isCommonServerTag() methods respectively. PHP code blocks denoted using <script language="php"> ... </script> tags are also not recognised by this method.
This library only correctly recognises PHP tags that comply with the XML syntax for processing instructions. Specifically, the tag is terminated by the first "?>" text, regardless of whether it occurs within a PHP string expression. Unfortunately there is no reliable way to determine the end of a PHP tag without the use of a PHP parser.
Note that the standard PHP processor removes newline characters following PHP tags, but PHP tags recognised by this library do not include trailing newlines. They must be removed manually if required.
The following code is an example of a PHP tag:
<?php echo '<p>Hello World</p>'; ?>
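The termination rule described above (the tag ends at the first "?>", even inside a PHP string) can be shown with a small sketch. PhpTagEndSketch and extractPhpTag are hypothetical names and this is not the library's implementation.

```java
// Hypothetical helper, not part of the library: extracts a standard PHP tag
// by stopping at the first "?>", as the description above states.
class PhpTagEndSketch {
    // Returns the text of the PHP tag starting at pos, or null if the text
    // at pos does not start with "<?php" or no "?>" follows.
    static String extractPhpTag(String source, int pos) {
        if (!source.startsWith("<?php", pos)) {
            return null;
        }
        int close = source.indexOf("?>", pos + 5);
        if (close == -1) {
            return null;
        }
        // The trailing newline, if any, is not part of the returned tag.
        return source.substring(pos, close + 2);
    }
}
```

For the source text <?php echo "?>"; ?> this sketch returns <?php echo "?> and stops there, which is the incorrect end position that the note above warns about.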
Returns: true if the start tag is a standard PHP tag, otherwise false.
public boolean isCommonServerTag()
Indicates whether the start tag is a common server tag (<% ... %>).
Common server tags include ASP, JSP, PSP, ASP-style PHP and Mason substitution tags.
The following code is an example of a JSP server tag:
<%@ include file="relativeURL" %>
Returns: true if the start tag is a common server tag, otherwise false.
public boolean isMasonTag()
Indicates whether the start tag is any type of Mason tag.
This returns true if any one of isMasonNamedBlock(), isMasonComponentCall(), isMasonComponentCalledWithContent() or isCommonServerTag() is true.
Returns: true if the start tag is any type of Mason tag, otherwise false.
See Also: isServerTag()
public boolean isMasonNamedBlock()
The following code is an example of a Mason named block:
<%perl> print "hello world"; </%perl>
Returns: true if the start tag is a Mason named block, otherwise false.
See Also: isMasonTag()
public boolean isMasonComponentCall()
Indicates whether the start tag is a Mason component call (<& ... &>).
Note that this method returns false for Mason substitution tags and named blocks. Use the isCommonServerTag() method to detect these other Mason tags.
The following code is an example of a Mason component call:
<& menu &>
Returns: true if the start tag is a Mason component call, otherwise false.
See Also: isMasonTag()
public boolean isMasonComponentCalledWithContent()
The following code is an example of a Mason component called with content:
<&| /sql/select, query => 'SELECT name, age FROM User' &> <tr><td>%name</td><td>%age</td></tr> </&>
Returns: true if the start tag is a Mason component called with content, otherwise false.
See Also: isMasonTag()
public Attributes getAttributes()
Returns the attributes specified in this start tag.
Guaranteed not null except in the following case:
Returns null if this start tag represents an HTML comment, DocType declaration, processing instruction or server tag.
The only type of processing instruction that contains attributes by default is an XML Declaration.
To force the parsing of attributes in the above cases, use the parseAttributes() method instead.
Returns: the attributes specified in this start tag, or null if this start tag represents an HTML comment, DocType declaration, processing instruction or server side tag.
See Also: parseAttributes(), Source.parseAttributes(int pos, int maxEnd)
public Attributes parseAttributes()
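Parses the attributes specified in this start tag, regardless of the type of start tag.
Use this method to force the parsing of attributes in the cases where the getAttributes() method returns null.
This method returns the cached attributes from the getAttributes() method if not null, otherwise the source is physically parsed with each call to this method.
This is equivalent to parseAttributes(Attributes.getDefaultMaxErrorCount()).
Overrides: parseAttributes in class Segment
Returns: the attributes specified in this start tag, or null if too many errors occur while parsing.
See Also: getAttributes(), Source.parseAttributes(int pos, int maxEnd)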
public Attributes parseAttributes(int maxErrorCount)
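Parses the attributes specified in this start tag, regardless of the type of start tag.
Use this method to force the parsing of attributes in the cases where the getAttributes() method returns null.
See parseAttributes() for more information.
Parameters: maxErrorCount - the maximum number of minor errors allowed while parsing
Returns: the attributes specified in this start tag, or null if too many errors occur while parsing.
See Also: getAttributes()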
public FormControl getFormControl()
Returns the FormControl defined by this start tag.
This is equivalent to getElement().getFormControl().
Returns: the FormControl defined by this start tag, or null if it is not a control.
public FormControlType getFormControlType()
Deprecated. Use getFormControl().getFormControlType() instead.
Returns the FormControlType of this start tag.
This method has been deprecated as of version 1.5 as it is no longer used internally and was never very useful as a public method.
Returns: the FormControlType of this start tag, or null if it is not a control.
See Also: Element.getFormControl()
public EndTag findEndTag()
Returns the end tag that corresponds to this start tag.
This method exists mainly for backward compatibility with version 1.0. The getElement() method is much more useful as it will determine the span of the element even if the end tag is optional and doesn't exist (this is a new feature in version 1.1). This method on the other hand will just return null in the above case, and is equivalent to getElement().getEndTag().
Returns: the end tag that corresponds to this start tag, or null if none exists.
public Element getElement()
Returns the element that corresponds to this start tag.
Guaranteed not null.
Note that as of version 1.1, this method returns an element spanning the logical HTML element if the end tag is optional but not present. In this case the version 1.0 method returned an element spanning only the start tag.
1. <div>
2. <div>
3. <div>
4. <div>This is line 4</div>
5. </div>
6. <div>This is line 6</div>
7. </div>
The end tag of a <div> element is required, making the sample code invalid as all the end tags are matched with other start tags.
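1. <ul>
2. <li>item 1
3. <li>item 2
4. <ul>
5. <li>subitem 1</li>
6. <li>subitem 2
7. </ul>
8. <li>item 3</li>
9. </ul>
The <li> element starting on line 2 is terminated by the <li> start tag on line 3.
The <li> element starting on line 3 is terminated by the <li> start tag on line 8.
The <li> element starting on line 6 is terminated by the </ul> end tag on line 7.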
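How an element with an optional end tag can be given a logical span is easier to see in code. The sketch below handles only the <ul> and <li> tags of the list example above, on a pre-tokenised list of tags; OptionalEndTagSketch and findTerminator are hypothetical names and this is not the algorithm used by the library.

```java
// Hypothetical helper, not part of the library: finds the tag that ends an
// <li> element whose own end tag may be missing.
class OptionalEndTagSketch {
    // The tags of the nested list example, in source order.
    static final String[] EXAMPLE_TAGS = {
        "<ul>", "<li>", "<li>", "<ul>", "<li>", "</li>",
        "<li>", "</ul>", "<li>", "</li>", "</ul>"
    };
    // The source line on which each tag of EXAMPLE_TAGS appears.
    static final int[] EXAMPLE_LINES = {1, 2, 3, 4, 5, 5, 6, 7, 8, 8, 9};

    // Returns the index of the tag that terminates the <li> at index start,
    // or -1 if the list of tags ends first.
    static int findTerminator(String[] tags, int start) {
        int depth = 0; // number of nested <ul> elements currently open
        for (int j = start + 1; j < tags.length; j++) {
            String tag = tags[j];
            if (tag.equals("<ul>")) {
                depth++;
            } else if (tag.equals("</ul>")) {
                if (depth == 0) {
                    return j; // the enclosing list ends, so the item ends too
                }
                depth--;
            } else if (depth == 0 && (tag.equals("<li>") || tag.equals("</li>"))) {
                return j; // explicit end tag, or the next sibling item
            }
        }
        return -1;
    }
}
```

Applied to the example, the items starting on lines 2, 3 and 6 are terminated by the tags on lines 3, 8 and 7 respectively, for instance EXAMPLE_LINES[findTerminator(EXAMPLE_TAGS, 2)] is 8.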
public java.lang.String regenerateHTML()
Regenerates the HTML text of this start tag.
This method returns the start tag regenerated in the following manner:
If the start tag does not normally have attributes associated with it (i.e. getAttributes() is null), then the original source text is returned.
For example, the following source text:
<INPUT name='Company' value='Günter O&#39;Reilly & Associés'>
produces the following regenerated HTML:
<input name="Company" value="Günter O'Reilly & Associés" />
public static java.lang.String generateHTML(java.lang.String tagName, java.util.Map attributesMap, boolean emptyElementTag)
Generates the HTML text of a start tag with the specified tag name and attributes map.
The output of the attributes is as described in the Attributes.generateHTML(Map attributesMap) method.
The emptyElementTag argument specifies whether the start tag should be an empty element tag, in which case a slash is inserted before the closing angle bracket, separated from the name or last attribute by a single space.
For example, the following code:
LinkedHashMap attributesMap=new LinkedHashMap();
attributesMap.put("name","Company");
attributesMap.put("value","G\u00fcnter O'Reilly & Associés");
System.out.println(StartTag.generateHTML("INPUT",attributesMap,true));
generates the following output:
<INPUT name="Company" value="Günter O'Reilly & Associés" />
Parameters:
tagName - the name of the start tag.
attributesMap - a map containing attribute name/value pairs.
emptyElementTag - specifies whether the start tag should be an empty element tag.
See Also: EndTag.generateHTML(String tagName)
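The assembly of a start tag from a name, an attributes map and the emptyElementTag flag can be approximated as follows. This is a standalone sketch, not the library's code: StartTagTextSketch, generate and escape are hypothetical names, and the escaping shown is a minimal assumption that may differ from what Attributes.generateHTML(Map attributesMap) produces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the library: builds start tag text from
// a name, an ordered attribute map and an empty-element flag.
class StartTagTextSketch {
    static String generate(String tagName, Map<String, String> attributes,
                           boolean emptyElementTag) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            sb.append(' ').append(entry.getKey())
              .append("=\"").append(escape(entry.getValue())).append('"');
        }
        if (emptyElementTag) {
            // A single space separates the slash from the name or last attribute.
            sb.append(" /");
        }
        return sb.append('>').toString();
    }

    // Assumption: only the characters that would break a double-quoted
    // attribute value are escaped.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                    .replace("\"", "&quot;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
    }

    // Convenience factory for an insertion-ordered attribute map.
    static Map<String, String> newAttributes() {
        return new LinkedHashMap<>();
    }
}
```

An insertion-ordered map is used so that attributes appear in the order they were added, matching the use of LinkedHashMap in the example above.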
public java.lang.String getDebugInfo()
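Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.
Overrides: getDebugInfo in class Segment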
public Segment getFollowingTextSegment()
Deprecated. Use new Segment(source,this.getEnd(),source.findNextTag(this.getEnd())) instead.
Guaranteed not null.
This method has been deprecated as of version 1.5 as it is no longer used internally and was never very useful as a public method.