au.id.jericho.lib.html
Class StartTag

java.lang.Object
  extended by au.id.jericho.lib.html.Segment
      extended by au.id.jericho.lib.html.Tag
          extended by au.id.jericho.lib.html.StartTag
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable

public final class StartTag
extends Tag

Represents the start tag of an Element.

Created using one of the following methods:

Note that an HTML comment is represented as a StartTag object.

The name argument in the above methods can be a literal text string specifying the name of a start tag to search for.

Specifying a name parameter ending in a colon (:) searches for all start tags in the specified XML namespace.
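
The matching rule described above can be pictured with a small standalone sketch. It does not call the library: the class NameMatchSketch and its methods are hypothetical names introduced for illustration, and the case-insensitive comparison is an assumption rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class NameMatchSketch {
    // Hypothetical helper, not part of the library: a search name ending in a
    // colon is treated as a namespace prefix, anything else as an exact name.
    // Lower-casing both sides is an assumption made for this sketch.
    static boolean matches(String tagName, String searchName) {
        String tag = tagName.toLowerCase(Locale.ROOT);
        String search = searchName.toLowerCase(Locale.ROOT);
        return search.endsWith(":") ? tag.startsWith(search) : tag.equals(search);
    }

    // Keeps only the tag names that satisfy the search name.
    static List<String> filter(List<String> tagNames, String searchName) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            if (matches(name, searchName)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "o:p", "o:smarttag", "table");
        System.out.println(filter(names, "o:")); // [o:p, o:smarttag]
        System.out.println(filter(names, "a"));  // [a]
    }
}
```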

The constants defined in the Tag class can be used as name arguments. For example:
source.findAllStartTags(Tag.A) or source.findAllStartTags("a") - finds all hyperlink start tags
source.findAllStartTags(Tag.PROCESSING_INSTRUCTION) - finds all processing instructions <? ... ?>
source.findAllStartTags(Tag.SERVER_COMMON) - finds all common server tags <% ... %>

Note however that the end of a PHP tag cannot be reliably found without the use of a PHP parser, meaning any PHP tag found by this library is not guaranteed to have the correct end position.

See also the XML 1.0 specification for start tags.

See Also:
Element, EndTag

Field Summary
 
Fields inherited from class au.id.jericho.lib.html.Tag
A, ABBR, ACRONYM, ADDRESS, APPLET, AREA, B, BASE, BASEFONT, BDO, BIG, BLOCKQUOTE, BODY, BR, BUTTON, CAPTION, CENTER, CITE, CODE, COL, COLGROUP, DD, DEL, DFN, DIR, DIV, DL, DOCTYPE_DECLARATION, DT, EM, FIELDSET, FONT, FORM, FRAME, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HR, HTML, I, IFRAME, IMG, INPUT, INS, ISINDEX, KBD, LABEL, LEGEND, LI, LINK, MAP, MENU, META, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PARAM, PRE, PROCESSING_INSTRUCTION, Q, S, SAMP, SCRIPT, SELECT, SERVER_COMMON, SERVER_MASON_COMPONENT_CALL, SERVER_MASON_COMPONENT_CALLED_WITH_CONTENT, SERVER_MASON_NAMED_BLOCK, SERVER_PHP, SMALL, SPAN, STRIKE, STRONG, STYLE, SUB, SUP, TABLE, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, U, UL, VAR, XML_DECLARATION
 
Method Summary
 EndTag findEndTag()
          Returns the end tag that corresponds to this start tag.
static java.lang.String generateHTML(java.lang.String tagName, java.util.Map attributesMap, boolean emptyElementTag)
          Generates the HTML text of a start tag with the specified tag name and attributes map.
 Attributes getAttributes()
          Returns the attributes specified in this start tag.
 java.lang.String getDebugInfo()
          Returns a string representation of this object useful for debugging purposes.
 Element getElement()
          Returns the element that corresponds to this start tag.
 Segment getFollowingTextSegment()
          Deprecated. Use new Segment(source,this.getEnd(),source.findNextTag(this.getEnd())) instead.
 FormControl getFormControl()
          Returns the FormControl defined by this start tag.
 FormControlType getFormControlType()
          Deprecated. Use getFormControl().getFormControlType() instead.
 boolean isComment()
          Indicates whether this Segment represents an HTML comment.
 boolean isCommonServerTag()
          Indicates whether the start tag is a common server tag (<% ... %>).
 boolean isDocTypeDeclaration()
          Indicates whether the start tag is a document type declaration.
 boolean isEmptyElementTag()
          Indicates whether the start tag is an empty element tag.
 boolean isEndTagForbidden()
          Indicates whether the corresponding end tag is forbidden.
 boolean isEndTagOptional()
          Indicates whether the corresponding end tag is optional according to the HTML specification.
 boolean isEndTagRequired()
          Indicates whether the corresponding end tag is required according to the HTML specification.
 boolean isMasonComponentCall()
          Indicates whether the start tag is a Mason component call (<& ... &>).
 boolean isMasonComponentCalledWithContent()
          Indicates whether the start tag is a Mason component called with content.
 boolean isMasonNamedBlock()
          Indicates whether the start tag is a Mason named block.
 boolean isMasonTag()
          Indicates whether the start tag is any type of Mason tag.
 boolean isPHPTag()
          Indicates whether the start tag is a standard PHP tag (<?php ... ?>).
 boolean isProcessingInstruction()
          Indicates whether the start tag is a processing instruction.
 boolean isServerTag()
          Indicates whether the start tag is a server tag.
 boolean isXMLDeclaration()
          Indicates whether the start tag is an XML declaration.
 Attributes parseAttributes()
          Parses the attributes specified in this start tag, regardless of the type of start tag.
 Attributes parseAttributes(int maxErrorCount)
          Parses the attributes specified in this start tag, regardless of the type of start tag.
 java.lang.String regenerateHTML()
          Regenerates the HTML text of this start tag.
 
Methods inherited from class au.id.jericho.lib.html.Tag
getName
 
Methods inherited from class au.id.jericho.lib.html.Segment
charAt, compareTo, encloses, encloses, equals, findAllCharacterReferences, findAllComments, findAllElements, findAllElements, findAllStartTags, findAllStartTags, findAllStartTags, findFormControls, findFormFields, findWords, getBegin, getEnd, getSourceText, getSourceTextNoWhitespace, hashCode, ignoreWhenParsing, isWhiteSpace, length, subSequence, toString
 
Methods inherited from class java.lang.Object
getClass, notify, notifyAll, wait, wait, wait
 

Method Detail

isEndTagForbidden

public boolean isEndTagForbidden()
Indicates whether the corresponding end tag is forbidden.

This is the case if one of isComment(), isEmptyElementTag(), isProcessingInstruction(), isDocTypeDeclaration(), EndTag.isForbidden(getName()), or isServerTag() is true, unless isMasonNamedBlock() or isMasonComponentCalledWithContent() is true.

Note that as of version 1.1, this method also takes the name of the tag into account by checking whether the HTML specification forbids an end tag of this name.

Returns:
true if the corresponding end tag is forbidden, otherwise false.
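
The rule above combines several checks, so a standalone sketch of the boolean logic may help. It does not call the library: each parameter stands in for the result of the method it is named after, EndTagForbiddenSketch is a hypothetical name, and reading "unless" as overriding every preceding condition is an assumption.

```java
final class EndTagForbiddenSketch {
    // Each flag stands in for the result of the correspondingly named method;
    // nameForbidsEndTag stands in for EndTag.isForbidden(getName()).
    static boolean endTagForbidden(boolean comment,
                                   boolean emptyElementTag,
                                   boolean processingInstruction,
                                   boolean docTypeDeclaration,
                                   boolean nameForbidsEndTag,
                                   boolean serverTag,
                                   boolean masonNamedBlock,
                                   boolean masonComponentCalledWithContent) {
        boolean anyForbiddingCondition = comment || emptyElementTag
                || processingInstruction || docTypeDeclaration
                || nameForbidsEndTag || serverTag;
        // Assumption: the two Mason cases override all of the conditions above.
        boolean masonException = masonNamedBlock || masonComponentCalledWithContent;
        return anyForbiddingCondition && !masonException;
    }

    public static void main(String[] args) {
        // A tag whose name forbids an end tag, such as <br>
        System.out.println(endTagForbidden(false, false, false, false, true, false, false, false)); // true
        // A Mason named block such as <%perl>: a server tag, but it has an end tag
        System.out.println(endTagForbidden(false, false, false, false, false, true, true, false)); // false
    }
}
```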

isEndTagOptional

public boolean isEndTagOptional()
Indicates whether the corresponding end tag is optional according to the HTML specification.

This is equivalent to EndTag.isOptional(getName()).

Returns:
true if the corresponding end tag is optional, otherwise false.

isEndTagRequired

public boolean isEndTagRequired()
Indicates whether the corresponding end tag is required according to the HTML specification.

This is equivalent to EndTag.isRequired(getName()).

Returns:
true if the corresponding end tag is required, otherwise false.

isEmptyElementTag

public boolean isEmptyElementTag()
Indicates whether the start tag is an empty element tag.

This is signified by the characters "/>" at the end of the start tag.

Returns:
true if the StartTag is an empty element tag, otherwise false.

isComment

public boolean isComment()
Description copied from class: Segment
Indicates whether this Segment represents an HTML comment.

An HTML comment is an area of the source document enclosed by the delimiters <!-- on the left and --> on the right.

The HTML 4.01 Specification section 3.2.4 states that the end of comment delimiter may contain white space between the "--" and ">" characters, but this library does not recognise end of comment delimiters containing white space.

Overrides:
isComment in class Segment
Returns:
true if this Segment represents an HTML comment, otherwise false.
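
The effect of the strict end delimiter can be illustrated with a standalone sketch. It does not use the library: CommentEndSketch and findCommentEnd are hypothetical names, and returning -1 for an unterminated comment is a choice made for this sketch only.

```java
final class CommentEndSketch {
    static final String START = "<!--";
    static final String END = "-->";

    // Returns the index just past the comment starting at 'begin',
    // or -1 if no comment starts there or no exact "-->" follows.
    static int findCommentEnd(String source, int begin) {
        if (!source.startsWith(START, begin)) {
            return -1;
        }
        int end = source.indexOf(END, begin + START.length());
        return end == -1 ? -1 : end + END.length();
    }

    public static void main(String[] args) {
        // Exact delimiter: the comment spans the first 13 characters.
        System.out.println(findCommentEnd("<!-- note --><p>", 0));  // 13
        // White space between "--" and ">": the delimiter is not matched.
        System.out.println(findCommentEnd("<!-- note -- ><p>", 0)); // -1
    }
}
```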

isProcessingInstruction

public boolean isProcessingInstruction()
Indicates whether the start tag is a processing instruction.

Although an XML processing instruction technically requires a PITarget (essentially a name), this library considers any tag starting with a question mark (?) to be a processing instruction, including standard and short-form PHP tags.

An XML declaration is a special type of processing instruction with the reserved PITarget name of "xml".

The following code is an example of a processing instruction:

 <?xml version="1.0" encoding="UTF-8"?>
 

Returns:
true if the start tag is a processing instruction, otherwise false.

isXMLDeclaration

public boolean isXMLDeclaration()
Indicates whether the start tag is an XML declaration.

The following code is an example of an XML declaration:

 <?xml version="1.0" encoding="UTF-8"?>
 

Returns:
true if the start tag is an XML declaration, otherwise false.
See Also:
isProcessingInstruction()
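
The relationship between processing instructions, XML declarations and standard PHP tags can be summarised in a standalone sketch based only on the descriptions in this class. It does not call the library: QuestionMarkTagSketch and its methods are hypothetical names, and the exact, case-sensitive comparison of the target is an assumption.

```java
final class QuestionMarkTagSketch {
    // Extracts the text between "<?" and the first white space or '?'.
    // Returns null if the text does not start with "<?".
    static String piTarget(String tagText) {
        if (!tagText.startsWith("<?")) {
            return null;
        }
        int i = 2;
        while (i < tagText.length()
                && !Character.isWhitespace(tagText.charAt(i))
                && tagText.charAt(i) != '?') {
            i++;
        }
        return tagText.substring(2, i);
    }

    // Any tag starting with "<?" counts, even one without a target.
    static boolean isProcessingInstruction(String tagText) {
        return tagText.startsWith("<?");
    }

    // Assumption: the target must be exactly "xml".
    static boolean isXmlDeclaration(String tagText) {
        return "xml".equals(piTarget(tagText));
    }

    // Assumption: the target must be exactly "php".
    static boolean isStandardPhpTag(String tagText) {
        return "php".equals(piTarget(tagText));
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
        String shortPhp = "<? echo 1; ?>";
        System.out.println(isProcessingInstruction(xml) + " " + isXmlDeclaration(xml));           // true true
        System.out.println(isProcessingInstruction(shortPhp) + " " + isStandardPhpTag(shortPhp)); // true false
    }
}
```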

isDocTypeDeclaration

public boolean isDocTypeDeclaration()
Indicates whether the start tag is a document type declaration.

The following code is an example of a document type declaration:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 

Returns:
true if the start tag is a document type declaration, otherwise false.

isServerTag

public boolean isServerTag()
Indicates whether the start tag is a server tag.

Recognised server tags include ASP, JSP, PSP, PHP and Mason.

<script> tags are never regarded as server tags, regardless of whether they have a runat="server" or equivalent attribute.

Short-form PHP tags are not recognised as server tags, but as processing instructions.

Returns true if one of isCommonServerTag(), isPHPTag() or isMasonTag() is true.

Returns:
true if the start tag is a server tag, otherwise false.

isPHPTag

public boolean isPHPTag()
Indicates whether the start tag is a standard PHP tag (<?php ... ?>).

Short-form and ASP-style PHP tags will return false, but can be recognised using the isProcessingInstruction() and isCommonServerTag() methods respectively. PHP code blocks denoted using <script language="php"> ... </script> tags are also not recognised by this method.

This library only correctly recognises PHP tags that comply with the XML syntax for processing instructions. Specifically, the tag is terminated by the first "?>" text, regardless of whether it occurs within a PHP string expression. Unfortunately there is no reliable way to determine the end of a PHP tag without the use of a PHP parser.

Note that the standard PHP processor removes newline characters following PHP tags, but PHP tags recognised by this library do not include trailing newlines. They must be removed manually if required.

The following code is an example of a PHP tag:

 <?php echo '<p>Hello World</p>'; ?>
 

Returns:
true if the start tag is a standard PHP tag, otherwise false.
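
The termination rule and the trailing newline caveat described above can be demonstrated with a standalone sketch. It does not call the library: PhpTagEndSketch and its methods are hypothetical names, and skipping a single "\n" or "\r\n" is an assumption about what manual removal might look like.

```java
final class PhpTagEndSketch {
    // Returns the index just past the first "?>" following a "<?php" at 'begin',
    // or -1 if no PHP tag starts there or it is never closed.
    static int findPhpTagEnd(String source, int begin) {
        if (!source.startsWith("<?php", begin)) {
            return -1;
        }
        int close = source.indexOf("?>", begin);
        return close == -1 ? -1 : close + 2;
    }

    // Skips one newline ("\n" or "\r\n") directly after position 'end', if present.
    static int skipTrailingNewline(String source, int end) {
        if (source.startsWith("\r\n", end)) {
            return end + 2;
        }
        if (source.startsWith("\n", end)) {
            return end + 1;
        }
        return end;
    }

    public static void main(String[] args) {
        // The "?>" inside the PHP string literal ends the tag prematurely.
        String tricky = "<?php echo \"?>\"; ?>";
        int end = findPhpTagEnd(tricky, 0);
        System.out.println(tricky.substring(0, end)); // <?php echo "?>

        // The newline after the tag is not part of the tag and is skipped separately.
        String withNewline = "<?php f(); ?>\nnext";
        int tagEnd = findPhpTagEnd(withNewline, 0);
        System.out.println(withNewline.substring(skipTrailingNewline(withNewline, tagEnd))); // next
    }
}
```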

isCommonServerTag

public boolean isCommonServerTag()
Indicates whether the start tag is a common server tag (<% ... %>).

Common server tags include ASP, JSP, PSP, ASP-style PHP and Mason substitution tags.

The following code is an example of a JSP server tag:

 <%@ include file="relativeURL" %>
 

Returns:
true if the start tag is a common server tag, otherwise false.

isMasonTag

public boolean isMasonTag()
Indicates whether the start tag is any type of Mason tag.

This returns true if any one of isMasonNamedBlock(), isMasonComponentCall(), isMasonComponentCalledWithContent() or isCommonServerTag() is true.

Returns:
true if the start tag is any type of Mason tag, otherwise false.
See Also:
isServerTag()

isMasonNamedBlock

public boolean isMasonNamedBlock()
Indicates whether the start tag is a Mason named block.

The following code is an example of a Mason named block:

 <%perl> print "hello world"; </%perl>
 

Returns:
true if the start tag is a Mason named block, otherwise false.
See Also:
isMasonTag()

isMasonComponentCall

public boolean isMasonComponentCall()
Indicates whether the start tag is a Mason component call (<& ... &>).

Note that this method returns false for Mason substitution tags and named blocks. Use the isCommonServerTag() method to detect these other Mason tags.

The following code is an example of a Mason component call:

 <& menu &>
 

Returns:
true if the start tag is a Mason component call, otherwise false.
See Also:
isMasonTag()

isMasonComponentCalledWithContent

public boolean isMasonComponentCalledWithContent()
Indicates whether the start tag is a Mason component called with content.

The following code is an example of a Mason component called with content:

 <&| /sql/select, query => 'SELECT name, age FROM User' &>
   <tr><td>%name</td><td>%age</td></tr>
 </&>
 

Returns:
true if the start tag is a Mason component called with content, otherwise false.
See Also:
isMasonTag()

getAttributes

public Attributes getAttributes()
Returns the attributes specified in this start tag.

Guaranteed not null unless this start tag represents an HTML comment, DocType declaration, processing instruction or server tag, in which case null is returned.

The only type of processing instruction that contains attributes by default is an XML Declaration.

To force the parsing of attributes in the above cases, use the parseAttributes() method instead.

Returns:
the attributes specified in this start tag, or null if this start tag represents an HTML comment, DocType declaration, processing instruction or server tag.
See Also:
parseAttributes(), Source.parseAttributes(int pos, int maxEnd)

parseAttributes

public Attributes parseAttributes()
Parses the attributes specified in this start tag, regardless of the type of start tag. This method is only required in the unusual situation where attributes exist in a start tag that normally doesn't contain attributes. These types of start tags are listed in the documentation of the getAttributes() method.

This method returns the cached attributes from the getAttributes() method if not null, otherwise the source is physically parsed with each call to this method.

This is equivalent to parseAttributes(Attributes.getDefaultMaxErrorCount())

Overrides:
parseAttributes in class Segment
Returns:
the attributes specified in this start tag, or null if too many errors occur while parsing.
See Also:
getAttributes(), Source.parseAttributes(int pos, int maxEnd)

parseAttributes

public Attributes parseAttributes(int maxErrorCount)
Parses the attributes specified in this start tag, regardless of the type of start tag. This method is only required in the unusual situation where attributes exist in a start tag that normally doesn't contain attributes. These types of start tags are listed in the documentation of the getAttributes() method.

See parseAttributes() for more information.

Parameters:
maxErrorCount - the maximum number of minor errors allowed while parsing
Returns:
the attributes specified in this start tag, or null if too many errors occur while parsing.
See Also:
getAttributes()

getFormControl

public FormControl getFormControl()
Returns the FormControl defined by this start tag.

This is equivalent to getElement().getFormControl()

Returns:
the FormControl defined by this start tag, or null if it is not a control.

getFormControlType

public FormControlType getFormControlType()
Deprecated. Use getFormControl().getFormControlType() instead.

Returns the FormControlType of this start tag.

This method has been deprecated as of version 1.5 as it is no longer used internally and was never very useful as a public method.

Returns:
the form control type of this start tag, or null if it is not a control.
See Also:
Element.getFormControl()

findEndTag

public EndTag findEndTag()
Returns the end tag that corresponds to this start tag.

This method exists mainly for backward compatibility with version 1.0.

The getElement() method is much more useful, as it determines the span of the element even if the end tag is optional and doesn't exist (a feature introduced in version 1.1).

This method, on the other hand, simply returns null in that case, and is equivalent to getElement().getEndTag().

Returns:
the end tag that corresponds to this start tag, or null if none exists.

getElement

public Element getElement()
Returns the element that corresponds to this start tag. Guaranteed not null.

Note that as of version 1.1, this method returns an element spanning the logical HTML element if the end tag is optional but not present. In this case the version 1.0 method returned an element spanning only the start tag.

Example 1: Elements that have required end tags

 1. <div>
 2.   <div>
 3.     <div>
 4.       <div>This is line 4</div>
 5.     </div>
 6.     <div>This is line 6</div>
 7.   </div>

Example 2: Elements that have optional end tags

 1. <ul>
 2.   <li>item 1
 3.   <li>item 2
 4.     <ul>
 5.       <li>subitem 1</li>
 6.       <li>subitem 2
 7.     </ul>
 8.   <li>item 3</li>
 9. </ul>

Returns:
the element that corresponds to this start tag.

regenerateHTML

public java.lang.String regenerateHTML()
Regenerates the HTML text of this start tag.

This method returns the start tag regenerated in the following manner:

If the start tag does not normally have attributes associated with it (i.e. getAttributes() is null), then the original source text is returned.

Example:

The following source text:

<INPUT name='Company' value='G&uuml;nter O&#39;Reilly &amp Associés'>

produces the following regenerated HTML:

<input name="Company" value="G&uuml;nter O'Reilly &amp; Associ&eacute;s" />

Returns:
the regenerated HTML text of this start tag, or the source text if it does not have attributes.

generateHTML

public static java.lang.String generateHTML(java.lang.String tagName,
                                            java.util.Map attributesMap,
                                            boolean emptyElementTag)
Generates the HTML text of a start tag with the specified tag name and attributes map.

The output of the attributes is as described in the Attributes.generateHTML(Map attributesMap) method.

The emptyElementTag argument specifies whether the start tag should be an empty element tag, in which case a slash is inserted before the closing angle bracket, separated from the name or last attribute by a single space.

Example:

The following code:
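 import java.util.LinkedHashMap;

 // LinkedHashMap preserves insertion order, so attributes are output in the order added
 LinkedHashMap attributesMap=new LinkedHashMap();
 attributesMap.put("name","Company");
 attributesMap.put("value","G\u00fcnter O'Reilly & Associés");
 System.out.println(StartTag.generateHTML("INPUT",attributesMap,true));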
generates the following output:

<INPUT name="Company" value="G&uuml;nter O'Reilly &amp; Associ&eacute;s" />

Parameters:
tagName - the name of the start tag.
attributesMap - a map containing attribute name/value pairs.
emptyElementTag - specifies whether the start tag should be an empty element tag.
Returns:
the HTML text of a start tag with the specified tag name and attributes map.
See Also:
EndTag.generateHTML(String tagName)
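
The assembly of the tag text can be pictured with a standalone sketch. This is not the library's implementation: StartTagHtmlSketch, generate and escape are hypothetical names, and the escaping is deliberately simplified. It writes non-ASCII characters as numeric character references, whereas the example output above shows the library using named references such as &uuml;.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StartTagHtmlSketch {
    // Simplified escaping, assumed for this sketch: the four markup-significant
    // characters become named references, non-ASCII becomes numeric references.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:
                    if (cp > 126) {
                        sb.append("&#").append(cp).append(';');
                    } else {
                        sb.appendCodePoint(cp);
                    }
            }
        }
        return sb.toString();
    }

    // Builds <name attr="value" ...> and, for an empty element tag, inserts a
    // slash before the closing bracket, preceded by a single space.
    static String generate(String tagName, Map<String, String> attributes, boolean emptyElementTag) {
        StringBuilder sb = new StringBuilder("<").append(tagName);
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            sb.append(' ').append(entry.getKey())
              .append("=\"").append(escape(entry.getValue())).append('"');
        }
        sb.append(emptyElementTag ? " />" : ">");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("name", "Company");
        attributes.put("value", "G\u00fcnter O'Reilly & Associ\u00e9s");
        // Prints: <INPUT name="Company" value="G&#252;nter O'Reilly &amp; Associ&#233;s" />
        System.out.println(generate("INPUT", attributes, true));
    }
}
```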

getDebugInfo

public java.lang.String getDebugInfo()
Description copied from class: Segment
Returns a string representation of this object useful for debugging purposes.

Overrides:
getDebugInfo in class Segment
Returns:
a string representation of this object useful for debugging purposes.

getFollowingTextSegment

public Segment getFollowingTextSegment()
Deprecated. Use new Segment(source,this.getEnd(),source.findNextTag(this.getEnd())) instead.

Returns the segment containing the text that immediately follows this start tag up until the start of the following tag.

Guaranteed not null.

This method has been deprecated as of version 1.5 as it is no longer used internally and was never very useful as a public method.

Returns:
the segment containing the text that immediately follows this start tag up until the start of the following tag.