HTMLDocument: Element Iterator Example : Document HTML « Development Class « Java

HTMLDocument: Element Iterator Example

Definitive Guide to Swing for Java 2, Second Edition
By John Zukowski     
ISBN: 1-893115-78-X
Publisher: APress


import javax.swing.text.AttributeSet;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class ElementIteratorExample {

  public static void main(String args[]) throws Exception {

    if (args.length != 1) {
      System.err.println("Usage: java ElementIteratorExample input-URL");

    // Load HTML file synchronously
    URL url = new URL(args[0]);
    URLConnection connection = url.openConnection();
    InputStream is = connection.getInputStream();
    InputStreamReader isr = new InputStreamReader(is);
    BufferedReader br = new BufferedReader(isr);

    HTMLEditorKit htmlKit = new HTMLEditorKit();
    HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
    HTMLEditorKit.Parser parser = new ParserDelegator();
    HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
    parser.parse(br, callback, true);

    // Parse
    ElementIterator iterator = new ElementIterator(htmlDoc);
    Element element;
    while ((element = != null) {
      AttributeSet attributes = element.getAttributes();
      Object name = attributes.getAttribute(StyleConstants.NameAttribute);
      if ((name instanceof HTML.Tag)
          && ((name == HTML.Tag.H1) || (name == HTML.Tag.H2) || (name == HTML.Tag.H3))) {
        // Build up content text as it may be within multiple elements
        StringBuffer text = new StringBuffer();
        int count = element.getElementCount();
        for (int i = 0; i < count; i++) {
          Element child = element.getElement(i);
          AttributeSet childAttributes = child.getAttributes();
          if (childAttributes
              .getAttribute(StyleConstants.NameAttribute) == HTML.Tag.CONTENT) {
            int startOffset = child.getStartOffset();
            int endOffset = child.getEndOffset();
            int length = endOffset - startOffset;
            text.append(htmlDoc.getText(startOffset, length));
        System.out.println(name + ": " + text.toString());


Related examples in the same category

1.HTMLEditorKit DemoHTMLEditorKit Demo
2.SimpleAttributeSet ExampleSimpleAttributeSet Example
3.Text Tab SampleText Tab Sample
4.Styled DocumentStyled Document
5.Html utils for working with tag's names and attributes.
6.Escape HTML
7.Escape HTML
8.Encode HTML
9.Replace all the occurences of HTML escape strings with the respective characters.
10.HTML Rewriter
11.HTML Encode
12.XMLWriter is a generic class that provides common behavior to writers of a tagged language such as XML, WordML and HTML.
13.Escape html entities.
14.Html Encoder
15.Remove Comment