Extract links from an HTML page : HTML Parser « Network « Java Tutorial

Home
Java Tutorial
1.Language
2.Data Type
3.Operators
4.Statement Control
5.Class Definition
6.Development
7.Reflection
8.Regular Expressions
9.Collections
10.Thread
11.File
12.Generics
13.I18N
14.Swing
15.Swing Event
16.2D Graphics
17.SWT
18.SWT 2D Graphics
19.Network
20.Database
21.Hibernate
22.JPA
23.JSP
24.JSTL
25.Servlet
26.Web Services SOA
27.EJB3
28.Spring
29.PDF
30.Email
31.J2ME
32.J2EE Application
33.XML
34.Design Pattern
35.Log
36.Security
37.Apache Common
38.Ant
39.JUnit
Java Tutorial » Network » HTML Parser 
19.26.5.Extract links from an HTML pagePrevious/Next
import java.io.FileReader;
import java.util.ArrayList;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {
  public final static void main(String[] argsthrows Exception {
    final ArrayList<String> list = new ArrayList<String>();

    ParserDelegator parserDelegator = new ParserDelegator();
    ParserCallback parserCallback = new ParserCallback() {
      public void handleText(final char[] data, final int pos) {
      }

      public void handleStartTag(Tag tag, MutableAttributeSet attribute, int pos) {
        if (tag == Tag.A) {
          String address = (Stringattribute.getAttribute(Attribute.HREF);
          list.add(address);
        }
      }

      public void handleEndTag(Tag t, final int pos) {
      }

      public void handleSimpleTag(Tag t, MutableAttributeSet a, final int pos) {
      }

      public void handleComment(final char[] data, final int pos) {
      }

      public void handleError(final java.lang.String errMsg, final int pos) {
      }
    };
    parserDelegator.parse(new FileReader("a.html"), parserCallback, false);
    System.out.println(list);
  }
}
19.26.HTML Parser
19.26.1.Getting the Links in an HTML Document
19.26.2.Getting the Text in an HTML Document
19.26.3.Escape HTML special characters from a String
19.26.4.Using javax.swing.text.html.HTMLEditorKit to parse html document
19.26.5.Extract links from an HTML page
19.26.6.extends HTMLEditorKit.ParserCallback
19.26.7.HTML parser based on HTMLEditorKit.ParserCallback
19.26.8.Find and display hyperlinks contained within a web page
19.26.9.Get all hyper links from a web page
19.26.10.HTML Parser
java2s.com  | Contact Us | Privacy Policy
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.