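Extract links from an HTML page

This program feeds the file a.html to ParserDelegator, the HTML parser that ships with Swing, and uses a ParserCallback to collect the href attribute of every <a> start tag.
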
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {
  public static void main(String[] args) throws Exception {
    // Collects the href value of every <a> tag, in document order
    final List<String> list = new ArrayList<>();

    ParserDelegator parserDelegator = new ParserDelegator();
    ParserCallback parserCallback = new ParserCallback() {
      // The empty callbacks are optional: ParserCallback already
      // provides do-nothing implementations of every handler.
      public void handleText(final char[] data, final int pos) {
      }

      public void handleStartTag(Tag tag, MutableAttributeSet attribute, int pos) {
        if (tag == Tag.A) {
          String address = (String) attribute.getAttribute(Attribute.HREF);
          // Named anchors such as <a name="top"> have no href; skip them
          if (address != null) {
            list.add(address);
          }
        }
      }

      public void handleEndTag(Tag t, final int pos) {
      }

      public void handleSimpleTag(Tag t, MutableAttributeSet a, final int pos) {
      }

      public void handleComment(final char[] data, final int pos) {
      }

      public void handleError(final String errMsg, final int pos) {
      }
    };

    // Third argument is ignoreCharSet: true ignores a charset declared in a
    // <meta> tag instead of aborting with ChangedCharSetException.
    try (FileReader reader = new FileReader("a.html")) {
      parserDelegator.parse(reader, parserCallback, true);
    }

    System.out.println(list);
  }
}
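
A minimal sketch of the same ParserCallback technique that needs no a.html on disk: it parses an in-memory String through a StringReader and returns the collected href values. The class name LinkExtractor and the method extractLinks are hypothetical names chosen for illustration; ParserDelegator, HTMLEditorKit.ParserCallback, HTML.Tag and HTML.Attribute are the same Swing classes used in the program above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original example
class LinkExtractor {

  /** Returns the href values of all <a> tags in the given HTML text, in document order. */
  static List<String> extractLinks(String html) {
    final List<String> links = new ArrayList<>();
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
          Object href = attrs.getAttribute(HTML.Attribute.HREF);
          if (href != null) { // skip anchors without href, e.g. <a name="top">
            links.add(href.toString());
          }
        }
      }
    };
    try (Reader reader = new StringReader(html)) {
      // ignoreCharSet = true: the text is already decoded, so any charset
      // declared in a <meta> tag is ignored
      new ParserDelegator().parse(reader, callback, true);
    } catch (IOException e) {
      // Reading a StringReader does not fail in practice; rethrow unchecked
      throw new UncheckedIOException(e);
    }
    return links;
  }

  public static void main(String[] args) {
    String html = "<html><body>"
        + "<a href=\"https://example.com/\">Example</a>"
        + "<a name=\"top\">no href</a>"
        + "<p><a href=\"docs/index.html\">Docs</a></p>"
        + "</body></html>";
    System.out.println(extractLinks(html));
  }
}
```

Running main should print the two href values; the <a name="top"> anchor is skipped because its HREF attribute is null. Relative links such as docs/index.html come back exactly as written, so resolving them against the page's URL is left to the caller.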


Related examples in the same category

1. Escape HTML special characters from a String
2. Using javax.swing.text.html.HTMLEditorKit to parse html document
3. extends HTMLEditorKit.ParserCallback
4. HTML parser based on HTMLEditorKit.ParserCallback
5. Get all hyper links from a web page
6. Getting the Links in an HTML Document
7. Getting the Text in an HTML Document
8. Find and display hyperlinks contained within a web page
9. Use regular expression to get web page title