Getting the Links in an HTML Document

  

import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

import javax.swing.text.AttributeSet;
import javax.swing.text.EditorKit;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class Main {
  public static void main(String[] argv) throws Exception {
    URL url = new URI("http://www.google.com").toURL();
    URLConnection conn = url.openConnection();

    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    // Ignore <meta> charset directives; otherwise read() may throw ChangedCharSetException.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

    // Close the stream once parsing is done. UTF-8 is assumed here.
    try (Reader rd = new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8)) {
      kit.read(rd, doc, 0);
    }

    // Walk every <a> tag in the parsed document.
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
    while (it.isValid()) {
      // getAttributes() is declared to return AttributeSet, so do not cast to a concrete class.
      AttributeSet attrs = it.getAttributes();

      // Anchors without an href (for example, named anchors) are skipped.
      String link = (String) attrs.getAttribute(HTML.Attribute.HREF);
      if (link != null) {
        System.out.println(link);
      }
      it.next();
    }
  }
}
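
The href values come back exactly as written in the page, so relative links such as /about stay relative. A minimal sketch of one way to turn them into absolute URLs, assuming the base URI of the page is known. The class name LinkResolver, the method extractLinks, and the sample HTML are hypothetical and used only for illustration; the HTML is parsed from a string so the sketch runs without a network connection.

```java
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class LinkResolver {
  // Hypothetical helper: parses the HTML text and returns every href
  // resolved against the given base URI.
  public static List<String> extractLinks(String html, URI base) throws Exception {
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    // Ignore <meta> charset directives so read() does not abort on them.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    try (Reader rd = new StringReader(html)) {
      kit.read(rd, doc, 0);
    }

    List<String> links = new ArrayList<>();
    for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
      AttributeSet attrs = it.getAttributes();
      Object href = attrs.getAttribute(HTML.Attribute.HREF);
      if (href != null) {
        // URI.resolve leaves absolute links unchanged and completes relative ones.
        // It throws IllegalArgumentException if the href is not a valid URI.
        links.add(base.resolve(href.toString()).toString());
      }
    }
    return links;
  }

  public static void main(String[] args) throws Exception {
    // Sample input, made up for this sketch.
    String html = "<html><body>"
        + "<a href=\"/about\">About</a> "
        + "<a href=\"http://example.org/x\">External</a>"
        + "</body></html>";
    URI base = URI.create("http://example.com/docs/index.html");
    for (String link : extractLinks(html, base)) {
      System.out.println(link);
    }
  }
}
```

Real pages often contain hrefs that are not valid URIs (unescaped spaces, for instance), so production code would likely catch IllegalArgumentException around the resolve call instead of letting it propagate.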

Related examples in the same category

1.	Escape HTML special characters from a String
2.	Using javax.swing.text.html.HTMLEditorKit to parse html document
3.	Extract links from an HTML page
4.	extends HTMLEditorKit.ParserCallback
5.	HTML parser based on HTMLEditorKit.ParserCallback
6.	Get all hyper links from a web page
7.	Getting the Text in an HTML Document
8.	Find and display hyperlinks contained within a web page
9.	Use regular expression to get web page title
