
Getting the Text in an HTML Document



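import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class Main {
  public static void main(String[] argv) throws Exception {
    HTMLDocument doc = new HTMLDocument() {
      public HTMLEditorKit.ParserCallback getReader(int pos) {
        return new HTMLEditorKit.ParserCallback() {
          public void handleText(char[] data, int pos) {
            System.out.println(data); // called once per run of text in the page
          }
        };
      }
    };
    // Without this, a page that declares its own charset aborts the parse.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

    // The page URL is left blank in the original; here it is read from argv[0].
    URL url = new URI(argv[0]).toURL();
    URLConnection conn = url.openConnection();
    Reader rd = new InputStreamReader(conn.getInputStream());

    EditorKit kit = new HTMLEditorKit();
    kit.read(rd, doc, 0); // parses the stream, feeding the callback above
  }
}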
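
The same kind of callback can also be driven directly by javax.swing.text.html.parser.ParserDelegator, which is convenient when the HTML is already in memory as a String. The following is a minimal sketch and not part of the original example: the class name TextExtractor and the method extractText are hypothetical, and joining the text runs with a single space is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TextExtractor {
  // Hypothetical helper (not in the original example): collects every run of
  // text reported by the parser and joins the runs with single spaces.
  public static String extractText(String html) throws IOException {
    final StringBuilder text = new StringBuilder();
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleText(char[] data, int pos) {
        if (text.length() > 0) {
          text.append(' ');
        }
        text.append(data);
      }
    };
    try (Reader reader = new StringReader(html)) {
      // true = ignore any charset declared inside the document, because the
      // input is already decoded characters.
      new ParserDelegator().parse(reader, callback, true);
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    String html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";
    System.out.println(extractText(html));
  }
}
```

Collecting into a StringBuilder rather than printing inside handleText keeps the extraction reusable, for example when the text is to be indexed or searched instead of displayed.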


Related examples in the same category

1. Escape HTML special characters from a String
2. Using javax.swing.text.html.HTMLEditorKit to parse html document
3. Extract links from an HTML page
4. extends HTMLEditorKit.ParserCallback
5. HTML parser based on HTMLEditorKit.ParserCallback
6. Get all hyper links from a web page
7. Getting the Links in an HTML Document
8. Find and display hyperlinks contained within a web page
9. Use regular expression to get web page title