Getting the Text in an HTML Document : HTML Parser « Network Protocol « Java

Getting the Text in an HTML Document
  

import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

import javax.swing.text.EditorKit;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

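public class Main {
  public static void main(String[] argv) throws Exception {
    // Override getReader so the parser reports to our callback
    // instead of building the document's element tree.
    HTMLDocument doc = new HTMLDocument() {
      @Override
      public HTMLEditorKit.ParserCallback getReader(int pos) {
        return new HTMLEditorKit.ParserCallback() {
          // Called by the parser for each block of text it encounters.
          @Override
          public void handleText(char[] data, int pos) {
            System.out.println(data);
          }
        };
      }
    };
    // Ignore <meta> charset declarations; otherwise the parser may
    // throw ChangedCharSetException on pages that declare one.
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);

    URL url = new URI("http://www.google.com").toURL();
    URLConnection conn = url.openConnection();

    // Close the stream when parsing finishes.
    // The reader decodes bytes with the default charset.
    try (Reader rd = new InputStreamReader(conn.getInputStream())) {
      EditorKit kit = new HTMLEditorKit();
      kit.read(rd, doc, 0);
    }
  }
}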
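
The example above prints each text block as it is parsed and needs a network connection. As a variation, the same callback technique might be used to collect the text into a string from HTML already in memory. The sketch below is not part of the original example: the class name TextExtractor and the method extractText are hypothetical, and it assumes javax.swing.text.html.parser.ParserDelegator is an acceptable way to drive the parser directly instead of going through HTMLDocument and HTMLEditorKit.read.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class TextExtractor {
  // Hypothetical helper: returns the text blocks found in the
  // given HTML, each followed by a newline.
  static String extractText(String html) throws IOException {
    final StringBuilder text = new StringBuilder();

    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleText(char[] data, int pos) {
        text.append(data).append('\n');
      }
    };

    try (Reader rd = new StringReader(html)) {
      // The third argument (true) tells the parser to ignore any
      // charset declared in a <meta> tag.
      new ParserDelegator().parse(rd, callback, true);
    }
    return text.toString();
  }

  public static void main(String[] args) throws IOException {
    String html = "<html><body><p>Hello <b>world</b></p></body></html>";
    System.out.print(extractText(html));
  }
}
```

Collecting into a StringBuilder rather than printing keeps the parsing step separate from whatever is done with the text afterwards, and reading from a StringReader makes the sketch easy to try without fetching a page.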