Find and display hyperlinks contained within a web page






import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
  public static void main(String[] arguments) throws Exception {
    StringBuilder output = new StringBuilder();

    // Read the whole file into memory; try-with-resources closes the reader
    // even if readLine() throws.
    try (BufferedReader buff = new BufferedReader(new FileReader("a.htm"))) {
      String line;
      while ((line = buff.readLine()) != null) {
        output.append(line).append('\n');
      }
    }

    String page = output.toString();
    // [^>]*? keeps each match inside a single <a ...> tag, so several links on
    // the same line are reported separately (a greedy ".+" would skip to the
    // last href on the line). Tag and attribute names match in any case.
    Pattern pattern = Pattern.compile("<a\\s[^>]*?href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    Matcher matcher = pattern.matcher(page);
    while (matcher.find()) {
      System.out.println(matcher.group(1));
    }
  }

}
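
The same regular-expression approach can be wrapped in a reusable method that works on an in-memory string instead of a file. The following is only a minimal sketch: the class name LinkExtractor and the method extractLinks are hypothetical names introduced for illustration, and the pattern is extended on the assumption that href values may also be written in single quotes. A regular expression is not a full HTML parser, so unusual markup (for example an attribute such as data-href) can still produce false matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original listing.
public class LinkExtractor {
  // Group 1 captures the opening quote (double or single), group 2 the href value;
  // \1 requires the value to end with the same kind of quote it started with.
  private static final Pattern HREF = Pattern.compile(
      "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
      Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Returns every href value found in <a> tags, in document order.
  public static List<String> extractLinks(String html) {
    List<String> links = new ArrayList<>();
    Matcher matcher = HREF.matcher(html);
    while (matcher.find()) {
      links.add(matcher.group(2));
    }
    return links;
  }

  public static void main(String[] args) {
    // Two links on one line, with mixed case and mixed quote styles.
    String html = "<p><a href=\"one.html\">One</a> <A class='x' HREF='two.html'>Two</A></p>";
    for (String link : extractLinks(html)) {
      System.out.println(link);
    }
  }
}
```

Returning a List rather than printing directly lets the caller decide what to do with the links, for example resolving them against a base URL or removing duplicates.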







