Java HTML to String htmlToText(String input)

Here you can find the source of htmlToText(String input)

Description

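Converts HTML to plain text, according to the following rules:
  • Replaces each run of newlines or carriage returns in the source text with a single space.
  • Replaces <P> and <BR> tags with newlines.
  • Replaces <LI> with a newline followed by "* ".
  • Removes all other tags, including closing tags such as </P>, along with their attributes.
  • Leaves the text between tags in place.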
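The first rule can be checked on its own. The sketch below applies the same regular expression the method uses, "[\r\n]+", inside a hypothetical NewlineRuleDemo class (not part of the original listing). The pattern matches a whole run of CR/LF characters, so a Windows line ending (\r\n) or a blank line (\n\n) becomes one space, not two.

```java
// Hypothetical demo class (not part of the original listing): applies only the
// first conversion rule, using the same regex as htmlToText.
public class NewlineRuleDemo {

    // Replaces each run of '\r' and '\n' characters with a single space.
    public static String collapseLineBreaks(String s) {
        return s.replaceAll("[\r\n]+", " ");
    }

    public static void main(String[] args) {
        String text = "Line one\r\nLine two\n\nLine three";
        System.out.println(collapseLineBreaks(text)); // Line one Line two Line three
    }
}
```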

    License

    Apache License

    Declaration

    public static String htmlToText(String input) 
    

    Method Source Code

    //package com.java2s;
    // Licensed under the Apache License, Version 2.0 (the "License");
    
    public class Main {
        /**
         *  Converts HTML to plain text, according to the following rules:
         *  <ul>
         *  <li> Replaces each run of newlines or carriage returns in the source text with a single space.
         *  <li> Replaces <code>&lt;P&gt;</code> and <code>&lt;BR&gt;</code> with newlines.
         *  <li> Replaces <code>&lt;LI&gt;</code> with a newline followed by "* ".
         *  <li> Removes all other tags (including closing tags such as <code>&lt;/P&gt;</code>), along with their attributes.
         *  <li> Leaves the text between tags in place.
         *  </ul>
         *
         *  @since 1.0.2
         */
        public static String htmlToText(String input) {
            if (input == null)
                input = "";
    
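            // Collapse each run of CR/LF characters into a single space.
            input = input.replaceAll("[\r\n]+", " ");
    
            StringBuilder buf = new StringBuilder(input.trim());
            int openIdx = 0;
            while ((openIdx = buf.indexOf("<", openIdx)) >= 0) {
                int closeIdx = buf.indexOf(">", openIdx);
                if (closeIdx < 0) {
                    // Unclosed tag: drop everything from '<' to the end of the text.
                    buf.delete(openIdx, buf.length());
                    break;
                }
                // Text between '<' and '>', upper-cased with Locale.ROOT so that tag
                // matching does not depend on the default locale (under a Turkish
                // locale, "li".toUpperCase() yields a dotted capital I, not "LI").
                String tag = buf.substring(openIdx + 1, closeIdx).trim().toUpperCase(java.util.Locale.ROOT);
                buf.delete(openIdx, closeIdx + 1);
                // Keep only the tag name: drop everything after the first whitespace.
                tag = tag.replaceAll("\\s+.*", "");
                if (tag.equals("P") || tag.startsWith("BR"))
                    buf.insert(openIdx, "\n");      // <P>, <BR> and <BR/> become a newline
                else if (tag.equals("LI"))
                    buf.insert(openIdx, "\n* ");    // <LI> starts a "* " bullet line
                // Any other tag, including closing tags such as </P>, is simply removed.
            }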
            return buf.toString();
        }
    }
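To see all of the rules applied end to end, here is a self-contained sketch: a copy of the method above placed in a hypothetical HtmlToTextDemo class, with a main that converts two small fragments. The copy upper-cases the tag text with Locale.ROOT so that tag matching does not depend on the JVM's default locale; the loop is otherwise the same. The expected results in the comments were traced by hand through the loop.

```java
import java.util.Locale;

// Hypothetical demo class: a copy of htmlToText from the listing above,
// plus a main method that prints two conversions with newlines escaped.
public class HtmlToTextDemo {

    public static String htmlToText(String input) {
        if (input == null)
            input = "";
        // Rule 1: each run of CR/LF characters becomes one space.
        input = input.replaceAll("[\r\n]+", " ");

        StringBuilder buf = new StringBuilder(input.trim());
        int openIdx = 0;
        while ((openIdx = buf.indexOf("<", openIdx)) >= 0) {
            int closeIdx = buf.indexOf(">", openIdx);
            if (closeIdx < 0) {
                // Unclosed tag: drop everything from '<' to the end.
                buf.delete(openIdx, buf.length());
                break;
            }
            // Tag name without attributes, upper-cased independently of the default locale.
            String tag = buf.substring(openIdx + 1, closeIdx).trim().toUpperCase(Locale.ROOT);
            buf.delete(openIdx, closeIdx + 1);
            tag = tag.replaceAll("\\s+.*", "");
            if (tag.equals("P") || tag.startsWith("BR"))
                buf.insert(openIdx, "\n");      // Rule 2: <P>, <BR>
            else if (tag.equals("LI"))
                buf.insert(openIdx, "\n* ");    // Rule 3: <LI>
            // Rule 4: every other tag is removed without replacement.
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String list = htmlToText("<p>Fruit:\r\n<ul><li>Apple<li>Pear &amp; plum</ul>");
        String breaks = htmlToText("a<br/>b<p class=\"x\">c</p>d");
        // Print newlines as the two characters '\' and 'n' so the result fits on one line.
        System.out.println(list.replace("\n", "\\n"));   // \nFruit: \n* Apple\n* Pear &amp; plum
        System.out.println(breaks.replace("\n", "\\n")); // a\nb\ncd
    }
}
```

The traces show a few consequences of the rules that are easy to miss: a leading <P> produces a leading newline, the space left by a collapsed line break survives before the next <LI>, entities such as &amp; are not decoded, and because closing tags like </P> are simply removed, the text on either side of them (c and d) runs together.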
    
