Java Utility Methods: HTML to Text

List of utility methods for converting HTML to text

Description

The methods for converting HTML to text are organized by topic.

Method

String html2text(final String html)
Converts HTML to plain text, keeping line breaks at br and p elements.
Document document = Jsoup.parse(html);
document.select("br").append("\\n");
document.select("p").prepend("\\n\\n");
return document.text().replaceAll("\\\\n", "\n");
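The snippet above appends a literal backslash-n marker (the two characters `\` and `n`) instead of a real newline, presumably because whitespace is collapsed when the text is extracted. The marker is turned into a real newline afterwards. The following is a minimal sketch of that escaping logic using only the standard library. The class name `PlaceholderNewlineDemo` and the `normalize` helper are hypothetical; `normalize` is a stand-in for whitespace normalization, not Jsoup's actual implementation.

```java
import java.util.regex.Pattern;

final class PlaceholderNewlineDemo {
    // Hypothetical stand-in for the whitespace normalization applied during
    // text extraction (an assumption for illustration, not Jsoup's code).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String normalize(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    // The Java literal "\\\\n" is the regex \\n, which matches a backslash
    // followed by the letter n. The replacement "\n" is a real newline.
    static String restoreNewlines(String s) {
        return s.replaceAll("\\\\n", "\n");
    }

    public static void main(String[] args) {
        // A real newline is collapsed to a space by normalization.
        System.out.println(normalize("first\nsecond"));

        // The two-character marker survives normalization and is restored later.
        String marked = "first\\n" + "\n   " + "second";
        System.out.println(restoreNewlines(normalize(marked)));
    }
}
```

Using a marker that contains no whitespace is what lets it pass through normalization untouched; any other unlikely token would work the same way.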
String html2text(String html)
Returns the text content of the HTML with all tags stripped.
return Jsoup.parse(html).text();
String html2text(String htmlStr)
Converts HTML content to text.
if ((null == htmlStr) || (htmlStr.isEmpty())) {
    throw new IllegalArgumentException("The input html string was null or empty");
}
return Jsoup.parse(htmlStr).text();
String text(Element e)
Fetches the text of an element but preserves newlines.
checkNotNull(e, "e should not be null.");
e.select("br").append("\\n");
e.select("p").prepend("\\n\\n");
return e.text().replaceAll("\\\\n", "\n").trim();
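String text(Element element)
Collects the text of an element by traversing its nodes, collapsing each block of whitespace in a text node to a single space.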
final StringBuilder accum = new StringBuilder();
new NodeTraversor(new NodeVisitor() {
    public void head(Node node, int depth) {
        if (node instanceof TextNode) {
            TextNode textNode = (TextNode) node;
            String str = textNode.getWholeText();
            str = WHITESPACE_BLOCK.matcher(str).replaceAll(" ");
            accum.append(str);
...
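The visitor pattern used above can be sketched without Jsoup as follows. This is only an illustration of the depth-first traversal idea. The `Node`, `TextNode`, and `Visitor` types here are a hypothetical minimal model, not Jsoup's classes, and the `\\s+` pattern is an assumed definition of the `WHITESPACE_BLOCK` constant, which the snippet does not show.

```java
import java.util.ArrayList;
import java.util.List;

final class TextTraversalDemo {
    // Hypothetical minimal node model; not Jsoup's Node/TextNode/Element.
    static class Node {
        final List<Node> children = new ArrayList<>();

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    static final class TextNode extends Node {
        final String text;

        TextNode(String text) {
            this.text = text;
        }
    }

    interface Visitor {
        void head(Node node, int depth);
    }

    // Depth-first traversal: visit the node first, then its children in order.
    static void traverse(Node node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (Node child : node.children) {
            traverse(child, depth + 1, visitor);
        }
    }

    // Accumulates the text of every text node, collapsing whitespace runs
    // (assumed behavior of WHITESPACE_BLOCK) to a single space.
    static String textOf(Node root) {
        StringBuilder accum = new StringBuilder();
        traverse(root, 0, (node, depth) -> {
            if (node instanceof TextNode) {
                accum.append(((TextNode) node).text.replaceAll("\\s+", " "));
            }
        });
        return accum.toString();
    }

    public static void main(String[] args) {
        Node root = new Node()
                .add(new TextNode("I am "))
                .add(new Node().add(new TextNode("Java")))
                .add(new TextNode("  programmer"));
        System.out.println(textOf(root));
    }
}
```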
String textOf(final Element el)
Returns the text of an element, accumulated by traversing its text and element nodes.
final StringBuilder accum = new StringBuilder();
new NodeTraversor(new NodeVisitor() {
    public void head(final Node node, final int depth) {
        if (node instanceof TextNode) {
            TextNode textNode = (TextNode) node;
            accum.append(textNode.text());
        } else if (node instanceof Element) {
            Element element = (Element) node;
...
Element toElement(String html)
Converts an HTML string to an HTML element.
return toElement(html, null);
Element toHtmlByHtml(String html)
Wraps the given HTML in a top-level html element.
Beware: the result includes the head and body tags too.
input:
I am Java programmer
output:
I am Java programmer
The output can be managed with #newOutputSetting(Document.OutputSettings)
return new Element("html").append(html);
Element toHtmlByPlain(String plainText)
Works like #toHtmlByHtml(String), but treats the input parameter as plain text, so any characters that cannot appear literally in HTML are escaped.
Example:
input:
I am Java programmer
output: <div>I am <code>Java</code> programmer</div>
Element html = new Element("html");
Element head = new Element("head");
Element body = new Element("body").text(plainText);
return html.prependChild(body).prependChild(head);
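Treating the input as plain text means that markup characters end up escaped when the element is serialized. The effect can be illustrated with a small standard-library sketch. The `escapeHtml` helper below is hypothetical and covers only `&`, `<`, and `>`; it is not how Jsoup implements escaping, only an approximation of the visible result.

```java
final class PlainTextEscapeDemo {
    // Hypothetical helper: escapes the characters that would otherwise be
    // read as markup. A simplified approximation, not Jsoup's implementation.
    static String escapeHtml(String plainText) {
        StringBuilder out = new StringBuilder(plainText.length());
        for (int i = 0; i < plainText.length(); i++) {
            char c = plainText.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tags in the input are kept as visible text rather than parsed.
        System.out.println(escapeHtml("<div>I am <code>Java</code> programmer</div>"));
    }
}
```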