Java Utility Methods HTML Parse Jsoup

List of utility methods for parsing HTML with Jsoup

Description

The methods for parsing HTML with Jsoup are organized by topic.

Method

String br2nl(String html)
Returns the text content of the HTML, then removes any remaining tag-like sequences.
if (html == null)
    return html;
return Jsoup.parse(html).text().replaceAll("\\<.*?>", "");
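The regular expression in br2nl can be tried on its own. The sketch below uses only the standard library; the class and method names are hypothetical and not part of the original utility. It shows that the pattern removes anything between a `<` and the next `>` on the same line, which includes ordinary comparisons in text, and that it does not match across a line break because `.` excludes line terminators by default.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original utility.
class TagLikePatternDemo {
    // Same pattern as in br2nl: a literal '<', the shortest possible run of
    // characters, then '>'.
    private static final Pattern TAG_LIKE = Pattern.compile("\\<.*?>");

    static String stripTagLike(String text) {
        return TAG_LIKE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Real tags are removed.
        System.out.println(stripTagLike("a <b>bold</b> word"));
        // Everything from '<' to the next '>' is removed, even in plain text.
        System.out.println(stripTagLike("1 < 2 and 3 > 2"));
        // No match across a line break, so the input comes back unchanged.
        System.out.println(stripTagLike("a<b\nc>d"));
    }
}
```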
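String clean(String html)
Cleans the HTML with the whitelist returned by getWhitelist(); returns an empty string for null or blank input.
if (html == null || html.trim().length() < 1) {
    return "";
}
return Jsoup.clean(html, getWhitelist());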
String clean(String html, Whitelist whitelist)
Cleans the specified HTML with the specified whitelist.
Document doc = parse(mask(html));
Cleaner cleaner = new Cleaner(whitelist);
Document clean = cleaner.clean(doc);
clean.outputSettings().prettyPrint(false);
return unmask(normalizeWhitespaces(clean).body().html());
String cleanHTML(final String html)
Remove most unsafe tags and attributes, leaving mostly format tags and links.
return html != null ? Jsoup.clean(html, Whitelist.basic()) : null;
String cleanHtmlCode(String html)
Converts HTML to plain text, turning p, div and br boundaries into line breaks.
String s = html;
s = s.replaceAll("(</?(p|div)|<br( ?/)?>)", "#####$1");
s = Jsoup.parse(s).text();
s = s.replaceAll("#####", "\n");
StringBuilder sb = new StringBuilder();
for (String line : s.split("\n")) {
    String trimmed = line.trim();
    if (!trimmed.isEmpty()) {
...
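The visible part of cleanHtmlCode uses a placeholder-marker technique: mark block boundaries before the tags are stripped, then turn the markers into newlines. The sketch below illustrates that idea with the standard library only. It is not the original method: a naive regex stands in for `Jsoup.parse(s).text()` (it does not decode entities or handle malformed markup), the loop body after the truncation point is an assumption, and the class and method names are hypothetical.

```java
// Hypothetical demo class, not part of the original utility.
class MarkerLineBreakDemo {
    private static final String MARKER = "#####";

    static String toPlainLines(String html) {
        // Step 1: put a marker in front of p/div tags (open or close) and br tags.
        String s = html.replaceAll("(</?(p|div)|<br( ?/)?>)", MARKER + "$1");
        // Step 2: stand-in for Jsoup.parse(s).text(): drop tag-shaped runs
        // and collapse whitespace. This is an assumption, not Jsoup behavior.
        s = s.replaceAll("<[^>]*>", "").replaceAll("\\s+", " ");
        // Step 3: the markers survive tag stripping and become line breaks.
        s = s.replace(MARKER, "\n");
        // Step 4 (assumed continuation): keep only non-empty, trimmed lines.
        StringBuilder sb = new StringBuilder();
        for (String line : s.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(trimmed);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPlainLines("<p>Hello</p><div>World<br/>again</div>"));
    }
}
```

One limitation of the marker approach is that input text which already contains `#####` will gain extra line breaks.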
String cleanHtmlFromString(String stringToClean)
Cleans the string with Whitelist.basic(), which keeps basic text-formatting tags and links and removes everything else.
return Jsoup.clean(stringToClean, Whitelist.basic());
String cleanHTMLTags(String str)
Removes HTML tags and returns the text content.
if (str == null) {
    return null;
}
return Jsoup.parse(str).text();
String cleanupHtmlDoc(String s)
Parses the HTML and returns the whole document pretty-printed; null input is returned unchanged.
if (s != null) {
    Document doc = Jsoup.parse(s);
    doc.outputSettings().prettyPrint(true);
    s = doc.toString();
}
return s;
String clearBody(String html)
Strips all tags from the HTML while trying to keep line breaks at br and p elements.
Document document = Jsoup.parse(html);
document.outputSettings(new Document.OutputSettings().prettyPrint(false));
document.select("br").append("\n");
document.select("p").prepend("\n");
String result = Jsoup.clean(document.html(), "", Whitelist.none(),
        new Document.OutputSettings().prettyPrint(false));
result = result.replace("\r", "\n");
result = result.replace("\n ", "\n");
...
Element coverTag(String html, String... tagNames)
Wraps the HTML in the given tags; the first tag name becomes the outermost element.
List<String> tags = Arrays.asList(tagNames);
Collections.reverse(tags);
Element e = null;
for (String s : tags) {
    if (e == null)
        e = addTag(html, s);
    else
        e = addTag(e, s);
...
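The wrapping order in coverTag can be illustrated without Jsoup. The sketch below is a string-based stand-in, assuming addTag simply wraps its argument in one element; the class and method names are hypothetical. It also copies the tag names before reversing them, because `Arrays.asList` returns a view backed by the varargs array, so `Collections.reverse` on that view would reorder an array passed in by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of the original utility.
class CoverTagOrderDemo {
    static String cover(String html, String... tagNames) {
        // Copy first so that reversing does not touch the caller's array.
        List<String> tags = new ArrayList<>(Arrays.asList(tagNames));
        // Innermost tag is applied first, so start from the last name.
        Collections.reverse(tags);
        String result = html;
        for (String tag : tags) {
            // String concatenation stands in for the unseen addTag helper.
            result = "<" + tag + ">" + result + "</" + tag + ">";
        }
        return result;
    }

    public static void main(String[] args) {
        String[] names = {"div", "p", "b"};
        System.out.println(cover("hi", names));
        // The caller's array keeps its original order.
        System.out.println(Arrays.toString(names));
    }
}
```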