Java String Whitespace Normalize normalizeWhitespace(String text)

Here you can find the source of normalizeWhitespace(String text)

Description

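Translates runs of multiple whitespace characters into a single space character. Line endings (CRLF and CR) are normalized to LF (Unix new line), and literal <br> tags are removed.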

License

Apache License

Parameter

Parameter Description
text the string to normalize; must not be null

Declaration

public static String normalizeWhitespace(String text) 

Method Source Code

/*
 * Copyright 2016
 * Ubiquitous Knowledge Processing (UKP) Lab
 * Technische Universität Darmstadt
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

public class Main {
    /**
     * Collapses each run of whitespace other than LF into a single space
     * character. Line endings (CRLF and CR) are first converted to LF (Unix
     * new line) and are kept. Literal {@code <br>} tags are removed.
     *
     * @param text the string to normalize; must not be null
     * @return the normalized string
     */
    public static String normalizeWhitespace(String text) {
        // unify CRLF and CR line endings to LF
        text = text.replaceAll("(\r\n|\r)", "\n");
        // collapse runs of whitespace but keep new lines; a lookahead guard
        // such as (?![\n])\s+ checks only the first character of the run and
        // would still swallow any LF that follows it
        text = text.replaceAll("[\\s&&[^\n]]+", " ");
        // remove extra <br> (sometimes the paragraph contains <br><br>;
        // the first one will be used as the new paragraph marker but the
        // second one must be removed)
        text = text.replaceAll("<br>", "");

        return text;
    }
}
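
Two regex forms for "whitespace except LF" come up around this method: a negative lookahead placed in front of \s+, and a character-class intersection. The sketch below compares them on the same input, assuming the goal is that every LF survives. The class name WhitespaceDemo and both helper method names are made up for illustration and are not part of the original source.

```java
// Hypothetical demo class; not part of the original source.
class WhitespaceDemo {

    // Lookahead form: the guard is tested only at the start of the run,
    // so the greedy \s+ can still consume an LF later in the same run.
    static String collapseWithLookahead(String text) {
        return text.replaceAll("(?:(?![\n])\\s+)", " ");
    }

    // Intersection form: the class matches whitespace minus LF, so an LF
    // always ends the run and is left in place.
    static String collapseWithIntersection(String text) {
        return text.replaceAll("[\\s&&[^\n]]+", " ");
    }

    public static void main(String[] args) {
        String sample = "a  \n  b";
        // show LF as a visible "\n" so each result fits on one line
        System.out.println(collapseWithLookahead(sample).replace("\n", "\\n"));
        System.out.println(collapseWithIntersection(sample).replace("\n", "\\n"));
    }
}
```

For the input "a  \n  b", the lookahead form returns "a b" (the LF is lost), while the intersection form returns "a \n b" (the LF is kept, with the spaces on either side collapsed separately).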

Related

  1. normalizeWhiteSpace(String input)
  2. normalizeWhitespace(String orig)
  3. normalizeWhitespace(String s)
  4. normalizeWhitespace(String source)
  5. normalizeWhiteSpace(String src)
  6. normalizeWhitespaces(String s)
  7. normalizeWhitespaces(String text)