Returns true if the specified character sequence is a valid sequence of UTF-16 char values







 
/*
 * LingPipe v. 3.9
 * Copyright (C) 2003-2010 Alias-i
 *
 * This program is licensed under the Alias-i Royalty Free License
 * Version 1 WITHOUT ANY WARRANTY, without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the Alias-i
 * Royalty Free License Version 1 for more details.
 *
 * You should have received a copy of the Alias-i Royalty Free License
 * Version 1 along with this program; if not, visit
 * http://alias-i.com/lingpipe/licenses/lingpipe-license-1.txt or contact
 * Alias-i, Inc. at 181 North 11th Street, Suite 401, Brooklyn, NY 11211,
 * +1 (718) 290-9170.
 */

//package com.aliasi.util;

/**
 * Static utility methods for processing strings, characters and
 * string buffers.
 *
 * @author  Bob Carpenter
 * @version 4.0.1
 * @since   LingPipe1.0
 * @see     java.lang.Character
 * @see     java.lang.String
 * @see     java.lang.StringBuilder
 */
public class Strings {


    /**
     * Returns {@code true} if the specified character sequence is a
     * valid sequence of UTF-16 {@code char} values.  A sequence is
     * legal if every high surrogate {@code char} value is immediately
     * followed by a low surrogate value, and every low surrogate value
     * is immediately preceded by a high surrogate value (as defined by
     * {@link Character#isHighSurrogate(char)} and {@link
     * Character#isLowSurrogate(char)}).
     *
     * <p>This method does <b>not</b> check to see if the sequence of
     * code points defined by the UTF-16 consists only of code points
     * defined in the latest Unicode standard.  The method only tests
     * the validity of the UTF-16 encoding sequence.
     * 
     * @param cs Character sequence to test.
     * @return {@code true} if the sequence of characters is
     * legal in UTF-16.
     */
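    public static boolean isLegalUtf16(CharSequence cs) {
        for (int i = 0; i < cs.length(); ++i) {
            char high = cs.charAt(i);
            // A low surrogate here has no high surrogate before it.
            if (Character.isLowSurrogate(high))
                return false;
            // Ordinary BMP char: nothing further to check.
            if (!Character.isHighSurrogate(high))
                continue;
            // High surrogate: the next char must exist and be a low surrogate.
            ++i;
            if (i >= cs.length())
                return false;
            char low = cs.charAt(i);
            if (!Character.isLowSurrogate(low))
                return false;
            // Defensive check: a well-formed surrogate pair always decodes
            // to a code point in U+10000..U+10FFFF, so this cannot fail.
            int codePoint = Character.toCodePoint(high,low);
            if (!Character.isValidCodePoint(codePoint))
                return false;
        }
        return true;
    }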


}
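
The following is a minimal usage sketch, not part of the LingPipe source. The class name Utf16CheckDemo is hypothetical, and the helper inside it is a condensed local copy of the surrogate-pairing check above, included only so that the example compiles on its own. It shows which inputs pass and which fail, and that a well-formed pair passes even when the code point it encodes is a noncharacter, as the Javadoc notes.

```java
// Illustrative sketch; Utf16CheckDemo is a hypothetical name, not part of LingPipe.
final class Utf16CheckDemo {

    // Condensed local copy of the surrogate-pairing check (assumption: same
    // behavior as Strings.isLegalUtf16, minus the redundant code point test).
    static boolean isLegalUtf16(CharSequence cs) {
        for (int i = 0; i < cs.length(); ++i) {
            char c = cs.charAt(i);
            if (Character.isLowSurrogate(c))
                return false;               // low surrogate with no high before it
            if (!Character.isHighSurrogate(c))
                continue;                   // ordinary BMP char
            if (++i >= cs.length() || !Character.isLowSurrogate(cs.charAt(i)))
                return false;               // high surrogate not followed by a low one
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLegalUtf16(""));               // true: empty sequence
        System.out.println(isLegalUtf16("abc"));            // true: no surrogates
        System.out.println(isLegalUtf16("\uD83D\uDE00"));   // true: pair for U+1F600
        System.out.println(isLegalUtf16("\uD83D"));         // false: lone high surrogate
        System.out.println(isLegalUtf16("\uDE00\uD83D"));   // false: low surrogate first
        System.out.println(isLegalUtf16("a\uD83Db"));       // false: high followed by 'b'
        // U+10FFFF is a noncharacter, yet its encoding is a well-formed pair.
        System.out.println(isLegalUtf16("\uDBFF\uDFFF"));   // true
    }
}
```

A check like this is typically worth running before handing text to an encoder or serializer, since an unpaired surrogate cannot be represented in well-formed UTF-8 or UTF-16 output.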

   
  







