Returns true if the argument, a UCS-4 character code, is valid in XML documents. - Java XML


Description

Returns true if the argument, a UCS-4 character code, is valid in XML documents.
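For reference, production [2] Char of the XML 1.0 specification defines the valid characters as a union of code-point ranges. The set below only restates the grammar quoted in the code comments:

```latex
\mathrm{Char} = \{\mathtt{0x9}, \mathtt{0xA}, \mathtt{0xD}\}
  \cup [\mathtt{0x20}, \mathtt{0xD7FF}]
  \cup [\mathtt{0xE000}, \mathtt{0xFFFD}]
  \cup [\mathtt{0x10000}, \mathtt{0x10FFFF}]
```

This excludes the surrogate block 0xD800-0xDFFF, the noncharacters 0xFFFE and 0xFFFF, the C0 control characters other than tab, line feed and carriage return, and every value above 0x10FFFF.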

Demo Code

/*
 *  PHEX - The pure-java Gnutella-servent.
 *  Copyright (C) 2001 - 2005 Phex Development Group
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License as published by
 *  the Free Software Foundation; either version 2 of the License, or
 *  (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful,
 *  but WITHOUT ANY WARRANTY; without even the implied warranty of
 *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *  GNU General Public License for more details.
 *
 *  You should have received a copy of the GNU General Public License
 *  along with this program; if not, write to the Free Software
 *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 * 
 *  --- CVS Information ---
 *  $Id: XMLUtils.java,v 1.5 2005/10/03 00:18:30 gregork Exp $
 */

public class Main {
    /**
     * Returns true if the argument, a UCS-4 character code, is valid in
     * XML documents.  Characters in the Unicode Basic Multilingual Plane
     * fit into the low sixteen bits of a UCS-4 character, and pairs of
     * UTF-16 <em>surrogate characters</em> can be combined to encode the
     * remaining (supplementary) UCS-4 characters up to U+10FFFF.  (The
     * <code>char</code> datatype in the Java Programming Language holds
     * one UTF-16 code unit, which may be an unpaired surrogate.)
     *
     * <P> In XML, UCS-4 characters can also be encoded by the use of
     * <em>character references</em> such as <b>&amp;#x12345678;</b>, which
     * happens to refer to a character that is disallowed in XML documents
     * because it lies above U+10FFFF.  UCS-4 characters allowed in XML
     * documents can be expressed with one or two UTF-16 code units.
     *
     * @param ucs4char The 32-bit UCS-4 character being tested.
     * @return true if <code>ucs4char</code> matches the XML 1.0
     *         <code>Char</code> production, false otherwise.
     */
    public static boolean isXmlChar(int ucs4char) {
        // [2] Char ::= #x0009 | #x000A | #x000D
        //          | [#x0020-#xD7FF]
        //  ... surrogates excluded!
        //          | [#xE000-#xFFFD]
        //          | [#x10000-#x10ffff]
        // The most common range is tested first so typical text short-circuits early.
        return (ucs4char >= 0x0020 && ucs4char <= 0xD7FF)
                || ucs4char == 0x000A || ucs4char == 0x0009
                || ucs4char == 0x000D
                || (ucs4char >= 0xE000 && ucs4char <= 0xFFFD)
                || (ucs4char >= 0x10000 && ucs4char <= 0x10FFFF);
    }
}
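The Javadoc notes that surrogate pairs combine into UCS-4 characters and that a Java char can hold an unpaired surrogate. The following is a minimal sketch of how the same check might be applied to a whole String, assuming the goal is to reject any text containing a disallowed character or an unpaired surrogate. The class XmlCharDemo and the helper isXmlString are hypothetical names, not part of PHEX. The sketch iterates code points rather than chars, so a valid surrogate pair is checked as the single supplementary character it encodes.

```java
public class XmlCharDemo {
    // Same ranges as the XML 1.0 Char production [2].
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: true if every code point in s is an XML Char.
    // String.codePoints() merges each valid surrogate pair into one
    // supplementary code point; an unpaired surrogate is returned as its
    // own value in 0xD800-0xDFFF, which isXmlChar rejects.
    static boolean isXmlString(String s) {
        return s.codePoints().allMatch(XmlCharDemo::isXmlChar);
    }

    public static void main(String[] args) {
        String ok = "caf\u00e9 \uD83D\uDE00";   // ends with U+1F600 as a surrogate pair
        String lone = "bad \uD800 surrogate";   // unpaired high surrogate
        String ctrl = "bell \u0007";            // BEL control character, not allowed

        System.out.println(isXmlString(ok));    // true
        System.out.println(isXmlString(lone));  // false
        System.out.println(isXmlString(ctrl));  // false

        // A character reference such as &#x12345678; names a value above
        // 0x10FFFF, so it can never be a valid XML character.
        System.out.println(isXmlChar(0x12345678)); // false

        // Allowed characters need one or two UTF-16 code units.
        System.out.println(Character.charCount(0x41));    // 1
        System.out.println(Character.charCount(0x1F600)); // 2
    }
}
```

Checking code points instead of individual char values is the design choice that matters here: testing each char with isXmlChar would reject every supplementary character, because each half of a surrogate pair falls in the excluded 0xD800-0xDFFF range.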
