Java Charset Guess getCharsetFromBytes(byte abyte0[])

Here you can find the source of getCharsetFromBytes(byte abyte0[])

Description

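Guesses the charset of an HTML document by looking for a known charset name (UTF-8, Shift_JIS, EUC-JP, or ISO-2022-JP) inside a <meta> tag in its raw bytes. Returns the matching Java charset name, or null if the input is null or no known name is found.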

License

Open Source License

Declaration

static String getCharsetFromBytes(byte abyte0[]) 

Method Source Code

//package com.java2s;
//License from project: Open Source License 

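import java.nio.charset.StandardCharsets;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // Captures the attribute text of a <meta ...> tag, ignoring case.
    private static final Pattern META_TAG = Pattern.compile("<meta ([^>]+)>", Pattern.CASE_INSENSITIVE);

    static String getCharsetFromBytes(byte abyte0[]) {
        if (abyte0 == null)
            return null;
        // ISO-8859-1 maps every byte to exactly one char, so the markup of
        // ASCII-compatible encodings stays readable after decoding.
        String s = new String(abyte0, StandardCharsets.ISO_8859_1);
        Matcher matcher = META_TAG.matcher(s);
        // Check every <meta> tag, not only the first: the charset declaration
        // may follow other meta tags such as viewport or description.
        while (matcher.find()) {
            String charset = getCharsetFromString(matcher.group(1));
            if (charset != null)
                return charset;
        }
        return null;
    }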

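    static String getCharsetFromString(String s) {
        if (s == null)
            return null;
        // Locale.ROOT keeps lowercasing locale-independent; under a Turkish
        // default locale "SHIFT_JIS" would otherwise fail to match.
        s = s.toLowerCase(java.util.Locale.ROOT);
        if (s.contains("utf-8"))
            return "UTF-8";
        if (s.contains("shift_jis") || s.contains("sjis"))
            return "MS932"; // Java's name for the Windows variant of Shift_JIS
        if (s.contains("euc-jp"))
            return "EUC-JP";
        if (s.contains("iso-2022-jp"))
            return "ISO-2022-JP";
        return null;
    }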
}
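
Usage

getCharsetFromBytes returns a charset name or null, so a caller still has to turn that result into something it can decode with. The following is a minimal sketch of one way to do that, not part of the original source: the class CharsetFallback and its methods resolve and decode are hypothetical names introduced here. It assumes the caller wants to fall back to a default charset when the guess is null or the runtime does not support the name (extended charsets such as MS932 may be missing on trimmed-down runtimes).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original source.
class CharsetFallback {
    // Turns a guessed charset name into a Charset, using the fallback when
    // the name is null, malformed, or not supported by this runtime.
    static Charset resolve(String guessedName, Charset fallback) {
        if (guessedName == null)
            return fallback;
        try {
            return Charset.forName(guessedName);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return fallback;
        }
    }

    // Decodes the bytes with the guessed charset, or with the fallback.
    static String decode(byte[] html, String guessedName, Charset fallback) {
        return new String(html, resolve(guessedName, fallback));
    }

    public static void main(String[] args) {
        byte[] html = "<meta charset=\"utf-8\"><p>caf\u00e9</p>"
                .getBytes(StandardCharsets.UTF_8);
        // In real use the second argument would be the value returned by
        // getCharsetFromBytes(html); it is hard-coded to keep the sketch
        // self-contained.
        System.out.println(decode(html, "UTF-8", StandardCharsets.ISO_8859_1));
    }
}
```

Passing the fallback explicitly keeps the policy with the caller: ISO-8859-1 never fails to decode, while UTF-8 may be the better default for modern pages.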

Related

  1. getCharset(File file)
  2. getCharset(Object resource)
  3. getCharsetFileWriter(File file, String charset)
  4. getCharSetStr(String str, String oldCharSet, String newCharSet)
  5. getPYIndexChar(char strChinese, boolean bUpCase)