Java Charset Create getCharsetFromContent(URL url)

Here you can find the source of getCharsetFromContent(URL url)

Description

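Reads up to the first 2048 bytes from the URL and returns the charset named in an HTML meta http-equiv Content-Type tag, or null if no supported charset is declared there.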

License

Open Source License

Declaration

public static String getCharsetFromContent(URL url) throws IOException 

Method Source Code


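import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // Captures NAME from <meta http-equiv="content-type" content="text/html; charset=NAME">.
    // The name must start with a letter or digit, so an empty or malformed name
    // never reaches Charset.isSupported (which throws on illegal names).
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s*http-equiv=[\"']content-type[\"']\\s*content\\s*=\\s*[\"']text/html\\s*;\\s*charset=([a-z\\d][a-z\\d\\-]*)[\"'>]",
            Pattern.CASE_INSENSITIVE);

    public static String getCharsetFromContent(URL url) throws IOException {
        byte[] chunk = new byte[2048];
        int bytesRead;
        // try-with-resources closes the stream even if read() throws.
        try (InputStream stream = url.openStream()) {
            // A single read() may return fewer than 2048 bytes.
            bytesRead = stream.read(chunk);
        }
        if (bytesRead > 0) {
            // Decode only the bytes actually read. ISO-8859-1 maps each byte to
            // exactly one char, independent of the platform default charset.
            String startContent = new String(chunk, 0, bytesRead, StandardCharsets.ISO_8859_1);
            Matcher matcher = META_CHARSET.matcher(startContent);
            if (matcher.find()) {
                String charset = matcher.group(1);
                if (Charset.isSupported(charset)) {
                    return charset;
                }
            }
        }

        return null;
    }
}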
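
The matching step can be tried without opening a URL. The following is a minimal sketch, not part of the original source: the class MetaCharsetSketch, its detect method, and the sample markup are hypothetical names chosen for illustration. It applies the same kind of pattern as getCharsetFromContent to a byte array held in memory. The capture group here requires the name to start with a letter or digit, a small deviation chosen so that Charset.isSupported is never handed an empty name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class MetaCharsetSketch {
    // Captures the charset name from a <meta http-equiv="content-type" ...> tag.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s*http-equiv=[\"']content-type[\"']\\s*content\\s*=\\s*[\"']"
                    + "text/html\\s*;\\s*charset=([a-z\\d][a-z\\d\\-]*)[\"'>]",
            Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or null if none is declared or supported.
    static String detect(byte[] head, int length) {
        if (length <= 0) {
            return null;
        }
        // ISO-8859-1 maps each byte to exactly one char, so decoding cannot fail.
        String text = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(text);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return m.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] html = ("<html><head><meta http-equiv=\"Content-Type\" "
                + "content=\"text/html; charset=UTF-8\"></head></html>")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(detect(html, html.length)); // prints "UTF-8"
    }
}
```

Note that the pattern only recognizes the http-equiv form. The shorter HTML5 declaration <meta charset="utf-8"> does not match, so detect returns null for it, and a caller would need its own fallback in that case.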

Related

  1. getCharset(String enc)
  2. getCharset(String encoding)
  3. getCharset(String encoding)
  4. getCharset(String name)
  5. getCharsetForSortOrder(final int sortOrder)
  6. getCharsetFromContentType(String contentType)
  7. getCharsetFromContentType(String contentType)
  8. getCharsetFromContentTypeString(String contentType)
  9. getCharsetList(List availableCharsets, Charset actualCharset)