
Character Sets and Unicode: Code Set Conversion


Java Internationalization
By Andy Deitsch, David Czarnecki

ISBN: 0-596-00019-7

import java.io.*;

public class Converter {
  public Converter(String input, String output) {
    try {
      // decode the input file as Shift-JIS
      FileInputStream fis = new FileInputStream(new File(input));
      BufferedReader in = new BufferedReader(new InputStreamReader(fis, "SJIS"));

      // encode the output file as UTF-8
      FileOutputStream fos = new FileOutputStream(new File(output));
      BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos, "UTF8"));

      // create a buffer to hold the characters
      int len = 80;
      char buf[] = new char[len];

      // read the file len characters at a time and write them out
      int numRead;
      while ((numRead = in.read(buf, 0, len)) != -1)
        out.write(buf, 0, numRead);

      // close the streams (closing the writer also flushes its buffer)
      out.close();
      in.close();
    } catch (IOException e) {
      System.out.println("An I/O Exception Occurred: " + e);
    }
  }

  public static void main(String args[]) {
    new Converter(args[0], args[1]);
  }
}
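
For comparison, here is a minimal sketch of the same Shift-JIS to UTF-8 conversion written against the java.nio.file API of later Java releases. It assumes Java 10 or later (for Reader.transferTo) and a runtime that ships the Shift_JIS charset; the class name NioConverter and its convert method are hypothetical and not part of the book's example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical NIO-based variant of Converter (assumes Java 10+).
public class NioConverter {
  // Shift_JIS has no StandardCharsets constant; Charset.forName throws
  // UnsupportedCharsetException if the running JRE does not provide it.
  public static final Charset SJIS = Charset.forName("Shift_JIS");

  // Decodes the input file with 'from' and re-encodes it into the output file with 'to'.
  public static void convert(Path input, Charset from, Path output, Charset to)
      throws IOException {
    try (BufferedReader in = Files.newBufferedReader(input, from);
         BufferedWriter out = Files.newBufferedWriter(output, to)) {
      in.transferTo(out); // copies every character; Reader.transferTo needs Java 10+
    } // try-with-resources closes (and flushes) both streams, even on error
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: java NioConverter <sjis-input> <utf8-output>");
      System.exit(1);
    }
    convert(Paths.get(args[0]), SJIS, Paths.get(args[1]), StandardCharsets.UTF_8);
  }
}
```

One design difference is worth noting: InputStreamReader and OutputStreamWriter constructed from a charset name silently substitute replacement characters for malformed or unmappable input, whereas the readers and writers returned by Files.newBufferedReader and Files.newBufferedWriter report such input by throwing a CharacterCodingException subclass. A file that is not really Shift-JIS therefore fails loudly here instead of producing corrupted output.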

