Python - String Unicode

Python 3.X

In Python 3.X there are three string types:

  • str is used for Unicode text (including ASCII),
  • bytes is used for binary data (including encoded text), and
  • bytearray is a mutable variant of bytes.
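The relationship between the three types can be sketched as follows. This is a minimal illustration, assuming UTF-8 as the encoding; the variable names are made up for the example.

```python
# str holds Unicode text; the escape \u00e9 is the character "é".
text = "caf\u00e9"

# Encoding str produces bytes; "é" takes two bytes in UTF-8.
data = text.encode("utf-8")

# bytearray is a mutable copy of the same byte values.
buf = bytearray(data)
buf[0] = ord("C")            # in-place change, not possible on bytes

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5
print(buf.decode("utf-8"))              # Café
```

Note that len counts characters for str and bytes for bytes, which is why the two lengths differ.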

Files work in two modes:

Mode     Description
text     represents content as str and implements Unicode encodings
binary   deals in raw bytes and does no data translation
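The difference between the two modes shows up when the same file is read both ways. The sketch below assumes UTF-8 and writes to a temporary file; the file name is arbitrary.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode: str in, encoded to bytes on disk using the given encoding.
with open(path, "w", encoding="utf-8") as f:
    f.write("caf\u00e9")

# Text mode: bytes on disk are decoded back to str.
with open(path, "r", encoding="utf-8") as f:
    as_text = f.read()

# Binary mode: the raw bytes come back untouched.
with open(path, "rb") as f:
    as_bytes = f.read()

print(repr(as_text))    # 'café'
print(repr(as_bytes))   # b'caf\xc3\xa9'
```

Passing encoding explicitly avoids depending on the platform default.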

Python 2.X

In Python 2.X:

  • unicode strings represent Unicode text,
  • str strings handle both 8-bit text and binary data, and
  • bytearray is available in 2.6 and later as a back-port from 3.X.

The content of normal files (those opened with the built-in open) is simply bytes represented as str.

The codecs module opens Unicode text files (via codecs.open), handles encodings, and represents content as unicode objects.
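A minimal sketch of the codecs approach is shown below. It is written so that it also runs under 3.X, where the decoded result is str rather than unicode; text_type is a helper name introduced here for that purpose, and UTF-8 and the file name are assumptions.

```python
import codecs
import os
import tempfile

# unicode on 2.X, str on 3.X (helper name chosen for this example).
text_type = type(u"")

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo_codecs.txt")

# codecs.open encodes Unicode text on write...
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(u"caf\u00e9")

# ...and decodes it on read, returning a Unicode string object.
with codecs.open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

# The built-in open in binary mode returns the raw encoded bytes
# (a str on 2.X, a bytes object on 3.X).
with open(path, "rb") as f:
    raw = f.read()

print(isinstance(decoded, text_type))   # True
print(len(decoded), len(raw))           # 4 5
```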