What is a Character set?

Before we understand what the Python character set is, let's first look at character sets in general. Character sets are used in programming languages. A character set is a standard that defines which characters (for example, the English letters and the digits 0 to 9) can be used in a source program, and how each of those characters is interpreted when the program runs (i.e. converted to its binary equivalent).
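To make the idea of "converting to binary equivalents" concrete, here is a minimal sketch using Python's built-in ord() and format(). The helper name to_binary is hypothetical and chosen only for this illustration.

```python
def to_binary(ch, width=8):
    """Return the code point of a single character as a binary string."""
    # ord() gives the integer code point; format() pads it to `width` bits.
    return format(ord(ch), f"0{width}b")

for ch in ("A", "z", "7"):
    print(ch, ord(ch), to_binary(ch))
# A 65 01000001
# z 122 01111010
# 7 55 00110111
```

The built-in chr() goes the other way, so chr(65) gives back "A".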

Which Character set is used by Python?

Types of Encoding Standards

The two most popular standards are ASCII and Unicode. ASCII was designed around the English language and has a limited character set, while the Unicode standard was created to accommodate other languages as well.

ASCII is thus a subset of Unicode. Python supports the Unicode standard, so it supports ASCII as well. In Python 3, strings are Unicode by default.
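The subset relationship can be checked directly in Python. The sketch below assumes Python 3; the helper name is_ascii_encodable is hypothetical and exists only for this example.

```python
def is_ascii_encodable(text):
    """Return True if every character in `text` exists in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

word = "Python"
# ASCII text produces identical bytes under ASCII and UTF-8.
same_bytes = word.encode("ascii") == word.encode("utf-8")
print(same_bytes)                      # True

print(is_ascii_encodable("Python"))    # True
print(is_ascii_encodable("\u00f1"))    # False: the letter n with tilde is outside ASCII
```

Every ASCII character is also a Unicode character with the same code number, which is why the two encodings agree on plain English text.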


Extra Bytes

Every character that is to be represented inside a computer must be converted to binary. To do that, every character is assigned a numeric code. ASCII uses 7-bit codes, which gives 128 characters; 8-bit extensions of ASCII allow 256. Unicode assigns each character a code point, which is then stored using an encoding: UTF-8 uses one to four bytes per character (one byte for every ASCII character), UTF-16 uses two or four bytes, and UTF-32 always uses four bytes. This is how characters from many other languages are accommodated.
You can follow the links below to learn more about these encoding standards (ASCII and Unicode).
in depth – ASCII Encoding
in depth – UNICODE Encoding
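As a rough illustration of how many bytes the Unicode encodings need per character, the sketch below encodes a few sample characters with str.encode(). The function name encoded_sizes is hypothetical. The "-le" codec variants are used here as an assumption of convenience, so that Python does not prepend a byte order mark and inflate the counts.

```python
def encoded_sizes(ch):
    """Return the number of bytes `ch` occupies in each encoding."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(ch.encode(enc)) for enc in encodings}

samples = {
    "A": "A",                       # ASCII letter
    "e-acute": "\u00e9",            # Latin letter outside ASCII
    "euro sign": "\u20ac",          # currency symbol
    "emoji": "\U0001F600",          # character outside the 16-bit range
}

for name, ch in samples.items():
    print(name, encoded_sizes(ch))
# A {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# e-acute {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# euro sign {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# emoji {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```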

Character set included in the Python language:

  • Letters – A-Z, a-z (also letters from most other languages)
  • Digits – 0 to 9
  • Special Characters – space, operators (+, -, *, %, = etc.), separators ((), [], {}, comma, full stop (.) etc.), and all other symbols like ' " / \ ^ & @ ~ ! etc.
  • Whitespace characters – blank space, tab, carriage return (↵), newline, form feed
  • Other Characters – Python is capable of processing all ASCII and Unicode characters as part of data and literals (constants)
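The categories in the list above can be explored with Python's built-in string methods. This is only a sketch: the classify function and its category labels are hypothetical, and the str methods it relies on follow Unicode rules, so they accept more than just A-Z and 0-9.

```python
def classify(ch):
    """Place a single character into one of the broad categories above."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "whitespace"
    return "special"

for ch in ("A", "\u00f1", "7", "\t", "\f", "+", "@"):
    # repr() makes invisible characters such as tab and form feed visible.
    print(repr(ch), classify(ch))
```

Here "\u00f1" (a letter used in Spanish) is reported as a letter, which reflects that Python treats letters from other languages the same way as English ones.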