Characters

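  • Number of possible ASCII characters -> 128 characters (7-bit codes 0-127). The 256 figure applies to 8-bit "extended ASCII" code pages such as ISO-8859-1, which are not ASCII proper.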

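  • Number of possible UTF-8 characters -> about 1.1 million. UTF-8 can encode every Unicode code point from U+0000 to U+10FFFF (1,114,112 in total, or 1,112,064 valid scalar values once the 2,048 surrogates are excluded). Each character takes 1 to 4 bytes: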

    • The first 128 characters (US-ASCII, U+0000 to U+007F) need one byte.
    • The next 1,920 characters (U+0080 to U+07FF) need two bytes. This covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac and Thaana alphabets, as well as Combining Diacritical Marks.
    • Three bytes are needed for the rest of the Basic Multilingual Plane (U+0800 to U+FFFF), which contains virtually all characters in common use[12], including most Chinese, Japanese and Korean (CJK) characters.
    • Four bytes are needed for characters in the other planes of Unicode (U+10000 to U+10FFFF), which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).
    • Source: http://stackoverflow.com/questions/10229156/how-many-characters-can-utf-8-encode
    • ASCII table: a-z (97-122), A-Z (65-90), 0-9 (48-57). http://www.asciitable.com/
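To check the UTF-8 byte counts above, here is a minimal sketch that maps a code point to its encoded length using the four ranges and compares the result with Python's built-in encoder. Python (3.8+, for `bytes.hex` with a separator) is an assumed choice since these notes contain no code, and `utf8_len` is a hypothetical helper written for illustration, not a library function.

```python
def utf8_len(code_point: int) -> int:
    """Return how many bytes UTF-8 uses for one code point (hypothetical helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space U+0000..U+10FFFF")
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError("surrogate code points cannot be encoded")
    if code_point <= 0x7F:      # U+0000..U+007F: US-ASCII
        return 1
    if code_point <= 0x7FF:     # U+0080..U+07FF: Greek, Cyrillic, Hebrew, Arabic, ...
        return 2
    if code_point <= 0xFFFF:    # U+0800..U+FFFF: rest of the BMP, incl. most CJK
        return 3
    return 4                    # U+10000..U+10FFFF: other planes, incl. emoji


# The two-byte range holds exactly 1,920 code points.
assert 0x7FF - 0x80 + 1 == 1920

samples = {
    "A": "US-ASCII letter",
    "é": "Latin-1 Supplement",
    "Ж": "Cyrillic",
    "中": "CJK ideograph (BMP)",
    "😀": "emoji (outside the BMP)",
}

for ch, label in samples.items():
    encoded = ch.encode("utf-8")
    # The range-based helper should agree with Python's own UTF-8 encoder.
    assert utf8_len(ord(ch)) == len(encoded)
    print(f"U+{ord(ch):04X} {label}: {len(encoded)} byte(s) -> {encoded.hex(' ')}")
```

Each printed line shows the code point, the byte count and the encoded bytes in hex; the `assert` lines fail loudly if a range boundary is off by one.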
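The ASCII facts above (128 codes, and the contiguous ranges for letters and digits) can be verified with a short sketch. Python is again an assumed choice; `ASCII_SIZE` and `ranges` are illustrative names, while `string.ascii_lowercase`, `ord`, `chr` and `str.isascii` are standard library features.

```python
import string

# ASCII is a 7-bit code: code points 0..127.
ASCII_SIZE = 128
assert all(ord(c) < ASCII_SIZE for c in string.printable)

# The letter and digit ranges from the ASCII table note.
ranges = {
    "a-z": (string.ascii_lowercase, 97, 122),
    "A-Z": (string.ascii_uppercase, 65, 90),
    "0-9": (string.digits, 48, 57),
}

for name, (chars, first, last) in ranges.items():
    codes = [ord(c) for c in chars]
    # Each range is contiguous, with no gaps between consecutive characters.
    assert codes == list(range(first, last + 1))
    print(f"{name}: {codes[0]}-{codes[-1]}")

# Contiguity makes simple arithmetic work: upper and lower case differ by 32,
# and a digit character minus ord("0") gives its numeric value.
assert ord("a") - ord("A") == 32
assert chr(ord("g") - 32) == "G"
assert ord("7") - ord("0") == 7

# Anything above 127 is not ASCII, so the "ascii" codec rejects it.
try:
    "é".encode("ascii")
except UnicodeEncodeError:
    print("é (U+00E9) is outside ASCII")
```

The case and digit arithmetic only holds within these ranges; for text beyond ASCII, `str.upper()` and `int()` are the safer choices.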