.TH UTF8-CATEGORY 1 "February 14, 2015"
.SH NAME
.PP
utf8-category - tally UTF-8 encoded characters by general category
.SH SYNOPSIS
.PP
utf8-category [-l|--long-names] [-c|--count-ascii|-s|--skip-ascii]
.SH DESCRIPTION
.PP
Tally the UTF-8 encoded characters in the standard input stream by
general category.
.IP
.nf
\f[C]
\ Abbr\ \ Long\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Description
\ ---\ \ \ ----\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ -----------
\ Lu\ \ \ \ Uppercase_Letter\ \ \ \ \ \ \ \ an\ uppercase\ letter
\ Ll\ \ \ \ Lowercase_Letter\ \ \ \ \ \ \ \ a\ lowercase\ letter
\ Lt\ \ \ \ Titlecase_Letter\ \ \ \ \ \ \ \ a\ digraphic\ character,\ with\ first\ part\ uppercase
\ LC\ \ \ \ Cased_Letter\ \ \ \ \ \ \ \ \ \ \ \ Lu\ |\ Ll\ |\ Lt
\ Lm\ \ \ \ Modifier_Letter\ \ \ \ \ \ \ \ \ a\ modifier\ letter
\ Lo\ \ \ \ Other_Letter\ \ \ \ \ \ \ \ \ \ \ \ other\ letters,\ including\ syllables\ and\ ideographs
\ L\ \ \ \ \ Letter\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Lu\ |\ Ll\ |\ Lt\ |\ Lm\ |\ Lo
\ Mn\ \ \ \ Nonspacing_Mark\ \ \ \ \ \ \ \ \ a\ nonspacing\ combining\ mark\ (zero\ advance\ width)
\ Mc\ \ \ \ Spacing_Mark\ \ \ \ \ \ \ \ \ \ \ \ a\ spacing\ combining\ mark\ (positive\ advance\ width)
\ Me\ \ \ \ Enclosing_Mark\ \ \ \ \ \ \ \ \ \ an\ enclosing\ combining\ mark
\ M\ \ \ \ \ Mark\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Mn\ |\ Mc\ |\ Me
\ Nd\ \ \ \ Decimal_Number\ \ \ \ \ \ \ \ \ \ a\ decimal\ digit
\ Nl\ \ \ \ Letter_Number\ \ \ \ \ \ \ \ \ \ \ a\ letterlike\ numeric\ character
\ No\ \ \ \ Other_Number\ \ \ \ \ \ \ \ \ \ \ \ a\ numeric\ character\ of\ other\ type
\ N\ \ \ \ \ Number\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Nd\ |\ Nl\ |\ No
\ Pc\ \ \ \ Connector_Punctuation\ \ \ a\ connecting\ punctuation\ mark,\ like\ a\ tie
\ Pd\ \ \ \ Dash_Punctuation\ \ \ \ \ \ \ \ a\ dash\ or\ hyphen\ punctuation\ mark
\ Ps\ \ \ \ Open_Punctuation\ \ \ \ \ \ \ \ an\ opening\ punctuation\ mark\ (of\ a\ pair)
\ Pe\ \ \ \ Close_Punctuation\ \ \ \ \ \ \ a\ closing\ punctuation\ mark\ (of\ a\ pair)
\ Pi\ \ \ \ Initial_Punctuation\ \ \ \ \ an\ initial\ quotation\ mark
\ Pf\ \ \ \ Final_Punctuation\ \ \ \ \ \ \ a\ final\ quotation\ mark
\ Po\ \ \ \ Other_Punctuation\ \ \ \ \ \ \ a\ punctuation\ mark\ of\ other\ type
\ P\ \ \ \ \ Punctuation\ \ \ \ \ \ \ \ \ \ \ \ \ Pc\ |\ Pd\ |\ Ps\ |\ Pe\ |\ Pi\ |\ Pf\ |\ Po
\ Sm\ \ \ \ Math_Symbol\ \ \ \ \ \ \ \ \ \ \ \ \ a\ symbol\ of\ mathematical\ use
\ Sc\ \ \ \ Currency_Symbol\ \ \ \ \ \ \ \ \ a\ currency\ sign
\ Sk\ \ \ \ Modifier_Symbol\ \ \ \ \ \ \ \ \ a\ non-letterlike\ modifier\ symbol
\ So\ \ \ \ Other_Symbol\ \ \ \ \ \ \ \ \ \ \ \ a\ symbol\ of\ other\ type
\ S\ \ \ \ \ Symbol\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Sm\ |\ Sc\ |\ Sk\ |\ So
\ Zs\ \ \ \ Space_Separator\ \ \ \ \ \ \ \ \ a\ space\ character\ (of\ various\ non-zero\ widths)
\ Zl\ \ \ \ Line_Separator\ \ \ \ \ \ \ \ \ \ U+2028\ LINE\ SEPARATOR\ only
\ Zp\ \ \ \ Paragraph_Separator\ \ \ \ \ U+2029\ PARAGRAPH\ SEPARATOR\ only
\ Z\ \ \ \ \ Separator\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Zs\ |\ Zl\ |\ Zp
\ Cc\ \ \ \ Control\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ a\ C0\ or\ C1\ control\ code
\ Cf\ \ \ \ Format\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ a\ format\ control\ character
\ Cs\ \ \ \ Surrogate\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ a\ surrogate\ code\ point
\ Co\ \ \ \ Private_Use\ \ \ \ \ \ \ \ \ \ \ \ \ a\ private-use\ character
\ Cn\ \ \ \ Unassigned\ \ \ \ \ \ \ \ \ \ \ \ \ \ a\ reserved\ unassigned\ code\ point\ or\ a\ noncharacter
\ C\ \ \ \ \ Other\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ Cc\ |\ Cf\ |\ Cs\ |\ Co\ |\ Cn
\f[]
.fi
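.PP
The tally described above can be approximated in a few lines of Python.
This is a minimal sketch, not the tool's actual implementation: the
function name tally_categories is hypothetical, and it relies on the
unicodedata module of the Python standard library, whose category
assignments depend on the Unicode version bundled with the interpreter.
It reports only the two-letter categories; whether the tool also reports
grouped values such as L or LC is not stated here.
.IP
.nf
\f[C]
```python
import unicodedata
from collections import Counter


def tally_categories(text):
    # Hypothetical helper: count characters by two-letter general category.
    # unicodedata.category() returns abbreviations such as "Lu" or "Zs".
    return Counter(unicodedata.category(ch) for ch in text)


# Sample input is an assumption chosen for illustration.
sample = "Hello, World 42"
print(sorted(tally_categories(sample).items()))
# prints [('Ll', 8), ('Lu', 2), ('Nd', 2), ('Po', 1), ('Zs', 2)]
```
\f[]
.fi
.PP
To mimic reading from the standard input stream, the bytes would first
be decoded as UTF-8 before being passed to the function.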
.SH OPTIONS
.TP
-c, --count-ascii
Treat ASCII characters as a separate general category called "ASCII".
.TP
-l, --long-names
Use long names for the general categories instead of the two-character
abbreviations.
.TP
-s, --skip-ascii
Skip ASCII characters.
Only characters with code point U+0080 or higher are counted.
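.PP
The effect of these options can be sketched in Python as follows.
This is an illustration under stated assumptions, not the tool's source:
the function tally and its parameters ascii_mode and long_names are
hypothetical stand-ins for the command line flags, and the long names
are copied from the table in DESCRIPTION.
.IP
.nf
\f[C]
```python
import unicodedata
from collections import Counter

# Long names for the two-letter abbreviations, taken from the DESCRIPTION table.
LONG_NAMES = {
    "Lu": "Uppercase_Letter", "Ll": "Lowercase_Letter",
    "Lt": "Titlecase_Letter", "Lm": "Modifier_Letter",
    "Lo": "Other_Letter", "Mn": "Nonspacing_Mark",
    "Mc": "Spacing_Mark", "Me": "Enclosing_Mark",
    "Nd": "Decimal_Number", "Nl": "Letter_Number",
    "No": "Other_Number", "Pc": "Connector_Punctuation",
    "Pd": "Dash_Punctuation", "Ps": "Open_Punctuation",
    "Pe": "Close_Punctuation", "Pi": "Initial_Punctuation",
    "Pf": "Final_Punctuation", "Po": "Other_Punctuation",
    "Sm": "Math_Symbol", "Sc": "Currency_Symbol",
    "Sk": "Modifier_Symbol", "So": "Other_Symbol",
    "Zs": "Space_Separator", "Zl": "Line_Separator",
    "Zp": "Paragraph_Separator", "Cc": "Control",
    "Cf": "Format", "Cs": "Surrogate",
    "Co": "Private_Use", "Cn": "Unassigned",
}


def tally(text, ascii_mode=None, long_names=False):
    # ascii_mode is a hypothetical parameter:
    # "count" mimics --count-ascii, "skip" mimics --skip-ascii.
    counts = Counter()
    for ch in text:
        if ord(ch) < 0x80 and ascii_mode == "skip":
            continue  # only U+0080 and higher are counted
        if ord(ch) < 0x80 and ascii_mode == "count":
            counts["ASCII"] += 1  # ASCII gets its own bucket
            continue
        abbr = unicodedata.category(ch)
        counts[LONG_NAMES[abbr] if long_names else abbr] += 1
    return counts
```
\f[]
.fi
.PP
The two ASCII modes are modeled as one parameter because the synopsis
lists -c and -s as alternatives.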
.SH SEE ALSO
.PP
http://unicode.org/reports/tr44/#General_Category_Values
.SH AUTHORS
Clark Grubb.