Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - Unicode/special characters in variable names in clang not allowed?

This question has unicode text that may not display correctly in all browsers.

Clang now (3.3 and later) supports Unicode characters in variable names: http://llvm.org/releases/3.3/tools/clang/docs/ReleaseNotes.html#major-new-features.

However, some special characters are still forbidden.

int main(){
    double α = 2.; // alpha, ok!
    double ∞ = 99999.; // infinity, error
}

giving:

error: non-ASCII characters are not allowed outside of literals and identifiers
        double ∞ = 99999.;

What is the fundamental difference between α (alpha) and ∞ (infinity) for clang? Is it that the former is Unicode and the latter is not Unicode, yet at the same time not ASCII either?

Is there a workaround or an option to allow this set of characters in clang (or, for that matter, in gcc)?

Notes:
1) ∞ is just an example; there are many other characters that would be useful but are also forbidden.
2) I am not asking whether it is a good idea; please take it as a technical question.
3) I am interested in the clang 3.4 C++ compiler on Linux (gcc 4.8.3 does not support this). I am saving the source files with gedit using UTF-8 encoding and Unix/Linux line endings.
4) Prefixing an ordinary first character does not help either: _∞


The answers point to a definite NO. Some ranges are indeed not allowed, and they will not be any time soon. Taking it one step further into total craziness, the best alternative I found was to use characters that look practically the same. (Now, this I might admit is not a good idea.) Such look-alikes can be found at http://shapecatcher.com/. The result (sorry if it hurts your eyes):

//  double ∞ = 99999.; // still error
//  double ⧞ = 99999.; // infinity negated, still error
    double ꝏ = 99999.; // letter oo
    double Ꝏ = 99999.; // letter OO
//  double ⧜ = 99999.; // incomplete infinity, still error

Similar "dead ringers" that fall within the allowed ranges exist for other forbidden characters as well.


He who fights the dragon too long becomes a dragon himself; gaze into the abyss too long, and the abyss will gaze back…

1 Answer


So the clang release notes say:

This feature allows identifiers to contain certain Unicode characters, as specified by the active language standard;

This is covered in Annex E of the draft C++ standard; the allowed characters are as follows:

E.1 Ranges of characters allowed [charname.allowed]

00A8, 00AA, 00AD, 00AF, 00B2-00B5, 00B7-00BA, 00BC-00BE, 00C0-00D6, 00D8-00F6, 00F8-00FF

0100-167F, 1681-180D, 180F-1FFF

200B-200D, 202A-202E, 203F-2040, 2054, 2060-206F

2070-218F, 2460-24FF, 2776-2793, 2C00-2DFF, 2E80-2FFF

3004-3007, 3021-302F, 3031-303F

3040-D7FF

F900-FD3D, FD40-FDCF, FDF0-FE44, FE47-FFFD

10000-1FFFD, 20000-2FFFD, 30000-3FFFD, 40000-4FFFD, 50000-5FFFD, 60000-6FFFD, 70000-7FFFD, 80000-8FFFD, 90000-9FFFD, A0000-AFFFD, B0000-BFFFD, C0000-CFFFD, D0000-DFFFD, E0000-EFFFD

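The code point for infinity, U+221E, is not included in the list: it falls in the gap between 2070-218F and 2460-24FF. Alpha, U+03B1, is allowed because it lies inside 0100-167F.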

For reference, here are the printable ranges from the list above converted to characters (ranges whose endpoints are invisible formatting characters, combining marks, or unassigned code points are left out, and some characters may still not display correctly in all browsers or fonts):

¨, ª, ¯, ²-µ, ·-º, ¼-¾, À-Ö, Ø-ö, ø-ÿ

Ā-ᙿ, ‿-⁀, ⁔

①-⓿, ❶-➓

〄-〇, 〱-〿

豈-ﴽ, ﷰ-﹄, ﹇-�

I could not find a thorough document covering the rationale for the chosen ranges, although N3146, "Recommendations for extended identifier characters for C and C++", does provide some details on the influences.

