unicode - Is this case a weird UTF-8 encoding conversion?

I am working with a remote application that seems to do some magic with the encoding. The application renders distinct responses (which I'll refer to as True and False) depending on user input. I know two valid values that render 'True'; all the others should render 'False'.

What I found (accidentally) interesting is that submitting a corrupted value also leads to 'True'.

Example input:

USER10 //gives True
USER11 //gives True
USER12 //gives False
USER.. //gives False
OTHERTHING //gives False

So basically only the first two values render a True response.

What I noticed, surprisingly, is that USERÀ±0 (hex: 55 53 45 52 C0 B1 30) is also accepted as True. I checked other hex bytes, with no such success. This leads me to the conclusion that the bytes C0 B1 are somehow translated into 0x31 (= '1').
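For reference, here is a minimal Python sketch of the bytes in question (Python is just for illustration; I don't know what the remote application actually runs). A strict UTF-8 decoder rejects the sequence outright:

    # The exact byte sequence that is observed to return True (illustration only)
    payload = bytes([0x55, 0x53, 0x45, 0x52, 0xC0, 0xB1, 0x30])  # "USER" + C0 B1 + "0"

    # A standards-compliant UTF-8 decoder refuses the C0 lead byte:
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError as e:
        print(e)  # 'utf-8' codec can't decode byte 0xc0 in position 4: invalid start byte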

My question is: how could this happen? Is the application performing some weird conversion from UTF-16 (or something else) to UTF-8?

I'd appreciate any comments/ideas/hints.

1 Answer


C0 is an invalid lead byte in UTF-8, but if a non-conforming UTF-8 decoder accepts it anyway, the two-byte sequence C0 B1 would be interpreted as 31h, i.e. the ASCII character '1'.

Quoting Wikipedia:

...(C0 and C1) could only be used for an invalid "overlong encoding" of ASCII characters (i.e., trying to encode a 7-bit ASCII value between 0 and 127 using two bytes instead of one)...
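To illustrate (a sketch only, not whatever decoder the application actually uses): a naive two-byte decoder that applies the UTF-8 bit formula without rejecting overlong forms maps C0 B1 straight to '1':

    def decode_two_byte(lead, trail):
        # Naive 110xxxxx 10xxxxxx decode; a conforming decoder must reject
        # lead bytes C0/C1 because the result is always an overlong encoding.
        assert 0xC0 <= lead <= 0xDF and 0x80 <= trail <= 0xBF
        return chr(((lead & 0x1F) << 6) | (trail & 0x3F))

    print(decode_two_byte(0xC0, 0xB1))  # prints '1' (U+0031)
    # So "USER" C0 B1 "0" collapses to "USER10", which explains the True response.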

