Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

encoding - R: UTF-8 character bytes as Latin-1 character bytes

I get UTF-8 character bytes displayed as Latin-1 characters. Examples include:

Latin-1 character bytes        ----- UTF-8 bytes
?¤?¤nn??k                      ----- ??nn?k
?<U+0084>?<U+0084>N?<U+0096>S  ----- ??n?s 
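
The effect in the table above can be reproduced in a few lines. This is only a minimal sketch: the test word is a made-up stand-in (not one of the strings from the table), and it assumes the damage comes from a single misreading of UTF-8 bytes as Latin-1.

```r
# Hypothetical test word; "\u00e4" is the letter a-umlaut.
original <- enc2utf8("\u00e4\u00e4ni")

# In UTF-8 each a-umlaut occupies two bytes: c3 a4.
utf8_bytes <- charToRaw(original)
print(utf8_bytes)

# Reinterpret the very same bytes as Latin-1: every byte becomes one character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"

# Re-encode to UTF-8 so the result prints the same way in a UTF-8 terminal.
mojibake <- enc2utf8(misread)

# The original has 4 characters, the misread version has 6.
print(nchar(original, type = "chars"))
print(nchar(mojibake, type = "chars"))
```

Each umlaut turns into two characters because its two UTF-8 bytes are each treated as a separate Latin-1 character.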

My session info:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

So what kind of settings do I need in R to handle umlauts correctly (not to return UTF-8 bytes as Latin-1 character bytes)?
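
One possible angle, sketched under the assumption that the text is read from a file (the question does not say where the data come from): declare the encoding at read time rather than relying on session settings. The temporary file and test word below are hypothetical stand-ins for the real input.

```r
# Write known UTF-8 bytes to a temporary file (stand-in for the real data).
path <- tempfile(fileext = ".txt")
writeBin(charToRaw(enc2utf8("\u00e4\u00e4ni")), path)

# Declare the encoding when reading instead of relying on the session locale.
lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
print(Encoding(lines))
print(nchar(lines, type = "chars"))

# Inspect what the session itself assumes about character data.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])
```

If `Encoding()` reports "unknown" or "latin1" for strings that are really UTF-8, the bytes are probably intact and only the declared encoding is wrong.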

Related?

  1. Turn Unicode into Umlaut in R on Mac (Facebook Data)

  2. https://stackoverflow.com/a/22945233/164148

  3. Apparently by this, I need to

If you call Sys.setlocale with "LC_CTYPE" or "LC_ALL" to change the system locale while RStudio is running, you may run into some minor issues as RStudio assumes the system encoding doesn't change. If you are on Windows, we recommend you only call Sys.setlocale in .Rprofile. If you are on Mac or Linux and want to change the system locale, please visit the support forum and let us know your scenario.

  1. Does there exist some simple tool to convert the Latin-1 character bytes to UTF-8 character bytes?
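
For the conversion itself, base R's `iconv()` may be enough. The helper below is a hypothetical sketch (`fix_mojibake` is not an existing function), and it assumes the text was damaged exactly once by reading UTF-8 bytes as Latin-1; strings that were damaged differently will not be repaired by it.

```r
# Hypothetical helper (not part of base R): undo one round of
# "UTF-8 bytes read as Latin-1" damage.
fix_mojibake <- function(x) {
  # Step 1: encode the misread characters back to single Latin-1 bytes,
  # which recovers the original UTF-8 byte sequence.
  bytes <- iconv(enc2utf8(x), from = "UTF-8", to = "latin1")
  # Step 2: declare those bytes to be what they really are.
  Encoding(bytes) <- "UTF-8"
  bytes
}

# Made-up damaged input: A-tilde + currency sign, twice, followed by "ni".
broken <- "\u00c3\u00a4\u00c3\u00a4ni"
fixed <- fix_mojibake(broken)
print(fixed)
print(charToRaw(fixed))
```

If the bytes are already correct and only the encoding mark is wrong, step 2 alone (`Encoding(x) <- "UTF-8"`) should be sufficient.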

P.S. I have now tested this in R on both Linux and OS X, and I get the same problem in both: the UTF-8 character bytes are interpreted as Latin-1 character bytes.


1 Answer

Waiting for answers
