This question arose out of the following question on tex.sx: Sweave generating invalid LaTeX. The problem seems to be that Sweave is not recognizing the encoding of the file, despite the locale being set to UTF-8 and the .Rnw file being saved as UTF-8. The end result is that any .Rnw file that contains non-ASCII characters ends up producing NA in the resultant .tex file. As you can read in the comments to that question, another user cannot reproduce the problem with what is apparently an identical setup (R 2.13.1 on a Mac). Here's a minimal document that fails.
Update
Based on Aaron's suggestions, I've added sessionInfo to the .Rnw file, and now the real problem reveals itself. When Sweave processes the file, it seems to change the locale.
.Rnw file
\documentclass{article}
\usepackage[utf8]{inputenc}
\begin{document}
Some non-ascii text: éüá?
<<>>=
sessionInfo()
@
\end{document}
Running this through Sweave produces the following .tex file. The line containing the non-ASCII characters has been converted into NA by Sweave. It also seems that the locale has been changed:
Resultant .tex file
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{Sweave}
\begin{document}
NA
\begin{Schunk}
\begin{Sinput}
> sessionInfo()
\end{Sinput}
\begin{Soutput}
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tools_2.13.1
\end{Soutput}
\end{Schunk}
\end{document}
sessionInfo() from within R.app returns:
> sessionInfo()
R version 2.13.1 (2011-07-08)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
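To compare the two environments directly, the same checks can be run both in R.app and inside a chunk of the .Rnw file. This is a small sketch using base R only; which environment variables actually differ on a given setup is an assumption to verify, not something established above, and the variable names are illustrative.

```r
# Character-type locale of the running R process.
ctype <- Sys.getlocale("LC_CTYPE")

# Environment variables that commonly determine the locale at startup
# (assumption: one of these differs between the two ways of running R).
env_vars <- Sys.getenv(c("LANG", "LC_ALL", "LC_CTYPE"))

# TRUE when R considers the session to be running in a UTF-8 locale.
is_utf8 <- l10n_info()[["UTF-8"]]

print(ctype)
print(env_vars)
print(is_utf8)
```

If the values printed from the Sweave run differ from those in R.app, the difference lies in how the R process is started rather than in the .Rnw file itself.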
Update (Response to Aaron)
> text <- readLines("sweave-enc-test.Rnw", warn = FALSE)
> enc <- tools:::.getVignetteEncoding(text, convert = TRUE)
>
> text
[1] "\\documentclass{article}" "\\usepackage[utf8]{inputenc}" "\\begin{document}"
[4] "Some non-ascii text: éüá?" "\\end{document}"
> enc
[1] "UTF-8"
> iconv(text, enc, "")
[1] "\\documentclass{article}" "\\usepackage[utf8]{inputenc}" "\\begin{document}"
[4] "Some non-ascii text: éüá?" "\\end{document}"
(This is the output from within the R console in R.app.)
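The NA in the .tex output would be consistent with a failed conversion to the native encoding: iconv() returns NA for an element it cannot convert when no substitute string is given. What follows is a minimal sketch, assuming that Sweave performs a conversion like the iconv(text, enc, "") call above and that the C locale behaves like plain ASCII. Both are assumptions and are not confirmed here; the variable names are illustrative.

```r
# Sample line with non-ASCII characters, written with \u escapes so this
# script is itself plain ASCII.
line <- "Some non-ascii text: \u00e9\u00fc\u00e1"

# Target "" means the current native encoding, so the result depends on
# the locale of the R process that runs this.
to_native <- iconv(line, "UTF-8", "")

# "ASCII" stands in for the native encoding under the C locale (assumption).
# With the default sub = NA, an element that cannot be converted becomes NA.
to_ascii <- iconv(line, "UTF-8", "ASCII")

print(to_native)
print(to_ascii)
```

Under this reading, the conversion succeeds in the R.app console because its locale is en_US.UTF-8, and fails during the Sweave run because the locale there is C.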