Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
890 views
in Technique[技术] by (71.8m points)

r - Reading Rdata file with different encoding

I have an .RData file to read on my Linux (UTF-8) machine, but I know the file is in Latin1 because I've created them myself on Windows. Unfortunately, I don't have access to the original files or a Windows machine and I need to read those files on my Linux machine.

To read an Rdata file, the normal procedure is to run load("file.Rdata"). Functions such as read.csv have an encoding argument that you can use to solve those kind of issues, but load has no such thing. If I try load("file.Rdata", encoding = latin1), I just get this (expected) error:

Error in load("file.Rdata", encoding = "latin1") : unused argument (encoding = "latin1")

What else can I do? My files are loaded with text variables containing accents that get corrupted when opened in an UTF-8 environment.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Thanks to 42's comment, I've managed to write a function to recode the file:

fix.encoding <- function(df, originalEncoding = "latin1") {
  numCols <- ncol(df)
  for (col in 1:numCols) Encoding(df[, col]) <- originalEncoding
  return(df)
}

The meat here is the command Encoding(df[, col]) <- "latin1", which takes column col of dataframe df and converts it to latin1 format. Unfortunately, Encoding only takes column objects as input, so I had to create a function to sweep all columns of a dataframe object and apply the transformation.

Of course, if your problem is in just a couple of columns, you're better off just applying the Encoding to those columns instead of the whole dataframe (you can modify the function above to take a set of columns as input). Also, if you're facing the inverse problem, i.e. reading an R object created in Linux or Mac OS into Windows, you should use originalEncoding = "UTF-8".


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...