Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

csv - How to read \" double-quote escaped values with read.table in R

I am having trouble reading a file in R that contains lines like the one below.

"_:b5507F4C7x59005","Fabiana D\"atri"

Any ideas? How can I make read.table understand that \" is an escaped quote?

Cheers, Alexandre


1 Answer


It seems to me that read.table/read.csv cannot handle backslash-escaped quotes.

...But I think I have an (ugly) workaround, inspired by @nullglob:

  • First read the file WITHOUT a quote character. (This won't handle embedded commas, as @Ben Bolker noted.)
  • Then go through the string columns and remove the quotes.

The test file looks like this (I added a non-string column for good measure):

13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"

And here is the code:

# Generate test file (in R source, \" is a literal quote and \\ a literal backslash)
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\""), "foo.txt")

# Read ignoring quotes, so every " and \ stays in the data as-is
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)

# Go through and clean up the character columns
for (i in seq_len(NCOL(tbl))) {
    if (is.character(tbl[[i]])) {
        x <- tbl[[i]]
        x <- substr(x, 2, nchar(x)-1) # Remove surrounding quotes
        tbl[[i]] <- gsub('\\"', '"', x, fixed=TRUE) # Unescape: \" becomes "
    }
}

The output is then correct:

> tbl
  V1   V2          V3   V4
1 13  foo  Fab D"atri  bar
2 21 foo2 Fab D"atri2 bar2
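
A possible alternative, offered only as a sketch and not as part of the answer above: read.csv does understand CSV-style doubled quotes (""), so one could rewrite each \" as "" before parsing. This assumes the file fits in memory and that \" is the only escape sequence in use (an escaped backslash directly before a closing quote would be mangled). The name tbl2 is just illustrative.

```r
# Same two-line test file as in the answer: the third field contains \"
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\""), "foo.txt")

# Read the raw lines and turn every \" into "" (a doubled quote)
lines <- readLines("foo.txt")
lines <- gsub('\\"', '""', lines, fixed = TRUE)

# Parse with normal quote handling; doubled quotes become a single "
tbl2 <- read.csv(text = lines, header = FALSE, stringsAsFactors = FALSE)
print(tbl2)
```

Because the quote character stays active during parsing, this variant should also cope with commas inside quoted fields, which the quote='' approach cannot.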
