Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - R: How to replace space (' ') in string with a *single* backslash and space ('\ ')

I've searched many times and haven't found the answer here or elsewhere. I want to replace each space ' ' in variables containing file names with '\ ', a backslash followed by a space. (A use case is shell commands, where escaping the spaces keeps each file name from being split into a list of arguments.) I have looked through the StackOverflow question "how to replace single backslash in R", and find that many combinations do work as advertised:

> gsub(" ", "\\\\", "a b")
[1] "a\\b"

> gsub(" ", "\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

but try these with a single-slash version, and R ignores it:

> gsub(" ", "\\ ", "a b")
[1] "a b"

> gsub(" ", "\ ", "a b", fixed = TRUE)
[1] "a b"

Going in the opposite direction, removing slashes from a string, it works for two:

> gsub("\\\\", " ", "a\\b")
[1] "a b"

> gsub("\\", " ", "a\\b", fixed = TRUE)
[1] "a b"

However, for single slashes some inner perversity in R prevents me from even attempting to remove them:

> gsub("\\", " ", "a\\b")
Error in gsub("\\", " ", "a\\b") : 
  invalid regular expression '\', reason 'Trailing backslash'

> gsub("\", " ", "a\b", fixed = TRUE)
Error: unexpected string constant in "gsub("\", " ", ""

The 'invalid regular expression' message is telling us something, but I don't see what. (Note too that the perl = TRUE option does not help.)

Even with three backslashes R fails to notice even one:

> gsub(" ", "\\\ ", "a b")
[1] "a b"

The pattern extends too! Even multiples of two work:

> gsub(" ", "\\\\\\\\", "a b")
[1] "a\\\\b"

but not odd multiples (this should give '\\\ '):

> gsub(" ", "\\\\\\ ", "a b")
[1] "a\\ b"

> gsub(" ", "\\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

(I would expect 3 slashes, not two.)

My two questions are:

  • How can my goal of replacing a ' ' with a '\ ' be accomplished?
  • Why did the odd number-slash variants of the replacements fail, while the even number-slash replacements worked?

For shell commands a simple work-around is to quote the file names, but part of my interest is just wanting to understand what is going on with R's regex engine.
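
The quoting work-around mentioned above could look like the following minimal sketch. It assumes base R's shQuote() with the POSIX-style type = "sh"; the file name is made up for illustration.

```r
# Hypothetical file name containing a space.
file_name <- "my file.txt"

# Wrap the name in single quotes so a POSIX shell sees one argument.
quoted <- shQuote(file_name, type = "sh")

writeLines(quoted)   # 'my file.txt'
```

Quoting sidesteps backslash escaping entirely, which is why it is the simpler route when the only goal is passing names to a shell.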

1 Answer

Get ready for a face-palm, because this:

> gsub(" ", "\\ ", "a b", fixed = TRUE)
[1] "a\\ b"

is actually working.

The two backslashes you see are just the R console's way of displaying a single backslash, which is escaped when printed to the screen.
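
A quick way to check this without leaving the console is to compare the printed form with the raw characters. This is a small sketch using only base R functions (print, writeLines, nchar); the variable name x is chosen here for illustration.

```r
# Replace each space with a backslash followed by a space (literal match).
x <- gsub(" ", "\\ ", "a b", fixed = TRUE)

print(x)        # escaped display form: [1] "a\\ b"
writeLines(x)   # raw characters:       a\ b
nchar(x)        # 4 characters: a, backslash, space, b
```

print() shows the string the way it would have to be typed in source code, while writeLines() shows what the string actually contains.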

To confirm that the replacement with a single backslash is indeed working, try writing the output to a text file and inspect it yourself:

f <- file("C:\\output.txt")
writeLines(gsub(" ", "\\", "a b", fixed = TRUE), f)
close(f)

In output.txt you should see the following:

a\b
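
As for why the odd and even counts behaved differently, one plausible reading is that two layers of escaping are involved. First, the R parser turns each "\\" in source code into a single backslash. Second, in the default regex mode, gsub() treats a backslash in the replacement as an escape for the next character, so two actual backslashes are needed to emit one. With fixed = TRUE the second layer is skipped. A leftover odd backslash directly before a space appears to be consumed as an escaped space, which would match the "a b" results in the question. The sketch below, with illustrative variable names, shows the two layers on the original goal:

```r
# Layer 1: the parser collapses each "\\" in source code to one backslash.
nchar("\\")     # 1
nchar("\\\\")   # 2

# Layer 2 (regex mode only): the replacement string is unescaped once more,
# so four backslashes in source code yield one backslash in the result.
regex_result <- gsub(" ", "\\\\ ", "a b")               # default regex mode
fixed_result <- gsub(" ", "\\ ", "a b", fixed = TRUE)   # literal mode

identical(regex_result, fixed_result)   # TRUE
writeLines(regex_result)                # a\ b
```

Either call produces a single backslash followed by a space; fixed = TRUE simply needs half as many backslashes in the source.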
