I am using rvest to parse a website. I'm hitting a wall with these little non-breaking spaces. How does one remove the whitespace that is created by the &nbsp; entity in a parsed HTML document?
library("rvest")
library("stringr")
minimal <- html("<!doctype html><title>blah</title> <p>&nbsp;foo")
bodytext <- minimal %>%
  html_node("body") %>%
  html_text
Now I have extracted the body text:
bodytext
[1] " foo"
However, I can't remove that pesky bit of whitespace!
str_trim(bodytext)
gsub(pattern = " ", "", bodytext)
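A likely reason both attempts leave the string unchanged, assuming the leading character really is the non-breaking space U+00A0: it is a different code point from the ASCII space (U+0020) used in the gsub pattern, and it may not count as whitespace for str_trim in some stringr versions. The sketch below uses a hand-built string as a hypothetical stand-in for the extracted text, so it runs without rvest, and matches the code point explicitly.

```r
# Hypothetical stand-in for the extracted body text:
# a non-breaking space (U+00A0) followed by "foo".
bodytext <- "\u00A0foo"

# A literal ASCII space does not match U+00A0, so nothing is removed.
unchanged <- gsub(" ", "", bodytext, fixed = TRUE)

# Matching the code point itself removes the non-breaking space.
cleaned <- gsub("\u00A0", "", bodytext, fixed = TRUE)
```

Note that this removes every non-breaking space in the string, not only leading or trailing ones; anchoring the pattern would be needed to trim just the ends.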