Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml - extract data from raw html in R

I am trying to extract all the values in all tabs from this page: http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm

I first tried downloading the data as an Excel file, but that was not possible; I can only download it as a text file. If I try to read it directly from the web page, I get the raw HTML. I am stuck on how to extract these values. Here is the code I have tried so far:

library(RCurl)
library(XML)
url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
# Use forward slashes (or "\\") in Windows paths: "E:\i..." is an invalid escape in R
download.file(url = url, destfile = "E:/indiaprecip")

1 Answer

Use the function "htmlTreeParse" from the XML package:

library(XML)
html <- htmlTreeParse("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm",
                     useInternalNodes = TRUE)
# List the name attribute of every <meta> tag on the page
xpathSApply(html, "//meta/@name")
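As a quick self-contained check of what this kind of call returns, here is the same idea run on a small inline HTML string instead of the live page (the page content is invented for illustration). `htmlTreeParse(..., asText = TRUE)` parses the text itself rather than a URL, and `xpathSApply()` can either return attribute values directly or apply a function such as `xmlGetAttr` to each matched node.

```r
library(XML)

# Hypothetical inline page used instead of the live IMD URL
page <- '<html><head>
<meta name="description" content="Weekly rainfall">
<meta name="keywords" content="rain, IMD">
</head><body><p>placeholder</p></body></html>'

# asText = TRUE: parse the string itself rather than treating it as a file/URL
doc <- htmlTreeParse(page, asText = TRUE, useInternalNodes = TRUE)

# An attribute XPath returns the attribute values (named after the attribute)
metaNames <- unname(xpathSApply(doc, "//meta/@name"))

# Alternatively, apply a function to each matched <meta> node
metaContent <- xpathSApply(doc, "//meta", xmlGetAttr, "content")

print(metaNames)
print(metaContent)
```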

In your case, however, there is another problem: the data you want is inside an HTML frame, so you have to fetch the frame's own page separately. The code below does that and parses the result:

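library(XML)
library(RCurl)
url <- "http://www.imd.gov.in/section/hydro/dynamic/rfmaps/weekrain.htm"
html <- htmlTreeParse(url, useInternalNodes = TRUE)

# The rainfall data lives in a frame: build the absolute URL of the first frame's page
frameUrl <- paste("http://www.imd.gov.in/section/hydro/dynamic/rfmaps/",
                  xpathSApply(html, "//frame[1]/@src"),
                  sep = "")

# Download the frame page, sending the main page as the Referer
htmlWithData <- getURL(frameUrl,
                       httpheader = c("User-Agent" = "RCurl",
                                      "Referer" = url))

# Parse the downloaded HTML text (not a URL) and select the tables in its body
dataXml <- htmlTreeParse(htmlWithData, isURL = FALSE, useInternalNodes = TRUE)
xpathSApply(dataXml, "//body/table")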
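Note that the last call returns the `<table>` nodes themselves, not their values. One way to turn them into data frames is `readHTMLTable()` from the same XML package. Below is a minimal sketch on a small hypothetical HTML string (the column names and numbers are made up, since the real page layout is not shown here); with the live page you would pass `dataXml` instead of `frameDoc` and pick the right element of the returned list.

```r
library(XML)

# Hypothetical stand-in for the frame page's HTML (not the real IMD layout)
frameHtml <- '<html><body>
<table>
  <tr><th>Subdivision</th><th>Actual</th><th>Normal</th></tr>
  <tr><td>Kerala</td><td>12.5</td><td>30.1</td></tr>
  <tr><td>Konkan and Goa</td><td>40.0</td><td>55.3</td></tr>
</table>
</body></html>'

frameDoc <- htmlParse(frameHtml, asText = TRUE)

# One data frame per <table>; header = TRUE takes the first row as column names
tables <- readHTMLTable(frameDoc, header = TRUE)
rain <- tables[[1]]

# Cell values are read as text, so convert the numeric columns explicitly
rain$Actual <- as.numeric(as.character(rain$Actual))
rain$Normal <- as.numeric(as.character(rain$Normal))
print(rain)
```

If the page contains several tables, `tables` has one entry per table, so you may need to inspect `length(tables)` and pick the one that holds the rainfall figures.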

...