I am doing some basic web scraping with rvest and I am getting results back, but the data doesn't line up. I get the items, but the two vectors I scrape don't match up item for item the way they appear on the site, so they can't be joined in a data.frame.
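library(rvest)
library(tidyverse)

base_url <- "https://www.uchealth.com/providers"

# Download and parse the page once, then reuse it for both queries
page <- read_html(base_url)

# Text of every element whose class attribute is exactly "locations"
loc <- page %>%
  html_nodes('[class=locations]') %>%
  html_text()

# The attribute value contains a space, so it has to be quoted
dept <- page %>%
  html_nodes('[class="department last"]') %>%
  html_text()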
I was expecting to be able to create a data frame with these columns:
Location Department
Any suggestions? I was wondering if there is an index that would keep these items together, but I didn't see anything.
EDIT: I also tried the following, without any luck. The location seems to pick up an erroneous starting value:
scraping <- function(base_url = "https://www.uchealth.com/providers") {
  page <- read_html(base_url)

  loc <- page %>%
    html_nodes('[class=locations]') %>%
    html_text()

  dept <- page %>%
    html_nodes('[class=specialties]') %>%
    html_text()

  # Note: ifelse() with a length-one test returns a single value,
  # so only the first element of each vector is kept here
  data.frame(
    loc = ifelse(length(loc) == 0, NA, loc),
    dept = ifelse(length(dept) == 0, NA, dept),
    stringsAsFactors = FALSE
  )
}
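One common way to keep paired fields together is to select the container element of each record first, then pull each field out of that container with html_node() (singular), which returns one result per container and NA where the field is missing. The sketch below only illustrates the idea on made-up markup: the container class "provider" and the inline HTML are assumptions, not the real structure of the uchealth.com page, so the selectors would need to be adapted after inspecting the site.

```r
library(rvest)

# Hypothetical markup standing in for the real page. The class name
# "provider" and the element layout are assumptions for illustration.
page <- read_html('
  <div class="provider">
    <span class="locations">Denver</span>
    <span class="specialties">Cardiology</span>
  </div>
  <div class="provider">
    <span class="specialties">Oncology</span>
  </div>
  <div class="provider">
    <span class="locations">Aurora</span>
    <span class="specialties">Neurology</span>
  </div>
')

# One node per record, so every row is built from a single container
cards <- html_nodes(page, ".provider")

# html_node() yields exactly one result per card (NA if nothing matches),
# which keeps both columns the same length and in the same order
providers <- data.frame(
  loc  = html_text(html_node(cards, ".locations"), trim = TRUE),
  dept = html_text(html_node(cards, ".specialties"), trim = TRUE),
  stringsAsFactors = FALSE
)

print(providers)
```

With html_nodes() (plural) on the whole page, a record with a missing location simply contributes nothing to the vector, which shifts every later value up by one. Extracting per container avoids that shift.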