I am attempting to use rvest to spider a webpage that requires an email/password login on a form.
rm(list=ls())
library(rvest)
### Trying to sign into a form using email/password
url <-"http://www.perfectgame.org/" ## page to spider
pgsession <-html_session(url) ## create session
pgform <-html_form(pgsession)[[1]] ## pull form from session
set_values(pgform, `ctl00$Header2$HeaderTop1$tbUsername` = "[email protected]")
set_values(pgform, `ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
submit_form(pgsession,pgform,submit=`ctl00$Header2$HeaderTop1$Button1`)
This gives me the following error message:
Error in submit_request(form, submit) :
object 'ctl00$Header2$HeaderTop1$Button1' not found
If I submit the form without specifying the submit parameter, I get this:
Submitting with 'ctl00$Header2$HeaderTop1$Button1'
Error in function (type, msg, asError = TRUE) : <url> malformed
I also tried passing the parameters directly to httr as mentioned in this question: How can I POST a simple HTML form in R?, but the "submit" parameter did not accept the submit button either with backwards quotes (``), quotation marks, or without any quotes:
library(httr)
url <- "http://www.perfectgame.org/Rankings/Players/Default.aspx?gyear=2015&num=500"
fd <- list(
submit = `ctl00$Header2$HeaderTop1$Button1`,
`ctl00$Header2$HeaderTop1$tbUsername` = "[email protected]",
`ctl00$Header2$HeaderTop1$tbPassword` = "mypassword")
resp<-POST(url, body=fd, encode="form")
content(resp)
Any ideas for how I can log in from an R session and spider the data that's behind the login wall?
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…