Since you're using Google News, an easier method than scraping is to read the RSS feed for your keyword and pull it into a dataframe. Luckily, the {tidyRSS} package does exactly this.
For example, this is the feed URL for the keyword "apple":
https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en
Learn how to customize this URL here. You can search by geolocation if you wish.
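If you want to search several keywords or regions, it can help to build the URL programmatically. Here is a minimal sketch. `build_google_news_url()` is a hypothetical helper, not part of tidyRSS. It only reuses the `q`, `hl`, `gl` and `ceid` parameters from the example URL above, and the US values in the last line are an assumption about how other editions are addressed.

```r
# Hypothetical helper (not part of tidyRSS): build a Google News RSS search URL.
# The q, hl, gl and ceid parameters mirror the example URL above;
# the defaults reproduce its English (India) edition.
build_google_news_url <- function(query, hl = "en-IN", gl = "IN", ceid = "IN:en") {
  # Percent-encode the query so spaces and symbols are URL-safe
  q <- utils::URLencode(query, reserved = TRUE)
  paste0(
    "https://news.google.com/rss/search?q=", q,
    "&hl=", hl, "&gl=", gl, "&ceid=", ceid
  )
}

# Same keyword, assumed parameter values for the US English edition
build_google_news_url("apple", hl = "en-US", gl = "US", ceid = "US:en")
```

Multi-word queries such as `"climate change"` are encoded as `climate%20change`, so you can pass the result straight to `tidyfeed()`.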
After you install {tidyRSS}, you can use it like so:
```r
# install.packages("tidyRSS")
library(tidyRSS)

# Feed URL for the keyword "apple"
feed_url <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

# From the package vignette: strip HTML tags and parse dates into date-times
google_news <- tidyfeed(
  feed_url,
  clean_tags = TRUE,
  parse_dates = TRUE
)
```
This gives you a dataframe with many variables that describe each article. You can choose which ones to keep.
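One way to keep only the variables you need is to subset by name. This is a sketch under assumptions. `select_article_cols()` is a hypothetical helper, and the default column names (`item_title`, `item_link`, `item_pub_date`) are what I'd expect tidyRSS to return for an RSS feed. Check `names(google_news)` and adjust the list for your feed.

```r
# Hypothetical helper: keep only the article columns you care about.
# The default names are assumed tidyRSS output columns; verify them
# with names(google_news) before relying on this.
select_article_cols <- function(df,
                                cols = c("item_title", "item_link", "item_pub_date")) {
  # Silently skip any requested column the feed did not return
  keep <- intersect(cols, names(df))
  df[, keep, drop = FALSE]
}
```

Usage: `articles <- select_article_cols(google_news)`. Indexing with `drop = FALSE` keeps the result a data frame (or tibble) even when only one column survives.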