Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Scraping Google News with Rvest for Keywords

I want to compare news articles from different countries by how often they use a specific keyword.

My idea is to scrape Google News using Rcrawler:

Rcrawler(Website = "https://news.google.com/topics/CAAqIggKIhxDQkFTRHdvSkwyMHZNREZqY0hsNUVnSmtaU2dBUAE?hl=de&gl=DE&ceid=DE%3Ade", MaxDepth = 5, KeywordsFilter = c("Keyword"), KeywordsAccuracy = 99)

Then I would just count the results I get back. I'm not sure whether this is the best method, or whether it's even correct, but I'm new to R and it's the best approach I can currently think of.



1 Answer


Since you're using Google News, instead of scraping this way, an easier method would be to access the RSS feed for that particular keyword and pull that into a dataframe. Luckily, there is the {tidyRSS} package that you can use to do just this.

Here is an example of such a feed URL:

https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en

You can customize this URL: q holds the search term, while hl, gl, and ceid select the language and country edition, so you can search by geolocation if you wish.
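Since the goal is to compare countries, it may help to generate one feed URL per edition. The following is a minimal sketch, not an official API: google_news_rss_url is a hypothetical helper, and it assumes the hl, gl, and ceid parameters follow the same pattern as the example URL above (hl=<lang>-<country>, gl=<country>, ceid=<country>:<lang>), which may not hold for every edition.

```r
# Hypothetical helper: build a Google News RSS search URL for one edition.
# Assumption: hl, gl, and ceid follow the pattern of the example URL above.
google_news_rss_url <- function(query, lang, country) {
  paste0(
    "https://news.google.com/rss/search",
    "?q=", utils::URLencode(query, reserved = TRUE),  # escape spaces etc.
    "&hl=", lang, "-", country,
    "&gl=", country,
    "&ceid=", country, ":", lang
  )
}

# One row per country edition to compare (illustrative choices)
editions <- data.frame(
  lang    = c("en", "de"),
  country = c("IN", "DE"),
  stringsAsFactors = FALSE
)

# paste0() is vectorized, so this yields one URL per row
editions$url <- google_news_rss_url("apple", editions$lang, editions$country)
```

Each value in editions$url can then be passed to tidyfeed() in place of a hard-coded address.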

After you install tidyRSS, you can implement it like so:

library(tidyRSS)

# Google News RSS search feed for the keyword "apple"
keyword <- "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en"

# Call adapted from the package vignette
google_news <- tidyfeed(
  keyword,
  clean_tags = TRUE,
  parse_dates = TRUE
)

This gives you a dataframe with many variables that describe each article. You can choose which ones to keep.
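To get from that dataframe to the keyword count the question asks about, something like the following could work. It is only a sketch: count_keyword is a hypothetical helper, and the column names item_title and item_description are what I would expect tidyfeed() to return for an RSS feed, so check names(google_news) before relying on them. A small hand-made data frame stands in for a real feed here.

```r
# Hypothetical helper: count articles whose text mentions a keyword.
# Assumption: `feed` has character columns named in `cols`.
count_keyword <- function(feed, keyword,
                          cols = c("item_title", "item_description")) {
  cols <- intersect(cols, names(feed))
  if (length(cols) == 0) stop("none of the requested columns are in `feed`")
  # Combine the chosen columns into one string per article
  text <- do.call(paste, c(feed[cols], sep = " "))
  # Case-insensitive literal match (no regex interpretation of `keyword`)
  sum(grepl(tolower(keyword), tolower(text), fixed = TRUE))
}

# Toy stand-in for a feed returned by tidyfeed()
toy_feed <- data.frame(
  item_title = c("Apple unveils new phone", "Markets rally", "Weather update"),
  item_description = c("Launch event recap", "Apple shares rise", "Rain expected"),
  stringsAsFactors = FALSE
)

count_keyword(toy_feed, "apple")  # 2
```

With real data the call would be count_keyword(google_news, "apple"). Keep in mind that a search feed is already filtered by the query, so the count may reflect how many items the feed returns rather than overall usage in that country.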

