I'm trying to crawl review data from Amazon in a Jupyter notebook, but the server responds with HTTP 503. Does anyone know what's wrong?
Here is the URL:
https://www.amazon.com/Apple-MWP22AM-A-AirPods-Pro/product-reviews/B07ZPC9QD4/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber=
Here is my code:
import re, requests, csv
from bs4 import BeautifulSoup
from time import sleep

def reviews_info(div):
    review_text = div.find("div", "a-row a-spacing-small review-data").get_text()
    review_author = div.find("span", "a-profile-name").get_text()
    review_stars = div.find("span", "a-icon-alt").get_text()
    on_review_date = div.find("span", "a-size-base a-color-secondary review-date").get_text()
    review_date = [x.strip() for x in re.sub("on ", "", on_review_date).split(",")]
    return {"review_text": review_text,
            "review_author": review_author,
            "review_stars": review_stars,
            "review_date": review_date}

base_url = 'https://www.amazon.com/Apple-MWP22AM-A-AirPods-Pro/product-reviews/B07ZPC9QD4/ref=cm_cr_arp_d_paging_btm_next_2?ie=UTF8&reviewerType=all_reviews&pageNumber='
reviews = []
NUM_PAGES = 8

for page_num in range(1, NUM_PAGES + 1):
    print("souping page", page_num, ",", len(reviews), "data collected")
    url = base_url + str(page_num)
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    for div in soup('div', 'a-section review'):
        reviews.append(reviews_info(div))
    sleep(30)
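For context: a 503 from Amazon is typically its bot-detection response rather than a real server outage, and the default `requests` User-Agent (`python-requests/x.y`) is commonly refused. A minimal sketch of the same fetch with browser-like headers — the header values below are illustrative assumptions, and Amazon may still block or throttle the request:

```python
import requests

# Example browser-like headers; these values are illustrative and are not
# guaranteed to pass Amazon's bot detection.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch_page(url: str) -> str:
    """Fetch one page with browser-like headers; raise on 4xx/5xx."""
    resp = requests.get(url, headers=HEADERS, timeout=10)
    resp.raise_for_status()  # a 503 would raise requests.HTTPError here
    return resp.text         # note: .text is an attribute, not a method

# Usage (performs a live network call, so not executed here):
# html = fetch_page(base_url + "1")
```

If this still returns 503, the block is likely IP-based and headers alone won't fix it.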
Finally, I tried
    requests.get(url)
and the output was
    <Response [503]>
I also tried
    requests.get(url).text()
and got
    TypeError: 'str' object is not callable
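That second error is unrelated to the 503: `Response.text` is a property that already returns a `str`, so the trailing parentheses try to call the string itself. A tiny illustration (no network needed; the string is a stand-in for what `.text` returns):

```python
# requests.get(url).text evaluates to a plain str; adding () calls that str.
html = "<html>...</html>"  # stand-in for the str that .text returns

try:
    html()  # same mistake as requests.get(url).text()
except TypeError as err:
    print(err)  # 'str' object is not callable
```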
Has Amazon blocked my crawler? I'd appreciate any answers!