I would like to achieve the following:
- Run spider until finished
- Count scraped items
- If number_of_items > x: reason=finished (nothing to be done)
- If number_of_items <= x: reason=insufficient_number (change reason accordingly)
The first two steps work fine. However, I'm struggling with the last two, because I'm not sure how to set the close reason manually. So far I have tried the code below.
import scrapy


class MySpider(scrapy.Spider):
    name = 'my_spider'  # placeholder; Scrapy requires a spider name
    start_urls = ['https://example.com']

    def start_requests(self):
        yield scrapy.Request(url=self.start_urls[0], callback=self.parse)

    def close(self, spider, reason):
        # here I want to change the reason.
        # I tried to change spider.crawler.stats.get_stats()['finish_reason'],
        # however this only changes the stats (of course),
        # but not the value in INFO: Closing spider (finished).
        pass

    def parse(self, response):
        ...
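One direction that might work is sketched below. It is a rough sketch under assumptions, not a confirmed solution. It assumes Scrapy 2.6 or later, where a `spider_idle` handler may raise `CloseSpider` to supply a custom closing reason. The names `choose_reason`, `ItemCountCloseReason`, and the `MIN_ITEMS` setting (standing in for `x`) are made up for illustration.

```python
def choose_reason(item_count, threshold):
    """Return the close reason for a finished crawl, following the rule in the list above."""
    return "finished" if item_count > threshold else "insufficient_number"


class ItemCountCloseReason:
    """Hypothetical extension: replaces the close reason when too few items were scraped."""

    def __init__(self, stats, threshold):
        self.stats = stats
        self.threshold = threshold

    @classmethod
    def from_crawler(cls, crawler):
        # Imported lazily so choose_reason() stays usable without Scrapy installed.
        from scrapy import signals

        # MIN_ITEMS is an invented setting name standing in for "x".
        ext = cls(crawler.stats, crawler.settings.getint("MIN_ITEMS", 0))
        crawler.signals.connect(ext.spider_idle, signal=signals.spider_idle)
        return ext

    def spider_idle(self, spider):
        from scrapy.exceptions import CloseSpider

        # item_scraped_count is maintained by Scrapy's built-in CoreStats extension.
        count = self.stats.get_value("item_scraped_count", 0)
        reason = choose_reason(count, self.threshold)
        if reason != "finished":
            # Assumption: Scrapy >= 2.6 uses this reason when it closes the idle spider.
            raise CloseSpider(reason)
```

To try it, the class would have to be registered in the `EXTENSIONS` setting under whatever module path it lives in, with `MIN_ITEMS` set to the threshold. If the count is above the threshold, the handler does nothing and the spider should close with the default `finished` reason.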
Thanks for your support on this.