I have a website with many pages like this:
mywebsite/?page=1
mywebsite/?page=2
...
...
...
mywebsite/?page=n
Each page has links to players. When you click on a link, you go to that player's page.
Users can add players, so I will end up with this situation:
Player1 has a link in page=1.
Player10 has a link in page=2.
After an hour, because users have added new players, I will have this situation:
Player1 has a link in page=3.
Player10 has a link in page=4.
The new players, like Player100 and Player101, have links in page=1.
I want to scrape all players to get their information. However, I don't want to scrape players that I have already scraped. My question is how to use the BaseDupeFilter in Scrapy to identify which players have already been scraped and which have not. Remember, I still want to scrape the listing pages of the website on every run, because each page will show different players each time.
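To make the goal concrete, here is a minimal sketch of the filtering logic, not a definitive implementation. It mirrors the `request_seen(request)` interface of `scrapy.dupefilters.BaseDupeFilter` (returning True means "drop this request") but does not import Scrapy, so it runs on its own. In a real project the class would presumably subclass `BaseDupeFilter` and be enabled through the `DUPEFILTER_CLASS` setting. Assumptions: listing URLs are the ones carrying a `page` query parameter, every other URL is a player page, and the `PlayerDupeFilter` name, the `FakeRequest` stand-in, the `/player/...` URLs and the `seen_players.txt` file are all hypothetical.

```python
import os
import tempfile
from urllib.parse import urlparse, parse_qs


class PlayerDupeFilter:
    """Hypothetical filter: listing pages always pass, player pages pass once."""

    def __init__(self, path):
        # Path of a plain-text file holding one already-scraped player URL per line.
        self.path = path
        self.seen = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.seen = {line.strip() for line in f if line.strip()}

    @staticmethod
    def is_listing(url):
        # Assumption: only listing pages carry a "page" query parameter.
        return "page" in parse_qs(urlparse(url).query)

    def request_seen(self, request):
        url = request.url
        if self.is_listing(url):
            return False  # never filter listings, their content changes over time
        if url in self.seen:
            return True  # player already scraped, possibly in an earlier run
        # First visit: remember the player in memory and on disk.
        self.seen.add(url)
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(url + "\n")
        return False


class FakeRequest:
    """Stand-in for a request object; only the .url attribute is used here."""

    def __init__(self, url):
        self.url = url


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "seen_players.txt")
    dupefilter = PlayerDupeFilter(path)
    for url in ["http://mywebsite/?page=1",
                "http://mywebsite/player/1",
                "http://mywebsite/?page=1",
                "http://mywebsite/player/1"]:
        print(url, "->", "skip" if dupefilter.request_seen(FakeRequest(url)) else "crawl")
```

Because the seen URLs are written to a file, a second run that starts from page=1 again would still skip Player1 and Player10 while crawling Player100 and Player101. Note that in Scrapy, a request created with `dont_filter=True` bypasses the dupefilter entirely, which is another way to let the listing pages through.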
Thank you.