*** My code is for practice only!
I'm trying to scrape the name and teams of each FPL player from the Premier League website, https://www.premierleague.com/, and I've run into a problem with my code.
The problem is that it only gets the page with '-1' at the end of the URL, which I haven't even included in my pages list!
There isn't any logic to the page numbers: the base URL is https://www.premierleague.com/players?se=363&cl= and the number after the '=' seems to be random. So I created a list of the numbers and appended each one to the URL in a for loop.
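For reference, here is a minimal sketch of the URLs that loop ends up requesting, plus a small helper that reads the `cl` value back out of a URL. The helper name `cl_param` is made up for this example; it only uses the standard library and makes no requests.

```python
from urllib.parse import urlsplit, parse_qs

pl_url = 'https://www.premierleague.com/players?se=363&cl='
pages_list = ['1', '2', '131', '34']


def cl_param(url):
    """Return the value of the `cl` query parameter, or None if it is absent or empty."""
    values = parse_qs(urlsplit(url).query).get('cl')
    return values[0] if values else None


# The URLs built by the for loop, one per entry in pages_list.
page_urls = [pl_url + page for page in pages_list]

for url in page_urls:
    print(url, '-> cl =', cl_param(url))
```

Calling `cl_param(r.url)` on each response from `requests.get` and comparing it with the requested value might show whether the '-1' comes from a redirect. That is only a guess about where it appears, not something I have confirmed.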
My code:
import requests
from bs4 import BeautifulSoup
import pandas
from urllib.parse import urljoin

# `headers` was not defined in the snippet as posted; a browser-like
# User-Agent is assumed here so the code runs.
headers = {'User-Agent': 'Mozilla/5.0'}

plplayers = []
pl_url = 'https://www.premierleague.com/players?se=363&cl='
pages_list = ['1', '2', '131', '34']

for page in pages_list:
    # Fetch one listing page and collect the links to the player pages.
    r = requests.get(pl_url + page)
    c = r.content
    soup = BeautifulSoup(c, 'html.parser')
    player_names = soup.find_all('a', {'class': 'playerName'})
    for x in player_names:
        player_d = {}
        player_teams = []
        player_href = x.get('href')
        # urljoin avoids a double slash when the href already starts with '/'.
        player_info_url = urljoin('https://www.premierleague.com/', player_href)
        player_r = requests.get(player_info_url, headers=headers)
        player_c = player_r.content
        player_soup = BeautifulSoup(player_c, 'html.parser')
        team_tag = player_soup.find_all('td', {'class': 'team'})
        for team in team_tag:
            try:
                team_name = team.find('span', {'class': 'long'}).text
                if '(Loan)' in team_name:
                    # str.replace returns a new string, so the result must be assigned.
                    team_name = team_name.replace('(Loan)', '').strip()
                if team_name not in player_teams:
                    player_teams.append(team_name)
                player_d['NAME'] = x.text
                player_d['TEAMS'] = player_teams
            except AttributeError:
                # This cell has no <span class="long">, so find() returned None.
                pass
        plplayers.append(player_d)

df = pandas.DataFrame(plplayers)
df.to_csv('plplayers.txt')
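The loan-handling and de-duplication step can also be checked on its own, without any network access. This is a minimal sketch: `clean_team_names` is a hypothetical helper name, and the sample values are placeholders rather than scraped data.

```python
def clean_team_names(raw_names):
    """Strip a '(Loan)' marker from each name and drop duplicates, keeping order."""
    cleaned = []
    for name in raw_names:
        # str.replace returns a new string, so the result has to be kept.
        name = name.replace('(Loan)', '').strip()
        if name and name not in cleaned:
            cleaned.append(name)
    return cleaned


# Placeholder values for illustration only.
sample = ['Team A', 'Team B (Loan)', 'Team A', 'Team C (Loan) ']
print(clean_team_names(sample))  # prints ['Team A', 'Team B', 'Team C']
```

Keeping this step in a separate function makes it possible to test the string handling separately from the page-fetching problem.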