I am an absolute beginner at web scraping with Python and know very little about Python programming. I am trying to extract the information of the lawyers in the Tennessee location. The page contains multiple city links, within each city are further links to the categories of lawyers, and within those are the individual lawyers' details.
I have already extracted the links of the various cities into a list and have also extracted the categories of lawyers available under each city link. The profile links have also been fetched and stored as a set. Now I am trying to fetch each lawyer's name, address, firm name and practice area and store it all as an .xls file.
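The flatten-and-deduplicate step described above can be sketched in isolation (the URLs below are made-up placeholders, not real profile links):

```python
# Each city/category pass appends its own list of profile links,
# so the same lawyer can show up under several categories.
final = [
    ['https://profiles.example.com/lawyer-a', 'https://profiles.example.com/lawyer-b'],
    ['https://profiles.example.com/lawyer-b'],  # duplicate found under another category
]

# A set comprehension flattens the nested lists and drops duplicates in one pass.
final_list = {link for sublist in final for link in sublist}
print(len(final_list))  # 2 unique profile links remain
```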
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

final = []
records = []

with requests.Session() as s:
    res = s.get('https://attorneys.superlawyers.com/tennessee/',
                headers={'User-agent': 'Super Bot 9000'})
    soup = bs(res.content, 'lxml')
    cities = [item['href'] for item in soup.select('#browse_view a')]
    for c in cities:
        r = s.get(c)
        s1 = bs(r.content, 'lxml')
        categories = [item['href'] for item in s1.select('.three_browse_columns:nth-of-type(2) a')]
        for c1 in categories:
            r1 = s.get(c1)
            s2 = bs(r1.content, 'lxml')
            lawyers = [item['href'].split('*')[1] if '*' in item['href'] else item['href']
                       for item in s2.select('.indigo_text .directory_profile')]
            final.append(lawyers)

    final_list = {item for sublist in final for item in sublist}

    for i in final_list:
        r2 = s.get(i)
        s3 = bs(r2.content, 'lxml')
        name = s3.find('h2').text.strip()
        add = s3.find('div').text.strip()
        f_name = s3.find('a').text.strip()
        p_area = s3.find('ul', {'class': 'basic_profile aag_data_value'}).find('li').text.strip()
        records.append({'Names': name, 'Address': add, 'Firm Name': f_name, 'Practice Area': p_area})

df = pd.DataFrame(records, columns=['Names', 'Address', 'Firm Name', 'Practice Area'])
df = df.drop_duplicates()
df.to_excel(r'C:\Users\laptop\Desktop\lawyers.xls', sheet_name='MyData2', index=False, header=True)
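The selector-based link extraction used above can be checked against a small inline snippet. The HTML below is a made-up stand-in for the real city-listing markup:

```python
from bs4 import BeautifulSoup as bs

# A minimal stand-in for the city-listing page; the real page is much larger.
html = '''
<div id="browse_view">
  <a href="https://attorneys.superlawyers.com/tennessee/memphis/">Memphis</a>
  <a href="https://attorneys.superlawyers.com/tennessee/nashville/">Nashville</a>
</div>
'''

# html.parser is used here so the demo runs without the lxml dependency.
soup = bs(html, 'html.parser')
cities = [a['href'] for a in soup.select('#browse_view a')]
print(cities)
```

The same pattern (a CSS selector, then a list comprehension over `href` attributes) is what the script relies on for cities, categories and profiles alike.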
I expected to get a .xls file, but nothing is produced while the execution is going on. The script does not terminate until I force-stop it, and no .xls file is created.
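One common reason a scrape like this appears to hang is a single request that never returns: `requests` has no default timeout and will wait indefinitely. A minimal sketch of a fetch helper that bounds every request (the unroutable address below is only there to trigger a fast failure for the demo):

```python
import requests

def fetch(session, url, timeout=10):
    """Fetch a URL but never wait forever; return None on any request failure."""
    try:
        r = session.get(url, timeout=timeout)
        r.raise_for_status()
        return r.content
    except requests.RequestException as exc:
        print(f'skipping {url}: {exc}')
        return None

with requests.Session() as s:
    # 10.255.255.1 is an unroutable address, so this fails fast instead of hanging.
    page = fetch(s, 'http://10.255.255.1/', timeout=1)
print(page is None)  # True
```

Passing `timeout` to every `s.get` call in the script, plus an occasional progress print inside the loops, would make it clear whether the run is stuck on one request or simply working through thousands of profile pages.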