The for loop in my Selenium scraper that collects articles doesn't work as expected. The goal is to scrape all of the article-related content (title, date, office, sort, article body) that appears on the screen after entering the URL.
However, only the first article is scraped.
I guess there is a problem with the Pandas DataFrame, but it's not clear.
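To sanity-check that guess, here is a minimal sketch of the same .loc-append pattern in isolation, with dummy values instead of the real scraped data:

import pandas as pd

# Append rows by integer label, exactly as the scraper does.
df = pd.DataFrame(columns=('Title', 'Date', 'Office', 'Sort', 'Article'))
rows = [['t1', 'd1', 'o1', 's1', 'a1'],
        ['t2', 'd2', 'o2', 's2', 'a2']]
for idx, row in enumerate(rows):
    df.loc[idx] = row
print(df)  # both dummy rows show up

Both rows end up in the DataFrame, so the append pattern itself seems fine. Here is the full scraper: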
import time
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36")
chrome_options.add_argument("lang=ko_KR")
wd = webdriver.Chrome(executable_path='c:/chromedriver.exe', options=chrome_options)
wd.implicitly_wait(10)
news_df = pd.DataFrame(columns=('Title', 'Date', 'Office', 'Sort', 'Article'))
idx = 0
news_url = 'https://newslibrary.naver.com/search/searchByKeyword.nhn#%7B%22mode%22%3A1%2C%22sort%22%3A0%2C%22trans%22%3A%221%22%2C%22pageSize%22%3A10%2C%22keyword%22%3A%22%EA%B1%B4%EC%84%A4%EC%82%B0%EC%97%85%22%2C%22status%22%3A%22success%22%2C%22startIndex%22%3A1%2C%22page%22%3A1%2C%22startDate%22%3A%221945-01-01%22%2C%22endDate%22%3A%221945-12-31%22%7D'
wd.get(news_url)
data = wd.find_elements_by_css_selector('#searchlist > ul > li:nth-child(1)')
try:
    for da in data:
        title = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/h3/a').get_attribute('title')
        date = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[1]').text
        office = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[2]').text
        sort = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/ul/li[4]').text
        article = da.find_element_by_xpath('//*[@id="searchlist"]/ul/li[1]/div[2]/div').text
        article = article.replace("\n", "")
        article = article.replace("\n", "")
        article = article.replace("", "")
        news_df.loc[idx] = [title, date, office, sort, article]
        idx += 1
except AttributeError:
    pass
wd.close()
print('Complete!')
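For reference, this is the kind of quick check I can add before wd.close() to see how many elements the CSS selector actually returns (the second selector string is just a variation for comparison, not part of my original code):

# Quick diagnostic: how many <li> elements does each selector match?
first_only = wd.find_elements_by_css_selector('#searchlist > ul > li:nth-child(1)')
all_items = wd.find_elements_by_css_selector('#searchlist > ul > li')
print(len(first_only), len(all_items))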