Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.7k views
in Technique[技术] by (71.8m points)

selenium - python - webdriver and asyncio

Is it possible open browser for each task firstly, and load links after that ? This code raises an error

import asyncio
from selenium import webdriver

async def get_html(url):
    driver = await webdriver.Chrome()
    response = await driver.get(url)

TypeError: object WebDriver can't be used in 'await' expression

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

If you want to use Selenium in an async fashion I would suggest using multiple instances of the Driver and a executor like this:

import asyncio
from concurrent.futures.thread import ThreadPoolExecutor

from selenium import webdriver

executor = ThreadPoolExecutor(10)


def scrape(url, *, loop):
    loop.run_in_executor(executor, scraper, url)


def scraper(url):
    driver = webdriver.Chrome("./chromedriver")
    driver.get(url)


loop = asyncio.get_event_loop()
for url in ["https://google.de"] * 2:
    scrape(url, loop=loop)

loop.run_until_complete(asyncio.gather(*asyncio.all_tasks(loop)))

Please note that you can run selenium in headless mode so you don't need to spawn the whole GUI for calling some simple url.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...