Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Scrape data from Bloomberg

I want to scrape data from the Bloomberg website. The data under "IBVC:IND Caracas Stock Exchange Stock Market Index" needs to be scraped.

Here is my code so far:

import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/58.0.3029.110 Safari/537.36 '
}
res = requests.get("https://www.bloomberg.com/quote/IBVC:IND", headers=headers)

soup = bs(res.content, 'html.parser')
# print(soup)
items = soup.find("div", {"class": "snapshot__0569338b snapshot"})

open_ = items.find("span", {"class": "priceText__1853e8a5"}).text
print(open_)
prev_close = items.find("span", {"class": "priceText__1853e8a5"}).text

I can't find the required values in the HTML. Which library should I use to handle that? I'm currently using BeautifulSoup and Requests.


1 Answer


As indicated in other answers, the content is generated via JavaScript and is therefore not present in the plain HTML that requests receives. Two approaches have been proposed for this problem:

  • Selenium, aka The Big Guns: this lets you automate virtually any task in a browser, but it comes at a cost in speed.
  • API request, aka Thought Through: this is not always feasible, but when it is, it is much more efficient.

I elaborate on the second one. @ViniciusDAvila already laid out the typical blueprint for such a solution: navigate to the site, open the Network tab of the browser's developer tools, and figure out which request is responsible for fetching the data.
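
As a rough illustration of what that inspection yields, the sketch below composes the kind of URL seen in the Network tab for a quote page. The helper name build_datastrip_url is hypothetical; the endpoint and query string are the ones used by the scraper in this answer and may well have changed on Bloomberg's side since.

```python
from urllib.parse import quote

# Endpoint and query string as used by the scraper in this answer
# (observed at the time of writing; assumed, not guaranteed, to still work).
URL_ROOT = 'https://www.bloomberg.com/markets2/api/datastrip'
URL_PARAMS = 'locale=en&customTickerList=true'


def build_datastrip_url(ticker: str) -> str:
    """Hypothetical helper: compose the data request URL for a ticker like 'IBVC:IND'."""
    # quote() percent-encodes the colon, so 'IBVC:IND' becomes 'IBVC%3AIND'
    return '%s/%s?%s' % (URL_ROOT, quote(ticker), URL_PARAMS)


print(build_datastrip_url('IBVC:IND'))
# https://www.bloomberg.com/markets2/api/datastrip/IBVC%3AIND?locale=en&customTickerList=true
```

No network access is needed to run this; it only shows the shape of the request that the page itself issues.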

Once this is done, the rest is a matter of execution:

Scraper

import requests
from urllib.parse import quote


# Constants
HEADERS = {
    'Host': 'www.bloomberg.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': '*/*',
    'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.bloomberg.com/quote/',
    'DNT': '1',
    'Connection': 'keep-alive',
    'TE': 'Trailers'
}
URL_ROOT = 'https://www.bloomberg.com/markets2/api/datastrip'
URL_PARAMS = 'locale=en&customTickerList=true'
VALID_TYPE = {'currency', 'index'}


# Scraper
def scraper(object_id: str, object_type: str, timeout: int = 5) -> list:
    """
    Get the Bloomberg data for the given object.
    :param object_id: The Bloomberg identifier of the object.
    :param object_type: The type of the object. (Currency or Index)
    :param timeout: Maximal number of seconds to wait for a response.
    :return: The parsed JSON response (a list of records), or an empty list on failure.
    """
    object_type = object_type.lower()
    if object_type not in VALID_TYPE:
        return list()
    # Build headers and url
    object_append = '%s:%s' % (object_id, 'IND' if object_type == 'index' else 'CUR')
    # Copy the constant so the shared Referer is not extended on every call
    headers = dict(HEADERS)
    headers['Referer'] += object_append
    url = '%s/%s?%s' % (URL_ROOT, quote(object_append), URL_PARAMS)
    # Make the request (honouring the timeout) and check response status code
    response = requests.get(url=url, headers=headers, timeout=timeout)
    if 200 <= response.status_code < 300:
        return response.json()
    return list()

Test

# Index
object_id, object_type = 'IBVC', 'index'
data = scraper(object_id=object_id, object_type=object_type)
print('The open price for %s %s is: %d' % (object_type, object_id, data[0]['openPrice']))
# The open price for index IBVC is: 50094

# Exchange rate
object_id, object_type = 'EUR', 'currency'
data = scraper(object_id=object_id, object_type=object_type)
print('The open exchange rate for USD per {} is: {}'.format(object_id, data[0]['openPrice']))
# The open exchange rate for USD per EUR is: 1.0993
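
Note that the scraper returns an empty list when the type is invalid or the request fails, in which case data[0] in the test above raises an IndexError. A minimal sketch of a more defensive lookup follows; the helper name first_field is hypothetical, and the sample payload is a mock that only mirrors the shape implied by the test above (a list of records with an openPrice key), not a real Bloomberg response.

```python
from typing import Any, Optional


def first_field(data: Any, field: str) -> Optional[Any]:
    """Return data[0][field], or None if the payload is empty or not shaped as expected."""
    if not isinstance(data, list) or not data:
        return None
    first = data[0]
    if not isinstance(first, dict):
        return None
    return first.get(field)


# Mock payloads (assumed shape, for illustration only)
sample = [{'openPrice': 1.0993}]
print(first_field(sample, 'openPrice'))  # 1.0993
print(first_field([], 'openPrice'))      # None
```

With this, a failed request shows up as None rather than as an exception raised at the print statement.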
