I'm trying to download a CSV file using the Python requests library. The file is provided by a link (“Exportar lista completa de Fundos em CSV”, i.e. “Export full list of funds as CSV”) at the bottom of the following webpage: http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/fundos-de-investimentos/fii/fiis-listados/
I have been using Chrome’s DevTools to reproduce all the GET/POST requests, but so far I have not succeeded. The main difficulty is recreating the POST body. It seems to contain a lot of random-looking numbers that change each time I download the file. For example, there is a CRC number in the POST body, and I don’t know where it comes from. I tried using the zlib library to compute it from the huge GET response, but I get different results. On the other hand, I do know where “visitID” and “modifiedSince” come from.
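To illustrate why a recomputed checksum can disagree with the one seen in DevTools, here is a minimal sketch of a CRC-32 computed with zlib. It assumes, without confirmation, that the CRC in the POST body is a plain CRC-32 over some byte string. The helper name `crc32_of` and the sample text are hypothetical. The point is only that the result depends on the exact bytes, so the same text in two encodings gives two different values.

```python
import zlib


def crc32_of(data: bytes) -> int:
    """Return the unsigned 32-bit CRC of `data` as computed by zlib."""
    return zlib.crc32(data) & 0xFFFFFFFF


# Sanity check against the standard CRC-32 check value for b"123456789".
check_value = crc32_of(b"123456789")
print(hex(check_value))  # 0xcbf43926

# Hypothetical sample text containing a non-ASCII character.
sample = "Fundos de Investimento Imobiliário"

# The same text encoded two ways yields different bytes, hence different CRCs.
crc_utf8 = crc32_of(sample.encode("utf-8"))
crc_latin1 = crc32_of(sample.encode("latin-1"))
print(crc_utf8 != crc_latin1)
```

With requests, this means a CRC over `r1.content` (raw bytes) and one over `r1.text.encode(...)` need not match, and neither is guaranteed to be the input the page's script actually uses.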
Can someone help me, please?
Here is the code that I tried:
import zlib
import requests
from bs4 import BeautifulSoup
url1 = 'http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/fundos-de-investimentos/fii/fiis-listados/'
session = requests.Session()
headers1 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
r1 = session.get(url1, headers=headers1)
##### HERE IS WHERE I GET SOME VALUES THAT I NEED TO PASS AS BODY PARAMETERS
soup = BeautifulSoup(r1.text, "lxml")
payload = [n for n in soup.find_all('script', type="text/javascript", attrs={'src': True})
           if 'ruxitagentjs_ICA27SVfjqrux_10207210127152629.js' in n.get('src')]
payload = payload[0].attrs['data-dtconfig'].split('|')
payload = [n for n in payload if (('app' in n) or ('lastModification' in n))]
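# Split on the first '=' only, so a value that itself contains '=' still yields a (key, value) pair
payload = [tuple(n.split('=', 1)) for n in payload]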
payload = dict(payload)
payload['modifiedSince'] = payload.pop('lastModification')
#########
url2 = 'https://sistemaswebb3-listados.b3.com.br/fundsProxy/fundsCall/GetListFundDownload/eyJ0eXBlRnVuZCI6NywicGFnZU51bWJlciI6MSwicGFnZVNpemUiOjIwfQ=='
headers2 = {
    'Connection': 'keep-alive',
    'Accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://sistemaswebb3-listados.b3.com.br/',
    'Accept-Language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7'
}
r2 = session.get(url2, headers=headers2, verify=False)
# THIS IS THE PART WHERE I GOT STUCK
url3 = 'https://sistemaswebb3-listados.b3.com.br/fundsPage/rb_8370fec7-c82e-413f-a2c6-777046ed9811'
headers3 = {
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Accept': '*/*',
    'Origin': 'https://sistemaswebb3-listados.b3.com.br',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://sistemaswebb3-listados.b3.com.br/',
    'Accept-Language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7'
}
data = None  # ??? placeholder: I don't know how to build this POST body
r3 = session.post(url3, headers=headers3, data=data, verify=False)
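As a side note on url2 above, its last path segment looks like Base64. The sketch below decodes it, assuming the segment is Base64-encoded JSON. The helper names `decode_path_params` and `encode_path_params` are hypothetical, and whether the server accepts other parameter values is untested.

```python
import base64
import json


def decode_path_params(url: str) -> dict:
    """Decode the last path segment of `url` as Base64-encoded JSON."""
    token = url.rstrip("/").rsplit("/", 1)[-1]
    return json.loads(base64.b64decode(token))


def encode_path_params(params: dict) -> str:
    """Serialize `params` as compact JSON and Base64-encode the result."""
    raw = json.dumps(params, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


url2 = ('https://sistemaswebb3-listados.b3.com.br/fundsProxy/fundsCall/'
        'GetListFundDownload/eyJ0eXBlRnVuZCI6NywicGFnZU51bWJlciI6MSwicGFnZVNpemUiOjIwfQ==')

params = decode_path_params(url2)
print(params)  # {'typeFund': 7, 'pageNumber': 1, 'pageSize': 20}

# Re-encoding the decoded parameters reproduces the original path segment.
token = encode_path_params(params)
print(token)
```

If that assumption holds, the opaque-looking segment is just the request parameters, so it would not need to be copied from DevTools verbatim.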
question from:
https://stackoverflow.com/questions/66049969/how-to-reconstruct-a-post-request-to-automatically-download-a-file-which-url-is