Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

python - How to reconstruct a POST request to automatically download a file whose URL is hidden and is available only through a link?

I'm trying to download a CSV file using Python's requests library. The file is offered through a link ("Exportar lista completa de Fundos em CSV") at the bottom of this page: http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/fundos-de-investimentos/fii/fiis-listados/

I have been using Chrome's DevTools to reproduce all the GET/POST requests, but so far I have not succeeded. The main difficulty is recreating the POST body: it contains what look like random numbers that change on every download. For example, there is a CRC value in the POST body whose origin I can't trace. I tried using the zlib library to compute it from the large GET response, but I get different results. On the other hand, I do know where "visitID" and "modifiedSince" come from.

Can someone help me, please?

Here is the code that I tried:
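As an aside, zlib's CRC-32 is deterministic and incremental: feeding the data in pieces, carrying the running value forward, gives the same result as checksumming the whole payload at once. So differing results mean the input bytes differ, and the open question is *which* bytes the page's script feeds into the checksum. A minimal sketch (the payload string below is made up for illustration):

```python
import zlib

# CRC-32 over the whole (made-up) payload in one call...
whole = zlib.crc32(b'visitID=abc&modifiedSince=123')

# ...equals CRC-32 computed incrementally, piece by piece.
partial = zlib.crc32(b'visitID=abc', 0)
partial = zlib.crc32(b'&modifiedSince=123', partial)
assert whole == partial
```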

import os, requests, zlib
from bs4 import BeautifulSoup

url1 = 'http://www.b3.com.br/pt_br/produtos-e-servicos/negociacao/renda-variavel/fundos-de-investimentos/fii/fiis-listados/'
session = requests.Session()

headers1 = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br'
}

r1 = session.get(url1, headers=headers1)

##### HERE IS WHERE I GET SOME VALUES THAT I NEED TO PASS AS BODY PARAMETERS
soup = BeautifulSoup(r1.text, "lxml")

payload = [n for n in soup.find_all('script', type="text/javascript", attrs={'src': True}) if
           'ruxitagentjs_ICA27SVfjqrux_10207210127152629.js' in n.get('src')]

payload = payload[0].attrs['data-dtconfig'].split('|')
payload = [n for n in payload if (('app' in n) or ('lastModification' in n))]
payload = [tuple(n.split('=')) for n in payload]
payload = dict(payload)
payload['modifiedSince'] = payload.pop('lastModification') 
#########
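For illustration, here is the same parsing applied to a hypothetical data-dtconfig value (the real attribute is a pipe-separated list of key=value pairs, but the actual values change per visit):

```python
# Hypothetical data-dtconfig attribute value, for illustration only:
sample = 'rid=RID_123|app=ea7c4b59f27d43eb|lastModification=1611593173722|agentUri=/x.js'

# Keep only the 'app' and 'lastModification' entries, as in the code above.
parts = [p for p in sample.split('|') if ('app' in p) or ('lastModification' in p)]
cfg = dict(tuple(p.split('=')) for p in parts)
cfg['modifiedSince'] = cfg.pop('lastModification')
print(cfg)  # {'app': 'ea7c4b59f27d43eb', 'modifiedSince': '1611593173722'}
```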

url2 = 'https://sistemaswebb3-listados.b3.com.br/fundsProxy/fundsCall/GetListFundDownload/eyJ0eXBlRnVuZCI6NywicGFnZU51bWJlciI6MSwicGFnZVNpemUiOjIwfQ=='
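One thing worth noting: the trailing path segment of url2 is plain base64-encoded JSON, so the request parameters can be decoded, inspected, and rebuilt — for example to ask for more rows per page:

```python
import base64
import json

# The last path segment of url2 decodes to the request parameters:
token = 'eyJ0eXBlRnVuZCI6NywicGFnZU51bWJlciI6MSwicGFnZVNpemUiOjIwfQ=='
params = json.loads(base64.b64decode(token))
print(params)  # {'typeFund': 7, 'pageNumber': 1, 'pageSize': 20}

# Rebuild the token with a larger page size:
params['pageSize'] = 200
new_token = base64.b64encode(json.dumps(params).encode()).decode()
```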

headers2 = {
    'Connection': 'keep-alive',
    'Accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://sistemaswebb3-listados.b3.com.br/',
    'Accept-Language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7'
}

r2 = session.get(url2, headers=headers2, verify=False)
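If r2 comes back successfully, its body may already be the file you want: B3's "Download" endpoints appear to return the CSV itself base64-encoded in the response body (an assumption worth checking against the actual response). A small decoding helper, with Latin-1 assumed for the accented Portuguese fund names:

```python
import base64

def decode_b3_download(body_text: str) -> str:
    """Assumption: the Download endpoint returns the CSV base64-encoded
    in the response body. Decode it to text (Latin-1 assumed for the
    accented Portuguese names)."""
    return base64.b64decode(body_text).decode('latin-1')

# Usage sketch:
# with open('fundosListados.csv', 'w', encoding='latin-1') as f:
#     f.write(decode_b3_download(r2.text))
```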

# THAT'S THE PART WHERE I GOT STUCK

url3 = 'https://sistemaswebb3-listados.b3.com.br/fundsPage/rb_8370fec7-c82e-413f-a2c6-777046ed9811'

headers3 = {
    'Connection': 'keep-alive',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36',
    'Accept': '*/*',
    'Origin': 'https://sistemaswebb3-listados.b3.com.br',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://sistemaswebb3-listados.b3.com.br/',    
    'Accept-Language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7'
}

data = ???????

r3 = session.post(url3, headers=headers3, data=????, verify=False)

Question from: https://stackoverflow.com/questions/66049969/how-to-reconstruct-a-post-request-to-automatically-download-a-file-which-url-is


1 Answer

Waiting for answers
