I am trying to make a C++ program that searches a URL and picks out text by its name. I know this is achievable in python with requests and beautiful-soup, however I am trying to do this in C++. I have had a look at cURL
and so far this is what I have in terms of its functionality
#include <iostream>
#include <string>
#include <curl/curl.h>
static size_t WriteCallback(void *contents, size_t size, size_t nmemb, void *userp)
{
((std::string*)userp)->append((char*)contents, size * nmemb);
return size * nmemb;
}
int main(void)
{
CURL *curl;
CURLcode res;
std::string readBuffer;
curl = curl_easy_init();
if(curl) {
curl_easy_setopt(curl, CURLOPT_URL, "https://www.barcodelookup.com/763649064870");
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, WriteCallback);
curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer);
res = curl_easy_perform(curl);
curl_easy_cleanup(curl);
std::cout << readBuffer << std::endl;
}
return 0;
}
The output of the page from the program does not include the text when viewed in a browser as displayed next to the name. For example, I am interested in picking out (from the source of https://www.barcodelookup.com/763649064870
this line here (5): <meta name="description" content="Barcode Lookup provides info on UPC 763649064870 - Seagate 1Tb Expansion Portable Drive USB.">
in python I would just find <meta name="description" ....
however in my C++ program I don't get that. The output is actually:
<!DOCTYPE html>
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->
<!--[if IE 7]> <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->
<!--[if IE 8]> <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->
<head>
<title>Attention Required! | Cloudflare</title>
<meta name="captcha-bypass" id="captcha-bypass" />
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />
<meta name="robots" content="noindex, nofollow" />
<meta name="viewport" content="width=device-width,initial-scale=1" />
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->
<style type="text/css">body{margin:0;padding:0}</style>
and that is just for the first portion of the output, but that in theory should have covered line 5 (which im interested in). How can I get the proper output as displayed by view page source?
question from:
https://stackoverflow.com/questions/65915787/getting-page-information-from-page-source-by-name-in-c 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…