I've set up something like this at work.
I used this reference. I recommend you read it.
Preparation:
Get the rendered HTML of the page section(s) you're targeting (e.g. with the F12 developer tools) so that you understand its structure.
<div class="articleBox navigation">
<!-- ... -->
<article>
<div>
<a href="css-partners/partner.html/100775">Aarhus University (AU)<span class="icon"></span></a>
</div>
<div class="nav-hint bold author">Denmark</div>
<div>Aarhus University (AU) is an academically diverse and research-oriented institution that works to solve the complex developmental challenges facing the world.</div>
</article>
<!-- ... -->
</div>
It helps if you already understand the Document Object Model and how to traverse it with JavaScript (query selectors, child nodes and so on), because the Microsoft IE interface largely mirrors it. For example, in JavaScript:
var articles = document.querySelectorAll("div.articleBox.navigation > article")
Add references to "Microsoft Internet Controls" and "Microsoft HTML Object Library" to your VBA project (Tools > References in the VBA editor).
The sub:
Initialise and open Internet Explorer in memory.
Dim ie As New InternetExplorer
Navigate to the page.
ie.Navigate "http://www.css.ethz.ch/en/services/css-partners.html?page=1"
Wait until the page has loaded.
Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
    DoEvents
Loop
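The wait loop above never exits if the page fails to finish loading. A minimal sketch of a bounded wait follows; the helper name `WaitForPage` and the 30-second default are my own assumptions, not part of the original code.

```vba
' Hypothetical helper: waits until IE reports the page as loaded,
' or gives up after timeoutSeconds. Returns True on success, False on timeout.
' Requires the "Microsoft Internet Controls" reference.
Function WaitForPage(ie As InternetExplorer, Optional timeoutSeconds As Long = 30) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        If Now > deadline Then Exit Function   ' WaitForPage stays False
        DoEvents                               ' keep Excel responsive while waiting
    Loop

    WaitForPage = True
End Function
```

You would call it right after `ie.Navigate` and skip the extraction step when it returns False.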
Traverse the Document Object Model of the page and store relevant details as required.
Dim articles As IHTMLDOMChildrenCollection
Dim article As IHTMLElement
Dim divs As IHTMLElementCollection
...
Set articles = ie.Document.querySelectorAll("div.articleBox.navigation > article")
Set article = articles(0)
Set divs = article.Children
Write the relevant details to a range.
Range("A1") = divs(0).innerText
Range("B1") = divs(1).innerText
Range("C1") = divs(2).innerText
Loop within article elements and loop pages (not shown).
Close and destroy instance of Internet Explorer.
ie.Quit
Set ie = Nothing
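If the code raises an error before reaching `ie.Quit`, the hidden Internet Explorer process can be left running. One way to guard against that, sketched here with a hypothetical sub name and the usual VBA error-handler layout, is to route every exit through a cleanup label:

```vba
' Sketch only: the sub name and the message text are illustrative assumptions.
Sub GetSearchResultsSafely()
    Dim ie As InternetExplorer

    On Error GoTo Fail
    Set ie = New InternetExplorer
    ie.Navigate "http://www.css.ethz.ch/en/services/css-partners.html?page=1"
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    ' ... traverse the DOM and write to the sheet here ...

CleanUp:
    On Error Resume Next            ' cleanup itself should not raise
    If Not ie Is Nothing Then ie.Quit
    Set ie = Nothing
    Exit Sub

Fail:
    MsgBox "Scrape failed: " & Err.Description
    Resume CleanUp                  ' clears the error, then runs the cleanup
End Sub
```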
Put together:
Sub GetSearchResults()
    Dim ie As New InternetExplorer
    Dim articles As IHTMLDOMChildrenCollection
    Dim article As IHTMLElement
    Dim divs As IHTMLElementCollection

    ' Load the first results page and wait until it has finished loading.
    ie.Navigate "http://www.css.ethz.ch/en/services/css-partners.html?page=1"
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    ' Take the first article and its three child divs (name, country, description).
    Set articles = ie.Document.querySelectorAll("div.articleBox.navigation > article")
    Set article = articles(0)
    Set divs = article.Children

    ' Write the details to the active sheet.
    Range("A1") = divs(0).innerText
    Range("B1") = divs(1).innerText
    Range("C1") = divs(2).innerText

    ' Close Internet Explorer and release the reference.
    ie.Quit
    Set ie = Nothing
End Sub
I leave it as an exercise for you to work out how to loop within the article elements on the page, how to loop within all the pages you want to target, and how to write the information extracted to the appropriate Ranges in Excel.
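If you want a starting point for that exercise, here is one possible sketch. It assumes that pages are addressed by the `?page=N` query parameter seen in the URL above, that a page with no matching articles means you have gone past the last one, and that a cap of 50 pages is a reasonable safety limit. None of these assumptions is confirmed against the site, and `GetAllSearchResults`, `BASE_URL` and `MAX_PAGES` are names I made up.

```vba
Sub GetAllSearchResults()
    Const BASE_URL As String = "http://www.css.ethz.ch/en/services/css-partners.html?page="
    Const MAX_PAGES As Long = 50    ' assumption: safety cap, not taken from the site

    Dim ie As New InternetExplorer
    Dim articles As IHTMLDOMChildrenCollection
    Dim article As IHTMLElement
    Dim divs As IHTMLElementCollection
    Dim pageNo As Long, i As Long, rowNo As Long

    rowNo = 1
    For pageNo = 1 To MAX_PAGES
        ie.Navigate BASE_URL & pageNo
        Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        Set articles = ie.Document.querySelectorAll("div.articleBox.navigation > article")
        If articles.Length = 0 Then Exit For    ' assumption: empty page = no more results

        ' One worksheet row per article: name, country, description.
        For i = 0 To articles.Length - 1
            Set article = articles.Item(i)
            Set divs = article.Children
            If divs.Length >= 3 Then
                Cells(rowNo, 1).Value = divs(0).innerText
                Cells(rowNo, 2).Value = divs(1).innerText
                Cells(rowNo, 3).Value = divs(2).innerText
                rowNo = rowNo + 1
            End If
        Next i
    Next pageNo

    ie.Quit
    Set ie = Nothing
End Sub
```

The index-based `For i` loop is deliberate: the sketch relies only on the `Length` and `Item` members of the collection that `querySelectorAll` returns, rather than on `For Each` enumeration.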