It appears that Invoke-WebRequest loads file protocol URIs just fine, but fails to parse the HTML it gets back, even in PowerShell 4.0 (where file URIs are officially supported).
An alternative that does not require serving the file from a website is to read the HTML yourself and parse it directly with MSHTML:
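# Create an empty MSHTML document via COM
$html = New-Object -ComObject "HTMLFile"
# Read the whole file as one string (-Raw requires PowerShell 3.0 or later)
$source = Get-Content -Path "file.html" -Raw
# Hand the markup to the MSHTML parser
$html.IHTMLDocument2_write($source)
# Query the parsed DOM, e.g. count the links it contains
$html.links.length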
Note that when I tested this, a single
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
tag prevented my HTML from being parsed, and I have no idea why. The document contained other, similar XHTML-style meta tags, and MSHTML had no issues with those.