Edit, Dec 2013: Google has deprecated the old Xml service, replacing it with XmlService. The script in this answer has been updated to use the new service. The new service requires standard-compliant XML & HTML, while the old one was forgiving of such problems as missing close-tags.
Have a look at the Tutorial: Parsing an XML Document. (As of Dec 2013, this tutorial is still online, although the Xml service is deprecated.) Starting with that foundation, you can take advantage of the XML parsing in Script Services to navigate the page. Here's a small script operating on your example:
function getProgrammeList() {
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';
  // Put the received xml response into XMLdocument format
  // (this uses the old, deprecated Xml service in lenient mode)
  var doc = Xml.parse(txt, true);
  Logger.log(doc.html.body.div.div.div.id + " = "
      + doc.html.body.div.div.div.Text); // here = hello world!!
  debugger; // Pause in debugger - examine content of doc
}
To get the real page, start with this:
var url = 'http://blah.blah/whatever?querystring=foobar';
var txt = UrlFetchApp.fetch(url).getContentText();
....
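Since XmlService refuses markup that isn't well-formed, it's worth guarding the parse when you fetch a real page. Here's a minimal sketch of that idea; `fetchAndParse` is a hypothetical helper name of mine, and returning null on failure is just one possible choice.

```javascript
/**
 * Hypothetical helper (not part of the original scripts): fetch a page and
 * parse it with XmlService, returning null if the markup cannot be parsed.
 *
 * @param {String} url Address of the page to fetch.
 * @return {Document} Parsed document, or null if parsing failed.
 */
function fetchAndParse(url) {
  var txt = UrlFetchApp.fetch(url).getContentText();
  try {
    return XmlService.parse(txt);
  } catch (e) {
    // XmlService throws on markup that is not well-formed XML,
    // for example a missing close-tag.
    Logger.log('Could not parse ' + url + ': ' + e);
    return null;
  }
}
```

If this returns null for your page, the markup probably needs cleaning up before XmlService will accept it.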
If you look at the documentation for getElements you'll see that there is support for retrieving specific tags, for example "div". That finds only the direct children of a specific element; it doesn't explore the entire XML document. You should be able to write a function that traverses the document, examining the id of each div element until it finds your programme list.
var programmeList = getDivById(doc,"here");
Edit - I couldn't help myself...
Here's a utility function that will do just that.
/**
* Find a <div> tag with the given id.
* <pre>
* Example: getDivById( html, 'tagVal' ) will find
*
* <div id="tagVal">
* </pre>
*
* @param {Element|Document}
* element XML document or element to start search at.
* @param {String} id HTML <div> id to find.
*
* @return {Element} First matching element (in doc order) or null.
*/
function getDivById( element, id ) {
// Call utility function to do the work.
return getElementByVal( element, 'div', 'id', id );
}
/**
* !Now updated for XmlService!
*
* Traverse the given Xml Document or Element looking for a match.
* Note: 'class' is stripped during parsing and cannot be used for
* searching, I don't know why.
* <pre>
* Example: getElementByVal( body, 'input', 'value', 'Go' ); will find
*
* <input type="submit" name="btn" value="Go" id="btn" class="submit buttonGradient" />
* </pre>
*
* @param {Element|Document}
* element XML document or element to start search at.
* @param {String} elementType XML element type, e.g. 'div' for <div>
* @param {String} attr Attribute or Property to compare.
* @param {String} val Search value to locate
*
* @return {Element} First matching element (in doc order) or null.
*/
function getElementByVal( element, elementType, attr, val ) {
  // Get all descendants, in document order
  var descendants = element.getDescendants();
  for (var i = 0; i < descendants.length; i++) {
    var node = descendants[i];
    // We'll only examine ELEMENTs
    if (node.getType() == XmlService.ContentTypes.ELEMENT) {
      var candidate = node.asElement();
      if (candidate.getName() === elementType) {
        // getAttribute() returns null when the element lacks the attribute
        var attribute = candidate.getAttribute(attr);
        if (attribute && val === attribute.getValue()) {
          return candidate;
        }
      }
    }
  }
  // No matches in document
  return null;
}
Applying this to your example, we get this:
function getProgrammeList() {
  // XmlService needs well-formed markup, so every tag (including <body>) is closed
  var txt = '<html> <body> <div> <div> <div id="here">hello world!!</div> </div> </div> </body> </html>';
  // Parse the received xml response into an XML document
  var doc = XmlService.parse(txt);
  var found = getDivById(doc.getRootElement(), 'here');
  Logger.log(found.getAttribute('id').getValue()
      + " = "
      + found.getValue()); // here = hello world!!
}
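If your page has several elements that could match, a variant that collects all of them may be handy. This is only a sketch along the same lines as getElementByVal; the name `getElementsByVal` and the choice to return an array are my assumptions, not part of the utilities above.

```javascript
/**
 * Hypothetical variant of getElementByVal: collect every matching element
 * instead of stopping at the first one.
 *
 * @param {Element|Document} root XML document or element to start search at.
 * @param {String} elementType XML element type, e.g. 'div' for <div>
 * @param {String} attr Attribute to compare.
 * @param {String} val Search value to locate
 *
 * @return {Element[]} All matching elements, in document order (may be empty).
 */
function getElementsByVal(root, elementType, attr, val) {
  var matches = [];
  var descendants = root.getDescendants();
  for (var i = 0; i < descendants.length; i++) {
    var node = descendants[i];
    // Skip text, comments and anything else that is not an ELEMENT
    if (node.getType() != XmlService.ContentTypes.ELEMENT) continue;
    var candidate = node.asElement();
    if (candidate.getName() !== elementType) continue;
    // getAttribute() returns null when the attribute is absent
    var attribute = candidate.getAttribute(attr);
    if (attribute && attribute.getValue() === val) {
      matches.push(candidate);
    }
  }
  return matches;
}
```

An empty array means nothing matched, so callers can loop over the result without a null check.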
Note: See this answer for a practical example of the use of these utilities.