I'm trying to get Google Docs to scrape content from web pages using importxml.
I'm getting a parse error for this:
=importxml("http://www.telegraph.co.uk/", "//div")
As well as reporting the parse error, it adds a spurious closing quote and bracket, like so:
=importxml("http://www.telegraph.co.uk/", "//div")")
Any ideas?
I don't think it's a parsing error. I think Google just doesn't allow SERPs to be used as the URL argument to importxml. (I was using a results page, not the Telegraph site.)
Narrowed it down to a double quote problem: XPath has no way to escape them inside a quoted string.
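For anyone hitting the same thing, here is a rough Python sketch of two ways around the quoting problem when building the formula text. It assumes the usual spreadsheet convention that a double quote inside a string literal is written as two double quotes, and it relies on XPath accepting single-quoted string literals. The helper names and the `headline` class are made up for illustration; they are not from the original formula.

```python
def sheets_string(text: str) -> str:
    """Wrap text as a spreadsheet string literal, doubling embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


def importxml_formula(url: str, xpath: str) -> str:
    """Build the text of an IMPORTXML formula (hypothetical helper)."""
    return f"=IMPORTXML({sheets_string(url)}, {sheets_string(xpath)})"


# Option 1: use single quotes inside the XPath, so nothing needs escaping.
single = importxml_formula("http://www.telegraph.co.uk/", "//div[@class='headline']")

# Option 2: keep double quotes in the XPath and double them for the formula.
doubled = importxml_formula("http://www.telegraph.co.uk/", '//div[@class="headline"]')

print(single)
print(doubled)
```

Option 1 is probably the simpler one to type by hand in a cell; option 2 is only needed when the query itself has to contain double quotes.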
I'm new to scraping. There's some very interesting stuff going on with Google Docs, Python, etc. for doing things programmatically.
If anyone has any pointers (coding pun intended!), please share :)
>>Narrowed it down to a double quote problem
That's good to hear!
importXML and importHTML were broken for quite a while. When Google launched the "new" Google Sheets, they quit working or would work intermittently.
I got away from it for a while, and just the other day I reloaded one of my spreadsheets that had been completely broken and was surprised to see it showing January 2016 data (the most recent available in this case). I don't know when they finally fixed it, or whether it is reliably fixed, but in the past it would come and go, which made it incredibly aggravating to troubleshoot.
That does sound aggravating!