The Core

Why We Are Here => Web Development => Topic started by: gm66 on March 10, 2016, 05:35:53 PM

Title: importxml parsing error in Google sheets - help!
Post by: gm66 on March 10, 2016, 05:35:53 PM
I'm trying to get Google docs to strip stuff from web pages using importxml.

I'm getting a parse error for this:

=importxml("http://www.telegraph.co.uk/", "//div")

As well as reporting the parse error, it adds a spurious closing quote and bracket, like so:

=importxml("http://www.telegraph.co.uk/", "//div")")

Any ideas?
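One likely cause of this kind of parse error, and the one the thread later lands on, is an unescaped double quote inside one of the string arguments. In a Google Sheets formula, a literal `"` inside a string has to be written as `""`. The sketch below does that quoting in Python before the formula is pasted into a cell. The helper names `sheets_string` and `importxml_formula`, and the class name `story` in the example XPath, are hypothetical and used only for illustration.

```python
def sheets_string(text: str) -> str:
    """Quote text as a Google Sheets string literal: wrap it in double quotes
    and double any double quotes it already contains."""
    return '"' + text.replace('"', '""') + '"'


def importxml_formula(url: str, xpath: str) -> str:
    """Build an =IMPORTXML(...) formula with both arguments safely quoted."""
    return "=IMPORTXML({}, {})".format(sheets_string(url), sheets_string(xpath))


if __name__ == "__main__":
    # Hypothetical XPath that contains double quotes around an attribute value.
    print(importxml_formula("http://www.telegraph.co.uk/", '//div[@class="story"]'))
    # Prints: =IMPORTXML("http://www.telegraph.co.uk/", "//div[@class=""story""]")
```

Without the doubling, the first inner `"` would end the XPath string early, and Sheets would see the rest of the formula as stray tokens.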

Title: Re: importxml parsing error in Google sheets - help!
Post by: gm66 on March 10, 2016, 06:11:48 PM
I don't think it's a parsing error. I think Google just doesn't allow SERPs (search engine results pages) to be used as the URL argument to IMPORTXML? (I was using a results page, not the Telegraph site.)

Title: Re: importxml parsing error in Google sheets - help!
Post by: gm66 on March 10, 2016, 07:12:08 PM
Narrowed it down to a double-quote problem: XPath can't escape them.
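That matches how XPath 1.0 works: a string literal has no escape sequence, so a literal delimited by `"` cannot contain `"`, and one delimited by `'` cannot contain `'`. The usual workaround is to use the other quote character, or, when the text contains both, to build the value with `concat()`. A minimal Python sketch of that technique follows; the function name `xpath_literal` is made up for illustration.

```python
def xpath_literal(text: str) -> str:
    """Return an XPath 1.0 expression that evaluates to `text`."""
    if "'" not in text:
        # No single quotes: wrap in single quotes (also avoids Sheets quoting issues).
        return "'" + text + "'"
    if '"' not in text:
        # Single quotes but no double quotes: wrap in double quotes.
        return '"' + text + '"'
    # Both quote types present: split on the double quote, wrap each piece in
    # double quotes, and rejoin with a double quote written as '"'.
    parts = text.split('"')
    pieces = []
    for i, part in enumerate(parts):
        if i > 0:
            pieces.append("'\"'")
        if part:
            pieces.append('"' + part + '"')
    return "concat(" + ", ".join(pieces) + ")"


if __name__ == "__main__":
    print(xpath_literal("story"))            # 'story'
    print(xpath_literal('He said "it\'s"'))  # concat("He said ", '"', "it's", '"')
```

In a Sheets cell, using single-quoted XPath literals sidesteps the formula-quoting problem entirely, e.g. `=IMPORTXML("http://www.telegraph.co.uk/", "//div[@class='story']")` (again with a hypothetical class name).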

I'm new to scraping. There's very interesting stuff going on with doing things programmatically in Google Docs, Python, etc.

If anyone has any pointers (coding pun intended!), please tell :)

Title: Re: importxml parsing error in Google sheets - help!
Post by: ergophobe on March 10, 2016, 08:11:04 PM
>>Narrowed it down to a double quote problem

That's good to hear!

importXML and importHTML were broken for quite a while. When Google launched the "new" Google Sheets, they quit working or would work intermittently.

I got away from it for a while, and just the other day I reloaded one of my spreadsheets that had been completely broken and was surprised to see it showing January 2016 data (the most recent available in this case). I don't know when they finally fixed it or whether it is reliably fixed, but in the past it would come and go, making it incredibly aggravating to troubleshoot.

Title: Re: importxml parsing error in Google sheets - help!
Post by: gm66 on March 11, 2016, 09:37:23 AM
That does sound aggravating!