importxml parsing error in Google sheets - help!

Started by gm66, March 10, 2016, 05:35:53 PM

gm66

I'm trying to get Google docs to strip stuff from web pages using importxml.

I'm getting a parse error for this :

=importxml("http://www.telegraph.co.uk/", "//div")

As well as reporting the parse error, it adds a spurious closing quote and bracket, like so :

=importxml("http://www.telegraph.co.uk/", "//div")")

Any ideas ?

Civilisation is a race between disaster and education ...

gm66

I don't think it's a parsing error. I think G just doesn't allow SERPs to be used as the URL argument to importxml? (I was actually using a results page, not the Telegraph site.)

gm66

Narrowed it down to a double-quote problem: XPath has no way to escape them inside a quoted string.
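
For anyone hitting the same thing, here is a rough sketch of one way around it, assuming the formula text is generated in Python before being pasted or written into the sheet. XPath accepts single quotes around string values, so the simplest fix is usually to use those inside the XPath. When a double quote really is needed, a Google Sheets string literal takes it doubled (""). The helper names and the class value "headline" are made up for illustration; they are not from the original formula.

```python
def sheets_string(text: str) -> str:
    """Wrap text as a Google Sheets string literal.

    Sheets has no backslash escapes; an embedded double quote
    is written as two double quotes.
    """
    return '"' + text.replace('"', '""') + '"'


def importxml_formula(url: str, xpath: str) -> str:
    """Build an =IMPORTXML(...) formula with both arguments safely quoted."""
    return f"=IMPORTXML({sheets_string(url)}, {sheets_string(xpath)})"


# Hypothetical XPath containing double quotes (the class name is an assumption).
formula = importxml_formula(
    "http://www.telegraph.co.uk/",
    '//div[@class="headline"]',
)
print(formula)
# =IMPORTXML("http://www.telegraph.co.uk/", "//div[@class=""headline""]")

# The single-quote alternative needs no doubling at all.
print(importxml_formula("http://www.telegraph.co.uk/", "//div[@class='headline']"))
# =IMPORTXML("http://www.telegraph.co.uk/", "//div[@class='headline']")
```

Either form should avoid the stray `")` that Sheets appends when it thinks the string ended early.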

I'm new to scraping, very interesting stuff going on with G docs, Python etc, doing stuff programmatically.

If anyone has any pointers (coding pun intended!) please tell :)

ergophobe

>>Narrowed it down to a double quote problem

That's good to hear!

importXML and importHTML were broken for quite a while. When Google launched the "new" Google Sheets, they quit working or would work intermittently.

I got away from it for a while, then just the other day I reloaded one of my spreadsheets that had been completely broken and was surprised to see it showing January 2016 data (the most recent available in this case). I don't know when they finally fixed it, or whether it is reliably fixed, but in the past it would come and go, which made it incredibly aggravating to troubleshoot.
