Periodically save copy of webpage

Started by Adam C, April 11, 2013, 03:20:58 PM


Adam C

Wondering if there's a tool / service out there that will do this - I expect so - or some easy to configure process...

Basically I want something that will save a copy of certain pages of competitor websites on a periodic basis - say weekly.

Set it and forget it, until such time that you have a need to look.

What do you / would you use for this?

BoL

I'd just use curl or wget, wget if you want to fetch CSS/JS/images on the page too.

This small hacky bash script should do the trick. I'd put it in a cron job to run once a week.

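i=0
folder='/home/richard/Desktop/urlfolder'
# Read one URL per line from the file given as the first argument
while read -r url; do
    # -L follows redirects, -A sets the user agent, -o names the output file
    curl -L -A "Mozilla 6.0" -o "$folder/weekly_${i}_$(date +%Y%m%d).txt" "$url"
    i=$((i + 1))
done < "$1"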

It'll take a command along the lines of
sh urllist.sh /home/richard/Desktop/urllist.txt

where urllist.txt is just a list of URLs, one per line. The script may need editing, but it worked OK when I tested it.
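
For the weekly cron job, something like the sketch below could work. The script location and the Monday 03:00 schedule are assumptions for illustration, not something tested here; adjust both to your setup.

```shell
#!/bin/sh
# Assumed locations: the thread names urllist.sh and urllist.txt,
# but the directory the script lives in is a guess.
script=/home/richard/Desktop/urllist.sh
list=/home/richard/Desktop/urllist.txt

# Crontab fields: minute hour day-of-month month day-of-week command
# "0 3 * * 1" means 03:00 every Monday.
cron_line="0 3 * * 1 sh $script $list"
echo "$cron_line"

# To install it alongside any existing entries, uncomment:
# (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

Running it as-is only prints the line, so you can check it before adding it to your crontab.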

Rooftop

We've got a tool that does this and highlights when pages change in certain ways: title changes, number of links, words on page, that sort of thing. The eventual plan is to show those changes against some sort of visibility index and link data. The hope is to be able to visualise some of what is helping pages rank in particular sectors.

Don't suppose that is close to what you are looking at? (This is the most cack-handed market research ever, in case you wondered.)

littleman

Something like BoL's script could be set up to run automatically as a cron job.

ergophobe

Perhaps simpler:

wget -p -k -E http://example.com/page.html

-p: grab page requisites (images, JS, CSS)
-k: convert links. So if you have src="/images/image.jpg" and you're downloading example.com/dir/page, it will convert that link to src="../images/image.jpg"
-E: add html extension. If you're downloading example.com/dir/page, it will save it as page.html so you can double-click to open it in your browser.

Pipe it to gzip and save it in an archive with a timestamp in the filename and you're done!
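
A rough sketch of that last step, in case it helps. With -p, wget writes a set of files to disk rather than a single stream, so this version tars the download directory and gzips the result instead of piping. The function names, the "page" prefix, and the archive directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: save one page with its requisites, then pack it into a dated .tar.gz.
# Function names and paths below are placeholders, not from the thread.

# Build an archive filename such as page_20130411.tar.gz
archive_name() {
    printf '%s_%s.tar.gz' "$1" "$(date +%Y%m%d)"
}

# Fetch the page into a temporary directory, then tar+gzip that directory.
snapshot() {
    url=$1
    outdir=$2
    tmp=$(mktemp -d) || return 1
    # wget's exit status is not checked: a single missing image makes it
    # exit non-zero even when the page itself was saved.
    wget -q -p -k -E -P "$tmp" "$url"
    mkdir -p "$outdir" &&
        tar -czf "$outdir/$(archive_name page)" -C "$tmp" .
    rm -rf "$tmp"
}

# Example call (commented out so nothing is fetched by accident):
# snapshot "http://example.com/page.html" "$HOME/page-archive"
```

Each run on a different day produces a new archive, so old copies are never overwritten.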

bill

Are there any recommended WGET clients for Windows?

ergophobe

>>wget clients

Hmm... well, since I have git on my Windows machines, and Git for Windows comes with Git Bash, which has grep, wget, less and a lot of other things you'd expect from bash, that's how I would use it on Windows (though actually, I don't really use it on Windows).

You can also get it as a standalone from GnuWin32: http://gnuwin32.sourceforge.net/packages/wget.htm

bill

The GNU tools might be the way to go. I was holding off on Cygwin on my work machine, which would have been the other option I know of.

Rupert

If you are looking for something PC-based, how about iOpus? You can get that to run macros to automate anything like that.
