Author Topic: Periodically save copy of webpage  (Read 30639 times)

Adam C

Periodically save copy of webpage
« on: April 11, 2013, 03:20:58 PM »
Wondering if there's a tool / service out there that will do this - I expect so - or some easy to configure process...

Basically I want something that will save a copy of certain pages of competitor websites on a periodic basis - say weekly.

Set it and forget it - until such time that you have a need to look.

What do you / would you use for this?

BoL

Re: Periodically save copy of webpage
« Reply #1 on: April 11, 2013, 04:03:01 PM »
I'd just use curl or wget, wget if you want to fetch CSS/JS/images on the page too.

This small hacky bash script should do the trick. I'd put it in a cron job to run once a week.

Quote
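i=0
folder='/home/richard/Desktop/urlfolder'
# Read one URL per line from the file given as the first argument
while read -r url; do
 # -L follows redirects, -A sets the user agent, -o names the output file
 curl -L -A "Mozilla 6.0" -o "$folder/weekly_${i}_$(date +%Y%m%d).txt" "$url"
 i=$((i + 1))
done < "$1"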

It'll take a command along the lines of
sh urllist.sh /home/richard/Desktop/urllist.txt

where urllist.txt is just a list of URLs, one per line. The script may need editing, but it worked OK when I tested it.

Rooftop

Re: Periodically save copy of webpage
« Reply #2 on: April 11, 2013, 05:26:11 PM »
We've got a tool that does this and highlights when pages change in certain ways: title changes, number of links, words on page - that sort of thing. The eventual plan is to show those changes against some sort of visibility index and link data. The hope is to be able to visualise some of what is helping pages rank in particular sectors.

Don't suppose that is close to what you are looking at?  (this is the most cack-handed market research ever - in case you wondered)
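
For anyone who wants a rough home-made version of that kind of comparison, here is a minimal sketch that pulls a title, link count and word count out of a saved snapshot. It is not Rooftop's tool; the function name page_metrics is made up for illustration, and it assumes the <title> element sits on one line and that your grep supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: print rough metrics for one saved HTML snapshot.
# Usage: page_metrics snapshot.html
page_metrics() {
    file=$1
    # Title: text between <title> and </title>, assumed to be on one line.
    title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' "$file" | head -n 1)
    # Links: count every "<a " occurrence (case-insensitive), not just matching lines.
    links=$(grep -o -i '<a[[:space:]]' "$file" | wc -l | tr -d ' ')
    # Words: crudely replace tags with spaces, then count whitespace-separated words.
    words=$(sed 's/<[^>]*>/ /g' "$file" | wc -w | tr -d ' ')
    printf 'title=%s links=%s words=%s\n' "$title" "$links" "$words"
}
```

Run it on two weekly snapshots of the same page and compare the two output lines to see whether anything moved. The word count includes any inline script or style text, so treat it as a trend indicator rather than an exact figure.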

littleman

Re: Periodically save copy of webpage
« Reply #3 on: April 11, 2013, 05:36:00 PM »
Something like BoL's script could be set up to run automatically via a cron job.
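
For reference, a weekly crontab entry for that script might look like the line below. This is only a sketch: the 03:00 Monday schedule and the location of urllist.sh are assumptions, since the thread only gives the path to urllist.txt.

```
# min hour day-of-month month day-of-week  command
# Assumed schedule: 03:00 every Monday. Assumed script path: adjust to wherever urllist.sh lives.
0 3 * * 1 sh /home/richard/Desktop/urllist.sh /home/richard/Desktop/urllist.txt
```

Add it with crontab -e as the user who owns the output folder.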

ergophobe

Re: Periodically save copy of webpage
« Reply #4 on: April 11, 2013, 10:06:35 PM »
perhaps simpler

wget -p -k -E http://example.com/page.html

-p: grab page requisites (images, JS, CSS)
-k: convert links. So if you have src="/images/image.jpg" and you're downloading example.com/dir/page, it will convert that link to src="../images/image.jpg"
-E: add html extension. If you're downloading example.com/dir/page, it will save it as page.html so you can double click to open it in your browser.

Pipe it to gzip and save it in an archive with a timestamp in the filename and you're done!
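
A minimal sketch of that last step, under some assumptions: wget -p writes several files rather than a single stream, so this version archives the download directory with tar and gzip instead of using a literal pipe. The function names archive_name and snapshot_page, and the "snapshot" filename prefix, are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: build an archive name such as snapshot_20130411.tar.gz.
# Usage: archive_name PREFIX YYYYMMDD
archive_name() {
    printf '%s_%s.tar.gz\n' "$1" "$2"
}

# Fetch one page plus its requisites into a temp directory, then tar+gzip it.
# Usage: snapshot_page URL OUTPUT_DIR
snapshot_page() {
    url=$1
    outdir=$2
    stamp=$(date +%Y%m%d)
    workdir=$(mktemp -d) || return 1
    # Same -p -k -E flags as above; -q is quiet, -P sets the download directory.
    wget -q -p -k -E -P "$workdir" "$url"
    # Archive the directory contents under a timestamped name.
    tar -czf "$outdir/$(archive_name snapshot "$stamp")" -C "$workdir" .
    rm -rf "$workdir"
}
```

Call it as snapshot_page http://example.com/page.html /path/to/archive. If you track more than one page, change the prefix per page so that archives created on the same day don't overwrite each other.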

bill

Re: Periodically save copy of webpage
« Reply #5 on: April 14, 2013, 01:35:31 AM »
Are there any recommended wget clients for Windows?

ergophobe

Re: Periodically save copy of webpage
« Reply #6 on: April 15, 2013, 06:43:56 PM »
>>wget clients

Hmm... well, I have Git on my Windows machines, and Git for Windows comes with Git Bash, which has grep, wget, less and a lot of other things you'd expect from bash. That's how I would use it on Windows (though actually, I don't really use it on Windows).

You can also get it as a standalone from gnu: http://gnuwin32.sourceforge.net/packages/wget.htm

bill

Re: Periodically save copy of webpage
« Reply #7 on: April 16, 2013, 02:13:23 AM »
The GNU tools might be the way to go. I was holding off on Cygwin on my work machine, which would have been the other option I know of.

Rupert

Re: Periodically save copy of webpage
« Reply #8 on: April 16, 2013, 07:58:50 AM »
If you are looking for something PC-based, how about iOpus? You can get that to run macros to automate anything like that.

Rumbas

Re: Periodically save copy of webpage
« Reply #9 on: April 16, 2013, 08:35:01 AM »

bill

Re: Periodically save copy of webpage
« Reply #10 on: April 16, 2013, 09:00:33 AM »
http://www.changedetection.com/
I use that all the time. Forgot about its ability to cache pages.