http://notepad.benfinoradin.info/wp-content/uploads/2012/05/wget_cheat-sheet.pdf
nice - thanks :)
Out of interest, when wget mirrors a site does it find what to download by the links in the source code?
I take it it won't download files like the one Google uses to confirm ownership?
Aye, it follows links in the HTML, just like a search engine spider. As such, orphaned files (such as Google's) will be missed.
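To illustrate the point about orphaned files, here is a rough offline sketch, not wget itself. It builds a throwaway three-file site and lists the href targets in index.html, which is roughly what a crawler has to go on. The file names (about.html, google-verify.html) are made up for the example, and real wget parses HTML far more thoroughly than this grep does.

```shell
#!/bin/sh
# Build a tiny throwaway site: index.html links to about.html only.
site=$(mktemp -d)
printf '<a href="about.html">About</a>\n' > "$site/index.html"
printf '<p>About page</p>\n' > "$site/about.html"
# Hypothetical verification file that no page links to (an orphan).
printf 'verification\n' > "$site/google-verify.html"

# Roughly what a crawler sees: the href targets found in the page source.
linked=$(grep -o 'href="[^"]*"' "$site/index.html" | sed 's/^href="//; s/"$//')
echo "$linked"   # prints: about.html

# Remove the throwaway site again.
rm -rf "$site"
```

google-verify.html never shows up in the list because nothing links to it. If you need a file like that and know its URL, you can pass the URL to wget directly in addition to the mirror run.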
Cheers, just thought I would double check :)
Wget did a great job converting an old dynamic site of mine into static. It really saved me a ton of work. It ripped the site, saved it locally and converted all the link references so they'd work in static HTML.
wget --mirror -p --html-extension --base=./ -k -P ./ http://domain.com
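For anyone decoding the short flags in that command, here is a sketch that spells them out in long form. It only builds and prints the command, so nothing is downloaded. Two assumptions on my part: --adjust-extension is the name newer wget releases (1.12 onwards) use for --html-extension, and I've left out --base, which as far as I know only matters when wget reads links from a local HTML file via -i.

```shell
#!/bin/sh
# Placeholder URL from the post above; swap in the site you want to mirror.
url="http://domain.com"

# Start with an empty argument list and add one option at a time.
set --
set -- "$@" --mirror               # recursive download with timestamping, unlimited depth
set -- "$@" --page-requisites      # long form of -p: also fetch images, CSS and so on
set -- "$@" --adjust-extension     # newer name for --html-extension
set -- "$@" --convert-links        # long form of -k: rewrite links for local viewing
set -- "$@" --directory-prefix=./  # long form of -P: save under the current directory
set -- "$@" "$url"

# Print the command instead of running it, so nothing is downloaded here.
cmd="wget $*"
echo "$cmd"
```

Copy the printed line into a terminal once the options look right.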
Proper noob alert here.
I'm having a play with wget. I've installed it and I'm running it from the Windows command line, but where the hell does it save the files that it downloads?
It saves to the directory you run it from, iirc, and the folder (at least when scraping a particular site) is named after the site's server.
That's what I thought, but nothing's there though.
Hummm, more play time I think.
Thanks
Ha, found it.
It didn't save to the folder I was in, which was C:\Program Files (x86)\GnuWin32\bin
Instead it saved to C:\Users\username\AppData\Local\VirtualStore\Program Files (x86)\GnuWin32\bin
Strange, but nevertheless it's there :)
> GnuWin32\bin
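That makes sense, as it's a Windows port of a Unix utility. It follows the *nix directory layout, where bin is where BINaries are stored, so that is the likely location of wget. The VirtualStore part is Windows redirecting the write, because a normal user can't write under Program Files.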
Try CDing to a different directory at the command line, then invoking wget with the full path to the executable followed by the command. The path needs quotes because it contains spaces.
E.g.
cd c:\downloads\websites\google.com
"C:\Program Files (x86)\GnuWin32\bin\wget.exe" --mirror -p --html-extension --base=./ -k -P ./ http://google.com
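Another way to control where the mirror lands, sketched for a Unix-style shell rather than the Windows prompt: give -P a folder you own, so it no longer matters which directory wget is launched from. The folder name below is only an example, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Example destination under the home directory (made-up name; use any folder you own).
dest="${HOME:-/tmp}/websites"

# Create the folder if it doesn't exist yet.
mkdir -p "$dest"

# -P (--directory-prefix) tells wget where to put everything it downloads.
cmd="wget --mirror -p -k -P \"$dest\" http://domain.com"
echo "$cmd"
```

wget still creates a folder named after the host inside the destination, so the pages should end up under websites/domain.com.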
Well, I've learnt something new today!
That makes perfect sense now, and the name VirtualStore gives it away a bit.
Cheers Jason