handy wget cheatsheet

Started by littleman, June 03, 2012, 07:04:13 AM


4Eyes


Chunkford

Out of interest, when wget mirrors a site, does it find what to download by following the links in the source code?
I take it that it won't download files like Google's ownership-verification file?
"If my answers frighten you then you should cease asking scary questions"

jetboy

Aye, it follows links in the HTML, just like a search engine spider. As such, orphaned files that nothing links to (such as Google's) will get missed.
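To illustrate why an orphaned file gets missed, here is a minimal Python sketch of link-following discovery. It is not wget's actual implementation: the `SITE` dictionary, the `domain.example` host, and the file names are all made up for the example, and only `<a href>` links are followed, whereas a real mirroring tool looks at more tag types.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory site (path -> HTML). Stands in for real HTTP fetches.
SITE = {
    "/": '<a href="/about.html">About</a> <a href="contact.html">Contact</a>',
    "/about.html": '<a href="/">Home</a>',
    "/contact.html": '<a href="http://other.example/">Elsewhere</a>',
    "/verify-ownership.html": "token",  # orphan: no page links to it
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, site):
    """Breadth-first walk from start_url; returns the set of paths reached."""
    host = urlparse(start_url).netloc
    seen = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        parts = urlparse(url)
        path = parts.path or "/"
        # Skip other hosts, pages already visited, and missing pages.
        if parts.netloc != host or path in seen or path not in site:
            continue
        seen.add(path)
        parser = LinkCollector()
        parser.feed(site[path])
        for href in parser.links:
            queue.append(urljoin(url, href))  # resolve relative links
    return seen


found = crawl("http://domain.example/", SITE)
print("reached:", sorted(found))
print("missed: ", sorted(set(SITE) - found))
```

The orphan never enters the queue because no fetched page mentions it, which is the behaviour described above. To grab such a file you would have to request its URL directly.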

Chunkford

Cheers, just thought I would double check :)

littleman

Wget did a great job converting an old dynamic site of mine into static.  It really saved me a ton of work.  It ripped the site, saved it locally and converted all the link references so they'd work in static HTML.

wget  --mirror -p --html-extension --base=./ -k -P ./ http://domain.com

Chunkford

Proper noob alert here.

I'm having a play with wget. I've installed it and I'm running it from the Windows command line, but where the hell does it save the files that it downloads?

BoL

It saves in the directory you are running it from, iirc, and the folder (at least when scraping a particular site) is named after the site's server name.
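As a rough sketch of that rule, the Python below works out where a recursive download would be expected to land. The function name `expected_mirror_dir` is made up for this example, and it assumes wget's default layout (a host-named folder under the directory prefix); options such as `-nH` or `-nd` change that layout and are not modelled here.

```python
from pathlib import Path
from urllib.parse import urlparse


def expected_mirror_dir(url, prefix="."):
    """Best guess at the folder a recursive wget run uses for `url`.

    `prefix` plays the role of wget's -P option; a relative prefix is
    taken relative to the directory the command is run from.
    """
    host = urlparse(url).hostname
    return Path.cwd() / prefix / host


print(expected_mirror_dir("http://domain.com"))
print(expected_mirror_dir("http://domain.com", prefix="./"))
```

Both calls print the same path: the current working directory with `domain.com` appended. If nothing shows up there, the working directory is probably not what you think it is, or the operating system has redirected the write somewhere else.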

Chunkford

That's what I thought, but nothing's there though.

Hummm, more play time I think.

Thanks

Chunkford

Ha, found it.

It didn't save it to the folder I was in which was C:\Program Files (x86)\GnuWin32\bin
Instead it saved it to C:\Users\username\AppData\Local\VirtualStore\Program Files (x86)\GnuWin32\bin

Strange, that, but nevertheless it's there :)

JasonD

> GnuWin32\bin

That makes sense, as it's a Windows port of a Unix utility. It follows the *nix directory layout, where bin is where BINaries are stored, so that is the likely location of wget.

Try CDing to a different directory at the command line, then invoking wget with the full path to the executable followed by the command options.

EG.

cd c:\downloads\websites\google.com
"C:\Program Files (x86)\GnuWin32\bin\wget.exe" --mirror -p --html-extension --base=./ -k -P ./ http://google.com
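The same idea (run wget with a working directory you choose, rather than the install folder) can be scripted. This is only a sketch: `build_mirror_command` and `run_mirror` are names invented for the example, it assumes a `wget` executable is on the PATH, and the flag comments are a reading of the wget manual rather than anything confirmed in this thread.

```python
import shutil
import subprocess
from pathlib import Path


def build_mirror_command(url, wget="wget"):
    """Return the argument list for the mirror command used in this thread."""
    return [
        wget,
        "--mirror",          # recursive, with timestamping and unlimited depth
        "-p",                # --page-requisites: also fetch images, CSS, etc.
        "--html-extension",  # older spelling of --adjust-extension
        "--base=./",         # kept from the thread; documented for links read from an input file
        "-k",                # --convert-links: rewrite links for local viewing
        "-P", "./",          # --directory-prefix: save under the working directory
        url,
    ]


def run_mirror(url, dest):
    """Create `dest` if needed and run wget with it as the working directory."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    wget = shutil.which("wget")
    if wget is None:
        raise FileNotFoundError("wget was not found on PATH")
    # cwd=dest replaces the manual `cd` step before invoking wget.
    return subprocess.run(build_mirror_command(url, wget), cwd=dest).returncode


print(build_mirror_command("http://domain.com"))
```

Calling `run_mirror("http://domain.com", "downloads/domain.com")` would then put the mirror under that folder. Passing the arguments as a list also sidesteps quoting problems with paths that contain spaces, such as `Program Files (x86)`.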

Chunkford

Well, I've learnt something new today!

That makes perfect sense now and the name VirtualStore gives it away a bit.

Cheers Jason