The Core

Topic started by: littleman on June 03, 2012, 07:04:13 AM

Title: handy wget cheatsheet
Post by: littleman on June 03, 2012, 07:04:13 AM
http://notepad.benfinoradin.info/wp-content/uploads/2012/05/wget_cheat-sheet.pdf
Title: Re: handy wget cheatsheet
Post by: 4Eyes on June 04, 2012, 04:39:33 PM
nice - thanks :)
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 06, 2012, 12:39:24 PM
Out of interest, when wget mirrors a site does it find what to download by the links in the source code?
I take it it won't download files like the one Google uses to confirm ownership?
Title: Re: handy wget cheatsheet
Post by: jetboy on June 06, 2012, 06:10:44 PM
Aye, it follows links in the HTML, just like a search engine spider. As such, orphaned files (such as Google's) will get missed.
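A rough shell sketch of that link-following idea, assuming bash with GNU or BSD grep/sed. It is a toy, not how wget works internally (wget uses a proper HTML parser): it lists only the href/src targets found in a page, so a file that nothing links to never appears. The `extract_links` name and the sample file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy link discovery, NOT wget's real parser: a spider only queues URLs it
# finds inside pages it has already fetched.

# Hypothetical helper: print every href="..." / src="..." value in HTML file $1.
extract_links() {
  grep -oE '(href|src)="[^"]*"' "$1" | sed -E 's/^(href|src)="//; s/"$//'
}

# Sample page with made-up content.
page=$(mktemp)
cat > "$page" <<'EOF'
<html><body>
  <a href="about.html">About</a>
  <img src="logo.png">
</body></html>
EOF

# Prints about.html and logo.png. A verification file sitting on the server
# (e.g. a hypothetical google1234.html) is never printed: no page links to it.
extract_links "$page"
rm -f "$page"
```

To pick up an orphaned file you have to ask for it by URL, for example `wget -x http://domain.com/<file>`; `-x` (`--force-directories`) recreates the host folder so it lands alongside the mirrored pages.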
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 06, 2012, 07:16:44 PM
Cheers, just thought I would double check :)
Title: Re: handy wget cheatsheet
Post by: littleman on June 06, 2012, 07:50:05 PM
Wget did a great job converting an old dynamic site of mine into a static one. It really saved me a ton of work: it ripped the site, saved it locally and converted all the link references so they'd work as static HTML.

wget  --mirror -p --html-extension --base=./ -k -P ./ http://domain.com
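For reference, here is roughly the same command spelled with long options, one per line, wrapped in a bash function. `mirror_site` is a hypothetical name and the comments paraphrase the wget manual. `--adjust-extension` is the newer name for `--html-extension` (wget 1.12 and later; older builds only accept `--html-extension`). `--base=./` is left out: the manual describes `--base` as resolving relative links read from a file given with `-i`, so it should make no difference when the site URL is on the command line.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the mirror command above.
mirror_site() {
  local url="$1"
  local -a args
  args=(
    --mirror               # -m: same as -r -N -l inf --no-remove-listing
    --page-requisites      # -p: also fetch the images, CSS etc. each page needs
    --adjust-extension     # -E: save HTML pages with a .html suffix
    --convert-links        # -k: rewrite links afterwards so the local copy works
    --directory-prefix=./  # -P ./: save under the current directory
  )
  wget "${args[@]}" "$url"
}
```

Usage: `mirror_site http://domain.com`. Keeping the flags in an array lets each one carry its own comment without breaking the line continuation.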
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 19, 2012, 02:37:18 PM
Proper noob alert here.

I'm having a play with wget: I've installed it and I'm running it from the Windows command line, but where the hell does it save the files it downloads?
Title: Re: handy wget cheatsheet
Post by: BoL on June 19, 2012, 02:38:37 PM
It saves to the directory you run it from, IIRC, and (at least when scraping a particular site) into a folder named after the site's server name.
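A small sketch of choosing that location explicitly instead of relying on the current directory, assuming bash and a wget on the PATH. `-P`/`--directory-prefix` sets the top-level folder, and recursive downloads still create a subfolder named after the host inside it (`-nH`/`--no-host-directories` turns that off). `fetch_into` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror a site into a folder of your choosing rather
# than into whatever directory the shell happens to be in.
fetch_into() {
  local dest="$1" url="$2"
  mkdir -p "$dest"   # create the target up front; fails early if it can't be created
  # Files end up under "$dest/<hostname>/...", e.g. "$dest/domain.com/index.html".
  wget --mirror --directory-prefix="$dest" "$url"
}
```

Usage: `fetch_into "$HOME/sites" http://domain.com`. The same `-P` flag is available in the Windows build.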
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 19, 2012, 02:41:02 PM
That's what I thought, but nothing's there though.

Hummm, more play time I think.

Thanks
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 19, 2012, 03:25:32 PM
Ha, found it.

It didn't save it to the folder I was in, which was C:\Program Files (x86)\GnuWin32\bin.
Instead it saved it to C:\Users\username\AppData\Local\VirtualStore\Program Files (x86)\GnuWin32\bin.

Strange, but nevertheless it's there :)
Title: Re: handy wget cheatsheet
Post by: JasonD on June 19, 2012, 10:52:13 PM
> GnuWin32\bin

That makes sense, as it's a Windows port of a Unix utility. GnuWin32 mirrors the *nix directory layout, and bin is where BINaries are stored, so that's where wget.exe lives.

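Try using it at the command line, but cd to a different directory first and then invoke wget with the full path to wget.exe. Program Files is write-protected for normal users, which is why Windows (UAC virtualisation) quietly redirected your downloads into VirtualStore.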

E.g.

cd c:\downloads\websites\google.com
"C:\Program Files (x86)\GnuWin32\bin\wget.exe" --mirror -p --html-extension --base=./ -k -P ./ http://google.com
Title: Re: handy wget cheatsheet
Post by: Chunkford on June 20, 2012, 10:18:23 AM
Well, I've learnt something new today!

That makes perfect sense now and the name VirtualStore gives it away a bit.

Cheers Jason