Sunday, 29 May 2011

GNU Wget

Introduction to GNU Wget

GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command-line tool, so it can easily be called from scripts, cron jobs, terminals without X Window System support, etc.
GNU Wget has many features to make retrieving large files or mirroring entire web or FTP sites easy, including:
  • Can resume aborted downloads, using REST and RANGE
  • Can use filename wildcards and recursively mirror directories
  • NLS-based message files for many different languages
  • Optionally converts absolute links in downloaded documents to relative, so that downloaded documents may link to each other locally
  • Runs on most UNIX-like operating systems as well as Microsoft Windows
  • Supports HTTP proxies
  • Supports HTTP cookies
  • Supports persistent HTTP connections
  • Unattended / background operation
  • Uses local file timestamps to determine whether documents need to be re-downloaded when mirroring

GNU Wget is distributed under the GNU General Public License.
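Resuming an aborted HTTP download comes down to asking the server only for the bytes you do not have yet (the "RANGE" half of "REST and RANGE"). Below is a minimal Python sketch of that idea, assuming a server that supports byte ranges. `build_resume_request` and `resume_download` are hypothetical helper names for illustration; this is not how wget itself is implemented.

```python
import os
import shutil
import urllib.request


def build_resume_request(url, partial_path):
    """Build a GET request asking only for the bytes missing from partial_path.

    If a partial file exists, its size is the starting offset of an HTTP
    Range header ("bytes=N-"); otherwise a plain request is returned.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Ask for everything from byte `offset` to the end of the file.
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url, partial_path):
    request = build_resume_request(url, partial_path)
    # A complete file usually makes the server answer 416, which urlopen raises as HTTPError.
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content: the server honoured the Range header, so append.
        # 200 OK: the server ignored it and is sending the whole file, so start over.
        mode = "ab" if response.status == 206 else "wb"
        with open(partial_path, mode) as out:
            shutil.copyfileobj(response, out)
```

Checking the status code matters: a server that does not support ranges simply sends the whole file again, and appending that to the partial copy would corrupt it.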

Downloading GNU Wget

The source code for GNU Wget can be found at http://ftp.gnu.org/gnu/wget/ (via HTTP) and ftp://ftp.gnu.org/gnu/wget/ (via FTP). It can also be found on one of the GNU FTP mirrors. For more download options, see the download information on the Wget Wgiki.

Documentation

GNU Wget documentation can be found at http://www.gnu.org/software/wget/manual/. For manuals of other GNU packages, see http://www.gnu.org/manual/.

WGET for Windows (win32) - current version: 1.11.4

Updated February 18, 2010
Read below for the download and some help with using wget.

Downloads

The latest version is 1.11.4, compiled with MS Visual C++ and linked with OpenSSL 0.9.8k. This page will be updated with new releases of wget. Wget tends to see a couple of incremental bugfix releases per version (e.g. 1.11.x). I am currently using wget 1.11.x on a daily basis.

wget.exe (401408 bytes): win32 binary with OpenSSL support.
MD5: bd126a7b59d5d1f97ba89a3e71425731
SHA1: 457b1cd985ed07baffd8c66ff40e9c1b6da93753
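The MD5 and SHA1 values above let you check that the wget.exe you downloaded is byte-for-byte the published file. Here is a small Python sketch using the standard `hashlib` module; `file_digests` and `verify_download` are hypothetical helper names.

```python
import hashlib


def file_digests(path):
    """Return (md5_hex, sha1_hex) for the file at path, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def verify_download(path, expected_md5, expected_sha1):
    """True only if both digests match (hex comparison is case-insensitive)."""
    md5, sha1 = file_digests(path)
    return md5 == expected_md5.lower() and sha1 == expected_sha1.lower()
```

For example, `verify_download("wget.exe", "bd126a7b59d5d1f97ba89a3e71425731", "457b1cd985ed07baffd8c66ff40e9c1b6da93753")` should return True for an intact download. MD5 and SHA1 are good at catching corrupted or truncated files, but they are weak protection against deliberate tampering.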

Where is 1.12?

The latest official version of wget is currently 1.12; however, this version does not currently compile for Windows. I am looking into this, but a Windows build of wget 1.12 may still take some time. The suggested mingw32 compile path is not a viable option for me right now, mainly because it lacks IPv6 and 64-bit support.

Previous versions

wget 1.10.2 (Dec 2 2005)

wget-1.10.2.exe (332800 bytes): win32 binary compiled with MS Visual C++ and with OpenSSL 0.9.7i support.

wget 1.9.1 (Jun 03 2004)

wget-1.9.1.exe (308736 bytes): win32 binary compiled with MS Visual C++ 6.0 and with OpenSSL 0.9.7c support.

Usage

The official manual has all command line options and parameters.

wget is a command line program. You start it from the command prompt: command.com in Windows 9x/Me or cmd.exe in Windows 2000/XP. The command prompt can be found in the Start Menu (Accessories).
wget.exe must be placed in a directory on your PATH (e.g. c:\windows) if you want to be able to run it from any directory.
To retrieve a file: wget http://users.ugent.be/~bpuype/wget/wget.exe
[Screenshot: wget in action]
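If you call wget from a script instead of typing it at the prompt, it can help to check first that wget.exe really is on your PATH. Here is a minimal Python sketch, assuming Python 3 is installed; `find_wget` is a hypothetical helper, not part of wget.

```python
import shutil


def find_wget(search_path=None):
    """Return the full path of the wget executable, or raise if it is not found.

    With search_path=None, shutil.which() searches the PATH environment
    variable; on Windows it also tries the extensions listed in PATHEXT,
    so "wget" matches wget.exe.
    """
    found = shutil.which("wget", path=search_path)
    if found is None:
        raise FileNotFoundError(
            "wget not found; copy wget.exe into a directory listed in PATH"
        )
    return found
```

Calling `find_wget()` returns the same executable that `wget` at the command prompt would run.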

Basic options
Display all help: wget --help
Completely mirror a site: wget -mr http://...
-m: mirror
-r: recursive
Mirror without following links to other servers, parent directories: wget -mrnp http://...
-np: no-parent
Retrieve an HTML file and convert relative links to absolute ones: wget -k http://users.ugent.be/~bpuype/wget
-k: 'k'onvert links
Resume partially downloaded files (if supported by the server): wget -c http://...
-c: continue
Read URLs from a file and retrieve them: wget -i file_with_urls.txt
-i: input-file
Ask for URLs (read from stdin): wget -i -
Type the URLs at the prompt, pressing Enter after each one, and finish with ^Z (press CTRL-Z) on an empty line.
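The file read by `wget -i` is plain text with one URL per line. This Python sketch writes such a file from a list and returns the matching command line. `write_url_list` is a hypothetical helper name, and the file name is just the example used above.

```python
def write_url_list(urls, list_path):
    """Write one URL per line (the format wget -i reads) and return the argv to run."""
    with open(list_path, "w", encoding="utf-8") as f:
        for url in urls:
            url = url.strip()
            if url:  # skip blank entries
                f.write(url + "\n")
    return ["wget", "-i", list_path]
```

To actually run it (with wget on your PATH): `subprocess.run(write_url_list(urls, "file_with_urls.txt"), check=True)`. Passing a list instead of a single string avoids any quoting problems with URLs that contain spaces or `&`.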

FTP

--glob=off
Don't treat (, *, ? etc. as globbing characters. Use this when transferring files whose names contain these characters.
--passive-ftp
Use passive mode for the data connection (try this if you're behind a firewall, NAT box, etc.).

Proxy

To make wget use a proxy, you must set an environment variable before running wget. Type this at the command prompt:
set http_proxy=http://proxy.myprovider.net:8080
...using the correct proxy hostname and port for your ISP or network. You can use ftp_proxy to proxy FTP requests.
--proxy=on
--proxy=off
Turn proxy usage on or off once the variable is set; the default is on when the variable is present.
Environment variables can be set permanently for the entire system, or on a per-user basis. On Windows XP (the procedure is similar for NT, 2000, Vista and 7), this is done in the Environment Variables dialog under System Properties. For Windows 95, 98 and ME, add them to autoexec.bat (msconfig makes this easy).
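When wget is started from a script, the proxy variable can also be handed to just that one wget process instead of being set for the whole session. Here is a Python sketch of that approach. `proxy_env` and `run_wget_via_proxy` are hypothetical names, and the proxy address is the placeholder from the example above.

```python
import os
import subprocess


def proxy_env(http_proxy, ftp_proxy=None):
    """Return a copy of the current environment with wget's proxy variables set.

    Only the child process sees these values; the calling shell is unchanged.
    """
    env = dict(os.environ)
    env["http_proxy"] = http_proxy
    if ftp_proxy is not None:
        env["ftp_proxy"] = ftp_proxy
    return env


def run_wget_via_proxy(url, http_proxy, ftp_proxy=None):
    # Requires wget on PATH; raises CalledProcessError if wget exits non-zero.
    subprocess.run(["wget", url], env=proxy_env(http_proxy, ftp_proxy), check=True)
```

For example: `run_wget_via_proxy("http://users.ugent.be/~bpuype/wget/wget.exe", "http://proxy.myprovider.net:8080")`.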

Passwords

To retrieve files protected by a password (HTTP or FTP), you can use the following URL syntax:
wget http://username:password@www.example.net/somedir/somefile
wget ftp://username:password@ftp.example.net/somedir/somefile
You can also use --http-user and --http-password, or --ftp-user and --ftp-password:
wget ftp://ftp.example.net/somefile --ftp-user=username --ftp-password=password
If the username or password contains non-alphanumeric characters, you need to escape them in the URL using RFC 1738 %HH syntax. For example, with a username of user@domain and a password of pass, the URL becomes http://user%40domain:pass@www.example.net/somefile. When using escaped URLs in batch files, remember that % is itself a special character and must be escaped by writing %% instead of %.
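Here is the escaping from the example above worked out in Python. `urllib.parse.quote` with `safe=""` percent-encodes every reserved character, including `@`, `:` and `/`. `url_with_credentials` and `escape_for_batch` are hypothetical helper names.

```python
from urllib.parse import quote


def url_with_credentials(scheme, user, password, host, path):
    # safe="" makes quote() encode "@", ":" and "/" too, so they cannot be
    # mistaken for the URL's own separators.
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    return f"{scheme}://{userinfo}@{host}{path}"


def escape_for_batch(text):
    # Inside a .bat file a literal % must be written as %%.
    return text.replace("%", "%%")


url = url_with_credentials("http", "user@domain", "pass", "www.example.net", "/somefile")
print(url)                    # http://user%40domain:pass@www.example.net/somefile
print(escape_for_batch(url))  # http://user%%40domain:pass@www.example.net/somefile
```

Only use the `%%` form inside batch files. At an interactive cmd.exe prompt, the single-`%` URL is what you type.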

SSL certificates

Current versions of OpenSSL do not come with root certificates. This means that when you try to download over SSL, wget will give you errors such as:
Unable to locally verify the issuer's authority.
ERROR: certificate common name `dnsname' doesn't match requested host name `dnsname'.
You can either use the suggested --no-check-certificate option to skip verification (only do this if you need encryption but not authentication), or get a set of root certificates and pass it to wget with --ca-certificate file.crt. The problem is then getting a correct root certificate bundle in the first place. The following link has a Perl script that downloads root certificates from Mozilla and converts them into a certificate bundle wget can use (you'll need Perl, typically ActivePerl):
http://www.floodgap.com/software/ttytter/mk-ca-bundle.txt
Do not trust other people to give you a set of root certificates. That includes this site (which no longer offers certificates anyway). Audit any source you download root certificates from, and audit the tools you use to process them, including the mk-ca-bundle.pl script linked above.
On Windows, the official source for root certificates is your Windows install media and Windows Update (remember to update the root certificates regularly), but wget and many other Windows tools do not use this set.
Furthermore, the Windows makefiles for wget refer to the certificate bundles available at http://curl.haxx.se/docs/caextract.html (which are extracted from Mozilla as well).
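wget's --ca-certificate option expects a bundle in PEM format. Before you point wget at a bundle, a quick format check can catch an empty or wrongly converted file. The Python sketch below does that. It is only a sanity check, not the audit recommended above. `count_pem_certificates` and `wget_argv_with_bundle` are hypothetical helper names.

```python
PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_pem_certificates(bundle_path):
    """Count the PEM certificate blocks in a CA bundle file.

    Zero means the file is not a PEM bundle that --ca-certificate can use.
    A non-zero count says nothing about whether the certificates are trustworthy.
    """
    with open(bundle_path, encoding="ascii", errors="replace") as f:
        return sum(1 for line in f if line.strip() == PEM_BEGIN)


def wget_argv_with_bundle(url, bundle_path):
    """Return a wget command line that verifies the server against bundle_path."""
    if count_pem_certificates(bundle_path) == 0:
        raise ValueError(f"{bundle_path} contains no PEM certificates")
    return ["wget", f"--ca-certificate={bundle_path}", url]
```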

Default options (.ini file)

The syntax for wget.ini (or .wgetrc) can be found in the official documentation.

You can either put a wget.ini file in the same directory as wget.exe, or use an environment variable called wgetrc to point to the file if it is in another location (set wgetrc=\path\to\wget.ini).
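A startup file is just lines of the form `command = value`. The Python sketch below writes one and builds an environment whose WGETRC variable points at it (Windows treats the name case-insensitively, so this is the same as `wgetrc`). `write_wget_ini` and `env_pointing_to` are hypothetical helpers. The three settings in the usage line (`tries`, `passive_ftp`, `timestamping`) are taken from the wgetrc command list in the official documentation; check them against the manual for your wget version.

```python
import os


def write_wget_ini(directory, settings):
    """Write settings as 'command = value' lines to directory/wget.ini; return its path."""
    path = os.path.join(directory, "wget.ini")
    with open(path, "w", encoding="ascii") as f:
        for command, value in settings.items():
            f.write(f"{command} = {value}\n")
    return path


def env_pointing_to(ini_path):
    """Copy of the current environment with WGETRC set to ini_path."""
    env = dict(os.environ)
    env["WGETRC"] = ini_path
    return env
```

Usage: `ini = write_wget_ini(".", {"tries": 3, "passive_ftp": "on", "timestamping": "on"})`, then pass `env=env_pointing_to(ini)` to `subprocess.run` when starting wget.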