[One Linux command per day] Detailed usage of the wget command

wget is a command-line tool for downloading files under Linux. It is essential for Linux users, especially network administrators, who often need to download software or move backups between a remote server and a local one. With an ordinary virtual host, the only option is usually to download the file from the remote server to a local disk and then upload it to the server with an FTP tool, which wastes time and effort. On a Linux VPS, however, the file can be downloaded directly to the server, with no upload step. wget is small but complete: it supports resuming interrupted downloads, both FTP and HTTP transfers, and proxy servers, and its settings are convenient and simple. The examples below illustrate how to use it.

Generally speaking, the wget command comes with the system. If the shell reports "command not found", we need to install it.

Ubuntu / Debian:

apt-get install wget

CentOS / Fedora:

yum install wget
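
On current Fedora and on CentOS/RHEL 8 and later, yum has been replaced by dnf, which accepts the same syntax:

dnf install wget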

wget command format:
wget [parameter list] [URL of the target file or page]    # usage: wget [options]... [URL]

Mandatory arguments to long options are mandatory for short options too.
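
For example, the following two commands are equivalent; both set the number of retries to 5:

wget -t 5 http://example.com/file.zip
wget --tries=5 http://example.com/file.zip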

Startup:

  -V,  --version                   display the version of Wget and exit
  -h,  --help                      print this help
  -b,  --background                go to background after startup
  -e,  --execute=COMMAND           execute a ".wgetrc"-style command

Logging and input file:
  -o,  --output-file=FILE          log messages to FILE
  -a,  --append-output=FILE        append messages to FILE
  -d,  --debug                     print lots of debugging information
  -q,  --quiet                     quiet (no output)
  -v,  --verbose                   be verbose (this is the default)
  -nv, --no-verbose                turn off verboseness, without being quiet
       --report-speed=TYPE         output bandwidth as TYPE (TYPE can be bits)
  -i,  --input-file=FILE           download URLs found in local or external FILE
  -F,  --force-html                treat input file as HTML
  -B,  --base=URL                  resolve HTML input-file links (-i -F) relative to URL
       --config=FILE               specify config file to use
       --no-config                 do not read any config file
       --rejected-log=FILE         log reasons for URL rejection to FILE

Download:

  -t,  --tries=NUMBER              set number of retries to NUMBER (0 means unlimited)
       --retry-connrefused         retry even if the connection is refused
  -O,  --output-document=FILE      write documents to FILE
  -nc, --no-clobber                skip downloads that would overwrite existing files
  -c,  --continue                  resume getting a partially-downloaded file
       --start-pos=OFFSET          start downloading from zero-based position OFFSET
       --progress=TYPE             select progress gauge type
       --show-progress             display the progress bar in any verbosity mode
  -N,  --timestamping              don't re-retrieve files unless newer than local
       --no-if-modified-since      don't use conditional if-modified-since GET
                                     requests in timestamping mode
       --no-use-server-timestamps  don't set the local file's timestamp from
                                     the one on the server
  -S,  --server-response           print server response
       --spider                    don't download anything
  -T,  --timeout=SECONDS           set all timeout values to SECONDS
       --dns-timeout=SECS          set the DNS lookup timeout to SECS
       --connect-timeout=SECS      set the connect timeout to SECS
       --read-timeout=SECS         set the read timeout to SECS
  -w,  --wait=SECONDS              wait SECONDS between retrievals
       --waitretry=SECONDS         wait 1..SECONDS between retries of a retrieval
       --random-wait               wait a random interval of (0.5~1.5)*WAIT seconds between retrievals
       --no-proxy                  explicitly turn off proxy use
  -Q,  --quota=NUMBER              set retrieval quota to NUMBER bytes
       --bind-address=ADDRESS      bind to ADDRESS (hostname or IP) on the local host
       --limit-rate=RATE           limit download rate to RATE
       --no-dns-cache              disable caching DNS lookups
       --restrict-file-names=OS    restrict characters in file names to ones OS allows
       --ignore-case               ignore case when matching files/directories
  -4,  --inet4-only                connect only to IPv4 addresses
  -6,  --inet6-only                connect only to IPv6 addresses
       --prefer-family=FAMILY      connect first to addresses of the specified family (IPv6, IPv4, or none)
       --user=USER                 set both ftp and http user to USER
       --password=PASSWORD         set both ftp and http password to PASSWORD
       --ask-password              prompt for passwords
       --no-iri                    turn off IRI support
       --local-encoding=ENC        use ENC as the local encoding for IRIs (Internationalized Resource Identifiers)
       --remote-encoding=ENC       use ENC as the default remote encoding
       --unlink                    remove files before overwriting

Directories:

  -nd, --no-directories            don't create directories
  -x,  --force-directories         force creation of directories
  -nH, --no-host-directories       don't create host directories
       --protocol-directories      use protocol name in directories
  -P,  --directory-prefix=PREFIX   save files to PREFIX/..
       --cut-dirs=NUMBER           ignore NUMBER remote directory components

HTTP options:

       --http-user=USER            set http user to USER
       --http-password=PASSWORD    set http password to PASSWORD
       --no-cache                  disallow server-cached data
       --default-page=NAME         change the default page name (normally "index.html")
  -E,  --adjust-extension          save HTML/CSS documents with the proper extension
       --ignore-length             ignore the 'Content-Length' header field
       --header=STRING             insert STRING among the headers
       --max-redirect              maximum redirections allowed per page
       --proxy-user=USER           use USER as proxy username
       --proxy-password=PASSWORD   use PASSWORD as proxy password
       --referer=URL               include 'Referer: URL' in the HTTP request
       --save-headers              save the HTTP headers to the file
  -U,  --user-agent=AGENT          identify as AGENT instead of Wget/VERSION
       --no-http-keep-alive        disable HTTP keep-alive (persistent connections)
       --no-cookies                don't use cookies
       --load-cookies=FILE         load cookies from FILE before the session starts
       --save-cookies=FILE         save cookies to FILE after the session
       --keep-session-cookies      load and save session (non-permanent) cookies
       --post-data=STRING          use the POST method; send STRING as the data
       --post-file=FILE            use the POST method; send the contents of FILE
       --method=HTTPMethod         use the given HTTPMethod in the request
       --body-data=STRING          send STRING as data; --method must be set
       --body-file=FILE            send the contents of FILE; --method must be set
       --content-disposition       honor the Content-Disposition header when
                                     choosing local file names (experimental)
       --content-on-error          output the received content on server errors
       --auth-no-challenge         send Basic HTTP authentication information without first waiting for the server's challenge

HTTPS (SSL/TLS) options:

       --secure-protocol=PR        choose secure protocol, one of auto, SSLv2,
                                     SSLv3, TLSv1, or PFS
       --https-only                only follow secure HTTPS links
       --no-check-certificate      don't validate the server's certificate
       --certificate=FILE          client certificate file
       --certificate-type=TYPE     client certificate type, PEM or DER
       --private-key=FILE          private key file
       --private-key-type=TYPE     private key type, PEM or DER
       --ca-certificate=FILE       file with the bundle of CA certificates
       --ca-directory=DIR          directory where the hash list of CA certificates is stored
       --pinnedpubkey=FILE/HASHES  public key (PEM/DER) file, or any number of
                                     base64-encoded sha256 hashes preceded by
                                     'sha256//' and separated by ';', to verify
                                     the peer against

HSTS options:

       --no-hsts                   disable HSTS
       --hsts-file                 path of the HSTS database (will override the default)

FTP options:

       --ftp-user=USER             set ftp user to USER
       --ftp-password=PASSWORD     set ftp password to PASSWORD
       --no-remove-listing         don't remove '.listing' files
       --no-glob                   turn off FTP file name globbing
       --no-passive-ftp            disable the "passive" transfer mode
       --preserve-permissions      preserve remote file permissions
       --retr-symlinks             when recursing, retrieve linked-to files (not directories)

FTPS options:

       --ftps-implicit                 use implicit FTPS (default port is 990)
       --ftps-resume-ssl               resume the SSL/TLS session started in the control connection when opening a data connection
       --ftps-clear-data-connection    cipher the control channel only; data will be in cleartext
       --ftps-fallback-to-ftp          fall back to FTP if FTPS is not supported by the target server

WARC options:

       --warc-file=FILENAME        save request/response data to a .warc.gz file
       --warc-header=STRING        insert STRING into the warcinfo record
       --warc-max-size=NUMBER      set the maximum size of WARC files to NUMBER
       --warc-cdx                  write CDX index files
       --warc-dedup=FILENAME       do not store records listed in this CDX file
       --no-warc-compression       do not compress WARC files with GZIP
       --no-warc-digests           do not calculate SHA1 digests
       --no-warc-keep-log          do not store the log file in a WARC record
       --warc-tempdir=DIRECTORY    location for temporary files created by the WARC writer
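
As a minimal illustration of these options (the file name and URL are placeholders), the following mirrors a site while recording all request/response data to example.warc.gz:

wget --warc-file=example -m http://example.com/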

Recursive Download:

  -r,  --recursive                 specify recursive download
  -l,  --level=NUMBER              maximum recursion depth (inf or 0 for infinite)
       --delete-after              delete files locally after downloading them
  -k,  --convert-links             make links in downloaded HTML or CSS point to local files
       --convert-file-only         convert only the file part of the URLs (usually known as the basename)
       --backups=N                 before writing file X, rotate up to N backup files
  -K,  --backup-converted          before converting file X, back it up as X.orig
  -m,  --mirror                    shortcut for -N -r -l inf --no-remove-listing
  -p,  --page-requisites           get all images, etc. needed to display the HTML page
       --strict-comments           turn on strict (SGML) handling of HTML comments

Recursive accept / reject:

  -A,  --accept=LIST               comma-separated list of accepted extensions
  -R,  --reject=LIST               comma-separated list of rejected extensions
       --accept-regex=REGEX        regex matching accepted URLs
       --reject-regex=REGEX        regex matching rejected URLs
       --regex-type=TYPE           regex type (posix|pcre)
  -D,  --domains=LIST              comma-separated list of accepted domains
       --exclude-domains=LIST      comma-separated list of rejected domains
       --follow-ftp                follow FTP links from HTML documents
       --follow-tags=LIST          comma-separated list of followed HTML tags
       --ignore-tags=LIST          comma-separated list of ignored HTML tags
  -H,  --span-hosts                go to foreign hosts when recursive
  -L,  --relative                  follow relative links only
  -I,  --include-directories=LIST  list of allowed directories
       --trust-server-names        use the last component of the redirection URL as the local file name
  -X,  --exclude-directories=LIST  list of excluded directories
  -np, --no-parent                 don't ascend to the parent directory

1. Download a single file using wget
The following example downloads a file from the network and saves it in the current directory:

wget http://cn.wordpress.org/wordpress-3.1-zh_CN.zip

During the download, a progress bar is displayed, showing the completion percentage, the bytes downloaded so far, the current download speed, and the estimated remaining time.

2. Use wget -O to download and save with a different file name
By default, wget names the saved file after the last component of the URL (everything after the final "/"). For dynamically generated links, that name is usually wrong.
Wrong: the following example saves the download under the name download.php?id=1080:

wget http://www.centos.bz/download.php?id=1080

Even if the downloaded file is a zip archive, it is still saved as download.php?id=1080.
Correct: to solve this problem, use the -O parameter to specify a file name:

wget -O wordpress.zip http://www.centos.bz/download.php?id=1080

3. Use wget --limit-rate to limit the download speed
By default, wget uses all available bandwidth. When you are downloading a very large file and still need bandwidth for other downloads, a speed limit is called for.

wget --limit-rate=300k http://cn.wordpress.org/wordpress-3.1-zh_CN.zip

4. Use wget -c to resume an interrupted download
Restart the download of an interrupted file with wget -c:

wget -c http://cn.wordpress.org/wordpress-3.1-zh_CN.zip

This is very helpful when a large download is cut off by network problems: we can resume it instead of fetching the whole file again. Use the -c parameter whenever you need to continue an interrupted download.

5. Use wget -b to download in the background
For very large files, we can use the -b parameter to download in the background.

wget -b http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
Continuing in background, pid 1840. 
Output will be written to `wget-log'. 

You can view the download progress with the following command:

tail -f wget-log

6. Disguise the user agent
Some websites reject download requests when the user agent does not look like a browser, but you can disguise it with the --user-agent parameter.

wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" URL

7. Use wget --spider to test a download link
If you plan to download on a schedule, you should first test whether the download link is valid. Add the --spider parameter to check.

wget --spider URL

If the download link is correct, the output will be:

wget --spider URL
Spider mode enabled. Check if remote file exists. 
HTTP request sent, awaiting response... 200 OK 
Length: unspecified [text/html] 
Remote file exists and could contain further links, 
but recursion is disabled -- not retrieving.

This confirms that the scheduled download will work. If you give a wrong link, the following error is displayed instead:

wget --spider url
Spider mode enabled. Check if remote file exists. 
HTTP request sent, awaiting response... 404 Not Found 
Remote file does not exist -- broken link!!!

You can use the --spider parameter in the following cases:
Checking before a scheduled download
Checking at intervals whether a website is available
Checking web pages for dead links (see the sketch below)
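
For the dead-link case, a minimal sketch (the URL and log file name are placeholders) is to combine --spider with recursive retrieval, then search the log for failures:

wget --spider -r -o spider.log http://example.com/
grep -B 2 '404' spider.log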

8. Use wget --tries to increase the number of retries
A download can fail when there are network problems or the file is very large. By default, wget retries 20 times. If necessary, increase the count with --tries.

wget --tries=40 URL

9. Use wget -i to download multiple files
First, save the download links in a file (finish the input with Ctrl-D):

cat > filelist.txt 
url1 
url2 
url3 
url4 

Then pass this file to wget with the -i parameter:

wget -i filelist.txt 
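
Alternatively, a here-document avoids having to terminate cat with Ctrl-D (the URLs below are placeholders):

cat > filelist.txt <<EOF
http://example.com/file1.zip
http://example.com/file2.zip
EOF
wget -i filelist.txt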

10. Use wget --mirror to mirror a website
The following example downloads an entire website to the local machine.

wget --mirror -p --convert-links -P ./LOCAL URL
--mirror: turn on mirroring
-p: download all files needed to display the HTML pages properly
--convert-links: after downloading, convert links to point to local files
-P ./LOCAL: save all files and directories to the specified local directory

11. Use wget --reject to filter out formats you don't want
Suppose you want to download a website but not its images; you can use the following command.

wget --reject=gif URL

12. Use wget -o to save download information to a log file
If you want the download information to go to a log file instead of the terminal, use the following command:

wget -o download.log URL 

13. Use wget -Q to limit the total download size
If you want wget to quit once the downloaded files exceed 5 MB, use the following command:

wget -Q5m -i filelist.txt 

Note: this parameter has no effect when downloading a single file. It only takes effect for recursive downloads or downloads from a URL list.

14. Use wget -r -A to download files of a specified format
This is useful in situations such as:

Download all the pictures of a website
Download all videos from a website
Download all PDF files of a website

wget -r -A.pdf URL

15. Use wget for FTP downloads
You can use wget to download from ftp links.
Anonymous ftp download with wget:

wget ftp-url 

ftp download with username and password authentication:

wget --ftp-user=USERNAME --ftp-password=PASSWORD URL
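
wget also accepts credentials embedded directly in the URL; note that with either form the password may be visible to other users via the process list:

wget ftp://USERNAME:PASSWORD@hostname/path/to/file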

wget is open source software developed by Hrvoje Niksic for Linux and later ported to many platforms, including Windows. It has the following features:
(1) Support for resuming interrupted downloads. This was the biggest selling point of NetAnts and FlashGet back in the day; now wget offers the same capability, so users with unreliable networks can rest easy.
(2) Support for both FTP and HTTP downloads. Although most software can be downloaded over HTTP, FTP downloads are still sometimes necessary.
(3) Proxy server support. High-security systems generally do not expose themselves directly to the Internet, so proxy support is a must-have feature for download software.
(4) Convenient, simple configuration. Users accustomed to graphical interfaces may not be used to the command line, but the command line actually has advantages for configuration: far fewer mouse clicks, and no worrying about clicking the wrong thing.
(5) Small and completely free. Small hardly matters now that hard disks are so large; completely free is worth considering, since even though the network is full of free software, the advertising it carries is not something we enjoy.
Although wget is powerful, it is comparatively simple to use. The basic syntax is: wget [parameter list] URL. The following concrete examples illustrate its usage.

1. Download an entire http or ftp site.

wget http://place.your.url/here 

This command downloads the home page of http://place.your.url/here. Using -x forces the creation of a local directory structure exactly identical to the one on the server; with the -nd parameter, everything downloaded from the server is placed directly in the local current directory.

wget -r http://place.your.url/here 

This command downloads all directories and files on the server recursively, essentially downloading the entire website. Use it with caution: every address the downloaded site points to is downloaded as well, so if the site references other websites, those are downloaded too! For this reason, this parameter alone is not commonly used. You can use the -l number parameter to limit the depth of the download; for example, to download only two levels, use -l 2, as in the example below.
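
For example, to recurse only two levels deep and never ascend to the parent directory:

wget -r -l 2 -np http://place.your.url/here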

If you want to create a mirror site, you can use the -m parameter, for example:

wget -m http://place.your.url/here 

At this point, wget automatically chooses parameters appropriate for creating a mirror site; it will also log in to the server, read robots.txt, and obey it.

2. Resume an interrupted download.
When a file is particularly large or the network particularly slow, the connection may be cut off before the download finishes. In that case you need to resume from the breakpoint. Resuming is automatic in wget; you only need to use the -c parameter, for example:

wget -c http://the.url.of/incomplete/file 

Resuming requires the server to support it. The -t parameter sets the number of retries: to retry 100 times, write -t 100; setting -t 0 means retrying indefinitely until the connection succeeds. The -T parameter sets the timeout: for example, -T 120 means that failing to connect within 120 seconds counts as a timeout. The example below combines these options.
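
Putting these together, the following command resumes the download, retries up to 100 times, and treats 120 seconds without a response as a timeout:

wget -c -t 100 -T 120 http://the.url.of/incomplete/file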
3. Batch download.
If there are multiple files to download, you can create a file with the URL of each file on its own line, for example download.txt, and then use the command:

wget -i download.txt 

This downloads each URL listed in download.txt in turn. (If a line points to a file, the file is downloaded; if it points to a website, the home page is downloaded.)
4. Selective download.
You can tell wget to download only one type of file, or to skip certain files. For example:

wget -m --reject=gif http://target.web.site/subdirectory

This downloads http://target.web.site/subdirectory but ignores gif files. --accept=LIST specifies the accepted file types, --reject=LIST the rejected ones.

5. Password and authentication.
wget can only handle websites restricted by username/password, via two parameters:

--http-user=USER        set the HTTP user
--http-password=PASS    set the HTTP password (spelled --http-passwd in older wget versions)
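
For example (the host and credentials are placeholders):

wget --http-user=USER --http-password=PASS http://protected.site/file.zip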

For websites that require certificate-based authentication, you can only use other download tools, such as curl.
6. Download using a proxy server.
If the user's network reaches the Internet through a proxy server, wget can download files through that proxy. Create a .wgetrc file in the current user's home directory and set the proxy server in it:

http-proxy = 111.111.111.111:8080 
ftp-proxy = 111.111.111.111:8080 

These set the http proxy server and the ftp proxy server respectively. If the proxy server requires a password, use these two parameters:

--proxy-user=USER       set the proxy user
--proxy-password=PASS   set the proxy password (spelled --proxy-passwd in older wget versions)

Use the --proxy=on/off parameter to turn proxy use on or off.
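
As an alternative to .wgetrc, wget also honors the standard proxy environment variables (the address below is the same placeholder):

export http_proxy=http://111.111.111.111:8080/
export ftp_proxy=http://111.111.111.111:8080/
wget http://place.your.url/here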
wget has many more useful features waiting to be discovered by its users.
Chinese file names normally come out percent-encoded, but they are saved correctly when --cut-dirs is used:

wget -r -np -nH --cut-dirs=3 ftp://host/test/
test.txt 
wget -r -np -nH -nd ftp://host/test/ 
%B4%FA%B8%D5.txt 
wget "ftp://host/test/*" 
%B4%FA%B8%D5.txt 

For unknown reasons, wget percent-escapes parts of the captured file name in order to avoid special characters, turning the escaped portions into strings like "%3A". A patch processes these with decode_string, restoring "%3A" to ":" and applying the result to the directory and file name parts; decode_string is a wget built-in function.

Finally, a comprehensive example that combines many of the options above (unlimited retries, resume, no host directory, forced directory creation, no parent, background, mirror, save under /home/sunny/NOD32view/, and log to wget.log):

wget -t0 -c -nH -x -np -b -m -P /home/sunny/NOD32view/ http://downloads1.kaspersky-labs.com/bases/ -o wget.log

Quoted from: author "catch the king before the thief"
