Linux wget is a command-line tool for downloading files. It is essential for Linux users, especially network administrators, who often need to download software or restore backups from a remote server to a local one. With an ordinary virtual host, your only option is to download the file from the remote server to your own disk and then upload it to the server with an FTP tool, which wastes time and effort. On a Linux VPS, by contrast, the file can be downloaded directly to the server with no intermediate upload. wget is small but full-featured: it supports resumable (breakpoint) downloads, both FTP and HTTP, and proxy servers, and its settings are simple and convenient. The examples below illustrate how to use it.
Generally speaking, the wget command ships with the system. If running it prints "command not found", you need to install it.
Ubuntu / Debian:
apt-get install wget
CentOS / Fedora:
yum install wget
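On newer Fedora and CentOS releases, yum has been superseded by dnf; if yum is unavailable, the equivalent command is:
dnf install wget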
wget command format:
wget [parameter list] [URL]    (usage: wget [options]... [URL]...)
Mandatory arguments to long options are mandatory for short options too.
Startup:
-V,  --version                   display the version of Wget and exit
-h,  --help                      print this help
-b,  --background                go to background after startup
-e,  --execute=COMMAND           execute a ".wgetrc"-style command

Logging and input file:
-o,  --output-file=FILE          write log messages to FILE
-a,  --append-output=FILE        append log messages to FILE
-d,  --debug                     print lots of debugging information
-q,  --quiet                     quiet mode (no output)
-v,  --verbose                   be verbose (this is the default)
-nv, --no-verbose                turn off verbose output, without entering quiet mode
     --report-speed=TYPE         report bandwidth as TYPE; TYPE can be bits
-i,  --input-file=FILE           download URLs found in a local or external FILE
-F,  --force-html                treat the input file as HTML
-B,  --base=URL                  resolve HTML input-file links (-i -F) relative to URL
     --config=FILE               specify the config file to use
     --no-config                 do not read any config file
     --rejected-log=FILE         log reasons for URL rejection to FILE
Download:
-t,  --tries=NUMBER              set number of retries to NUMBER (0 means unlimited)
     --retry-connrefused         retry even if the connection is refused
-O,  --output-document=FILE      write documents to FILE
-nc, --no-clobber                skip downloads that would overwrite existing files
-c,  --continue                  resume getting a partially-downloaded file
     --start-pos=OFFSET          start downloading from zero-based position OFFSET
     --progress=TYPE             select progress gauge type
     --show-progress             display the progress bar in any verbosity mode
-N,  --timestamping              don't re-retrieve files unless newer than local
     --no-if-modified-since      don't use conditional if-modified-since get requests in timestamping mode
     --no-use-server-timestamps  don't set the local file's timestamp by the one on the server
-S,  --server-response           print server response
     --spider                    don't download anything
-T,  --timeout=SECONDS           set all timeout values to SECONDS
     --dns-timeout=SECS          set the DNS lookup timeout to SECS
     --connect-timeout=SECS      set the connect timeout to SECS
     --read-timeout=SECS         set the read timeout to SECS
-w,  --wait=SECONDS              wait SECONDS between retrievals
     --waitretry=SECONDS         wait 1..SECONDS between retries of a retrieval
     --random-wait               wait from 0.5*WAIT to 1.5*WAIT secs between retrievals
     --no-proxy                  explicitly turn off proxy
-Q,  --quota=NUMBER              set retrieval quota to NUMBER bytes
     --bind-address=ADDRESS      bind to ADDRESS (hostname or IP) on the local host
     --limit-rate=RATE           limit download rate to RATE
     --no-dns-cache              disable caching DNS lookups
     --restrict-file-names=OS    restrict chars in file names to ones OS allows
     --ignore-case               ignore case when matching files/directories
-4,  --inet4-only                connect only to IPv4 addresses
-6,  --inet6-only                connect only to IPv6 addresses
     --prefer-family=FAMILY      connect first to addresses of the specified family, one of IPv6, IPv4, or none
     --user=USER                 set both ftp and http user to USER
     --password=PASS             set both ftp and http password to PASS
     --ask-password              prompt for passwords
     --no-iri                    turn off IRI support
     --local-encoding=ENC        use ENC as the local encoding for IRIs (international resource identifiers)
     --remote-encoding=ENC       use ENC as the default remote encoding
     --unlink                    remove file before overwriting
Directories:
-nd, --no-directories            don't create directories
-x,  --force-directories         force creation of directories
-nH, --no-host-directories       don't create host directories
     --protocol-directories      use the protocol name in directories
-P,  --directory-prefix=PREFIX   save files to PREFIX/..
     --cut-dirs=NUMBER           ignore NUMBER remote directory components
HTTP options:
     --http-user=USER            set http user to USER
     --http-password=PASS        set http password to PASS
     --no-cache                  disallow server-cached data
     --default-page=NAME         change the default page name (normally "index.html")
-E,  --adjust-extension          save HTML/CSS documents with the proper extensions
     --ignore-length             ignore the 'Content-Length' header field
     --header=STRING             insert STRING among the headers
     --max-redirect              maximum redirections allowed per page
     --proxy-user=USER           use USER as the proxy username
     --proxy-password=PASS       use PASS as the proxy password
     --referer=URL               include a 'Referer: URL' header in the HTTP request
     --save-headers              save the HTTP headers to file
-U,  --user-agent=AGENT          identify as AGENT instead of Wget/VERSION
     --no-http-keep-alive        disable HTTP keep-alive (persistent connections)
     --no-cookies                don't use cookies
     --load-cookies=FILE         load cookies from FILE before the session starts
     --save-cookies=FILE         save cookies to FILE after the session
     --keep-session-cookies      load and save session (non-permanent) cookies
     --post-data=STRING          use the POST method; send STRING as the data
     --post-file=FILE            use the POST method; send the contents of FILE
     --method=HTTPMethod         use the given HTTPMethod in the request
     --body-data=STRING          send STRING as data; --method must be set
     --body-file=FILE            send the contents of FILE; --method must be set
     --content-disposition       honor the Content-Disposition header when choosing local file names (experimental)
     --content-on-error          output the received content on server errors
     --auth-no-challenge         send Basic HTTP authentication information without first waiting for the server's challenge
HTTPS (SSL/TLS) options:
     --secure-protocol=PR        choose secure protocol, one of auto, SSLv2, SSLv3, TLSv1 and PFS
     --https-only                only follow secure HTTPS links
     --no-check-certificate      don't validate the server's certificate
     --certificate=FILE          client certificate file
     --certificate-type=TYPE     client certificate type, PEM or DER
     --private-key=FILE          private key file
     --private-key-type=TYPE     private key type, PEM or DER
     --ca-certificate=FILE       file with the bundle of CA certificates
     --ca-directory=DIR          directory where the hash list of CA certificates is stored
     --pinnedpubkey=FILE/HASHES  public key (PEM/DER) file, or any number of base64-encoded sha256 hashes preceded by 'sha256//' and separated by ';', to verify the peer against
HSTS options:
     --no-hsts                   disable HSTS
     --hsts-file                 path of the HSTS database (will override the default)
FTP options:
     --ftp-user=USER             set ftp user to USER
     --ftp-password=PASS         set ftp password to PASS
     --no-remove-listing         don't remove '.listing' files
     --no-glob                   turn off FTP file name globbing
     --no-passive-ftp            disable the "passive" transfer mode
     --preserve-permissions      preserve remote file permissions
     --retr-symlinks             when recursing, get linked-to files (not directories)
FTPS options:
     --ftps-implicit             use implicit FTPS (default port is 990)
     --ftps-resume-ssl           resume the SSL/TLS session started in the control connection when opening a data connection
     --ftps-clear-data-connection  cipher the control channel only; all the data will be in cleartext
     --ftps-fallback-to-ftp      fall back to FTP if FTPS is not supported by the target server
WARC options:
     --warc-file=FILENAME        save request/response data to a .warc.gz file
     --warc-header=STRING        insert STRING into the warcinfo record
     --warc-max-size=NUMBER      set the maximum size of WARC files to NUMBER
     --warc-cdx                  write CDX index files
     --warc-dedup=FILENAME       do not store records listed in this CDX file
     --no-warc-compression       do not compress WARC files with GZIP
     --no-warc-digests           do not calculate SHA1 digests
     --no-warc-keep-log          do not store the log file in a WARC record
     --warc-tempdir=DIRECTORY    location for temporary files created by the WARC writer
Recursive Download:
-r,  --recursive                 specify recursive download
-l,  --level=NUMBER              maximum recursion depth (inf or 0 for infinite, i.e. download everything)
     --delete-after              delete files locally after downloading them
-k,  --convert-links             make links in downloaded HTML or CSS point to local files
     --convert-file-only         convert the file part of the URLs only (usually known as the basename)
     --backups=N                 before writing file X, rotate up to N backup files
-K,  --backup-converted          before converting file X, back it up as X.orig
-m,  --mirror                    shortcut for -N -r -l inf --no-remove-listing
-p,  --page-requisites           get all images, etc. needed to display the HTML page
     --strict-comments           turn on strict (SGML) handling of HTML comments
Recursive accept / reject:
-A,  --accept=LIST               comma-separated list of accepted extensions
-R,  --reject=LIST               comma-separated list of rejected extensions
     --accept-regex=REGEX        regex matching accepted URLs
     --reject-regex=REGEX        regex matching rejected URLs
     --regex-type=TYPE           regex type (posix|pcre)
-D,  --domains=LIST              comma-separated list of accepted domains
     --exclude-domains=LIST      comma-separated list of rejected domains
     --follow-ftp                follow FTP links from HTML documents
     --follow-tags=LIST          comma-separated list of followed HTML tags
     --ignore-tags=LIST          comma-separated list of ignored HTML tags
-H,  --span-hosts                go to foreign hosts when recursive
-L,  --relative                  follow relative links only
-I,  --include-directories=LIST  list of allowed directories
     --trust-server-names        use the last component of the redirection URL as the local file name
-X,  --exclude-directories=LIST  list of excluded directories
-np, --no-parent                 don't ascend to the parent directory
1. Download a single file using wget
The following example downloads a file from the network and saves it in the current directory:
wget http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
While downloading, a progress bar is displayed showing the completion percentage, the bytes downloaded so far, the current download speed, and the estimated remaining time.
2. Use wget -O to download and save with a different file name
By default, wget names the saved file after the last component of the URL (the text following the final "/"). For downloads from dynamic links, this file name is usually wrong.
Wrong: the following example saves the file under the name download.php?id=1080:
wget http://www.centos.bz/download.php?id=1080
Even though the downloaded file is a zip archive, it is still saved as download.php?id=1080.
Correct: to solve this problem, use the -O parameter to specify a file name:
wget -O wordpress.zip http://www.centos.bz/download.php?id=1080
3. Use wget --limit-rate to limit the download speed
When you run wget, it uses all available bandwidth by default. When you are downloading a large file and still need bandwidth for other tasks, limiting the speed becomes necessary.
wget --limit-rate=300k http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
4. Resume an interrupted download using wget -c
Use wget -c to resume downloading an interrupted file:
wget -c http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
This is very helpful when a large download is suddenly interrupted by network problems or the like: you can continue the download instead of fetching the whole file again. Use the -c parameter whenever you need to resume an interrupted download.
5. Background download using wget -b
For very large files, you can use the -b parameter to download in the background.
wget -b http://cn.wordpress.org/wordpress-3.1-zh_CN.zip
Continuing in background, pid 1840.
Output will be written to `wget-log'.
You can check the download progress with the following command:
tail -f wget-log
6. Disguise the user agent for the download
Some websites reject download requests when the user agent does not look like a browser. You can disguise it with the --user-agent parameter.
wget --user-agent="Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16" URL
7. Use wget --spider to test the download link
If you plan to download on a schedule, you should test in advance whether the download link is valid. Add the --spider parameter to check.
wget --spider URL
If the download link is valid, the following will be displayed:
wget --spider URL
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
This ensures the download can take place at the scheduled time. If you give a wrong link, the following error is displayed instead:
wget --spider url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response... 404 Not Found
Remote file does not exist -- broken link!!!
You can use the --spider parameter in the following cases:
Check before scheduled Download
Check whether the website is available at intervals
Check for dead links on Web pages
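For the scheduled-download case, here is a minimal sketch of a crontab entry (the URL and log path are placeholders): it checks the link every day at 02:00 and writes the result to a log file.
0 2 * * * wget --spider -o /tmp/linkcheck.log http://example.com/file.zip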
8. Use wget --tries to increase the number of retries
A download can still fail because of network problems or the size of the file. wget retries 20 times by default; if necessary, use --tries to increase the number of retries.
wget --tries=40 URL
9. Use wget -i to download multiple files
First, save a file containing the download links:
cat > filelist.txt
url1
url2
url3
url4
Then download using this file with the -i parameter:
wget -i filelist.txt
10. Use wget --mirror to mirror websites
The following example downloads an entire website to the local machine.
wget --mirror -p --convert-links -P ./LOCAL URL

--mirror: enable mirror download
-p: download all files needed to display the HTML pages properly
--convert-links: after downloading, convert the links to point to local files
-P ./LOCAL: save all files and directories under the specified local directory
11. Use wget --reject to filter out files of a specified format
If you want to download a website but skip the images, you can use the following command.
wget --reject=gif url
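--reject also accepts a comma-separated list of extensions (see the options list above), so several image formats can be skipped at once:
wget --reject=gif,jpg,png url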
12. Use wget -o to save download information to a log file
If you want the download information written to a log file instead of shown on the terminal, use the following command:
wget -o download.log URL
13. Use wget -Q to limit the total download file size
To quit the download once the total size exceeds 5 MB, use the following command:
wget -Q5m -i filelist.txt
Note: this parameter has no effect when downloading a single file; it only applies to recursive downloads.
14. Use wget -r -A to download files of a specified format
This function can be used in the following situations:
Download all the pictures of a website
Download all videos from a website
Download all PDF files of a website
wget -r -A.pdf url
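The other two cases follow the same pattern, since -A accepts a comma-separated list of extensions. The lists below are illustrative; adjust them to the formats the site actually uses:
wget -r -A jpg,jpeg,png,gif url
wget -r -A mp4,avi,mkv url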
15. Download FTP files using wget
You can use wget to download FTP links.
Anonymous FTP download using wget:
wget ftp-url
FTP download using wget with username and password authentication:
wget --ftp-user=USERNAME --ftp-password=PASSWORD url
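The credentials can also be embedded directly in the URL (shown here with placeholder values), though note that the password then appears in your shell history:
wget ftp://USERNAME:PASSWORD@hostname/path/to/file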
wget is open-source software originally developed for Linux by Hrvoje Niksic and later ported to many platforms, including Windows. It has the following functions and features:
(1) Resumable (breakpoint) downloads. This was the biggest selling point of NetAnts and FlashGet back in the day; wget offers the same capability, so users with unreliable networks can rest easy.
(2) Support for both FTP and HTTP downloads. Although most software can be downloaded over HTTP, FTP downloads are still sometimes necessary.
(3) Proxy server support. High-security environments usually do not expose their systems to the Internet directly, so proxy support is a necessary feature of a download tool.
(4) Convenient, simple configuration. Users accustomed to graphical interfaces may not be used to the command line, but the command line actually has advantages for configuration: at the very least, you click the mouse far less, and you never have to worry about misclicking.
(5) Small and completely free. Its small size hardly matters now that hard disks are so large, but being completely free does: even though there is plenty of free software on the net, the advertising bundled with much of it is not something anyone enjoys.
Although wget is powerful, it is relatively simple to use. The basic syntax is: wget [parameter list] URL. The following concrete examples illustrate its usage.
1. Download an entire HTTP or FTP site.
wget http://place.your.url/here
This command downloads the home page of http://place.your.url/here. Adding -x forces wget to recreate a directory structure identical to the one on the server; with the -nd parameter instead, everything downloaded from the server is placed directly in the local current directory.
wget -r http://place.your.url/here
This command downloads all directories and files on the server recursively, which in essence downloads the whole website. It must be used with caution: during the download, every address the site points to is downloaded as well, so if the site references other websites, those are downloaded too! For this reason the parameter is rarely used on its own. You can limit the depth with the -l number parameter; for example, to download only two levels, use -l 2, as shown below.
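For example, to download only two levels deep:
wget -r -l 2 http://place.your.url/here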
If you want to create a mirror site, you can use the -m parameter, for example:
wget -m http://place.your.url/here
wget then automatically determines the appropriate parameters for creating a mirror site. It will also contact the server, read robots.txt, and follow the rules in robots.txt.
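If you need wget to ignore robots.txt (use this responsibly), the -e switch from the startup options above can override that behavior:
wget -m -e robots=off http://place.your.url/here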
2. Breakpoint continuation.
When a file is particularly large or the network is particularly slow, the connection may be cut off before the download finishes; this is when resuming at a breakpoint is needed. wget resumes automatically: just use the -c parameter, for example:
wget -c http://the.url.of/incomplete/file
Resuming requires the server to support it. The -t parameter sets the number of retries: to retry 100 times, write -t 100; setting -t 0 means retrying indefinitely until the connection succeeds. The -T parameter sets the timeout: -T 120 means give up if no connection is made within 120 seconds.
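Combining these, a resilient resumable download might look like this (reusing the placeholder URL from the example above):
wget -c -t 100 -T 120 http://the.url.of/incomplete/file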
3. Batch download.
If there are several files to download, you can create a file with the URL of each file on its own line, for example a file named download.txt, and then use the command:
wget -i download.txt
This downloads every URL listed in download.txt. (If a line is a file, the file is downloaded; if it is a website, the home page is downloaded.)
4. Selective download.
You can tell wget to download only certain types of files, or to skip certain types. For example:
wget -m --reject=gif http://target.web.site/subdirectory
This downloads http://target.web.site/subdirectory but ignores gif files. --accept=LIST specifies the accepted file types, --reject=LIST the rejected ones.
5. Password and authentication.
wget can only handle sites restricted by username/password, using two parameters:
--http-user=USER      set the HTTP user
--http-passwd=PASS    set the HTTP password
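For example (placeholder credentials and URL; newer wget versions spell the second option --http-password, as shown in the options list earlier):
wget --http-user=USER --http-passwd=PASS http://protected.site/file.zip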
For websites that require certificates for authentication, you may need another download tool, such as curl (although newer wget versions offer the --certificate options listed above).
6. Download using a proxy server.
If your network requires a proxy server, wget can download files through it. To do so, create a .wgetrc file in the current user's home directory and set the proxy server in that file:
http-proxy = 111.111.111.111:8080
ftp-proxy = 111.111.111.111:8080
These set the HTTP proxy server and the FTP proxy server respectively. If the proxy server requires a password, use these two parameters:
--proxy-user=USER     set the proxy user
--proxy-passwd=PASS   set the proxy password
Use the --proxy=on/off parameter to turn proxy use on or off.
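Putting the proxy settings together, here is a minimal sketch of a ~/.wgetrc (the address and credentials are placeholders; wgetrc key names are case- and dash/underscore-insensitive):
# ~/.wgetrc -- proxy configuration (placeholder values)
http-proxy = 111.111.111.111:8080
ftp-proxy = 111.111.111.111:8080
proxy-user = USER
proxy-passwd = PASS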
wget has many more useful features waiting for users to explore.
Chinese file names are normally saved percent-encoded, but they come out correctly when --cut-dirs is used:
wget -r -np -nH --cut-dirs=3 ftp://host/test/   (saved as: test.txt)
wget -r -np -nH -nd ftp://host/test/            (saved as: %B4%FA%B8%D5.txt)
wget "ftp://host/test/*"                        (saved as: %B4%FA%B8%D5.txt)
For unknown reasons, wget automatically escapes the file-name part of what it fetches in order to avoid problematic special characters, turning them into sequences like "%3A". A patch therefore uses decode_string to restore such sequences back to characters like ":", applying it to both the directory and the file-name parts; decode_string is a wget built-in function.
wget -t0 -c -nH -x -np -b -m -P /home/sunny/NOD32view/ http://downloads1.kaspersky-labs.com/bases/ -o wget.log
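For reference, the flags in the command above: -t0 retries indefinitely, -c resumes partially downloaded files, -nH omits the host directory, -x forces creation of the remote directory structure, -np does not ascend to the parent directory, -b downloads in the background, -m mirrors the site, -P /home/sunny/NOD32view/ sets the local destination directory, and -o wget.log writes the log to wget.log.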