1. Introduction to the requests library
2. Usage
2.1 Sending requests
The requests library provides a separate method for each kind of HTTP request, such as get, post, and so on.
2.1.1 Sending a GET request
Simple GET request
A GET request is the simplest request in the requests library, and it is also very simple to use.
import requests
response = requests.get('url')
That is all it takes to send a simple GET request.
GET request with parameters
When sending a request, we often need to pass parameters to the server. Usually the parameters are appended to the URL as key/value pairs after a question mark.
import requests
response = requests.get('http://www.xxx.com/get?id=1')
By appending parameters to the URL, we can send a request that carries them. If building the query string by hand every time is too much trouble, the requests library also accepts a dictionary of parameters through the params argument.
import requests
param = {'id': '1', 'page': '20'}
response = requests.get('http://www.xxx.com/get', params=param)
print(response.url)
Execution result:
http://www.xxx.com/get?id=1&page=20
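Incidentally (not shown above, but standard requests behavior), params also accepts lists as values: requests expands a list into repeated query keys and drops keys whose value is None. A minimal sketch against the same placeholder URL:

import requests

# A list value becomes repeated query keys; a None value is dropped entirely.
param = {'id': ['1', '2'], 'page': '20', 'debug': None}
response = requests.get('http://www.xxx.com/get', params=param)
print(response.url)
# http://www.xxx.com/get?id=1&id=2&page=20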
2.1.2 Sending other requests
The requests library can also send the following kinds of requests:
import requests
resp_1 = requests.post('url')
resp_2 = requests.put('url')
resp_3 = requests.delete('url')
resp_4 = requests.head('url')
resp_5 = requests.options('url')
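These methods accept the same keyword arguments as get. For example, a minimal sketch of a POST carrying form data, tested here against the public httpbin.org echo service rather than a real site:

import requests

# Form-encoded data goes in the request body via the data argument.
data = {'hero': 'leesin'}
response = requests.post('https://httpbin.org/post', data=data)
print(response.text)  # httpbin echoes the submitted form back in its 'form' field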
2.1.3 Setting request headers
When we run a crawler, the pages we crawl often apply some anti-scraping measures. The most common is identification and checking of the request headers. For this kind of anti-scraping, we can modify the request headers so that the crawler keeps running normally.
import requests
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
}
response = requests.get('url', headers=header)
In this way, the request is sent with the header values we want.
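To confirm that the header was actually sent, one option is to request httpbin.org/headers, which echoes back the headers it received; a quick sketch:

import requests

# Shortened User-Agent for illustration; in practice use the full string from above.
header = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ..."}
response = requests.get('https://httpbin.org/headers', headers=header)
print(response.text)  # the echoed 'User-Agent' field should match what we set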
2.2 Response content
2.2.1 Common response attributes
The crawled content is our ultimate goal, and we can get the pieces we want through the following response attributes.
import requests
response = requests.get('url')
# Get the response status code
print(response.status_code)
# Get the response headers
print(response.headers)
# Get the response body as text
print(response.text)
# Get the response cookies
print(response.cookies)
# Get the response URL
print(response.url)
About the response.text content
Execution result:
<!DOCTYPE html>
<html lang="zh-cn">
<head>
<meta charset="utf-8" />
Through text, we can read the page source directly. Pay special attention to the encoding of the source here: if we don't configure anything, requests cannot always detect the encoding of the page, and the text will often come out garbled.
Solution: for the common UTF-8 encoding, we can set the encoding attribute so that the body is decoded correctly.
import requests
response = requests.get('url')
response.encoding = 'utf-8'
print(response.text)
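If you don't know the page's encoding in advance, requests can also guess it from the response body through the apparent_encoding property; a minimal sketch:

import requests

response = requests.get('url')
# Guess the encoding from the body itself instead of hard-coding utf-8.
response.encoding = response.apparent_encoding
print(response.text)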
About obtaining the status code and response headers
import requests
response = requests.post('https://httpbin.org/post', data={'hero': 'leesin'})
# Get the response status code
print(response.status_code)
# Get the response headers
print(response.headers)
# Get the response content
print(response.text)
Execution result:
200
{'Access-Control-Allow-Credentials': 'true', 'Access-Control-Allow-Origin': '*', 'Content-Encoding': 'gzip', 'Content-Type': 'application/json', 'Date': 'Fri, 28 Jun 2019 14:38:09 GMT', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Server': 'nginx', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'DENY', 'X-XSS-Protection': '1; mode=block', 'Content-Length': '258', 'Connection': 'keep-alive'}
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "hero": "leesin"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "11",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "Mozilla/5.0"
  },
  "json": null,
  "origin": "61.144.173.21, 61.144.173.21",
  "url": "https://httpbin.org/post"
}
2.2.2 Binary response content
For non-text responses (such as images), you can also access the response body as bytes. Requests automatically decodes gzip and deflate transfer encodings for you.
import requests
response = requests.get('http://xxx.com/3.jpg')
with open('1.jpg', 'wb') as f:
    f.write(response.content)
Here we save the image from the web page to a local file.
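For large files, reading the whole body at once through response.content holds everything in memory. Requests also supports streamed downloads with stream=True and iter_content; a sketch using the same placeholder image URL:

import requests

# stream=True defers downloading the body until we read it in chunks.
response = requests.get('http://xxx.com/3.jpg', stream=True)
with open('1.jpg', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)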
2.2.3 JSON response content
Requests has a built-in JSON decoder, which can help you process JSON data:
import requests
r = requests.get('https://api.github.com/events')
print(r.json())
# Output:
# [{u'repository': {u'open_issues': 0, u'url': 'https://github.com/...
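Note that r.json() raises an exception when the body is not valid JSON, so it is worth checking the status code first. A minimal sketch; the 'repository' and 'url' field names are taken from the sample output above:

import requests

r = requests.get('https://api.github.com/events')
if r.status_code == requests.codes.ok:
    events = r.json()
    # The decoded JSON is ordinary Python data: a list of dicts here.
    print(events[0]['repository']['url'])
else:
    print('Request failed with status', r.status_code)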
2.3 Passing parameters
2.3.1 File upload
import requests
file = {'file': open('File name', 'rb')}
response = requests.post('url', files=file)
print(response.text)
Note the path of the file to be uploaded.
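You can also set the uploaded file's name and content type explicitly by passing a tuple instead of a bare file object (standard requests behavior); a sketch with a hypothetical report.csv:

import requests

# (filename, file object, content type) controls the multipart form fields.
file = {'file': ('report.csv', open('report.csv', 'rb'), 'text/csv')}
response = requests.post('url', files=file)
print(response.text)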
2.4 Cookies
Getting cookies
import requests
response = requests.get('https://www.baidu.com')
print(response.cookies)
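response.cookies is a cookie jar that can be iterated like a dictionary, which is handy for inspecting individual cookies:

import requests

response = requests.get('https://www.baidu.com')
# The cookie jar yields name/value pairs like a dict.
for name, value in response.cookies.items():
    print(name, '=', value)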
Session maintenance
After obtaining cookies, you can simulate a login.
import requests
session = requests.Session()
session.get('http://httpbin.org/cookies/set/number/123456789')
response = session.get('http://httpbin.org/cookies')
print(response.text)
Here we use a Session to keep the cookie across requests, so the server treats the second request as coming from the same client and the cookie is printed successfully.
If you want to simulate a login, you can make your requests through requests.Session(). It requests the server much like a browser session would and maintains the login state across requests.
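A minimal sketch of such a simulated login; the login URL and form field names below are hypothetical and depend entirely on the target site:

import requests

session = requests.Session()
# Hypothetical login endpoint and credentials; adapt to the real site.
login_data = {'username': 'user', 'password': 'pass'}
session.post('http://www.xxx.com/login', data=login_data)
# The session now carries the login cookie, so later requests stay logged in.
response = session.get('http://www.xxx.com/profile')
print(response.text)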