Reference blog: Python crawler learning notes (fdk's CSDN blog)
'requests' Library
Installation and documentation address:
Install using pip: pip install requests
Chinese documentation: Requests: HTTP for Humans (Requests 2.18.1 documentation)
Send GET request:
1. The simplest way to send a GET request is to call requests.get:
response = requests.get('http://www.baidu.com')
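In practice you will usually also want a timeout so a hung server does not block the crawler forever, and a status check so errors do not pass silently. A minimal sketch (the 5-second timeout is just an illustrative value, not from the original notes):

import requests

# timeout is in seconds; without it a dead server can hang the request forever
response = requests.get('http://www.baidu.com', timeout=5)
# raise an HTTPError for 4xx/5xx responses instead of failing silently
response.raise_for_status()
print(response.status_code)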
2. Add headers and query parameters:
To add request headers, pass them in through the headers parameter; to append query parameters to the URL, use the params parameter. The example code is as follows:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
data = {'wd': 'China'}
url = 'https://www.baidu.com/s'
response = requests.get(url, params=data, headers=headers)

# View the response body as decoded text (Unicode)
print(response.text)
# View the response body as raw bytes; use decode to turn it into a string
print(response.content)
# View the full url address
print(response.url)
# View the character encoding requests guessed from the response headers
print(response.encoding)
# View the status code of the response
print(response.status_code)
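Note that response.text decodes the bytes using the encoding requests guessed from the response headers, which can be wrong for Chinese pages. A common workaround (a sketch, not part of the original notes) is to set the encoding from the body before reading text:

import requests

response = requests.get('http://www.baidu.com')
# apparent_encoding is detected from the body itself and is often more
# reliable than the header-based guess for Chinese pages
response.encoding = response.apparent_encoding
print(response.text)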
Send POST request:
1. The most basic POST request uses the post method:
response = requests.post('https://www.baidu.com/s', data=data)
2. Incoming data:
There is no need to urlencode the data yourself; just pass in a dictionary. If the returned data is JSON, response.json() parses it into a dictionary, so you can extract values with ordinary dictionary operations. The example code is as follows:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Referer': 'https://www.lagou.com/jobs/list_python%E7%88%AC%E8%99%AB?city=%E5%85%A8%E5%9B%BD&cl=false&fromSearch=true&labelWords=&suginput='
}
data = {'first': 'true', 'pn': '1', 'kd': 'python爬虫'}  # 'python爬虫' means 'python crawler'
url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
response = requests.post(url, data=data, headers=headers)
json_str = response.json()  # parses the JSON body into a Python dict
result = json_str['content']['positionResult']['result']
for i in result:
    # Output company name
    print(i['companyShortName'])
    # Output city name
    print(i['city'])
    print('*' * 20)
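The example above sends the data as a form-encoded body via data=. If an API expects a JSON request body instead, requests can serialize the dictionary for you through the json parameter. A sketch using the httpbin.org echo service (the payload is illustrative):

import requests

# json= serializes the dict to a JSON body and sets the
# Content-Type: application/json header automatically
resp = requests.post('http://httpbin.org/post', json={'first': 'true', 'pn': '1'})
print(resp.json())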
Use a proxy:
Adding a proxy with requests is simple: pass the proxies parameter to the request method (such as get or post). The example code is as follows:
import requests

url = 'http://httpbin.org/ip'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
proxy = {
    'http': '118.190.95.35:9001'
}
resp = requests.get(url, headers=headers, proxies=proxy)
print(resp.text)
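The keys of the proxies dictionary are URL schemes, so a request to an https:// URL will not go through a proxy registered only under 'http'. A sketch covering both schemes (the addresses are placeholders; free proxy IPs like the one above expire quickly):

import requests

proxies = {
    # placeholder addresses; substitute a live proxy of your own
    'http': 'http://118.190.95.35:9001',
    'https': 'http://118.190.95.35:9001',
    # a proxy requiring authentication would look like:
    # 'http': 'http://user:password@host:port',
}
resp = requests.get('http://httpbin.org/ip', proxies=proxies)
print(resp.text)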
cookie:
If a response contains cookies, you can use the cookies attribute to get the returned cookie values:
import requests

resp = requests.get('http://www.baidu.com')
print(resp.cookies)
# Get the cookies as a plain dictionary
print(resp.cookies.get_dict())
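resp.cookies is a RequestsCookieJar, so besides get_dict() you can also iterate over it; each entry is a Cookie object carrying the name, value, and domain. A short sketch:

import requests

resp = requests.get('http://www.baidu.com')
for cookie in resp.cookies:
    # each item exposes name/value/domain attributes
    print(cookie.name, cookie.value, cookie.domain)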
session:
Previously, with the urllib library, you could use an opener to send multiple requests that share cookies. To share cookies across requests with the requests library, use the Session object it provides. Note that this session is not the session from web development; it is just an object that keeps cookies (and other settings) across requests. Take logging in to Renren as an example. The example code is as follows:
import requests

url = 'http://renren.com/PLogin.do'
data = {'email': 'Renren email account', 'password': 'password'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

# Sign in; the session stores the login cookies
session = requests.Session()
session.post(url, headers=headers, data=data)

# Visit Dapeng's personal center with the same session
resp = session.get('http://www.renren.com/880151247/profile')
print(resp.text)
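A Session can also hold settings that apply to every request it sends, such as default headers, and it works as a context manager that closes its connections when done. A sketch reusing the Renren URLs above (the shortened User-Agent is illustrative):

import requests

with requests.Session() as session:
    # headers set here are sent with every request made through the session
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    session.post('http://renren.com/PLogin.do',
                 data={'email': 'Renren email account', 'password': 'password'})
    resp = session.get('http://www.renren.com/880151247/profile')
    print(resp.status_code)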
Handling untrusted SSL certificates:
For websites with a trusted SSL certificate, such as http://www.baidu.com/, requests returns a normal response directly. If the SSL certificate is not trusted, add the parameter verify=False when requesting the website. The example code is as follows:
resp = requests.get('http://www.12306.cn/mormhweb/', verify=False)
print(resp.content.decode('utf-8'))
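With verify=False, requests (through urllib3) emits an InsecureRequestWarning on every request. If skipping verification is a deliberate choice, the warning can be silenced; a sketch:

import urllib3
import requests

# silence the InsecureRequestWarning triggered by verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

resp = requests.get('http://www.12306.cn/mormhweb/', verify=False)
print(resp.status_code)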