The quote method for GET requests
First, a little background: the evolution of character encodings.
Because the computer was invented by Americans, only 128 characters were encoded at first: uppercase and lowercase English letters, digits, and some symbols and control codes.
This encoding table is called ASCII. For example, uppercase A is 65 and lowercase z is 122.
One byte is clearly not enough to handle Chinese, though. At least two bytes are required, and the encoding must not conflict with ASCII,
so China defined the GB2312 encoding for Chinese characters.
As you can imagine, there are hundreds of languages in the world. Japan encodes Japanese in Shift_JIS, South Korea encodes Korean in EUC-KR,
and when every country has its own standard, conflicts are inevitable. As a result, text that mixes languages shows up garbled.
Unicode was created to solve this. It unifies all languages in a single character set, so the garbled-text problem goes away.
The Unicode standard is still evolving. A common representation (UTF-16) uses two bytes per character, with four bytes for very rare characters. On the web, the usual encoding is UTF-8, which uses one to four bytes per character.
Modern operating systems and most programming languages support Unicode directly.
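The background above can be checked directly in Python. This is a small sketch, not part of the crawler: the sample character 中 and the variable names are chosen here purely for illustration.

```python
# ASCII assigns small integers to English letters.
code_A = ord('A')
code_z = ord('z')
print(code_A, code_z)  # 65 122

# The same Chinese character takes a different number of bytes
# depending on the encoding.
ch = '中'
gb2312_bytes = ch.encode('gb2312')
utf8_bytes = ch.encode('utf-8')
utf16_bytes = ch.encode('utf-16-be')
print(len(gb2312_bytes), len(utf8_bytes), len(utf16_bytes))  # 2 3 2

# A very rare character (U+20000) needs four bytes in UTF-16.
rare = '\U00020000'
rare_utf16_len = len(rare.encode('utf-16-be'))
print(rare_utf16_len)  # 4

# Decoding bytes with the wrong codec is what produces garbled text.
garbled = utf8_bytes.decode('latin-1')
print(garbled == ch)  # False
```

Encoding and decoding with the same codec always round-trips. Garbled text appears only when the two sides disagree.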
Let's start with a basic crawler:

    import urllib.request

    url = 'https://www.baidu.com/s?wd='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
    }

    # Build a customized request object
    request = urllib.request.Request(url=url, headers=headers)
    # Impersonate a browser and send the request to the server
    response = urllib.request.urlopen(request)
    # Read the response body and decode it
    content = response.read().decode('utf-8')
    print(content)
Notice that wd= is followed by the search term. A URL may only contain ASCII characters, so Chinese text cannot be placed in it directly. It must first be percent-encoded: each byte of its UTF-8 encoding is written as %XX. For a GET request, the quote function in urllib.parse does this conversion.
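Before the full crawler, here is a minimal offline sketch of what quote does to the search term on its own. The keyword 周杰伦 (Jay Chou's name in Chinese) and the variable names are assumptions made for illustration. No request is sent.

```python
import urllib.parse

keyword = '周杰伦'  # Jay Chou, used here as a sample search term
raw_url = 'https://www.baidu.com/s?wd=' + keyword
print(raw_url.isascii())  # False: not a valid URL as written

# quote() writes each UTF-8 byte of the keyword as %XX.
encoded = urllib.parse.quote(keyword)
print(encoded)  # %E5%91%A8%E6%9D%B0%E4%BC%A6

safe_url = 'https://www.baidu.com/s?wd=' + encoded
print(safe_url.isascii())  # True

# unquote() reverses the conversion.
decoded = urllib.parse.unquote(encoded)
print(decoded == keyword)  # True
```

Each Chinese character here becomes three %XX groups because it takes three bytes in UTF-8.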
Full code:

    import urllib.request
    import urllib.parse

    url = 'https://www.baidu.com/s?wd='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
    }

    # Percent-encode the search term 周杰伦 (Jay Chou)
    name = urllib.parse.quote('周杰伦')
    url = url + name

    # Build a customized request object
    request = urllib.request.Request(url=url, headers=headers)
    # Impersonate a browser and send the request to the server
    response = urllib.request.urlopen(request)
    # Read the response body and decode it
    content = response.read().decode('utf-8')
    print(content)
The urlencode method for GET requests
The quote method works well for one or two values, but it becomes tedious when a request has many parameters, because each value has to be encoded and joined by hand. The urlencode method solves this: it takes a dictionary, percent-encodes every key and value, and joins the pairs with the & symbol, which makes it the better choice for requests with many parameters.
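As a rough illustration of the difference, the sketch below builds the same query string twice: once with urlencode and once by hand. The parameter names and values are made up for the example. One detail worth knowing is that urlencode encodes with quote_plus by default, so a space becomes + rather than %20.

```python
import urllib.parse

params = {'wd': '周杰伦', 'sex': '男'}  # sample parameters, for illustration only

# urlencode() encodes every key and value and joins the pairs with '&'.
query = urllib.parse.urlencode(params)
print(query)  # wd=%E5%91%A8%E6%9D%B0%E4%BC%A6&sex=%E7%94%B7

# The same result built by hand, one pair at a time.
manual = '&'.join(
    urllib.parse.quote_plus(key) + '=' + urllib.parse.quote_plus(value)
    for key, value in params.items()
)
print(manual == query)  # True

# Spaces: quote() gives %20, urlencode() gives '+'.
space_quote = urllib.parse.quote('Jay Chou')
space_urlencode = urllib.parse.urlencode({'wd': 'Jay Chou'})
print(space_quote)      # Jay%20Chou
print(space_urlencode)  # wd=Jay+Chou
```

Both forms of an encoded space are understood by servers in a query string, so either function is fine for search parameters.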
Here is the code:

    import urllib.request
    import urllib.parse

    base_url = 'https://www.baidu.com/s?'
    data = {
        'wd': '周杰伦',          # Jay Chou
        'sex': '男',             # male
        'location': '中国台湾省'  # Taiwan Province, China
    }

    # Percent-encode every key and value and join the pairs with &
    new_data = urllib.parse.urlencode(data)
    url = base_url + new_data

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
    }

    # Build a customized request object
    request = urllib.request.Request(url=url, headers=headers)
    # Impersonate a browser and send the request to the server
    response = urllib.request.urlopen(request)
    # Read the response body and decode it
    content = response.read().decode('utf-8')
    print(content)
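To confirm that a URL built this way carries the intended parameters, it can be parsed back without sending anything. This is only a sanity-check sketch using urlsplit and parse_qs from the standard library. The Chinese parameter values are assumed for illustration.

```python
import urllib.parse

base_url = 'https://www.baidu.com/s?'
data = {'wd': '周杰伦', 'sex': '男', 'location': '中国台湾省'}  # assumed sample values

url = base_url + urllib.parse.urlencode(data)
print(url.isascii())  # True: safe to send as a URL

# Split off the query string and decode it back into a dictionary.
parts = urllib.parse.urlsplit(url)
recovered = urllib.parse.parse_qs(parts.query)
print(recovered['wd'])  # ['周杰伦']

# parse_qs() returns a list per key because a key may repeat in a query.
flattened = {key: values[0] for key, values in recovered.items()}
print(flattened == data)  # True
```

If the URL had the quoted keyword appended a second time after the urlencode output, the last parameter recovered here would no longer match, which makes this a convenient way to catch that kind of mistake.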