Project: Blog Comments
I registered an account in advance (account: spiderman, password: crawler334566). Copy the following blog login address and open it in your browser:
https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php
Let's analyze, the way an engineer would, how the browser sends the login request. First, do what a normal user does: fill in the account and password, but don't click login yet. Then switch to the engineer's method: right-click and choose "Inspect" to open the developer tools, click the "Network" tab, and tick "Preserve log" (this keeps the request records on screen so they are not cleared when the page reloads).
Are the developer tools open, and is [Preserve log] ticked? OK, now click login.
Expand the first request (request 0), [wp-login.php], and browse its [Headers]. In the [General] section we only need the first two fields: [Request URL] (the request address) and [Request Method] (the request method).
post request
The request method here is post, not get.
Both post and get can carry parameters, but the parameters of a get request are shown in the URL.
For example, in Level 5 the URLs we ended up requesting became very long, and most of that length was parameters.
The parameters of a post request are not shown in the URL; they are sent in the request body. Private information such as accounts and passwords should therefore be sent with post. If you sent it with get, the password would appear in the address bar, which is obviously unsafe. You can think of it this way: get parameters are visible in the URL, and post parameters are hidden from it. (Hidden does not mean encrypted; protecting the data in transit is the job of HTTPS.)
Typically, get requests are used to retrieve web data, as with the requests.get() we learned earlier, while post requests are used to submit data to a web page, such as form data (the account and password we type into the login page are exactly this kind of data).
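To make the difference concrete, here is a minimal sketch that builds one get request and one post request with the same parameters but never sends them. The address https://example.com/wp-login.php is a placeholder rather than the blog's real address, and requests.Request(...).prepare() only constructs the request locally.

```python
import requests

# Placeholder address, used only for illustration.
url = 'https://example.com/wp-login.php'
params = {'log': 'spiderman', 'pwd': 'crawler334566'}

# Build (but do not send) a get and a post request with the same parameters.
get_request = requests.Request('GET', url, params=params).prepare()
post_request = requests.Request('POST', url, data=params).prepare()

print(get_request.url)    # the parameters are appended to the URL
print(post_request.url)   # the URL carries no parameters
print(post_request.body)  # the parameters travel in the request body
```

Notice that the post body is still readable text. It is simply kept out of the URL.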
Just as [Request Headers] store the browser's request information, [Response Headers] store the server's response information. The cookies we are looking for are here.
You will see a set-cookie parameter in the response headers. What does set-cookie mean? It is the server writing cookies to the browser.
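As a rough illustration of what a set-cookie value contains, the sketch below parses one with Python's standard http.cookies module. The header text is made up for this example; the names and values the blog really sends will differ.

```python
from http.cookies import SimpleCookie

# A made-up set-cookie value; real names and values will differ.
header_value = 'session_token=abc123; Path=/; Max-Age=3600; HttpOnly'

cookie = SimpleCookie()
cookie.load(header_value)

for name, morsel in cookie.items():
    print('name:', name)
    print('value:', morsel.value)
    print('path:', morsel['path'])        # which paths the cookie applies to
    print('max-age:', morsel['max-age'])  # lifetime in seconds
```

The name and value identify you to the server; the remaining attributes tell the browser where and for how long to send the cookie back.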
cookies and their usage
Actually, you're no stranger to cookies. When you log in to a website, you often see a "Remember Me" checkbox on the login page. If you tick it, you are logged in automatically the next time you open the website. That is cookies in action.
When you log in to the blog with the spiderman account and tick "Remember Me", the server generates cookies bound to the spiderman account. It then sends the cookies to your browser, which stores them on your computer. The next time the browser visits the blog, it brings the cookies along, and the server knows that you are spiderman. You don't need to enter your account and password again; you get in directly.
Of course, cookies are time-limited and stop working once they expire. You have probably ticked "Remember Me" on a site and still been asked to log in again some time later; that means the earlier cookies had expired.
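Expiry can be checked in code as well. The sketch below builds two illustrative cookies with create_cookie from requests.cookies (the name remember_me and both values are invented) and asks each one whether it has expired.

```python
import time
from requests.cookies import create_cookie

now = time.time()

# Two illustrative cookies: one expired an hour ago, one expires in an hour.
old_cookie = create_cookie('remember_me', 'old-value', expires=int(now - 3600))
fresh_cookie = create_cookie('remember_me', 'new-value', expires=int(now + 3600))

print(old_cookie.is_expired(now))    # True: the expiry time is in the past
print(fresh_cookie.is_expired(now))  # False: the cookie is still valid
```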
Keep looking through Headers for other login parameters. Scroll down to [Form Data] and you will see five parameters: log, pwd, wp-submit, redirect_to and testcookie.
log and pwd are obviously our account and password, and wp-submit is presumably the login button. The link in redirect_to is the page we jump to after login. We don't know what testcookie is for.
We have found the login parameters. Now you can start writing code and send a login request to the server.
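If you switch [Form Data] to its raw, URL-encoded view, the parameters appear as one long string. Here is a small sketch of decoding such a string with the standard library; the string is assembled by hand from the values used in this lesson, not copied from the browser.

```python
from urllib.parse import parse_qsl

# Hand-assembled example of URL-encoded form data.
raw_form_data = (
    'log=spiderman&pwd=crawler334566&wp-submit=Sign+in'
    '&redirect_to=https%3A%2F%2Fwordpress-edu-3autumn.localprod.oc.forchange.cn%2Fwp-admin%2F'
    '&testcookie=1'
)

# Decode the string into a dictionary of parameter names and values.
data = dict(parse_qsl(raw_form_data))

for name, value in data.items():
    print(name, '=', value)
```

The resulting dictionary has the same shape as the data dictionary we are about to pass to requests.post.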
import requests
# Import requests.

# Assign the login address to url.
url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'

# Add request headers. As mentioned earlier, this imitates a normal browser visit so the request is less likely to be blocked by anti-crawler measures.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}

# Put the login parameters into a dictionary and assign it to data.
data = {
    'log': 'spiderman',  # Account
    'pwd': 'crawler334566',  # Password
    'wp-submit': 'Sign in',
    'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
    'testcookie': '1'
}

# Send the login request with requests.post, passing the login address, the request headers and the login parameters, and assign the response to login_in.
login_in = requests.post(url, headers=headers, data=data)

# Print login_in.
print(login_in)
The program prints <Response [200]>. The status code 200 means the server received the login request and responded to it. We have logged in successfully.
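A 200 status code shows that the server answered, which is not quite the same as proof that the account and password were accepted. Below is a minimal sketch of a stricter check. It assumes a standard WordPress site, where a successful login sets a cookie whose name begins with wordpress_logged_in; that is an assumption about the blog, not something shown in this lesson. The function name looks_logged_in and the cookie values are made up.

```python
from requests.cookies import RequestsCookieJar

def looks_logged_in(cookies):
    """Return True if any cookie name starts with the assumed WordPress login prefix."""
    return any(name.startswith('wordpress_logged_in') for name in cookies.keys())

# Cookies as they might look before login (values are placeholders).
before_login = RequestsCookieJar()
before_login.set('wordpress_test_cookie', 'placeholder')

# Cookies as they might look after login (the name suffix is invented).
after_login = RequestsCookieJar()
after_login.set('wordpress_test_cookie', 'placeholder')
after_login.set('wordpress_logged_in_0123abcd', 'placeholder')

print(looks_logged_in(before_login))
print(looks_logged_in(after_login))
```

The function accepts any cookie jar, so it could be given whatever cookies were collected after the login request.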
Now write a comment, for example "pure test", the way a normal user would, and click publish.
The Network panel quickly loads many requests. Click [wp-comments-post.php] and look at its Headers: the comment I just published is in there.
comment is the comment content and submit is the comment button. We don't understand the other two parameters, but that doesn't matter; we know they are all related to the comment.
To post a blog comment, we have to log in first, then extract the login cookies and pass them along, and finally send the comment request with the comment parameters.
The login code is already written above and we have just found the comment parameters, so all that remains is to extract the login cookies and use them.
Let's write the code for posting a comment.
import requests
# Import requests.

# Assign the login address to url.
url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'

# Add request headers. As mentioned earlier, this imitates a normal browser visit so the request is less likely to be blocked by anti-crawler measures.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}

# Put the login parameters into a dictionary and assign it to data.
data = {
    'log': 'spiderman',  # Account
    'pwd': 'crawler334566',  # Password
    'wp-submit': 'Sign in',
    'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
    'testcookie': '1'
}

# Send the login request with requests.post and assign the response to login_in.
login_in = requests.post(url, headers=headers, data=data)

# Extract the cookies: read the cookies attribute of the response object (login_in) and assign it to the variable cookies.
cookies = login_in.cookies

# The address that receives the comment request (wp-comments-post.php, as seen in the Network panel).
url_1 = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-comments-post.php'

# Put the comment parameters into a dictionary.
data_1 = {
    'comment': input('Please enter the comment you want to post:'),
    'submit': 'Comments',
    'comment_post_ID': '13',
    'comment_parent': '0'
}

# Send the comment request with requests.post, passing the address, the headers, the comment parameters and the cookies, and assign the response to comment.
# The way to use the cookies is to pass cookies=cookies in the post request.
comment = requests.post(url_1, headers=headers, data=data_1, cookies=cookies)

# Print the status code of the comment request. A status code of 200 indicates that the comment went through.
print(comment.status_code)
For the way to extract cookies, see the line cookies = login_in.cookies: it reads the cookies attribute of the response object to get the login cookies.
For the way to use the cookies, see the line comment = requests.post(...): it passes cookies=cookies as a parameter in the post request.
One more note: the login cookies contain many names and values, but the comment request only needs a small part of them. That is why the cookies you see in the Headers panel of wp-comments-post.php do not match the ones set at login.
If we want to continue optimizing this code, we need to understand a new concept, session.
session and its usage
You can think of a session as the whole process from opening the browser and going online to closing the browser. More precisely, a session is the information the server uses to record a specific user's conversation during that process.
For example, from the moment you open the browser and start browsing a shopping site, the server saves in the session which items you have viewed and how many items you have put in the shopping cart.
Without sessions, something funny could happen: you add a lot of goods to the shopping cart, and when you go to check out, you find the cart is empty, because the server never recorded what you wanted to buy.
Sessions and cookies are closely related: cookies store the session's identifying code, and the session in turn stores information about the cookies.
When the browser visits the shopping page for the first time, the server returns a set-cookie field to the browser, and the browser saves the cookies locally.
When the browser visits the shopping page a second time, it sends the cookies with the request. Because the cookies contain the session's identifying code, the server can immediately identify the user and return the session associated with that code.
That's why the items in your shopping cart do not disappear each time you log back in to the shopping website: when you log in, the server uses the cookies carried by the browser to find the session that holds your shopping cart information.
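The relationship can be pictured with a toy model. This sketch is not how any real shopping site is implemented; the names sessions, first_visit and add_to_cart are invented, and a plain dictionary stands in for the server's session storage.

```python
import uuid

# Server-side storage: session code -> data remembered for that visitor.
sessions = {}

def first_visit():
    """Create a session and return the code the server would send via set-cookie."""
    session_id = uuid.uuid4().hex
    sessions[session_id] = {'cart': []}
    return session_id

def add_to_cart(cookie_session_id, item):
    """Find the session named by the browser's cookie and update its cart."""
    sessions[cookie_session_id]['cart'].append(item)
    return sessions[cookie_session_id]['cart']

browser_cookie = first_visit()               # the browser stores the code locally
add_to_cart(browser_cookie, 'keyboard')      # later requests send the code back
cart = add_to_cart(browser_cookie, 'mouse')
print(cart)
```

The browser only ever holds the short code; the cart itself lives on the server, keyed by that code.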
Now that we understand what a session is and how it relates to cookies, we can finally start optimizing the code for posting blog comments.
Since cookies and sessions are so closely related, can we handle cookies by creating a session?
Let's find out. Look through the official requests documentation to see whether there is a way to create a session that handles cookies for us.
The optimized code for commenting is as follows (focusing on annotated code):
import requests
# Import requests.

# Create a session object with requests.session(). This is like opening one specific session that keeps the cookies for us automatically.
session = requests.session()

url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}

# Ask for the account and password with the input function instead of writing them directly into the code.
data = {
    'log': input('Please enter your account:'),
    'pwd': input('Please enter your password:'),
    'wp-submit': 'Sign in',
    'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
    'testcookie': '1'
}

# Within the session, send the login request with post, passing the login address, the request headers and the login parameters.
session.post(url, headers=headers, data=data)

# The address that receives the comment request.
url_1 = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-comments-post.php'

# Put the comment parameters into a dictionary.
data_1 = {
    'comment': input('Please enter the comment you want to post:'),
    'submit': 'Comments',
    'comment_post_ID': '13',
    'comment_parent': '0'
}

# Within the same session, send the comment request with post, passing the address, the request headers and the comment parameters, and assign the response to comment.
comment = session.post(url_1, headers=headers, data=data_1)

# Print comment.
print(comment)
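To see that a session really does attach its cookies to later requests, here is a small offline sketch. Nothing is sent over the network: the cookie login_token and the example.com address are invented, and session.prepare_request() only builds the request so we can inspect its headers.

```python
import requests

session = requests.Session()

# Pretend the server set this cookie during login (name and value are made up).
session.cookies.set('login_token', 'abc123')

# Prepare a comment request inside the session without sending it.
request = requests.Request(
    'POST',
    'https://example.com/wp-comments-post.php',
    data={'comment': 'pure test'},
)
prepared = session.prepare_request(request)

# The session has added a Cookie header on its own.
print(prepared.headers.get('Cookie'))
```

This is why the session version of the code never has to pass cookies=cookies by hand.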
In fact, this code is not much of an optimization: we still have to enter the account and password and log in every time we want to comment.
Is there a better solution?
Answer: yes! Cookies can save the login state for us. We store the cookies at the first login and read the stored cookies the next time, so we don't need to enter the account and password again.
Storing cookies
Let's print out the login cookies first.
import requests

session = requests.session()
url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
data = {
    'log': input('Please enter your account number:'),
    'pwd': input('Please input a password:'),
    'wp-submit': 'Sign in',
    'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
    'testcookie': '1'
}
session.post(url, headers=headers, data=data)
print(type(session.cookies))
print(session.cookies)
RequestsCookieJar is the class of the cookies object. The contents of the cookies look a bit like a list and a bit like the keys and values of a dictionary. We can't make sense of the specific values, and we don't need to.
How do we store cookies? Could we save them in a txt file using file reading and writing?
But a txt file stores strings, and the cookies we just printed are not a string. Is there a way to convert cookies into a string?
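The "a bit like a list, a bit like a dictionary" impression can be checked on a jar we fill ourselves. In this sketch the cookie names, values and the example.com domain are all invented.

```python
from requests.cookies import RequestsCookieJar

jar = RequestsCookieJar()
jar.set('theme', 'dark', domain='example.com', path='/')
jar.set('lang', 'en', domain='example.com', path='/')

print(jar['theme'])        # dictionary-style lookup by name
print(sorted(jar.keys()))  # the names, like dictionary keys

for cookie in jar:         # iterating yields full cookie objects, like a list
    print(cookie.name, cookie.value, cookie.domain, cookie.path)
```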
Remember that in Level 4 we learned that the json module can convert a dictionary into a string. Perhaps we can first turn the cookies into a dictionary and then turn the dictionary into a string with the json module. Then the cookies can be stored in a txt file using the open function.
Looking through the official documentation, you can find both the way to convert cookies into a dictionary and the way to use the json module.
The code for storing cookies in a txt file is as follows:
import requests, json
# Import the requests and json modules.

session = requests.session()
url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
}
data = {
    'log': input('Please enter your account:'),
    'pwd': input('Please enter your password:'),
    'wp-submit': 'Sign in',
    'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
    'testcookie': '1'
}
session.post(url, headers=headers, data=data)

# Convert the cookies into a dictionary.
cookies_dict = requests.utils.dict_from_cookiejar(session.cookies)
# Print cookies_dict.
print(cookies_dict)

# Call the dumps function of the json module to convert the dictionary into a string.
cookies_str = json.dumps(cookies_dict)
# Print cookies_str.
print(cookies_str)

# Create a file named cookies.txt in write mode, write the string into it, and close it when the block ends.
with open('cookies.txt', 'w') as f:
    f.write(cookies_str)
Tip: The above method of storing cookies is not the easiest one. It was chosen because it is easy to understand.
Running the code confirms that cookies can be converted into a dictionary, and that the dictionary can then be converted into a string with the json module.
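It is worth knowing what this conversion keeps. The sketch below runs the same two steps on a hand-made jar (the cookie login_token, its value and its expiry timestamp are invented) and then converts the string back again. Only names and values survive the round trip; the domain and expiry time do not.

```python
import json
import requests
from requests.cookies import RequestsCookieJar

# A hand-made cookie with a domain and an expiry time (all values invented).
jar = RequestsCookieJar()
jar.set('login_token', 'abc123', domain='example.com', path='/', expires=1893456000)

cookies_dict = requests.utils.dict_from_cookiejar(jar)   # cookies -> dictionary
cookies_str = json.dumps(cookies_dict)                    # dictionary -> string
print(type(cookies_dict), cookies_dict)
print(type(cookies_str), cookies_str)

# Convert back: string -> dictionary -> cookies.
restored = requests.utils.cookiejar_from_dict(json.loads(cookies_str))
for cookie in restored:
    print(cookie.name, cookie.value, repr(cookie.domain), cookie.expires)
```

That is fine for this lesson, but it also means the stored file cannot tell us when the cookies will expire.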
We have now managed to store the cookies, but we still have to read them back to solve the problem of entering the account and password before every comment.
Read cookies
When storing cookies, we first converted them into a dictionary and then into a string. Reading them is the reverse: convert the string into a dictionary, and then the dictionary back into the cookies' original format.
The code for reading cookies is as follows:
import requests, json

# Create a session to receive the stored cookies.
session = requests.session()

# Open the file named cookies.txt in read mode.
with open('cookies.txt', 'r') as cookies_txt:
    # Call the loads function of the json module to convert the string into a dictionary.
    cookies_dict = json.loads(cookies_txt.read())

# Convert the dictionary back into the cookies' original format.
cookies = requests.utils.cookiejar_from_dict(cookies_dict)

# Hand the cookies to the session by assigning them to its cookies attribute.
session.cookies = cookies
Finally, we've got the cookies stored and read.
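Reading can fail in two ways: the file may not exist yet, or it may not contain valid JSON. The sketch below wraps the read in a helper that returns None in either case. The helper name load_cookie_dict is made up, and the demonstration runs in a temporary folder so no real cookies.txt is touched.

```python
import json
import os
import tempfile

def load_cookie_dict(path):
    """Return the saved cookies as a dictionary, or None if the file is missing or unreadable."""
    try:
        with open(path, 'r') as f:
            return json.loads(f.read())
    except (FileNotFoundError, json.JSONDecodeError):
        return None

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'cookies.txt')

    missing = load_cookie_dict(path)    # the file does not exist yet

    with open(path, 'w') as f:
        f.write(json.dumps({'login_token': 'abc123'}))
    loaded = load_cookie_dict(path)     # the file holds valid JSON

    with open(path, 'w') as f:
        f.write('not json')
    broken = load_cookie_dict(path)     # the file holds something else

print(missing, loaded, broken)
```

A caller can then treat None as "log in again" without caring which of the two problems occurred.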
Finally, we can optimize the code as follows: if the program can read the cookies, it comments directly without logging in; if it cannot, it asks for the account and password, logs in, and then comments.
The code optimized again is as follows:
import requests, json

# Create a session.
session = requests.session()

# Add request headers to avoid being blocked by anti-crawler measures.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}

try:
    # If the cookies file can be read, run this block, skip the except block, and comment without logging in.
    # Open the file named cookies.txt in read mode.
    with open('cookies.txt', 'r') as cookies_txt:
        # Call the loads function of the json module to convert the string into a dictionary.
        cookies_dict = json.loads(cookies_txt.read())
    # Convert the dictionary back into the cookies' original format.
    cookies = requests.utils.cookiejar_from_dict(cookies_dict)
    # Hand the cookies to the session by assigning them to its cookies attribute.
    session.cookies = cookies

except FileNotFoundError:
    # If the cookies file cannot be found and the program raises FileNotFoundError, run this block: log in again to get the cookies, then comment.
    # Login address.
    url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'
    # Login parameters.
    data = {
        'log': input('Please enter your account:'),
        'pwd': input('Please enter your password:'),
        'wp-submit': 'Sign in',
        'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
        'testcookie': '1'
    }
    # Within the session, send the login request with post.
    session.post(url, headers=headers, data=data)

    # Convert the cookies into a dictionary.
    cookies_dict = requests.utils.dict_from_cookiejar(session.cookies)
    # Call the dumps function of the json module to convert the dictionary into a string.
    cookies_str = json.dumps(cookies_dict)
    # Create a file named cookies.txt in write mode and write the string into it.
    with open('cookies.txt', 'w') as f:
        f.write(cookies_str)

# The address that receives the comment request.
url_1 = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-comments-post.php'
# The comment parameters.
data_1 = {
    'comment': input('Please enter the comment you want to post:'),
    'submit': 'Comments',
    'comment_post_ID': '13',
    'comment_parent': '0'
}
# Within the session, send the comment request with post, passing the address, the request headers and the comment parameters, and assign the response to comment.
comment = session.post(url_1, headers=headers, data=data_1)
# Print the status code of the comment request.
print(comment.status_code)
This solves the problem of typing the account and password every time, but the code still has a flaw: it does not handle cookies that have expired.
Whether the cookies have expired can be judged by whether the final status code equals 200. A better solution is to add a conditional check to the code and fetch new cookies if the old ones have expired.
Therefore, more complete code, organized into functions, looks like this:
import requests, json

session = requests.session()
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'
}


def cookies_read():
    # Read the stored cookies.
    with open('cookies.txt', 'r') as cookies_txt:
        cookies_dict = json.loads(cookies_txt.read())
    cookies = requests.utils.cookiejar_from_dict(cookies_dict)
    return cookies


def sign_in():
    # Log in.
    url = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-login.php'
    data = {
        'log': input('Please enter your account:'),
        'pwd': input('Please enter your password:'),
        'wp-submit': 'Sign in',
        'redirect_to': 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-admin/',
        'testcookie': '1'
    }
    session.post(url, headers=headers, data=data)
    # Store the cookies.
    cookies_dict = requests.utils.dict_from_cookiejar(session.cookies)
    cookies_str = json.dumps(cookies_dict)
    with open('cookies.txt', 'w') as f:
        f.write(cookies_str)


def write_message():
    # Post a comment.
    url_2 = 'https://wordpress-edu-3autumn.localprod.oc.forchange.cn/wp-comments-post.php'
    data_2 = {
        'comment': input('Please enter your comment:'),
        'submit': 'Comments',
        'comment_post_ID': '13',
        'comment_parent': '0'
    }
    return session.post(url_2, headers=headers, data=data_2)


try:
    session.cookies = cookies_read()
except FileNotFoundError:
    sign_in()
    session.cookies = cookies_read()

num = write_message()
if num.status_code == 200:
    print('Success!')
else:
    sign_in()
    session.cookies = cookies_read()
    num = write_message()
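The retry logic at the end can be tried out without touching the blog. In this sketch the real login and comment functions are replaced by stand-ins: comment_with_relogin, fake_sign_in and fake_post_comment are invented names, and the status code 403 is just a made-up failure value to simulate expired cookies.

```python
def comment_with_relogin(post_comment, sign_in, max_attempts=2):
    """Post a comment; if the status code is not 200, log in again and retry."""
    status = None
    for attempt in range(max_attempts):
        status = post_comment()
        if status == 200:
            return status
        if attempt < max_attempts - 1:
            sign_in()
    return status

# Stand-ins that simulate expired cookies: posting fails until we log in again.
state = {'logged_in': False, 'logins': 0}

def fake_sign_in():
    state['logged_in'] = True
    state['logins'] += 1

def fake_post_comment():
    return 200 if state['logged_in'] else 403

result = comment_with_relogin(fake_post_comment, fake_sign_in)
print(result, state['logins'])
```

Passing the two functions in as arguments keeps the retry rule separate from the network code, which makes it easy to test with stand-ins like these.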