After working on this all night, I have to say it was really unfriendly to a novice like me. I almost cried, but I finally succeeded. Come on, rush! I'm publishing this as a souvenir for myself.
Let's talk about the idea first. When Selenium drives a browser, it is not very different from you opening the website yourself. So why are you already logged in when you open the website? Because your browser carries cookies that store your login state, so you are still logged in when you come back after leaving the site. The most important point is that these cookies usually stay the same for a long time. If login with them fails, it is mostly because the cookies were obtained incorrectly. So as long as we get the correct cookies and carry them when we request the page, we can log in easily.
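To make "carrying a cookie" concrete, here is a minimal sketch of how name/value pairs end up in the `Cookie` header of a request. The helper name `build_cookie_header` and the example cookies are made up for illustration; the browser normally does this for you.

```python
def build_cookie_header(cookies):
    """Join name/value pairs into the value of an HTTP Cookie header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Hypothetical cookies, for illustration only (not real login data)
example_cookies = [
    {"name": "session_id", "value": "abc123"},
    {"name": "theme", "value": "dark"},
]

# This is roughly what the browser attaches to every request to the site
headers = {"Cookie": build_cookie_header(example_cookies)}
print(headers)
```

The server reads this header and recognizes the session, which is why no password is needed again.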
Then we prepare the code. First, we need to get our cookies. Here we install a browser extension, Cookie Editor, which is very easy to use and exports cookies nicely. However, it is a third-party extension, so we can't be sure whether it is safe. If you don't trust it, you can go to the browser settings, find the cookie management page, and look up the cookies saved for each site by hand. That is a lot of trouble, so I won't cover it here and will mainly talk about the third-party extension.
First, go to the extensions page of the browser. For Edge, the address is: edge://extensions/
Once there, find the link for getting extensions, search for Cookie Editor, and install it. After it installs successfully, a prompt appears; have a quick look at it.
The shortcut key I set is Ctrl+Y, because I rarely use that combination for anything else. You can set whatever shortcut is comfortable for you, or open the extension from the upper right corner of the browser, which also works well.
After logging in to a web page, we open the extension and copy our cookies.
After getting them, I began to think about how to write the code. Here is what I wrote first:
# A helper module for simulated login. It worked perfectly for me.
# Selenium can handle basically all of my login problems.
from pymongo import MongoClient
import time
import requests
import sys

'''
Simulated login needs two inputs: url and cookies_str. The url can simply be
copied, and cookies_str comes from the browser extension (Ctrl+Y in my setup);
copy it and pass it in as a string. Then login is easy.
Finally, to make the class more portable, a driver parameter was added.
'''


# Technical point 1: pass parameters to a class
class Mndl(object):
    # The class takes three positional arguments, remember: url, cookies_str
    # and driver. The driver has to be passed in, otherwise portability is
    # poor and the login is not much use.
    def __init__(self, *var):
        try:
            self.url = var[0]
        except IndexError:
            print("Error!!! You didn't enter url, cookies_str and driver. Please enter them!!!")
            # Exit the script, which also ends our Selenium automation
            sys.exit(0)
        try:
            self.cookies_str = var[1]
        except IndexError:
            print("Error!!! You didn't enter cookies_str and driver. Please enter them!!!")
            sys.exit(0)
        try:
            self.driver = var[2]
        except IndexError:
            print("Error!!! You didn't enter driver. Please enter it!!!")
            sys.exit(0)

    # The main method. It logs in and then leaves the page open so that you
    # can carry on with other operations; it only takes care of the login.
    def run(self):
        # Our own error reporting: say what went wrong, otherwise it is hard to find
        try:
            cookies_list = self.help_cookies()
        except Exception as e:
            print("Please enter a correct cookies_str!!! It could not be converted. Is there a mistake in it?")
            print("System error message:", e)
            sys.exit(0)
        try:
            requests.get(self.url)
        except Exception as e:
            print("Error!!! Make sure the url you entered can be accessed!!!")
            print("System error message:", e)
            sys.exit(0)
        try:
            self.driver.get(self.url)
        except Exception as e:
            print("Error!!! Your parameter is not a browser driver instance!!! Please enter a correct driver!!!")
            print("System error message:", e)
            sys.exit(0)
        # Delete the cookies the fresh session already has. Otherwise the
        # extra values may give us away as a crawler and the login fails.
        self.driver.delete_all_cookies()
        '''
        Some big sites try to stop you from submitting all the cookies in one
        go: add_cookie raises an error partway through and the rest are never
        submitted. Wrapping each add_cookie in try and refreshing on failure
        solves this in almost all cases.
        '''
        # Technical point 5: big sites raise errors so that cookies cannot all be submitted at once
        for i in cookies_list:
            # On some sites the url keeps changing, so the page has to be
            # refreshed after a submission; otherwise Selenium complains that
            # the domain does not match. The try makes sure every cookie is submitted.
            try:
                self.driver.add_cookie(i)
            except Exception:
                # A refresh is not the same as get(url): it reloads whatever
                # url the page is on now, so the domain matches again.
                self.driver.refresh()
                time.sleep(0.5)
                self.driver.add_cookie(i)
        time.sleep(3)
        # Refresh the page so the cookies take effect
        self.driver.refresh()

    # Technical point 2: this method turns cookies_str into a clean Python
    # list of dictionaries. It also writes the data to our database so that
    # we can inspect it.
    def help_cookies(self):
        # The string contains JSON literals (false/true) that Python cannot
        # evaluate, so replace them before turning the string into a list
        cookies_list = eval(self.cookies_str.replace("false", "False").replace("true", "True"))
        client = MongoClient("localhost", 27017)
        db = client["python"]
        # The collection that holds the cookies; look at it in MongoDB later
        col = db["cookies"]
        # Delete the old data first so that only the current cookies are stored
        col.delete_many({})
        # Why write the data at all? It makes things extensible: later the
        # cookies can be read back and used to log in some other way.
        # I don't know yet whether that will be useful.
        col.insert_many(cookies_list)
        for cookie in cookies_list:
            # Technical point 3: the exported sameSite value is not accepted, so delete it (or replace it)
            # Technical point 4: inserting into MongoDB adds an _id field to each dictionary, which must be deleted
            del cookie['sameSite'], cookie["_id"]
        return cookies_list


'''
Some techniques learned:
eval(): the teacher mentioned it before, and now I finally remember how
powerful it is. Without it I would basically have failed.
Passing parameters to a class and then using them.
'''
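The `eval()` trick works, but replacing "false" and "true" blindly would also change a cookie value that happens to contain those words, and `eval()` will run whatever code is in the string. As a hedged alternative (not what the code above does), the exported text looks like plain JSON, so `json.loads` should be able to parse it directly. The sketch below also keeps only the keys that Selenium's `add_cookie()` documents and renames `expirationDate` to `expiry`. The function name `normalize_cookies` and the sample cookie are my own inventions for illustration.

```python
import json

# Keys that Selenium's add_cookie() documents. sameSite is left out on
# purpose because the exported value "unspecified" is not accepted.
SELENIUM_KEYS = {"name", "value", "domain", "path", "secure", "httpOnly", "expiry"}


def normalize_cookies(cookies_str):
    """Parse the exported JSON text and return Selenium-style cookie dicts."""
    result = []
    for raw in json.loads(cookies_str):
        cookie = {k: v for k, v in raw.items() if k in SELENIUM_KEYS}
        # The export calls the expiry time "expirationDate"; Selenium expects
        # an integer number of seconds under the key "expiry".
        if "expirationDate" in raw:
            cookie["expiry"] = int(raw["expirationDate"])
        result.append(cookie)
    return result


# Hypothetical cookie in the exported format, for illustration only
sample = '''
[{"domain": ".example.com", "expirationDate": 1675669981.5, "hostOnly": false,
  "httpOnly": false, "name": "token", "path": "/", "sameSite": "unspecified",
  "secure": false, "session": false, "storeId": "0", "value": "not-false", "id": 1}]
'''
print(normalize_cookies(sample))
```

Note that the value "not-false" survives unchanged here, which it would not with the string replacement.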
Although the writing is a bit showy and I'm a little proud of it, I still want to explain one of my ideas. You can modify this for your own environment. I needed to log in to two websites at the same time, and both need a url and cookies: the url is the site's login page, and the cookies are the login parameters we carry. If I can log in to any website just by passing in a url and cookies, that is very convenient, and in theory it works. To achieve this, I wrote my own helper file (the "third-party" file above). As long as I pass in the parameters, I can basically log in.
Let me explain the helper file. After logging in, we certainly want to do something rather than nothing. To make it more useful, the class takes the driver as a parameter instead of creating the browser inside the helper file. We create the browser instance in our own script and pass it in, so we can keep operating on the page after logging in. It's very comfortable. Then, to make the code more extensible, I write the cookies into MongoDB, so that later I or others can read them back and simulate login. However, this means more code. If you don't want to install pymongo and learn MongoDB, you can modify the code: remove the pymongo import, the MongoDB statements in help_cookies (from creating the MongoClient through insert_many), and the cookie["_id"] part of the del statement. In my original file these were line 2, lines 81-88, and line 92.
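If you drop MongoDB but still want to keep the cookies around for later, one lightweight option is a JSON file. This is only a sketch of that idea, not part of my helper file; the function names `save_cookies` and `load_cookies` and the file name are made up for illustration.

```python
import json
from pathlib import Path


def save_cookies(cookies, path):
    """Write a list of cookie dictionaries to a JSON file."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_cookies(path):
    """Read the list of cookie dictionaries back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage would look like `save_cookies(cookies_list, "cookies.json")` after parsing, and `load_cookies("cookies.json")` in a later run. Unlike `insert_many`, this adds no `_id` field, so nothing has to be deleted afterwards.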
Once all this is ready, create a new file in the same directory as the helper file (imported below as get_help), prepare the url and cookies, and we can start using it. Here is my code, have a look:
# For the first time, I used the helper library I wrote myself to simulate
# logging in to Bilibili. I also tried QQ Space and it succeeded.
import time
# Import our helper library
from get_help import Mndl
from selenium import webdriver

cookies_str = '''
[
{ "domain": ".bilibili.com", "expirationDate": 1675669981, "hostOnly": false, "httpOnly": false, "name": "_uuid", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "A75779EB-4585-5878-3E43-6D32A3D55C1681507infoc", "id": 1 },
{ "domain": ".bilibili.com", "hostOnly": false, "httpOnly": false, "name": "b_lsid", "path": "/", "sameSite": "unspecified", "secure": false, "session": true, "storeId": "0", "value": "81B856C10_17EFE7419A2", "id": 2 },
{ "domain": ".bilibili.com", "expirationDate": 1675677272, "hostOnly": false, "httpOnly": false, "name": "b_ut", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "5", "id": 3 },
{ "domain": ".bilibili.com", "expirationDate": 1659689909, "hostOnly": false, "httpOnly": false, "name": "bili_jct", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "a3b2fd3a87c83edcd60706d0e065a388", "id": 4 },
{ "domain": ".bilibili.com", "expirationDate": 1676361767, "hostOnly": false, "httpOnly": false, "name": "blackside_state", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "1", "id": 5 },
{ "domain": ".bilibili.com", "expirationDate": 1647533178, "hostOnly": false, "httpOnly": false, "name": "bp_video_offset_1886803835", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "627486273749365200", "id": 6 },
{ "domain": ".bilibili.com", "expirationDate": 1738746918, "hostOnly": false, "httpOnly": false, "name": "buvid_fp", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "1c8c7dbd98c2c133528a0cc10c550eda", "id": 7 },
{ "domain": ".bilibili.com", "expirationDate": 1675674879, "hostOnly": false, "httpOnly": false, "name": "buvid_fp_plain", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "undefined", "id": 8 },
{ "domain": ".bilibili.com", "expirationDate": 1675669979, "hostOnly": false, "httpOnly": false, "name": "buvid3", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "A95BACD3-53CB-3603-68EE-66D2B733716B80619infoc", "id": 9 },
{ "domain": ".bilibili.com", "expirationDate": 1738741987, "hostOnly": false, "httpOnly": false, "name": "buvid4", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "02D5CC9B-3352-A632-5E86-31182806CC8587874-022020615-B1501ZPp4rPBpvbG1w4m6Q%3D%3D", "id": 10 },
{ "domain": ".bilibili.com", "expirationDate": 1676368449, "hostOnly": false, "httpOnly": false, "name": "CURRENT_BLACKGAP", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "0", "id": 11 },
{ "domain": ".bilibili.com", "expirationDate": 1676476538, "hostOnly": false, "httpOnly": false, "name": "CURRENT_FNVAL", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "4048", "id": 12 },
{ "domain": ".bilibili.com", "expirationDate": 1659689909, "hostOnly": false, "httpOnly": false, "name": "DedeUserID", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "1886803835", "id": 13 },
{ "domain": ".bilibili.com", "expirationDate": 1659689909, "hostOnly": false, "httpOnly": false, "name": "DedeUserID__ckMd5", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "685099826b0c2de5", "id": 14 },
{ "domain": ".bilibili.com", "expirationDate": 1675674878, "hostOnly": false, "httpOnly": false, "name": "fingerprint", "path": "/", "sameSite": "unspecified", "secure": false, "session": false, "storeId": "0", "value": "1c8c7dbd98c2c133528a0cc10c550eda", "id": 15 }
]
'''
url = "https://www.bilibili.com/"
# Note: executable_path was removed in Selenium 4.10; newer versions take a Service object instead
driver = webdriver.Edge(executable_path="D:/python/study/selenium Automatic learning/msedgedriver")
mndl = Mndl(url, cookies_str, driver)
mndl.run()
# Keep the browser open for a while so the logged-in page can be seen
time.sleep(10)
cookies_str is the cookie data I copied. There is a lot of it, but it's very easy to use. The main thing is to wrap it in triple quotes so that the multi-line string doesn't cause errors. I modified some of the values shown here, so you can't log in to my Bilibili account with them. Logging in to QQ Space works the same way: just change url and cookies_str. I wrote a lot of comments in the helper file to make it easier to understand and to help my own memory.
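When switching to another site, an easy mistake is to change the url but paste cookies from the wrong site, which leads to the "domain does not match" kind of error mentioned in the helper file's comments. Below is a small sketch of a check you could run before logging in. It is a simplification of how browsers match cookie domains, and the function name `cookie_matches_url` is hypothetical.

```python
from urllib.parse import urlparse


def cookie_matches_url(cookie, url):
    """Return True if the cookie's domain covers the host of the url."""
    host = urlparse(url).hostname or ""
    # A leading dot (".bilibili.com") means the cookie also covers subdomains
    domain = cookie.get("domain", "").lstrip(".")
    return bool(domain) and (host == domain or host.endswith("." + domain))


print(cookie_matches_url({"domain": ".bilibili.com"}, "https://www.bilibili.com/"))
print(cookie_matches_url({"domain": ".bilibili.com"}, "https://example.com/"))
```

The first call prints True and the second prints False, so cookies copied from the wrong site can be spotted before the browser is even opened.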
Basically, this simulated login works even on some websites where logging in is otherwise very difficult. If you are interested, we can discuss it together. To be honest, I still don't know how to crawl videos. After getting the response and downloading it, the file can't be played, and many videos are split into segments. I don't know what to do. I tried the you-get library, but it reported an error, so I probably have to modify some of the code later. Hey, come on, a tribute to everyone who is unwilling to be ordinary.
Rush, rush!!!
"Forgive me for my unruly life and love freedom, and I will be afraid that one day I will fall ~ ~ ~ and abandon my ideals. Everyone will be rich, and I will be afraid that one day only you will share me ~ ~"