Selenium initialization
Note that if Chrome is not installed in the default path, you need to set option.binary_location to the path of the Chrome executable. If chromedriver is not in the same directory as the current code (or on the system PATH), you also need to set the path to chromedriver.
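from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument('--headless')  # run without a visible window
option.add_argument('--disable-gpu')
option.binary_location = r'D:\Google\Chrome\Application\chrome.exe'  # only for a non-default Chrome path
# option.add_argument('blink-settings=imagesEnabled=false')  # uncomment to skip loading images
# Selenium 3 style; Selenium 4 expects a Service object instead of executable_path
browser = webdriver.Chrome(executable_path="chromedriver", options=option)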
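Before passing a Chrome or chromedriver path to Selenium, it can help to check that the path resolves to something. The following is a minimal sketch using only the standard library; the helper name locate_binary is hypothetical and not part of Selenium. It only checks that a file exists, not that it is a working driver.

```python
import os
import shutil


def locate_binary(name_or_path):
    """Return a path to the given binary, or None if it cannot be found.

    Hypothetical helper: accepts an explicit file path or a bare command
    name that is looked up on PATH.
    """
    # An explicit (or current-directory) path wins if the file exists.
    if os.path.isfile(name_or_path):
        return os.path.abspath(name_or_path)
    # Otherwise fall back to a PATH lookup, as the shell would do.
    return shutil.which(name_or_path)
```

For example, locate_binary("chromedriver") returning None would suggest that the driver path has to be set explicitly.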
Selenium options
option = webdriver.ChromeOptions()
option.add_argument('--headless')  # no visible browser window
option.add_argument('--disable-gpu')
option.add_argument('blink-settings=imagesEnabled=false')  # do not load images
option.binary_location = r'D:\Google\Chrome\Application\chrome.exe'  # path to the Chrome executable
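If the same switches are reused across several scripts, one option is to build the argument list in a small function. This is only a sketch: the function name and its parameters are made up for illustration, and the switches are the ones from the snippet above.

```python
def build_chrome_arguments(headless=True, disable_gpu=True, load_images=True):
    """Return the list of Chrome command-line switches for the given settings."""
    arguments = []
    if headless:
        arguments.append('--headless')
    if disable_gpu:
        arguments.append('--disable-gpu')
    if not load_images:
        # Blink setting from the snippet above that skips image loading.
        arguments.append('blink-settings=imagesEnabled=false')
    return arguments
```

Each returned string would then be passed to option.add_argument() in a loop.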
browser.quit(): closes the browser and ends the WebDriver session
browser.close(): closes only the tab currently being operated on
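To make sure quit() is always called, even when the crawl raises an exception, the browser can be wrapped in a context manager. This is a sketch under the assumption that the browser object exposes quit(); managed_browser and create_browser are hypothetical names, not Selenium APIs.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(create_browser):
    """Yield the browser returned by create_browser() and always quit it afterwards."""
    browser = create_browser()
    try:
        yield browser
    finally:
        # quit() ends the whole session; close() would only close the current tab.
        browser.quit()
```

With Selenium this might be used as: with managed_browser(lambda: webdriver.Chrome(options=option)) as browser: browser.get(url)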
Adding cookies in Selenium (bypass login)
Some websites require a login. We can log in manually in advance and copy the cookie string, then add the cookies when Selenium visits the site:
- Log in to the website manually and copy the cookie string;
- Visit the URL with Selenium;
- Delete the current cookies;
- Add each cookie in turn. Each cookie must be a dictionary with at least two keys: name and value;
- Visit the URL again with Selenium. Two visits are needed because Selenium must load the site first before it knows which domain the cookies belong to.
import urllib.parse
from selenium import webdriver

# `option` is built as shown earlier; `url` is the page to visit
driver = webdriver.Chrome(options=option)
driver.get(url)  # first visit, so Selenium knows the cookie domain
driver.delete_all_cookies()
cookies = "magicid=95c9HOMFQUfwJOaJQaoKtxH3+A6YuS4v2PaMCR5vthTBaLSQv4yIN4/TI76Mhhde; ASP.NET_SessionId=vo4mepcmaboircxm5bguxsh2; _abtest_userid=c6c22371-bbd9-4a3d-9651-7fb3171cddf4; hoteluuid=5oZm2Xm4lDms2g4L; IsPersonalizedLogin=F"
for line in cookies.split(';'):
    key, value = line.split('=', 1)
    cookie = {}
    cookie['name'] = key.replace(" ", "")
    cookie['value'] = urllib.parse.unquote(value)
    print(cookie)
    driver.delete_cookie(cookie['name'])
    driver.add_cookie(cookie)
driver.get(url)  # second visit, now with the cookies set
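The parsing part of the code above can be separated from the browser so that it can be tested on its own. Below is a minimal sketch; parse_cookie_string is a hypothetical helper name, and skipping empty or malformed fragments is an added assumption that the original loop does not make.

```python
from urllib.parse import unquote


def parse_cookie_string(cookie_string):
    """Split a 'name=value; name=value' string into Selenium-style cookie dicts."""
    cookies = []
    for part in cookie_string.split(';'):
        part = part.strip()
        if not part or '=' not in part:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, since values may contain '=' themselves.
        name, value = part.split('=', 1)
        cookies.append({'name': name.strip(), 'value': unquote(value)})
    return cookies
```

Each returned dictionary has the name and value keys that driver.add_cookie() requires.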
Get element
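driver.find_element_by_xpath() (Selenium 3; in Selenium 4 use driver.find_element(By.XPATH, ...) with By imported from selenium.webdriver.common.by)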
Selenium + Linux headless crawler
Install Chrome
(1) Uninstall Chrome
If Chrome is already installed but its version is too old to match the chromedriver version, uninstall it first:
sudo apt-get remove google-chrome-stable
(2) Install the dependency packages:
sudo apt-get install libxss1 libappindicator1 libindicator7
(3) Download the installation package:
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
(4) Install the package:
sudo dpkg -i google-chrome-stable_current_amd64.deb
(5) This step should normally not be required:
sudo apt-get install google-chrome
Install chromedriver
Because the chromedriver version has to match the Chrome version, it is better to download the zip manually, put it on the server, and install it from there:
(1) Download the installation package
cd /usr/software
wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
Alternatively, manually download the chromedriver that corresponds to the installed Chrome version from the Taobao mirror.
Taobao mirror: https://npm.taobao.org/mirrors/chromedriver/
After downloading, put it under the /usr/software path on the server.
(2) Install unzip
sudo apt-get install unzip
(3) Decompress + give executable permission
unzip chromedriver_linux64.zip
chmod +x chromedriver
(4) Move
sudo mv -f chromedriver /usr/local/share/chromedriver
mv -f overwrites an existing file at the destination without prompting.
(5) Create symbolic links
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
(6) Verify that the chromedriver was installed successfully:
chromedriver --version
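Since a version mismatch between Chrome and chromedriver is the usual cause of trouble, the two version strings can be compared in code. The sketch below assumes output shaped like "Google Chrome <version>" and "ChromeDriver <version> (...)", and assumes that matching major versions indicate compatibility. As far as I know that holds for newer chromedriver releases but not for the old 2.x series such as the 2.26 build downloaded above. The function names and sample strings are illustrative only.

```python
import re


def major_version(version_output):
    """Extract the leading major version number from `--version` output."""
    match = re.search(r'(\d+)\.\d+', version_output)
    if match is None:
        raise ValueError('no version number found in: %r' % version_output)
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """Return True if both outputs report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)
```

The two input strings could be collected with subprocess.check_output(), for example from chromedriver --version and google-chrome --version.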
Write the code
from selenium import webdriver

url = 'https://www.toutiao.com/search/?keyword=Celebrate more than ten years'
option = webdriver.ChromeOptions()
option.add_argument('--headless')
browser = webdriver.Chrome(options=option)
browser.get(url)
# Collect the cookies set by the site into a name -> value dict
cookie_item = {}
for cookie_dict in browser.get_cookies():
    cookie_item[cookie_dict['name']] = cookie_dict['value']
print(cookie_item)
browser.quit()
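One possible use of the collected cookies is to send them with ordinary HTTP requests outside the browser. That use is an assumption, not something the code above does. The sketch below joins the dictionaries returned by browser.get_cookies() into a Cookie header string; cookies_to_header is a hypothetical helper name.

```python
def cookies_to_header(cookie_dicts):
    """Join Selenium cookie dicts into a 'name=value; name=value' header string."""
    return '; '.join(
        '%s=%s' % (cookie['name'], cookie['value']) for cookie in cookie_dicts
    )
```

The result could then be passed as the Cookie header of an HTTP client of your choice.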
Cracking the slider verification code (CAPTCHA)
https://www.zhangshengrong.com/p/l51g69YJX0/
https://blog.csdn.net/chushiyan/article/details/101397426