[Crawler] Selenium in practice: notes

Selenium initialization

Note: if Chrome is not installed in the default path, set option.binary_location to the path of the Chrome executable. If chromedriver is neither in the same directory as the current code nor on the PATH, you also need to pass the path to chromedriver.

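from selenium import webdriver

option = webdriver.ChromeOptions()
option.add_argument('--headless')
option.add_argument('--disable-gpu')
# Only needed when Chrome is not installed in the default location
option.binary_location = r'D:\Google\Chrome\Application\chrome.exe'
# option.add_argument('blink-settings=imagesEnabled=false')
# Selenium 3 style; Selenium 4 passes the driver path through a Service object instead
browser = webdriver.Chrome(executable_path="chromedriver", options=option)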

Selenium options

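option = webdriver.ChromeOptions()
option.add_argument('--headless')       # run without a visible window
option.add_argument('--disable-gpu')    # turn off GPU acceleration
option.add_argument('blink-settings=imagesEnabled=false')  # do not load images
option.binary_location = r'D:\Google\Chrome\Application\chrome.exe'  # Chrome outside the default path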
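The switches above can also be collected by a small helper, so that each crawler states which ones it wants. This is only a sketch: the names build_chrome_arguments and make_options are made up for illustration, and only the three switches already shown in these notes are used.

```python
def build_chrome_arguments(headless=True, disable_gpu=True, load_images=True):
    """Return the list of Chrome command-line switches for the given choices."""
    arguments = []
    if headless:
        arguments.append('--headless')
    if disable_gpu:
        arguments.append('--disable-gpu')
    if not load_images:
        # Blink setting that stops Chrome from downloading images.
        arguments.append('blink-settings=imagesEnabled=false')
    return arguments


def make_options(arguments, binary_location=None):
    """Build a ChromeOptions object from a list of switches (hypothetical helper)."""
    # Imported here so build_chrome_arguments can be used without Selenium installed.
    from selenium import webdriver

    option = webdriver.ChromeOptions()
    for argument in arguments:
        option.add_argument(argument)
    if binary_location:
        option.binary_location = binary_location
    return option


print(build_chrome_arguments(load_images=False))
# ['--headless', '--disable-gpu', 'blink-settings=imagesEnabled=false']
```

The result of make_options(...) can then be passed as options= when creating the browser, as in the initialization code above.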

browser.quit(): closes the whole browser (all windows and tabs) and ends the driver session
browser.close(): closes only the tab currently being operated on
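If a crawler raises an exception before it reaches quit(), the headless Chrome process can be left running. One possible way to guard against that is a small context manager. The sketch below is an assumption about how one might structure it, not part of Selenium; the name quitting and the FakeBrowser stand-in are invented so the pattern can be shown without launching Chrome.

```python
from contextlib import contextmanager


@contextmanager
def quitting(browser):
    """Yield the browser and always call quit() afterwards, even on errors."""
    try:
        yield browser
    finally:
        browser.quit()


class FakeBrowser:
    """Stand-in used only to demonstrate the pattern without launching Chrome."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeBrowser()
with quitting(fake) as browser:
    pass  # browser.get(url), scraping, ... would go here
print(fake.quit_called)  # True
```

With a real driver the usage would look like: with quitting(webdriver.Chrome(options=option)) as browser: ...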

Adding cookies in Selenium (bypassing login)

Some websites require a login. We can log in manually in advance and copy the cookie string, then add the cookies when Selenium visits the site:

  • Log in to the website manually and copy the cookie string;

  • Visit the url with Selenium;

  • Delete the current cookies;

  • Add each cookie in turn. Each cookie must be a dictionary with at least two keys: name and value;

  • Visit the url with Selenium again. Two visits are needed because Selenium must load a page from the site first before it knows which domain the cookies belong to.

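import urllib.parse
from selenium import webdriver

# `option` is the ChromeOptions object built above; `url` is a page on the target site
driver = webdriver.Chrome(options=option)
driver.get(url)  # first visit, so Selenium knows which domain the cookies belong to

driver.delete_all_cookies()

# Cookie string copied from the browser after logging in manually
cookies = "magicid=95c9HOMFQUfwJOaJQaoKtxH3+A6YuS4v2PaMCR5vthTBaLSQv4yIN4/TI76Mhhde; ASP.NET_SessionId=vo4mepcmaboircxm5bguxsh2; _abtest_userid=c6c22371-bbd9-4a3d-9651-7fb3171cddf4; hoteluuid=5oZm2Xm4lDms2g4L; IsPersonalizedLogin=F"

for line in cookies.split(';'):
    key, value = line.split('=', 1)
    cookie = {}
    cookie['name'] = key.strip()  # drop the space left after each ';'
    cookie['value'] = urllib.parse.unquote(value)
    print(cookie)
    driver.delete_cookie(cookie['name'])
    driver.add_cookie(cookie)

driver.get(url)  # second visit, now sent with the cookies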
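The parsing part of the loop above can be pulled into a function and checked without a browser. This is a minimal sketch: parse_cookie_string is a made-up name, the sample string reuses two cookies from above plus one invented percent-encoded pair, and skipping fragments without "=" is an added assumption about how to treat malformed input.

```python
import urllib.parse


def parse_cookie_string(cookie_string):
    """Turn 'a=1; b=2' into [{'name': 'a', 'value': '1'}, ...]."""
    cookies = []
    for pair in cookie_string.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.split('=', 1)
        cookies.append({'name': name.strip(), 'value': urllib.parse.unquote(value)})
    return cookies


# The last pair is invented to show percent-decoding.
sample = "hoteluuid=5oZm2Xm4lDms2g4L; IsPersonalizedLogin=F; q=a%3Db"
for cookie in parse_cookie_string(sample):
    print(cookie)
# {'name': 'hoteluuid', 'value': '5oZm2Xm4lDms2g4L'}
# {'name': 'IsPersonalizedLogin', 'value': 'F'}
# {'name': 'q', 'value': 'a=b'}
```

Each dictionary in the returned list can be handed to driver.add_cookie() after the first driver.get(url).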

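Getting elements

driver.find_element_by_xpath(xpath)    # Selenium 3 API, removed in later Selenium 4 releases
driver.find_element(By.XPATH, xpath)   # equivalent call; By comes from selenium.webdriver.common.by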
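On pages that render content with JavaScript, the element may not exist yet when the lookup runs. A common approach is an explicit wait. The sketch below assumes Selenium's WebDriverWait and expected_conditions helpers; the function name wait_for_xpath and the 10-second default are choices made for this example.

```python
def wait_for_xpath(driver, xpath, timeout=10):
    """Wait up to `timeout` seconds for an element matching `xpath`, then return it."""
    # Imported inside the function so this file can be loaded without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, xpath)
    # until() raises selenium.common.exceptions.TimeoutException if nothing appears in time.
    return WebDriverWait(driver, timeout).until(EC.presence_of_element_located(locator))
```

Usage would be along the lines of element = wait_for_xpath(driver, '//div[@id="content"]'), where the XPath is only a placeholder.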

Selenium + Linux headless crawler

Install Chrome

(1) Uninstall Chrome

If Chrome is already installed but its version is too old to match the chromedriver version, uninstall it first:

sudo apt-get remove google-chrome-stable

(2) Install the dependency packages:

sudo apt-get install libxss1 libappindicator1 libindicator7

(3) Download the installation package:

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

(4) Install the package:

sudo dpkg -i google-chrome-stable_current_amd64.deb

(5) This step should normally not be needed:

sudo apt-get install google-chrome

Install chromedriver

Because the chromedriver version has to match the Chrome version, it is better to download the zip manually, put it on the server, and install it from there:

(1) Download the installation package

cd /usr/software

wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip

Or manually download the chromedriver that matches the Chrome version installed above from the Taobao mirror

Taobao mirror: https://npm.taobao.org/mirrors/chromedriver/

After downloading, put it under /usr/software on the server

(2) Install unzip

sudo apt-get install unzip

(3) Unzip and make it executable

unzip chromedriver_linux64.zip
chmod +x chromedriver

(4) Move

sudo mv -f chromedriver /usr/local/share/chromedriver

mv -f overwrites an existing file at the destination without prompting

(5) Create symbolic links

sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver

(6) Verify that chromedriver was installed successfully:

chromedriver --version
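The same check can be done from Python before the crawler starts, which makes a missing or mismatched driver easier to spot. This is a rough sketch: parse_version and installed_version are hypothetical helper names, and the sample output lines are illustrative strings, not captured from a real installation.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first dotted version number from a '--version' output line."""
    match = re.search(r'\d+(?:\.\d+)+', text)
    return match.group(0) if match else None


def installed_version(executable):
    """Return the version reported by an executable on PATH, or None if it is missing."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, '--version'], capture_output=True, text=True)
    return parse_version(result.stdout)


# Illustrative output lines (the build hash is a placeholder).
print(parse_version('ChromeDriver 2.26.436382 (placeholder-build-hash)'))  # 2.26.436382
print(parse_version('Google Chrome 83.0.4103.97'))                         # 83.0.4103.97
```

Calling installed_version('chromedriver') and installed_version('google-chrome') on the server then gives the two versions to compare.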

Write the code

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time, json, os
import requests

url = 'https://www.toutiao.com/search/?keyword= Celebrate more than ten years'
option = webdriver.ChromeOptions()
option.add_argument('--headless')
browser = webdriver.Chrome(options=option)
browser.get(url)
cookie_item = {}
for cookie_dict in browser.get_cookies():
    cookie_item[cookie_dict['name']] = cookie_dict['value']
print(cookie_item)
browser.quit()
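The script above imports requests, so presumably the collected cookies are meant to be reused for plain HTTP requests. One way to do that, sketched here as an assumption, is to join them into a Cookie header. The function name cookies_to_header and the sample cookies are invented for the example.

```python
def cookies_to_header(cookie_dicts):
    """Join Selenium-style cookie dicts into a 'Cookie' request header value."""
    return '; '.join(f"{c['name']}={c['value']}" for c in cookie_dicts)


# Made-up cookies in the shape returned by browser.get_cookies().
sample = [
    {'name': 'sessionid', 'value': 'abc123', 'domain': 'example.com'},
    {'name': 'lang', 'value': 'en', 'domain': 'example.com'},
]
headers = {'Cookie': cookies_to_header(sample)}
print(headers)  # {'Cookie': 'sessionid=abc123; lang=en'}
```

Alternatively, a name-to-value dictionary such as cookie_item can be passed directly as requests.get(url, cookies=cookie_item).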

Cracking the slider CAPTCHA

https://www.zhangshengrong.com/p/l51g69YJX0/
https://blog.csdn.net/chushiyan/article/details/101397426


Added by rachae1 on Tue, 09 Jun 2020 08:30:54 +0300