Selenium with Python: easy to get started


Hello everyone. I'm sure you have wanted to automatically grab things you can normally only see by clicking through a web page by hand: saving knowledge content, collecting data, comparing prices, and so on. With older methods this was a bit difficult. Modern browsers such as Chrome, Firefox and Edge now support automation through driver programs, which makes these jobs much easier. So let me grandly introduce (pretending only I know about it) Selenium!!! It is an automated testing tool. Besides locating page elements and downloading content, it can simulate the keyboard and mouse, clicks, drag and drop, and more.

1. Installation
Suppose everyone uses Chrome.
First, we need to determine the Chrome version.
Emmm, opening the browser and checking its About page feels too low-end, so here is the code:

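import subprocess
# Read the FileVersion of the chrome.exe registered under App Paths (Windows only).
# A raw string keeps the backslashes of the registry path intact.
cmd = r"(Get-Item (Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\chrome.exe').'(Default)').VersionInfo|select FileVersion"
completed = subprocess.run(["powershell", "-Command", cmd], capture_output=True)
print(completed.stdout.decode().strip().splitlines()[2])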

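This runs a PowerShell statement from Python to get the Chrome version.
stdout is of type bytes, so it needs decode()
strip() removes the blank lines and whitespace around the output
splitlines() breaks it up into a list of lines
[2] picks the third line, which holds the version number (the first two are the "FileVersion" header and its dashed underline)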
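The part of the version you actually need for picking a driver is the major number (98 in the path used later). A minimal sketch that parses the PowerShell output into the full and major version; `parse_chrome_version` is a hypothetical helper, and the sample bytes are a made-up example of what the command prints.

```python
from typing import Tuple


def parse_chrome_version(stdout: bytes) -> Tuple[str, int]:
    """Return (full_version, major_version) from the PowerShell table output."""
    # After strip(): "FileVersion", "-----------", "<version>"
    lines = stdout.decode().strip().splitlines()
    full = lines[-1].strip()  # the version is the last non-blank line
    major = int(full.split(".")[0])
    return full, major


# Hypothetical sample output, for illustration only
sample = b"\r\nFileVersion\r\n-----------\r\n98.0.4758.102\r\n\r\n\r\n"
print(parse_chrome_version(sample))  # ('98.0.4758.102', 98)
```

In real use you would pass `completed.stdout` from the snippet above instead of `sample`.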

Well, I figured not everyone would follow, so I padded it out a little.

Next, install ChromeDriver. Selenium needs the driver that matches each browser before it can be used.
Download the ChromeDriver that matches your Chrome version from the site below. If you use the development version (Canary), you need the Canary build of ChromeDriver. Firefox, Edge and PhantomJS work the same way, so I won't say more.

https://chromedriver.chromium.org/

Once you have learned Selenium, you can even crawl this site and download the driver for your version automatically. Well, that's what I do.
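For Chrome 114 and older (the version 98 used in this post included), the chromedriver.storage.googleapis.com bucket serves a plain-text file `LATEST_RELEASE_<major>` with the matching driver version, plus one zip per version and platform. From Chrome 115 on, drivers are published through Chrome for Testing instead, so this sketch only fits older versions. The helper names are hypothetical; nothing is downloaded unless you call `fetch_driver_version` yourself.

```python
import urllib.request

BASE = "https://chromedriver.storage.googleapis.com"


def latest_release_url(major: int) -> str:
    # Text file containing the newest driver version for this Chrome major version
    return f"{BASE}/LATEST_RELEASE_{major}"


def driver_zip_url(driver_version: str, platform: str = "win32") -> str:
    # Old platform names: win32, linux64, mac64
    return f"{BASE}/{driver_version}/chromedriver_{platform}.zip"


def fetch_driver_version(major: int, timeout: float = 10.0) -> str:
    # Network call: reads the LATEST_RELEASE file and returns e.g. "98.0.4758.102"
    with urllib.request.urlopen(latest_release_url(major), timeout=timeout) as resp:
        return resp.read().decode().strip()
```

Usage: `driver_zip_url(fetch_driver_version(98))` gives the zip address, which you can save with `urllib.request.urlretrieve` and unpack into the driver folder.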

When Python starts ChromeDriver through Selenium, you need to pass it some parameters (at least the driver path). Here is the first Selenium example:

import time
import re
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC

class hello:
    def __init__(self, url, selepath='c:\\1\\selenium\\chromedriver\\98\\chromedriver.exe', headless=False):
        self.url = url
        self.seleniumpath = selepath
        # Newer Selenium 4 syntax: the driver path goes into a Service object
        # (older versions passed executable_path straight to webdriver.Chrome)
        ser = Service(self.seleniumpath)
        # Extra settings, such as a custom user-agent, are added through Options; examples later
        chrome_options = Options()
        # chrome_options.add_argument("--disable-extensions")
        # chrome_options.add_argument("--disable-gpu")
        # chrome_options.add_argument("--no-sandbox")  # linux only
        if headless:
            # Headless mode runs in the background: no Chrome window is shown
            chrome_options.add_argument("--headless")
        self.driver = webdriver.Chrome(service=ser, options=chrome_options)
        self.driver.maximize_window()  # maximize the window
        self.driver.get(self.url)      # open the web page

    def closedriver(self):
        # quit() ends the session and the chromedriver process;
        # close() would only close the current window
        self.driver.quit()

x = hello('https://www.baidu.com')
x.closedriver()
    	

Pay attention to where you saved chromedriver and adjust the path accordingly.
Note that the URL you open must start with https:// or http://
Finally, shut down the driver with quit(), otherwise there is a chance a chromedriver process stays behind. Don't be afraid: it is very quiet, it doesn't move and won't come out to bite you.
With older versions, every run you didn't shut down left another process behind; now it only happens once or twice in a while.
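To make sure quit() always runs, even when your scraping code raises an exception halfway, you can wrap the driver in a context manager. A minimal sketch: `managed_driver` is a hypothetical helper that accepts any factory returning an object with a `quit()` method, so it can be tried without a browser.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def managed_driver(factory: Callable[[], T]) -> Iterator[T]:
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions, so no chromedriver is left behind
        driver.quit()
```

Usage with Selenium: `with managed_driver(lambda: webdriver.Chrome(service=Service(path))) as driver: driver.get("https://www.baidu.com")`. Recent Selenium 4 releases also let the WebDriver object itself be used in a `with` statement, which calls quit() for you.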

How to get web page elements
If you have learned the DOM or written other crawlers, you know the ways to locate page elements: class name, id, name, XPath, CSS selector, tag name, and so on. These are the most commonly used ones; there are others, which I won't go into here.

How do you find the names of these elements
After opening the browser, press F12 (or Ctrl + Shift + I, or More tools > Developer tools) to open DevTools.

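Let's practice on Baidu Baike (Baidu Encyclopedia).
Open DevTools and click the element-picker button (the arrow icon at the top-left of the DevTools panel, shortcut Ctrl + Shift + C),
then point at the page element you want to get.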

The element's location in the page source is now highlighted in the Elements panel.
Right-click it > Copy

A few of the options need explaining:
Copy element: copies the element's HTML, including everything nested inside it
Copy outerHTML: practically the same as above

Copy JS path: a document.querySelector(...) expression that returns the element
Copy styles: the element's CSS styles
Copy XPath: a relative XPath that anchors on an @id where possible. This is what you pass to find_element(By.XPATH, ...) (demonstrated later)
Copy full XPath: the absolute XPath from /html down to the element

An XPath is the most precise way to point at an element, but it is also the longest. If that's too tedious, you can take the class name or id on its own: select the element and look at its attributes in the Elements panel.
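The difference between "Copy full XPath" and "Copy XPath" can be tried without a browser. This sketch uses Python's built-in xml.etree.ElementTree as a stand-in: it supports only a subset of XPath and needs well-formed markup, and the HTML snippet is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page; real HTML often needs a proper HTML parser
html = """<html><body>
<div class="nav"><a href="/a">A</a></div>
<div id="main"><dl><dd><a href="/x">first</a><a href="/y">second</a></dd></dl></div>
</body></html>"""

root = ET.fromstring(html)  # root is the <html> element

# Absolute style: every step from the root, with 1-based positions
absolute = root.find("./body/div[2]/dl/dd/a[1]")
# Id-anchored style: jump to the element with the id, then a short path
relative = root.find(".//div[@id='main']//a[2]")

print(absolute.text, relative.text)  # first second
```

The absolute path breaks as soon as a div is added above the target; the id-anchored one survives such layout changes, which is why it is usually the better choice.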

Once you have copied the path, you can get the element. See the following code:

x=self.driver.find_element(By.CSS_SELECTOR,"body")
y=self.driver.find_element(By.XPATH,"/html/body/div[3]/div[2]/div/div[1]/dl[1]/dd/a[1]")
#absolute XPath, as produced by "Copy full XPath"
#The element below has several classes. A class name cannot contain spaces,
#so replace the spaces with dots and look it up as a CSS selector
z=self.driver.find_element(By.CSS_SELECTOR,".cmn-btn-28.cmn-btn-hover-blue.audio-play.title-audio-play.J-title-audio-play")
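Turning the class attribute you see in DevTools into such a selector is easy to automate. A minimal sketch; `class_attr_to_css` is a hypothetical helper, and it assumes the class names contain no characters that would need CSS escaping.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn a class attribute like "a b  c" into the compound selector ".a.b.c"."""
    classes = class_attr.split()  # split() also drops repeated or edge whitespace
    if not classes:
        raise ValueError("class attribute is empty")
    return "".join("." + name for name in classes)


print(class_attr_to_css("cmn-btn-28 cmn-btn-hover-blue audio-play"))
# .cmn-btn-28.cmn-btn-hover-blue.audio-play
```

Usage: `self.driver.find_element(By.CSS_SELECTOR, class_attr_to_css("cmn-btn-28 cmn-btn-hover-blue"))`.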


The above is the simple version. It isn't very robust: find_element raises NoSuchElementException if the element hasn't loaded yet. I'll add a safer version later.
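In Selenium the usual safer way is an explicit wait, for example `WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.XPATH, "...")))`, using the imports from the first example. WebDriverWait calls the condition repeatedly (every 0.5 s by default, ignoring NoSuchElementException) until it returns something truthy or the timeout runs out. The sketch below reproduces that polling idea in plain Python so it runs without a browser; `wait_until` is a hypothetical helper, not part of Selenium.

```python
import time
from typing import Callable, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]],
               timeout: float = 10.0,
               poll: float = 0.5,
               ignored: Tuple[Type[BaseException], ...] = ()) -> T:
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except ignored:
            pass  # e.g. element not found yet: keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

With a real driver this would be `wait_until(lambda: self.driver.find_element(By.ID, "..."), ignored=(NoSuchElementException,))`, though WebDriverWait already does exactly this for you.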
Once you have an element, you can read its contents:

z.get_attribute('textContent')
#Text content of the element and its descendants
z.get_attribute('innerHTML')
#All the HTML code inside the element
z.get_attribute('href')
#Link address
z.get_attribute('src')
#Address of the loaded resource, such as an image, script or file
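For data collection you usually read these attributes from many elements at once. A minimal sketch that turns link elements into rows of text and address; `collect_links` is a hypothetical helper that works with anything offering `get_attribute(name)`, so it can be tested with fake elements.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class HasAttributes(Protocol):
    def get_attribute(self, name: str) -> Optional[str]: ...


def collect_links(elements: Iterable[HasAttributes]) -> List[Dict[str, str]]:
    """Return [{"text": ..., "href": ...}] for every element that has an href."""
    rows = []
    for el in elements:
        href = el.get_attribute("href")
        if not href:  # skip anchors without a link
            continue
        text = (el.get_attribute("textContent") or "").strip()
        rows.append({"text": text, "href": href})
    return rows
```

Usage with Selenium: `collect_links(self.driver.find_elements(By.TAG_NAME, "a"))`.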

Next comes simulating mouse and keyboard actions.

First, single operations:

z.click()
z.send_keys("abcd")
z.send_keys(Keys.PAGE_DOWN)

PAGE_DOWN is especially useful for scrolling: many pages only render more content as you scroll, so you have to page down to make it load.
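A common pattern is to keep paging down until no new items appear. A minimal sketch; `scroll_until_stable` is a hypothetical helper that takes two callables, one that presses PAGE_DOWN and one that counts loaded items, so it can be tested without a browser.

```python
import time
from typing import Callable


def scroll_until_stable(press_page_down: Callable[[], None],
                        count_items: Callable[[], int],
                        max_rounds: int = 50,
                        pause: float = 1.0,
                        patience: int = 2) -> int:
    """Page down until the item count stops growing for `patience` rounds."""
    best = count_items()
    unchanged = 0
    for _ in range(max_rounds):
        press_page_down()
        time.sleep(pause)  # give the page time to render new content
        current = count_items()
        if current > best:
            best = current
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= patience:
                break
    return best
```

With Selenium, the callables could be `lambda: body.send_keys(Keys.PAGE_DOWN)` (where `body = self.driver.find_element(By.TAG_NAME, "body")`) and `lambda: len(self.driver.find_elements(By.CSS_SELECTOR, ".item"))`, where `.item` stands for whatever selector matches the items on your page.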

Chaining multiple actions:

source_element = self.driver.find_element(By.LINK_TEXT,"Courses")
target_element = self.driver.find_element(By.LINK_TEXT,"Hard")
action = ActionChains(self.driver)
#Other actions take an element or keys, for example:
#action.move_to_element(self.driver.find_element(By.CLASS_NAME,"xxx"))
#action.send_keys("abc")
#action.click()
action.drag_and_drop(source_element, target_element)
action.perform()

In other words, you queue several actions on the ActionChains object, and perform() executes them in order.
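The queue-then-perform idea is easy to see in a toy model. This sketch is not Selenium's implementation; `ActionQueue` is a hypothetical class whose `add()` returns `self`, the same chaining style ActionChains uses (for example `ActionChains(self.driver).move_to_element(el).click().perform()`).

```python
from typing import Callable, List


class ActionQueue:
    """Toy model: nothing runs until perform() is called."""

    def __init__(self) -> None:
        self._steps: List[Callable[[], None]] = []

    def add(self, step: Callable[[], None]) -> "ActionQueue":
        self._steps.append(step)
        return self  # returning self allows chained calls

    def perform(self) -> None:
        for step in self._steps:  # run the queued steps in insertion order
            step()


log: List[str] = []
queue = ActionQueue().add(lambda: log.append("press")).add(lambda: log.append("move"))
queue.add(lambda: log.append("release"))
print(log)      # []
queue.perform()
print(log)      # ['press', 'move', 'release']
```

Queuing first lets the driver send a whole gesture such as press, move and release as one sequence, which is what drag and drop needs.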

This post is already long, so delayed execution (waits), switching tabs and iframes will be covered next time.

Keywords: Python Selenium chrome

Added by wedge00 on Thu, 17 Feb 2022 04:07:35 +0200