Python 3 Web Crawler in Practice, Part 45: Recognizing the Weibo Grid Verification Code

In this section, we look at how to recognize the grid verification code used by Sina Weibo. This is a newer, interactive kind of verification code: the cells of the grid are joined by indicator lines that show the path we should slide along. To pass verification, we must drag along that path from the starting cell to the final cell, as shown in Figure 8-24:

Figure 8-24 Verification Code Example

The path traced by the mouse as it slides is marked with a line, as shown in Figure 8-25:

Figure 8-25 Sliding Process

We can see this verification code on the mobile version of the Sina Weibo login page: https://passport.weibo.cn/signin/login. It does not appear every time; it usually shows up when an account logs in frequently or is flagged as a security risk.

Next, we'll try to recognize this kind of verification code.

1. Goal of This Section

Our goal in this section is to recognize the Weibo grid verification code and pass verification programmatically.

2. Preparations

This time we use the Selenium library for Python with the Chrome browser. Before starting, make sure that Selenium, Chrome, and ChromeDriver are installed correctly. Chapter 1 describes the installation process.

3. Recognition Approach

Recognition starts with looking for patterns. The first pattern we find is that all four cells of the verification code are always connected, and every connecting line carries an arrow. The lines come in several shapes, such as C-type, Z-type, and X-type, as shown in Figures 8-26, 8-27, and 8-28:

Figure 8-26 C

Figure 8-27 Z

Figure 8-28 X

We also find that verification codes of the same type share the same path; the only difference is the direction of the connection, as shown in Figures 8-29 and 8-30:

Figure 8-29 Reverse Connection

Figure 8-30 Forward Connection

The connecting paths of these two verification codes are identical, but the arrows on the lines point in different directions, so the cells must be slid through in a different order.

So to fully determine the order in which to slide through the cells, we need to recognize the arrow directions. Looking across all the verification codes, an arrow may point in any of eight directions, and arrows appear at different positions. An algorithm for recognizing arrow direction would have to account for every arrow position: we would need to find the pixel coordinates of the arrow at each position and work out how the pixels change there, which is a considerable amount of work.

Instead, we can consider template matching. Template matching means saving some recognition targets in advance and labeling them; these are called templates. Here, we capture verification code images and label each with its drag order to serve as a template. To recognize a new target, we compare it against each template in turn. If a template matches, the target is the same as that template, and so it has been recognized. Template matching is a common technique in image recognition; it is simple to implement and easy to use.
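The lookup idea can be sketched in a few lines. This is only a toy illustration, not the code used later in this section: the function name match_template, the is_match parameter, and the tiny 2 x 2 grayscale "images" are all hypothetical.

```python
def match_template(target, templates, is_match):
    """Return the label of the first template that matches target, or None.

    templates maps a label (here, a drag order) to a stored sample.
    is_match is any function that decides whether two samples agree.
    """
    for label, sample in templates.items():
        if is_match(target, sample):
            return label
    return None


# Toy data: each "image" is a tuple of rows of grayscale values.
templates = {
    '1234': ((0, 255), (255, 0)),
    '4321': ((255, 0), (0, 255)),
}
target = ((255, 0), (0, 255))
print(match_template(target, templates, lambda a, b: a == b))  # 4321
```

Keeping the comparison function separate from the lookup means the same loop works whether the comparison is exact equality, as here, or a tolerance-based pixel comparison.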

For template matching to work well, we must collect enough templates. The Weibo grid verification code has four cells, so there are at most 4 × 3 × 2 × 1 = 24 patterns, and we can simply collect all of them as templates.
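The figure of 24 is simply the number of orderings of four cells. As a quick sanity check, the following sketch enumerates them with the standard library; the helper name all_orders is hypothetical and is used only for illustration.

```python
from itertools import permutations


def all_orders(cells=(1, 2, 3, 4)):
    """Return every possible sliding order as a string such as '4132'."""
    return [''.join(str(c) for c in order) for order in permutations(cells)]


orders = all_orders()
print(len(orders))  # 24
```

Each string in the list corresponds to one possible drag order, so a complete template set needs one image per string.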

The next question is what to match: only the arrows, or the entire verification code image? Let's weigh the two approaches in terms of matching accuracy and workload.

  • Accuracy first. If we match arrows, the region being compared is only the few pixels that an arrow occupies, and we must know exactly where those pixels are. If the position is even slightly off, the template is misaligned and the match quality drops sharply. If we match the whole image, we do not need to care where the arrows are, and the connecting lines also help the match. Whole-image matching is therefore clearly more accurate.
  • Workload second. If we match arrows, we must save arrow templates for every direction, but arrows at the same position may point in different directions and arrows pointing in the same direction may sit at different positions. We would have to compute each arrow's position, crop the arrows out one by one to save as templates, and then, when matching, look for a matching template at the corresponding position in the verification code. If we match the whole image, we do not need to care about the position or direction of any arrow: we only save the whole verification code image, and no arrow positions need to be computed when matching. Whole-image matching is therefore clearly less work.

In summary, we choose whole-image matching for recognition.

We can therefore use template matching to recognize the grid verification code. Once a matching template is found, we look up the drag order defined for that template in advance and then simulate the drag.

4. Getting Templates

Before we start, we need to do some preparation: saving full images of all 24 verification codes. Do we have to do this by hand? Of course not. The verification code shown is random and there are 24 kinds in total, so we can write a program that saves verification code images in bulk and then pick out the ones we need. The code is as follows:

import time
from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Fill in your own Weibo username and password
USERNAME = ''
PASSWORD = ''


class CrackWeiboSlide():
    def __init__(self):
        self.url = 'https://passport.weibo.cn/signin/login'
        self.browser = webdriver.Chrome()
        self.wait = WebDriverWait(self.browser, 20)
        self.username = USERNAME
        self.password = PASSWORD

    def __del__(self):
        self.browser.close()

    def open(self):
        """
        Open the login page, enter the username and password, and click the login button
        :return: None
        """
        self.browser.get(self.url)
        username = self.wait.until(EC.presence_of_element_located((By.ID, 'loginName')))
        password = self.wait.until(EC.presence_of_element_located((By.ID, 'loginPassword')))
        submit = self.wait.until(EC.element_to_be_clickable((By.ID, 'loginAction')))
        username.send_keys(self.username)
        password.send_keys(self.password)
        submit.click()

    def get_position(self):
        """
        Get the position of the verification code
        :return: Verification code position tuple (top, bottom, left, right)
        """
        while True:
            try:
                img = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'patt-shadow')))
                break
            except TimeoutException:
                # No verification code appeared, so log in again and retry
                print('No verification code appeared')
                self.open()
        time.sleep(2)
        location = img.location
        size = img.size
        top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size['width']
        return (top, bottom, left, right)

    def get_screenshot(self):
        """
        Get a screenshot of the web page
        :return: Screenshot object
        """
        screenshot = self.browser.get_screenshot_as_png()
        screenshot = Image.open(BytesIO(screenshot))
        return screenshot

    def get_image(self, name='captcha.png'):
        """
        Get the verification code image
        :param name: File name to save the image under
        :return: Image object
        """
        top, bottom, left, right = self.get_position()
        print('Verification code position', top, bottom, left, right)
        screenshot = self.get_screenshot()
        # Crop the verification code out of the full-page screenshot
        captcha = screenshot.crop((left, top, right, bottom))
        captcha.save(name)
        return captcha

    def main(self):
        """
        Save verification codes in bulk (runs until interrupted)
        :return: None
        """
        count = 0
        while True:
            self.open()
            self.get_image(str(count) + '.png')
            count += 1


if __name__ == '__main__':
    crack = CrackWeiboSlide()
    crack.main()

Here, USERNAME and PASSWORD need to be changed to your own username and password. After the program has run for a while, many verification code images with numeric file names appear in the local directory, as shown in Figure 8-31:

Figure 8-31 Acquisition Results

We only need to pick out the 24 distinct verification code images, then rename and save them. Each file name can simply be the sliding order of the cells. For example, consider the verification code image shown in Figure 8-32:

Figure 8-32 Verification Code Example

We can name it 4132.png, which means that the sliding order is 4-1-3-2. Following this rule, we organize the verification codes into the following 24 images, as shown in Figure 8-33:

Figure 8-33

The 24 images above are our templates. Next, we only need to iterate over the templates to find a match.

5. Template Matching

The code above saves verification codes: calling the get_image() method returns the verification code image object. Once we have that object, we need to match it against the templates. The following method is defined for matching:

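from os import listdir
from os.path import join

# Folder that holds the 24 template images ('templates/' is an assumed example path)
TEMPLATES_FOLDER = 'templates/'

def detect_image(self, image):
    """
    Match the verification code against the templates
    :param image: Verification code image
    :return: Drag order, or None if no template matches
    """
    for template_name in listdir(TEMPLATES_FOLDER):
        print('Matching', template_name)
        template = Image.open(join(TEMPLATES_FOLDER, template_name))
        if self.same_image(image, template):
            # The file name encodes the drag order, e.g. 3124.png -> [3, 1, 2, 4]
            numbers = [int(number) for number in list(template_name.split('.')[0])]
            print('Drag order', numbers)
            return numbers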

Here TEMPLATES_FOLDER is the folder that holds the templates. We use the listdir() method to get the file names of all the templates and iterate over them, comparing the verification code with each template using the same_image() method. If a match succeeds, the file name of the matched template is converted to a list; for example, a match on 3124.png returns [3, 1, 2, 4].
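The file-name conversion can be tried on its own. The sketch below is a slightly more defensive variant, written on the assumption that the templates folder might also contain stray files (hidden system files, for instance); the helper name order_from_filename is hypothetical.

```python
import os


def order_from_filename(template_name):
    """Turn a template file name such as '3124.png' into [3, 1, 2, 4]."""
    stem, _extension = os.path.splitext(os.path.basename(template_name))
    if not stem.isdigit():
        # Not a template named after its drag order
        raise ValueError('unexpected template name: ' + template_name)
    return [int(digit) for digit in stem]


print(order_from_filename('3124.png'))  # [3, 1, 2, 4]
```

Raising an error for unexpected names makes a misplaced file visible immediately instead of failing later inside the drag logic.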

The method of comparison is as follows:

def is_pixel_equal(self, image1, image2, x, y):
    """
    Determine whether two pixels are the same
    :param image1: Image 1
    :param image2: Image 2
    :param x: Position x
    :param y: Position y
    :return: True if the pixels are the same, otherwise False
    """
    # Get the pixel at the same position in both images
    pixel1 = image1.load()[x, y]
    pixel2 = image2.load()[x, y]
    # Maximum allowed difference per color channel
    threshold = 20
    return (abs(pixel1[0] - pixel2[0]) < threshold
            and abs(pixel1[1] - pixel2[1]) < threshold
            and abs(pixel1[2] - pixel2[2]) < threshold)

def same_image(self, image, template):
    """
    Determine whether a verification code matches a template
    :param image: Verification code to be recognized
    :param template: Template
    :return: True if the images match, otherwise False
    """
    # Similarity threshold
    threshold = 0.99
    count = 0
    for x in range(image.width):
        for y in range(image.height):
            # Count the positions where the two pixels are the same
            if self.is_pixel_equal(image, template, x, y):
                count += 1
    # Proportion of matching pixels out of all pixels
    result = count / (image.width * image.height)
    if result > threshold:
        print('Successful matching')
        return True
    return False

Here, we again compare images by iterating over their pixels. The same_image() method takes two parameters: image is the verification code to be checked, and template is the template. Because the two images are exactly the same size, we iterate over every pixel and compare the pixels at the same position in the two images. If they are the same, the count is increased by 1. Finally, we compute the proportion of identical pixels out of all pixels. If that proportion exceeds a threshold, the images are judged to be the same and the match succeeds. The threshold is set to 0.99 here, so the match succeeds when the similarity ratio is greater than 0.99.

Matching the 24 templates one after another in this way, we will always find a matching template as long as the verification code image is normal, and so we obtain the sliding order of the cells.
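To see how the two thresholds interact, here is a minimal, dependency-free sketch of the same calculation. It assumes the images are plain nested lists of (r, g, b) tuples rather than PIL images, and the function name similarity and the sample pixel values are made up for illustration.

```python
def similarity(image, template, tolerance=20):
    """Fraction of positions whose RGB channels all differ by less than tolerance.

    Both arguments are equally sized grids (lists of rows) of (r, g, b) tuples.
    """
    total = 0
    same = 0
    for row_a, row_b in zip(image, template):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if all(abs(a - b) < tolerance for a, b in zip(pixel_a, pixel_b)):
                same += 1
    return same / total if total else 0.0


white, black, off_white = (255, 255, 255), (0, 0, 0), (250, 248, 251)
template = [[white, black], [black, white]]
captured = [[off_white, black], [black, white]]   # slight rendering noise
different = [[black, black], [black, white]]      # one pixel really differs

print(similarity(captured, template))   # 1.0
print(similarity(different, template))  # 0.75
```

The per-channel tolerance absorbs small rendering differences between screenshots, while the overall ratio decides whether two images count as the same verification code.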

6. Simulating the Drag

Once we have the sliding order, we drag the mouse to connect the cells in that order. The method is as follows:

from selenium.webdriver import ActionChains

def move(self, numbers):
    """
    Drag through the cells in order
    :param numbers: Drag order, for example [3, 1, 2, 4]
    :return: None
    """
    # Get the four cells
    # (find_elements_by_css_selector was removed in Selenium 4;
    # there, use find_elements(By.CSS_SELECTOR, ...) instead)
    circles = self.browser.find_elements_by_css_selector('.patt-wrap .patt-circ')
    dx = dy = 0
    for index in range(4):
        circle = circles[numbers[index] - 1]
        # First iteration
        if index == 0:
            # Move to the first cell and press the mouse button
            ActionChains(self.browser) \
                .move_to_element_with_offset(circle, circle.size['width'] / 2, circle.size['height'] / 2) \
                .click_and_hold().perform()
        else:
            # Number of small moves
            times = 30
            # Drag to this cell in small steps
            for i in range(times):
                ActionChains(self.browser).move_by_offset(dx / times, dy / times).perform()
                time.sleep(1 / times)
        # Last iteration
        if index == 3:
            # Release the mouse button
            ActionChains(self.browser).release().perform()
        else:
            # Compute the offset to the next cell
            dx = circles[numbers[index + 1] - 1].location['x'] - circle.location['x']
            dy = circles[numbers[index + 1] - 1].location['y'] - circle.location['y']

The method takes the cell numbers in drag order, such as [3, 1, 2, 4]. First, we use the find_elements_by_css_selector() method to get the four cell elements as a list, in which each element represents one cell. Next, we iterate over the cell numbers in order and perform the corresponding operations.

If it is the first cell, we press the mouse button on it and hold it down; otherwise, we move to the cell. If it is the last cell, we release the mouse button; otherwise, we compute the offset to the next cell.
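The offset arithmetic can be checked without a browser. The following sketch computes the small (dx, dy) increments for a given drag order; the function name drag_steps and the 2 x 2 layout with 100-pixel spacing are assumptions chosen only to make the numbers easy to follow.

```python
def drag_steps(positions, order, times=30):
    """Yield (dx, dy) increments that move from cell to cell in the given order.

    positions maps a cell number to its (x, y) location on the page.
    Each hop between two cells is split into `times` equal small moves.
    """
    for current, following in zip(order, order[1:]):
        x0, y0 = positions[current]
        x1, y1 = positions[following]
        dx, dy = x1 - x0, y1 - y0
        for _ in range(times):
            yield dx / times, dy / times


# Hypothetical 2 x 2 layout, 100 pixels between neighbouring cells.
positions = {1: (0, 0), 2: (100, 0), 3: (0, 100), 4: (100, 100)}
steps = list(drag_steps(positions, [3, 1, 2, 4], times=4))
print(len(steps))  # 12
print(steps[0])    # (0.0, -25.0)
```

Splitting each hop into many small moves is what makes the mouse appear to glide between cells rather than jump.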

After four iterations, the browser has completed the drag across the grid verification code, and recognition succeeds once the mouse button is released.

The result is shown in Figure 8-34:

Figure 8-34 Operation Effect

The mouse moves slowly from the starting position to the end position, and recognition of the verification code is complete once the mouse button is released on the last cell.

At this point, recognition of the Weibo grid verification code is complete.

Once recognition is complete, the verification code window closes automatically, and clicking the login button then completes the Weibo login.

7. Concluding Remarks

In this section, we used template matching, a common technique, to recognize the verification code, and simulated mouse dragging to pass verification. If we run into similar verification codes, we can recognize them with the same approach.

Keywords: Python Selenium Mobile

Added by defx on Wed, 07 Aug 2019 18:14:43 +0300