Cracking image CAPTCHAs for free (digits, or mixed Chinese and English) (with code) (2022) (CAPTCHA series, part 2)

Preface


The previous post covered Google's CAPTCHA. This one tackles the comparatively simple image CAPTCHA (digits, or mixed Chinese and English).

Solving the CAPTCHA with a browser plug-in

This approach requires loading the Chrome extension AutoVerify.

It is a simple way to recognize image CAPTCHAs made up of letters and digits.

Once you click the input box, the plug-in finds the CAPTCHA picture on its own and fills in the answer automatically. It needs no browser-level operation (such as the right-click menu), which makes it a natural first choice for ordinary CAPTCHAs.

Of course, this leans heavily on the plug-in. You can also use a public library to recognize the CAPTCHA, which is faster.

For example:

https://github.com/sml2h3/ddddocr

https://github.com/madmaze/pytesseract

Recognition accuracy varies from case to case.
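
As a rough sketch of the library route, the snippet below wraps ddddocr's `DdddOcr().classification()` call and tidies the raw output. The helper names `recognize_with_ddddocr` and `normalize_captcha_text` are made up for this illustration, and the clean-up step assumes the CAPTCHA contains only ASCII letters and digits, so it would not suit a Chinese CAPTCHA.

```python
import re


def normalize_captcha_text(raw: str) -> str:
    """Drop whitespace and punctuation that OCR engines often add around the answer.

    Assumption: the CAPTCHA contains only ASCII letters and digits.
    """
    return re.sub(r"[^0-9A-Za-z]", "", raw)


def recognize_with_ddddocr(image_path: str) -> str:
    """Recognize a local CAPTCHA image with ddddocr (pip install ddddocr)."""
    import ddddocr  # imported lazily so the helper above works without the library

    ocr = ddddocr.DdddOcr()
    with open(image_path, "rb") as image_file:
        raw = ocr.classification(image_file.read())
    return normalize_captcha_text(raw)
```

Calling `recognize_with_ddddocr("origin.png")` on a saved CAPTCHA screenshot would return the cleaned-up guess; whether the guess is right still depends on the CAPTCHA style.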

Back to the plug-in route: the code below shows how to load it.

import os
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.chrome.service import Service

# AutoVerify.crx must be in the directory the script is run from
def input_dependence():  # Load the Chrome extension and initialize the environment
    global driver
    # Start the browser; it runs with a visible window by default
    opt = ChromeOptions()
    path_e = os.path.join(os.getcwd(), "AutoVerify.crx")
    opt.add_extension(path_e)
    opt.add_argument("window-size=1920,1080")
    # opt.add_experimental_option('prefs', prefs)  # Turn off the notification prompt in the upper left corner of the browser
    # opt.add_argument("disable-infobars")  # Hide the 'Chrome is being controlled by automated test software' banner
    opt.add_argument('--no-sandbox')
    # 'enable-automation' is excluded with the aim of making the webdriver property look normal,
    # and 'enable-logging' to cut console log noise. Both go in one list, because a second
    # add_experimental_option call with the same key overwrites the first.
    opt.add_experimental_option('excludeSwitches', ['enable-automation', 'enable-logging'])
    # opt.add_experimental_option('useAutomationExtension', False)
    ser = Service("chromedriver")
    driver = Chrome(service=ser, options=opt)
    driver.set_page_load_timeout(300)
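
Since the plug-in fills in the answer by itself after the input box is clicked, the script has to wait until the field actually holds a value before submitting the form. The following is only a sketch of that waiting step: `wait_for_value` is a hypothetical helper that polls any callable, and the selector in the trailing comment is a placeholder, not something taken from a real page.

```python
import time
from typing import Callable, Optional


def wait_for_value(get_value: Callable[[], Optional[str]],
                   timeout: float = 20.0, interval: float = 0.5) -> str:
    """Poll get_value() until it returns a non-empty string or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        value = get_value()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("the CAPTCHA field was not filled in time")
        time.sleep(interval)


# With Selenium, the callable could read the input box (placeholder selector):
# code = wait_for_value(
#     lambda: driver.find_element(By.CSS_SELECTOR, "input#captcha").get_attribute("value"))
```

Polling a plain callable keeps the helper independent of Selenium, so the same loop works with whatever way you read the field.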

Solving CAPTCHAs with free APIs

Here are some common free API resources.

TrueCaptcha (outside China)

100 free requests a day, about 3000 a month. Registration requires a Gmail address. It has the largest monthly free quota of the services listed here and suits environments outside China and everyday use.

1. TrueCaptcha

Registration address:

https://apitruecaptcha.org/

Find your account credentials (userid and apikey) here:

https://apitruecaptcha.org/api

Submitting the CAPTCHA as a local image file (png or jpg):

import requests
import base64
import json

def solve(f):  # f is the path of the image file. Save the CAPTCHA as a local picture first; see main() in the Tencent Cloud section below for how to do that
    with open(f, "rb") as image_file:
        # b64encode returns bytes; decode them so the value can be sent as JSON text
        encoded_string = base64.b64encode(image_file.read()).decode("ascii")
    url = 'https://api.apitruecaptcha.org/one/gettext'

    data = {'userid': 'your userid here', 'apikey': 'your apikey here', 'data': encoded_string}
    r = requests.post(url=url, json=data)
    return r.json()
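
To make the request body that solve() sends easier to inspect on its own, here is a small sketch that builds it from raw image bytes. The function name `build_truecaptcha_payload` is invented for this example; the field names `userid`, `apikey` and `data` are the ones used in the code above.

```python
import base64


def build_truecaptcha_payload(image_bytes: bytes, userid: str, apikey: str) -> dict:
    """Build the JSON body: credentials plus the image as base64 text."""
    return {
        "userid": userid,
        "apikey": apikey,
        # b64encode returns bytes; decode to str so the value is JSON-serializable
        "data": base64.b64encode(image_bytes).decode("ascii"),
    }
```

Separating the payload from the HTTP call also makes it possible to check the encoding without spending any of the daily quota.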

Alternatively, fetch the picture from its URL and base64-encode it. (The Selenium approach above is more intuitive and is the one I recommend. The version below gets the image URL by parsing the page directly with Beautiful Soup, which requires studying the page source closely.)

import os

TRUECAPTCHA_USERID = os.environ.get("TRUECAPTCHA_USERID", "your userid here")
TRUECAPTCHA_APIKEY = os.environ.get("TRUECAPTCHA_APIKEY", "your apikey here")

def captcha_solver(captcha_image_url: str, session: requests.Session) -> dict:
    """
    TrueCaptcha API doc: https://apitruecaptcha.org/api
    Free to use 100 requests per day.
    """
    # Download the picture with the same session that loaded the page
    response = session.get(captcha_image_url)
    encoded_string = base64.b64encode(response.content).decode("ascii")
    url = "https://api.apitruecaptcha.org/one/gettext"

    data = {
        "userid": TRUECAPTCHA_USERID,
        "apikey": TRUECAPTCHA_APIKEY,
        # case sensitivity of text (upper | lower | mixed)
        "case": "lower",
        # use human or AI (human | default)
        "mode": "default",
        "data": encoded_string,
    }
    r = requests.post(url=url, json=data)
    j = json.loads(r.text)
    return j


import re


def handle_captcha_solved_result(solved: dict) -> str:
    """Return the CAPTCHA answer as a string.

    The CAPTCHA is sometimes a very simple binary arithmetic expression, and
    the recognizer may return the expression itself instead of its value,
    so this function evaluates it when needed.
    """
    if "result" not in solved:
        print(solved)
        raise KeyError("Failed to find parsed results.")

    solved_text = solved["result"]
    if "RESULT  IS" in solved_text:
        # Response format returned when using the demo apikey
        print("[Captcha Solver] You are using the demo apikey.")
        print("There is no guarantee that demo apikey will work in the future!")
        text = re.findall(r"RESULT  IS . (.*) .", solved_text)[0]
    else:
        print("[Captcha Solver] You are using your own apikey.")
        text = solved_text

    # These symbols do not appear together, so the first one found decides.
    for symbol in ["X", "x", "+", "-"]:
        operator_pos = text.find(symbol)
        if operator_pos == -1:
            continue
        left_part = text[:operator_pos]
        right_part = text[operator_pos + 1:]
        if left_part.isdigit() and right_part.isdigit():
            left, right = int(left_part), int(right_part)
            if symbol in ("X", "x"):
                return str(left * right)
            return str(left + right if symbol == "+" else left - right)
        # The symbol is part of ordinary text, not an arithmetic expression
        return text
    return text


def get_captcha_solver_usage() -> dict:
    url = "https://api.apitruecaptcha.org/one/getusage"

    params = {
        "username": TRUECAPTCHA_USERID,
        "apikey": TRUECAPTCHA_APIKEY,
    }
    r = requests.get(url=url, params=params)
    j = json.loads(r.text)
    return j
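
captcha_solver() expects the URL of the CAPTCHA picture, which first has to be dug out of the login page. The sketch below shows one way to do that step. To stay dependency-free it uses the standard library's `html.parser` in place of Beautiful Soup, and it assumes the picture is an `<img>` tag with a known `id`; the id `captcha` and the URLs in the usage lines are invented for the example.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _ImgSrcFinder(HTMLParser):
    """Remember the src of the first <img> whose id matches."""

    def __init__(self, img_id: str):
        super().__init__()
        self.img_id = img_id
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("id") == self.img_id and self.src is None:
            self.src = attributes.get("src")


def find_captcha_url(page_html: str, page_url: str, img_id: str) -> Optional[str]:
    """Return the absolute URL of the CAPTCHA image, or None if it is not found."""
    finder = _ImgSrcFinder(img_id)
    finder.feed(page_html)
    if finder.src is None:
        return None
    # The src is often relative, so resolve it against the page URL
    return urljoin(page_url, finder.src)


# Hypothetical page fragment and URL, for illustration only
html = '<form><img id="captcha" src="/captcha.php?t=1"><input name="code"></form>'
print(find_captcha_url(html, "https://example.com/login", "captcha"))
```

On a real site the attribute to match on may be a class or a name instead of an id, which is why the page source has to be studied first.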

Tencent Cloud (China; the account requires real-name verification)

A free OCR service with 1000 free calls per month for each interface. CAPTCHA solving generally uses the General Print OCR or General Print OCR (high precision) interface, which adds up to 2000 free calls per month. The monthly free quota is average, but it is the most versatile option and also handles specific cases well. This is the one I mainly use.

There is a fair amount to prepare. Here are the preliminary steps one by one.

A Tencent Cloud account.

Click Register or Log in in the upper right corner (no payment needed; we only use the free tier, so having an account is enough).

An access key pair for the Tencent Cloud account: https://console.cloud.tencent.com/cam/capi

After creating a new key, note down its SecretId and SecretKey for later use.

The OCR service activated on the Tencent Cloud account: https://console.cloud.tencent.com/ocr/

Remember to claim the free quota after activating the service.

Install the Tencent Cloud SDK in your local Python environment:

pip install tencentcloud-sdk-python

All right, the preparations are done. On to the code!

from selenium.webdriver import ChromeOptions, Chrome
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
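import base64
import json
import os
import random
import time

# Fill in the key pair created in the Tencent Cloud console
SecretId = "your SecretId here"
SecretKey = "your SecretKey here"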

def pass_ocr(src):
    from tencentcloud.ocr.v20181119 import ocr_client, models
    try:
        cred = credential.Credential(SecretId, SecretKey)
        httpProfile = HttpProfile()
        httpProfile.endpoint = "ocr.tencentcloudapi.com"

        clientProfile = ClientProfile()
        clientProfile.httpProfile = httpProfile
        client = ocr_client.OcrClient(cred, "na-toronto", clientProfile)

        req = models.GeneralAccurateOCRRequest()
        params = {
            "ImageBase64": src,
            "IsPdf": False
        }
        req.from_json_string(json.dumps(params))

        resp = client.GeneralAccurateOCR(req)
        # print(resp.to_json_string())
        # return resp.to_json_string()
        result = resp.to_json_string()
        # Collect the recognized fragments, splitting on the spaces the OCR puts between characters
        temp = []
        for i in json.loads(result)["TextDetections"]:
            temp.extend(i["DetectedText"].split(" "))
        print(temp)
        cct = "".join(temp)  # cct is the recognized CAPTCHA text
        return cct

    except TencentCloudSDKException as err:
        print(err)


def main():
    driver.switch_to.default_content()  # Make sure we are in the top-level document
    # Replace the placeholder with the CSS selector of the CAPTCHA picture
    element = WebDriverWait(driver, 20, 0.5).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, 'CSS selector of the CAPTCHA picture')))
    try:
        os.remove("origin.png")  # Delete any picture left over from a previous run
    except FileNotFoundError:
        pass
    element.screenshot("origin.png")  # Screenshot the CAPTCHA element and save it as origin.png
    with open(os.path.join(os.getcwd(), "origin.png"), 'rb') as f:  # Read the saved picture
        code_data = base64.b64encode(f.read()).decode('utf-8')  # Base64 text that the API can read
    result = pass_ocr(code_data)  # Recognize the CAPTCHA
    time.sleep(random.uniform(1, 3))
    # Replace the placeholder with the CSS selector of the CAPTCHA input box
    input_box = WebDriverWait(driver, 20, 0.5).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, 'CSS selector of the CAPTCHA input box')))
    input_box.send_keys(result)
    print("Successfully converted CAPTCHA PNG to str:\n{}".format(result))
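
The result handling inside pass_ocr() can be tried without calling the API at all. The sketch below does the same joining on a hand-written JSON string; `join_detected_text` is a name made up for the example, the sample data is invented, and only the field names `TextDetections` and `DetectedText` come from the code above. Unlike pass_ocr(), it drops every kind of whitespace, not only single spaces.

```python
import json


def join_detected_text(response_json: str) -> str:
    """Concatenate every DetectedText fragment, dropping whitespace between characters."""
    detections = json.loads(response_json).get("TextDetections", [])
    return "".join("".join(item["DetectedText"].split()) for item in detections)


# Hand-written sample using the same field names that pass_ocr() reads
sample = json.dumps({"TextDetections": [{"DetectedText": "a 7"}, {"DetectedText": "Kq"}]})
print(join_detected_text(sample))
```

Testing the parsing offline like this avoids using up the monthly free quota while debugging.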

Baidu Intelligent Cloud (China; the account requires real-name verification)

CAPTCHA solving here generally uses the digit recognition API or whichever API matches the CAPTCHA type. Each API has a free quota of 1000 calls a month. The service is strong on specific cases, average in versatility, and average in free quota.

Here is the link (register a Baidu Intelligent Cloud account and complete real-name verification beforehand):

https://cloud.baidu.com/product/ocr_general

Remember to claim the free quota after activating the service.

import urllib.request
import re
import base64
import requests

# This code is borrowed from someone else; please replace the access key below with your own. Thank you
# Original warehouse: https://github.com/zqtz/verifycode/blob/master/%E5%9B%BE%E5%83%8F%E9%AA%8C%E8%AF%81%E7%A0%81/baidu_api.py
host = "https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=oa6VVGS7ldI5GG1e3fHrgvB6&client_secret=xdaZFWKnqt2Hsxvnpd2GDo2QNpfGrHLQ&"
response = requests.get(host)
response.raise_for_status()  # Stop here if the token request failed
access_token = response.json()["access_token"]

# General text recognition (high-precision version)
request_url = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"
# Open the picture file fetch.jpg in binary mode
with open('fetch.jpg', 'rb') as f:  # For how to capture the CAPTCHA picture, see main() in the Tencent Cloud section above
    img = base64.b64encode(f.read())
params = {"image": img}
request_url = request_url + "?access_token=" + access_token
headers = {'content-type': 'application/x-www-form-urlencoded'}
response = requests.post(request_url, data=params, headers=headers)
if response:
    print(response.json()['words_result'][0]['words'])
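
Indexing `words_result[0]` directly fails when nothing is recognized or when the API sends back an error instead of a result. A slightly more defensive way to read the response might look like the sketch below. `extract_words` is a hypothetical helper, and the `error_code` / `error_msg` field names are my understanding of Baidu's error format, so check them against the official documentation before relying on them.

```python
def extract_words(payload: dict) -> str:
    """Join all recognized lines, or raise if the API reported an error."""
    if "error_code" in payload:
        # Assumed error format: {"error_code": ..., "error_msg": ...}
        raise RuntimeError(
            "OCR failed: {} {}".format(payload["error_code"], payload.get("error_msg", "")))
    return "".join(line["words"] for line in payload.get("words_result", []))
```

With the code above it would be called as `extract_words(response.json())`; an empty string then means the picture was read but no text was found.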

Self-hosted solving: the success rate is down to luck.

Project address (I haven't tried it; test it yourself if you have time to spare):

https://github.com/smxiazi/NEW_xp_CAPTCHA

Afterword

Personally, I recommend TrueCaptcha for environments outside China, and Tencent Cloud or Baidu Intelligent Cloud for use within China.

At present I only use TrueCaptcha and Tencent Cloud for CAPTCHA solving.

If you are willing to pay, there is Chaojiying (Super Eagle). Being broke, I have no money for paid services.

Even my servers are mostly free ones... The cloud console for my blog lists more free machines than paid ones... I really am that broke.

Keywords: Front-end Selenium Python crawler chrome

Added by Impius on Sun, 06 Feb 2022 10:06:12 +0200