Hands-on: Integrating an IP Proxy into Scrapy (Using Abu Cloud / Abuyun as an Example)

1. Preface

One of my projects needed to crawl the Securities Association website, which blocks crawler IPs. So I needed automatic IP switching in Scrapy to complete the crawl.

Before this, I tried the third-party library scrapy-proxies together with the proxy API of Sesame IP, but I could not get it to work, possibly because my code was not set up properly. (I will test it again when I get the chance.)

2. Abu Cloud sample code

Abu Cloud officially provides sample code for both plain Python and Scrapy.

Python 3 example

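    from urllib import request

    # Target page to visit (URL left blank in the original)
    targetUrl = ""

    # Proxy server (host left blank in the original)
    proxyHost = ""
    proxyPort = "9020"

    # Proxy tunnel authentication credentials
    proxyUser = "H01234567890123D"
    proxyPass = "0123456789012345"

    # Proxy URL of the form http://user:pass@host:port
    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Send both HTTP and HTTPS traffic through the proxy tunnel
    proxy_handler = request.ProxyHandler({
        "http": proxyMeta,
        "https": proxyMeta,
    })

    # Alternative with an explicit Basic-auth handler (not used here):
    # auth = request.HTTPBasicAuthHandler()
    # opener = request.build_opener(proxy_handler, auth, request.HTTPHandler)

    opener = request.build_opener(proxy_handler)

    # Open the page with the proxy-aware opener; plain request.urlopen() would
    # bypass the proxy unless request.install_opener(opener) were called first
    resp = opener.open(targetUrl).read()

    print(resp)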
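The sample pastes the credentials straight into the proxy URL, which would break if a password ever contained a character such as @ or :. A minimal sketch, assuming you want to keep the plain urllib approach: build_proxy_url and build_proxy_opener are hypothetical helper names, not part of Abu Cloud's sample. urllib's ProxyHandler percent-decodes the user name and password it finds in the proxy URL, so encoding them with quote() is safe.

```python
from urllib import request
from urllib.parse import quote


def build_proxy_url(host, port, user, password):
    """Return an http://user:pass@host:port proxy URL (hypothetical helper).

    The credentials are percent-encoded so that characters such as '@' or ':'
    in a password cannot break the URL structure.
    """
    return "http://%s:%s@%s:%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
    )


def build_proxy_opener(proxy_url):
    """Return an opener that sends both HTTP and HTTPS traffic via proxy_url."""
    handler = request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return request.build_opener(handler)
```

Usage would look like opener = build_proxy_opener(build_proxy_url(proxyHost, proxyPort, proxyUser, proxyPass)) followed by opener.open(targetUrl).read().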

The above is the plain-Python version; below is the version written as a Scrapy middleware.

Scrapy middleware example

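    import base64

    # Proxy server (address left blank in the original)
    proxyServer = ""

    # Proxy tunnel authentication credentials
    proxyUser = "H01234567890123D"
    proxyPass = "0123456789012345"

    # for Python 3 (standard Base64, as HTTP Basic authentication expects)
    proxyAuth = "Basic " + base64.b64encode((proxyUser + ":" + proxyPass).encode("ascii")).decode("utf8")

    # for Python 2
    # proxyAuth = "Basic " + base64.b64encode(proxyUser + ":" + proxyPass)

    class ProxyMiddleware(object):
        """Route every request through the proxy tunnel."""
        def process_request(self, request, spider):
            # Scrapy sends the request through the proxy given in request.meta["proxy"]
            request.meta["proxy"] = proxyServer
            # Authenticate against the proxy tunnel
            request.headers["Proxy-Authorization"] = proxyAuth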
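The Proxy-Authorization value is simply the HTTP Basic scheme: the word Basic, a space, and the Base64 encoding of user:password. The sketch below (hypothetical helper names, Python 3 only) builds that value and decodes it again, which can be handy for checking what the middleware actually sends. It uses the standard Base64 alphabet (base64.b64encode) rather than base64.urlsafe_b64encode, which swaps + and / for - and _ whenever those characters appear in the output, something a Basic-auth decoder does not expect.

```python
import base64


def proxy_auth_header(user, password):
    """Build a Proxy-Authorization value using the HTTP Basic scheme."""
    token = f"{user}:{password}".encode("utf-8")
    # Standard Base64 alphabet, as used by HTTP Basic authentication
    return "Basic " + base64.b64encode(token).decode("ascii")


def decode_proxy_auth_header(value):
    """Reverse proxy_auth_header(): return the (user, password) pair."""
    scheme, _, encoded = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    # Split on the first ':' only; the user name itself must not contain ':'
    user, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    return user, password
```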

You can put this code into the middlewares module of your Scrapy project.

3. Formal integration

Add a new class to the project's middlewares.py (the Securities.middlewares module):

    import base64

    # Abu Cloud IP proxy configuration, including the account credentials
    proxyServer = ""
    proxyUser = "HWFHQ5YP14Lxxx"
    proxyPass = "CB8D0AD56EAxxx"
    # for Python 3 (standard Base64, as HTTP Basic authentication expects)
    proxyAuth = "Basic " + base64.b64encode((proxyUser + ":" + proxyPass).encode("ascii")).decode("utf8")

    class ABProxyMiddleware(object):
        """ Abu Cloud IP proxy middleware """
        def process_request(self, request, spider):
            request.meta["proxy"] = proxyServer
            request.headers["Proxy-Authorization"] = proxyAuth
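Hard-coding the account in middlewares.py means the credentials end up in source control. One alternative, sketched here under the assumption that you add three custom entries to settings.py, is to read them through Scrapy's from_crawler hook. The setting names ABUYUN_PROXY_SERVER, ABUYUN_PROXY_USER and ABUYUN_PROXY_PASS are hypothetical, not built-in Scrapy settings.

```python
import base64


class ABProxyMiddleware:
    """Route every request through the Abu Cloud tunnel (sketch).

    Credentials come from the project settings instead of module-level
    constants. The ABUYUN_* setting names are hypothetical custom settings.
    """

    def __init__(self, proxy_server, proxy_user, proxy_pass):
        self.proxy_server = proxy_server
        token = f"{proxy_user}:{proxy_pass}".encode("ascii")
        # Standard (not URL-safe) Base64, as HTTP Basic authentication expects
        self.proxy_auth = "Basic " + base64.b64encode(token).decode("ascii")

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the middleware; read our custom settings
        settings = crawler.settings
        return cls(
            settings.get("ABUYUN_PROXY_SERVER"),
            settings.get("ABUYUN_PROXY_USER"),
            settings.get("ABUYUN_PROXY_PASS"),
        )

    def process_request(self, request, spider):
        request.meta["proxy"] = self.proxy_server
        request.headers["Proxy-Authorization"] = self.proxy_auth
        # Returning None lets Scrapy continue processing the request normally
        return None
```

With this version, settings.py would hold entries such as ABUYUN_PROXY_USER = "..." next to the existing DOWNLOADER_MIDDLEWARES entry, and the middleware path stays Securities.middlewares.ABProxyMiddleware.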

Then enable the middleware in the project settings through the DOWNLOADER_MIDDLEWARES setting:

    DOWNLOADER_MIDDLEWARES = {
        # 'Securities.middlewares.SecuritiesDownloaderMiddleware': None,
        'Securities.middlewares.ABProxyMiddleware': 1,
    }

4. Precautions

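By default, Abu Cloud's dynamic IP plan allows 5 requests per second (you can pay for a higher limit). With the default limit of 5, the crawler has to be throttled, so add the following to the project settings:

    # Enable throttling
    AUTOTHROTTLE_ENABLED = True        # AUTOTHROTTLE_START_DELAY only applies when AutoThrottle is on
    AUTOTHROTTLE_START_DELAY = 0.2     # Initial download delay (seconds)
    DOWNLOAD_DELAY = 0.2               # Minimum time between requests (seconds)
    RANDOMIZE_DOWNLOAD_DELAY = False   # Keep a fixed 0.2 s gap instead of a random 0.1 to 0.3 s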
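Where 0.2 comes from, and why delay randomization matters, can be checked with a little arithmetic. The sketch below is plain Python, not Scrapy code, and the function names are hypothetical. A quota of 5 requests per second means a gap of at least 1/5 = 0.2 s. Scrapy's RANDOMIZE_DOWNLOAD_DELAY, which is on by default, waits a random 0.5 to 1.5 times DOWNLOAD_DELAY, so with DOWNLOAD_DELAY = 0.2 individual gaps can shrink to 0.1 s. This models a single download slot (one target domain); requests to several domains each get their own slot, and together they all count against the same proxy quota.

```python
def delay_for_rate(max_requests_per_second):
    """Smallest fixed gap (seconds) between requests that stays within the quota."""
    return 1.0 / max_requests_per_second


def effective_rate_bounds(download_delay, randomize=True):
    """Return (slowest, fastest) momentary requests/second for one download slot.

    With randomize=True the gap is assumed to lie between 0.5 * download_delay
    and 1.5 * download_delay, mirroring Scrapy's RANDOMIZE_DOWNLOAD_DELAY.
    """
    if randomize:
        shortest, longest = 0.5 * download_delay, 1.5 * download_delay
    else:
        shortest = longest = download_delay
    return 1.0 / longest, 1.0 / shortest
```

For example, effective_rate_bounds(0.2) gives roughly (3.33, 10.0), while effective_rate_bounds(0.2, randomize=False) stays at 5 requests per second.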

Of course, if you pay for more requests per second, you can loosen this throttling or drop it altogether.

To sum up, that completes the integration of Abu Cloud's dynamic proxy IP into Scrapy. Now crawl to your heart's content!

