TikTok scraping: automatic video capture with mitmproxy and Appium

This post records how to crawl app data with Python, using the TikTok app as an example.

Programming tool: PyCharm
App packet-capture tool: mitmproxy
App automation tool: Appium
Running environment: Windows 10

Idea (assuming all of the tools we need are already configured):
1. Use mitmproxy to capture the phone app's packets and get the content we want.
2. Use Appium, an automated testing tool, to drive the app and simulate human actions (swiping, tapping, etc.).
3. Combine 1 and 2 to build an automatic crawler.

1. Packet capture with mitmproxy/mitmdump

Make sure that mitmproxy is installed, that the phone and the PC are on the same LAN, and that mitmproxy's CA certificate is installed on the phone. There are many configuration tutorials online, so I will skip those steps here. Because mitmproxy's console interface did not support Windows, this post uses mitmdump, one of mitmproxy's components, instead. mitmdump is mitmproxy's command-line interface: it can load our Python script, so the captured traffic can be processed in Python. After setting the phone's proxy to the PC running mitmproxy, enter mitmdump in the console and open the TikTok app on the phone; mitmdump prints every request the phone makes. Scrolling through the videos in the app while watching mitmdump's output shows that the video files are fetched from URLs with the following prefixes:

http://v1-dy.ixigua.com/
http://v3-dy.ixigua.com/
http://v9-dy.ixigua.com/
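Picking these prefixes out of mitmdump's output by hand works, but a small addon can do the filtering. The sketch below assumes that the video files are served with a Content-Type starting with video/ (an assumption, not verified against the app); video_prefix and the addon file name are hypothetical. It prints each new host that serves video content once, and it can be loaded the same way as any mitmdump script, e.g. mitmdump -s find_video_hosts.py.

```python
from urllib.parse import urlsplit

# Hosts already reported, so each one is printed only once
seen_prefixes = set()


def video_prefix(url, content_type):
    """Return 'scheme://host/' if the response looks like a video, else None."""
    # Assumption: video files are served with a Content-Type such as video/mp4
    if not content_type or not content_type.lower().startswith("video/"):
        return None
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


def response(flow):
    """mitmdump hook: report each new host that serves video content."""
    content_type = flow.response.headers.get("content-type", "")
    prefix = video_prefix(flow.request.url, content_type)
    if prefix and prefix not in seen_prefixes:
        seen_prefixes.add(prefix)
        print("video host:", prefix)
```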

These three URL prefixes are our targets. Next, write a Python script that downloads the videos, and execute it with mitmdump -s scripts.py (where scripts.py is the name of the Python file).

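import os

import requests
from mitmproxy import http

# Folder where the downloaded videos are saved
path = 'D:/video/'
# Starting index used to name the files
num = 1788

# After testing, the video URL prefixes were found to be mainly these three
target_urls = ('http://v1-dy.ixigua.com/', 'http://v9-dy.ixigua.com/',
               'http://v3-dy.ixigua.com/')


def response(flow: http.HTTPFlow) -> None:
    """mitmdump calls this hook for every completed HTTP response."""
    global num
    # Filter out unwanted URLs (str.startswith accepts a tuple of prefixes)
    if not flow.request.url.startswith(target_urls):
        return
    os.makedirs(path, exist_ok=True)
    # Set the video file name
    filename = os.path.join(path, str(num) + '.mp4')
    # Request the video URL again with requests; stream=True defers
    # downloading the body until it is read, here chunk by chunk
    res = requests.get(flow.request.url, stream=True, timeout=30)
    res.raise_for_status()
    # 'wb' overwrites a stale file instead of appending to it
    with open(filename, 'wb') as f:
        for chunk in res.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)
    print(filename + ' download complete')
    num += 1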
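mitmdump can pass the same video through the response hook more than once, which produces duplicate files. A minimal sketch of skipping repeats, assuming that scheme, host and path identify a video so the query string can be ignored (an assumption, not verified against the app); SeenUrls is a hypothetical helper name, and it only remembers URLs for the current mitmdump run.

```python
from urllib.parse import urlsplit


class SeenUrls:
    """Remember which video URLs were already downloaded during this run."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        """Return True the first time a video is seen, False afterwards."""
        # Assumption: scheme, host and path identify a video, so the query
        # string is ignored when comparing URLs
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc, parts.path)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

To use it, create one SeenUrls() object at module level in the download script and return early from response() when is_new(flow.request.url) is False, before calling requests.get.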

The code is fairly rough, but the basic logic is complete, and it is enough to download the videos. The drawback of this method is that someone has to keep swiping to the next video by hand, so we can use Appium, a powerful automated testing tool, to solve this problem.

2. Simulating phone operations with Appium

Make sure that you have configured the Android SDK environment that Appium depends on; there are many tutorials online, so I won't cover it here. Appium is easy to use. First, open Appium and click the Start Server button on the startup screen to start the Appium service.

Connect the Android phone to the PC with a USB cable and turn on USB debugging. You can test the connection with the adb devices -l command; if the phone is listed with the state device, the connection succeeded. The model field in the output is the device name, which is needed for the configuration later.

Then click the Start Inspector Session button in the server window to open a configuration page. In the JSON Representation panel in the lower-right corner, configure the Desired Capabilities used to launch the app: platformName, deviceName, appPackage and appActivity.

platformName: the platform name, usually Android or iOS
deviceName: the device name, i.e. the specific phone model
appPackage: the app's package name
appActivity: the name of the entry Activity, usually starting with a dot

platformName and deviceName are easy to obtain; appPackage and appActivity can be found as follows. In the console, run adb logcat > D:\log.log, then open the TikTok app on the phone. Next, open the log.log file on drive D and search for the keyword Displayed. The text after Displayed contains both values: com.ss.android.ugc.aweme corresponds to appPackage, and .main.MainActivity corresponds to appActivity. The final configuration is as follows:

{
  "platformName": "Android",
  "deviceName": "Mi_Note_3",
  "appPackage": "com.ss.android.ugc.aweme",
  "appActivity": ".main.MainActivity"
}
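The lookups above can also be scripted. A minimal sketch, assuming the output of adb devices -l and the logcat file have been read as text: it takes the model field of the first connected device and the first Displayed line in the log, and builds the same capabilities dictionary. Because it uses the first Displayed entry, check that it belongs to TikTok if other apps were opened while logcat was recording. The function names are hypothetical.

```python
import re

# Matches e.g. "Displayed com.ss.android.ugc.aweme/.main.MainActivity: +1s234ms"
DISPLAYED_RE = re.compile(r"Displayed ([\w.]+)/([\w.$]+)")


def models_from_adb(output):
    """Return the model names of connected devices listed by `adb devices -l`."""
    models = []
    for line in output.splitlines():
        fields = line.split()
        # Header lines, blank lines and offline/unauthorized devices are skipped
        if len(fields) < 2 or fields[1] != "device":
            continue
        for field in fields[2:]:
            if field.startswith("model:"):
                models.append(field[len("model:"):])
    return models


def package_and_activity(logcat_text):
    """Return (appPackage, appActivity) from the first Displayed line, or None."""
    match = DISPLAYED_RE.search(logcat_text)
    return (match.group(1), match.group(2)) if match else None


def build_caps(adb_output, logcat_text):
    """Assemble the Desired Capabilities dict from the two command outputs."""
    models = models_from_adb(adb_output)
    found = package_and_activity(logcat_text)
    if not models or found is None:
        raise ValueError("device model or Displayed line not found")
    package, activity = found
    return {
        "platformName": "Android",
        "deviceName": models[0],
        "appPackage": package,
        "appActivity": activity,
    }
```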

Click Start Session: the phone automatically opens the TikTok app and enters its start page, and a debugging window pops up on the PC. From this window you can preview the current phone screen and simulate various operations on the phone; that is not the focus of this article, so I will skip it. Below, we drive the app with a Python script instead, which can be run directly:

from time import sleep

from appium import webdriver


class Action:
    def __init__(self):
        # Initialize the configuration and set the Desired Capabilities parameters
        self.desired_caps = {
            "platformName": "Android",
            "deviceName": "Mi_Note_3",
            "appPackage": "com.ss.android.ugc.aweme",
            "appActivity": ".main.MainActivity"
        }
        # Address of the Appium server started earlier
        self.server = 'http://localhost:4723/wd/hub'
        # Create a new session, which launches the app on the phone.
        # Passing a plain dict works with older Appium-Python-Client
        # releases; newer releases expect an options object instead.
        self.driver = webdriver.Remote(self.server, self.desired_caps)
        # Starting point of the swipe and how far to swipe up, in pixels
        self.start_x = 500
        self.start_y = 1500
        self.distance = 1300

    def comments(self):
        sleep(2)
        # After the app opens, tap the screen once to make sure the page is shown
        self.driver.tap([(500, 1200)], 500)

    def scroll(self):
        # Swipe up forever; each swipe moves to the next video
        while True:
            # Simulate a swipe from (start_x, start_y) straight up by distance
            self.driver.swipe(self.start_x, self.start_y, self.start_x,
                              self.start_y - self.distance)
            # Wait so the video has time to load before the next swipe
            sleep(2)

    def main(self):
        try:
            self.comments()
            self.scroll()
        finally:
            # Close the session when the loop is interrupted (e.g. Ctrl+C)
            self.driver.quit()


if __name__ == '__main__':
    action = Action()
    action.main()
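The fixed swipe coordinates (from y = 1500 up to y = 200 at x = 500) only suit one screen size. A sketch of deriving them from the screen size instead; swipe_points is a hypothetical helper, and the default 80% and 20% heights are assumptions to tune per phone.

```python
def swipe_points(width, height, start_pct=80, end_pct=20):
    """Return (start_x, start_y, end_x, end_y) for a vertical upward swipe.

    The swipe runs down the middle of the screen, from start_pct percent
    of the height up to end_pct percent of the height. The default
    percentages are assumptions; tune them for your phone.
    """
    if not 0 <= end_pct < start_pct <= 100:
        raise ValueError("need 0 <= end_pct < start_pct <= 100")
    x = width // 2
    # Integer arithmetic keeps the coordinates whole pixels
    return x, height * start_pct // 100, x, height * end_pct // 100
```

Inside scroll(), the fixed call could then become size = self.driver.get_window_size() followed by self.driver.swipe(*swipe_points(size["width"], size["height"])).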

Running the mitmdump download script and the Appium script at the same time completes the crawler. PS: it occasionally downloads duplicate videos.
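One way to start both halves with a single command is a small launcher. This is a sketch, not part of the original workflow: it assumes the download script is saved as scripts.py and the Appium script as action.py (hypothetical file names), starts mitmdump in the background, runs the Appium script, and stops mitmdump when that script exits or is interrupted.

```python
import subprocess
import sys
import time

# Hypothetical file names for the mitmdump script and the Appium script
MITM_CMD = ["mitmdump", "-s", "scripts.py"]
DRIVER_CMD = [sys.executable, "action.py"]


def run_crawler(mitm_cmd, driver_cmd, warmup=3.0):
    """Run driver_cmd while mitm_cmd runs in the background.

    Returns the exit code of driver_cmd. The capture process is always
    terminated afterwards, even if the driver is interrupted.
    """
    capture = subprocess.Popen(mitm_cmd)
    try:
        # Give the proxy a moment to start listening before the app is driven
        time.sleep(warmup)
        return subprocess.run(driver_cmd).returncode
    finally:
        capture.terminate()
        capture.wait()
```

Usage: run_crawler(MITM_CMD, DRIVER_CMD). Since the Appium script loops forever, stop the whole crawler with Ctrl+C.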

Disclaimer: this content is only for learning and communication. If it infringes the rights and interests of your company, contact the author to delete it

Added by Death_Octimus on Sun, 26 Dec 2021 01:14:19 +0200