WeChat official account data scraping

1. Grab all the articles of an official account.

Tools: Charles + WeChat for PC + PyCharm + Python

2. Analysis

After analysis: the article list pages of every official account are served by the same endpoint, https://mp.weixin.qq.com/mp/profile_ext.

When capturing traffic, only a few request parameters differ from one official account to another (mainly __biz, uin, key and pass_ticket).
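Only a few query parameters change between accounts, so one convenient way to collect them is to copy the full profile_ext request URL that Charles shows and split out its query string. The sketch below uses only the standard library. `extract_params` is a hypothetical helper, and the URL values are placeholders, not real credentials. `parse_qs` also percent-decodes the values (for example `%3D` becomes `=`). This matters because requests percent-encodes `params` itself, so passing already-encoded values could encode them twice.

```python
from urllib.parse import parse_qs, urlsplit

# Parameters that differ from one official account (or login session) to another
REQUIRED = ("__biz", "uin", "key", "pass_ticket")


def extract_params(captured_url):
    """Return the per-account parameters from a captured profile_ext URL (hypothetical helper)."""
    query = parse_qs(urlsplit(captured_url).query)
    missing = [name for name in REQUIRED if name not in query]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    # parse_qs maps each key to a list of already percent-decoded values
    return {name: query[name][0] for name in REQUIRED}


if __name__ == "__main__":
    # Placeholder URL; paste the real one copied from Charles instead
    url = ("https://mp.weixin.qq.com/mp/profile_ext?action=home"
           "&__biz=BIZ_PLACEHOLDER&scene=124"
           "&uin=UIN_PLACEHOLDER&key=KEY_PLACEHOLDER&pass_ticket=TICKET%2BPLACEHOLDER")
    print(extract_params(url))
```

The returned dictionary can then replace the four `input()` prompts in the program below.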

3. Code

import csv
import json
import time

import requests


def parse(__biz, uin, key, pass_ticket, appmsg_token="", offset=0):
    """
    Fetch the article list of one official account page by page
    and append every article to article.csv.
    """
    url = 'https://mp.weixin.qq.com/mp/profile_ext'
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 MicroMessenger/6.5.2.501 NetType/WIFI WindowsWechat QBCore/3.43.901.400 QQBrowser/9.0.2524.400",
    }
    # Loop over pages instead of recursing, so long histories cannot hit the recursion limit
    while True:
        params = {
            "action": "getmsg",
            "__biz": __biz,
            "f": "json",
            "offset": str(offset),
            "count": "10",
            "is_ok": "1",
            "scene": "124",
            "uin": uin,
            "key": key,
            "pass_ticket": pass_ticket,
            "wxtoken": "",
            "appmsg_token": appmsg_token,
            "x5": "0",
        }

        res = requests.get(url, headers=headers, params=params, timeout=3)
        data = res.json()
        print(data)
        # Without a valid key/uin the response carries no article list
        if "general_msg_list" not in data:
            print("Request failed, check the captured parameters")
            break
        # general_msg_list is a JSON string nested inside the JSON response;
        # json.loads decodes it safely (eval is unsafe and breaks on true/false/null)
        msg_list = json.loads(data["general_msg_list"]).get("list", [])
        # utf-8 keeps Chinese text intact; the csv module quotes commas inside fields
        with open('article.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            for i in msg_list:
                try:
                    # Article title, summary and link
                    title = i["app_msg_ext_info"]["title"]
                    digest = i["app_msg_ext_info"]["digest"]
                    link = i["app_msg_ext_info"]["content_url"].replace("\\", "")
                    # Article release time
                    date = i["comm_msg_info"]["datetime"]
                except KeyError:
                    # Skip plain-text posts, which have no article link
                    continue
                # Upgrade http:// to https:// (replacing bare "http" would turn https:// into httpss://)
                link = link.replace("http://", "https://", 1)
                print(title, digest, link, date)
                writer.writerow([title, digest, link, date])
        # can_msg_continue is 1 while more pages remain and 0 once everything has been fetched
        if data.get("can_msg_continue", 0) != 1:
            print("Crawling completed")
            break
        offset = data["next_offset"]
        # Pause between page requests
        time.sleep(3)


if __name__ == '__main__':
    # Request parameters captured with Charles
    __biz = input('__biz: ')
    uin = input('uin: ')
    key = input('key: ')
    pass_ticket = input('pass_ticket: ')
    # Start crawling from the first page
    parse(__biz, uin, key, pass_ticket, appmsg_token="", offset=0)
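The paging logic can also be separated from the network call, which makes it testable without a live key. The sketch below assumes a hypothetical `fetch(offset)` callable that returns one decoded getmsg response as a dict. It follows `next_offset` while `can_msg_continue` is 1 and decodes the nested `general_msg_list` string with `json.loads`. It also turns `datetime` into a readable date. This conversion is an addition, not part of the program above, and it assumes the field is a Unix timestamp in seconds.

```python
import json
from datetime import datetime, timezone


def iter_articles(fetch, start_offset=0):
    """Yield (title, digest, url, date) for every article, following next_offset.

    `fetch(offset)` is a hypothetical callable returning one decoded getmsg
    response as a dict (for example a wrapper around requests.get(...).json()).
    """
    offset = start_offset
    while True:
        data = fetch(offset)
        # general_msg_list is itself a JSON document stored as a string
        messages = json.loads(data.get("general_msg_list", "{}")).get("list", [])
        for item in messages:
            info = item.get("app_msg_ext_info")
            if info is None:
                continue  # plain-text posts have no article info; skip them
            # Assumption: datetime is a Unix timestamp in seconds; formatted in UTC here
            ts = item["comm_msg_info"]["datetime"]
            date = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M")
            yield info["title"], info["digest"], info["content_url"], date
        # can_msg_continue == 1 means another page exists at next_offset
        if data.get("can_msg_continue", 0) != 1:
            return
        offset = data["next_offset"]


if __name__ == "__main__":
    # Two fake pages standing in for real getmsg responses (made-up data)
    first = {"general_msg_list": json.dumps({"list": [
                {"comm_msg_info": {"datetime": 0},
                 "app_msg_ext_info": {"title": "A", "digest": "first",
                                      "content_url": "https://example.com/a"}},
                {"comm_msg_info": {"datetime": 60}}]}),  # text-only post
             "can_msg_continue": 1, "next_offset": 10}
    second = {"general_msg_list": json.dumps({"list": [
                 {"comm_msg_info": {"datetime": 86400},
                  "app_msg_ext_info": {"title": "B", "digest": "second",
                                       "content_url": "https://example.com/b"}}]}),
              "can_msg_continue": 0}
    pages = {0: first, 10: second}
    for article in iter_articles(pages.__getitem__):
        print(article)
```

With the fake pages this prints one tuple per article and skips the text-only post. In real use, `fetch` would build the same `params` dictionary as `parse` for the given offset. Adjust the time zone if local Beijing time (UTC+8) is preferred.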

By capturing the __biz, uin, key and pass_ticket parameters of a different official account, you can crawl that account's articles in the same way.

Note: after obtaining the required parameters with a packet-capture tool, close Charles (or Fiddler) before running the program; otherwise an error is reported. The program must be run while WeChat for PC is logged in. If you capture the parameters on an emulator instead, its uin and key differ from those on the computer, and the request will generally not succeed.
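If closing Charles is inconvenient, one possible alternative is to stop requests from using the system proxy. This is not the original author's approach. It assumes the error comes from requests picking up the proxy that Charles registers while it runs. A `requests.Session` with `trust_env = False` ignores proxy environment variables and operating-system proxy settings. It also stops reading `.netrc` credentials and CA-bundle environment variables. `make_session` is a hypothetical helper name.

```python
import requests


def make_session():
    """Return a session that ignores environment and system proxy settings (hypothetical helper)."""
    session = requests.Session()
    # With trust_env disabled, requests does not read proxy settings from the
    # environment or the OS, so traffic bypasses a capture proxy such as Charles.
    session.trust_env = False
    return session


if __name__ == "__main__":
    session = make_session()
    # Inside parse(), session.get(url, headers=headers, params=params, timeout=3)
    # would replace requests.get(...).
    print(session.trust_env)
```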

Added by MNSarahG on Thu, 07 May 2020 18:56:45 +0300