[Advanced Python Crawler Learning] JS Reverse Engineering, 100 Examples: A Complex Login Flow, the Latest WB Reverse

 

Disclaimer

Everything in this article is for learning and exchange only. The captured packets, sensitive URLs and data interfaces have been redacted. Commercial or illegal use is strictly prohibited, and the author bears no responsibility for any consequences of such use. If anything here infringes your rights, please contact me and it will be deleted immediately!

Reverse-engineering target

The target this time is the WB login. Login involves few encrypted parameters, but the process itself is fairly complex: it goes through many redirects, and a successful login takes about nine steps.

Only one encrypted parameter appears during login: the encrypted password, which is used when obtaining the token. Obtaining the token is a POST request, and the sp value in the Form Data is the encrypted password, similar to e23c5d62dbf9f8364005f331e487873c70d7ab0e8dd2057c3e66d1ae5d2837ef1dcf86

Login process

First, let's walk through the login process and describe the special parameters in each step. Parameters that are not mentioned have fixed values and can be copied directly.

The general process is as follows:

  1. Pre login

  2. Get encrypted password

  3. Get token

  4. Get the encrypted account

  5. Send verification code

  6. Verify the verification code

  7. Access redirect url

  8. Access crossdomain2 url

  9. Login via passport url

1. Pre login

 

Pre-login is a GET request. Its Query String Parameters contain two important values: su, the Base64-encoded user name, and _, a 13-digit millisecond timestamp. The response wraps a JSON object in a callback, and the JSON can be extracted with a regular expression. It contains seven values: retcode, servertime, pcid, nonce, pubkey, rsakv and exectime. Most of them are used in subsequent requests, and some are used to encrypt the password. Example of the returned data:

xxxxSSOController.preloginCallBack({
    "retcode": 0,
    "servertime": 1627461942,
    "pcid": "gz-1cd535198c0efe850b96944c7945e8fd514b",
    "nonce": "GWBOCL",
    "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245......",
    "rsakv": 1330428213,
    "exectime": 16
})
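The two moving parts of this step, building su and the timestamp and then unwrapping the callback, can be sketched roughly as follows. The helper names build_prelogin_params and parse_prelogin_response, the sample user name and the shortened sample response are assumptions made for illustration, and no network request is sent.

```python
import base64
import json
import re
import time


def build_prelogin_params(username: str) -> dict:
    """Build the two dynamic query parameters described above."""
    return {
        # Base64-encoded user name
        "su": base64.b64encode(username.encode()).decode(),
        # 13-digit millisecond timestamp
        "_": str(int(time.time() * 1000)),
    }


def parse_prelogin_response(text: str) -> dict:
    """Pull the JSON object out of the callback(...) wrapper."""
    match = re.search(r"\((\{.*\})\)", text, re.S)
    if match is None:
        raise ValueError("no JSON payload found in the pre-login response")
    return json.loads(match.group(1))


# Shortened, made-up response in the same shape as the example above
sample_response = (
    'xxxxSSOController.preloginCallBack({"retcode": 0, '
    '"servertime": 1627461942, "nonce": "GWBOCL", '
    '"rsakv": 1330428213, "exectime": 16})'
)

params = build_prelogin_params("user@example.com")
pre_parameter = parse_prelogin_response(sample_response)
print(params)
print(pre_parameter["servertime"], pre_parameter["nonce"])
```

Anchoring the pattern on the braces keeps any parentheses that appear outside the JSON payload from being captured.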

2. Obtain the encrypted password

The password is encrypted with RSA. The encrypted password can be produced with either Python or JS. The reverse engineering of the JS encryption is analyzed separately later.

3. Get token

The token obtained here is used in the following steps: obtaining the encrypted mobile phone number, sending the verification code and verifying the verification code. The token is obtained with a POST request. The Query String Parameters value is fixed: client: ssologin.js(v1.4.19). The Form Data has many fields, but apart from the encrypted password, the other parameters come mainly from the data returned by the pre-login in step 1. The main parameters are as follows:

  • su: the Base64-encoded user name
  • servertime: obtained from the JSON returned by pre login in step 1
  • nonce: get it from the JSON returned by pre login in step 1
  • rsakv: get it from the JSON returned by pre login in step 1
  • sp: encrypted password
  • prelt: random value

The returned data is HTML source code, from which the token value can be extracted. It is similar to 2ngfharzfaip_QwX70Npj8gw4lgj7RbCnByb3RlY3Rpb24. If the returned token does not look like this, the account or password is wrong.
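A minimal sketch of the extraction, assuming the returned page redirects via a replace("...") call whose URL carries the token after a URL-encoded token= marker (token%3D). The sample HTML, the example.com addresses and the helper name extract_token are made up for illustration.

```python
import re


def extract_token(html: str) -> str:
    """Extract the token from the redirect URL embedded in the page."""
    match = re.search(r'replace\("(.*?)"\)', html)
    if match is None:
        raise ValueError("no replace(...) redirect found in the page")
    redirect_url = match.group(1)
    if "token%3D" not in redirect_url:
        # Assumption: a redirect without a token means the login was rejected
        raise ValueError("no token found; the account or password may be wrong")
    return redirect_url.split("token%3D")[-1]


# Hypothetical page in the assumed shape
sample_html = (
    '<script>location.replace("https://login.example.com/ajaxlogin.php'
    '?url=https%3A%2F%2Fexample.com%2Fprotection%3Ftoken%3DSAMPLE_TOKEN_123.");'
    "</script>"
)

print(extract_token(sample_html))
```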

4. Obtain the encrypted account

 

The su parameter seen earlier is the Base64-encoded user name. This step encrypts the user name further, and the encrypted user name is used when sending and verifying the verification code. This is a GET request with simple Query String Parameters: token is the token value obtained in step 3, and callback_url is the home page of the website. The returned data is HTML source code, and the encrypted account can be extracted with the XPath expression //input[@name='encrypt_mobile']/@value. Its value is similar to f2de0b5e333a. Note that even for the same account, the result of each encryption is different.
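As a rough illustration, the same value can be pulled out with only the standard library. This is a stand-in for the XPath expression, not the lxml call used in the full code; the class name InputValueFinder and the sample HTML are assumptions.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Record the value attribute of the first <input> with a given name."""

    def __init__(self, field_name: str) -> None:
        super().__init__()
        self.field_name = field_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (
            self.value is None
            and tag == "input"
            and attributes.get("name") == self.field_name
        ):
            self.value = attributes.get("value")


# Made-up fragment in the shape the XPath expression expects
sample_html = (
    '<form><input type="hidden" name="encrypt_mobile" value="f2de0b5e333a">'
    '<input type="text" name="code"></form>'
)

finder = InputValueFinder("encrypt_mobile")
finder.feed(sample_html)
print(finder.value)
```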

5. Send verification code

 

Sending the verification code is a POST request with simple parameters. The token in Query String Parameters is the token obtained in step 3, and encrypt_mobile in Form Data is the encrypted account obtained in step 4. The returned data is the sending status, for example: {'retcode': 20000000, 'msg': 'succ', 'data': []}.

6. Verify the verification code

 

Verifying the code is also a POST request with simple parameters. The token in Query String Parameters is the token obtained in step 3. In Form Data, encrypt_mobile is the encrypted account obtained in step 4, and code is the verification code received in step 5. The returned data is JSON: retcode and msg represent the verification status, and redirect_url is the page to visit after verification, which is used in the next step. Example of the returned data:

{
  "retcode": 20000000,
  "msg": "succ",
  "data": {
    "redirect_url": "https://login.xxxx.com.cn/sso/login.php?entry=xxxxx&returntype=META&crossdomain=1&cdult=3&alt=ALT-NTcxNjMyMTA2OA==-1630292617-yf-78B1DDE6833847576B0DC4B77A6C77C4-1&savestate=30&url=https://xxxxx.com"
  }
}
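Handling this response might look like the sketch below. It assumes, based only on the sample responses in this article, that msg equal to succ marks success. The helper name extract_redirect_url, the failure message and the example.com URL are illustrative.

```python
import json

SUCCESS_MSG = "succ"  # success marker seen in the sample responses


def extract_redirect_url(response_text: str) -> str:
    """Return redirect_url from the verification response, or raise on failure."""
    payload = json.loads(response_text)
    if payload.get("msg") != SUCCESS_MSG:
        raise RuntimeError("verification failed: %s" % payload)
    return payload["data"]["redirect_url"]


# Made-up responses in the same shape as the example above
ok_response = json.dumps({
    "retcode": 20000000,
    "msg": "succ",
    "data": {"redirect_url": "https://login.example.com/sso/login.php?entry=demo"},
})
bad_response = json.dumps({"retcode": 1, "msg": "wrong code", "data": []})

print(extract_redirect_url(ok_response))
```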

7. Visit redirect url

 

This step sends a GET request to the redirect URL returned in step 6, similar to: https://login.xxxx.com.cn/sso/login.php?entry=xxxxx&returntype=META......

The returned data is HTML source code, from which we need to extract the crossdomain2 URL. The result is similar to: https://login.xxxx.com.cn/crossdomain2.php?action=login&entry=xxxxx...... This URL is the next page to visit.

8. Visit crossdomain2 url

This step sends a GET request to the crossdomain2 URL extracted in step 7, similar to: https://login.xxxx.com.cn/crossdomain2.php?action=login&entry=xxxxx......

The returned data is again HTML source code, from which we need to extract the real login URL. The result is similar to: https://passport.xxxxx.com/wbsso/login?ssosavestate=1661828618&url=https...... In the last step, simply visiting this real login URL completes the login.
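A hedged sketch of this extraction, assuming the page passes a JSON object with an arrURL list to a setCrossDomainUrlList(...) call, as the full code later in this article does. The sample page, the example.com URL and the helper name extract_passport_url are made up.

```python
import json
import re


def extract_passport_url(html: str) -> str:
    """Return the first URL from the JSON passed to setCrossDomainUrlList."""
    match = re.search(r"setCrossDomainUrlList\((\{.*?\})\)", html, re.S)
    if match is None:
        raise ValueError("setCrossDomainUrlList(...) not found in the page")
    url_list = json.loads(match.group(1))["arrURL"]
    if not url_list:
        raise ValueError("arrURL is empty")
    return url_list[0]


# Hypothetical page in the assumed shape
sample_html = (
    "<script>xxxxSSOController.setCrossDomainUrlList("
    '{"retcode":0,"arrURL":["https://passport.example.com/wbsso/login'
    '?ssosavestate=1661828618"]});</script>'
)

print(extract_passport_url(sample_html))
```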

9. Log in through the passport url

 

This is the last step and the actual login operation. It sends a GET request to the passport URL extracted in step 8, similar to: https://passport.xxxxx.com/wbsso/login?ssosavestate=1661828618&url=https......

The returned data includes login results, user ID and user name, similar to:

({"result":true,"userinfo":{"uniqueid":"5712321368","displayname":"tomb"}});

This completes the WB login process. The cookies from the successful login can then be used directly for other operations.

Reversing the password encryption

In the login process, step 2 obtains the encrypted password. The token request in step 3 carries an encrypted parameter sp in its Form Data, which is the encrypted password. Next, we reverse engineer how the password is encrypted.

A global search for the keyword sp returns many hits. Using the technique from earlier articles, try searching for sp=, sp: or var sp to narrow the scope. In this case, searching for sp= finds only one hit, in index.js. Set a breakpoint there and debug, and you can see that sp is actually the value of b:

PS: do not search on the page after a successful login. By then the resources have been refreshed and reloaded, and the encryption JS file is gone. Instead, enter a wrong account and password on the login page to capture packets, search and set breakpoints.

 

Continue tracing the value of b upward. The key code contains an if-else statement; set a breakpoint in each branch. Debugging shows that the value of b is generated in the if branch:

 

Analyze the two key lines of code:

f.setPublic(me.rsaPubkey, "10001");
b = f.encrypt([me.servertime, me.nonce].join("\t") + "\n" + b)

me.rsaPubkey, me.servertime and me.nonce are all data returned by the pre-login in step 1.
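The string handed to f.encrypt can be mirrored in Python as a quick check. The helper name build_rsa_plaintext is an assumption, and the values come from the sample pre-login response shown earlier.

```python
def build_rsa_plaintext(servertime, nonce: str, password: str) -> bytes:
    """Mirror [me.servertime, me.nonce].join("\\t") + "\\n" + b from the JS."""
    return ("\t".join([str(servertime), nonce]) + "\n" + password).encode()


# The exponent passed to setPublic is a hex string
public_exponent = int("10001", 16)

plaintext = build_rsa_plaintext(1627461942, "GWBOCL", "12345678")
print(public_exponent)
print(plaintext)
```

Because servertime and nonce change with every pre-login response, the plaintext, and therefore sp, differs for each login attempt even when the password stays the same.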

Hover the mouse over f.setPublic and f.encrypt, and you can see the br and bt functions respectively:

 

 

Step into the two functions, and you can see that both are defined inside the same anonymous function:

 

Copy the entire anonymous function, remove the outermost anonymous wrapper, and debug it locally. During debugging you will be told that navigator is not defined. The copied source uses navigator.appName and navigator.appVersion, which can be defined directly or left empty.

navigator = {
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

Continue debugging and you will find that the line var c = this.doPublic(b); fails with the error "Object doesn't support this property or method". Search for doPublic and you will find the statement bq.prototype.doPublic = bs;. Simply change it to doPublic = bs;.

Once the whole RSA encryption logic has been analyzed, it can also be implemented in Python. Code example (pubkey needs to be completed):

import rsa
import binascii


pre_parameter = {
        "retcode": 0,
        "servertime": 1627461942,
        "pcid": "gz-1cd535198c0efe850b96944c7945e8fd514b",
        "nonce": "GWBOCL",
        "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245......",
        "rsakv": 1330428213,
        "exectime": 16
}

password = '12345678'

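# The modulus comes from pubkey (hex); the exponent '10001' is also hex, i.e. 65537
public_key = rsa.PublicKey(int(pre_parameter['pubkey'], 16), int('10001', 16))
# Same plaintext layout as the JS: servertime, tab, nonce, newline, password
text = '%s\t%s\n%s' % (pre_parameter['servertime'], pre_parameter['nonce'], password)
# rsa.encrypt applies PKCS#1 v1.5 padding; the hex string of the result is sp
encrypted_str = rsa.encrypt(text.encode(), public_key)
encrypted_password = binascii.b2a_hex(encrypted_str).decode()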

print(encrypted_password)
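To show what rsa.encrypt does internally, here is a standard-library sketch of PKCS#1 v1.5 encryption padding followed by the modular exponentiation. The toy key is built from two Mersenne primes purely so the example can decrypt its own output; it is an assumption for demonstration, not the site's key and not secure. The function names are illustrative, and pow(e, -1, m) needs Python 3.8 or later.

```python
import os


def pkcs1_v15_pad(message: bytes, key_bytes: int) -> bytes:
    """Build 0x00 0x02 <nonzero random bytes> 0x00 <message>."""
    pad_len = key_bytes - len(message) - 3
    if pad_len < 8:
        raise ValueError("message too long for this key size")
    padding = b""
    while len(padding) < pad_len:
        # Padding bytes must be nonzero, so drop zeros and top up again
        chunk = os.urandom(pad_len - len(padding))
        padding += chunk.replace(b"\x00", b"")
    return b"\x00\x02" + padding + b"\x00" + message


def rsa_encrypt_hex(message: bytes, n: int, e: int) -> str:
    """Pad the message, raise it to e modulo n, and hex-encode the result."""
    key_bytes = (n.bit_length() + 7) // 8
    padded = pkcs1_v15_pad(message, key_bytes)
    cipher_int = pow(int.from_bytes(padded, "big"), e, n)
    return cipher_int.to_bytes(key_bytes, "big").hex()


# Toy key for demonstration only (NOT secure, NOT the real pubkey)
p = 2 ** 521 - 1
q = 2 ** 607 - 1
n = p * q
e = int("10001", 16)
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent, used only to check the result
key_bytes = (n.bit_length() + 7) // 8

plaintext = b"1627461942\tGWBOCL\n12345678"
cipher_hex = rsa_encrypt_hex(plaintext, n, e)

# Decrypt with the toy private key to confirm the padded layout
recovered = pow(int(cipher_hex, 16), d, n).to_bytes(key_bytes, "big")
print(len(cipher_hex), recovered.endswith(b"\x00" + plaintext))
```

The random padding is why encrypting the same text twice gives two different hex strings.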

Complete code

The following shows only part of the key code and cannot be run directly! Full code repository: https://github.com/kgepachong/crawler/

Key JS encryption code structure

navigator = {
    appName: "Netscape",
    appVersion: "5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

function bt(a) {}

function bs(a) {}

function br(a, b) {}

// N functions are omitted here

bl.prototype.nextBytes = bk;
doPublic = bs;
bq.prototype.setPublic = br;
bq.prototype.encrypt = bt;
this.RSAKey = bq


function getEncryptedPassword(me, b) {
    br(me.pubkey, "10001");
    b = bt([me.servertime, me.nonce].join("\t") + "\n" + b);
    return b
}

// Test sample
// var me = {
//     "retcode": 0,
//     "servertime": 1627283238,
//     "pcid": "gz-a9243276722ed6d4671f21310e2665c92ba4",
//     "nonce": "N0Y3SZ",
//     "pubkey": "EB2A38568661887FA180BDDB5CABD5F21C7BFD59C090CB2D245A87AC253062882729293E5506350508E7F9AA3BB77F4333231490F915F6D63C55FE2F08A49B353F444AD3993CACC02DB784ABBB8E42A9B1BBFFFB38BE18D78E87A0E41B9B8F73A928EE0CCEE1F6739884B9777E4FE9E88A1BBE495927AC4A799B3181D6442443",
//     "rsakv": "1330428213",
//     "exectime": 13
// }
// var b = '12312'  // password
// console.log(getEncryptedPassword(me, b))

Python login key code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-


import re
import json
import time
import base64
import binascii

import rsa
import execjs
import requests
from lxml import etree


# Flag to judge whether some requests are successful
response_success_str = 'succ'

pre_login_url = 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'
get_token_url = 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'
protection_url = 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'
send_code_url = 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'
confirm_url = 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'

headers = {
    'Host': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
    'Referer': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
session = requests.session()


def get_pre_parameter(username: str) -> dict:
    su = base64.b64encode(username.encode())
    time_now = str(int(time.time() * 1000))
    params = {
        'entry': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
        'callback': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
        'su': su,
        'rsakt': 'mod',
        'checkpin': 1,
        'client': 'ssologin.js(v1.4.19)',
        '_': time_now,
    }
    response = session.get(url=pre_login_url, params=params, headers=headers).text
    parameter_dict = json.loads(re.findall(r'\((.*)\)', response)[0])
    # print('1.[pre parameter]: %s' % parameter_dict)
    return parameter_dict


def get_encrypted_password(pre_parameter: dict, password: str) -> str:
    # Obtain the encrypted password through JS
    # with open('encrypt.js', 'r', encoding='utf-8') as f:
    #     js = f.read()
    # encrypted_password = execjs.compile(js).call('getEncryptedPassword', pre_parameter, password)
    # # print('2.[encrypted password]: %s' % encrypted_password)
    # return encrypted_password

    # Obtain the encrypted password with Python's rsa and binascii modules
    public_key = rsa.PublicKey(int(pre_parameter['pubkey'], 16), int('10001', 16))
    text = '%s\t%s\n%s' % (pre_parameter['servertime'], pre_parameter['nonce'], password)
    encrypted_str = rsa.encrypt(text.encode(), public_key)
    encrypted_password = binascii.b2a_hex(encrypted_str).decode()
    # print('2.[encrypted password]: %s' % encrypted_password)
    return encrypted_password


def get_token(encrypted_password: str, pre_parameter: dict, username: str) -> str:
    su = base64.b64encode(username.encode())
    data = {
        'entry': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
        'gateway': 1,
        'from': '',
        'savestate': 7,
        'qrcode_flag': False,
        'useticket': 1,
        'pagerefer': '',
        'vsnf': 1,
        'su': su,
        'service': 'miniblog',
        'servertime': pre_parameter['servertime'],
        'nonce': pre_parameter['nonce'],
        'pwencode': 'rsa2',
        'rsakv': pre_parameter['rsakv'],
        'sp': encrypted_password,
        'sr': '1920*1080',
        'encoding': 'UTF-8',
        'prelt': 38,
        'url': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler',
        'returntype': 'META'
    }
    response = session.post(url=get_token_url, headers=headers, data=data)
    # response.encoding = 'gbk'
    ajax_login_url = re.findall(r'replace\("(.*)"\)', response.text)[0]
    token = ajax_login_url.split('token%3D')[-1]
    if 'weibo' not in token:
        # print('3.[token]: %s' % token)
        return token
    else:
        raise Exception('Login failed! Wrong user name or password!')


def get_encrypted_mobile(token: str) -> str:
    params = {
        'token': token,
        'callback_url': 'Redacted; see the complete code on GitHub: https://github.com/kgepachong/crawler'
    }
    response = session.get(url=protection_url, params=params, headers=headers)
    tree = etree.HTML(response.text)
    encrypted_mobile = tree.xpath("//input[@name='encrypt_mobile']/@value")[0]
    # print('4.[encrypted mobile]: %s' % encrypted_mobile)
    return encrypted_mobile


def send_code(token: str, encrypt_mobile: str) -> str:
    params = {'token': token}
    data = {'encrypt_mobile': encrypt_mobile}
    response = session.post(url=send_code_url, params=params, data=data, headers=headers).json()
    if response['msg'] == response_success_str:
        code = input('Please enter the verification code: ')
        # print('5.[code]: %s' % code)
        return code
    else:
        # print('5.[failed to send verification code]: %s' % response)
        raise Exception('Verification code sending failed: %s' % response)


def confirm_code(encrypted_mobile: str, code: str, token: str) -> str:
    params = {'token': token}
    data = {
        'encrypt_mobile': encrypted_mobile,
        'code': code
    }
    response = session.post(url=confirm_url, params=params, data=data, headers=headers).json()
    if response['msg'] == response_success_str:
        redirect_url = response['data']['redirect_url']
        # print('6.[redirect url]: %s' % redirect_url)
        return redirect_url
    else:
        # print('6.[verification code failed]: %s' % response)
        raise Exception('Verification code verification failed: %s' % response)


def get_cross_domain2_url(redirect_url: str) -> str:
    response = session.get(url=redirect_url, headers=headers).text
    cross_domain2_url = re.findall(r'replace\("(.*)"\)', response)[0]
    # print('7.[cross domain2 url]: %s' % cross_domain2_url)
    return cross_domain2_url


def get_passport_url(cross_domain2_url: str) -> str:
    response = session.get(url=cross_domain2_url, headers=headers).text
    passport_url_str = re.findall(r'setCrossDomainUrlList\((.*)\)', response)[0]
    passport_url = json.loads(passport_url_str)['arrURL'][0]
    # print('8.[passport url]: %s' % passport_url)
    return passport_url


def login(passport_url: str) -> None:
    response = session.get(url=passport_url, headers=headers).text
    login_result = json.loads(response.replace('(', '').replace(');', ''))
    if login_result['result']:
        user_unique_id = login_result['userinfo']['uniqueid']
        user_display_name = login_result['userinfo']['displayname']
        print('Login succeeded! User ID: %s, user name: %s' % (user_unique_id, user_display_name))
    else:
        raise Exception('Login failed:%s' % login_result)


def main():
    username = input('Please enter login account: ')
    password = input('Please enter the login password: ')

    # 1. Pre login and obtain a dictionary parameter, including servertime, nonce, pubkey and rsakv to be used later
    pre_parameter = get_pre_parameter(username)

    # 2. Obtain the encrypted password through JS or Python
    encrypted_password = get_encrypted_password(pre_parameter, password)

    # 3. Get token
    token = get_token(encrypted_password, pre_parameter, username)

    # 4. Obtain the encrypted mobile phone number through the protection url
    encrypted_mobile = get_encrypted_mobile(token)

    # 5. Send mobile phone verification code
    code = send_code(token, encrypted_mobile)

    # 6. Verify the verification code. If the verification is successful, a redirected URL will be returned
    redirect_url = confirm_code(encrypted_mobile, code, token)

    # 7. Access the redirected URL and extract the crossdomain2 URL
    cross_domain2_url = get_cross_domain2_url(redirect_url)

    # 8. Access the crossdomain2 URL and extract the passport URL
    passport_url = get_passport_url(cross_domain2_url)

    # 9. Access the passport URL to log in
    login(passport_url)


if __name__ == '__main__':
    main()

 


Keywords: Python Javascript crawler

Added by slipster70 on Mon, 24 Jan 2022 18:54:50 +0200