python's crawl of easy reviews (simple)

I often listen to songs in NetEasy Cloud, and feel that those people have a good literary style and have commented on talents since ancient times. At the same time, I want to know what songs really make these people use their talents. Let's crawl through some of the hot song reviews.

Some of the modules we'll use

from Crypto.Cipher import AES Encryption

from base64 import b64encode

import requests,json. request JSON statement

import re. Regular expression

Crawling data from lxml import etree

import pymysql database operation

Connections to databases

db = pymysql.connect(host="localhost",
                         port=3306,
                         user="root",
                         passwd="123456",
                         database="wypc",
                         charset="utf8",
                         autocommit=True)

By analyzing the source code of a web page, we get two parameters, one is params, the other is encSecKey and they are encrypted, so we need to analyze its source. F12 Opens source to search encSckey. Find the encSecKey ins id e this js, find the location of encSecKey, debug the breakpoint and find that this is the result of the final parameters, then analyze the json function to get the encryption method of NetEase Cloud Web site. For the data crawling process, the same way the page is requested and the request header is obtained. Invoke the basic requests and json libraries to simulate the web page's ability to request servers and encode crawled html files into json format, and call the relibrary to regularly parse the list you get later. In the code for this project, I first get a list of song name IDS by get request and then accept the request request with a response. etree was used to preprocess the data. Then use xpath to get the parsed data.

 headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40'
    }


    f = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
    g = "0CoJUm6Qyw8W8jud"
    e = "010001"
    i = "oStLTrr0FaYKC6IE"

    def get_encSecKey():
        return "bf0c12e363f8fca9793ddab0cf6e1ea72b5d1c84c136a11147ce1b8b25da43ba385a7df0b5c97f0f9fe8904a23e318757cef5b8fe9da78ba447bfffc89a3fd3fc99db34631b69dffabe2bf6849961452f751fc71302bc3259177f70b4fdbf9ae19a5bd58b1d8422f4f0c1319f32099cbf6bf871bf59e459f05247c009f0f41f5"

    def get_params(data):  #By default, you receive a string here
        first = enc_params(data,g)
        second = enc_params(first, i)
        return second


    def to_16(data):
        pad = 16 - len(data)%16
        data += chr(pad)*pad
        return data


    def enc_params(data,key):
        iv = "0102030405060708"
        data = to_16(data)
        aes = AES.new(key = key.encode("utf-8"),IV = iv.encode('utf-8'),mode = AES.MODE_CBC)
        bs = aes.encrypt(data.encode("utf-8"))
        return str(b64encode(bs),"utf-8")


    url_song = 'https://music.163.com/discover/toplist?id=3778678'
    resp = requests.get(url=url_song, headers=headers).text
    tree = etree.HTML(resp)

    song = tree.xpath('//div[@data-key="song_toplist-3778678"]/ul[@class="f-hide"]/li/a/@href')
    song_name = tree.xpath('//div[@data-key="song_toplist-3778678"]/ul[@class="f-hide"]/li/a/text()')
    obj = re.compile(r'\d+', re.S)
    songID_list = []

Creating tables in navicat

 

Write an SQL statement, pass cursor.execute(sql) to execute, our database table is actually another software navicat has been built, so we can only insert the code in our SQL statement, db.commit() successfully submits the results to the database to determine whether all inserts were successful.

        for hot in range(0,Comment_mum):

            nickname = result['data']['hotComments'][hot]['user']['nickname']

            songName = song_name[song_count1]

            hot_comment = result['data']['hotComments'][hot]['content']


            if result['data']['hotComments'][hot]['user']['vipRights'] == None:
                Vip = 0
            else:
                Vip = 1
            likedCount = result['data']['hotComments'][hot]['likedCount']

            sql = "insert into comment(ID,nickName,songName,comment,VIP,likedNum) " \
                  "values(%d,'%s','%s','%s',%d,%d)" % (comment_count, nickname, songName, hot_comment, Vip, likedCount)#sql statement

            try:
                cursor.execute(sql)#Execute sql statement
                db.commit() # All inserts were successful and the results were submitted to the database
                print(str(comment_count)+' Song: "'+songName+'"Comment Information'+'Saved in MYSQL Database!!!')
            except Exception as e:
                db.rollback()# If the submission fails, the result falls back to the previous submission
                print("implement MySQL Error:%s" %e)
            comment_count = comment_count + 1
        song_count1 = song_count1 + 1  # Incremental Song Name
    cursor.close()#Close
    db.close()#To break off

Keywords: Python MySQL

Added by hoolahoops on Sun, 26 Dec 2021 02:19:11 +0200