I often listen to songs on NetEase Cloud Music, and I am always struck by how well the commenters write. As the saying goes, comment sections have always produced talent. That made me curious about which songs actually bring that talent out, so let's crawl some of the hot comments.
Some of the modules we'll use
from Crypto.Cipher import AES  # encryption
from base64 import b64encode
import requests, json  # HTTP requests and JSON handling
import re  # regular expressions
from lxml import etree  # parsing the crawled data
import pymysql  # database operations
Connecting to the database
db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456", database="wypc", charset="utf8", autocommit=True)
Analyzing the page shows that the comment request carries two parameters, params and encSecKey, and both are encrypted, so we need to work out how they are produced. Press F12 to open the developer tools and search the sources for encSecKey. Find the JavaScript file that contains it, set a breakpoint where encSecKey is assigned, and step through in the debugger: this is where the final parameters are produced. From there, analyze the function that builds them to recover the encryption method the NetEase Cloud Music site uses.
The crawling itself works the same way the page does: we send the same request with the same request headers. The requests and json libraries simulate the page's request to the server and parse the returned data as JSON, and the re library is used later to parse the list we get back with regular expressions. In the code for this project, I first fetch the chart page with a GET request and keep the response text, preprocess it with etree, and then use XPath to pull out the song links and song names.
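The encryption step relies on AES in CBC mode, which only accepts input whose length is a multiple of 16 bytes, so the plaintext has to be padded before it is encrypted. Below is a minimal standard-library sketch of that padding. The names pad_to_16 and unpad_16 and the sample text are made up for illustration, and this version counts bytes rather than characters, so it also handles non-ASCII text.

```python
BLOCK = 16  # AES block size in bytes


def pad_to_16(raw: bytes) -> bytes:
    """Append n bytes, each with value n, so the length becomes a multiple of 16."""
    n = BLOCK - len(raw) % BLOCK  # always between 1 and 16
    return raw + bytes([n]) * n


def unpad_16(padded: bytes) -> bytes:
    """Remove the padding added by pad_to_16."""
    n = padded[-1]
    return padded[:-n]


# Hypothetical sample payload; the Chinese text takes 3 bytes per character in UTF-8
text = '{"note": "好听"}'
raw = text.encode("utf-8")
padded = pad_to_16(raw)
print(len(raw), len(padded))
```

Input that is already a multiple of 16 bytes still receives a full extra block of padding, which is what makes the padding unambiguous to remove.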
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.54 Safari/537.36 Edg/95.0.1020.40'
}

# Values found while debugging the site's JavaScript
f = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
g = "0CoJUm6Qyw8W8jud"
e = "010001"
i = "oStLTrr0FaYKC6IE"


def get_encSecKey():
    return "bf0c12e363f8fca9793ddab0cf6e1ea72b5d1c84c136a11147ce1b8b25da43ba385a7df0b5c97f0f9fe8904a23e318757cef5b8fe9da78ba447bfffc89a3fd3fc99db34631b69dffabe2bf6849961452f751fc71302bc3259177f70b4fdbf9ae19a5bd58b1d8422f4f0c1319f32099cbf6bf871bf59e459f05247c009f0f41f5"


def get_params(data):  # By default, you receive a string here
    # Encrypt twice: first with g, then with i
    first = enc_params(data, g)
    second = enc_params(first, i)
    return second


def to_16(data):
    # Pad to a multiple of 16 for AES. The length is counted in characters,
    # which matches the byte length only for ASCII input.
    pad = 16 - len(data) % 16
    data += chr(pad) * pad
    return data


def enc_params(data, key):
    iv = "0102030405060708"
    data = to_16(data)
    aes = AES.new(key=key.encode("utf-8"), IV=iv.encode('utf-8'), mode=AES.MODE_CBC)
    bs = aes.encrypt(data.encode("utf-8"))
    return str(b64encode(bs), "utf-8")


url_song = 'https://music.163.com/discover/toplist?id=3778678'
resp = requests.get(url=url_song, headers=headers).text
tree = etree.HTML(resp)
# Links and names of the songs on the chart
song = tree.xpath('//div[@data-key="song_toplist-3778678"]/ul[@class="f-hide"]/li/a/@href')
song_name = tree.xpath('//div[@data-key="song_toplist-3778678"]/ul[@class="f-hide"]/li/a/text()')
obj = re.compile(r'\d+', re.S)
songID_list = []
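The code above stops with an empty songID_list. One way to fill it is to run the digit pattern over each href, roughly as sketched here. This assumes the hrefs look like /song?id=<digits>; the sample values and the helper name extract_song_ids are made up for illustration.

```python
import re

ID_PATTERN = re.compile(r"\d+")


def extract_song_ids(hrefs):
    """Return the first run of digits found in each href, skipping hrefs without one."""
    ids = []
    for href in hrefs:
        match = ID_PATTERN.search(href)
        if match:
            ids.append(match.group())
    return ids


# Made-up hrefs in the assumed /song?id=<digits> shape
sample_hrefs = ["/song?id=186016", "/song?id=25906124", "/song"]
songID_list = extract_song_ids(sample_hrefs)
print(songID_list)
```

Keeping the IDs as strings is convenient if they are later spliced into a request payload, but converting them with int() would work just as well.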
Creating the table in Navicat
We write an SQL statement and run it with cursor.execute(sql). The table itself was already created in a separate tool, Navicat, so the code only needs an INSERT statement. When the insert succeeds, db.commit() submits the result to the database; when it fails, db.rollback() undoes it.
# `result` is the parsed JSON response for the current song
for hot in range(0, Comment_mum):
    nickname = result['data']['hotComments'][hot]['user']['nickname']
    songName = song_name[song_count1]
    hot_comment = result['data']['hotComments'][hot]['content']
    if result['data']['hotComments'][hot]['user']['vipRights'] is None:
        Vip = 0
    else:
        Vip = 1
    likedCount = result['data']['hotComments'][hot]['likedCount']
    # Placeholders let pymysql escape the values, so quotes inside a comment cannot break the statement
    sql = "insert into comment(ID,nickName,songName,comment,VIP,likedNum) " \
          "values(%s,%s,%s,%s,%s,%s)"  # sql statement
    try:
        cursor.execute(sql, (comment_count, nickname, songName, hot_comment, Vip, likedCount))  # Execute sql statement
        db.commit()  # The insert succeeded, so submit the result to the database
        print(str(comment_count) + ' Song: "' + songName + '" comment information saved in MySQL database!')
    except Exception as err:
        db.rollback()  # If the insert fails, roll back to the previous commit
        print("MySQL execution error: %s" % err)
    comment_count = comment_count + 1
song_count1 = song_count1 + 1  # Move on to the next song name
cursor.close()  # Close the cursor
db.close()  # Close the database connection
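The insert logic above needs a running MySQL server, so here is a rough stand-in that can be tried anywhere. It swaps pymysql for the standard library's sqlite3 with an in-memory database, and that swap is my substitution, not what the project uses. The column types, the fake response, and the helper name rows_from_result are all assumptions for illustration. Values are passed as query parameters rather than formatted into the SQL string, so a comment containing quotes is stored safely. Note that sqlite3 uses ? as its placeholder where pymysql uses %s.

```python
import sqlite3


def rows_from_result(result, song_name, start_id):
    """Turn a response shaped like result['data']['hotComments'] into insertable rows."""
    rows = []
    for offset, item in enumerate(result["data"]["hotComments"]):
        user = item["user"]
        vip = 0 if user.get("vipRights") is None else 1
        rows.append((start_id + offset, user["nickname"], song_name,
                     item["content"], vip, item["likedCount"]))
    return rows


# Fake response with only the fields the crawler reads; all values are made up
fake_result = {"data": {"hotComments": [
    {"user": {"nickname": "listener_a", "vipRights": None},
     "content": "It's a 'classic'", "likedCount": 12},
    {"user": {"nickname": "listener_b", "vipRights": {"placeholder": True}},
     "content": "On repeat all week", "likedCount": 7},
]}}

conn = sqlite3.connect(":memory:")
# Assumed schema: column names follow the insert statement, types are a guess
conn.execute(
    "create table comment (ID integer primary key, nickName text, "
    "songName text, comment text, VIP integer, likedNum integer)"
)

sql = ("insert into comment (ID, nickName, songName, comment, VIP, likedNum) "
       "values (?, ?, ?, ?, ?, ?)")
rows = rows_from_result(fake_result, "Sample Song", 1)
try:
    conn.executemany(sql, rows)
    conn.commit()  # keep the rows only if every insert succeeded
except sqlite3.Error as err:
    conn.rollback()  # undo the partial batch
    print("insert failed:", err)

saved = conn.execute("select nickName, comment, VIP from comment order by ID").fetchall()
print(saved)
conn.close()
```

Separating the row building from the database call also makes it easy to check the extracted fields before anything is written.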