Crawling your past and present National K Song recordings with Python

Preface

Do you still remember the songs we sang as children, and the voice of the person you used to like? As a mainstream singing app, National K Song (全民K歌, kg.qq.com) holds a lot of my memories. Every time I visit someone's page I think what a pity it would be if one day they deleted a song or made it private. So today I'm building a National K Song downloader, so that you can keep your own and other people's past and present! [This program incorporates the get_cookies.py code from my article "National Day, I'm coming~", which can obtain cookies automatically. In my tests, however, the automatically obtained cookies have one parameter fewer than the cookies of a normal visit, and some songs may be missing from the result (I actually have 143 National K Song recordings and only 140 were downloaded). Copying the cookies directly from the browser works without problems.]

Approach

At present, the National K Song web page only shows the first 8 songs, and the "view more" button at the bottom does nothing. When I first saw this I despaired of crawling it. Of course, I could share the songs one by one, extract the links and download them, but nobody wants that much trouble. I just want to pick one person and download all of their songs. After studying the page, I found a script tag on the personal page that stores the overview information available to the current account. The most useful field is the total number of songs. When crawling someone else's account, only the number of non-private songs is shown. Knowing how many songs there are to crawl is a good start, and I expected it to make the later steps easier.
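
The idea of reading the overview data out of a script tag can be sketched as follows. This is a minimal illustration rather than the code of the full script: it uses only the standard library (a regular expression instead of Beautiful Soup), the helper name extract_page_data is hypothetical, and the sample HTML is made up. The only things taken from the real page are the window.__DATA__ marker and the ugc_total_count and kgnick fields.

```python
import json
import re

# Matches the body of every <script> element (good enough for a sketch,
# a real HTML parser is more robust).
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_page_data(html: str) -> dict:
    """Return the object assigned to window.__DATA__, or {} if it is absent."""
    for body in SCRIPT_RE.findall(html):
        marker = body.find("window.__DATA__")
        if marker == -1:
            continue
        start = body.find("{", marker)
        if start == -1:
            continue
        # raw_decode parses one JSON value and ignores what follows it,
        # so the trailing ';' of the assignment does not matter.
        obj, _ = json.JSONDecoder().raw_decode(body[start:])
        return obj
    return {}


# Made-up page used only to exercise the helper.
sample = (
    "<html><script>var a = 1;</script>"
    '<script>window.__DATA__ = {"data": {"ugc_total_count": 143, "kgnick": "demo"}};</script>'
    "</html>"
)
data = extract_page_data(sample)
print(data["data"]["ugc_total_count"])
```

Using raw_decode avoids having to guess where the object ends, which is what the find/rfind slicing in the full script has to do.
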
After refreshing the page and watching the XHR requests many times, I had an idea and clicked "view more" once. Nothing changed on the page, but an extra XHR request appeared: kg_ugc_get_homepage. It did not return any useful data, yet when I saw the familiar parameters it expected, I was sure that this was the only way to get the songs!

I was so excited that I spliced start, num and share_uid into the URL, and sure enough, it responded with a callback object containing the song information. A little analysis yielded the ugclist song list. After many attempts I found that one request returns at most 15 songs, so I simply send more requests. Either way, the complete song data is now available.
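
The paging logic described above can be sketched like this. It is an illustration under assumptions, not the exact code of the script: fetch_page is a hypothetical callable standing in for the real HTTP request, fake_fetch and the callback name MusicJsonCallback are made up for the demo, and start is treated as a 1-based page number, which is how the full script uses it.

```python
import json


def unwrap_jsonp(text: str) -> dict:
    """Cut the JSON object out of a callback(...) wrapper and parse it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in response")
    return json.loads(text[start:end + 1])


def collect_songs(fetch_page, num=15, max_pages=1000):
    """Request page after page until the interface returns an empty ugclist."""
    songs = []
    for page in range(1, max_pages + 1):
        payload = unwrap_jsonp(fetch_page(page, num))
        batch = payload.get("data", {}).get("ugclist") or []
        if not batch:
            break
        songs.extend(batch)
    return songs


def fake_fetch(page, num):
    """Stand-in for the real request: serves 32 made-up songs in pages."""
    catalogue = [{"shareid": f"s{i}", "title": f"song {i}"} for i in range(32)]
    chunk = catalogue[(page - 1) * num: page * num]
    return "MusicJsonCallback(" + json.dumps({"code": 0, "data": {"ugclist": chunk}}) + ")"


songs = collect_songs(fake_fetch)
print(len(songs))
```

Passing the fetch function in as a parameter keeps the loop testable without touching the network, and max_pages guards against looping forever if the interface never returns an empty list.
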
I thought the rest would be easy. The audio link itself was indeed easy to find, but I did not expect it to carry so many parameters (ftnrkey, vkey, fname and ugcid).

This is where I nearly gave up. I don't know where these parameters come from, and tracing them is far too complicated... It feels as if at least a dozen, maybe dozens, of JavaScript functions take part in generating them, and it is even more likely that they are generated on the server side. I resolutely abandoned this path.
Then I looked at the source of the song page. Hey, the song URL sits right there in a script tag. Sure enough, the goddess of luck favoured me a second time.
Next, the song address can be extracted with an HTML parsing tool such as Beautiful Soup, which skips obtaining ftnrkey, vkey, fname and ugcid altogether. Hehe. After that the song can be downloaded normally.
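
The download step can be sketched as below. The script names each file after the song title and its shareid, and a title may contain characters that Windows does not accept in file names, so this sketch cleans the title first. Both helpers (safe_filename and save_chunks) are hypothetical additions that are not part of the original script, and the demo writes made-up bytes to a temporary folder instead of fetching a real playurl.

```python
import re
import tempfile
from pathlib import Path

# Characters Windows does not allow in file names, plus control characters.
_FORBIDDEN = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, shareid: str, ext: str = ".m4a") -> str:
    """Build '<title>_<shareid>.m4a' with forbidden characters replaced by '_'."""
    cleaned = _FORBIDDEN.sub("_", title).strip(" .") or "untitled"
    return f"{cleaned}_{shareid}{ext}"


def save_chunks(chunks, target: Path) -> int:
    """Write an iterable of byte chunks to target and return the byte count."""
    target.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(target, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


# Demo with made-up data written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "media" / safe_filename("a/b:c?", "XYZ")
    size = save_chunks([b"ab", b"", b"cde"], path)
    print(path.name, size)
```

With requests, the chunks could come from a response opened with stream=True, for example save_chunks(res.iter_content(chunk_size=65536), path), so that a long recording is never held in memory as a whole.
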
After downloading the songs, I looked at my National K Song page again and noticed an album that had not been downloaded. With the songs done, the album could not be left behind. I opened the album tab and an XHR request named fcg_user_album_list appeared; as the name says, it fetches the album list. I only have one album, so only one is listed. The album detail page needs very few parameters: the album id, passed as the parameter s, is enough to open it. By then I had given up on finding a useful XHR request on that page, so I went straight to the page source again and found the information I wanted lying quietly in a script tag. However, my album clearly holds 11 songs and the page only returns 10. I don't know whether that is because I deleted one of them or because only the first 10 can be fetched. Either way, the album part is nearly done. If your albums hold many songs, you can check how many you actually get~
Downloading album songs then works the same way as downloading ordinary songs: once you have the song's id, open the song detail page and parse out the song URL to download it~
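
To make the "11 songs in the album, 10 returned" situation visible, the album data can be summarised before downloading. This is only a sketch: summarize_album is a hypothetical helper, the sample dictionaries are made up, and the field names (album_id, album_name, ugc_num, ugc_list, ugc_id, song_name) are the ones the script reads from the two responses.

```python
def summarize_album(album: dict, detail: dict) -> dict:
    """Pair each returned song with its id and count how many are missing."""
    songs = detail.get("ugc_list") or []
    expected = album.get("ugc_num") or 0
    return {
        "album_id": album["album_id"],
        "album_name": album["album_name"],
        "songs": [(s["ugc_id"], s["song_name"]) for s in songs],
        "missing": max(expected - len(songs), 0),
    }


# Made-up data: the list entry announces 11 songs, the detail page returns 10.
album = {"album_id": "A1", "album_name": "demo album", "ugc_num": 11}
detail = {"ugc_list": [{"ugc_id": f"u{i}", "song_name": f"song {i}"} for i in range(10)]}

summary = summarize_album(album, detail)
print(len(summary["songs"]), summary["missing"])
```

A non-zero missing value signals that the album page did not return every song, so the user can be warned instead of silently getting an incomplete album.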

Code

# _*_ coding:utf-8 _*_
# Project: 
# FileName: qmkg_new.py
# UserName: Gao Junji
# ComputerUser: 19305
# Day: 2021/10/24
# Time: 12:00
# IDE: PyCharm
# Women, no—— Soul sadness from October 9, 2021

import os
import sys
import json
import time
import base64
import getpass
import sqlite3

import urllib3
import requests
import webbrowser
import ctypes.wintypes
from bs4 import BeautifulSoup
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


class DataBlob(ctypes.Structure):
    _fields_ = [('cbData', ctypes.wintypes.DWORD), ('pbData', ctypes.POINTER(ctypes.c_char))]


def dp_api_decrypt(encrypted):
    p = ctypes.create_string_buffer(encrypted, len(encrypted))
    blob_out = DataBlob()
    ret_val = ctypes.windll.crypt32.CryptUnprotectData(ctypes.byref(DataBlob(ctypes.sizeof(p), p)), None, None, None, None, 0, ctypes.byref(blob_out))
    if not ret_val:
        raise ctypes.WinError()
    result = ctypes.string_at(blob_out.pbData, blob_out.cbData)
    ctypes.windll.kernel32.LocalFree(blob_out.pbData)
    return result


def aes_decrypt(encrypted_txt):
    # Chrome keeps its AES key, protected with DPAPI, in the "Local State" JSON file
    with open(f'C:\\Users\\{getpass.getuser()}\\AppData\\Local\\Google\\Chrome\\User Data\\Local State', encoding='utf-8', mode="r") as f:
        jsn = json.load(f)
    encrypted_key = base64.b64decode(jsn["os_crypt"]["encrypted_key"].encode())
    encrypted_key = encrypted_key[5:]  # Drop the leading b'DPAPI' marker
    # Layout of a v10 value: b'v10' + 12-byte nonce + ciphertext + 16-byte GCM tag
    cipher = Cipher(algorithms.AES(dp_api_decrypt(encrypted_key)), modes.GCM(encrypted_txt[3:15]), backend=default_backend())
    # The tag is not verified here; the caller strips the 16 trailing bytes
    return cipher.decryptor().update(encrypted_txt[15:])


def chrome_decrypt(encrypted_txt):
    if sys.platform != 'win32':
        raise OSError('Reading Chrome cookies this way only works on Windows')
    try:
        if encrypted_txt[:4] == b'\x01\x00\x00\x00':  # Older values: a raw DPAPI blob
            return dp_api_decrypt(encrypted_txt).decode()
        elif encrypted_txt[:3] == b'v10':  # Newer values: AES-GCM, drop the 16-byte tag
            return aes_decrypt(encrypted_txt)[:-16].decode()
    except OSError:
        return None
    return None


def get_cookies_from_chrome(d):
    # Newer Chrome versions may keep this file under Default\Network\Cookies instead
    con = sqlite3.connect(f'C:\\Users\\{getpass.getuser()}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\Cookies')
    con.row_factory = sqlite3.Row
    try:
        cur = con.cursor()
        # Bind the pattern as a parameter instead of formatting it into the SQL text
        cur.execute('SELECT name, encrypted_value AS value FROM cookies WHERE host_key LIKE ?', (f'%{d}%',))
        cookies = ''
        for row in cur:
            if row['value'] is not None:
                value = chrome_decrypt(row['value'])
                if value is not None:
                    cookies += row['name'] + '=' + value + ';'
    finally:
        con.close()
    return cookies


def parse_cookies(cookies: str):
    cookies_dict = {}
    for c in cookies.replace(' ', '').split(';'):
        # partition keeps any '=' inside the value intact
        name, _, value = c.partition('=')
        if name:
            cookies_dict[name] = value
    return cookies_dict


webbrowser.open('https://kg.qq.com/index-pc.html')
cookie = input('Paste a valid cookie (copy it from any XHR request), or press Enter to read the Chrome cookies (some songs may be missing): ')
if not cookie:
    cookie = get_cookies_from_chrome('qq.com') + get_cookies_from_chrome('kg.qq.com')

if not parse_cookies(cookie).get('muid', None):
    print('Waiting for you to log in on the K Song web page...')
    n = 1
    while not parse_cookies(cookie).get('muid', None):
        cookie = get_cookies_from_chrome('qq.com') + get_cookies_from_chrome('kg.qq.com')
        print(f'Checking login status, attempt {n}...', end='\r')
        time.sleep(1)
        n += 1
uid = parse_cookies(cookie)['muid']
print(f'\nUser uid: {uid}')
inp = input('Enter the uid to query, or press Enter to use your own account: ')
if len(inp) > 10:
    uid = inp

# Get all available song information
total = 0  # Total number of songs available
ugc = []  # All song data
user_information = {}  # Basic user information
# Send the raw cookie string as the Cookie header
res = requests.get(f'https://kg.qq.com/node/personal?uid={uid}', headers={"Cookie": cookie})
if res.ok:
    for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
        if "window.__DATA__" in script.text:
            # The page embeds its data as a JSON object assigned to window.__DATA__
            user_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["data"]
            # Without cookies this counts public songs only; with the owner's cookies, every song in the account
            total = user_information["ugc_total_count"]
            print(f'Total songs: {total}')
            os.makedirs(f'{user_information["kgnick"]}_{uid}/media', exist_ok=True)
            num = 15  # The interface returns at most 15 songs per request
            n = 1  # Page number, sent as the start parameter
            while True:
                url = f'http://node.kg.qq.com/cgi/fcgi-bin/kg_ugc_get_homepage?type=get_uinfo&start={n}&num={num}&share_uid={uid}'
                res = requests.get(url, headers={"Cookie": cookie})
                if not res.ok:
                    # Stop instead of retrying the same page forever
                    print(f'Request for page {n} failed, stopping early.')
                    break
                # The response is a callback wrapper, so cut out the JSON object inside it
                song_information = json.loads(res.text[res.text.find('{'): res.text.rfind('}') + 1])["data"]
                if not song_information["ugclist"]:
                    break
                ugc += song_information["ugclist"]
                n += 1
            break
    else:
        print('No songs found!')

if user_information:
    with open(f'{user_information["kgnick"]}_{uid}/{user_information["kgnick"]}_{uid}.json', 'w', encoding='utf-8') as f:
        f.write(json.dumps(ugc, indent=4, ensure_ascii=False))
    for i, song in enumerate(ugc):
        # Read the song link from the data embedded in the play page (this skips the trouble of vkey)
        res = requests.get(f'https://node.kg.qq.com/play?s={song["shareid"]}', headers={"Cookie": cookie})
        if res.ok:
            for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                if "window.__DATA__" in script.text:
                    media_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                    res = requests.get(media_information["playurl"])
                    if res.ok:
                        path = f'{user_information["kgnick"]}_{uid}/media/{song["title"]}_{song["shareid"]}.m4a'
                        print(f'\rDownloading: {path}\n[Current: {str(i + 1).zfill(len(str(total)))}/Total: {total}]', end='')
                        with open(path, 'wb') as f:
                            f.write(res.content)
                    break
            else:
                print('No media links found!')
    print()

# Get album
album_list = {}
res = requests.get(f'https://node.kg.qq.com/cgi/fcgi-bin/fcg_user_album_list?dest_uid={uid}', headers={"Cookie": cookie})
if res.ok:
    album_information = json.loads(res.text[res.text.find('{'): res.text.rfind('}') + 1])["data"]
    if "album_list" in album_information and album_information["album_list"]:
        for album in album_information["album_list"]:
            album_list[album["album_id"]] = {"album_name": album["album_name"], "album_list": []}
            # The album page only needs the album id, passed as the parameter s
            res = requests.get(f'https://node.kg.qq.com/album?s={album["album_id"]}')
            if res.ok:
                for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                    if "window.__DATA__" in script.text:
                        album_list_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                        if album_list_information["ugc_list"] and album["ugc_num"] and len(album_list_information["ugc_list"]) == album["ugc_num"]:
                            os.makedirs(f'{user_information["kgnick"]}_{uid}/{album["album_name"]}_{album["album_id"]}', exist_ok=True)
                            if album["ugc_num"] != album_list_information["ugc_num"]:
                                print('Only 10 album songs can be obtained for now; the rest cannot be fetched')
                            album_list[album["album_id"]]["album_list"] = album_list_information["ugc_list"]
                        else:
                            print('The album has no songs, or not all of its songs could be obtained')
                        break
                else:
                    print('No songs found in the album!')

# Download album
if album_list:
    with open(f'{user_information["kgnick"]}_{uid}/album_list.json', 'w', encoding='utf-8') as f:
        f.write(json.dumps(album_list, indent=4, ensure_ascii=False))
    for album in album_list:
        # Skip albums whose song list could not be obtained (no folder was created for them)
        if album_list[album]["album_list"]:
            total = len(album_list[album]["album_list"])
            album_dir = f'{user_information["kgnick"]}_{uid}/{album_list[album]["album_name"]}_{album}'
            with open(f'{album_dir}/{album_list[album]["album_name"]}_{album}.json', 'w', encoding='utf-8') as f:
                f.write(json.dumps(album_list[album], indent=4, ensure_ascii=False))
            for i, song in enumerate(album_list[album]["album_list"]):
                # Read the song link from the data embedded in the play page (this skips the trouble of vkey)
                res = requests.get(f'https://node.kg.qq.com/play?s={song["ugc_id"]}', headers={"Cookie": cookie})
                if res.ok:
                    for script in BeautifulSoup(res.text, 'lxml').find_all('script'):
                        if "window.__DATA__" in script.text:
                            media_information = json.loads(script.text[script.text.find('{'): script.text.rfind('};') + 1])["detail"]
                            res = requests.get(media_information["playurl"])
                            if res.ok:
                                path = f'{album_dir}/{song["song_name"]}_{song["ugc_id"]}.m4a'
                                print(f'\rDownloading: {path}\n[Current: {str(i + 1).zfill(len(str(total)))}/Total: {total}]', end='')
                                with open(path, 'wb') as f:
                                    f.write(res.content)
                            break
                    else:
                        print('No media links found!')
            print()
input('Download completed! Press Enter to end the program~')

Conclusion

This program can be packaged into a standalone executable with pyinstaller -F qmkg_new.py, so that it can be used anywhere. Ladies who sing well on National K Song are welcome to download it and share their songs with me~

Keywords: Python, crawler

Added by khendar on Sun, 24 Oct 2021 06:11:55 +0300