Hands-on guide to scraping QQ Music data with Python (Part 4)

[1. Project objectives]

In "Hands-on guide to scraping QQ Music data with Python (Part 1)", we scraped the song name, album name, and playback link for a specified number of pages of a singer's songs on QQ Music.

In Part 2, we fetched the lyrics and top comments for a specified song.

In Part 3, we fetched more comments and generated a word cloud.

This time we wrap the three projects into one class and use a menu to choose which data to crawl.

[2. Libraries needed]

The main libraries involved are requests, openpyxl, html, json, wordcloud, and jieba.

numpy and Pillow (pip install pillow) are also needed to use a background picture as the shape of the word cloud.

To generate the .exe, PyInstaller is required (pyinstaller -F).

[3. Implementation]

1. First decide what functions the menu will offer:

(1) Get song information (song title, album, link) for a specified singer

(2) Get the lyrics of a specified song

(3) Get the comments on a specified song

(4) Generate a word cloud image

(5) Exit the system

The code is as follows:

import html

import jieba
import numpy
import openpyxl
import requests
from PIL import Image
from wordcloud import WordCloud


class QQ():

    def menu(self):
        print('Welcome to the QQ Music crawler system. The function menu is below; please make a selection.\n')
        while True:
            try:
                print('Function Menu\n1.Get song information for a specified singer\n2.Get the lyrics of a specified song\n3.Get the comments on a specified song\n4.Generate a word cloud image\n5.Exit the system\n')
                choice = int(input('Please enter a number to select the corresponding function:'))
                if choice == 1:
                    self.get_info()
                elif choice == 2:
                    self.get_id()
                    self.get_lyric()
                elif choice == 3:
                    self.get_id()
                    self.get_comment()
                elif choice == 4:
                    self.wordcloud()
                elif choice == 5:
                    print('Thanks for using!')
                    break
                else:
                    print('Input error, please re-enter.\n')
            except Exception:
                # Catch Exception rather than using a bare except, so Ctrl+C can still stop the program
                print('Input error, please re-enter.\n')

The imports at the top cover every method shown in this article. The line class QQ(): creates the class, and def menu(self): defines the menu method. The program works through an instance of this class, so every method takes self as its first parameter. I find an instance convenient for passing values between methods (for example, get_id() stores self.id for get_lyric() and get_comment() to use).

while True makes the menu loop indefinitely;

try...except keeps the loop from exiting when an error occurs;

the rest of the code calls a different method depending on the number entered.
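
The if/elif chain can also be written as a lookup table that maps each menu number to the methods it should run. What follows is only a sketch of that alternative, not the code used in this article; the names build_actions and run_choice and the stand-in lambdas are made up for illustration.

```python
def build_actions(crawler):
    """Map each menu number to the methods it runs, in order.

    `crawler` is assumed to be an instance of the QQ class from this article.
    """
    return {
        1: [crawler.get_info],
        2: [crawler.get_id, crawler.get_lyric],
        3: [crawler.get_id, crawler.get_comment],
        4: [crawler.wordcloud],
    }


def run_choice(actions, raw_choice):
    """Run the steps for one menu entry; return False if the input is invalid."""
    try:
        choice = int(raw_choice)
    except ValueError:
        return False
    steps = actions.get(choice)
    if steps is None:
        return False
    for step in steps:
        step()
    return True


# Stand-in actions so the sketch can be tried without network access.
calls = []
demo_actions = {
    2: [lambda: calls.append('get_id'), lambda: calls.append('get_lyric')],
}
print(run_choice(demo_actions, '2'), calls)
print(run_choice(demo_actions, 'abc'))
```

With a table like this, adding a menu entry means adding one dictionary item instead of another elif branch.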

2. Wrapping project (1): get_info()

The code is as follows:

    def get_info(self):

        wb = openpyxl.Workbook()
        # Create a workbook
        sheet = wb.active
        # Get the workbook's active sheet
        sheet.title = 'song'
        # Rename the sheet

        sheet['A1'] = 'Song Name'    # Header for column A
        sheet['B1'] = 'Album'        # Header for column B
        sheet['C1'] = 'Play Link'    # Header for column C
        url = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
        name = input('Please enter the name of the singer you want to query:')
        page = int(input('Please enter the number of song pages you need to query:'))
        for x in range(page):
            params = {
                'ct': '24',
                'qqmusic_ver': '1298',
                'new_json': '1',
                'remoteplace': 'sizer.yqq.song_next',
                'searchid': '64405487069162918',
                't': '0',
                'aggr': '1',
                'cr': '1',
                'catZhida': '1',
                'lossless': '0',
                'flag_qc': '0',
                'p': str(x + 1),    # Page number, starting from 1
                'n': '20',          # Songs per page
                'w': name,          # Search keyword: the singer's name
                'g_tk': '5381',
                'loginUin': '0',
                'hostUin': '0',
                'format': 'json',
                'inCharset': 'utf8',
                'outCharset': 'utf-8',
                'notice': '0',
                'platform': 'yqq.json',
                'needNewCode': '0'
            }
            res = requests.get(url, params=params)
            data = res.json()
            # Avoid naming these variables json or list, which would shadow the module and the built-in
            songs = data['data']['song']['list']
            for music in songs:
                song_name = music['name']
                # Find the song name and assign it to song_name
                album = music['album']['name']
                # Find the album name and assign it to album
                link = 'https://y.qq.com/n/yqq/song/' + str(music['mid']) + '.html'
                # Build the playback link and assign it to link
                sheet.append([song_name, album, link])
                # Write song_name, album, and link as one row; append adds a new row on each call

        wb.save(name + ' top ' + str(page * 20) + ' singles list.xlsx')
        # Finally, save and name the Excel file
        print('Download successful!\n')
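
To make the parsing step easier to follow, here is a small sketch that applies the same lookups to a hand-written dictionary instead of a live response. The sample values are invented; only the key path ['data']['song']['list'] and the name, album, and mid fields come from the code above. extract_rows is a hypothetical helper name.

```python
def extract_rows(payload):
    """Turn one search response (already parsed from JSON) into spreadsheet rows."""
    rows = []
    for music in payload['data']['song']['list']:
        link = 'https://y.qq.com/n/yqq/song/' + str(music['mid']) + '.html'
        rows.append([music['name'], music['album']['name'], link])
    return rows


# Invented sample data that mimics the shape used in get_info().
sample = {
    'data': {'song': {'list': [
        {'name': 'Song A', 'album': {'name': 'Album X'}, 'mid': 'abc123'},
        {'name': 'Song B', 'album': {'name': 'Album Y'}, 'mid': 'def456'},
    ]}}
}

for row in extract_rows(sample):
    print(row)
```

Keeping the parsing in its own function like this also makes it possible to check the logic without sending any requests.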

3. Wrapping project (2): get_id() and get_lyric()

The code is as follows:

    def get_id(self):

        self.i = input('Please enter the song name:')
        url_1 = 'https://c.y.qq.com/soso/fcgi-bin/client_search_cp'
        # This is the search URL used to look up the song
        headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        params = {'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1', 'remoteplace': 'txt.yqq.song', 'searchid': '71600317520820180', 't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1', 'lossless': '0', 'flag_qc': '0', 'p': '1', 'n': '10', 'w': self.i, 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8', 'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'}
        res_music = requests.get(url_1, headers=headers, params=params)
        json_music = res_music.json()
        # Keep the id of the first search result for get_lyric() and get_comment()
        self.id = json_music['data']['song']['list'][0]['id']
        # print(self.id)
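
get_id() takes the first search result, so a search that returns nothing would raise an IndexError at ['list'][0]. One possible guard is sketched below. first_song_id is a hypothetical helper and the sample dictionaries are invented; how the real API responds to an empty search is an assumption here.

```python
def first_song_id(payload):
    """Return the id of the first search hit, or None when there is none."""
    # .get() with defaults tolerates missing keys as well as an empty list.
    songs = payload.get('data', {}).get('song', {}).get('list', [])
    if not songs:
        return None
    return songs[0]['id']


# Invented sample responses: one with a hit, one without.
found = {'data': {'song': {'list': [{'id': 12345, 'name': 'Song A'}]}}}
empty = {'data': {'song': {'list': []}}}

print(first_song_id(found))
print(first_song_id(empty))
```

The caller can then print a "no such song" message and return to the menu instead of falling into the generic error branch.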

    def get_lyric(self):
        url_2 = 'https://c.y.qq.com/lyric/fcgi-bin/fcg_query_lyric_yqq.fcg'
        # This is the URL for requesting the song lyrics
        headers = {
            'origin': 'https://y.qq.com',
            'referer': 'https://y.qq.com/n/yqq/song/001qvvgF38HVc4.html',
            'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        params = {
            'nobase64': '1',
            'musicid': self.id,    # The song id stored by get_id()
            '-': 'jsonp1',
            'g_tk': '5381',
            'loginUin': '0',
            'hostUin': '0',
            'format': 'json',
            'inCharset': 'utf8',
            'outCharset': 'utf-8',
            'notice': '0',
            'platform': 'yqq.json',
            'needNewCode': '0',
            }
        res_music = requests.get(url_2, headers=headers, params=params)
        js_1 = res_music.json()
        lyric = js_1['lyric']
        lyric_html = html.unescape(lyric)   # Convert HTML character references back to ordinary characters
        # print(lyric_html)
        with open(self.i + 'Lyric.txt', 'a', encoding='utf-8') as f1:    # Save to a txt file
            f1.write(lyric_html)
        print('Download successful!\n')

Note in particular that 'origin' and 'referer' must be included in the headers of the lyrics request; otherwise the data cannot be crawled.
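
get_lyric() also passes the lyric text through html.unescape, which turns HTML character references back into ordinary characters. A tiny illustration with an invented string (the real response may use different references):

```python
import html

# Invented example text: &#10; is a line feed, &#39; an apostrophe, &#32; a space.
raw = '[00:01.00]I&#39;m&#32;here&#10;[00:05.00]Still&#32;here'

clean = html.unescape(raw)
print(clean)
```

After unescaping, each timestamped line of the invented sample sits on its own line, which is the form you would want in the saved txt file.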

4. Wrapping project (3): get_comment() and wordcloud()

The code is as follows:

    def get_comment(self):

        page = input('Please enter the number of comment pages to download:')
        url_3 = 'https://c.y.qq.com/base/fcgi-bin/fcg_global_comment_h5.fcg'
        headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        with open(self.i + 'comment.txt', 'a', encoding='utf-8') as f2:    # Save to a txt file
            for n in range(int(page)):
                # pagenum starts at 0; topid is the song id stored by get_id()
                params = {'g_tk_new_20200303': '5381', 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8', 'outCharset': 'GB2312', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0', 'cid': '205360772', 'reqtype': '2', 'biztype': '1', 'topid': self.id, 'cmd': '6', 'needmusiccrit': '0', 'pagenum':n, 'pagesize': '15', 'lasthotcommentid':'', 'domain': 'qq.com', 'ct': '24', 'cv': '10101010'}
                res_music = requests.get(url_3, headers=headers, params=params)
                js_2 = res_music.json()
                comments = js_2['comment']['commentlist']
                for i in comments:
                    comment = i['rootcommentcontent'] + '\n-----------------\n'
                    f2.write(comment)
                # print(comment)
        print('Download successful!\n')
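
The inner loop of get_comment() concatenates rootcommentcontent with a separator. If a comment ever lacks that field or carries None (an assumption; the article does not say whether this happens), the concatenation would raise a TypeError. A defensive version of that step might look like this; format_comments and the sample payload are illustrative only.

```python
SEPARATOR = '\n-----------------\n'


def format_comments(payload):
    """Join the comment texts of one response page, skipping empty entries."""
    comment_list = payload.get('comment', {}).get('commentlist') or []
    texts = []
    for item in comment_list:
        content = item.get('rootcommentcontent')
        if content:  # skip None and empty strings
            texts.append(content + SEPARATOR)
    return ''.join(texts)


# Invented sample shaped like the response used in get_comment().
page = {'comment': {'commentlist': [
    {'rootcommentcontent': 'Great song'},
    {'rootcommentcontent': None},
    {'rootcommentcontent': 'On repeat all day'},
]}}

print(format_comments(page))
```

The returned string can be written to the comment file in a single call for each page.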

    def wordcloud(self):
        self.name = input('Enter the name of the file to generate the word cloud from (without .txt):')
        def cut(text):
            # Segment the text with jieba and join the words with spaces
            wordlist_jieba = jieba.cut(text)
            space_wordlist = " ".join(wordlist_jieba)
            return space_wordlist
        with open(self.name + ".txt", encoding="utf-8") as file:
            text = file.read()
        text = cut(text)
        mask_pic = numpy.array(Image.open("heart.png"))    # Background picture that shapes the cloud
        wordcloud = WordCloud(font_path="C:/Windows/Fonts/simfang.ttf",
                              collocations=False,
                              max_words=100,
                              min_font_size=10,
                              max_font_size=500,
                              mask=mask_pic).generate(text)
        wordcloud.to_file(self.name + 'Word Cloud.png')    # Save the word cloud image
        print('Build succeeded!\n')
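
Before rendering, it can help to check which words will dominate the picture. The sketch below counts words in the space-separated text that cut() returns, using only the standard library. It approximates what the word cloud draws rather than reproducing WordCloud's own counting, which applies its own tokenizing and stop-word rules. top_words is a hypothetical helper and the sample text is invented.

```python
from collections import Counter


def top_words(space_separated_text, limit=5, min_length=2):
    """Return the most frequent words of at least min_length characters."""
    words = [w for w in space_separated_text.split() if len(w) >= min_length]
    return Counter(words).most_common(limit)


# Invented stand-in for the output of cut(): tokens joined by spaces.
segmented = 'great great a song great love love on repeat'

print(top_words(segmented, limit=2))
```

If the top entries are filler words, it may be worth cleaning the text before generating the image.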

5. Finally, instantiate the class

qq = QQ()
qq.menu()

6. Results

[10 screenshots of the program running]

7. Package into .exe

After packaging with pyinstaller -F, the program reports an error when run and the window flashes closed.

image
Judging from the error message in the image above, the failure appears to be related to the word cloud. If you comment out the libraries needed for the word cloud and modify def wordcloud() as shown in the image below, the program packages normally, but it can no longer generate word cloud images:

image
image
image

When you download lyrics or comments and several songs share the same name, put the singer's name in front of the song title, as with "Dun Purple Bubble" in the picture above.
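
For the packaging problem described above, a gentler alternative to commenting out the word cloud code might be to check at startup whether its libraries load, and disable only that menu entry when they do not. This is an untested idea for the PyInstaller case, not what the article did; libraries_available is a hypothetical helper.

```python
import importlib


def libraries_available(module_names):
    """Return True only if every named module can be imported."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception:
            # Covers ImportError as well as errors raised while a module loads.
            return False
    return True


# Modules the word cloud feature relies on, as listed in this article.
WORDCLOUD_MODULES = ['wordcloud', 'jieba', 'numpy', 'PIL']

if libraries_available(WORDCLOUD_MODULES):
    print('Word cloud feature enabled.')
else:
    print('Word cloud libraries missing; menu item 4 is disabled.')
```

For this to help, the word cloud imports would also have to move from the top of the file into wordcloud() itself, so that a failure there cannot stop the rest of the program from starting.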

[4. Summary]

1. Project 4 revisits the first three projects, consolidating what we learned about crawlers and reviewing how to use classes along the way.

2. For the first three projects, see the earlier articles in this series: Hands-on guide to scraping QQ Music data with Python (Part 1), (Part 2), and (Part 3).

3. Thank you for reading. Writing more than a hundred lines of code was not easy. I wish you all success in your studies and smooth sailing at work!

4. If you need the source code for this article, reply "QQ Music" in the official account's chat window to get it. If you found it helpful, remember to give it a star.

Keywords: Python JSON Windows encoding

Added by Stoned Gecko on Sun, 26 Apr 2020 20:29:49 +0300