Crawling Music Resources

I've long been frustrated by the lack of good music resources. Python is extremely good at network crawling, and with just a little Python knowledge you can write a script to crawl some mp3 files. Enough talk, let's see the result first.

/home/roland/PycharmProjects/Main.py
please input the artist name: billie eilish

Process finished with exit code 0

After running, we are prompted to enter the name of the artist to crawl. Fill in an artist you like and press Enter (Chinese artist names are supported too). The script does not print any log output; it saves the mp3 files directly to the save_path directory, as shown in the following figure:

The Python version used here is 3.6. Only the standard library is used, so in theory it runs directly without installing any additional request libraries.
Code analysis

  • The following function assembles the data sent with the POST request
  • To find it: open the browser console -> Network tab -> send a request -> inspect the POST data

Take Chrome as an example.
Press F12 to open the console.
Then locate Form Data step by step as shown in the figure below; that is what you need. Of course, not every request carries form data. The point of doing all this is to imitate the way a browser accesses a web page.

Then the required fields are added to the request body one by one, where pages and content are dynamic variables (the original page is loaded asynchronously via ajax); content is the artist to search for.

def resemble_data(content, index):
    # assemble the POST form data for the search request
    data = {}
    data['types'] = 'search'
    data['count'] = '30'        # results per page
    data['source'] = 'netease'
    data['pages'] = index       # page number (dynamic)
    data['name'] = content      # artist to search for
    data = urllib.parse.urlencode(data).encode('utf-8')
    return data
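As a quick check (a minimal sketch of my own, not part of the original script), the encoded body produced by this function should look roughly like this:

body = resemble_data('billie eilish', '1')
print(body)
# b'types=search&count=30&source=netease&pages=1&name=billie+eilish'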

In addition, we need to set the User-Agent; the corresponding code is:

opener.addheaders = [('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36')]

Where does this thing come from?

We simply copy it from the browser's console, so that Python's request is disguised as ordinary browser access.

    # set up the HTTP proxy (note: the ProxyHandler key is the scheme 'http', not 'http:')
    proxy_support = urllib.request.ProxyHandler({'http': '119.6.144.73:81'})
    opener = urllib.request.build_opener(proxy_support)
    urllib.request.install_opener(opener)
    # end of proxy setup

Here we set a proxy IP to work around the server's anti-crawler mechanism (an IP that sends requests too frequently will be treated as a crawler rather than a human visitor). The address above is only an example; we can also crawl a list of free proxy IP addresses and ports and rotate through them, so that requests appear to come from many different IPs and are less likely to be flagged as a crawler.
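To check that the proxy is actually being used, one option (a sketch of my own, assuming the proxy above is still alive; httpbin.org/ip simply echoes the IP it sees) is:

import urllib.request

proxy_support = urllib.request.ProxyHandler({'http': '119.6.144.73:81'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
# with a working proxy installed, the echoed origin should be the proxy's IP
print(urllib.request.urlopen('http://httpbin.org/ip').read().decode('utf-8'))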
Here's an example of crawling replacement proxy IPs (skip it if you're not interested); a short sketch of plugging the crawled IPs back into ProxyHandler follows the example.

import urllib.request
import urllib.parse
import re

url = 'http://31f.cn/'
head = {}
head['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'
# pass the header along so the request looks like it comes from a browser
request = urllib.request.Request(url, headers=head)
response = urllib.request.urlopen(request)
html_document = response.read().decode('utf-8')
# match the IP address column and the following port column of the proxy table
pattern_ip = re.compile(r'<td>(\d+\.\d+\.\d+\.\d+)</td>[\s\S]*?<td>(\d{2,4})</td>')
ip_list = pattern_ip.findall(html_document)
print(len(ip_list))
for item in ip_list:
    print("IP address: %s  port: %s" % (item[0], item[1]))


The response here returns the link to a music file, in a format similar to xxxxuuid.mp3. Instead of keeping the default uuid.mp3 name, we name the file after the song (song name.mp3) and then write it out in binary mode.

    data = {}
    data['types'] = 'url'
    data['id'] = id
    data['source'] = 'netease'
    data = urllib.parse.urlencode(data).encode('utf-8')
    response = urllib.request.urlopen(url, data)
    music_url_str = response.read().decode('utf-8')
    # extract the mp3 link from the response
    music_url = pattern.findall(music_url_str)
    result = urllib.request.urlopen(music_url[0])
    # save the stream under the song's own name, in binary mode
    file = open(save_path+name+'.mp3', 'wb')
    file.write(result.read())
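One caveat (my own note, not in the original code): song names can contain characters such as '/' that are invalid inside a file path. A minimal sketch of sanitizing the name before writing, assuming we simply replace the offending characters with underscores:

import re

def safe_filename(name):
    # replace path separators and other characters that are awkward in file names
    return re.sub(r'[\\/:*?"<>|]', '_', name)

# file = open(save_path + safe_filename(name) + '.mp3', 'wb')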

As for the request URL, you can get it in the same place (of course, this is just an illustration; the URL shown here is not the one used by the example):

The following is the complete code. Change save_path = '/home/roland/Spider/Img/' to your own save path for the music files.

import urllib.request
import urllib.parse
import json
import re


def resemble_data(content, index):
    data = {}
    data['types'] = 'search'
    data['count'] = '30'
    data['source'] = 'netease'
    data['pages'] = index
    data['name'] = content
    data = urllib.parse.urlencode(data).encode('utf-8')
    return data


def request_music(url, content):
    # set proxy agent
    proxy_support = urllib.request.ProxyHandler({'http': '119.6.144.73:81'})
    opener = urllib.request.build_opener(proxy_support)
    opener.addheaders = [('User-Agent','Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36')]
    urllib.request.install_opener(opener)
    # set proxy agent
    total = []
    # the API returns JSONP (jQueryxxx(...)); grab the JSON inside the parentheses
    pattern = re.compile(r'\(([\s\S]*)\)')
    for i in range(1, 10):
        data = resemble_data(content, str(i))
        response = urllib.request.urlopen(url, data)
        result = response.read().decode('unicode_escape')
        json_result = pattern.findall(result)
        total.append(json_result)
    return total


def save_music_file(id, name):
    save_path = '/home/roland/Spider/Img/'
    pattern = re.compile('http.*?mp3')
    url = 'http://www.gequdaquan.net/gqss/api.php?callback=jQuery111307210973120745481_1533280033798'
    data = {}
    data['types'] = 'url'
    data['id'] = id
    data['source'] = 'netease'
    data = urllib.parse.urlencode(data).encode('utf-8')
    response = urllib.request.urlopen(url, data)
    music_url_str = response.read().decode('utf-8')
    music_url = pattern.findall(music_url_str)
    result = urllib.request.urlopen(music_url[0])
    file = open(save_path+name+'.mp3', 'wb')
    file.write(result.read())
    file.flush()
    file.close()


def main():
    url = 'http://www.gequdaquan.net/gqss/api.php?callback=jQuery11130967955054499249_1533275477385'
    content = input('please input the artist name:')
    result = request_music(url, content)
    # each entry in result corresponds to one page of search results
    for page in result:
        for group in page:
            target = json.loads(group)
            for item in target:
                save_music_file(str(item['id']), str(item['name']))


main()
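Finally, one optional hardening step (a sketch of my own, not part of the original script): a single failed download currently aborts the whole run, so wrapping the call in a try/except lets the crawl continue past broken links:

for item in target:
    try:
        save_music_file(str(item['id']), str(item['name']))
    except Exception as e:
        # skip tracks whose download link cannot be resolved
        print('failed to download %s: %s' % (item['name'], e))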
