Python 3 data visualization: visualizing nearly 60000 IP addresses by cracking the IP query interface (with source code)


This time, Python is used to write a crawler and store the crawled data in a database. Finally, pyecharts is used to draw charts that visualize 58677 IP addresses, covering IP address classification, the geographic distribution of IP addresses, and the distribution of IP address operators.

1, Preparatory work

Everything here is done in Python. My environment:

  1. Python 3.8
  2. JetBrains PyCharm 2018.2.2 x64
  3. pyecharts 1.9.0

2, Train of thought

1. Overall idea

2. Crawler ideas


In fact, the extracted data could be stored in the database directly. Here, I first wrote it to a txt file, then read that file and wrote it to Excel, and finally imported it into MySQL by hand, which is more steps than necessary.
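As a rough illustration of storing crawl results directly, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs without a server. The table name ip_count comes from the analysis script, while the column names and the helper save_rows are assumptions made for this example.

```python
import sqlite3

def save_rows(conn, rows):
    """Insert (ip, location, isp) rows, skipping IPs that are already stored."""
    # Column names are hypothetical; the article does not list them
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ip_count ("
        "ip TEXT PRIMARY KEY, location TEXT, isp TEXT)"
    )
    # INSERT OR IGNORE drops rows whose primary key (the IP) already exists,
    # so deduplication happens at insert time
    conn.executemany(
        "INSERT OR IGNORE INTO ip_count (ip, location, isp) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

# In-memory database for illustration only; the article itself uses MySQL
conn = sqlite3.connect(":memory:")
save_rows(conn, [
    ("192.0.2.1", "place-a", "isp-a"),
    ("192.0.2.1", "place-a", "isp-a"),   # duplicate IP, ignored
    ("198.51.100.7", "place-b", "isp-b"),
])
stored = conn.execute("SELECT COUNT(*) FROM ip_count").fetchone()[0]
print(stored)
```

With MySQL through pymysql the same idea would presumably use `INSERT IGNORE` and `%s` placeholders, which removes the txt and Excel steps altogether.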

3, Effect display

1. Database

1.1 data sheet
The data table stores the IP, its location, and its operator (ISP).

1.2 data volume
A total of 58677 IP address records are stored (after deduplication).

2.IP address classification analysis - pie chart


You can see that class A addresses account for a large proportion.
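The classification behind this pie chart depends only on the first octet of each address. A minimal sketch of that rule, assuming dotted IPv4 strings as input (the function name ip_class and the sample addresses are made up for illustration):

```python
from collections import Counter

def ip_class(ip):
    """Return the classful category (A to E) of a dotted IPv4 string."""
    first = int(ip.split(".")[0])
    if first <= 127:
        return "A"   # 0-127
    if first <= 191:
        return "B"   # 128-191
    if first <= 223:
        return "C"   # 192-223
    if first <= 239:
        return "D"   # 224-239, multicast
    return "E"       # 240-255, reserved

sample = ["10.1.2.3", "172.16.0.1", "192.168.1.1", "224.0.0.1", "250.1.1.1", "8.8.8.8"]
class_counts = Counter(ip_class(ip) for ip in sample)
print(class_counts)
```

The resulting counts can be turned into (name, count) pairs with `class_counts.items()`, which is the shape the pie chart function expects.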

3.IP address distribution visualization - Map


The IP address location information is marked on the map (IP addresses outside China are not analyzed). Shandong has the most IP addresses, 12805 in total, followed by Guangdong and Hong Kong.

4.IP address distribution analysis - pie chart

Can't tell from the map that Guangdong and Hong Kong account for a large share? This chart makes it clearer.

5.IP address distribution visualization - bar chart


The bar chart shows the count for each region clearly.

6.IP address operator proportion analysis - pie chart


From the pie chart above, we can see that the education network accounts for the largest share, followed by Telecom and Alibaba Cloud.
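The ranking shown in the chart comes down to counting how often each operator appears and keeping the largest groups. A minimal sketch using collections.Counter, assuming rows shaped like (ip, location, isp) as in the data table. The helper top_isps and the sample rows are hypothetical.

```python
from collections import Counter

def top_isps(rows, n=8):
    """Return the n most frequent ISPs as (isp, count) pairs, largest first."""
    # Skip rows where the ISP column is empty
    isps = [row[2] for row in rows if row[2] is not None]
    return Counter(isps).most_common(n)

rows = [
    ("192.0.2.1", "place-a", "isp-a"),
    ("192.0.2.2", "place-a", "isp-b"),
    ("192.0.2.3", "place-b", "isp-a"),
    ("192.0.2.4", "place-b", None),
]
top = top_isps(rows)
print(top)
```

Counter starts every key at its first occurrence with a count of 1, which avoids the off-by-one that a hand-written dictionary counter can introduce.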

7. Visualization of IP address operator proportion - bar chart


The bar chart represents counts by the length of each bar. Here it is used to analyze and display the number of IP addresses per operator.

4, Source code

1.get_ip_infos.py (IP address information crawler)

#coding:utf-8
import requests
import json
import time
import re

import xlwt

"""
The resource_id parameter is important
"""

class IP_ana:

    def read_txt(self,txt_file)->list:
        """
        Read IP addresses from the file and strip the trailing line breaks
        :param txt_file:
        :return:
        """
        with open(txt_file,'r',encoding="utf-8")as f:
            data=[ip.strip() for ip in f.readlines()]
            return data

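    def fmt_ip(self,ip):
        """
        Format the IP address and remove the port number
        :param ip: IP address to process, optionally followed by ":port"
        :return: the IPv4 address string, or False if it is not valid IPv4
        """
        # Raw string so that \d and \. reach the regex engine unchanged
        regx=r"(([01]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])\.){3}([01]{0,1}\d{0,1}\d|2[0-4]\d|25[0-5])"
        if ":" in ip:
            aim_ip=ip.split(":")[0]
        else:
            aim_ip=ip
        # fullmatch rejects trailing garbage such as "1.2.3.4abc", which match would accept
        if re.fullmatch(regx,aim_ip):
            return aim_ip
        else:
            return False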

    def do_request(self,ip):
        """
        Query the interface
        :param ip: IP address passed as the query parameter of the URL
        :return: response text, or False if the request failed
        """
        full_url=f"https://sp1.baidu.com/8aQDcjqpAAV3otqbppnN2DJv/api.php?query={ip}&co=&resource_id=5809"
        headers={
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36"
        }
        try:
            # A timeout keeps one unresponsive request from stalling the whole crawl
            r=requests.get(full_url,headers=headers,timeout=10)
            if r.status_code==200:
                html=r.text.encode('utf-8').decode('unicode_escape')
                # print(html)
                return html
            # Any non-200 status counts as a failure
            return False
        except (requests.RequestException,UnicodeDecodeError):
            return False

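    def get_ip_attribute(self,html):
        """
        Get the IP location and ISP
        :param html: JSON text returned by do_request
        :return: dict with 'location' and 'ISP' keys, or False if parsing failed
        """
        try:
            item={}
            _json=json.loads(html)
            data=_json.get("data")
            # The "location" field is expected to be "<place> <ISP>" separated by one space
            item['location'],item['ISP']=data[0].get("location").split(" ")
            return item
        except Exception:
            # Invalid JSON, missing fields or an unexpected format all count as a parse failure
            return False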

    def save_result(self,data)->None:
        """
        Store crawl results
        :param data:
        :return:
        """
        with open(aim_txt,'a',encoding='utf-8')as f:
            f.write(data+'\n')

    def write_to_excel(self,all_data)->None:
        """
        Write the results to Excel
        :param all_data:
        :return:
        """
        workbook = xlwt.Workbook()
        worksheet = workbook.add_sheet('sheet', cell_overwrite_ok=True)
        headers = ['IP', 'Place of ownership', 'ISP', ]
        worksheet.write(0, 0, headers[0])
        worksheet.write(0, 1, headers[1])
        worksheet.write(0, 2, headers[2])
        for index, data in enumerate(all_data):
            worksheet.write(index + 1, 0, data[0])
            worksheet.write(index + 1, 1, data[1])
            worksheet.write(index + 1, 2, data[2])
        workbook.save(excel_path)


if __name__ == '__main__':
    a=IP_ana()
    all_ips=a.read_txt("test_ip.txt")
    aim_txt="./2021-11-2_test_result.txt"
    excel_path = "./combine_result.xls"
    for ip in all_ips:
        ip_fmt=a.fmt_ip(ip)
        if ip_fmt:
            ip_infos=a.do_request(ip_fmt)
            if ip_infos:
                item=a.get_ip_attribute(ip_infos)
                if item:
                    data=ip_fmt+":\t"+item['location']+"\t"+item["ISP"]
                    print(data)
                    a.save_result(data)
                else:
                    print(ip_fmt + ":\t" + "Parsing failed!")
            else:
                print((ip_fmt + ":\t" + "Failed to get information!"))
        else:
            print((ip + ":\t" + "Not standard IPv4 Format!"))
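    # Read the txt file and write it to Excel
    ip_data=a.read_txt(aim_txt)
    excel_data_list=[]
    for ip_data_ in ip_data:
        # Each line looks like "ip:\tlocation\tISP"; keep the row as a flat list
        ip_data_list=ip_data_.strip().split("\t")
        if len(ip_data_list)!=3:
            continue
        # Drop the trailing colon that was written after the IP
        ip_data_list[0]=ip_data_list[0].rstrip(":")
        excel_data_list.append(ip_data_list)
    a.write_to_excel(excel_data_list)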
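
The port stripping and validation done in fmt_ip can also be sketched with the standard-library ipaddress module instead of a regular expression. This is an alternative for comparison, not what the script above does, and the function name strip_port_and_validate is made up for this example.

```python
import ipaddress

def strip_port_and_validate(raw):
    """Return the IPv4 part of 'a.b.c.d' or 'a.b.c.d:port', or None if invalid."""
    # Everything after the first colon is treated as a port and discarded
    host = raw.strip().split(":")[0]
    try:
        return str(ipaddress.IPv4Address(host))
    except ipaddress.AddressValueError:
        return None

print(strip_port_and_validate("192.0.2.10:8080"))
print(strip_port_and_validate("256.1.1.1"))
```

One difference to be aware of: the regular expression accepts octets with leading zeros such as "001", while recent Python versions of ipaddress reject them, so the two approaches may disagree on such input.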

2.analysis_ip_infos.py (visual analysis of IP address data)

import pymysql
from pyecharts.charts import Map, Bar
from pyecharts.charts import Pie
from pyecharts import options as opts
from pyecharts.globals import ThemeType

"""
Charts produced:
map, pie charts, bar charts
"""

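def get_data_from_mysql():
    """
    Get data from the database
    :return: all rows of the ip_count table, or an empty tuple on error
    """
    conn=None
    try:
        conn=pymysql.connect(
            host='127.0.0.1',
            port=3306,
            user='root',
            password='root',
            db='ip_count',
            charset='utf8'
        )
        # The cursor is closed automatically when the with block ends
        with conn.cursor() as cursor:
            sql="select * from ip_count"
            cursor.execute(sql)
            data=cursor.fetchall()
        return data
    except pymysql.Error:
        print("Database operation error!")
        return ()
    finally:
        # conn stays None if connect() itself failed
        if conn is not None:
            conn.close()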


def sort_ip(data):
    # Count addresses per class, decided by the first octet
    sort_result={}
    sort_result["A class"]=0
    sort_result["B class"]=0
    sort_result["C class"]=0
    sort_result["D class"]=0
    sort_result["E class"]=0
    for ip in data:
        first_num=int(ip.split('.')[0])
        if 0<=first_num<=127:
            sort_result["A class"]+=1
        elif  128<=first_num<=191:
            sort_result["B class"]+=1
        elif  192<=first_num<=223:
            sort_result["C class"]+=1
        elif  224<=first_num<=239:
            sort_result["D class"]+=1
        elif  240<=first_num<=255:
            # Class E covers 240-255, not just 240-247
            sort_result["E class"]+=1
        else:
            print(ip)
    return sort_result

def sort_provinces(data):
    """
    Classify and sort the provincial information
    :param data:
    :return:
    """
    sort_result_item={}
    Province_34 = ['Beijing', 'Shanghai', 'Hainan', 'Guizhou', 'Hubei', 'Chongqing', 'Jiangsu', 'Anhui', 'Macao Special Administrative Region', 'Sichuan', 'Jiangxi', 'Zhejiang', 'Qinghai', 'Henan',
                   'Tianjin', 'Taiwan', 'Hunan', 'Shaanxi', 'Heilongjiang', 'Guangdong', 'Hong Kong', 'Hebei', 'Liaoning', 'Fujian', 'Guangxi', 'Tibet', 'Inner Mongolia', 'Xinjiang', 'Yunnan',
                   'Gansu', 'Ningxia', 'Shanxi', 'Shandong', 'Jilin']
    provinces=[line[1] for line in data]
    for line in provinces:
        if line !=None:
            for p in Province_34:
                if p in line:
                    if p in sort_result_item.keys():
                        sort_result_item[p]+=1
                    else:
                        # First occurrence counts as 1, not 0
                        sort_result_item[p]=1
    sort_result=sorted(sort_result_item.items(),key=lambda x:x[1],reverse=True)
    return sort_result

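def sort_ISP_data(data):
    """
    Count the ISP information by category and sort by count
    :param data: rows from the database; the ISP is the third column
    :return: list of (ISP, count) tuples, largest count first
    """
    data=[line[2] for line in data if line[2]!=None]
    item={}
    for isp in data:
        if isp in item.keys():
            item[isp]+=1
        else:
            # First occurrence counts as 1, not 0
            item[isp] = 1
    item_sorted=sorted(item.items(),key=lambda x:x[1],reverse=True)
    return item_sorted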

def draw_map(data):
    """
    Map the classified province information
    :param data:
    :return:
    """
    area1=[d[0] for d in data]
    area2=[d[1] for d in data]
    aim_num=max(area2)
    num_max_pos = len(str(aim_num)) - 2
    mid_num = divmod(aim_num, int("1" + "0" * num_max_pos))
    # Convert to int so that max_ receives a number rather than a string
    res_num = int(str(mid_num[0] + 1) + "0" * (num_max_pos))
    map =\
        Map(init_opts=opts.InitOpts(width="1000px", height="600px"))\
            .add("",[list(z) for z in zip(area1, area2)], 'china')\
            .set_global_opts(title_opts=opts.TitleOpts(title="IP Address distribution visualization-Map")
    ,visualmap_opts = opts.VisualMapOpts(max_=res_num, split_number=8, is_piecewise=True,precision=0))
    map.render("./IP Address distribution visualization-Map.html")

def draw_line(data,type_):
    """
    Draw the ISP or province information as a bar chart
    :param data: list of (name, count) tuples
    :param type_: label inserted into the chart title and file name
    :return:
    """
    bar = (
        Bar(init_opts=opts.InitOpts(theme=ThemeType.LIGHT)) #Use theme
            .add_xaxis([d[0] for d in data])
            .add_yaxis("Count",[d[1] for d in data])
            .set_global_opts(
            title_opts=opts.TitleOpts(title=f"IP address{type_}visualization-Bar chart"),
            datazoom_opts=opts.DataZoomOpts(),
        )
    )
    bar.render(f"IP address{type_}visualization-Bar chart.html")

def draw_pie(data,type_):
    """
    Draw the ISP or province information as a pie chart
    :param data:
    :return:
    """
    c = (
        Pie()
            .add(f"IP address{type_}visualization", data,color = "green",rosetype = "radius")
            .set_colors(["lightblue", "orange", "yellow", "blue", "pink", "green", "purple", "black"])
            .set_global_opts(title_opts=opts.TitleOpts(title=f"IP address{type_}visualization-Pie chart"),legend_opts=opts.LegendOpts(
            orient="vertical", #Legend placed vertically
            pos_top="15%",# Legend position adjustment
            pos_left="2%"),
    )
            .set_series_opts(label_opts=opts.LabelOpts(formatter="{b} : {c} ({d}%)"))
            .render(f"IP address{type_}analysis-Pie chart.html")
    )

if __name__ == '__main__':
    data=get_data_from_mysql()
    ip_data=[line[0] for line in  data]
    sort_ip_type_data=[(k,v) for k,v in sort_ip(ip_data).items()]
    draw_pie(sort_ip_type_data,"classification")

    ip_sort_res=sort_provinces(data)
    draw_map(ip_sort_res)
    draw_pie(ip_sort_res[:8],"distribution")
    draw_line(ip_sort_res[:8],"distribution")

    isp_sort_res=sort_ISP_data(data)
    draw_pie(isp_sort_res[:8],"Proportion of operators")
    draw_line(isp_sort_res[:8],"Proportion of operators")
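
The upper bound that draw_map computes for the visual map (rounding the largest count up so that only its two leading digits are non-zero) can be sketched with integer arithmetic instead of string concatenation. The function name visual_map_max is an assumption for this example. Unlike draw_map, which always adds one step, this sketch leaves a value unchanged when it is already a round number.

```python
def visual_map_max(largest):
    """Round up so that only the two leading digits are non-zero."""
    digits = len(str(largest))
    # Step size is 10 to the power of (digits - 2), and at least 1
    step = 10 ** max(digits - 2, 0)
    # -(-a // b) is ceiling division on integers
    return -(-largest // step) * step

print(visual_map_max(12805))
```

For the Shandong count of 12805 this yields 13000, the same bound that draw_map arrives at.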

5, Summary

This time, Python's pyecharts is used to draw pie charts, bar charts and maps to visually display the distribution of IP addresses and operators. The amount of code is small: pyecharts wraps the chart rendering, so a chart takes only a few calls to its interface. I hope you will try it yourself. If you spot shortcomings in the approach or the code, corrections and criticism are welcome! If you found this useful, feel free to give it a like.

Keywords: Python data visualization echarts

Added by Earnan on Sun, 07 Nov 2021 05:39:08 +0200