Graduation Design of Data Visualization Direction--A Visual Analysis System of Recruitment Information and Rental Data Based on Python Crawler

It has been more than ten days since my undergraduate defense passed smoothly. I have decided to make a summary of the last small achievements of my undergraduate course and share them with the small partners who want to do the project in this direction. I hope you want to have projects for reference and ideas for implementation for learning when you do the practical work. Demo Video Show you first.

I. Purpose and Significance of Research

(1) Status quo

  1. Graduates pay attention to the key and difficult points: finding a job + renting a house
  2. The recruitment website is complex: pull hook net, BOSS direct employment, no worry about the future, etc.
  3. Maturity of employment information websites at various universities
  4. There are many rental websites: Chain Home, I love my home, etc.

(2) Disadvantages

  1. Information only, single function
  2. The information is scattered and the overall situation is unknown.
  3. Text and numeric forms are not intuitive
  4. Recruitment is not associated with renting

(3) Improvement

  1. Integrate information, statistics
  2. Partition Data Visualization
  3. Rich Chart Rendering
  4. Set recruitment rental in one

Therefore, there is an urgent need for a platform that can integrate as much information as possible, and this platform needs strong statistical data and data visualization capabilities, so that users can retrieve recruitment information and housing source information through this platform, and understand the overall situation through chart visualization. For the growing employment force each year, we can clearly understand the current status of Internet industry and rental in first-line cities, new-line cities and second-line cities from the system, which can help us make the right choice for our own situation.

2. Implementation ideas and related technologies

Realization of Front-end and Back-end Data Interaction--ajax Technology

Passing parameters through ajax is an intermediate layer between users and servers, which asynchronizes user actions and server responses. The front end converts the parameters it needs to pass into a JSON string (json.stringify), then sends a request to the server via get/post and passes the parameters directly to the back end, which responds to the front end requests and receives data. The data is queried as a condition, and the result set of query in JSON string format is returned to the front end, which receives the data returned in the background to make conditional judgment and make corresponding page display.

Differences in how get/post requests are requested

They all submit data to the server, and they all get data from the server. Get request, browser will send the response header and data together, server response 200 indicates the request is successful, return data. Post-style requests, the browser will first send a response header, the server responds 100, the browser then sends the data body, the server responds 200, the request succeeds, and the data is returned. Post is safer.

When do I use get and when do I use post requests?

Sign-in registration, modify information section use post mode, data overview display, visual display section use get mode. That is, data query uses get mode, and data addition, deletion and modification uses post mode to be more secure.

Data Visualization Chart Display - ECharts Chart Gallery (with all the chart generation code you want to support online debugging, great looking charts, this website is absolutely absolute. Share it with everyone)

Website: ECharts Open Source Visual Chart Library

3. System Overall Functional Framework

4. Detailed implementation

(1) Data acquisition

1. Get Recruitment Information

Because of the strong anti-crawling mechanism of the pull-hook network, the header information is encapsulated using user-agent and cookies, the crawler is disguised as a browser to access the web page, the request is made through the post method of the requests package, the request is successfully returned to the json format string, and the data is read directly using the dictionary method to get the information about the python position we want. You can read the total number of positions, through the total number of positions and the number of positions that can be displayed on each page. We can calculate the total number of pages, then use a circular page-by-page crawl to summarize the job information and write it to a CSV-formatted file as well as a local mysql database.

import requests
import math
import time
import pandas as pd
import pymysql
from sqlalchemy import create_engine

def get_json(url, num):
    """
    From the specified url Medium Pass requests Request to carry request header and body to get information on a Web page,
    :return:
    """
    url1 = 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
        'Host': 'www.lagou.com',
        'Referer': 'https://www.lagou.com/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=&fromSearch=true&suginput=',
        'X-Anit-Forge-Code': '0',
        'X-Anit-Forge-Token': 'None',
        'X-Requested-With': 'XMLHttpRequest',
        'Cookie':'user_trace_token=20210218203227-35e936a1-f40f-410d-8400-b87f9fb4be0f; _ga=GA1.2.331665492.1613651550; LGUID=20210218203230-39948353-de3f-4545-aa01-43d147708c69; LG_HAS_LOGIN=1; hasDeliver=0; privacyPolicyPopup=false; showExpriedIndex=1; showExpriedCompanyHome=1; showExpriedMyPublish=1; RECOMMEND_TIP=true; index_location_city=%E5%85%A8%E5%9B%BD; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1613651550,1613652253,1613806244,1614497914; _putrc=52ABCFBE36E5D0BD123F89F2B170EADC; gate_login_token=ea312e017beac7fe72547a32956420b07d6d5b1816bc766035dd0f325ba92b91; JSESSIONID=ABAAAECAAEBABII8D8278DB16CB050FD656DD1816247B43; login=true; unick=%E7%94%A8%E6%88%B72933; WEBTJ-ID=20210228%E4%B8%8B%E5%8D%883:38:37153837-177e7932b7f618-05a12d1b3d5e8c-53e356a-1296000-177e7932b8071; sensorsdata2015session=%7B%7D; _gid=GA1.2.1359196614.1614497918; __lg_stoken__=bb184dd5d959320e9e61d943e802ac98a8538d44699751621e807e93fe0ffea4c1a57e923c71c93a13c90e5abda7a51873c2e488a4b9d76e67e0533fe9e14020734016c0dcf2; X_MIDDLE_TOKEN=90b85c3630b92280c3ad7a96c881482e; LGSID=20210228161834-659d6267-94a3-4a5c-9857-aaea0d5ae2ed; TG-TRACK-CODE=index_navigation; SEARCH_ID=092c1fd19be24d7cafb501684c482047; X_HTTP_TOKEN=fdb10b04b25b767756070541617f658231fd72d78b; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2220600756%22%2C%22first_id%22%3A%22177b521c02a552-08c4a0f886d188-73e356b-1296000-177b521c02b467%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24os%22%3A%22Linux%22%2C%22%24browser%22%3A%22Chrome%22%2C%22%24browser_version%22%3A%2288.0.4324.190%22%2C%22lagou_company_id%22%3A%22%22%7D%2C%22%24device_id%22%3A%22177b521c02a552-08c4a0f886d188-73e356b-1296000-177b521c02b467%22%7D; _gat=1; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1614507066; LGRID=20210228181106-f2d71d85-74fe-4b43-b87e-d78a33c872ad'
    }
    data = {
        'first': 'true',
        'pn': num,
        'kd': 'BI Engineer'}
    #Get Cookies Information
    s = requests.Session()
    print('establish session: ', s, '\n\n')
    s.get(url=url1, headers=headers, timeout=3)
    cookie = s.cookies
    print('Obtain cookie: ', cookie, '\n\n')
    #Add request parameters and information such as headers, Cookies to make url requests
    res = requests.post(url, headers=headers, data=data, cookies=cookie, timeout=3)
    res.raise_for_status()
    res.encoding = 'utf-8'
    page_data = res.json()
    print('Request response results:', page_data, '\n\n')
    return page_data

def get_page_num(count):
    """
    Calculates the number of pages to be captured, and by entering keyword information in a pull-down grid, you can see up to 30 pages of information displayed,Display up to 15 job information per page
    :return:
    """
    page_num = math.ceil(count / 15)
    if page_num > 29:
        return 29
    else:
        return page_num

def get_page_info(jobs_list):
    """
    Get Positions
    :param jobs_list:
    :return:
    """
    page_info_list = []
    for i in jobs_list:  # Cycle through all job information on each page
        job_info = []
        job_info.append(i['companyFullName'])
        job_info.append(i['companyShortName'])
        job_info.append(i['companySize'])
        job_info.append(i['financeStage'])
        job_info.append(i['district'])
        job_info.append(i['positionName'])
        job_info.append(i['workYear'])
        job_info.append(i['education'])
        job_info.append(i['salary'])
        job_info.append(i['positionAdvantage'])
        job_info.append(i['industryField'])
        job_info.append(i['firstType'])
        job_info.append(",".join(i['companyLabelList']))
        job_info.append(i['secondType'])
        job_info.append(i['city'])
        page_info_list.append(job_info)
    return page_info_list

def unique(old_list):
    newList = []
    for x in old_list:
        if x not in newList :
            newList.append(x)
    return newList

def main():
    connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format("root", "123456", "localhost", "3366",
                                                                        "lagou")
    engine = create_engine(connect_info)
    url = ' https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
    first_page = get_json(url, 1)
    total_page_count = first_page['content']['positionResult']['totalCount']
    num = get_page_num(total_page_count)
    total_info = []
    time.sleep(10)
    for num in range(1, num + 1):
        # Get job-related information for each page
        page_data = get_json(url, num)  # Get response json
        jobs_list = page_data['content']['positionResult']['result']  # Get all python-related job information on each page
        page_info = get_page_info(jobs_list)
        total_info += page_info
        print('Crawled to{}Page, total number of positions{}'.format(num, len(total_info)))
        time.sleep(20)
    #Convert the total data to a data frame and output it, then write it to a csv-formatted file and to a local database
    df = pd.DataFrame(data=unique(total_info),
                      columns=['companyFullName', 'companyShortName', 'companySize', 'financeStage',
                               'district', 'positionName', 'workYear', 'education',
                               'salary', 'positionAdvantage', 'industryField',
                               'firstType', 'companyLabelList', 'secondType', 'city'])
    df.to_csv('bi.csv', index=True)
    print('Job information saved locally')
    df.to_sql(name='demo', con=engine, if_exists='append', index=False)
    print('Job information saved database')

if __name__ == '__main__':
    main()

2. Get rental housing source information

Use user-agent and cookies to encapsulate header information to disguise the crawler as a browser to access web pages, make url requests through the request package get method to get web page data, add Web page data parsed by pyquery package into the dataframe by field, and then save the dataframe into csv file and local database.

import requests
from pyquery import PyQuery as pq
from fake_useragent import UserAgent
import time
import pandas as pd
import random
import pymysql
from sqlalchemy import create_engine

UA = UserAgent()
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'Cookie': 'lianjia_uuid=6383a9ce-19b9-47af-82fb-e8ec386eb872; UM_distinctid=1777521dc541e1-09601796872657-53e3566-13c680-1777521dc5547a; _smt_uid=601dfc61.4fcfbc4b; _ga=GA1.2.894053512.1612577894; _jzqc=1; _jzqckmp=1; _gid=GA1.2.1480435812.1614959594; Hm_lvt_9152f8221cb6243a53c83b956842be8a=1614049202,1614959743; csrfSecret=lqKM3_19PiKkYOfJSv6ldr_c; activity_ke_com=undefined; ljisid=6383a9ce-19b9-47af-82fb-e8ec386eb872; select_nation=1; crosSdkDT2019DeviceId=-kkiavn-2dq4ie-j9ekagryvmo7rd3-qjvjm0hxo; Hm_lpvt_9152f8221cb6243a53c83b956842be8a=1615004691; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%221777521e37421a-0e1d8d530671de-53e3566-1296000-1777521e375321%22%2C%22%24device_id%22%3A%221777521e37421a-0e1d8d530671de-53e3566-1296000-1777521e375321%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E8%87%AA%E7%84%B6%E6%90%9C%E7%B4%A2%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22https%3A%2F%2Fwww.baidu.com%2Flink%22%2C%22%24latest_referrer_host%22%3A%22www.baidu.com%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC%22%2C%22%24latest_utm_source%22%3A%22guanwang%22%2C%22%24latest_utm_medium%22%3A%22pinzhuan%22%2C%22%24latest_utm_campaign%22%3A%22wybeijing%22%2C%22%24latest_utm_content%22%3A%22biaotimiaoshu%22%2C%22%24latest_utm_term%22%3A%22biaoti%22%7D%7D; lianjia_ssid=7a179929-0f9a-40a4-9537-d1ddc5164864; _jzqa=1.3310829580005876700.1612577889.1615003848.1615013370.6; _jzqy=1.1612577889.1615013370.2.jzqsr=baidu|jzqct=%E9%93%BE%E5%AE%B6.jzqsr=baidu; select_city=440300; srcid=eyJ0Ijoie1wiZGF0YVwiOlwiZjdiNTI1Yjk4YjI3MGNhNjRjMGMzOWZkNDc4NjE4MWJkZjVjNTZiMWYxYTM4ZTJkNzMxN2I0Njc1MDEyY2FiOWMzNTIzZTE1ZjEyZTE3NjlkNTRkMTA2MWExZmIzMWM5YzQ3ZmQxM2M3NTM5YTQ1YzM5OWU0N2IyMmFjM2ZhZmExOGU3ZTc1YWU0NDQ4NTdjY2RiMjEwNTQyMDQzM2JiM2UxZDQwZWQwNzZjMWQ4OTRlMGRkNzdmYjExZDQwZTExNTg5NTFkODIxNWQzMzdmZTA4YmYyOTFhNWQ2OWQ1OWM4ZmFlNjc0OTQzYjA3NDBjNjNlNDYyNTZiOWNhZmM4ZDZlMDdhNzdlMTY1NmM0ZmM4ZGI4ZGNlZjg2OTE2MmU4M2MwYThhNTljMGNkODYxYjliNGYwNGM0NzJhNGM3MmVmZDUwMTJmNmEwZWMwZjBhMzBjNWE2OWFjNzEzMzM4M1wiLFwia2V5X2lkXCI6XCIxXCIsXCJzaWduXCI6XCJhYWEyMjhiNVwifSIsInIiOiJodHRwczovL20ubGlhbmppYS5jb20vY2h1enUvc3ovenVmYW5nL3BnJTdCJTdELyIsIm9zIjoid2ViIiwidiI6IjAuMSJ9',
    'Host': 'sz.lianjia.com',
    'Referer': 'https://sz.lianjia.com/zufang/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
}
num_page = 2

class Lianjia_Crawer:

    def __init__(self, txt_path):
        super(Lianjia_Crawer, self).__init__()
        self.file = str(txt_path)
        self.df = pd.DataFrame(columns = ['title', 'district', 'area', 'orient', 'floor', 'price', 'city'])

    def run(self):
        '''Startup script'''
        connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format("root", "123456", "localhost", "3366", "lagou")
        engine = create_engine(connect_info)
        for i in range(100):
            url = "https://sz.lianjia.com/zufang/pg{}/".format(str(i))
            self.parse_url(url)
            time.sleep(random.randint(2, 5))
            print('Crawling url by {}'.format(url))
        print('Finished crawling!!!!!!!!!!!!!!')
        self.df.to_csv(self.file, encoding='utf-8')
        print('Rent information saved locally')
        self.df.to_sql(name='house', con=engine, if_exists='append', index=False)
        print('Rental Information Saved Database')

    def parse_url(self, url):
        headers['User-Agent'] = UA.chrome
        res = requests.get(url, headers=headers)
        #Declare pq objects
        doc = pq(res.text)
        for i in doc('.content__list--item .content__list--item--main'):
            try:
                pq_i = pq(i)
                # House Title
                title = pq_i('.content__list--item--title a').text()
                # Specific information
                houseinfo = pq_i('.content__list--item--des').text()
                # Administrative Region
                address = str(houseinfo).split('/')[0]
                district = str(address).split('-')[0]
                # House area
                full_area = str(houseinfo).split('/')[1]
                area = str(full_area)[:-1]
                # Orientation
                orient = str(houseinfo).split('/')[2]
                # floor
                floor = str(houseinfo).split('/')[-1]
                # Price
                price = pq_i('.content__list--item-price').text()
                #City
                city = 'Shenzhen'
                data_dict = {'title': title, 'district': district, 'area': area, 'orient': orient, 'floor': floor, 'price': price, 'city': city}
                self.df = self.df.append(data_dict, ignore_index=True)
                print([title, district, area, orient, floor, price, city])
            except Exception as e:
                print(e)
                print("Index extraction failed, please try again!!!!!!!!!!!!!")

if __name__ =="__main__":
    txt_path = "zufang_shenzhen.csv"
    Crawer = Lianjia_Crawer(txt_path)
    Crawer.run() # Start crawler script

(2) Database

You need to create three database tables to store recruitment information, housing information, and user information.

The crawled data is stored in a csv file and a local mysql database, some of which are shown in the following figure:

Database to Background Connection

Using pymysql to connect to the local mysql database, first install pymysql through pip and create the database and database tables, import the pymysql package, open the database connection, create a cursor object using cursor() method, and execute() method to execute the SQL query.

(3) Registration and login

Registration login process:


Implementation Code (Backend Response):

#Registered User
@app.route('/addUser',methods=['POST'])
def addUser():
    #Server-side get json
    get_json = request.get_json()
    name = get_json['name']
    password = get_json['password']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("select count(*) from `user` where `username` = '" + name + "'")
    count = cursor.fetchall()
    #The nickname already exists
    if (count[0][0]!= 0):
        table_result = {"code": 500, "msg": "The nickname already exists!"}
        cursor.close()
    else:
        add = conn.cursor()
        sql = "insert into `user`(username,password) values('"+name+"','"+password+"');"
        add.execute(sql)
        conn.commit()
        table_result = {"code": 200, "msg": "login was successful"}
        add.close()
    conn.close()
    return jsonify(table_result)
#User Login
@app.route('/loginByPassword',methods=['POST'])
def loginByPassword():
    get_json = request.get_json()
    name = get_json['name']
    password = get_json['password']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("select count(*) from `user` where `username` = '" + name +"' and password = '" + password+"';")
    count = cursor.fetchall()
    if(count[0][0] != 0):
        table_result = {"code": 200, "msg": name}
        cursor.close()
    else:
        name_cursor = conn.cursor()
        name_cursor.execute("select count(*) from `user` where `username` = '" + name +"';")
        name_count = name_cursor.fetchall()
        #print(name_count)
        if(name_count[0][0] != 0):
            table_result = {"code":500, "msg": "Wrong password!"}
        else:
            table_result = {"code":500, "msg":"This user does not exist, please register first!"}
        name_cursor.close()
    conn.close()
    print(name)
    return jsonify(table_result)

(4) Home page functions

(5) Modifying personal information

#Personal Information Modification
@app.route('/updateUserInfo',methods=['POST'])
def updateUserInfo():
    get_json = request.get_json()
    name = get_json['name']
    print(name)
    email = get_json['email']
    content = get_json['content']
    address = get_json['address']
    phone = get_json['phone']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("update `user` set email = '"+email+"',content = '"+content+"',address = '"+address+"',phone = '"+phone+"' where username = '"+ name +"';")
    conn.commit()
    table_result = {"code": 200, "msg": "Update successful!","youxiang": email, "tel": phone}
    cursor.close()
    conn.close()
    print(table_result)
    return jsonify(table_result)

(6) Modify password
Passwords can be modified in two ways, requiring security validation

#Password modification
@app.route('/updatePass',methods=['POST'])
def updatePass():
    get_json = request.get_json()
    name = get_json['name']
    oldPsw = get_json['oldPsw']
    newPsw = get_json['newPsw']
    rePsw = get_json['rePsw']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("select count(*) from `user` where `username` = '" + name + "' and password = '" + oldPsw+"';")
    count = cursor.fetchall()
    print(count[0][0])
    #Determine nickname password correspondence
    if (count[0][0] == 0):
        table_result = {"code": 500, "msg": "The original password is wrong!"}
        cursor.close()
    else:
        updatepass = conn.cursor()
        sql = "update `user` set password = '"+newPsw+"' where username = '"+ name +"';"
        updatepass.execute(sql)
        conn.commit()
        table_result = {"code": 200, "msg": "Password modified successfully!", "username": name, "new_password": newPsw}
        updatepass.close()
    conn.close()
    return jsonify(table_result)

(7) Data overview display

Note: Only research Internet job recruitment information

Take the recruitment data profile as an example:

@app.route('/data',methods=['GET'])
def data():
    limit = int(request.args['limit'])
    page = int(request.args['page'])
    page = (page-1)*limit
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',charset='utf8mb4')
    cursor = conn.cursor()
    if (len(request.args) == 2):
        cursor.execute("select count(*) from demo")
        count = cursor.fetchall()
        cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
        cursor.execute("select * from demo limit "+str(page)+","+str(limit))
        data_dict = []
        result = cursor.fetchall()
        for field in result:
            data_dict.append(field)
    else:
        education = str(request.args['education'])
        positionName = str(request.args['positionName']).lower()
        if(education=='Unlimited'):
            cursor.execute("select count(*) from demo where positionName like '%"+positionName+"%'")
            count = cursor.fetchall()
            cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
            cursor.execute("select * from demo where positionName like '%"+positionName+"%' limit " + str(page) + "," + str(limit))
            data_dict = []
            result = cursor.fetchall()
            for field in result:
                data_dict.append(field)
        else:
            cursor.execute("select count(*) from demo where positionName like '%"+positionName+"%' and education = '"+education+"'")
            count = cursor.fetchall()
            cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
            cursor.execute("select * from demo where positionName like '%"+positionName+"%' and education = '"+education+"' limit " + str(page) + "," + str(limit))
            data_dict = []
            result = cursor.fetchall()
            for field in result:
                data_dict.append(field)
    table_result = {"code": 0, "msg": None, "count": count[0], "data": data_dict}
    cursor.close()
    conn.close()
    return jsonify(table_result)

(8) Visualization of recruitment data
From four perspectives: national, first-line city, new-line city and second-line city.
Take a chestnut and the analysis of the enterprise situation nationwide is as follows:

@app.route('/qiye',methods=['GET'])
def qiye():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(city) from demo")
    result = cursor.fetchall()
    city = []
    city_result = []
    companySize = []
    companySizeResult = []
    selected = {}
    # Captured Cities
    for field in result:
        city.append(field[0])
    if (len(request.args) == 0):
        # No query criteria
        # Get the number of cities
        for i in city:
            cursor.execute("SELECT count(*) from demo where city = '" + i + "'")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            city_result.append(dict)
        # Initialize initial display of several cities
        for i in city[10:]:
            selected[i] = False
        # Acquire several company sizes
        cursor.execute("SELECT DISTINCT(companySize) from demo")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            # Number of companies of each size
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "'")
            count = cursor.fetchall()
            companySizeResult.append(count[0][0])
    else:
        positionName = str(request.args['positionName']).lower()
        # Query criteria: a certain occupation
        # Number of certain occupations per city
        for i in city:
            cursor.execute("SELECT count(*) from demo where city = '" + i + "' and positionName like '%"+positionName+"%'")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            city_result.append(dict)
        for i in city[10:]:
            selected[i] = False
        cursor.execute("SELECT DISTINCT(companySize) from demo")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "' and positionName like '%"+positionName+"%'")
            count = cursor.fetchall()
            companySizeResult.append(count[0][0])
    result = {"city": city, "city_result": city_result, "selected": selected, "companySize": companySize, "companySizeResult": companySizeResult}
    cursor.close()
    return jsonify(result)

Enterprise analysis in first-tier cities is shown in the following figure:

@app.route('/qiye_first',methods=['GET'])
def qiye_first():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    #cursor.execute("SELECT DISTINCT(city) from demo")
    #result = cursor.fetchall()
    city = ['Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen']
    city_result = []
    companySize = []
    companySizeResult = []
    selected = {}
    # Captured Cities
    #for field in result:
        #city.append(field[0])
    if (len(request.args) == 0):
        # No query criteria
        # Get the number of cities
        for i in city:
            cursor.execute("SELECT count(*) from demo where city = '" + i + "'")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            city_result.append(dict)
        # Initialize initial display of several cities
        for i in city[4:]:
            selected[i] = False
        # Acquire several company sizes
        cursor.execute("SELECT DISTINCT(companySize) from demo where city in ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            # Number of companies of each size
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "' and city in ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');")
            count = cursor.fetchall()
            companySizeResult.append(count[0][0])
    else:
        positionName = str(request.args['positionName']).lower()
        # Query criteria: a certain occupation
        # Number of certain occupations per city
        for i in city:
            cursor.execute("SELECT count(*) from demo where city = '" + i + "' and positionName like '%"+positionName+"%'")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            city_result.append(dict)
        for i in city[4:]:
            selected[i] = False
        cursor.execute("SELECT DISTINCT(companySize) from demo where city in ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "' and positionName like '%"+positionName+"%' and city in ('Beijing', 'Shanghai', 'Guangzhou', 'Shenzhen');")
            count = cursor.fetchall()
            companySizeResult.append(count[0][0])

    result = {"city": city, "city_result": city_result, "selected": selected, "companySize": companySize, "companySizeResult": companySizeResult}
    cursor.close()
    return jsonify(result)

An analysis of salaries for undergraduate degree in China is shown in the figure:

@app.route('/xinzi',methods=['GET'])
def xinzi():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    positionName = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base']
    #Column chart return list
    zzt_list = []
    zzt_list.append(['product', 'Java', 'Python', 'PHP', 'web', 'bi', 'android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base'])
    if (len(request.args) == 0 or str(request.args['education'])=='Unlimited'):
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) like '%K%' and positionName like '%"+i+"%';")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['0—10K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 10 AND 20 and positionName like '%"+i+"%';")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['10—20K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 20 AND 30 and positionName like '%"+i+"%';")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['20—30K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 30 AND 40 and positionName like '%" + i + "%';")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['30—40K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) > 40 and positionName like '%" + i + "%';")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['40 Above', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
    else:
        education = str(request.args['education'])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) like '%K%' and positionName like '%" + i + "%' and education = '"+education+"'")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['0—10K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 10 AND 20 and positionName like '%" + i + "%' and education = '"+education+"'")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['10—20K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 20 AND 30 and positionName like '%" + i + "%' and education = '"+education+"'")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['20—30K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 30 AND 40 and positionName like '%" + i + "%' and education = '"+education+"'")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['30—40K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) > 40 and positionName like '%" + i + "%' and education = '"+education+"'")
            count = cursor.fetchall()
            temp_list += count[0]
        zzt_list.append(['40 Above', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5], temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
    result = {"zzt": zzt_list}
    cursor.close()
    return jsonify(result)

An analysis of the national welfare situation is shown in the figure:

@app.route('/fuli',methods=['GET'])
def fuli():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    cursor.execute("select positionAdvantage from `demo`")
    data_dict = []
    result = cursor.fetchall()
    for field in result:
        data_dict.append(field['positionAdvantage'])
    content = ''.join(data_dict)
    positionAdvantage = []
    jieba.analyse.set_stop_words('./stopwords.txt')
    tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
    for v, n in tags:
        mydict = {}
        mydict["name"] = v
        mydict["value"] = str(int(n * 10000))
        positionAdvantage.append(mydict)
    cursor.execute("select companyLabelList from `demo`")
    data_dict = []
    result = cursor.fetchall()
    for field in result:
        data_dict.append(field['companyLabelList'])
    content = ''.join(data_dict)
    companyLabelList = []
    jieba.analyse.set_stop_words('./stopwords.txt')
    tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
    for v, n in tags:
        mydict = {}
        mydict["name"] = v
        mydict["value"] = str(int(n * 10000))
        companyLabelList.append(mydict)
    cursor.close()
    return jsonify({"zwfl": positionAdvantage, "gsfl": companyLabelList})

The analysis and display of the requirements of Internet job qualification and work experience nationwide is shown in the figure:

@app.route('/xueli',methods=['GET'])
def xueli():
    #Open database connection
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    #Create a cursor object cursor
    cursor = conn.cursor()
    #Execute sql statement
    cursor.execute("SELECT DISTINCT(education) from demo")
    #Get a list of all records
    result = cursor.fetchall()
    education = []
    education_data = []
    color_list = ['#459AF0', '#38C3B0', '#86CA5A', '#BFD44F', '	#90EE90']
    #Five situations of obtaining academic qualifications: unlimited, junior college, undergraduate, master, doctor
    for field in result:
        education.append(field[0])
    #Number of corresponding degrees obtained
    for i in range(len(education)):
        cursor.execute("SELECT count(*) from demo where education = '" + education[i] + "'")
        count = cursor.fetchall()
        education_data.append({'value': count[0][0], 'itemStyle': {'color': color_list[i]}})
    cursor.execute("SELECT DISTINCT(workYear) from demo")
    result = cursor.fetchall()
    workYear = []
    workYear_data = []
    #Several experience gained
    for field in result:
        workYear.append(field[0])
    #Get the number for each work experience
    for i in workYear:
        cursor.execute("SELECT count(*) from demo where workYear = '" + i + "'")
        count = cursor.fetchall()
        workYear_data.append({'value': count[0][0], 'name': i})
    cursor.close()
    return jsonify({"education":education, "education_data":education_data, "workYear_data":workYear_data})

An analysis of the distribution of the financing stages of Internet companies throughout the country is shown in the figure:

@app.route('/rongzi',methods=['GET'])
def rongzi():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(financeStage) from demo")
    result = cursor.fetchall()
    finance = []
    finance_data = []
    # Several scenarios for obtaining financing
    for field in result:
        finance.append(field[0])
    # Get the number for each financing
    for i in range(len(finance)):
        cursor.execute("SELECT count(*) from demo where financeStage = '" + finance[i] + "'")
        count = cursor.fetchall()
        finance_data.append({'value': count[0][0], 'name': finance[i]})
    cursor.close()
    return jsonify({"finance": finance, "finance_data": finance_data})

An analysis of the distribution of job types in Internet companies across the country is shown in the figure:

@app.route('/poststyle',methods=['GET'])
def poststyle():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(firstType) from demo")
    result = cursor.fetchall()
    firstType = []
    firstType_data = []
    # Several cases of job type acquisition
    for field in result:
        firstType.append(field[0])
    # Get the number for each job type
    for i in range(len(firstType)):
        cursor.execute("SELECT count(*) from demo where firstType = '" + firstType[i] + "'")
        count = cursor.fetchall()
        firstType_data.append({'value': count[0][0], 'name': firstType[i]})
    cursor.execute("SELECT DISTINCT(secondType) from demo")
    second = cursor.fetchall()
    secondType = []
    secondType_data = []
    # Several cases of job type acquisition
    for field in second:
        secondType.append(field[0])
    # Get the number for each job type
    for i in range(len(secondType)):
        cursor.execute("SELECT count(*) from demo where secondType = '" + secondType[i] + "'")
        count = cursor.fetchall()
        secondType_data.append({'value': count[0][0], 'name': secondType[i]})
    cursor.close()
    return jsonify({"firstType": firstType, "firstType_data": firstType_data, "secondType": secondType, "secondType_data": secondType_data})

(9) Comparison of recruitment in nine popular cities

A pie chart texture can be used to show the size of an Internet company in a city, a funnel chart can show the academic requirements of a job in a city, etc.

Take Beijing for example:

@app.route('/beijing',methods=['GET'])
def beijing():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()

    district = []
    district_result = []
    companySize = []
    companySizeResult = []
    education = []
    educationResult = []
    workYear = []
    workYear_data = []
    firstType = []
    firstType_data = []
    finance = []
    finance_data = []
    leida_max_dict = []

    # Accessed Administrative Areas
    cursor.execute("SELECT DISTINCT(district) from demo where city='Beijing';")
    result = cursor.fetchall()
    for field in result:
        district.append(field[0])
    if (len(request.args) == 0):
        # No query criteria
        # Get the corresponding number of administrative areas
        for i in district:
            cursor.execute("SELECT count(*) from demo where district = '" + i + "';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            district_result.append(dict)

        # Acquire several company sizes
        cursor.execute("SELECT DISTINCT(companySize) from demo where city = 'Beijing';")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            # Number of companies of each size
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "' and city = 'Beijing';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': field[0]}
            companySizeResult.append(dict)

        # Acquire several academic qualifications
        cursor.execute("SELECT DISTINCT(education) from demo where city = 'Beijing';")
        eduresult = cursor.fetchall()
        for field in eduresult:
            education.append(field[0])
            # Number of corresponding academic qualifications
            cursor.execute("SELECT count(*) from demo where education = '" + field[0] + "' and city = 'Beijing';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': field[0]}
            educationResult.append(dict)


        cursor.execute("SELECT DISTINCT(workYear) from demo where city = 'Beijing';")
        workyear = cursor.fetchall()
        # Several experience gained
        for field in workyear:
            workYear.append(field[0])
        # Get the number for each work experience
        for i in workYear:
            cursor.execute("SELECT count(*) from demo where workYear = '" + i + "' and city = 'Beijing';")
            count = cursor.fetchall()
            workYear_data.append({'value': count[0][0], 'name': i})

        cursor.execute("SELECT DISTINCT(financeStage) from demo where city = 'Beijing';")
        result = cursor.fetchall()
        # Several scenarios for obtaining financing
        for field in result:
            finance.append(field[0])
            leida_max_dict.append({'name': field[0], 'max': 300})
        # Get the number for each financing
        for i in range(len(finance)):
            cursor.execute("SELECT count(*) from demo where financeStage = '" + finance[i] + "' and city = 'Beijing';")
            count = cursor.fetchall()
            finance_data.append(count[0][0])

        # Job Benefits
        cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
        cursor.execute("select positionAdvantage from `demo` where city = 'Beijing';")
        data_dict = []
        result = cursor.fetchall()
        for field in result:
            data_dict.append(field['positionAdvantage'])
        content = ''.join(data_dict)
        positionAdvantage = []
        jieba.analyse.set_stop_words('./stopwords.txt')
        tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
        for v, n in tags:
            mydict = {}
            mydict["name"] = v
            mydict["value"] = str(int(n * 10000))
            positionAdvantage.append(mydict)

        # Job Type
        cursor.execute("SELECT DISTINCT(firstType) from demo where city = 'Beijing';")
        result = cursor.fetchall()
        # Several cases of job type acquisition
        for field in result:
            for i in field.keys():
                firstType.append(field[i])

        # Get the number for each job type
        for i in range(len(firstType)):
            cursor.execute("SELECT count(*) from demo where firstType = '" + firstType[i] + "' and city = 'Beijing';")
            count = cursor.fetchall()
            for field in count:
                for j in field.keys():
                    value = field[j]

            firstType_data.append({'value': value, 'name': firstType[i]})

        #Salary
        positionName = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base']
        # Column chart return list
        zzt_list = []
        zzt_list.append(
            ['product', 'Java', 'Python', 'PHP', 'web', 'bi', 'android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base'])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) like '%k%' and positionName like '%" + i + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            for i in count[0].keys():
                value = count[0][i]
            temp_list.append(value)
        zzt_list.append(
            ['0—10K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 10 AND 20 and positionName like '%" + i + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            for i in count[0].keys():
                value = count[0][i]
            temp_list.append(value)
        zzt_list.append(['10—20K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 20 AND 30 and positionName like '%" + i + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            for i in count[0].keys():
                value = count[0][i]
            temp_list.append(value)
        zzt_list.append(['20—30K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 30 AND 40 and positionName like '%" + i + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            for i in count[0].keys():
                value = count[0][i]
            temp_list.append(value)
        zzt_list.append(['30—40K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        temp_list = []
        for i in positionName:
            cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) > 40 and positionName like '%" + i + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            for i in count[0].keys():
                value = count[0][i]
            temp_list.append(value)
        zzt_list.append(['40 Above', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])


    else:
        positionName = str(request.args['positionName']).lower()
        print(positionName)
        # Query criteria: a certain occupation
        # Administrative Region
        for i in district:
            cursor.execute("SELECT count(*) from demo where district = '" + i + "' and positionName like '%"+positionName+"%';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': i}
            district_result.append(dict)
        # company size
        cursor.execute("SELECT DISTINCT(companySize) from demo where city = 'Beijing';")
        company = cursor.fetchall()
        for field in company:
            companySize.append(field[0])
            cursor.execute("SELECT count(*) from demo where companySize = '" + field[0] + "' and positionName like '%"+positionName+"%' and city = 'Beijing';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': field[0]}
            companySizeResult.append(dict)
        # Educational requirements
        cursor.execute("SELECT DISTINCT(education) from demo where city = 'Beijing';")
        eduresult = cursor.fetchall()
        for field in eduresult:
            education.append(field[0])
            cursor.execute("SELECT count(*) from demo where education = '" + field[0] + "' and positionName like '%" + positionName + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            dict = {'value': count[0][0], 'name': field[0]}
            educationResult.append(dict)
        #Hands-on background
        cursor.execute("SELECT DISTINCT(workYear) from demo where city = 'Beijing';")
        workyear = cursor.fetchall()
        for field in workyear:
            workYear.append(field[0])
            cursor.execute("SELECT count(*) from demo where workYear = '" + field[0] + "' and positionName like '%" + positionName + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            workYear_data.append({'value': count[0][0], 'name': field[0]})
        # Financing Phase
        cursor.execute("SELECT DISTINCT(financeStage) from demo where city = 'Beijing';")
        result = cursor.fetchall()
        # Several scenarios for obtaining financing
        for field in result:
            finance.append(field[0])
            leida_max_dict.append({'name': field[0], 'max': 300})
        # Get the number for each financing
        for i in range(len(finance)):
            cursor.execute("SELECT count(*) from demo where financeStage = '" + finance[i] + "' and positionName like '%" + positionName + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            finance_data.append(count[0][0])
        # Job Type
        cursor.execute("SELECT DISTINCT(firstType) from demo where city = 'Beijing';")
        result = cursor.fetchall()
        # Several cases of job type acquisition
        for field in result:
            firstType.append(field[0])
        # Get the number for each job type
        for i in range(len(firstType)):
            cursor.execute("SELECT count(*) from demo where firstType = '" + firstType[i] + "' and positionName like '%" + positionName + "%' and city = 'Beijing';")
            count = cursor.fetchall()
            firstType_data.append({'value': count[0][0], 'name': firstType[i]})
        # Job Benefits
        cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
        cursor.execute("select positionAdvantage from `demo` where city = 'Beijing' and positionName like '%" + positionName + "%' ;")
        data_dict = []
        result = cursor.fetchall()
        for field in result:
            data_dict.append(field['positionAdvantage'])
        content = ''.join(data_dict)
        positionAdvantage = []
        jieba.analyse.set_stop_words('./stopwords.txt')
        tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
        for v, n in tags:
            mydict = {}
            mydict["name"] = v
            mydict["value"] = str(int(n * 10000))
            positionAdvantage.append(mydict)
        # Salary
        positionName_sample = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base']
        # Column chart return list
        zzt_list = []
        zzt_list.append(['product', 'Java', 'Python', 'PHP', 'Web', 'BI', 'Android', 'ios', 'algorithm', 'Big data', 'test', 'Operation and maintenance', 'data base'])
        # <10k
        temp_list = []
        cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) like '%k%' and positionName like '%" + positionName + "%' and city = 'Beijing';")
        count = cursor.fetchall()
        #print(count)
        for i in count[0].keys():
            value = count[0][i]
        print(value)
        for num in range(len(positionName_sample)):
            if positionName == positionName_sample[num]:
                temp_list.append(value)
            else:
                temp_list.append(0)
        # print(temp_list)
        # temp_list.append(value)
        zzt_list.append(['0—10K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        # 10-20k
        temp_list = []
        cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 10 AND 20 and positionName like '%" + positionName + "%' and city = 'Beijing';")
        count = cursor.fetchall()
        for i in count[0].keys():
            value = count[0][i]
        for num in range(len(positionName_sample)):
            if positionName == positionName_sample[num]:
                temp_list.append(value)
            else:
                temp_list.append(0)
        # temp_list.append(value)
        zzt_list.append(['10—20K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        # 20-30k
        temp_list = []
        cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 20 AND 30 and positionName like '%" + positionName + "%' and city = 'Beijing';")
        count = cursor.fetchall()
        for i in count[0].keys():
            value = count[0][i]
        for num in range(len(positionName_sample)):
            if positionName == positionName_sample[num]:
                temp_list.append(value)
            else:
                temp_list.append(0)
        #temp_list.append(value)
        zzt_list.append(['20—30K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        # 30-40k
        temp_list = []
        cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) BETWEEN 30 AND 40 and positionName like '%" + positionName + "%' and city = 'Beijing';")
        count = cursor.fetchall()
        for i in count[0].keys():
            value = count[0][i]
        for num in range(len(positionName_sample)):
            if positionName == positionName_sample[num]:
                temp_list.append(value)
            else:
                temp_list.append(0)
        #temp_list.append(value)
        zzt_list.append(['30—40K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
        # >40k
        temp_list = []
        cursor.execute("SELECT COUNT(*) FROM demo WHERE SUBSTR(salary,1,2) > 40 and positionName like '%" + positionName + "%' and city = 'Beijing';")
        count = cursor.fetchall()
        for i in count[0].keys():
            value = count[0][i]
        for num in range(len(positionName_sample)):
            if positionName == positionName_sample[num]:
                temp_list.append(value)
            else:
                temp_list.append(0)
        #temp_list.append(value)
        zzt_list.append(['>40K', temp_list[0], temp_list[1], temp_list[2], temp_list[3], temp_list[4], temp_list[5],temp_list[6], temp_list[7], temp_list[8], temp_list[9], temp_list[10], temp_list[11]])
    print(zzt_list)
    result = {"district": district, "district_result": district_result, "companySize": companySize, "companySizeResult": companySizeResult, "education": education, "educationResult": educationResult, "workYear_data":workYear_data, "firstType": firstType, "firstType_data": firstType_data, "leida_max_dict":leida_max_dict, "cyt": positionAdvantage, "finance": finance, "finance_data": finance_data, "zzt": zzt_list}
    cursor.close()
    return jsonify(result)

(10) Visualization of rental data
From four perspectives: national, first-line city, new-line city and second-line city.
Take a chestnut and analyze the rental housing area nationwide as shown in the following figure:

@app.route('/area',methods=['GET'])
def area():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    area_kind = ['<=20㎡', '21~40㎡', '41~60㎡', '61~80㎡', '81~100㎡', '101~120㎡', '121~140㎡', '141~160㎡', '161~180㎡', '181~200㎡']
    area_data = []
    # Get the number corresponding to each area category
    #<=20㎡
    cursor.execute("SELECT count(*) from house where area between 0 and 20;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    #21~40㎡
    cursor.execute("SELECT count(*) from house where area between 21 and 40;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 41~60㎡
    cursor.execute("SELECT count(*) from house where area between 41 and 60;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 61~80㎡
    cursor.execute("SELECT count(*) from house where area between 61 and 80;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 81~100㎡
    cursor.execute("SELECT count(*) from house where area between 81 and 100;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 101~120㎡
    cursor.execute("SELECT count(*) from house where area between 101 and 120;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 121~140㎡
    cursor.execute("SELECT count(*) from house where area between 121 and 140;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 141~160㎡
    cursor.execute("SELECT count(*) from house where area between 141 and 160;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 161~180㎡
    cursor.execute("SELECT count(*) from house where area between 161 and 180;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    # 181~200㎡
    cursor.execute("SELECT count(*) from house where area between 181 and 200;")
    count = cursor.fetchall()
    area_data.append(count[0][0])
    cursor.close()
    print(area_data)
    return jsonify({"area_kind": area_kind, "area_data": area_data})

National analysis of rental housing floors shows as shown in the figure:

@app.route('/floor',methods=['GET'])
def floor():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(floor) from house;")
    result = cursor.fetchall()
    floor_kind = []
    floor_data = []
    # Several cases of obtaining floors
    for field in result:
        floor_kind.append(field[0])
    # Get the number corresponding to each floor type
    for i in range(len(floor_kind)):
        cursor.execute("SELECT count(*) from house where floor = '" + floor_kind[i] + "'")
        count = cursor.fetchall()
        floor_data.append({'value': count[0][0], 'name': floor_kind[i]})
    cursor.close()
    return jsonify({"floor_kind": floor_kind, "floor_data": floor_data})

An analysis of rental housing orientation nationwide is shown in the figure:

@app.route('/orient',methods=['GET'])
def orient():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(orient) from house;")
    result = cursor.fetchall()
    orient_kind = []
    orient_data = []
    # Get several cases of orientation
    for field in result:
        orient_kind.append(field[0])
    # Get the number corresponding to each orientation type
    for i in range(len(orient_kind)):
        cursor.execute("SELECT count(*) from house where orient = '" + orient_kind[i] + "'")
        count = cursor.fetchall()
        orient_data.append({'value': count[0][0], 'name': orient_kind[i]})
    cursor.close()
    print(orient_data)
    return jsonify({"orient_kind": orient_kind, "orient_data": orient_data})

The nationwide analysis of rental housing prices shows as follows:

@app.route('/price',methods=['GET'])
def price():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    price_kind = ['<=1000', '1001~2000', '2001~3000', '3001~4000', '4001~5000', '5001~6000', '6001~7000', '7001~8000', '8001~9000', '9001~10000', '>10000']
    price_data = []
    # Get the number corresponding to each price category
    # <=1000
    cursor.execute("SELECT count(*) from house where price between 0 and 1000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 1001~2000
    cursor.execute("SELECT count(*) from house where price between 1001 and 2000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 2001~3000
    cursor.execute("SELECT count(*) from house where price between 2001 and 3000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 3001~4000
    cursor.execute("SELECT count(*) from house where price between 3001 and 4000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 4001~5000
    cursor.execute("SELECT count(*) from house where price between 4001 and 5000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 5001~6000
    cursor.execute("SELECT count(*) from house where price between 5001 and 6000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 6001~7000
    cursor.execute("SELECT count(*) from house where price between 6001 and 7000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 7001~8000
    cursor.execute("SELECT count(*) from house where price between 7001 and 8000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 8001~9000
    cursor.execute("SELECT count(*) from house where price between 8001 and 9000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # 9001~10000
    cursor.execute("SELECT count(*) from house where price between 9001 and 10000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    # >10000
    cursor.execute("SELECT count(*) from house where price >10000;")
    count = cursor.fetchall()
    price_data.append(count[0][0])
    cursor.close()
    print(price_data)
    return jsonify({"price_kind": price_kind, "price_data": price_data})

The analysis of the relationship between rental housing price and housing area nationwide is shown in the figure:

@app.route('/relation',methods=['GET'])
def relation():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    relation_data = []
    cursor.execute("select count(*) from house;")
    count = cursor.fetchall()
    #print(count[0][0])
    cursor.execute("SELECT area,price from house;")
    result = cursor.fetchall()
    for i in range(count[0][0]):
        relation_data.append(list(result[i]))
    #print(relation_data)
    cursor.close()
    return jsonify({"relation_data": relation_data})

(11) Intelligent prediction

@app.route('/predict',methods=['GET'])
def predict():
    y_data = ['0—10K', '10—20K', '20—30K', '30—40K', '40K Above']
    positionName = str(request.args['positionName']).lower()
    model = str(request.args['model'])
    with open(positionName+'_'+model+'.model', 'rb') as fr:
        selected_model = pickle.load(fr)
    companySize = int(request.args['companySize'])
    workYear = int(request.args['workYear'])
    education = int(request.args['education'])
    city = int(request.args['city'])
    x = [companySize, workYear, education, city]
    x = np.array(x)
    y = selected_model.predict(x.reshape(1, -1))
    return jsonify(y_data[y[0]])

(12) Web site access

The project also has some shortcomings:

1. There is less job information in the recruitment database, the types of crawling websites are single, only Internet jobs are studied.
2. The system can only crawl the web address manually by crawling recruitment information and rental information
3. Job information and rental information of this system have not been intersected yet. It would be better to recommend nearby houses intelligently according to company address.

Keywords: crawler Data Analysis data visualization

Added by exoskeleton on Mon, 24 Jan 2022 21:15:24 +0200