Python linking CSV and MySQL, plus a small requirement implementation

Linking Python, CSV and MySQL

1, Make sure MySQL is available

Once MySQL is installed (the simplest way is to download the installer from the official website and follow the steps), it's time to get going ~

% mysql --version
zsh: command not found: mysql

The command not found: mysql error usually means you haven't added MySQL to your environment variables after installing it. The fix:

# 1. Open the environment variable file with the following command (the % is the shell prompt, don't type it)
% vim ~/.bash_profile
# 2. Press a (or i) to enter insert mode and paste the following line
export PATH=$PATH:/usr/local/mysql/bin
# 3. Press ESC, then type : (Shift + ;)
# Enter wq to save the edits and quit
# 4. Run the following command to reload the configuration file
% source ~/.bash_profile
# Note: if your default shell is zsh (the default on newer macOS), edit ~/.zshrc instead

Run the command again, and you should see the MySQL version information ~

% mysql --version
mysql  Ver 8.0.22 for macos10.15 on x86_64 (MySQL Community Server - GPL)

2, Grab some data into CSV

2.1 Page inspection and analysis

Here we crawl the learning competition page data from Alibaba Cloud Tianchi. The fields we want:

Meaning                       | Field name         | Data type | Remarks
Event name                    | raceName           | string    | character string
Event ID                      | raceId             | string    | e.g. 531842
Event introduction            | brief              | string    | long string
Event start time              | currentSeasonStart | date      | e.g. 2020-11-19 17:00:00
Event end time                | currentSeasonEnd   | date      | e.g. 2021-01-14 23:55:00
Event status                  | raceState          | int       | 1 = in progress, 0 = ended
Number of participating teams | teamNum            | int       | integer
Event type                    | type               | int       | integer
Event bonus                   | currency           | int       | integer
Event bonus symbol            | currencySymbol     | string    | character string

Hey hey, once you've planned out the data you need, you can start the page analysis. Press F12 to open the developer tools and refresh the page. Refresh, refresh, watch the page data load in the Network tab, go through the requests one by one, and bang, you find listBrief?pageNum=1&pageSize.... This is fate. The data is here. Click Headers to view the Request URL, and you'll find this is actually a data interface. Comfortable ~

In this way, data acquisition is much easier ~!
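Before writing the full crawler, you can give the interface a quick poke; here is a minimal sketch using the Request URL found above (the response structure is explored properly in the next section):

import requests
from fake_useragent import UserAgent

# Request the first page of the interface and peek at the response
api = 'https://tianchi.aliyun.com/competition/proxy/api/competition/api/race/listBrief?pageNum=1&pageSize=10&type=2'
response = requests.get(api, headers={"User-Agent": UserAgent().random})
print(response.status_code)          # expect 200
print(list(response.json().keys()))  # top-level keys of the JSON payload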

2.2 Write a few lines of code

First, import the packages we'll use:

'''
Crawl the learning competitions and related data from Alibaba Cloud's
big data platform Tianchi
Address: https://tianchi.aliyun.com/competition/gameList/coupleList
'''

import warnings
warnings.filterwarnings("ignore")     # ignore warnings
import requests                       # page request module
from fake_useragent import UserAgent  # randomly generated request headers
import json                           # json data processing
import pandas as pd                   # data processing
import pymysql                        # database access

If you hit a ModuleNotFoundError: No module named ..., just pip install the missing package. You're welcome.
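For reference, the imports above correspond to these PyPI packages (note that fake-useragent is hyphenated on PyPI):

% pip install requests fake-useragent pandas pymysql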

'''
Crawling data
'''
def get_data():
    # Request header
    headers = {
            "User-Agent": UserAgent(verify_ssl=False,use_cache_server=False).random
        }
    # List for storing data
    csv_data = []

    # Header, data items you need
    head_csv = ['raceName', 'raceId', 'brief', 'currentSeasonStart', 'currentSeasonEnd', 'raceState', 'teamNum', 'type', 'currency', 'currencySymbol']
    # Page analysis shows 4 pages with 32 records in total; crawl them in a loop
    for i in range(1,5):
        # Data api and rules found through page analysis
        competition_api = 'https://tianchi.aliyun.com/competition/proxy/api/competition/api/race/listBrief?pageNum=%d&pageSize=10&type=2'%i
        # print(competition_api)
        # Send get request
        response_data = requests.get(competition_api,headers=headers)
        # Convert the acquired data into json format, and then process the data like a dictionary
        json_data = response_data.text
        json_data = json.loads(json_data)
        # Print observation data
        # print(type(json_data['data']['list']))

        # Loop through each record and pick out the fields you need
        for item in json_data['data']['list']:
            one_csv_data = []
            one_csv_data.append(item['raceName'])
            one_csv_data.append(item['raceId'])
            one_csv_data.append(item['brief'])
            one_csv_data.append(item['currentSeasonStart'])
            one_csv_data.append(item['currentSeasonEnd'])
            one_csv_data.append(1 if item['raceState']=='ONGOING' else 0)
            one_csv_data.append(item['teamNum'])
            one_csv_data.append(item['type'])
            one_csv_data.append(item['currency'])
            one_csv_data.append('RMB' if item['currencySymbol']=='¥' else 'dollar')
            csv_data.append(one_csv_data)
    return head_csv,csv_data

data = get_data()

The logic of the whole code is clear: request the discovered data interface, convert the response into JSON, traverse it, and collect the fields you need into a list first.
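For reference, the response presumably looks roughly like this; the shape is inferred from the fields the code reads, and the values here are invented for illustration:

{
  "data": {
    "list": [
      {
        "raceName": "...",
        "raceId": "531842",
        "brief": "...",
        "currentSeasonStart": "2020-11-19 17:00:00",
        "currentSeasonEnd": "2021-01-14 23:55:00",
        "raceState": "ONGOING",
        "teamNum": 123,
        "type": 2,
        "currency": 100000,
        "currencySymbol": "¥"
      }
    ]
  }
}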

Then we can save the data to a CSV file while it's hot. The code is as follows:

'''
Store the list data as a csv file
'''
def save_to_csv(data):
    # Unpack the data
    head_csv = data[0]  # header
    csv_data = data[1]  # table rows
    # Convert the data to DataFrame format
    csv_data = pd.DataFrame(columns=head_csv, data=csv_data)
    # print(csv_data)
    # Call the to_csv function to store the data as a CSV file
    csv_data.to_csv('tianchi_competition.csv', index=False, encoding='gbk')

save_to_csv(data)

The data is now saved to tianchi_competition.csv.
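As a quick sanity check, you can read the file back (a minimal sketch, reusing the same encoding):

# Read the saved file back and peek at the first rows
print(pd.read_csv('tianchi_competition.csv', encoding='gbk').head())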

3, Linkage of Python, CSV and MySQL

3.1 About pymysql

pymysql is a Python package (module) for connecting to MySQL; it requires Python 3 or above and is easy to use. It was introduced in an earlier crawler article, where xxx also used pymysql to write medical device data into MySQL.
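As a warm-up, here is a minimal pymysql sketch: connect, run a trivial query, and clean up (fill in your own credentials):

import pymysql

# Connect to the local MySQL server
conn = pymysql.connect(host="localhost", port=3306,
                       user="your username", password="your MySQL password",
                       charset="utf8")
cursor = conn.cursor()
# Run a trivial query to confirm the connection works
cursor.execute("select version();")
print(cursor.fetchone())  # e.g. ('8.0.22',)
cursor.close()
conn.close()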

3.2 Basic MySQL operation code

'''
Create a new database
'''
def create_database(database, cursor):
    # Drop the database if it already exists, then create it fresh
    cursor.execute("drop database if exists %s;"%database)
    cursor.execute("create database %s;"%database)
    print("Successfully created database: %s"%database)

'''
Create a new table in the specified database
'''
def create_table(table_name, table_sql, database, cursor):
    # Select the database
    cursor.execute("use %s;"%database)
    # If a table with this name already exists, drop it first
    cursor.execute("drop table if exists %s;"%table_name)
    # Create the table
    cursor.execute(table_sql)
    print("Successfully created table {1} in database {0}".format(database, table_name))

'''
Add, delete, query and modify data in the specified database's tables
'''
def sql_basic_operation(table_sql, database, cursor):
    # Select the database
    cursor.execute("use %s"%database)
    # Execute the sql statement
    cursor.execute(table_sql)
    print("SQL statement executed successfully")

'''
Later we will continue to enrich these functions and add linkage
with the Flask web framework
'''

3.3 Connect and store data into MySQL

  • Connect to MySQL and create a new database
# Connect to MySQL and create a cursor
conn = pymysql.connect(host="localhost", port=3306, user="your username", password="your MySQL password", charset="utf8")
cursor = conn.cursor()
# Database name
database = "tianchi"
# Create database
create_database(database, cursor)
  • Create a data table
table_name = "competition"
#  Create a data table and set raceId as the primary key
table_sql = """create table %s(
                 raceId int not null auto_increment,
                 raceName  varchar(200),
                 brief varchar(1000),
                 currentSeasonStart date,
                 currentSeasonEnd date,
                 raceState int,
                 teamNum int,
                 type int,
                 currency int,
                 currencySymbol char(3),
                 PRIMARY KEY(raceId));
                 """%table_name
create_table(table_name, table_sql, database, cursor)
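Optionally, you can confirm the table structure right away; after create_table the cursor is already using the tianchi database:

# Describe the new table to verify its columns and types
cursor.execute("desc competition;")
for field in cursor.fetchall():
    print(field)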
  • Read data from CSV
'''
Read csv file data
'''
def read_csv(filepath):
    # Read the csv file by directly calling pandas' read_csv function
    data = pd.read_csv(filepath, encoding='gbk')
    # print(type(data)) # <class 'pandas.core.frame.DataFrame'>
    sql_data = []
    # Traverse the DataFrame row by row and pull out each row's data
    for row in data.itertuples():
        # print(type(row))  # <class 'pandas.core.frame.Pandas'>
        # print(getattr(row,'raceName'))  # e.g. Zero-based introduction to recommendation systems - news recommendation
        sql_data.append([getattr(row,'raceId'), getattr(row,'raceName'), getattr(row,'brief'), getattr(row,'currentSeasonStart'), getattr(row,'currentSeasonEnd'),
                        getattr(row,'raceState'), getattr(row,'teamNum'), getattr(row,'type'), getattr(row,'currency'), getattr(row,'currencySymbol')])
    return sql_data

filepath = 'tianchi_competition.csv'
sql_data = read_csv(filepath)
  • Traverse the data and store it in MySQL
# Circular insert data
for row in sql_data:
    # Splice together the insert sql statement
    sql_insert = '''insert into {0} values({1},'{2}','{3}','{4}','{5}',{6},{7},{8},{9},'{10}');
    '''.format(table_name,row[0],row[1],row[2],row[3],row[4],row[5],row[6],row[7],row[8],row[9])
    # print(sql_insert)
    sql_basic_operation(sql_insert, database, cursor)

conn.commit()

# Close the cursor first, then the connection
cursor.close()
conn.close()
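A side note: splicing values into the SQL string works here, but a brief containing a quote character would break it. A safer sketch uses pymysql's %s placeholders; it would replace the loop above and must run before the connection is closed (it assumes the cursor is still using the tianchi database):

# Let pymysql quote and escape the values itself
insert_sql = "insert into competition values (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s);"
cursor.executemany(insert_sql, sql_data)
conn.commit()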

3.4 View the stored data in MySQL

  • Enter mysql from the terminal
mysql -hlocalhost -uroot -p

  • Select database and view table data
# Select database: tianchi
use tianchi;
# View data in data table competition
select * from competition;

This completes the linkage among Python, CSV and MySQL.

4, A small analysis, for learning

A small requirement: get the ID, name, team count and brief introduction of the five ongoing learning competitions with the most participating teams.

  • Python
filepath = 'tianchi_competition.csv'
# Read the csv file and directly call the read of pandas_ csv function
data = pd.read_csv(filepath, encoding='gbk')
# ID, name, team count and brief of the top five ongoing learning competitions
# 1. Filter out finished competitions
data = data[data['raceState']==1]
# 2. Sort by teamNum in descending order
data = data.sort_values(by='teamNum', axis=0, ascending=False)
# 3. Take the first 5 rows and output the specified columns
data.head(5).loc[:, ['raceId', 'raceName', 'teamNum', 'brief']]
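The same result can also be written more compactly with nlargest, if you prefer:

# Equivalent one-liner: top 5 ongoing competitions by team count
data[data['raceState']==1].nlargest(5, 'teamNum')[['raceId', 'raceName', 'teamNum', 'brief']]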

  • SQL
select raceId, raceName, teamNum, brief 
from competition 
where raceState=1 
order by teamNum DESC 
limit 5;
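If you want this result back in Python, a minimal sketch runs the same query through pymysql and pandas (newer pandas versions may warn that they prefer a SQLAlchemy connection):

import pandas as pd
import pymysql

# Reconnect, this time selecting the tianchi database directly
conn = pymysql.connect(host="localhost", port=3306, user="your username",
                       password="your MySQL password", database="tianchi", charset="utf8")
df = pd.read_sql("select raceId, raceName, teamNum, brief from competition "
                 "where raceState=1 order by teamNum desc limit 5;", conn)
print(df)
conn.close()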

  • Excel

step1 Select the raceState column, click Sort & Filter, and choose Filter

step2 Filter for raceState values of 1, i.e. the ongoing competitions

step3 Hold down the Control key, select the unneeded columns, then right-click and hide them

step4 Select the teamNum column and click Sort & Filter to sort in descending order

step5 Copy the first five rows to a new sheet, and you're done

In terms of simplicity, SQL is probably the most convenient: a single statement gets it done. In terms of universality, Excel has the lowest threshold; what you see is what you get, and what you want is what you get.

For learners like me who use Python a lot, Python is the most convenient, and it is obviously much more flexible.

Keywords: Python Database MySQL

Added by argyrism on Thu, 27 Jan 2022 10:22:28 +0200