python operation excel import data to mysql

Requirement: the exchange code of excel is imported into mysql, with a total of 30W pieces of data. Navicat has its own functions. It can directly import excel data into mysql. There are too many databases and manual operation is very troublesome. A script is written with python.

Use pip to import xlrd,pymysql library.

First edition:

import xlrd
import pymysql
import math

# book = xlrd.open_workbook('activity_password1.xlsx')
# sheet = book.sheet_by_name('@activity_password')

filelist = ['activity_password1.xlsx', 'activity_password2.xlsx', 'activity_password3.xlsx', \
            'activity_password4.xlsx', 'activity_password5.xlsx', 'activity_password6.xlsx', \
            'activity_password6_1.xlsx', 'activity_password7.xlsx', 'activity_password8.xlsx']

for i in range(1, 100):
    # Establish mysql connection
    conn = pymysql.connect(
            host = '127.0.0.1',
            user='root',
            passwd='123456',
            db='youxi' + str(i),
            port=3306,
            charset='utf8'
        )

    # Get cursor
    cur = conn.cursor()

    sql = 'insert into gm_password (id, type, code, status) values (%s, %s, %s, %s)'

    for filename in filelist:
        book = xlrd.open_workbook(filename)
        sheet = book.sheet_by_name('@activity_password')
        for r in range(1, sheet.nrows):
            values = (sheet.cell(r, 0).value, sheet.cell(r, 1).value, sheet.cell(r, 2).value, sheet.cell(r, 3).value)
            cur.execute(sql, values)
        conn.commit()
    cur.close()
    conn.close()
    print ('youxi'+str(i)+' sucess')

After a little execution, I couldn't bear the speed. And if you import data to the Internet, it is a timeout. Optimize.

Optimized code:

import xlrd
import pymysql
import math

# book = xlrd.open_workbook('activity_password1.xlsx')
# sheet = book.sheet_by_name('@activity_password')

filelist = ['activity_password1.xlsx', 'activity_password2.xlsx', 'activity_password3.xlsx', \
            'activity_password4.xlsx', 'activity_password5.xlsx', 'activity_password6.xlsx', \
            'activity_password6_1.xlsx', 'activity_password7.xlsx', 'activity_password8.xlsx']

for i in range(1100):
    # Establish mysql connection
    conn = pymysql.connect(
            host = '127.0.0.1',
            user='root',
            passwd = '123456',
            db='youxi' + str(i),
            port=3306,
            charset='utf8'
        )

    # Get cursor
    cur = conn.cursor()

    for filename in filelist:
        book = xlrd.open_workbook(filename)
        sheet = book.sheet_by_name('@activity_password')
        ops = []
        for r in range(1, sheet.nrows):
            values = (sheet.cell(r, 0).value, sheet.cell(r, 1).value, sheet.cell(r, 2).value, sheet.cell(r, 3).value)
            ops.append(values)
        n = math.ceil(len(ops) / 5000)
        for n1 in range(0, n):
            cur.executemany('insert into gm_password (id, type, code, status) values (%s, %s, %s, %s)', ops[5000*n1:5000*(n1+1)])
    cur.close()
    conn.commit()
    conn.close()
    print ('youxi'+str(i)+' sucess')

Summary of optimization points:
1. Insert in bulk, and then submit, rather than insert one and submit one.
2. Replace execute with executemany, but don't insert too many lines. My setting is 5000.
3. Do not splice SQL statements by yourself. Execute directly in the executemany method.
4. Reducing the number of commit s is very important.

Results: it takes 30s for a database to be inserted, which has met the requirements, and has not been optimized. There should be some points that can be optimized.

Keywords: MySQL SQL Excel Navicat

Added by uncleronin on Mon, 06 Apr 2020 19:22:11 +0300