Author: Yang Taotao
This paper introduces two important features of the MySQL 8.0 Shell Util sub-module: importTable/import_table (the name differs between the JavaScript and Python modes) and importJson/import_json. import_table communicates over the classic MySQL protocol, while import_json communicates over the X Plugin protocol.
MySQL has always provided the SQL command LOAD DATA INFILE (single-threaded) for importing files, along with the corresponding executable mysqlimport (multi-threaded). For example, importing 1,000,000 rows of sample data into table ytt.tl1 took about 24 seconds; by default, this is the fastest import MySQL offers.
[root@mysql-dev ytt]# time mysqlimport --login-path=ytt_master --fields-terminated-by=, --use-threads=4 ytt /var/lib/mysql-files/tl1.csv
ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0

real    0m24.815s
user    0m0.013s
sys     0m0.031s
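If you want to reproduce the test yourself, here is a minimal sketch that generates a comparable comma-separated sample file. The column layout mirrors the tl1 table shown later (id, r1, r2, r3, r4); the output path and row count are just examples, scale them up to taste.

```python
import csv
import random
from datetime import datetime

def make_sample_csv(path, rows=1000):
    """Write a comma-separated sample file matching tl1's columns."""
    with open(path, "w", newline="") as f:
        w = csv.writer(f)
        for i in range(1, rows + 1):
            w.writerow([
                i,                                              # id
                random.randint(1, 100),                         # r1
                random.randint(1, 100),                         # r2
                "mysql",                                        # r3
                datetime.now().strftime("%Y-%m-%d %H:%M:%S"),   # r4
            ])

make_sample_csv("/tmp/tl1_sample.csv", rows=1000)
```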
Now let's look at the upgraded version of mysqlimport: the Util toolkit of MySQL Shell. Before using these two tools, you must temporarily enable the local_infile option.
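A typical way to enable it (run as an account with the required privileges; remember to switch it back off afterwards if your security policy demands it):

```sql
-- Enable on the server side; client-side file import depends on it.
SET GLOBAL local_infile = ON;

-- Verify the setting took effect.
SHOW GLOBAL VARIABLES LIKE 'local_infile';
```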
1. import_table
[root@mysql-dev ytt]# mysqlsh
MySQL Shell 8.0.17
Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type '\help' or '\?' for help; '\quit' to exit.
Establish a new connection to port 3306
MySQL JS > \c admin@127.0.0.1:3306
Creating a session to 'admin@127.0.0.1:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 56
Server version: 8.0.17 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
I switch to Python mode here:
MySQL 127.0.0.1:3306 ssl JS > \py
Switching to Python mode...
MySQL 127.0.0.1:3306 ssl Py > \use ytt
Default schema set to `ytt`.
Clear out the sample table ytt.tl1:
MySQL 127.0.0.1:3306 ssl ytt Py > \sql truncate tl1;
Fetching table and column names from `ytt` for auto-completion... Press ^C to stop.
Query OK, 0 rows affected (0.2354 sec)
Sample table structure
MySQL 127.0.0.1:3306 ssl ytt Py > \sql show create table tl1\G
*************************** 1. row ***************************
       Table: tl1
Create Table: CREATE TABLE `tl1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `r1` int(11) DEFAULT NULL,
  `r2` int(11) DEFAULT NULL,
  `r3` varchar(30) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `r4` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.0011 sec)
import_table takes two parameters. The first defines the path of the file to import, and the second is a dictionary of options, such as the import format, the number of threads to use, and so on.
Define file path (parameter 1)
MySQL 127.0.0.1:3306 ssl ytt Py > y_file1='/var/lib/mysql-files/tl1.csv'
Define options (parameter 2)
MySQL 127.0.0.1:3306 ssl ytt Py > y_options1={"schema":"ytt","table":"tl1","fieldsTerminatedBy":",","showProgress":True,"threads":4}
Import execution:
MySQL 127.0.0.1:3306 ssl ytt Py > util.import_table(y_file1,y_options1);
Importing from file '/var/lib/mysql-files/tl1.csv' to table `ytt`.`tl1` in MySQL Server at 127.0.0.1:3306 using 1 thread
[Worker000] ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
100% (40.87 MB / 40.87 MB), 2.14 MB/s
File '/var/lib/mysql-files/tl1.csv' (40.87 MB) was imported in 16.7394 sec at 2.44 MB/s
Total rows affected in ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
It took less than 17 seconds, much faster than the traditional mysqlimport.
We specified the field separator explicitly above. Is there a predefined combination format? The answer is yes: the dialect option can specify the following formats: csv, tsv, json, csv-unix.
So the import above can be written more simply.
Change the definition of variable y_options1:
MySQL 127.0.0.1:3306 ssl ytt Py > \sql truncate tl1;
Query OK, 0 rows affected (0.0776 sec)
MySQL 127.0.0.1:3306 ssl ytt Py > y_options1={"schema":"ytt","table":"tl1","dialect":"csv-unix","showProgress":True,"threads":4}
MySQL 127.0.0.1:3306 ssl ytt Py > util.import_table(y_file1,y_options1);
Importing from file '/var/lib/mysql-files/tl1.csv' to table `ytt`.`tl1` in MySQL Server at 127.0.0.1:3306 using 1 thread
[Worker000] ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
100% (40.87 MB / 40.87 MB), 2.67 MB/s
File '/var/lib/mysql-files/tl1.csv' (40.87 MB) was imported in 14.1000 sec at 2.90 MB/s
Total rows affected in ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
The import time is about the same. Note that the dialect option has lower priority than explicit options: adding 'linesTerminatedBy':'\r\n', for example, would override the dialect's own '\n'.
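To make the precedence concrete, here is a small Python sketch. The per-dialect defaults below are my reading of the MySQL Shell manual (verify them against your version); the point being illustrated is only that anything you set explicitly wins over what the dialect implies.

```python
# Approximate per-dialect defaults -- an illustration, not authoritative;
# check the MySQL Shell manual for your version.
DIALECT_DEFAULTS = {
    "csv":      {"fieldsTerminatedBy": ",",  "linesTerminatedBy": "\r\n"},
    "csv-unix": {"fieldsTerminatedBy": ",",  "linesTerminatedBy": "\n"},
    "tsv":      {"fieldsTerminatedBy": "\t", "linesTerminatedBy": "\r\n"},
}

def effective_options(options):
    """Expand a dialect into its defaults, letting explicit keys win."""
    merged = dict(DIALECT_DEFAULTS.get(options.get("dialect"), {}))
    merged.update(options)  # explicit options override dialect defaults
    return merged

# The explicit "\r\n" overrides csv-unix's default "\n".
opts = effective_options({"dialect": "csv-unix", "linesTerminatedBy": "\r\n"})
```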
The dialect option also accepts the value json, which can import JSON documents directly into a table. For example, I created a new table, tl1_json:
MySQL 127.0.0.1:3306 ssl ytt Py > \sql create table tl1_json(id bigint primary key, content json);
Query OK, 0 rows affected (0.3093 sec)
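The tl1.json file used next holds one JSON document per line. If you need to build such a file yourself, a minimal sketch (the field names mirror the tl1 rows shown elsewhere in this article; the path and row count are just examples):

```python
import json
from datetime import datetime

def make_sample_json(path, rows=1000):
    """Write newline-delimited JSON documents mirroring tl1's rows."""
    with open(path, "w") as f:
        for i in range(1, rows + 1):
            doc = {
                "id": i,
                "r1": 10,
                "r2": 10,
                "r3": "mysql",
                "r4": datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f"),
            }
            f.write(json.dumps(doc) + "\n")  # one document per line

make_sample_json("/tmp/tl1_sample.json", rows=1000)
```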
Redefine the file and import options.
MySQL 127.0.0.1:3306 ssl ytt Py > y_file2='/var/lib/mysql-files/tl1.json'
MySQL 127.0.0.1:3306 ssl ytt Py > rows=['content']
MySQL 127.0.0.1:3306 ssl ytt Py > y_options2={"schema":"ytt","table":"tl1_json","dialect":"json","showProgress":True,"threads":4,'columns':rows}
Importing JSON data
MySQL 127.0.0.1:3306 ssl ytt Py > util.import_table(y_file2,y_options2)
Importing from file '/var/lib/mysql-files/tl1.json' to table `ytt`.`tl1_json` in MySQL Server at 127.0.0.1:3306 using 2 threads
[Worker000] ytt.tl1_json: Records: 464633  Deleted: 0  Skipped: 0  Warnings: 0
[Worker001] ytt.tl1_json: Records: 583943  Deleted: 0  Skipped: 0  Warnings: 0
100% (90.15 MB / 90.15 MB), 2.71 MB/s
File '/var/lib/mysql-files/tl1.json' (90.15 MB) was imported in 23.3530 sec at 3.86 MB/s
Total rows affected in ytt.tl1_json: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
The speed is OK, less than 24 seconds.
Speaking of importing JSON data, we have to mention import_json, a tool that communicates over the X Plugin protocol.
2. import_json
We switch to the mysqlx port (33060).
MySQL 127.0.0.1:3306 ssl ytt Py > \c admin@127.0.0.1:33060
Creating a session to 'admin@127.0.0.1:33060'
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 16 (X protocol)
Server version: 8.0.17 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
MySQL 127.0.0.1:33060+ ssl Py > \use ytt
Default schema `ytt` accessible through db.
-- Empty table tl1_json
MySQL 127.0.0.1:33060+ ssl ytt Py > \sql truncate tl1_json;
Query OK, 0 rows affected (0.1098 sec)
The import_json parameters are similar to those of import_table.
Here I change the options.
MySQL 127.0.0.1:33060+ ssl ytt Py > y_file3=y_file2
MySQL 127.0.0.1:33060+ ssl ytt Py > y_options3={"schema":"ytt","table":"tl1_json",'tableColumn':'content'}
MySQL 127.0.0.1:33060+ ssl ytt Py > util.import_json(y_file3,y_options3)
Importing from file "/var/lib/mysql-files/tl1.json" to table `ytt`.`tl1_json` in MySQL Server at 127.0.0.1:33060
.. 517776.. 1032724.. 1048576.. 1048576
Processed 90.15 MB in 1048576 documents in 35.2400 sec (29.76K documents/s)
Total successfully imported documents 1048576 (29.76K documents/s)
The manual does not mention a multithreading option for import_json, so this single-threaded run is slower, at about 35 seconds.
View the data just imported:
MySQL 127.0.0.1:33060+ ssl ytt Py > \sql select id,json_pretty(content) from tl1_json limit 1\G
*************************** 1. row ***************************
                  id: 1
json_pretty(content): {
  "id": 1,
  "r1": 10,
  "r2": 10,
  "r3": "mysql",
  "r4": "2019-09-16 16:49:50.000000"
}
1 row in set (0.0007 sec)
import_json can not only import JSON data, but can also convert smoothly between BSON and JSON. Interested readers can try it out.