New Feature Interpretation | MySQL 8.0 shell util feature

Author: Yang Taotao

This article introduces two import utilities in the MySQL 8.0 Shell util module: importTable/import_table (the name differs between the JavaScript and Python modes) and importJson/import_json. import_table communicates over the classic MySQL protocol, while import_json communicates over the X Plugin protocol.

MySQL has long provided the LOAD DATA INFILE SQL statement (single-threaded) for importing files, along with the corresponding mysqlimport client (multi-threaded). For example, importing about one million rows (1,048,576) of sample data into table ytt.tl1 took roughly 24 seconds. Traditionally, this is about as fast as a MySQL import gets.

[root@mysql-dev ytt]# time mysqlimport  --login-path=ytt_master --fields-terminated-by=, --use-threads=4 ytt  /var/lib/mysql-files/tl1.csv
ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0

real    0m24.815s
user    0m0.013s
sys     0m0.031s
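For context, a sample file like tl1.csv can be produced with a short script. The generator below is a hypothetical illustration (not the one used for this benchmark); the column layout (id, r1, r2, r3, r4) follows the tl1 table structure shown later in the article.

```python
import csv
import random
from datetime import datetime

def generate_rows(n):
    """Yield rows matching tl1's columns: id, r1, r2, r3, r4."""
    ts = datetime(2019, 9, 16, 16, 49, 50)
    for i in range(1, n + 1):
        yield [i, random.randint(0, 100), random.randint(0, 100),
               "mysql", ts.strftime("%Y-%m-%d %H:%M:%S")]

def write_csv(path, n):
    """Write n comma-separated rows, no header, one record per line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(generate_rows(n))

write_csv("/tmp/tl1_sample.csv", 5)
```

Scale `n` up to a million rows to reproduce a data set of comparable size.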

Now let's look at the upgraded alternative to the mysqlimport tool: the util module of MySQL Shell. Before using these two utilities, you must temporarily enable the local_infile option on the server (for example, `SET GLOBAL local_infile = ON;`).

1. import_table

[root@mysql-dev ytt]# mysqlsh
MySQL Shell 8.0.17

Copyright (c) 2016, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.

Type '\help' or '\?' for help; '\quit' to exit.

Establish a new connection to port 3306

MySQL  JS > \c admin@127.0.0.1:3306
Creating a session to 'admin@127.0.0.1:3306'
Fetching schema names for autocompletion... Press ^C to stop.
Your MySQL connection id is 56
Server version: 8.0.17 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.

I switch to Python mode here:

MySQL  127.0.0.1:3306 ssl  JS > \py
Switching to Python mode...
MySQL  127.0.0.1:3306 ssl  Py > \use ytt
Default schema set to `ytt`.

Clear out the sample table ytt.tl1:

MySQL  127.0.0.1:3306 ssl  ytt  Py > \sql truncate tl1;
Fetching table and column names from `ytt` for auto-completion... Press ^C to stop.
Query OK, 0 rows affected (0.2354 sec)

Sample table structure

MySQL  127.0.0.1:3306 ssl  ytt  Py > \sql show create table tl1\G
*************************** 1. row ***************************
       Table: tl1
Create Table: CREATE TABLE `tl1` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `r1` int(11) DEFAULT NULL,
  `r2` int(11) DEFAULT NULL,
  `r3` varchar(30) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `r4` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
1 row in set (0.0011 sec)

import_table takes two parameters. The first is the path of the file to import; the second is a dictionary of options, such as the import format, the number of threads, and so on.

Define file path (parameter 1)

MySQL  127.0.0.1:3306 ssl  ytt  Py > y_file1='/var/lib/mysql-files/tl1.csv'

Define options (parameter 2)

MySQL  127.0.0.1:3306 ssl  ytt  Py > y_options1={"schema":"ytt","table":"tl1","fieldsTerminatedBy":",","showProgress":True,"threads":4}

Import execution:

MySQL  127.0.0.1:3306 ssl  ytt  Py > util.import_table(y_file1,y_options1);
Importing from file '/var/lib/mysql-files/tl1.csv' to table `ytt`.`tl1` in MySQL Server at 127.0.0.1:3306 using 1 thread
[Worker000] ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
100% (40.87 MB / 40.87 MB), 2.14 MB/s
File '/var/lib/mysql-files/tl1.csv' (40.87 MB) was imported in 16.7394 sec at 2.44 MB/s
Total rows affected in ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
It took less than 17 seconds, much faster than the traditional mysqlimport.

We specified the field separator explicitly above. Is there a predefined combination of format options? Yes: the dialect option accepts the following values: csv, tsv, json, csv-unix.

So the import above can be written more simply.

Change the definition of the variable y_options1:

MySQL  127.0.0.1:3306 ssl  ytt  Py > \sql truncate tl1;
Query OK, 0 rows affected (0.0776 sec)
MySQL  127.0.0.1:3306 ssl  ytt  Py > y_options1={"schema":"ytt","table":"tl1","dialect":"csv-unix","showProgress":True,"threads":4}
MySQL  127.0.0.1:3306 ssl  ytt  Py > util.import_table(y_file1,y_options1);
Importing from file '/var/lib/mysql-files/tl1.csv' to table `ytt`.`tl1` in MySQL Server at 127.0.0.1:3306 using 1 thread
[Worker000] ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0
100% (40.87 MB / 40.87 MB), 2.67 MB/s
File '/var/lib/mysql-files/tl1.csv' (40.87 MB) was imported in 14.1000 sec at 2.90 MB/s
Total rows affected in ytt.tl1: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0

The import time is about the same. Note that the dialect option has lower priority than explicitly specified options: for example, adding 'linesTerminatedBy': '\r\n' overrides the dialect's default of '\n'.
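This precedence rule can be pictured as a simple dictionary merge: the dialect expands into a set of default format options, and anything you pass explicitly wins. The preset values below are an illustrative assumption (check the util.importTable() reference for the authoritative defaults); the merge logic itself is the point.

```python
# Hypothetical defaults for the csv-unix dialect (illustration only).
CSV_UNIX_DEFAULTS = {
    "fieldsTerminatedBy": ",",
    "fieldsEnclosedBy": '"',
    "linesTerminatedBy": "\n",
}

def effective_options(dialect_defaults, explicit):
    """Explicit options take priority over the dialect's defaults."""
    merged = dict(dialect_defaults)
    merged.update(explicit)  # the later update wins on key conflicts
    return merged

opts = effective_options(CSV_UNIX_DEFAULTS, {"linesTerminatedBy": "\r\n"})
# opts keeps the dialect's field settings but uses the explicit line terminator.
```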

The dialect option also accepts the value json, which imports JSON documents (one per line) directly into a document table. For example, I created a new table tl1_json:

MySQL  127.0.0.1:3306 ssl  ytt  Py > \sql create table tl1_json(
     id bigint primary key, 
     content json);
Query OK, 0 rows affected (0.3093 sec)
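The json dialect expects one JSON document per line. A file like tl1.json can be derived from the CSV with a few lines of Python; this is an illustrative sketch (the paths and column names are assumptions based on the tl1 structure), not the exact file used above.

```python
import csv
import json

def csv_to_json_lines(csv_path, json_path, fieldnames):
    """Convert a headerless CSV into a file with one JSON document per line."""
    with open(csv_path, newline="") as src, open(json_path, "w") as dst:
        for row in csv.reader(src):
            doc = dict(zip(fieldnames, row))
            dst.write(json.dumps(doc) + "\n")

# Column names follow the tl1 table structure:
# csv_to_json_lines("/var/lib/mysql-files/tl1.csv",
#                   "/var/lib/mysql-files/tl1.json",
#                   ["id", "r1", "r2", "r3", "r4"])
```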

Redefine the file path and import options.

MySQL  127.0.0.1:3306 ssl  ytt  Py > y_file2='/var/lib/mysql-files/tl1.json'
MySQL  127.0.0.1:3306 ssl  ytt  Py > rows=['content']
MySQL  127.0.0.1:3306 ssl  ytt  Py > y_options2={"schema":"ytt","table":"tl1_json","dialect":"json","showProgress":True,"threads":4,'columns':rows}

Import the JSON data:

MySQL  127.0.0.1:3306 ssl  ytt  Py > util.import_table(y_file2,y_options2)
Importing from file '/var/lib/mysql-files/tl1.json' to table `ytt`.`tl1_json` in MySQL Server at 127.0.0.1:3306 using 2 threads
[Worker000] ytt.tl1_json: Records: 464633  Deleted: 0  Skipped: 0  Warnings: 0
[Worker001] ytt.tl1_json: Records: 583943  Deleted: 0  Skipped: 0  Warnings: 0
100% (90.15 MB / 90.15 MB), 2.71 MB/s
File '/var/lib/mysql-files/tl1.json' (90.15 MB) was imported in 23.3530 sec at 3.86 MB/s
Total rows affected in ytt.tl1_json: Records: 1048576  Deleted: 0  Skipped: 0  Warnings: 0

The speed is OK, less than 24 seconds.

Speaking of importing JSON data, we have to mention import_json, a utility that communicates over the X Plugin protocol.

2. import_json

We switch to the mysqlx port (33060):

MySQL  127.0.0.1:3306 ssl  ytt  Py > \c admin@127.0.0.1:33060
Creating a session to 'admin@127.0.0.1:33060'
Fetching schema names for autocompletion... Press ^C to stop.
Closing old connection...
Your MySQL connection id is 16 (X protocol)
Server version: 8.0.17 MySQL Community Server - GPL
No default schema selected; type \use <schema> to set one.
 MySQL  127.0.0.1:33060+ ssl  Py > \use ytt
Default schema `ytt` accessible through db.
Empty the table tl1_json:
 MySQL  127.0.0.1:33060+ ssl  ytt  Py > \sql truncate tl1_json;
Query OK, 0 rows affected (0.1098 sec)

The import_json parameters are similar to those of import_table.

Here I adjust the options:

MySQL  127.0.0.1:33060+ ssl  ytt  Py > y_file3=y_file2
MySQL  127.0.0.1:33060+ ssl  ytt  Py > y_options3={"schema":"ytt","table":"tl1_json",'tableColumn':'content'}
MySQL  127.0.0.1:33060+ ssl  ytt  Py > util.import_json(y_file3,y_options3)
Importing from file "/var/lib/mysql-files/tl1.json" to table `ytt`.`tl1_json` in MySQL Server at 127.0.0.1:33060
.. 517776.. 1032724.. 1048576.. 1048576
Processed 90.15 MB in 1048576 documents in 35.2400 sec (29.76K documents/s)
Total successfully imported documents 1048576 (29.76K documents/s)

The manual does not document a multithreading option for import_json, so this single-threaded run takes about 35 seconds, slower than import_table.

View the data just imported:

MySQL  127.0.0.1:33060+ ssl  ytt  Py > \sql select  id,json_pretty(content) from tl1_json limit 1\G
*************************** 1. row ***************************
                  id: 1
json_pretty(content): {
  "id": 1,
  "r1": 10,
  "r2": 10,
  "r3": "mysql",
  "r4": "2019-09-16 16:49:50.000000"
}
1 row in set (0.0007 sec)

import_json can not only import JSON data, but can also convert between BSON and JSON. Interested readers can try it out.
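As a taste of what BSON-to-JSON conversion involves: MongoDB exports represent types such as ObjectId and Date with wrapper keys ($oid, $date). The sketch below is a simplified illustration of that unwrapping, not import_json's actual implementation; it is similar in spirit to what the utility's BSON conversion options do.

```python
import json

def unwrap_bson(value):
    """Recursively replace {'$oid': ...} / {'$date': ...} wrappers with plain values."""
    if isinstance(value, dict):
        if set(value) == {"$oid"}:
            return value["$oid"]
        if set(value) == {"$date"}:
            return value["$date"]
        return {k: unwrap_bson(v) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_bson(v) for v in value]
    return value

doc = json.loads('{"_id": {"$oid": "5d7f3a"}, "created": {"$date": "2019-09-16T16:49:50Z"}}')
plain = unwrap_bson(doc)
# plain now holds ordinary string values instead of BSON type wrappers.
```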


Added by Matt Kindig on Tue, 24 Sep 2019 11:53:34 +0300