Is database batch insertion really this particular?

First, suppose we need to insert many records. The options:

  • Inserting one record at a time in a foreach loop
  • Splicing all the values into one sql statement and executing it once
  • Inserting with the MyBatis batch function

1. foreach insertion

First obtain the list, then perform one database insert operation per record:

@SpringBootTest
@MapperScan("com.aphysia.springdemo.mapper")
class SpringdemoApplicationTests {

    @Autowired
    SqlSessionFactory sqlSessionFactory;

    @Resource
    UserMapper userMapper;

    static int num = 100000;

    static int id = 1;

    @Test
    void insertForEachTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
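        // one INSERT statement and one network round trip per record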
        for (int i = 0; i < users.size(); i++) {
            userMapper.insertUser(users.get(i));
        }
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }
}
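
For reference, the User entity, the getRandomUsers() helper, and the insertUser mapper are not shown in the article; the following is only a sketch of what they might look like (names, fields, and the random age range are assumptions based on the `user`(`id`, `age`) table used below; imports for List, ArrayList, and ThreadLocalRandom are omitted, matching the snippets in the article):

// Assumed entity for the `user` (`id`, `age`) table
public class User {
    private Integer id;
    private Integer age;

    public User(Integer id, Integer age) { this.id = id; this.age = age; }

    public Integer getId() { return id; }
    public Integer getAge() { return age; }
}

// Assumed mapper (annotations from org.apache.ibatis.annotations)
@Mapper
public interface UserMapper {
    // single-row insert used by the loop above and by the batch tests later
    @Insert("INSERT INTO `user` (`id`, `age`) VALUES (#{id}, #{age})")
    int insertUser(User user);
}

// Assumed helper inside the test class: builds num users with incrementing ids and random ages
private List<User> getRandomUsers() {
    List<User> users = new ArrayList<>(num);
    for (int i = 0; i < num; i++) {
        users.add(new User(id++, ThreadLocalRandom.current().nextInt(18, 60)));
    }
    return users;
}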

2. Splicing and inserting

This approach inserts all the data with a single statement of the following form:

INSERT INTO `user` (`id`, `age`) 
VALUES (1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15);
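
The insertBatch mapper method used in the test below typically builds exactly this kind of multi-row statement with a MyBatis <foreach>. The article does not show the mapper, so the following is only a sketch of what it might look like (annotation-based, names assumed):

    // Assumed mapper method: concatenates every row in the list into one INSERT
    @Insert("<script>" +
            "INSERT INTO `user` (`id`, `age`) VALUES " +
            "<foreach collection='list' item='u' separator=','>" +
            "(#{u.id}, #{u.age})" +
            "</foreach>" +
            "</script>")
    int insertBatch(List<User> users);

The test then times a single mapper call that passes the whole list:
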
    @Test
    void insertSplicingTest() {
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
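        // a single mapper call that sends all rows in one multi-row INSERT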
        userMapper.insertBatch(users);
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

3. Batch insert using MyBatis batch mode

Set the "executor type" of MyBatis session to "Batch", use sqlSessionFactory to set the execution method to Batch, and set automatic submission to false. After all are inserted, submit at one time:

    @Test
    public void insertBatch(){
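        // open a BATCH-mode session with auto-commit off; inserts are queued and flushed when commit() is called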
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for(int i=0;i<users.size();i++){
            mapper.insertUser(users.get(i));
        }
        sqlSession.commit();
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

4. Batch processing + batch submission

Building on batch mode, commit every 1000 records instead of committing once at the end, i.e. submit in batches:

    @Test
    public void insertBatchForEachTest(){
        SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
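        // same BATCH-mode session, but commit in chunks instead of once at the end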
        UserMapper mapper = sqlSession.getMapper(UserMapper.class);
        List<User> users = getRandomUsers();
        long start = System.currentTimeMillis();
        for (int i = 0; i < users.size(); i++) {
            mapper.insertUser(users.get(i));
            // commit every 1000 records, plus once more for the final partial batch
            if ((i + 1) % 1000 == 0 || i == users.size() - 1) {
                sqlSession.commit();
                sqlSession.clearCache();
            }
        }
        sqlSession.close();
        long end = System.currentTimeMillis();
        System.out.println("time:" + (end - start));
    }

The first results: obviously wrong?

Running the above code gave the following results. The for-loop insertion is indeed very slow, while the spliced sql is relatively fast. I had read that spliced sql can be limited by mysql, but all I ran into was a heap memory overflow when executing 1000w (10 million) rows.

Those first results were incorrect!!!

Why didn't the spliced sql exceed MySQL's limit?

Let's take a look at the relevant MySQL limit:

mysql> show VARIABLES like '%max_allowed_packet%';
+---------------------------+------------+
| Variable_name             | Value      |
+---------------------------+------------+
| max_allowed_packet        | 67108864   |
| mysqlx_max_allowed_packet | 67108864   |
| slave_max_allowed_packet  | 1073741824 |
+---------------------------+------------+
3 rows in set (0.12 sec)

This 67108864 is 64 M, which is quite large; no wonder it did not report an error. Let's lower it and retest:

  1. With mysql running, enter the container (or simply click the CLI icon in Docker Desktop):
    docker exec -it mysql bash

  2. Go to the /etc/mysql directory to modify the my.cnf file:
    cd /etc/mysql
  3. Install vim first, otherwise you won't be able to edit the file:
    apt-get update
    apt-get install vim
  4. Edit my.cnf:
    vim my.cnf
  5. Add max_allowed_packet=2M on the last line (press i to edit, press Esc when done, then type :wq to save and exit):
     
    [mysqld]
    pid-file        = /var/run/mysqld/mysqld.pid
    socket          = /var/run/mysqld/mysqld.sock
    datadir         = /var/lib/mysql
    secure-file-priv= NULL
    # Disabling symbolic-links is recommended to prevent assorted security risks
    symbolic-links=0
     
    # Custom config should go here
    !includedir /etc/mysql/conf.d/
    max_allowed_packet=2M
  6. Exit the container:
    # exit
  7. Restart mysql:
    docker restart c178e8998e68

    After the restart, check max_allowed_packet again; it has been modified successfully:

    mysql> show VARIABLES like '%max_allowed_packet%';
    +---------------------------+------------+
    | Variable_name             | Value      |
    +---------------------------+------------+
    | max_allowed_packet        | 2097152    |
    | mysqlx_max_allowed_packet | 67108864   |
    | slave_max_allowed_packet  | 1073741824 |
    +---------------------------+------------+

    We ran the spliced sql again and found that at 100w (1 million) rows the packet reached 36,788,583 bytes (about 35 M), far exceeding the 2 M we just set, and the error was successfully reproduced:

    org.springframework.dao.TransientDataAccessResourceException: 
    ### Cause: com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
    ; Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.; nested exception is com.mysql.cj.jdbc.exceptions.PacketTooBigException: Packet for query is too large (36,788,583 > 2,097,152). You can change this value on the server by setting the 'max_allowed_packet' variable.
    
    

Why is batch processing so slow?

However, if you look carefully at the results above, batch processing shows no advantage at all; it is no different from the for loop. Can that be right?

It definitely is not right. The official documentation says that batch mode sends the updates in batches and reuses the prepared statement instead of creating a new one for every insert, so in theory it should be faster.

Then I found my most important problem: the database connection URL was missing rewriteBatchedStatements=true.

If we don't add it, the MySQL JDBC driver ignores executeBatch() by default: the group of sql statements we expect to be executed as one batch is broken up and sent to the MySQL database one by one. They are effectively single inserts, which directly results in the low performance mentioned above, similar to inserting in a loop.

The database driver only executes the SQL in real batches when the parameter rewriteBatchedStatements is set to true.

The corrected database connection URL:

jdbc:mysql://127.0.0.1:3306/test?characterEncoding=utf-8&useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC&useUnicode=true&nullCatalogMeansCurrent=true&autoReconnect=true&allowMultiQueries=true&rewriteBatchedStatements=true
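
For reference, rewriteBatchedStatements affects plain JDBC batching in the same way, since the MyBatis BATCH executor uses the JDBC batch API under the hood. Below is a minimal standalone sketch (credentials, batch size, and row values are assumptions; it needs the MySQL Connector/J driver on the classpath), not the article's own test code:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchDemo {
    public static void main(String[] args) throws Exception {
        // with rewriteBatchedStatements=true the driver rewrites the queued rows
        // into multi-row INSERTs instead of sending them one by one
        String url = "jdbc:mysql://127.0.0.1:3306/test"
                + "?useSSL=false&serverTimezone=UTC&rewriteBatchedStatements=true";
        try (Connection conn = DriverManager.getConnection(url, "root", "password")) {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO `user` (`id`, `age`) VALUES (?, ?)")) {
                for (int i = 1; i <= 100000; i++) {
                    ps.setInt(1, i);
                    ps.setInt(2, 18 + i % 50);
                    ps.addBatch();              // queue the row
                    if (i % 1000 == 0) {
                        ps.executeBatch();      // flush every 1000 rows
                    }
                }
                ps.executeBatch();              // flush the remainder
                conn.commit();
            }
        }
    }
}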

After fixing the problem, we reran all the tests; the final results are as follows:

Insertion mode (time in ms; w = 10,000 rows) | 10  | 100  | 1000 | 1w    | 10w    | 100w                                            | 1000w
for loop insert                              | 387 | 1150 | 7907 | 70026 | 635984 | too long...                                     | too long...
Splicing sql insert                          | 308 | 320  | 392  | 838   | 3156   | 24948 (probably exceeding the sql length limit) | OutOfMemoryError: heap memory overflow
Batch processing (key)                       | 333 | 323  | 362  | 636   | 1638   | 8978                                            | OutOfMemoryError: heap memory overflow
Batch + batch submission                     | 359 | 313  | 394  | 630   | 2907   | 18631                                           | OutOfMemoryError: heap memory overflow

From the above results, batch processing is indeed much faster. When the order of magnitude gets too large, it eventually hits a heap memory overflow. Batch processing with chunked commits is not faster than plain batch processing; it is roughly the same and even slightly slower, because it commits too many times. The spliced sql approach is not far behind batch processing when the data volume is small. The worst option is the for-loop insert, which is genuinely slow: it already needs about 1 s for just 100 records, so it should not be chosen.

At first, when I found that batch processing was slow, I really doubted myself. Later, when I discovered the missing parameter, it felt like the clouds suddenly cleared. The more I learn, the more I realize I don't know.
