Analysis of distributed DBLE LOAD DATA function implementation

Author: Lu Lu Those who love technology and are willing to share are mainly engaged in the research of database related technology. Source: original contribution *Aikesheng is produced by the open source community. The original content cannot be used without authorization. Please contact the editor for reprint and indicate the source.

1. overview

This article mainly introduces the realization of large-scale data import function of DBLE LOAD DATA, including scheme design and source code interpretation.

Let's explore how DBLE realizes this function!

2. Scheme design

LOAD DATA provides MySQL with the syntax of importing data from text file to table. As a database middleware, it also needs to implement corresponding functions to meet the user's import data requirements.

DBLE's implementation of this function directly simulates the corresponding processing protocol of MySQL for LOAD DATA command. Of course, as a database middleware, it also needs to deal with the logic of data storage, data routing and interaction with the back-end mysql.

The following figure shows the overall process of DBLE processing LOAD DATA:

3. Source code interpretation

There are two classes related to the implementation of LOAD DATA function of DBLE: ServerLoadDataInfileHandler class and LoadDataUtil class. ServerLoadDataInfileHandler class mainly deals with the logic of interaction with the client, while LoadDataUtil class mainly deals with the logic of interaction with MySQL at the back.

Next, we will send commands from the client to DBLE for processing, and finally to the process of DBLE interacting with the backend MySQL to see the corresponding code in detail.

When the client sends the LOAD DATA import data to the table command, DBLE as the server will receive the corresponding command and process it. The corresponding code is in the serverqueryhandler ා query method. Here, it will judge that the SQL type is LOAD DATA, and then further processing:

public void query(String sql) {
        ServerConnection c = this.source;
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug(String.valueOf(c) + sql);
        }
         ......
        int rs = ServerParse.parse(sql);
        boolean isWithHint = ServerParse.startWithHint(sql);
        int sqlType = rs & 0xff;
        ......
        switch (sqlType) {
        ......
            case ServerParse.LOAD_DATA_INFILE_SQL:
                    //For the processing of LOAD DATA, call the frontendconnection ා loaddatainfilestart method
                    c.loadDataInfileStart(sql);
                    break;
      ......
      }
  }

Continue to look at the frontendconnection? Loaddatainfilestart method:

	public void loadDataInfileStart(String sql) {
        if (loadDataInfileHandler != null) {
            try {
                //The serverloaddatainfilehandlerාstart method is further called
                loadDataInfileHandler.start(sql);
            } catch (Exception e) {
                LOGGER.info("load data error", e);
                writeErrMessage(ErrorCode.ERR_HANDLE_DATA, e.getMessage());
            }

        } else {
            writeErrMessage(ErrorCode.ER_UNKNOWN_COM_ERROR, "load data infile sql is not  unsupported!");
        }
    }

Next, enter the serverloaddatainfilehandler ා start method. As mentioned above, this class mainly deals with the interaction logic between DBLE and client.

This method is relatively long. You can go to have a closer look. The main function is to parse the SQL statement sent by the client, and then analyze it directly according to the LOAD DATA syntax. If the imported file is a local file, otherwise, the command to get the file will be sent to the client to transfer the file to the client:

public void start(String strSql) {
        ......
        parseLoadDataPram();
        //If the file is not local, send the command to the client to request the data file. The local may be confusing here, but the MySQL syntax does stipulate that the usage of load data local is not the local usage of the file
        if (statement.isLocal()) {
            isStartLoadData = true;
            //request file from client
            ByteBuffer buffer = serverConnection.allocate();
            RequestFilePacket filePacket = new RequestFilePacket();
            filePacket.setFileName(fileName.getBytes());
            filePacket.setPacketId(1);
            filePacket.write(buffer, serverConnection, true);
        } else {
            //If the file is local, first judge whether the file exists, if not, error will be reported. If it exists, read the file, calculate the routing result of each line, and then store the data of different nodes separately
            if (!new File(fileName).exists()) {
                String msg = fileName + " is not found!";
                clear();
                serverConnection.writeErrMessage(ErrorCode.ER_FILE_NOT_FOUND, msg);
            } else {
                if (parseFileByLine(fileName, loadData.getCharset(), loadData.getLineTerminatedBy())) {
                    RouteResultset rrs = buildResultSet(routeResultMap);
                    if (rrs != null) {
                        flushDataToFile();
                        isStartLoadData = false;
                        serverConnection.getSession2().execute(rrs);
                    }
                }
            }
        }
    }

After DBLE sends the command to the client, the client will send the data file continuously. The processing logic of the sent file is in the serverloaddatainfilehandler ා handle method, which is to dump the transferred file. If the default data is less than 200Mb, it will be stored in memory. Otherwise, it will be stored in the local file:

public void handle(byte[] data) {
        try {
            if (sql == null) {
                clear();
                serverConnection.writeErrMessage(ErrorCode.ER_UNKNOWN_COM_ERROR, "Unknown command");
                return;
            }
            BinaryPacket packet = new BinaryPacket();
            ByteArrayInputStream inputStream = new ByteArrayInputStream(data, 0, data.length);
            packet.read(inputStream);
            //Here is the dump of the sent file
            saveByteOrToFile(packet.getData(), false);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

After the file is sent, the client will send an empty package to tell DBLE that the data has been sent, and then DBLE will carry out the next processing (in fact, this is the MySQL protocol). The next processing logic is in the serverloaddatainfilehandler ා end method.

This method is also relatively long. The main processing logic is to further calculate the route of the received files. According to the calculation results, the files are stored separately according to different nodes. Finally, the routing result set is built. The DBLE sends the LOAD DATA command to different MySQL nodes at the back end

public void end(byte packId) {
        isStartLoadData = false;
        this.packID = packId;
        //empty packet for end
        saveByteOrToFile(null, true);

        if (isHasStoreToFile) {
            //This is to calculate the route and store the data files of different nodes according to the route results
            parseFileByLine(tempFile, loadData.getCharset(), loadData.getLineTerminatedBy());
        }
        ......
        //Build routing result set, issue back-end MySQL and execute LOAD DATA command
        RouteResultset rrs = buildResultSet(routeResultMap);
        if (rrs != null) {
            flushDataToFile();
            serverConnection.getSession2().execute(rrs);
        }
}

The interaction logic between DBLE and back-end MySQL is basically the same as that between client and DBLE, because they are all based on MySQL protocol. What DBLE needs to do is to send data files of different nodes to back-end mysql. The specific logic is in the loaddatautil ᦇ requestfiledataresponse method, which is to send DBLE The processed data file is sent to the MySQL on the back end, and MySQL is used for real data storage:

public static void requestFileDataResponse(byte[] data, BackendConnection conn) {
        byte packId = data[3];
        MySQLConnection c = (MySQLConnection) conn;
        RouteResultsetNode rrn = (RouteResultsetNode) conn.getAttachment();
        LoadData loadData = rrn.getLoadData();
        List<String> loadDataData = loadData.getData();

        BufferedInputStream in = null;
        try {
            //If the data is small and in memory, it will be sent directly
            if (loadDataData != null && loadDataData.size() > 0) {
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                for (String loadDataDataLine : loadDataData) {
                    String s = loadDataDataLine + loadData.getLineTerminatedBy();
                    byte[] bytes = s.getBytes(CharsetUtil.getJavaCharset(loadData.getCharset()));
                    bos.write(bytes);
                }
                packId = writeToBackConnection(packId, new ByteArrayInputStream(bos.toByteArray()), c);
            } else {
                //Otherwise, read the file first, and then send the data
                in = new BufferedInputStream(new FileInputStream(loadData.getFileName()));
                packId = writeToBackConnection(packId, in, c);
            }
        } 
     ......
    }

Here, the whole DBLE process of LOAD DATA is finished.

4. summary

This article mainly analyzes and explains DBLE's implementation of LOAD DATA function, including scheme design and source code interpretation. I hope you can have a further understanding of the whole LOAD DATA function after reading.

Keywords: Database MySQL SQL less

Added by Superian on Thu, 12 Mar 2020 11:42:09 +0200