Sqlite3 insert optimization summary

background

Recently, the main application scenario for the function of persisting vector data to Spatialite database is to request vector data in a certain range from db in real time, obtain its information (point set, style and field), present it on the map, or carry out vector information processing related businesses through the RTree based spatial index capability provided by Spatialite.

Why Spatialite:

  1. In essence, it is a lightweight text database based on Sqlite, which provides spatial indexing capability on it;
  2. Local text database, easy to transplant;
  3. RTree's spatial index algorithm meets the needs of most geographic data applications based on spatial requests;
  4. Fit GIS field: providing API is very friendly for having PostGIS database experience.

This chapter focuses on the optimization of the insertion efficiency of N vector data, which is essentially the optimization of Sqlite3 insertion efficiency (any information about vectors and spatialites is only the support of the application environment and can be ignored temporarily).

1, Pre preparation

Database table design

(combined with a simple application environment verification, because it is combined with the development in the field of GIS, the table design is related to the expression of vector information)
A table named [shape] is provided, and the field design is as follows:
That is, the following data will insert the structure corresponding to the above fields.

Data import

(because the data is obtained from *. shp file, GDAL-OGR method is used to analyze vector data, but it is not the focus of this chapter, just to explain the data source)
Import file: [block. shp] - 44.4MB number of vectors: 266311, i.e. number of inserts: 266311.

  1. OGR parses ogrfeatures as vector data;
  2. Randomly generate a self increasing number as UUID and write it into the UUID field;
  3. The fields puuid, type, createtime and modifytime are ignored;
  4. OGRFeature gets the name and writes it to the name field
  5. The point set of OGRFeature is converted to wkb form and written into geom field.

(OGR reading, point set to Blob data, no time is consumed after the test, which can be ignored)

Basic import process

Load -> TravelShpFile -> Unload

  • Load: load *. Through OGR shp file; Open the specified db file (sqlite3_open_v2) and associate spatialite;
  • TravelShpFile: insert each solved vector data. Later, we will compare different implementations of this function;
  • Unload: destroy the Handler of the open file and close the Handler of the db operation (sqlite3_close).

About sql insert operation of Sqlite3

sqlite3 provides the following two ways to insert sql:

  • sqlite3_ The exec () function directly calls the sql statement string. Each time the function is executed, it actually encapsulates the process of "lexical analysis" and "syntax analysis";
  • The "execution preparation" function provided by sqlite3 compiles sql statements into a language that the system can understand in advance, and then executes them step by step, which is equivalent to a detailed separation of "lexical analysis" and "syntax analysis".
    preparer -> [reset -> bind -> step] x n -> finalize

2, Data insertion

The comparison practice of Sqlite3 insertion optimization is divided into the following three cases:

1. Do not start the transaction, encapsulate [prepare - > Reset - > bind - > step - > finalize] in a function, and call n times

Because the table field has blob field, SQLite3 is not used directly_ Exec () call to analyze;
The above operations can be understood as that the function is encapsulated into sqlite3_exec();
A call method is not recommended for batch operation. It is encapsulated here for testing data.

  • The encapsulation functions are as follows:
bool CShapeGeometryDB::InsertShape(const StShapeInfo& stShapeInfo, unsigned char* pBlob, int nBlobSize)
{
	std::string strSql = "INSERT INTO 'main'.'shape'('uuid', 'puuid', 'type', 'name', 'createtime', 'modifytime', 'geom') VALUES(?, ?, ?, ?, ?, ?, ?)";
	if (NULL == m_pDB)
	{
		return false;
	}

	sqlite3_stmt* stmt;
	int ret = sqlite3_prepare_v2(m_pDB, strSql.data(), strSql.size(), &stmt, NULL);
	if (ret != SQLITE_OK)
	{
		std::cout << "INSERT prepare Error! " << sqlite3_errmsg(m_pDB) << std::endl;
		return false;
	}

	sqlite3_bind_text(stmt, 1, stShapeInfo.m_strUUID.data(), stShapeInfo.m_strUUID.size(), NULL);
	sqlite3_bind_text(stmt, 2, stShapeInfo.m_strPUUID.data(), stShapeInfo.m_strPUUID.size(), NULL);
	sqlite3_bind_int(stmt, 3, stShapeInfo.m_nType);
	sqlite3_bind_text(stmt, 4, stShapeInfo.m_strName.data(), stShapeInfo.m_strName.size(), NULL);
	sqlite3_bind_text(stmt, 5, NULL, 0, NULL);
	sqlite3_bind_text(stmt, 6, NULL, 0, NULL);
	sqlite3_bind_blob(stmt, 7, pBlob, nBlobSize, free);

	bool bOk = true;
	ret = sqlite3_step(stmt);
	if (ret != SQLITE_DONE && ret != SQLITE_ROW)
	{
		std::cout << "InsertShape sqlite3_step Error! " << sqlite3_errmsg(m_pDB) << std::endl;
		bOk = false;
	}

	sqlite3_finalize(stmt);
	return bOk;
}
  • Call logic: N x InsertShape(xxx);

Total time:
4985046ms (about 83.0841 minutes)

2. After the transaction is started, execute the same function as above:

  • Transaction related code:
bool CShapeGeometryDB::Transaction()
{
        if (NULL == m_pDB)
        {
                return false;
        }

        char* err_msg = NULL;
        std::string strSql = "BEGIN";	// Open transaction
        int ret = sqlite3_exec(m_pDB, strSql.data(), NULL, NULL, &err_msg);
        if (ret != SQLITE_OK)
        {
                if (NULL != err_msg)
                {
                        std::cout << "BEGIN Error! " << err_msg << std::endl;
                        sqlite3_free(err_msg);
                }
                return false;
        }

        return true;
}

bool CShapeGeometryDB::Commit()
{
        if (NULL == m_pDB)
        {
                return false;
        }

        char* err_msg = NULL;
        std::string strSql = "COMMIT";	// Commit transaction
        int ret = sqlite3_exec(m_pDB, strSql.data(), NULL, NULL, &err_msg);
        if (ret != SQLITE_OK)
        {
                if (NULL != err_msg)
                {
                        std::cout << "COMMIT Error! " << err_msg << std::endl;
                        sqlite3_free(err_msg);
                }
                return false;
        }

        return true;
}

bool CShapeGeometryDB::RollBack()
{
        if (NULL == m_pDB)
        {
                return false;
        }

        char* err_msg = NULL;
        std::string strSql = "ROLLBACK";	// Rollback transaction
        int ret = sqlite3_exec(m_pDB, strSql.data(), NULL, NULL, &err_msg);
        if (ret != SQLITE_OK)
        {
                if (NULL != err_msg)
                {
                        std::cout << "COMMIT Error! " << err_msg << std::endl;
                        sqlite3_free(err_msg);
                }
                return false;
        }

        return true;
}

  • Call logic: transaction() - > n x insertshape (xxx) - > commit()

Total time:
66234ms (about 1.1039 minutes)

3. After the transaction is started, use the "execution preparation" method

Using SQLite3_ The member variable of stmt * is used to string the execution of this batch operation;
Begin and End mark the beginning and End of this batch. They are also the "execution preparation" provided by sqlite3, a complete closed loop.

  • Code related:
Member variable, initialization list: NULL
sqlite3_stmt* m_stmt;
bool CShapeGeometryDB::BeginForBatch()
{
	Transaction();
	return true;
}

bool CShapeGeometryDB::InsertForBatch(const StShapeInfo& stShapeInfo, unsigned char* pBlob, int nBlobSize)
{
	// reset
	if (NULL != m_stmt)
	{
		sqlite3_reset(m_stmt);
		sqlite3_clear_bindings(m_stmt);
	}

	// prepare
	if (NULL == m_stmt)
	{
		std::string strSql = "INSERT INTO 'main'.'shape'('uuid', 'puuid', 'type', 'name', 'createtime', 'modifytime', 'geom') VALUES(?, ?, ?, ?, ?, ?, ?)";
		int ret = sqlite3_prepare_v2(m_pDB, strSql.data(), strSql.size(), &m_stmt, NULL);
		if (ret != SQLITE_OK)
		{
			std::cout << "INSERT prepare Error! " << sqlite3_errmsg(m_pDB) << std::endl;
			return false;
		}
	}

	// bind
	sqlite3_bind_text(m_stmt, 1, stShapeInfo.m_strUUID.data(), stShapeInfo.m_strUUID.size(), NULL);
	sqlite3_bind_text(m_stmt, 2, stShapeInfo.m_strPUUID.data(), stShapeInfo.m_strPUUID.size(), NULL);
	sqlite3_bind_int(m_stmt, 3, stShapeInfo.m_nType);
	sqlite3_bind_text(m_stmt, 4, stShapeInfo.m_strName.data(), stShapeInfo.m_strName.size(), NULL);
	sqlite3_bind_text(m_stmt, 5, NULL, 0, NULL);
	sqlite3_bind_text(m_stmt, 6, NULL, 0, NULL);
	sqlite3_bind_blob(m_stmt, 7, pBlob, nBlobSize, free);

	// step
	bool bOk = true;
	int ret = sqlite3_step(m_stmt);
	if (ret != SQLITE_DONE && ret != SQLITE_ROW)
	{
		std::cout << "InsertShape sqlite3_step Error! " << sqlite3_errmsg(m_pDB) << std::endl;
		bOk = false;
	}

	return bOk;
}

bool CShapeGeometryDB::EndForBatch()
{
	if (NULL == m_stmt)
	{
		return false;
	}

	// finalize
	sqlite3_finalize(m_stmt);
	m_stmt = NULL;

	Commit();

	return true;
}
  • Call logic: beginforbatch() - > n x insertforbatch (xxx) - > endforbatch()

Total time:
33640ms

3, Comparative summary

Method 1: do not open the transaction, encapsulate [prepare - > Reset - > bind - > step - > finalize] in a function, and call n times.
Method 2: after the transaction is started, execute the same function as above.
Method 3: after the transaction is started, use the "execution preparation" method.

To sum up: there is a big difference between opening a transaction and not opening a transaction. After all, frequent opening and closing of I/O is the least recommended. Opening a transaction is equivalent to batch writing after one opening and then one closing.
"Execution preparation" is also a little faster than the ordinary exec() operation, because it is a more detailed API provided by sqlite3 to support "preparation - > execution x n - > compilation".

Bigger try

In method 3, import a 1.02GB shp file, that is, 6994448 pieces of data are inserted:
Total time:
1583000ms (26.38 minutes)
Statistics of every 50000 insertions: the range of 10125ms-15265ms fluctuates, and there is no law of gradual increase

other

For Sqlite insertion optimization, there are other configuration methods from para, and some mechanisms are changed to support it, such as:
synchronous mode selection of Sqlite3

bool BuildParamaSynchronous()
{
	std::string strSql = "PRAGMA synchronous = OFF";
	return SQLITE_OK == sqlite3_exec(m_pHandle, strSql.data(), NULL, NULL, NULL);
}

There are three modes:
PRAGMA synchronous = FULL; (2)
PRAGMA synchronous = NORMAL; (1)
PRAGMA synchronous = OFF; (0)
Refer to: https://blog.csdn.net/chinaclock/article/details/48622243

Keywords: Database SQLite gis

Added by alirezaok on Thu, 23 Dec 2021 15:54:16 +0200