Explore the application feasibility and practical research of TDengine in the project of picture code joint investigation

[this article is participating in the prize solicitation of "embracing open source | Taosi data TDengine]

https://marketing.csdn.net/p/0ada836ca30caa924b9baae0fd33857c

catalogue

Picture code project introduction and problems

TDengine exploration feasibility study

Installation and use of TDengine

Source installation

docker usage

Main operation

JAVA client connection

Performance considerations

taosdump use

taosimport

taosdemo

Shortcomings and conclusions

Picture code project introduction and problems

The image code joint detection system is based on the in-depth application of video surveillance technology and combined with the important mobile device feature code IMSI in security. Even without the restriction of access to identity card, telephone number and other features, it can still obtain the matching relationship between each other through the collision algorithm between IMSI and portrait, and establish a "one person, one file" archive. Technically, the relationship between "image" and "code" is combined, and the correlation between them is established according to the collision algorithm; In the application scenario, in view of the current situation of difficult system docking and difficult data exchange in reality, compared with the large platform, it has the advantage of simple deployment. Through the establishment of the archives, it can quickly help the public security locate the identity of the observation object and greatly assist the public security technical investigation and case solving.

In big data processing, the architecture of mycat + mysql (active and standby) is adopted.

The advantage is to make full use of the technical advantages of the mature database, which can seamlessly replace the traditional non big data persistence layer and give programmers an imperceptible replacement; The threshold of this architecture is also relatively low. It only needs 8G8 cores to start and run normally. Unlike hadoop,spark and other big data architectures, it requires five high-performance hosts to ensure the basic operation.

Disadvantages: architects need to have strong professional knowledge, do a good job in the structure and deployment of this area, and related monitoring. For example, setting of primary and standby, load balancing, capacity expansion or replacement, data migration, etc; On the mass, it is limited by mycat1 Due to the limitation of X's 1 billion data, timely dynamic query will have performance problems.

 

If there is a technology that can replace mycat+mysql without users considering the details of load balancing, capacity management and capacity constraints, it will reduce the burden on the product architecture and focus more on their own business.

At this time, I saw the introduction of Taosi technology TDengine on CSDN.

TDengine exploration feasibility study

Let's first introduce the product features of TDengine by Taosi Technology:

TDengine is a big data platform designed and optimized by Taosi data for the Internet of things, Internet of vehicles, industrial Internet, IT operation and maintenance, etc. In addition to the core time series database function that is more than 10 times faster, IT also provides functions such as caching, data subscription and streaming computing to minimize the complexity of R & D, operation and maintenance, and all core codes, including cluster functions, are open source (open source protocol, AGPL v3.0).

More than 10x performance improvement. An innovative data storage structure is defined. A single core can process at least 20000 requests per second, insert millions of data points and read more than 10 million data points, which is more than ten times faster than the existing general database.
Reduce hardware or cloud service costs to 1 / 5. Due to its super performance, the computing resources are less than 1 / 5 of the general big data scheme; Through column storage and advanced compression algorithm, the storage space is less than 1 / 10 of the general database.
Full stack sequential data processing engine. Integrating database, message queue, cache, streaming computing and other functions, the application does not need to integrate Kafka/Redis/HBase/Spark and other software, greatly reducing the cost of application development and maintenance.
Powerful analysis function. Whether it is data ten years ago or one second ago, you can query it within a specified time range. Data can be aggregated on the timeline or on multiple devices. Ad hoc queries can be performed at any time through Shell/Python/R/Matlab.
Seamless connection with third-party tools. It can be integrated with telegraf, grafana, EMQ x, Prometheus, MATLAB and R without one line of code. In the future, MQTT, OPC, Hadoop, Spark, etc. will be supported, and BI tools will be seamlessly connected.
Zero operation and maintenance cost and zero learning cost. The installation and cluster can be completed in one second, and there is no need to separate databases and tables, and real-time backup. Standard SQL, JDBC,RESTful, python / Java / C / C + + / go / node JS, similar to MySQL, has zero learning cost

Let's take a look at its applicable scenarios. For details, please refer to the link: https://www.taosdata.com/cn/documentation/evaluation#scenes

One of the words attracted me. "TDengine is an innovative big data processing product launched in the face of the fast-growing internet of things big data market and technical challenges." in this way, its positioning well matches the data characteristics of our image code joint investigation project, which is similar to the timing big data of IOT.

But see the following sentence:

”TDengine can significantly reduce the total cost of ownership of typical Internet of things, Internet of vehicles and industrial Internet big data platforms. However, it should be pointed out that because it makes full use of the characteristics of time series data of the Internet of things, it can not be used to process general data such as web crawler, microblog, wechat, e-commerce, ERP, CRM and so on. “

I think there will be big problems, because a system should not only have big data, but also deal with ordinary businesses such as CRM. Is it because another data source is defined to deal with these?

however, such doubts do not prevent me from continuing to explore TDengine.

Installation and use of TDengine

Source installation

You can refer to the record link of my previous experiment: https://blog.csdn.net/dualvencsdn/article/details/115202910

CentOS 7: 
sudo yum install -y gcc gcc-c++ make cmake git
 install OpenJDK 8: 

sudo yum install -y java-1.8.0-openjdk
 install Apache Maven: 

sudo yum install -y maven

git clone https://github.com/taosdata/TDengine.git
cd TDengine

mkdir debug && cd debug
cmake .. && cmake --build .

stay debug catalogue 

make install 

/usr/bin/cmake -P cmake_install.cmake
-- Install configuration: "Debug"
make install script: /home/dualven/TDengine/packaging/tools/make_install.sh
this is centos system
source directory: /home/dualven/TDengine
binary directory: /home/dualven/TDengine/debug
Start to install TDEngine...
 
TDengine is installed successfully!
 
To configure TDengine : edit /etc/taos/taos.cfg
To start TDengine     : ./taosd
To access TDengine    : use taos in shell
 
TDengine is installed successfully!

docker usage

Create container
docker run --name taos192 -p6030-6042:6030-6042/tcp -p6030-6042:6030-6042/udp -d tdengine/tdengine

Firewall release
 firewall-cmd --zone=public --add-port=6030-6042/tcp --permanent

firewall-cmd --zone=public --add-port=6030-6042/udp --permanent

FQDN configuration
This is very important. Why should I configure it? Refer to my previous experimental records:

https://blog.csdn.net/dualvencsdn/article/details/118019565.

Is to let the client find it. However, Taos is sometimes used in its own DB operations The fqdn in CFG, so it's best to use its fqdn when accessing the remote client. If you use ip to access it first, but the internal program will report an error when using fqdn.

DB error: Unable to resolve FQDN (0.011791s)

Therefore, we mapped the hosts file on the 192 host: 10.60.100.192 289461ef56c4

On the client file, also map the hosts file 10.60.100.192 289461ef56c4,

Pit: remember that the firewall can only release TCP and UDP on those ports.

Main operation

View configuration: docker exec -it taos192 taosd -C

[root@taos192 TDengine-server-2.1.2.0]# docker exec -it taos192 taosd -C
taos global config:
==================================
 serverPort:             6030 
 arbitrator:              
 numOfMnodes:            3 
 vnodeBak:               1 
 telemetryReporting:     1 
 balance:                1 
 balanceInterval:        300 
 maxTmrCtrl:             512 
 offlineThreshold:       864000(s)
 statusInterval:         1(s)
 minSlidingTime:         10(ms)
 minIntervalTime:        10(ms)
 maxStreamCompDelay:     20000(ms)
 maxFirstStreamCompDelay:10000(ms)
 retryStreamCompDelay:   10000(ms)
 streamCompDelayRatio:   0.100000 
 maxVgroupsPerDb:        0 
 maxTablesPerVnode:      1000000 
 minTablesPerVnode:      1000 
 tableIncStepPerVnode:   1000 
 cache:                  16(Mb)
 blocks:                 6 
 days:                   10 
 keep:                   3650 
 minRows:                100 
 maxRows:                4096 
 comp:                   2 
 walLevel:               1 
 fsync:                  3000 
 replica:                1 
 partitions:             4 
 quorum:                 1 
 update:                 0 
 compressMsgSize:        -1 
 maxSQLLength:           1048576(byte)
 maxNumOfOrderedRes:     100000 
 queryBufferSize:        -1(byte)
 retrieveBlockingModel:  0 
 keepColumnName:         0 
 timezone:                
 locale:                  
 charset:                 
 maxShellConns:          50000 
 maxConnections:         5000 
 minimalLogDirGB:        1.000000(GB)
 minimalTmpDirGB:        1.000000(GB)
 minimalDataDirGB:       2.000000(GB)
 mnodeEqualVnodeNum:     4 
 flowctrl:               1 
 slaveQuery:             1 
 adjustMaster:           1 
 http:                   1 
 mqtt:                   0 
 monitor:                1 
 stream:                 1 
 telegrafUseFieldNum:    0 
 gitinfo:                2019939bcc5567212d6e07af557c2c4ea540c091 
 gitinfoOfInternal:      NULL 
 buildinfo:              Built at 2021-06-07 14:27 
 version:                2.1.2.0 
 
taos local config:
==================================
 firstEp:                 
 secondEp:                
 fqdn:                    
 configDir:              /etc/taos 
 logDir:                 /var/log/taos 
 scriptDir:              /etc/taos 
 numOfThreadsPerCore:    1.000000 
 numOfCommitThreads:     4 
 ratioOfQueryCores:      1.000000 
 role:                   0 
 monitorInterval:        30(s)
 rpcTimer:               300(ms)
 rpcForceTcp:            0 
 rpcMaxTime:             600(s)
 shellActivityTimer:     3(s)
 httpEnableRecordSql:    0 
 httpMaxThreads:         2 
 restfulRowLimit:        10240 
 numOfLogLines:          10000000 
 logKeepDays:            0 
 asyncLog:               1 
 debugFlag:              0 
 mDebugFlag:             131 
 dDebugFlag:             135 
 sDebugFlag:             135 
 wDebugFlag:             135 
 sdbDebugFlag:           131 
 rpcDebugFlag:           131 
 tmrDebugFlag:           131 
 cDebugFlag:             131 
 jniDebugFlag:           131 
 odbcDebugFlag:          131 
 uDebugFlag:             131 
 httpDebugFlag:          131 
 mqttDebugFlag:          131 
 monDebugFlag:           131 
 qDebugFlag:             131 
 vDebugFlag:             135 
 tsdbDebugFlag:          131 
 cqDebugFlag:            131 
 enableRecordSql:        0 
 enableCoreFile:         0 
 maxBinaryDisplayWidth:  30 
 tempDir:                /tmp/ 

To create a database, create a table:

 taos -s "CREATE DATABASE pdas KEEP 365 DAYS 10 BLOCKS 4 UPDATE 1;"

 taos -d pdas -f onetable.txt

 
CREATE STABLE dw_reportimsi( \
  ts timestamp,\
  id bigint, \
  device_name nchar(60),\
  equip_id nchar(60), \
   imsi nchar(60),\
   imei nchar(60),\
   source_tac bigint, \
   cell_no bigint,\
   mcc bigint,\
   mnc bigint, \
   equip_longitude nchar(60),\
   equip_latitude nchar(60), \
   belong nchar(60),\
   address nchar(60),\
   rssi nchar(60),\
   phone_num nchar(60)  \
)tags(location binary(64),  groupdId int);

For others, please refer to my previous records: https://blog.csdn.net/dualvencsdn/article/details/118025718

JAVA client connection

As the main developer of java, it's a bit of a pit that there is no English document. I have been reading English at first, but I didn't find it. Later, there is this chapter in the Chinese document

https://www.taosdata.com/cn/documentation/connector/java

But the most complete code demo is actually in the docker project, which is a surprise: tdengine/tdengine

I sorted it out. It can be integrated into a project to meet the calls of various forms of java. You can confide in me if you need it.

1)connection pool

(2) mybatis plus

(3) jdbc

(4) mybatis 

Performance considerations

In order to test the performance, I tried to use some import tools.

taosdump use

https://github.com/taosdata/TDengine

https://www.taosdata.com/blog/2020/03/09/1334.html

taosimport

According to the official R & D instructions, it is not very useful anymore. After a day of research, I contacted the official customer service to know. It's a little pit. Also misled by official documents: https://www.taosdata.com/blog/2020/01/18/1166.html.

howerver, or record it
https://github.com/taosdata/TDengine/tree/develop/importSampleData

bin/taosimport -cfg config/cfg.toml -db pdas  -cases pdas  -hnum 1 -vnum 10 -port 6030 -user root -password taosdata -host 10.60.100.192

It's really not easy to use. For example, the taos client does not need to enter the default account password and host. It must be entered.

The error report is not obvious. No problem can be found.

2021/06/22 18:00:13 main.go:404: create table error: invalid operation: illegal value or data overflow


taosdemo

After tossing around, I found that I can experience the data reading and writing speed of taos through taosdemo. Please refer to:

https://www.taosdata.com/cn/documentation/getting-started

Shortcomings and conclusions

Because at the beginning, I didn't have a deep understanding

"TDengine can significantly reduce the total cost of ownership of typical Internet of things, Internet of vehicles and industrial Internet big data platforms. However, it should be pointed out that because it makes full use of the characteristics of time series data of the Internet of things, it can not be used to process general data such as web crawler, microblog, wechat, e-commerce, ERP, CRM and so on. "The meaning of this sentence is to actively establish and test some common tables in the experiment, so as to deeply experience some of its defects:

Record of official conversation

Conclusions: (1) delete is not supported, and it will be supported later. (2) update is completed by inserting the same timestamp

(3) For the usage mode of intermediate database, frequent deletion and update of millions of single tables are not supported temporarily.

Small T:
Modify the data that can currently use the same timestamp update
 
Small T:
delete I just haven't been able to support it yet
 
Small T:
This is us 3.0 Version plan
 
Small T:
There will be some in the future
 
 
Small T:
actually
 
Small T:
The data of the table and the time range of the database are approximately close
 
Small T:
This is limited
 
Small T:
Your table can't insert data of any time period at will
 
Small T:
So they were unified keep influence
 
Small T:
For example, keep 3650 day
 
Small T:
You cannot insert data before 3650 days
 
 
Small T:
The other is mainly at present mysql The efficiency of deleting and modifying millions of indexed data in a single table is relatively low - now TDengine For data updates, only the same timestamp can be overwritten. Deletion can only be automatically deleted through the time period (it will be supported in the future) delete). As for the low efficiency you mentioned, we have to wait until our functions are completed
 
Small T:
As a special big data engine, it will be more efficient than mysql Tall
 

Architecture recommendations:

(1) For ordinary business tables, it can be similar to MySQL. Mongo allows users to use ordinary tables without specifying the characteristics of time series database. Do I have to establish a set of database access logic for some of the most common table businesses in a project? Of course, at this stage, if you use TDengine, it seems that you can only use this scheme.

(2) For the 6030-6042 TCP/UDP ports in this article, I think the TDengine should adopt the blocking application access mode (I don't read the source code, but speculate). It is recommended to refer to the architecture of netty and the selector mode, so there should be no need for so many ports.

[this article is participating in the prize solicitation of "embracing open source | Taosi data TDengine]

https://marketing.csdn.net/p/0ada836ca30caa924b9baae0fd33857c

Keywords: Big Data AI tdengine

Added by manojhooda on Sat, 22 Jan 2022 06:50:52 +0200