ClickHouse version: 22.1.2.2
1. MySQL engine (not recommended)
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] ENGINE = MySQL('host:port', ['database' | database], 'user', 'password')
Engine parameters:
- host:port - MySQL service address
- database - MySQL database name
- user - MySQL user name
- password - MySQL user password
There is not much to complain about with this engine. The MySQL engine maps the tables of a remote MySQL server into ClickHouse. The MySQL database engine translates each query into MySQL syntax and sends it to the MySQL server.
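For instance, a minimal sketch; the host, credentials, database name shop, and table name orders are all placeholders:
-- Map a remote MySQL database into ClickHouse; no data is copied,
-- every query is forwarded to the MySQL server
CREATE DATABASE mysql_db ENGINE = MySQL('mysql-host:3306', 'shop', 'user', 'password');
-- This SELECT is translated to MySQL syntax and executed remotely
SELECT count() FROM mysql_db.orders;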
2. MaterializedMySQL (not recommended)
This engine creates a ClickHouse database that contains all the tables of a MySQL database and all the data in those tables. The ClickHouse server acts as a MySQL replica: it reads the binlog and executes the DDL and DML queries.
CREATE DATABASE [IF NOT EXISTS] db_name [ON CLUSTER cluster] ENGINE = MaterializedMySQL('host:port', ['database' | database], 'user', 'password') [SETTINGS ...]
Engine parameters:
- host:port - MySQL service address
- database - MySQL database name
- user - MySQL user name
- password - MySQL user password
Engine configuration
- max_rows_in_buffer - the maximum number of rows that can be cached in memory (for a single table; cached data cannot be queried). When this number of rows is exceeded, the data is materialized. Default: 65505
- max_bytes_in_buffer - the maximum number of bytes that can be cached in memory (for a single table; cached data cannot be queried). When this number of bytes is exceeded, the data is materialized. Default: 1048576
- max_rows_in_buffers - the maximum number of rows that can be cached in memory (for the whole database; cached data cannot be queried). When this number of rows is exceeded, the data is materialized. Default: 65505
- max_bytes_in_buffers - the maximum number of bytes that can be cached in memory (for the whole database; cached data cannot be queried). When this number of bytes is exceeded, the data is materialized. Default: 1048576
- max_flush_data_time - the maximum number of milliseconds that data may stay cached in memory (for the whole database; cached data cannot be queried). When this time is exceeded, the data is materialized. Default: 1000
- max_wait_time_when_mysql_unavailable - retry interval (ms) when MySQL is unavailable. Negative values disable retrying. Default: 1000
- allows_query_when_mysql_lost - allow querying materialized tables when the MySQL connection is lost. Default: 0 (false)
e.g.:
CREATE DATABASE mysql ENGINE = MaterializedMySQL('localhost:3306', 'db', 'user', '***') SETTINGS allows_query_when_mysql_lost=true, max_wait_time_when_mysql_unavailable=10000;
Usage steps:
1. Modify the MySQL configuration and restart:
gtid_mode=ON
enforce_gtid_consistency=1
binlog_format=ROW
log_bin=mysql-bin
expire_logs_days=7
max_binlog_size=256m
binlog_cache_size=4m
max_binlog_cache_size=512m
server-id=1
2. Run the following commands:
set allow_experimental_database_materialized_mysql = 1;
CREATE DATABASE test ENGINE = MaterializedMySQL('ip:port', 'mysql database', 'mysql user', 'mysql password');
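You can then verify the replica with ordinary queries; a minimal sketch, where the table name t1 is hypothetical:
-- Tables replicated from MySQL should be listed here
SHOW TABLES FROM test;
-- Reads are served from the materialized local copy
SELECT count() FROM test.t1;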
Issues:
1. This engine greatly simplifies importing MySQL data into ClickHouse, but it is officially still experimental and should not be used in production.
2. In my tests, efficiency is still not high with large data volumes; it is only somewhat better than MySQL itself.
3. There are many limitations when using it in a cluster.
3. mysql table function
ClickHouse allows you to run SELECT and INSERT queries on data stored on a remote MySQL server.
mysql('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);
Parameters:
- host:port - MySQL server address
- database - remote database name
- table - remote table name
- user - MySQL user
- password - user password
- replace_query - flag that converts an INSERT INTO query into REPLACE INTO. If replace_query=1, the query is replaced. Default: 0
- on_duplicate_clause - adds an ON DUPLICATE KEY on_duplicate_clause expression to the INSERT query. It can only be specified with replace_query = 0; if you set both replace_query = 1 and on_duplicate_clause, ClickHouse throws an exception. See the example after this list.
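A minimal sketch of both directions, assuming a reachable MySQL server; the host, schema shop, table orders, and its columns id/status are placeholders:
-- Read from the remote MySQL table
SELECT * FROM mysql('mysql-host:3306', 'shop', 'orders', 'user', 'password') LIMIT 10;
-- Write back to MySQL; the trailing 1 sets replace_query, so the INSERT
-- becomes REPLACE INTO (the MySQL table needs a primary or unique key)
INSERT INTO TABLE FUNCTION mysql('mysql-host:3306', 'shop', 'orders', 'user', 'password', 1)
VALUES (1, 'paid');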
Import all MySQL data into ClickHouse:
1. Create a ClickHouse table
2. Execute the command:
INSERT INTO table SELECT * FROM mysql('ip:port', 'database', 'table', 'user', 'password') [where]
Notes:
1. If * is used directly here, the ClickHouse column order must match the MySQL column order, otherwise there will be problems. If only some columns are needed, specify them manually (see the sketch after these notes).
2. You can add query conditions here to import only the data you need.
3. Duplicate data may appear in ClickHouse, even with ReplacingMergeTree.
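A minimal sketch of the whole import; the table and column names are hypothetical:
-- 1. Target table in ClickHouse; the engine and ORDER BY are illustrative
CREATE TABLE orders_local (id UInt64, status String, updated_at DateTime)
ENGINE = MergeTree ORDER BY id;
-- 2. Name the columns explicitly so the order does not depend on MySQL,
--    and use WHERE to import only part of the data
INSERT INTO orders_local (id, status, updated_at)
SELECT id, status, updated_at
FROM mysql('ip:port', 'database', 'orders', 'user', 'password')
WHERE updated_at >= '2022-01-01 00:00:00';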
4. Canal synchronization
1. Prepare the jar packages (to avoid pitfalls): https://mvnrepository.com/
clickhouse-jdbc-0.2.6.jar, httpclient-4.5.13.jar, httpcore-4.4.13.jar, lz4-1.5.0.jar, lz4-java-1.8.0.jar, druid-1.2.8.jar (this version must not be too low, otherwise the Druid connection pool does not support ClickHouse)
2. Install canal-server with Docker
docker run --name canal-server -p 11111:11111 -d canal/canal-server:latest
3. Enter the container
Copy /home/admin/canal-server/conf/example to a new directory named after the database instance you want to create; example2 is used here
4. Modify the instance.properties configuration under example2:
# MySQL address
canal.instance.master.address=127.0.0.1:3306
# MySQL account
canal.instance.dbUsername=root
canal.instance.dbPassword=root
# Filtering rules (database example2, all tables)
canal.instance.filter.regex=example2\\..*
5. Modify the /home/admin/canal-server/conf/canal.properties configuration:
canal.serverMode = tcp
canal.destinations = example2
6. Install canal-adapter (you can also use Kafka)
docker run --name canal-adapter -p 8081:8081 -d slpcat/canal-adapter:v1.1.5
7. Modify canal-adapter's /opt/canal-adapter/conf/application.yml configuration file (inside the container):
canal.conf:
  # Note the mode here
  mode: tcp
  flatMessage: true
  zookeeperHosts:
  syncBatchSize: 1000
  retries: 0
  timeout:
  accessKey:
  secretKey:
  consumerProperties:
    # Note the address of the canal-server here
    canal.tcp.server.host: 127.0.0.1:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
  srcDataSources:
    # Note the MySQL connection here
    defaultDS:
      url: jdbc:mysql://127.0.0.1:3306/example2?useUnicode=true
      username: root
      password: root
  canalAdapters:
    - instance: example2
      groups:
        - groupId: g1
          outerAdapters:
            - name: logger
            - name: rdb
              key: mysql1
              # ClickHouse configuration
              properties:
                jdbc.driverClassName: ru.yandex.clickhouse.ClickHouseDriver
                jdbc.url: jdbc:clickhouse://127.0.0.1:8123
                jdbc.username: default
                jdbc.password: default
8. Modify mytest_user.yml under /opt/canal-adapter/conf/rdb (outerAdapterKey must match the key defined in application.yml, so mysql1 here):
dataSourceKey: defaultDS
destination: example2
groupId: g1
outerAdapterKey: mysql1
concurrent: true
dbMapping:
  mirrorDb: true
  database: example2
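With mirrorDb: true the adapter writes into a ClickHouse database of the same name, and the adapter is not expected to create ClickHouse tables itself, so the database and matching tables should exist beforehand. A hedged sketch with a hypothetical table t1:
CREATE DATABASE IF NOT EXISTS example2;
-- Columns must mirror the MySQL table; engine and ORDER BY are illustrative
CREATE TABLE example2.t1 (id UInt64, name String) ENGINE = MergeTree ORDER BY id;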
9. Restart canal-server and canal-adapter
Insert data into MySQL and check whether it is synchronized; if not, check the logs to locate the problem (see the example below)
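For example, with the hypothetical table t1 from above:
-- On the MySQL side:
INSERT INTO example2.t1 (id, name) VALUES (1, 'test');
-- On the ClickHouse side, the row should show up shortly afterwards:
SELECT * FROM example2.t1 WHERE id = 1;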
Attention
1. The following MySQL configuration must be enabled:
gtid_mode=ON
enforce_gtid_consistency=1
binlog_format=ROW
log_bin=mysql-bin
expire_logs_days=7
max_binlog_size=256m
binlog_cache_size=4m
max_binlog_cache_size=512m
server-id=1