Doris partition cache

Use cases

Most data analysis workloads write little and read a lot: data is written once and read many times. For example, the dimensions and metrics of a report are computed once in the early morning, but the page is then accessed hundreds or even thousands of times a day, so the result set is well suited to caching. In data analysis and BI applications, the following business scenarios are common:

High concurrency scenarios: Doris supports high concurrency well, but a single server still cannot carry a very high QPS on its own

Complex chart dashboards: complex Dashboards or large-screen applications whose data comes from multiple tables, with dozens of queries per page; although each query takes only tens of milliseconds, the overall page takes seconds to load

Trend analysis: queries over a given date range with metrics displayed by day, such as the trend of the number of users over the last 7 days; these queries cover a large amount of data and a wide range, and often take tens of seconds

Repeated queries: if the product has no mechanism to prevent repeated refreshes, users may refresh the page repeatedly by mistake or out of impatience, submitting a large number of duplicate SQL statements

For the above four scenarios, a common application-layer solution is to put the query results into Redis and refresh the cache periodically or on user demand.

However, this scheme has the following problems:

Data inconsistency: the cache cannot perceive data updates, so users often see stale data

Low hit rate: the entire query result is cached as one unit, so if data is written in real time the cache is invalidated frequently, the hit rate stays low, and the system load remains heavy

Extra cost: introducing an external cache component adds system complexity and additional cost

Solution

The partition cache strategy solves the above problems: it prioritizes data consistency and refines the cache granularity to improve the hit rate. It has the following characteristics:

Users do not need to worry about data consistency; cache invalidation is controlled by version, and the cached data is consistent with the data queried from BE

No additional components or cost: cache results are stored in BE memory, and users can adjust the cache memory size as needed

Two cache strategies are implemented, SQLCache and PartitionCache; the latter has finer cache granularity

Consistent hashing is used to handle BE nodes going online and offline; the cache algorithm inside BE is an improved LRU

SQLCache

SQLCache stores and retrieves cached results based on the SQL signature, the partition IDs of the queried tables, and the latest versions of those partitions. The combination of the three identifies one cached data set. If any of them changes, for example the SQL text changes, the queried fields or conditions differ, or the data version changes after an update, the cache is not hit.

If the query joins multiple tables, the partition ID and version number of the most recently updated table are used; if any of the tables is updated, the partition ID or version number changes and the cache is not hit.

SQLCache is best suited to the T+1 update scenario: the data is updated in the early morning, the result of the first query is fetched from BE and put into the cache, and subsequent identical queries are served from the cache. It can also be used with data that is updated in real time, but the hit rate may be low; see PartitionCache below.
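
As a concrete illustration (using the testdb.appevent table that appears later in this document; the two statements themselves are only hypothetical examples), the queries below scan the same partition but have different SQL text, so each one gets its own SQLCache entry:

SELECT eventdate, count(userid) FROM testdb.appevent WHERE eventdate = "2020-03-08" GROUP BY eventdate;
SELECT eventdate, count(DISTINCT userid) FROM testdb.appevent WHERE eventdate = "2020-03-08" GROUP BY eventdate;

Re-running either statement verbatim hits its own cache entry until the partition version of the table changes.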

PartitionCache

Design principle

A SQL query can be split in parallel: Q = Q1 ∪ Q2 ∪ ... ∪ Qn and R = R1 ∪ R2 ∪ ... ∪ Rn, where Q is the query statement and R is the result set

Partitions are split into read-only partitions and updatable partitions; read-only partitions are cached, updatable partitions are not

Take the earlier example of querying the number of users per day in the last 7 days. With partitioning by date, data is written only into the partition for the current day, and the data of all other partitions is fixed. Under the same query SQL, the metrics of the partitions that are not being updated are fixed. For the 7-day user count queried on 2020-03-09: the data from 2020-03-03 to 2020-03-07 comes from the cache; 2020-03-08 comes from the partition on the first query and from the cache afterwards; 2020-03-09 always comes from the partition, because data is continuously written on that day.

Therefore, to query N days of data when only the last D days are being updated, queries that differ only in the date range need to read just D partitions; the rest comes from the cache. This effectively reduces cluster load and query time.
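
The queries below assume a table partitioned by date. The original does not show its DDL, so the following CREATE TABLE is only a minimal sketch of what testdb.appevent could look like (column types, partition layout, and properties are assumptions):

CREATE TABLE testdb.appevent (
    eventdate DATE NOT NULL COMMENT "event date, also the partition column",
    userid BIGINT NOT NULL COMMENT "user id"
)
DUPLICATE KEY(eventdate, userid)
PARTITION BY RANGE(eventdate) (
    -- one partition per day; only the current day's partition keeps receiving writes
    PARTITION p20200303 VALUES LESS THAN ("2020-03-04"),
    PARTITION p20200304 VALUES LESS THAN ("2020-03-05"),
    PARTITION p20200305 VALUES LESS THAN ("2020-03-06"),
    PARTITION p20200306 VALUES LESS THAN ("2020-03-07"),
    PARTITION p20200307 VALUES LESS THAN ("2020-03-08"),
    PARTITION p20200308 VALUES LESS THAN ("2020-03-09"),
    PARTITION p20200309 VALUES LESS THAN ("2020-03-10")
)
DISTRIBUTED BY HASH(userid) BUCKETS 10
PROPERTIES ("replication_num" = "1");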

MySQL [(none)]> SELECT eventdate,count(userid) FROM testdb.appevent WHERE eventdate>="2020-03-03" AND eventdate<="2020-03-09" GROUP BY eventdate ORDER BY eventdate;
+------------+-----------------+
| eventdate  | count(`userid`) |
+------------+-----------------+
| 2020-03-03 |              15 |
| 2020-03-04 |              20 |
| 2020-03-05 |              25 |
| 2020-03-06 |              30 |
| 2020-03-07 |              35 |
| 2020-03-08 |              40 | //The first time from the partition, and then from the cache
| 2020-03-09 |              25 | //From partition
+------------+-----------------+
7 rows in set (0.02 sec)

In PartitionCache, the first-level key of the cache is the 128-bit MD5 signature of the SQL with the partition conditions removed. The rewritten SQL to be signed looks like this:

SELECT eventdate,count(userid) FROM testdb.appevent GROUP BY eventdate ORDER BY eventdate;

The second-level key is the value of the partition field in the query result set, for example the values of the eventdate column in the result above. Attached to the second-level key are the partition version number and the version update time.
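
After the first execution on 2020-03-09 shown below, the cache entries for this query can be pictured roughly as follows (this is only an illustrative layout, not the internal storage format):

First-level key : MD5 signature of "SELECT eventdate,count(userid) FROM testdb.appevent GROUP BY eventdate ORDER BY eventdate;"
Second-level keys: "2020-03-03" ... "2020-03-08", one per partition-field value, each holding that date's cached row plus the partition version number and version update time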

The following demonstrates the first execution of the above SQL on 2020-03-09:

Data fetched from the cache:

+------------+-----------------+
| 2020-03-03 |              15 |
| 2020-03-04 |              20 |
| 2020-03-05 |              25 |
| 2020-03-06 |              30 |
| 2020-03-07 |              35 |
+------------+-----------------+

SQL issued to BE and the data it returns:

SELECT eventdate,count(userid) FROM testdb.appevent WHERE eventdate>="2020-03-08" AND eventdate<="2020-03-09" GROUP BY eventdate ORDER BY eventdate;

+------------+-----------------+
| 2020-03-08 |              40 |
| 2020-03-09 |              25 |
+------------+-----------------+
Final data sent to the client:
+------------+-----------------+
| eventdate  | count(`userid`) |
+------------+-----------------+
| 2020-03-03 |              15 |
| 2020-03-04 |              20 |
| 2020-03-05 |              25 |
| 2020-03-06 |              30 |
| 2020-03-07 |              35 |
| 2020-03-08 |              40 |
| 2020-03-09 |              25 |
+------------+-----------------+
Data written to the cache:
+------------+-----------------+
| 2020-03-08 |              40 |
+------------+-----------------+

PartitionCache is suitable for tables partitioned by date, where only some partitions are updated in real time and the query SQL is relatively fixed.

The partition field can also be another field, but you need to ensure that only a small number of partitions are being updated.

Some restrictions

Only OlapTable is supported; other tables, such as external MySQL tables, have no version information and cannot detect whether the data has been updated
Only grouping by the partition field is supported, not by other fields, because data grouped by other fields may still be updated by new writes and would invalidate the cache (see the example after this list)
Only result sets whose cached portion sits at the beginning, at the end, or covers the whole range are supported; the cached data must not split the result set into several separate pieces
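
For example, with the appevent table above (the city column in the second statement is purely hypothetical and only illustrates the restriction on the grouping field):

-- Groups by the partition column eventdate: eligible for PartitionCache
SELECT eventdate, count(userid) FROM testdb.appevent GROUP BY eventdate ORDER BY eventdate;

-- Groups by a non-partition column: PartitionCache does not apply
SELECT city, count(userid) FROM testdb.appevent GROUP BY city;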

Usage

Enable SQLCache

Make sure cache_enable_sql_mode=true is set in fe.conf (the default is true):
vim fe/conf/fe.conf
cache_enable_sql_mode=true

Set the variable on the MySQL command line:

MySQL [(none)]> set [global] enable_sql_cache=true;

Note: with GLOBAL the variable is set globally rather than only for the current session

Enable PartitionCache

Make sure cache_enable_partition_mode=true is set in fe.conf (the default is true):
vim fe/conf/fe.conf
cache_enable_partition_mode=true

Set the variable on the MySQL command line:

MySQL [(none)]> set [global] enable_partition_cache=true;
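
To confirm the variables after setting them, you can query them over the same MySQL connection (standard SHOW VARIABLES syntax; the exact output depends on your Doris version):

MySQL [(none)]> SHOW VARIABLES LIKE 'enable_sql_cache';
MySQL [(none)]> SHOW VARIABLES LIKE 'enable_partition_cache';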

If both cache policies are enabled at the same time, pay attention to the following parameter:

cache_last_version_interval_second=900
If the interval between the latest partition version and now is greater than cache_last_version_interval_second, the whole query result is cached (SQLCache); if it is smaller and the query meets the PartitionCache conditions, the query is handled by PartitionCache. For example, with the default of 900 seconds, a partition last updated half an hour ago is served from the whole-result cache, while a partition written two minutes ago falls through to PartitionCache if the query qualifies.

Monitoring

FE monitoring items:

query_table            //Number of tables in Query
query_olap_table       //Number of Olap tables in Query
cache_mode_sql         //Identify the number of queries whose cache mode is sql
cache_hit_sql          //The number of queries with sql mode hitting the Cache
query_mode_partition   //Identify the number of queries whose cache mode is Partition
cache_hit_partition    //Number of queries hit through Partition
partition_all          //All partitions scanned in Query
partition_hit          //Number of partitions hit by Cache

Cache hit rate     = (cache_hit_sql + cache_hit_partition) / query_olap_table
Partition hit rate = partition_hit / partition_all
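
As a quick sanity check with hypothetical numbers, if cache_hit_sql = 300, cache_hit_partition = 200, and query_olap_table = 1000, then:

Cache hit rate = (300 + 200) / 1000 = 50%
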
BE Monitoring items:

query_cache_memory_total_byte       //Cache memory size
query_query_cache_sql_total_count   //Number of SQL in Cache
query_cache_partition_total_count   //Number of Cache partitions

SQL Average data size       = cache_memory_total / cache_sql_total
Partition Average data size = cache_memory_total / cache_partition_total

Other monitoring: in Grafana you can view CPU and memory metrics of the BE nodes as well as query percentiles and other query statistics, and tune the cache parameters accordingly to meet your business goals.

Optimization parameters

The FE configuration item cache_result_max_row_count sets the maximum number of rows a query result set may contain to be placed in the cache. It can be adjusted to the actual workload, but it is recommended not to set it too large, to avoid occupying too much memory; result sets exceeding this size are not cached.

vim fe/conf/fe.conf
cache_result_max_row_count=3000
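
Depending on your Doris version, FE configuration items marked as mutable can also be changed at runtime from the MySQL client instead of editing fe.conf and restarting FE. Whether cache_result_max_row_count is runtime-mutable in your version should be verified, so treat the following only as a sketch:

MySQL [(none)]> ADMIN SET FRONTEND CONFIG ("cache_result_max_row_count" = "3000");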

The BE configuration item cache_max_partition_count is the maximum number of partitions cached per SQL statement. With partitioning by date, this allows caching data for more than 2 years. If you want to keep the cache longer, set this parameter larger and adjust cache_result_max_row_count accordingly.

vim be/conf/be.conf
cache_max_partition_count=1024

The cache memory in BE is controlled by two parameters, query_cache_max_size_mb and query_cache_elasticity_size_mb (unit: MB). When memory usage exceeds query_cache_max_size_mb + query_cache_elasticity_size_mb, cleanup starts and memory is brought back below query_cache_max_size_mb. These two parameters can be set according to the number of BE nodes, the node memory size, and the cache hit rate.

query_cache_max_size_mb=256
query_cache_elasticity_size_mb=128

Calculation method:

If 10K queries are cached, each query caches 1,000 rows, each row is 128 bytes, and the cache is distributed over 10 BEs, then each BE needs about 128 MB of memory (10K * 1000 * 128 / 10).

Unsupported cases

Can PartitionCache be used for T+1 data? Not currently supported
For similar SQL where two metrics were queried before and three metrics are queried now, can the cache of the two metrics be reused? Not currently supported
Partitioned by date, but the data needs to be aggregated by week: can PartitionCache be used? Not currently supported
