Introduction and usage of Apache Doris dynamic partition

1. Introduction

In some usage scenarios, the user will partition the table by day and perform routine tasks regularly every day. At this time, the user needs to manually manage the partition. Otherwise, the data import may fail because the user does not create a partition, which brings additional maintenance costs to the user.

Through the dynamic partition function, users can set the rules of dynamic partition when creating tables. FE will start a background thread to create or delete partitions according to the rules specified by the user. Users can also make changes to existing rules at run time

Dynamic partitioning is a new feature introduced in Doris version 0.12. It aims to implement life cycle management (TTL) for table level partitions and reduce the burden of users.

At present, the functions of dynamically adding partitions and dynamically deleting partitions are realized.

Since version 0.15.0, it supports list partition and dynamic creation of history partition.

2. Usage

The rules of dynamic partitioning can be specified during table creation or modified at run time. Currently, only dynamic partition rules can be set for partition tables with single partition columns.

2.1 specify when creating a table

Syntax:

CREATE TABLE tbl1
(...)
PROPERTIES
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

Example:

CREATE TABLE user_log_1
(
    user_id VARCHAR(20),
    ts datetime,
    item_id VARCHAR(30),
    category_id VARCHAR(30),
    behavior VARCHAR(30)
)
DUPLICATE KEY(user_id, ts)
PARTITION BY RANGE(ts) (
    PARTITION P_20210926 VALUES [('2021-09-26'), ('2021-09-27')),
    PARTITION P_20210927 VALUES [('2021-09-27'), ('2021-09-28')),
)
DISTRIBUTED BY HASH(user_id)
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);

2.2 runtime modification

Syntax:

ALTER TABLE tbl1 SET
(
    "dynamic_partition.prop1" = "value1",
    "dynamic_partition.prop2" = "value2",
    ...
)

Example:

ALTER TABLE tbl1 user_log_1
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-7",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
)

2.3 dynamic partition rule parameters

The rule parameters of dynamic partition are in dynamic_partition. Prefix:

  • dynamic_partition.enable

    Whether to turn on the dynamic partition feature. Can be specified as TRUE or FALSE. If it is not filled in, the default value is TRUE. If FALSE, Doris will ignore the dynamic partition rules of the table.

  • dynamic_partition.time_unit

    Dynamic partition scheduling unit. Can be specified as HOUR, DAY, WEEK, MONTH. It means to create or delete partitions by DAY, WEEK and MONTH respectively.

    When HOUR is specified, the suffix format of the dynamically created partition name is yyyyMMddHH, such as 2020032501. Partition column data type in hours cannot be DATE.

    When DAY is specified, the suffix format of the dynamically created partition name is yyyyMMdd, such as 20200325.

    When WEEK is specified, the suffix format of the dynamically created partition name is yyyy_ww. That is, the current date belongs to the WEEK of the year. For example, the suffix of the partition name created on March 25, 2020 is 2020_ 13, indicating that it is the 13th WEEK of 2020.

    When MONTH is specified, the suffix format of the dynamically created partition name is yyyyMM, such as 202003.

  • dynamic_partition.time_zone

    The time zone of the dynamic partition. If it is not filled in, it defaults to the time zone of the system of the current machine, such as Asia/Shanghai. If you want to obtain the currently supported time zone settings, you can refer to https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.

  • dynamic_partition.start

    The starting offset of the dynamic partition, which is a negative number. According to time_ Different from the unit attribute, based on the current day (week / month), the partition range before this offset will be deleted. If it is not filled in, it defaults to - 2147483648, that is, the historical partition will not be deleted.

  • dynamic_partition.end

    The end offset of the dynamic partition, which is a positive number. According to time_ Different from the unit attribute, the partition of the corresponding range is created in advance based on the current day (week / month).

  • dynamic_partition.prefix

    Dynamically created partition name prefix.

  • dynamic_partition.buckets

    The number of buckets corresponding to the dynamically created partition.

  • dynamic_partition.replication_num

    The number of replicas corresponding to the dynamically created partition. If it is not filled in, it defaults to the number of replicas specified when the table is created.

  • dynamic_partition.start_day_of_week

    When time_ When unit is WEEK, this parameter is used to specify the starting point of each WEEK. Values are 1 to 7. Where 1 means Monday and 7 means Sunday. The default value is 1, which means that Monday is the starting point of each WEEK.

  • dynamic_partition.start_day_of_month

    When time_ When unit is MONTH, this parameter is used to specify the start date of each MONTH. Values are 1 to 28. Where 1 represents the 1st day of each MONTH and 28 represents the 28th day of each MONTH. The default value is 1, which means that the starting point of each MONTH is position 1. Starting dates 29, 30 and 31 are not supported temporarily to avoid ambiguity caused by leap year or leap MONTH.

  • dynamic_partition.create_history_partition

    The default is false. When set to true, Doris will automatically create all partitions. See the specific creation rules below. At the same time, the parameter max of FE_ dynamic_ partition_ Num limits the total number of partitions to avoid creating too many partitions at once. When the number of partitions expected to be created is greater than max_ dynamic_ partition_ When the value of num is, the operation is prohibited.

    This parameter does not take effect when the start attribute is not specified.

  • dynamic_partition.history_partition_num

    When create_ history_ When partition is true, this parameter is used to specify the number of historical partitions created. The default value is - 1, which is not set.

  • dynamic_partition.hot_partition_num

    Specify the latest number of partitions as hot partitions. For a hot partition, the system automatically sets its storage_ The medium parameter is SSD and storage is set_ cooldown_ time.

    hot_partition_num is all partitions in the previous n days and in the future

    We give examples. Suppose today is 2021-05-20, partition by day, and the attribute of dynamic partition is set to: hot_partition_num=2, end=3, start=-3. The system will automatically create the following partitions and set storage_ Media and storage_cooldown_time

    Parameters:

    p20210517: ["2021-05-17", "2021-05-18") storage_medium=HDD storage_cooldown_time=9999-12-31 23:59:59
    p20210518: ["2021-05-18", "2021-05-19") storage_medium=HDD storage_cooldown_time=9999-12-31 23:59:59
    p20210519: ["2021-05-19", "2021-05-20") storage_medium=SSD storage_cooldown_time=2021-05-21 00:00:00
    p20210520: ["2021-05-20", "2021-05-21") storage_medium=SSD storage_cooldown_time=2021-05-22 00:00:00
    p20210521: ["2021-05-21", "2021-05-22") storage_medium=SSD storage_cooldown_time=2021-05-23 00:00:00
    p20210522: ["2021-05-22", "2021-05-23") storage_medium=SSD storage_cooldown_time=2021-05-24 00:00:00
    p20210523: ["2021-05-23", "2021-05-24") storage_medium=SSD storage_cooldown_time=2021-05-25 00:00:00
    

2.3.1 create history partition rules

Doris 0.14.0 and earlier versions do not support the creation of historical partitions, but Baidu released 0.14.12 version supports it, and the community version will not support it until 0.15 release.

When create_history_partition is true, that is, when the function of creating historical partitions is enabled, Doris will create historical partitions according to dynamic_partition.start and dynamic_partition.history_partition_num to determine the number of history partitions created.

Suppose the number of historical partitions to be created is expect_create_partition_num, according to different settings, the specific quantity is as follows:

  1. create_history_partition = true
    • dynamic_partition.history_partition_num is not set, i.e. - 1
      expect_create_partition_num = end - start;
    • dynamic_partition.history_partition_num set
      expect_create_partition_num = end - max(start, -histoty_partition_num);
  2. create_history_partition = false
    The history partition will not be created, expect_create_partition_num = end - 0;

When expect_create_partition_num greater than max_ dynamic_ partition_ When num (default 500), it is prohibited to create too many partitions.

For example:

  1. Suppose today is September 27, 2021. Partition by day. The attribute of dynamic partition is set to create_history_partition=true, end=3, start=-3, history_partition_num=1, the system will automatically create the following partitions:

    p20210926
    p20210927
    p20210928
    p20210930
    p20211001
    

Create table statement:

CREATE TABLE user_log_1
(
    user_id VARCHAR(20),
    ts datetime,
    item_id VARCHAR(30),
    category_id VARCHAR(30),
    behavior VARCHAR(30)
)
DUPLICATE KEY(user_id, ts)
PARTITION BY RANGE(ts) (
    PARTITION P_20210926 VALUES [('2021-09-26'), ('2021-09-27')),
    PARTITION P_20210927 VALUES [('2021-09-27'), ('2021-09-28')),
)
DISTRIBUTED BY HASH(user_id)
PROPERTIES
(
    "dynamic_partition.enable" = "true",
    "dynamic_partition.create_history_partition" = "true",
    "dynamic_partition.history_partition_num" = "1",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.start" = "-3",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "32"
);
  1. history_partition_num=-1, that is, the number of historical partitions is not set, and other attributes remain the same as those in 1. The system will automatically create the following partitions:

    p20210924
    p20210925
    p20210926
    p20210927
    p20210928
    p20210930
    p20211001
    

2.3.2 precautions

During the use of dynamic partition, if dynamic partition is caused by some unexpected circumstances_ Partition.start and dynamic_ If some partitions between partition.end are lost, the current time and dynamic_ The missing partition between partition.end will be recreated, dynamic_ Missing partitions between partition.start and the current time will not be recreated.

2.4 viewing dynamic partition table scheduling

The following commands can be used to further view the scheduling of all dynamic partition tables in the current database

SHOW DYNAMIC PARTITION TABLES;

  • LastUpdateTime: the time when the dynamic partition property was last modified
  • LastSchedulerTime: the last time dynamic partition scheduling was executed
  • State: the state of the last execution of dynamic partition scheduling
  • LastCreatePartitionMsg: error message of the last execution of dynamic add partition scheduling
  • LastDropPartitionMsg: error message of the last execution of dynamic deletion partition scheduling

3. Advanced operation

3.1 dynamic_partition_enable

Whether to enable the dynamic partition function of Doris. The default value is false, that is, off. This parameter only affects the partitioning operation of dynamic partitioned tables, not ordinary tables. It can take effect by modifying the parameters in fe.conf and restarting FE. You can also execute the following command to take effect at run time:

MySQL protocol:

ADMIN SET FRONTEND CONFIG ("dynamic_partition_enable" = "true")

HTTP protocol:

curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_enable=true

To turn off dynamic partitions globally, set this parameter to false.

3.2 dynamic_partition_check_interval_seconds

The execution frequency of the dynamic partition thread is 600 (10 minutes) by default, that is, it is scheduled every 10 minutes. It can take effect by modifying the parameters in fe.conf and restarting FE. You can also perform the following command modifications at run time:

MySQL protocol:

ADMIN SET FRONTEND CONFIG ("dynamic_partition_check_interval_seconds" = "7200")

HTTP protocol:

curl --location-trusted -u username:password -XGET http://fe_host:fe_http_port/api/_set_config?dynamic_partition_check_interval_seconds=432000

3.3 conversion between dynamic partition table and manual partition table

For a table, dynamic partition and manual partition can be converted freely, but they cannot exist at the same time, and there is only one state.

3.4 conversion from manual partition to dynamic partition

If a table is not specified with a dynamic partition when it is created, it can be converted to a dynamic partition by modifying the relevant properties of the dynamic partition at run time through ALTER TABLE. Specific examples can be viewed through HELP ALTER TABLE.

After the dynamic partition function is enabled, Doris will no longer allow users to manage partitions manually, but will automatically manage partitions according to the dynamic partition attributes.

Note: if dynamic is set_ Partition.start, the historical partition whose partition range is before the dynamic partition start offset will be deleted.

3.5 conversion from dynamic partition to manual partition

By executing ALTER TABLE tbl_name SET ("dynamic_partition.enable" = "false") to turn off the dynamic partition function and convert it into a manual partition table.

After the dynamic partition function is turned off, Doris will no longer automatically manage partitions, and users need to manually create or delete partitions through ALTER TABLE.

Keywords: Big Data Apache Hadoop Doris

Added by lost305 on Tue, 28 Sep 2021 08:33:08 +0300