insert overwrite implementation of Doris partition bucket table

insert overwrite implementation of Doris partition bucket table

(1) Real time project
If it is real-time, we should pay attention to timeliness. Directly select the unique key model table and select the non repeatable columns as unique keys

A simple example of creating a single partition table

CREATE TABLE order_tab
(
    orderid     BIGINT,
    username    VARCHAR(32),
    amount      BIGINT DEFAULT '0'
)
UNIQUE KEY(orderid,username)
DISTRIBUTED BY HASH(orderid) BUCKETS 10
PROPERTIES("replication_num" = "1");

Insert data after

insert into order_tab values(1,'Zhang San',100);

View table data

Insert another one. If the key is different, it will not be overwritten

insert into order_tab values(1,'Li Si',200);

When viewing table data, there will be two

Insert another one. If the key is the same, it will be overwritten

insert into order_tab values(1,'Zhang San',200);

View results

This completes the implementation of insert overwrite

(2) Offline project [T+1]
If it is an offline project, it does not pursue timeliness. If the UNIQUE KEY model table is selected, doris will compare a large amount of data every time it is imported. It is not friendly to doris maintenance and is not recommended. Therefore, the DUPLICATE KEY model table is selected

Create a table using DUPLICATE KEY model, and then use broker load for daily scheduling
Examples of table creation are as follows:

CREATE TABLE `usertouch_all_kpi_detail_user_day_snap` (
  `mid` BIGINT(20) NOT NULL COMMENT "activity id",
  `active_id` INT(11) NOT NULL COMMENT "activity id",
  `plan_id` INT(11) NOT NULL COMMENT "activity id",
  `wait_send_time` DATE NOT NULL COMMENT "Touch time",
  `par_date` DATE NOT NULL COMMENT "Partition field",
  `touch_flag` INT(11) NULL COMMENT "Touch mark",
  `succ_flag` INT(11) NULL COMMENT "Send success id",
  `target_flag` INT(11) NULL COMMENT "Target identification",
  `click_flag` INT(11) NULL COMMENT "Click logo",
  `plan_flag` INT(11) NULL COMMENT "Plan enrollment logo",
  `xiao_flag` INT(11) NULL COMMENT "Small class conversion logo",
  `xuda_flag` INT(11) NULL COMMENT "Major renewal logo",
  `gmv_flag` INT(11) NULL COMMENT "Continuation of major courses gmv(plan_id granularity)"
) ENGINE = OLAP 
DUPLICATE KEY(MID,active_id,plan_id,wait_send_time,par_date) COMMENT "List of indicators" 
PARTITION BY RANGE(par_date)() 
DISTRIBUTED BY HASH(mid) BUCKETS 10 PROPERTIES (
  "replication_num" = "3",
  "dynamic_partition.enable" = "true", 
  "dynamic_partition.time_unit" = "DAY",
  "dynamic_partition.start" = "-7", 
  "dynamic_partition.end" = "3", 
  "dynamic_partition.prefix" = "p", 
  "dynamic_partition.replication_num" = "3", 
  "dynamic_partition.buckets" = "10"
);

First, there should be a broker load scheduling script (refer to the detailed explanation of broker load writing doris written by me), and then update the partition data of the previous day

Idea 1: delete the partition of the previous day, then create a new partition for the table of the previous day, import the data, and update the data through operation, that is, complete insert overwrite

The operation is as follows:
Because I build a dynamic partition bucket table, I need to turn off the configuration of dynamic partitions before operating partitions, otherwise an error will be reported

First, turn off dynamic partitioning

ALTER TABLE usertouch_all_kpi_detail_user_day_snap SET ("dynamic_partition.enable" = "false")

Delete previous day's partition

ALTER TABLE usertouch_all_kpi_detail_user_day_snap DROP PARTITION p20211213;

New partition of the previous day

ALTER TABLE usertouch_all_kpi_detail_user_day_snap ADD PARTITION p20211213 VALUES [('2021-12-13'), ('2021-12-14'));

Query the partition of the previous day. The partition is empty

Then rerun the broker load script

show load 

Completed

View partition data, which is the updated table data

be careful!!! Finally, the dynamic partition configuration must be turned on, or the subsequent scheduling will report an error (no partition)

ALTER TABLE usertouch_all_kpi_detail_user_day_snap SET ("dynamic_partition.enable" = "true")

Summary: this completes the implementation of insert overwrite, but the data cannot be accessed during the deletion and reconstruction of the partition the previous day

Idea 2: create a new temporary partition, write the updated data to the temporary partition, and finally use the temporary partition to replace the partition of the previous day
The specific operations are as follows
New temporary partition
Turn off dynamic partitions

ALTER TABLE usertouch_all_kpi_detail_user_day_snap ADD TEMPORARY PARTITION tp20211213 VALUES [("2021-12-13"), ("2021-12-14"));

Use the break load script to write data to the temporary partition

After that, replace the previous day's formal partition of the table with a temporary partition

ALTER TABLE usertouch_all_kpi_detail_user_day_snap REPLACE PARTITION (p20211213) WITH TEMPORARY PARTITION (tp20211213);

Summary: during the operation of insert overwrite, the data can still be accessed

Idea 3: of course, you can also directly use the UNIQUE KEY model table, which makes the requirements for the configuration and maintenance of doris cluster more stringent

Keywords: Data Warehouse OLAP Doris

Added by snipesh0tz on Tue, 14 Dec 2021 10:25:54 +0200