insert overwrite implementation of Doris partition bucket table
(1) Real time project
If it is real-time, we should pay attention to timeliness. Directly select the unique key model table and select the non repeatable columns as unique keys
A simple example of creating a single partition table
CREATE TABLE order_tab ( orderid BIGINT, username VARCHAR(32), amount BIGINT DEFAULT '0' ) UNIQUE KEY(orderid,username) DISTRIBUTED BY HASH(orderid) BUCKETS 10 PROPERTIES("replication_num" = "1");
Insert data after
insert into order_tab values(1,'Zhang San',100);
View table data
Insert another one. If the key is different, it will not be overwritten
insert into order_tab values(1,'Li Si',200);
When viewing table data, there will be two
Insert another one. If the key is the same, it will be overwritten
insert into order_tab values(1,'Zhang San',200);
View results
This completes the implementation of insert overwrite
(2) Offline project [T+1]
If it is an offline project, it does not pursue timeliness. If the UNIQUE KEY model table is selected, doris will compare a large amount of data every time it is imported. It is not friendly to doris maintenance and is not recommended. Therefore, the DUPLICATE KEY model table is selected
Create a table using DUPLICATE KEY model, and then use broker load for daily scheduling
Examples of table creation are as follows:
CREATE TABLE `usertouch_all_kpi_detail_user_day_snap` ( `mid` BIGINT(20) NOT NULL COMMENT "activity id", `active_id` INT(11) NOT NULL COMMENT "activity id", `plan_id` INT(11) NOT NULL COMMENT "activity id", `wait_send_time` DATE NOT NULL COMMENT "Touch time", `par_date` DATE NOT NULL COMMENT "Partition field", `touch_flag` INT(11) NULL COMMENT "Touch mark", `succ_flag` INT(11) NULL COMMENT "Send success id", `target_flag` INT(11) NULL COMMENT "Target identification", `click_flag` INT(11) NULL COMMENT "Click logo", `plan_flag` INT(11) NULL COMMENT "Plan enrollment logo", `xiao_flag` INT(11) NULL COMMENT "Small class conversion logo", `xuda_flag` INT(11) NULL COMMENT "Major renewal logo", `gmv_flag` INT(11) NULL COMMENT "Continuation of major courses gmv(plan_id granularity)" ) ENGINE = OLAP DUPLICATE KEY(MID,active_id,plan_id,wait_send_time,par_date) COMMENT "List of indicators" PARTITION BY RANGE(par_date)() DISTRIBUTED BY HASH(mid) BUCKETS 10 PROPERTIES ( "replication_num" = "3", "dynamic_partition.enable" = "true", "dynamic_partition.time_unit" = "DAY", "dynamic_partition.start" = "-7", "dynamic_partition.end" = "3", "dynamic_partition.prefix" = "p", "dynamic_partition.replication_num" = "3", "dynamic_partition.buckets" = "10" );
First, there should be a broker load scheduling script (refer to the detailed explanation of broker load writing doris written by me), and then update the partition data of the previous day
Idea 1: delete the partition of the previous day, then create a new partition for the table of the previous day, import the data, and update the data through operation, that is, complete insert overwrite
The operation is as follows:
Because I build a dynamic partition bucket table, I need to turn off the configuration of dynamic partitions before operating partitions, otherwise an error will be reported
First, turn off dynamic partitioning
ALTER TABLE usertouch_all_kpi_detail_user_day_snap SET ("dynamic_partition.enable" = "false")
Delete previous day's partition
ALTER TABLE usertouch_all_kpi_detail_user_day_snap DROP PARTITION p20211213;
New partition of the previous day
ALTER TABLE usertouch_all_kpi_detail_user_day_snap ADD PARTITION p20211213 VALUES [('2021-12-13'), ('2021-12-14'));
Query the partition of the previous day. The partition is empty
Then rerun the broker load script
show load
Completed
View partition data, which is the updated table data
be careful!!! Finally, the dynamic partition configuration must be turned on, or the subsequent scheduling will report an error (no partition)
ALTER TABLE usertouch_all_kpi_detail_user_day_snap SET ("dynamic_partition.enable" = "true")
Summary: this completes the implementation of insert overwrite, but the data cannot be accessed during the deletion and reconstruction of the partition the previous day
Idea 2: create a new temporary partition, write the updated data to the temporary partition, and finally use the temporary partition to replace the partition of the previous day
The specific operations are as follows
New temporary partition
Turn off dynamic partitions
ALTER TABLE usertouch_all_kpi_detail_user_day_snap ADD TEMPORARY PARTITION tp20211213 VALUES [("2021-12-13"), ("2021-12-14"));
Use the break load script to write data to the temporary partition
After that, replace the previous day's formal partition of the table with a temporary partition
ALTER TABLE usertouch_all_kpi_detail_user_day_snap REPLACE PARTITION (p20211213) WITH TEMPORARY PARTITION (tp20211213);
Summary: during the operation of insert overwrite, the data can still be accessed
Idea 3: of course, you can also directly use the UNIQUE KEY model table, which makes the requirements for the configuration and maintenance of doris cluster more stringent