TTL rules and practice of tables and columns in clickhouse

1. TTL of tables and columns

Defines the lifecycle of the value

You can set for the entire table or for each individual column.

Table level TTL can also specify the logic to automatically move data between disks and volumes.

The table that sets TTL must contain fields of type Date or DateTime.

To define the life cycle of data, you need to use the operator in this date field:

TTL time_column
TTL time_column + interval

Example:

TTL date_time + INTERVAL 1 MONTH

TTL date_time + INTERVAL 15 HOUR

2. Train level TTL

When the values in the column expire, ClickHouse replaces them with the default values of the corresponding data type of the column.

If all column values in the data fragment are out of date, delete the file of the column under the data fragment.

TTL clause cannot be used for key column.

Example:

  1. Create TTL table
  2. TTL expiration verification
  3. Add TTL to column
  4. Modify TTL of column

3. Table level TTL

  1. Table level TTL defines the logic of deleting expired rows and automatically moving data between disks and volumes.
  2. A table can define an expression for removing expired rows and a logical expression for automatically moving data between multiple disks and volumes.
    TTL expr [DELETE|TO DISK 'aaa'|TO VOLUME 'bbb'], ...
  3. When the data in the table expires, ClickHouse deletes all corresponding rows.
  4. The type of TTL rule follows each TTL expression. It represents the operation to be performed after the expression is satisfied (reaching the current time).
    DELETE - DELETE expired rows (default operation);
    TO DISK 'aaa' - move the fragment TO DISK aaa;
    TO VOLUME 'bbb' - move the fragment to disk bbb;
  5. Create an example:
    CREATE TABLE example_table
    (
        d DateTime, 
        a Int 
    )ENGINE = MergeTree 
    PARTITION BY toYYYYMM(d) 
    ORDER BY d 
    TTL d + INTERVAL 1 MONTH [DELETE], 
        d + INTERVAL 1 WEEK TO VOLUME 'aaa',
        d + INTERVAL 2 WEEK TO DISK 'bbb'
    SETTINGS storage_policy = 'moving_from_ssd_to_hdd';
    

    Note: when the TTL expression specifies the logic of moving data between disks and volumes, the ClickHouse table must specify the storage policy, and the storage policy must include the corresponding disks and volumes.

  6. Use case: data deletion after TTL expires.
    1). When ClickHouse merges data fragments, the TTL expired data will be deleted.
    2). When ClickHouse finds that the data is out of date, it will perform an unplanned merge. To control the frequency of such merging, set the parameter merge_with_ttl_timeout. If this value is set too low, it will lead to many unplanned mergers, which may consume a lot of resources.
    3). If you execute a SELECT query when merging, you may get expired data. To avoid this, you can use the OPTIMIZE query before SELECT.

4. Example of column level TTL

  1. Create table with TTL
    DROP TABLE example_table;
    CREATE TABLE example_table 
    (
        d DateTime,
        a Int TTL d + INTERVAL 1 MINUTE,
        b String TTL d + INTERVAL 1 MINUTE,
        c String
    )
    ENGINE = MergeTree
    ORDER BY d;
    xxxxx :) show create table example_table;
    
    SHOW CREATE TABLE example_table
    
    ┌─statement─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ CREATE TABLE default.example_table
    (
        `d` DateTime,
        `a` Int32 TTL d + toIntervalMinute(1),
        `b` String TTL d + toIntervalMinute(1),
        `c` String
    )
    ENGINE = MergeTree
    ORDER BY d
    SETTINGS index_granularity = 8192 │
    └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    
    1 rows in set. Elapsed: 0.007 sec. 
    
    xxxxx :)
    
  2. insert data
    insert into example_table values (now(), 1, 'value1', 'ccc1');
    insert into example_table values(now(), 2, 'value2', 'ccc2');
    
    After 1 minute wait:
    xxxxx :) select * from example_table;
    
    SELECT *
    FROM example_table
    
    ┌───────────────────d─┬─a─┬─b──────┬─c────┐
    │ 2020-11-26 14:53:16 │ 2 │ value2 │ ccc2 │
    └─────────────────────┴───┴────────┴──────┘
    ┌───────────────────d─┬─a─┬─b──────┬─c────┐
    │ 2020-11-26 14:53:15 │ 1 │ value1 │ ccc1 │
    └─────────────────────┴───┴────────┴──────┘
    
    2 rows in set. Elapsed: 0.009 sec. 
    
    xxxxx :) optimize table example_table;
    
    OPTIMIZE TABLE example_table
    
    Ok.
    
    0 rows in set. Elapsed: 0.004 sec. 
    
    xxxxx :) select * from example_table;
    
    SELECT *
    FROM example_table
    
    ┌───────────────────d─┬─a─┬─b─┬─c────┐
    │ 2020-11-26 14:53:15 │ 0 │   │ ccc1 │
    │ 2020-11-26 14:53:16 │ 0 │   │ ccc2 │
    └─────────────────────┴───┴───┴──────┘
    
    2 rows in set. Elapsed: 0.007 sec. 
    
    xxxxx :)
    

     

  3. Add TTL to the column of the table:
    ALTER TABLE example_table
    MODIFY COLUMN
    c String TTL d + INTERVAL 1 DAY;
    xxxxx :) ALTER TABLE example_table
    :-] MODIFY COLUMN
    :-] c String TTL d + INTERVAL 1 DAY;
    
    ALTER TABLE example_table
        MODIFY COLUMN `c` String TTL d + toIntervalDay(1)
    
    
    Ok.
    
    0 rows in set. Elapsed: 0.018 sec. 
    
    xxxxx :)
    
    To modify the TTL of a column:
    ALTER TABLE example_table
        MODIFY COLUMN
        c String TTL d + INTERVAL 1 MONTH;
    xxxxx :) ALTER TABLE example_table
    :-]     MODIFY COLUMN
    :-]     c String TTL d + INTERVAL 1 MONTH;
    
    ALTER TABLE example_table
        MODIFY COLUMN `c` String TTL d + toIntervalMonth(1)
    
    
    Ok.
    
    0 rows in set. Elapsed: 0.015 sec. 
    
    xxxxx :) show create table example_table;
    
    SHOW CREATE TABLE example_table
    
    ┌─statement────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
    │ CREATE TABLE default.example_table
    (
        `d` DateTime,
        `a` Int32 TTL d + toIntervalMinute(1),
        `b` String TTL d + toIntervalMinute(1),
        `c` String TTL d + toIntervalMonth(1)
    )
    ENGINE = MergeTree
    ORDER BY d
    SETTINGS index_granularity = 8192 │
    └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
    
    1 rows in set. Elapsed: 0.004 sec. 
    
    xxxxx :) 
    

     

5. Table level TTL example

  1. Create table
    drop table example_table ;
    CREATE TABLE example_table 
    (
        d DateTime,
        a Int,
        b String,
        c String
    )
    ENGINE = MergeTree
    ORDER BY d
    TTL d + INTERVAL 1 MINUTE DELETE;
    
  2. insert data
    insert into example_table values (now(), 1, 'value1', 'ccc1');
    insert into example_table values(now(), 2, 'value2', 'ccc2');
    
    Then query results:
    xxxxx :) select * from example_table;
    
    SELECT *
    FROM example_table
    
    ┌───────────────────d─┬─a─┬─b──────┬─c────┐
    │ 2020-11-26 15:24:53 │ 1 │ value1 │ ccc1 │
    │ 2020-11-26 15:24:54 │ 2 │ value2 │ ccc2 │
    └─────────────────────┴───┴────────┴──────┘
    
    2 rows in set. Elapsed: 0.008 sec. 
    
    xxxxx :)
    

    After waiting for 1 minute, perform the optimize operation.

    xxxxx :) optimize table example_table;
    
    OPTIMIZE TABLE example_table
    
    Ok.
    
    0 rows in set. Elapsed: 0.006 sec. 
    
    xxxxx :) select * from example_table;
    
    SELECT *
    FROM example_table
    
    Ok.
    
    0 rows in set. Elapsed: 0.003 sec. 
    
    xxxxx :)
    

     

Keywords: Database

Added by icez on Tue, 08 Mar 2022 13:59:48 +0200