ClickHouse data type

1. Basic data type

1.1 value type

1.1.1 Int

ClickHouse	Relational database	Number of bytes
Int8	TinyInt	1
Int16	SmallInt	2
Int32	Int	4
Int64	BigInt	8

1.1.2 Float

ClickHouse	accuracy	Number of bytes
Float32	7 bits	4
Float64	16 bit	8

1.1.3 Decimal

ClickHouse	Native Writing	Abbreviation
Decimal32	Decimal32(P,S)	Decimal 32 (s), equivalent to Decimal(1~9,S)
Decimal64	Decimal64(P,S)	Decimal 64 (s), equivalent to Decimal(10~18,S)
Decimal128	Decimal128(P,S)	Decimal 128 (s), equivalent to Decimal(19~38,S)

explain:

P stands for precision, which determines the total number of digits (integer part + decimal part), and the value range is [1,38]; S represents the scale, determines the decimal places, and the value range is [0,P]

When dividing two fixed-point numbers with different accuracy, S is the S of the divisor (numerator), but the accuracy of the numerator is required to be greater than or equal to the accuracy of the denominator

Because modern calculator systems only support 32-bit and 64 bit CPU s, Decimal128 is simulated at the software level, and its speed will be significantly slower than Decimal32 and Decimal64

1.1.4 special value of value type

inf: positive infinity

-inf: negative infinity

nan: non numeric

1.2 string type

1.2.1 String

The string is defined by string and has unlimited length. Therefore, there is no need to declare the size when using string. It completely replaces the character types such as Varchar, Text, Clob and Blob in the traditional database. String type does not limit the character set, because it does not have this concept at all, so any encoded string can be stored in it. However, for the standardization and maintainability of the program, unified coding, such as UTF-8, should be followed in the same set of programs.

1.2.2 FixedString

The FixedString type is somewhat similar to the traditional Char type. When some characters have a definite length, a fixed length string can be used. A fixed length string is declared by FixedString(N), where N represents the length of the string. However, unlike Char, FixedString uses null bytes to fill the end characters, while Char usually uses spaces to fill. For example, in the following example, although the string "abc" has only 3 bits, its length is 5, because there are 2 empty characters at the end.

:) SELECT toFixedString('abc',5), LENGTH(toFixedString('abc',5)) AS LENGTH;
┌─toFixedString('abc', 5)─┬─LENGTH─┐
│ abc                     │      5 │
└─────────────────────────┴────────┘

1.2.3 UUID

UUID is a common primary key type in database. It is directly regarded as a data type in ClickHouse. UUID has 32 bits in total, and its format is 8-4-4-4-12. If a field of UUID type is not assigned when writing data, it will be filled with 0 according to the format, for example:

CREATE TABLE UUID_TEST (c1 UUID,c2 String) ENGINE = Memory;
INSERT INTO UUID_TEST SELECT generateUUIDv4(),'t1';
INSERT INTO UUID_TEST(c2) VALUES('t2');
SELECT * FROM UUID_TEST;
┌───────────────────────────────────c1─┬─c2─┐
│ 308297a4-c66e-4ecd-ac34-f948b0f9752c │ t1 │
└──────────────────────────────────────┴────┘
┌───────────────────────────────────c1─┬─c2─┐
│ 00000000-0000-0000-0000-000000000000 │ t2 │
└──────────────────────────────────────┴────┘

1.3 time type

1.3.1 DateTime

DateTime type contains hour, minute and second information, accurate to seconds. It supports writing in string form:

CREATE TABLE Datetime_TEST (c1 Datetime) ENGINE = Memory;
--Write as string
INSERT INTO Datetime_TEST VALUES('2019-06-22 00:00:00');
SELECT c1, toTypeName(c1) FROM Datetime_TEST;
┌──────────────────c1─┬─toTypeName(c1)─┐
│ 2019-06-22 00:00:00 │ DateTime       │
└─────────────────────┴────────────────┘

1.3.2 DateTime64

DateTime64 can record sub seconds. It adds precision settings to DateTime, for example:

CREATE TABLE Datetime64_TEST (c1 Datetime64(2)) ENGINE = Memory;
INSERT INTO Datetime64_TEST VALUES('2019-06-22 00:00:00');
SELECT c1, toTypeName(c1) FROM Datetime64_TEST;

There is a problem: no matter whether the data type is datetime64 or datetime64, an error will be reported when creating a table:

DB::Exception: Unknown data type family: DateTime64. Maybe you meant: ['DateTime']

1.3.3 Date

Date type does not contain specific time information. It is only accurate to days. It also supports writing in string form:

CREATE TABLE Date_TEST (c1 Date) ENGINE = Memory;
INSERT INTO Date_TEST VALUES('2019-06-22');
SELECT c1, toTypeName(c1) FROM Date_TEST;
┌─────────c1─┬─toTypeName(c1)─┐
│ 2019-06-22 │ Date           │
└────────────┴────────────────┘

2. Composite type

2.1 Array

# Conventional mode
SELECT array(1, 2) as a , toTypeName(a);
┌─a─────┬─toTypeName([1, 2])─┐
│ [1,2] │ Array(UInt8)       │
└───────┴────────────────────┘

# Abbreviation
SELECT [1, 2];
┌─[1, 2]─┐
│ [1,2]  │
└────────┘

Through the above example, it can be found that it is not necessary to actively declare the element type of the array during query. Because the ClickHouse array has the ability of type inference, the inference basis is based on the principle of minimum storage cost, that is, using the smallest expressible data type. For example, in the above example, array(1,2) will automatically infer UInt8 as the array type. However, if there is a Null value in the array element, the element type will become Nullable, for example:

SELECT [1, 2, null] as a , toTypeName(a);
┌─a──────────┬─toTypeName([1, 2, NULL])─┐
│ [1,2,NULL] │ Array(Nullable(UInt8))   │
└────────────┴──────────────────────────┘

Multiple data types can be included in the same array. For example, array [1,2.0] is feasible. However, all types must be compatible. For example, the array [1, '2'] will report an error.

When defining table fields, the array needs to specify explicit element types, such as:

CREATE TABLE Array_TEST (c1 Array(String)) engine = Memory;

2.2 Tuple

The group type consists of 1 ~ n elements. Different data types are allowed to be set between each element, and they are not required to be compatible with each other. Tuples also support type inference, which is still based on the principle of minimum storage cost.

# Conventional mode
SELECT tuple(1,'a',now()) AS x, toTypeName(x);
┌─x─────────────────────────────┬─toTypeName(tuple(1, 'a', now()))─┐
│ (1,'a','2021-08-16 16:08:30') │ Tuple(UInt8, String, DateTime)   │
└───────────────────────────────┴──────────────────────────────────┘

# Abbreviation
SELECT (1,2.0,null) AS x, toTypeName(x);
┌─x──────────┬─toTypeName(tuple(1, 2., NULL))───────────┐
│ (1,2,NULL) │ Tuple(UInt8, Float64, Nullable(Nothing)) │
└────────────┴──────────────────────────────────────────┘

When defining table fields, tuples also need to specify explicit element types. Element types have similar functions as generics, which can further ensure data quality. Type checking is performed during data writing.

CREATE TABLE Tuple_TEST (c1 Tuple(String,Int8)) ENGINE = Memory;

2.3 Enum

ClickHouse supports enumeration types, a data type that is often used when defining constants. ClickHouse provides Enum8 and Enum16 enumeration types, which are the same except for different value ranges. Enum defines data in the form of (String:Int)Key/Value pairs, so Enum8 and Enum16 correspond to (String:Int8) and (String:Int16) respectively, for example:

CREATE TABLE Enum_TEST (c1 Enum8('ready'=1,'start'=2,'success'=3,'error'=4)) ENGINE = Memory;

There are several points to note when defining enumeration collections. First of all, Key and Value are not allowed to be repeated, and uniqueness should be guaranteed. Secondly, the values of Key and Value cannot be Null, but Key can be an empty string. When writing enumeration data, only the Key string part will be used, for example:

INSERT INTO Enum_TEST VALUES('ready');
INSERT INTO Enum_TEST VALUES('start');
select * from Enum_TEST;
┌─c1────┐
│ ready │
└───────┘
┌─c1────┐
│ start │
└───────┘

During data writing, it will be checked one by one against the contents of enumerated collection items. If the Key string is not within the range of the collection, an exception will be thrown.

The use of enumeration type can improve the calculation performance. Although the Key in the enumeration definition is of String type, the Value value of Int type will be used in all subsequent operations on enumeration (including sorting, grouping, de duplication, filtering, etc.).

2.4 Nested

Nested type, as the name suggests, is a nested table structure. A data table can define any number of nested type fields, but the nesting level of each field only supports one level, that is, nested types cannot continue to be used in nested tables. Nested types are also a good choice for hierarchical or associative relationships in simple scenarios. For example, the following nested_test is a simulated employee table, and its department field uses nested type:

CREATE TABLE nested_test (
    name String,
    age  UInt8 ,
    dept Nested(
        id UInt8,
        name String
    )
) ENGINE = Memory;

Take the above table as an example. If you understand it literally, it will be easy to understand as nested_test and dept are one-to-one inclusive relationships. In fact, this is wrong:

INSERT INTO nested_test VALUES ('nauu',18,10000,'R & D department');
# Error: DB:: exception: type mismatch in in or values section Expected: Array(UInt8).  Got: UInt64

Note the above exception information, which indicates that you want to write an Array type.

Nested type is essentially a multi-dimensional array structure. Each field in the nested table is an array, and the length of the array between rows does not need to be aligned. Therefore, you need to adjust the INSERT statement to the following form:

INSERT INTO nested_test VALUES ('bruce',30,[10000,10001,10002],['R & D department','Technical support center','Testing department']);
-- Between rows,Array lengths do not need to be aligned
INSERT INTO nested_test VALUES ('bruce',30,[10000,10001],['R & D department','Technical support center']);

If the length of the array field in the row is not aligned, an exception will be thrown:

INSERT INTO nested_test VALUES ('bruce',30,[10000,10001],['R & D department','Technical support center','Testing department']);
# DB::Exception: Elements 'dept.id' and 'dept.name' of Nested data structure 'dept' (Array columns) have different array sizes..

You need to use point symbols when accessing nested types of data, for example:

SELECT name, dept.id, dept.name FROM nested_test;
┌─name──┬─dept.id────┬─dept.name──────────────────────────┐
│ bruce │ [16,17,18] │ ['R & D department','Technical support center','Testing department'] │
└───────┴────────────┴────────────────────────────────────┘
┌─name──┬─dept.id─┬─dept.name─────────────────┐
│ bruce │ [16,17] │ ['R & D department','Technical support center'] │
└───────┴─────────┴───────────────────────────┘

3. Special types

3.1 Nullable

To be exact, nullable is not an independent data type. It is more like an auxiliary modifier, which needs to be used together with the basic data type. Nullable type is somewhat similar to the Optional object in Java 8. It indicates that an underlying data type can be Null. Its specific usage is as follows:

CREATE TABLE Null_TEST (
    c1 String,
    c2 Nullable(UInt8)
) ENGINE = TinyLog;
# After being decorated with Nullable, c2 field can be written with Null value
INSERT INTO Null_TEST VALUES ('nauu',null);
INSERT INTO Null_TEST VALUES ('bruce',20);
SELECT c1,c2,toTypeName(c2) FROM Null_TEST;
┌─c1────┬───c2─┬─toTypeName(c2)──┐
│ nauu  │ ᴺᵁᴸᴸ │ Nullable(UInt8) │
│ bruce │   20 │ Nullable(UInt8) │
└───────┴──────┴─────────────────┘

There are two other points to note when using Nullable types:

First of all, it can only be used with basic types, not for composite types such as arrays and tuples, nor as index fields;

Secondly, Nullable types should be used with caution, including Nullable data tables, otherwise the query and write performance will slow down. Because under normal circumstances, the data of each column field will be stored in the corresponding [column] In the bin file. If a column field is decorated with Nullable type, an additional [column] will be generated Null. The bin file specifically holds its Null value. This means that double the additional file operations are required when reading and writing data.

3.2 Domain

Domain name types are divided into IPv4 and IPv6. In essence, they are further encapsulation of integers and strings. IPv4 type is encapsulated based on UInt32. Its specific usage is as follows:

CREATE TABLE IP4_TEST (
    url String,
    ip IPv4
) ENGINE = Memory;
INSERT INTO IP4_TEST VALUES ('www.nauu.com','192.0.0.0');
SELECT url,ip,toTypeName(ip) FROM IP4_TEST;

IPv4 type supports format check. IP data with wrong format cannot be written:

INSERT INTO IP4_TEST VALUES ('www.nauu.com','192.0.0');
# DB::Exception: Invalid IPv4 value.

IPv4 uses UInt32 storage, which is more compact than String, occupies less space and has faster query performance.

IPv6 type is encapsulated based on FixedString(16), and its use method is the same as IPv4.

Although the Domain type looks like a String on the surface, the Domain type is not a String, so it does not support implicit automatic type conversion. If you need to return the String form of IP, you need to explicitly call the IPv4NumToString or IPv6NumToString function for conversion.

Added by Iank1968 on Sat, 08 Jan 2022 04:01:15 +0200

Programming VIP