hive data type

1. Basic data type

Hive data type Java data type length example
TINYINT byte 1byte signed integer 20
SMALINT short 2byte signed integer 20
INT int 4byte signed integer 20
BIGINT long 8byte signed integer 20
BOOLEAN boolean Boolean type, true or false TRUE FALSE
FLOAT float Single-precision floating-point 3.14159
DOUBLE double Double precision floating point number 3.14159
STRING string Character series. You can specify a character set. You can use single or double quotation marks. 'now is the time' "for all good men"
TIMESTAMP Time type
BINARY Byte array
deicimal Signed decimal of arbitrary precision 1.0
varchar Variable length string "a", "b"
char Fixed length string "a", "b"
binary Byte array
timestamp Timestamp, nanosecond precision 122327493795
date date 2018-04-07

The String type of Hive is equivalent to the varchar type of the database. This type is a variable String, but it cannot declare how many characters it can store at most. Theoretically, it can store 2GB characters; Data types are the implementation of interfaces in Java, so the specific behavior details of these types are completely consistent with the corresponding types in Java

2. Collection data type

data type describe Syntax example
STRUCT Similar to struct in c language, element content can be accessed through "point" symbol. For example, if the data type of a column is STRUCT{first STRING, last STRING}, the first element can be referenced through the field. First. struct()
MAP A MAP is a set of key value pair tuples that can be accessed using array notation. For example, if the data type of a column is MAP, and the key - > value pairs are 'first' - > 'John' and 'last' - > 'Doe', the last element can be obtained through the field name ['last'] map()
ARRAY An array is a collection of variables of the same type and name. These variables are called array elements. Each array element has a number, starting from zero. For example, if the array value is ['John', 'Doe], the second element can be referenced by the array name [1]. Array()

Example

1. Create table:

hive (hive_1)> create table test(
             > name string,
             > friends array<string>,
             > children map<string, int>,
             > address struct<street:string, city:string>
             > )
             > row format delimited fields terminated by ','	// Column separator
             > collection items terminated by '_'	// Separator for map structure and ARRAY (data split symbol)
             > map keys terminated by ':'		// Separator between key and value in MAP
             > lines terminated by '\n';		// line Separator 
OK
Time taken: 0.894 seconds

// View table structure
hive (hive_1)> desc test;
OK
col_name        data_type       comment
name                    string
friends                 array<string>
children                map<string,int>
address                 struct<street:string,city:string>
Time taken: 0.593 seconds, Fetched: 4 row(s)
hive (hive_1)> load data local inpath '/home/hadoop/apps/my_code/test.txt' into table test;
Loading data to table hive_1.test
Table hive_1.test stats: [numFiles=1, totalSize=144]
OK
Time taken: 1.675 seconds

2. Data to insert:

// [hadoop@hadoop1 apps]$ cat my_code/test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

3. Import data and query:

hive (hive_1)> select * from test;
OK
test.name       test.friends    test.children   test.address
songsong        ["bingbing","lili"]     {"xiao song":18,"xiaoxiao song":19}     {"street":"hui long guan","city":"beijing"}
yangyang        ["caicai","susu"]       {"xiao yang":18,"xiaoxiao yang":19}     {"street":"chao yang","city":"beijing"}
Time taken: 0.998 seconds, Fetched: 2 row(s)


hive (hive_1)> select friends[1], children['xiao song'], address.city from test where name='songsong';
OK
_c0     _c1     city
lili    18      beijing
Time taken: 0.792 seconds, Fetched: 1 row(s)

3. Type conversion

Hive's yard data type can be implicitly converted, which is similar to Java's type conversion. For example, tinyint will be automatically converted to int, but the reverse requires cast display (conversion) operation

Implicit conversion rule:

  • Any integer type can be implicitly converted to a broader type, such as tinyint - > int
  • All integer types, float and string can be converted to double
  • Boolean cannot be converted to any other type

**cast display conversion**

// Convert string '1' to integer 1. The conversion fails and returns NULL
CAST('1' AS INT)

4. Data format

Hive data is stored in the / user/hive/warehouse directory of hdfs file system by default. The default data format is textfile, which is not compressed. Other data formats include SequenceFile, RCFile, ORCFile and Parquet

For details, please refer to: Hive learning path (6) data type and storage format of Hive SQL

Keywords: hive

Added by choubix on Sun, 05 Dec 2021 16:11:35 +0200