1. Basic data type
Hive data type | Java data type | length | example |
---|---|---|---|
TINYINT | byte | 1byte signed integer | 20 |
SMALINT | short | 2byte signed integer | 20 |
INT | int | 4byte signed integer | 20 |
BIGINT | long | 8byte signed integer | 20 |
BOOLEAN | boolean | Boolean type, true or false | TRUE FALSE |
FLOAT | float | Single-precision floating-point | 3.14159 |
DOUBLE | double | Double precision floating point number | 3.14159 |
STRING | string | Character series. You can specify a character set. You can use single or double quotation marks. | 'now is the time' "for all good men" |
TIMESTAMP | Time type | ||
BINARY | Byte array | ||
deicimal | Signed decimal of arbitrary precision | 1.0 | |
varchar | Variable length string | "a", "b" | |
char | Fixed length string | "a", "b" | |
binary | Byte array | ||
timestamp | Timestamp, nanosecond precision | 122327493795 | |
date | date | 2018-04-07 |
The String type of Hive is equivalent to the varchar type of the database. This type is a variable String, but it cannot declare how many characters it can store at most. Theoretically, it can store 2GB characters; Data types are the implementation of interfaces in Java, so the specific behavior details of these types are completely consistent with the corresponding types in Java
2. Collection data type
data type | describe | Syntax example |
---|---|---|
STRUCT | Similar to struct in c language, element content can be accessed through "point" symbol. For example, if the data type of a column is STRUCT{first STRING, last STRING}, the first element can be referenced through the field. First. | struct() |
MAP | A MAP is a set of key value pair tuples that can be accessed using array notation. For example, if the data type of a column is MAP, and the key - > value pairs are 'first' - > 'John' and 'last' - > 'Doe', the last element can be obtained through the field name ['last'] | map() |
ARRAY | An array is a collection of variables of the same type and name. These variables are called array elements. Each array element has a number, starting from zero. For example, if the array value is ['John', 'Doe], the second element can be referenced by the array name [1]. | Array() |
Example
1. Create table:
hive (hive_1)> create table test( > name string, > friends array<string>, > children map<string, int>, > address struct<street:string, city:string> > ) > row format delimited fields terminated by ',' // Column separator > collection items terminated by '_' // Separator for map structure and ARRAY (data split symbol) > map keys terminated by ':' // Separator between key and value in MAP > lines terminated by '\n'; // line Separator OK Time taken: 0.894 seconds // View table structure hive (hive_1)> desc test; OK col_name data_type comment name string friends array<string> children map<string,int> address struct<street:string,city:string> Time taken: 0.593 seconds, Fetched: 4 row(s) hive (hive_1)> load data local inpath '/home/hadoop/apps/my_code/test.txt' into table test; Loading data to table hive_1.test Table hive_1.test stats: [numFiles=1, totalSize=144] OK Time taken: 1.675 seconds
2. Data to insert:
// [hadoop@hadoop1 apps]$ cat my_code/test.txt songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
3. Import data and query:
hive (hive_1)> select * from test; OK test.name test.friends test.children test.address songsong ["bingbing","lili"] {"xiao song":18,"xiaoxiao song":19} {"street":"hui long guan","city":"beijing"} yangyang ["caicai","susu"] {"xiao yang":18,"xiaoxiao yang":19} {"street":"chao yang","city":"beijing"} Time taken: 0.998 seconds, Fetched: 2 row(s) hive (hive_1)> select friends[1], children['xiao song'], address.city from test where name='songsong'; OK _c0 _c1 city lili 18 beijing Time taken: 0.792 seconds, Fetched: 1 row(s)
3. Type conversion
Hive's yard data type can be implicitly converted, which is similar to Java's type conversion. For example, tinyint will be automatically converted to int, but the reverse requires cast display (conversion) operation
Implicit conversion rule:
- Any integer type can be implicitly converted to a broader type, such as tinyint - > int
- All integer types, float and string can be converted to double
- Boolean cannot be converted to any other type
**cast display conversion**
// Convert string '1' to integer 1. The conversion fails and returns NULL CAST('1' AS INT)
4. Data format
Hive data is stored in the / user/hive/warehouse directory of hdfs file system by default. The default data format is textfile, which is not compressed. Other data formats include SequenceFile, RCFile, ORCFile and Parquet
For details, please refer to: Hive learning path (6) data type and storage format of Hive SQL