hive data type

1. Basic data type

Hive data type	Java data type	length	example
TINYINT	byte	1byte signed integer	20
SMALINT	short	2byte signed integer	20
INT	int	4byte signed integer	20
BIGINT	long	8byte signed integer	20
BOOLEAN	boolean	Boolean type, true or false	TRUE FALSE
FLOAT	float	Single-precision floating-point	3.14159
DOUBLE	double	Double precision floating point number	3.14159
STRING	string	Character series. You can specify a character set. You can use single or double quotation marks.	'now is the time' "for all good men"
TIMESTAMP		Time type
BINARY		Byte array
deicimal		Signed decimal of arbitrary precision	1.0
varchar		Variable length string	"a", "b"
char		Fixed length string	"a", "b"
binary		Byte array
timestamp		Timestamp, nanosecond precision	122327493795
date		date	2018-04-07

The String type of Hive is equivalent to the varchar type of the database. This type is a variable String, but it cannot declare how many characters it can store at most. Theoretically, it can store 2GB characters; Data types are the implementation of interfaces in Java, so the specific behavior details of these types are completely consistent with the corresponding types in Java

2. Collection data type

data type	describe	Syntax example
STRUCT	Similar to struct in c language, element content can be accessed through "point" symbol. For example, if the data type of a column is STRUCT{first STRING, last STRING}, the first element can be referenced through the field. First.	struct()
MAP	A MAP is a set of key value pair tuples that can be accessed using array notation. For example, if the data type of a column is MAP, and the key - > value pairs are 'first' - > 'John' and 'last' - > 'Doe', the last element can be obtained through the field name ['last']	map()
ARRAY	An array is a collection of variables of the same type and name. These variables are called array elements. Each array element has a number, starting from zero. For example, if the array value is ['John', 'Doe], the second element can be referenced by the array name [1].	Array()

Example

1. Create table:

hive (hive_1)> create table test(
             > name string,
             > friends array<string>,
             > children map<string, int>,
             > address struct<street:string, city:string>
             > )
             > row format delimited fields terminated by ','	// Column separator
             > collection items terminated by '_'	// Separator for map structure and ARRAY (data split symbol)
             > map keys terminated by ':'		// Separator between key and value in MAP
             > lines terminated by '\n';		// line Separator 
OK
Time taken: 0.894 seconds

// View table structure
hive (hive_1)> desc test;
OK
col_name        data_type       comment
name                    string
friends                 array<string>
children                map<string,int>
address                 struct<street:string,city:string>
Time taken: 0.593 seconds, Fetched: 4 row(s)
hive (hive_1)> load data local inpath '/home/hadoop/apps/my_code/test.txt' into table test;
Loading data to table hive_1.test
Table hive_1.test stats: [numFiles=1, totalSize=144]
OK
Time taken: 1.675 seconds

2. Data to insert:

// [hadoop@hadoop1 apps]$ cat my_code/test.txt

songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing

3. Import data and query:

hive (hive_1)> select * from test;
OK
test.name       test.friends    test.children   test.address
songsong        ["bingbing","lili"]     {"xiao song":18,"xiaoxiao song":19}     {"street":"hui long guan","city":"beijing"}
yangyang        ["caicai","susu"]       {"xiao yang":18,"xiaoxiao yang":19}     {"street":"chao yang","city":"beijing"}
Time taken: 0.998 seconds, Fetched: 2 row(s)


hive (hive_1)> select friends[1], children['xiao song'], address.city from test where name='songsong';
OK
_c0     _c1     city
lili    18      beijing
Time taken: 0.792 seconds, Fetched: 1 row(s)

3. Type conversion

Hive's yard data type can be implicitly converted, which is similar to Java's type conversion. For example, tinyint will be automatically converted to int, but the reverse requires cast display (conversion) operation

Implicit conversion rule:

Any integer type can be implicitly converted to a broader type, such as tinyint - > int
All integer types, float and string can be converted to double
Boolean cannot be converted to any other type

**cast display conversion**

// Convert string '1' to integer 1. The conversion fails and returns NULL
CAST('1' AS INT)

4. Data format

Hive data is stored in the / user/hive/warehouse directory of hdfs file system by default. The default data format is textfile, which is not compressed. Other data formats include SequenceFile, RCFile, ORCFile and Parquet

For details, please refer to: Hive learning path (6) data type and storage format of Hive SQL

Keywords: hive

Added by choubix on Sun, 05 Dec 2021 16:11:35 +0200

Programming VIP

hive data type

1. Basic data type

2. Collection data type

3. Type conversion

4. Data format

Popular Keywords