Hive DDL data definition

DDL data definition

Create database

CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment] // notes
[LOCATION hdfs_path] // The path where the library directory is stored
[WITH DBPROPERTIES (property_name=property_value, ...)]; // Other information

1) Create a database. The default storage path of the database on HDFS is / user / hive / warehouse / * db.

hive (default)> create database db_hive;

2) To avoid existing errors in the database to be created, add the if not exists judgment. (standard writing)

hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists

hive (default)> create database if not exists db_hive;

3) Create a database and specify the location where the database is stored on HDFS

hive (default)> create database db_hive2 location '/db_hive2.db';

query data base

show database

1) Display database

hive> show databases;

2) Filter the database that displays the query

hive (default)> show databases like 'db_hive*';
OK
database_name
db_hive2
Time taken: 0.086 seconds, Fetched: 1 row(s)

View database details

1) Display database information

hive (default)> desc database db_hive2;
OK
db_name comment location        owner_name      owner_type      parameters
db_hive2                hdfs://hadoop113:8020/db_hive2.db       bd      USER
Time taken: 0.141 seconds, Fetched: 1 row(s)

2) Display database information

hive (default)> desc database extended db_hive2;
OK
db_name comment location        owner_name      owner_type      parameters
db_hive2                hdfs://hadoop113:8020/db_hive2.db       bd      USER
Time taken: 0.141 seconds, Fetched: 1 row(s)

Switch current database

hive (default)> use db_hive2;

modify the database

Users can use the ALTER DATABASE command to set key value pair attribute values for DBPROPERTIES of a database to describe the attribute information of the database.

hive (default)> alter database db_hive set dbproperties('createtime'='20210830');

View the available desc database extended dB for the modified database information_ hive;

Delete database

1) Delete empty database

hive>drop database db_hive2;

2) If the deleted database does not exist, it is best to use if exists to judge whether the database exists

hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive;

3) If the database is not empty, you can use the cascade command to forcibly delete it

hive> drop database db_hive;
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask.
InvalidOperationException(message:Database db_hive is not empty. One ormore tables exist.)

hive> drop database db_hive cascade;

Create table

1) Table building syntax

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

2) Field explanation

(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; The user can ignore this exception with the IF NOT EXISTS option.

(2)EXTERNAL keyword allows users to create an external table. While creating the table, you can specify a path (LOCATION) to the actual data. When deleting the table, the metadata and data of the internal table will be deleted together, while the external table only deletes the metadata and does not delete the data. There are internal tables and external tables in Hive.

(3)COMMENT: adds comments to tables and columns.

(4)PARTITIONED BY create partitioned table

(5)CLUSTERED BY create bucket table

(6)SORTED BY is not commonly used. One or more columns in the bucket are sorted separately

(7)ROW FORMAT, which defines the format of rows.

DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value,property_name=property_value, ...)]

When creating tables, users can customize SerDe or use the self-contained SerDe. If ROWFORMAT or row format deleted is not specified, the self-contained SerDe will be used. When creating a table, you also need to specify columns for the table. When you specify the columns of the table, you will also specify a custom SerDe. Hive determines the data of the specific columns of the table through SerDe.

Serde is short for Serialize/Deserilize. hive uses Serde to sequence and deserialize row objects.

(8)STORED AS specifies the storage file type

Common storage file types: SEQUENCEFILE (binary sequence file), TEXTFILE (text), RCFILE (column storage format file). If the file data is plain text, STORED AS TEXTFILE can be used. If the data needs to be compressed, use stored as sequence file.

(9)LOCATION: Specifies the storage location of the table on HDFS.

(10) Additional properties of the tblproperties table.

(11)AS: followed by a query statement, create a table according to the query results. create table if not exists student2 as select id, name from student;

(12)LIKE allows users to copy existing table structures without copying data. create table if not exists student3 like student;

You can view whether the table is external or internal through the TBLS table

When creating a table, for multiple fields, it is best to execute the separator to facilitate the download and view of the source data and directly put the file to the specified directory.

Management table (internal table)

The tables created by default are so-called management tables, sometimes referred to as internal tables. Because of this table, hive controls (more or less) the life cycle of the data. Hive stores the data of these tables in hive by default metastore. warehouse. Dir (for example, / user/hive/warehouse).

When we delete a management table, Hive will also delete the data in the table. Management tables are not suitable for sharing data with other tools.

External table

Because the table is an external table, Hive does not think it has the data completely. Deleting the table will not delete the data, but the metadata information describing the table will be deleted.

Manage usage scenarios for tables and external tables

The collected website logs are regularly flowed into HDFS text files every day. Do a lot of statistical analysis based on the external table (original log table). The intermediate table and result table used are stored in the internal table, and the data is entered into the internal table through SELECT+INSERT.

The difference between an external table and an internal table is that when the table is deleted, the internal table will delete the data on HDFS, while the external table will not. External tables are more secure than internal tables.

Manage the conversion between tables and external tables

// Modify the internal table student2 to an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');

// Modify the external table student2 to the internal table
alter table student2 set tblproperties('EXTERNAL'='FALSE');

// Type of query table
desc formatted student2;

Note: ('EXTERNAL '='TRUE') and ('EXTERNAL '='FALSE') are fixed and case sensitive!

Modify table

rename table

ALTER TABLE table_name RENAME TO new_table_name
hive (default)> alter table dept_partition2 rename to dept_partition3;

Add / modify / replace column information

// Update column
ALTER TABLE table_name 
CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

// Add and replace columns
ALTER TABLE table_name 
ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Note: ADD means to ADD a new field, the field position is after all columns (before the partition column), and REPLACE means to REPLACE all fields in the table.

You can query the table structure through desc tablename

The modification of the table is only the operation of metadata, not the actual operation of files on HDFS.

Delete table

hive (default)> drop table dept;

Keywords: hive

Added by akillez on Wed, 22 Dec 2021 20:39:21 +0200