DDL (Data Definition Language)
Create database
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]                                 // comment
[LOCATION hdfs_path]                                       // the path where the database directory is stored
[WITH DBPROPERTIES (property_name=property_value, ...)];   // other properties
1) Create a database. The default storage path of a database on HDFS is /user/hive/warehouse/*.db.
hive (default)> create database db_hive;
2) To avoid an error when the database already exists, add an if not exists check (the standard way to write it).
hive (default)> create database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database db_hive already exists
hive (default)> create database if not exists db_hive;
3) Create a database and specify the location where the database is stored on HDFS
hive (default)> create database db_hive2 location '/db_hive2.db';
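These options can also be combined. A minimal sketch (db_hive3, the comment text, and the property value below are made-up examples, not part of the session above):
hive (default)> create database if not exists db_hive3 comment 'demo database' location '/db_hive3.db' with dbproperties('createtime'='20210830');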
Query databases
Show databases
1) Display database
hive> show databases;
2) Filter which databases are shown by the query
hive (default)> show databases like 'db_hive*';
OK
database_name
db_hive2
Time taken: 0.086 seconds, Fetched: 1 row(s)
View database details
1) Display database information
hive (default)> desc database db_hive2;
OK
db_name    comment    location    owner_name    owner_type    parameters
db_hive2        hdfs://hadoop113:8020/db_hive2.db    bd    USER
Time taken: 0.141 seconds, Fetched: 1 row(s)
2) Display detailed database information (extended)
hive (default)> desc database extended db_hive2;
OK
db_name    comment    location    owner_name    owner_type    parameters
db_hive2        hdfs://hadoop113:8020/db_hive2.db    bd    USER
Time taken: 0.141 seconds, Fetched: 1 row(s)
Switch current database
hive (default)> use db_hive2;
Modify the database
Users can use the ALTER DATABASE command to set key-value pairs in the DBPROPERTIES of a database, describing the database's property information.
hive (default)> alter database db_hive set dbproperties('createtime'='20210830');
The modified database information can be viewed with desc database extended db_hive;
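For example (a sketch; the exact output layout may differ between Hive versions):
hive (default)> desc database extended db_hive;
The parameters column should then include {createtime=20210830}.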
Delete database
1) Delete empty database
hive>drop database db_hive2;
2) If the database to be deleted might not exist, it is best to add if exists to check whether the database exists
hive> drop database db_hive;
FAILED: SemanticException [Error 10072]: Database does not exist: db_hive
hive> drop database if exists db_hive;
3) If the database is not empty, you can use the cascade command to forcibly delete it
hive> drop database db_hive;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database db_hive is not empty. One or more tables exist.)
hive> drop database db_hive cascade;
Create table
1) Table building syntax
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]
2) Field explanation
(1) CREATE TABLE creates a table with the specified name. If a table with the same name already exists, an exception is thrown; the user can ignore this exception with the IF NOT EXISTS option.
(2) The EXTERNAL keyword lets users create an external table. While creating the table, you can specify a path (LOCATION) to the actual data. Hive has both internal tables and external tables: when an internal table is dropped, its metadata and data are deleted together, whereas dropping an external table only deletes the metadata, not the data.
(3)COMMENT: adds comments to tables and columns.
(4) PARTITIONED BY: creates a partitioned table.
(5) CLUSTERED BY: creates a bucketed table.
(6) SORTED BY: not commonly used; sorts by one or more columns within each bucket.
(7) ROW FORMAT: defines the row format.
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value,property_name=property_value, ...)]
When creating tables, users can customize the SerDe or use the built-in SerDe. If ROW FORMAT or ROW FORMAT DELIMITED is not specified, the built-in SerDe is used. When creating a table, you also need to specify columns for the table; while specifying the columns, you can also specify a custom SerDe. Hive determines the data of the specific columns of the table through the SerDe.
SerDe is short for Serialize/Deserialize. Hive uses the SerDe to serialize and deserialize row objects.
(8) STORED AS: specifies the storage file format.
Common storage file formats: SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), RCFILE (columnar storage format). If the file data is plain text, STORED AS TEXTFILE can be used. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
(9)LOCATION: Specifies the storage location of the table on HDFS.
(10) TBLPROPERTIES: additional properties of the table.
(11) AS: followed by a query statement; creates a table from the query results, e.g. create table if not exists student2 as select id, name from student;
(12) LIKE: copies an existing table structure without copying the data, e.g. create table if not exists student3 like student;
You can check whether a table is external or internal through the TBLS table in the metastore.
When creating a table with multiple fields, it is best to specify the field separator; this makes it easy to download and inspect the source data, and to put data files directly into the specified directory (see the example below).
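Putting several of these clauses together, a minimal sketch of a managed table; the column comments, '\t' separator, and location are illustrative choices, not requirements:
create table if not exists student(
id int comment 'student id',
name string comment 'student name'
)
comment 'student table'
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/student';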
Managed tables (internal tables)
The tables created by default are so-called managed tables, sometimes referred to as internal tables. For these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables under the directory configured by hive.metastore.warehouse.dir (for example, /user/hive/warehouse).
When we delete a managed table, Hive also deletes the data in the table. Managed tables are not suitable for sharing data with other tools.
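To confirm the kind of table you have, desc formatted reports the table type. A sketch using the student table above (output trimmed to the relevant line):
hive (default)> desc formatted student;
Table Type:             MANAGED_TABLE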
External table
Because the table is an external table, Hive does not consider itself to fully own the data. Dropping the table does not delete the data; only the metadata describing the table is deleted.
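A minimal external-table sketch (the dept table, its columns, and the HDFS path are made-up examples); files already uploaded to the location stay there even after the table is dropped:
create external table if not exists dept(
deptno int,
dname string,
loc int
)
row format delimited fields terminated by '\t'
location '/external/dept';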
Usage scenarios for managed tables and external tables
Collected website logs flow into HDFS text files regularly every day. Do the bulk of the statistical analysis on top of external tables (the original log tables); store the intermediate tables and result tables as internal tables, filling them via SELECT+INSERT.
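A sketch of that SELECT+INSERT flow; ods_weblog and dw_pv_stat are made-up names standing for the external log table and an internal result table:
-- result table kept as an internal (managed) table
create table if not exists dw_pv_stat(url string, pv bigint);
-- populate it from the external log table via SELECT + INSERT
insert overwrite table dw_pv_stat
select url, count(*) as pv
from ods_weblog
group by url;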
The difference between an external table and an internal table is that when the table is dropped, the internal table deletes the data on HDFS while the external table does not. External tables are therefore safer than internal tables.
Converting between managed tables and external tables
// Change the internal table student2 into an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');
// Change the external table student2 back into an internal table
alter table student2 set tblproperties('EXTERNAL'='FALSE');
// Query the table type
desc formatted student2;
Note: ('EXTERNAL'='TRUE') and ('EXTERNAL'='FALSE') are fixed values and are case sensitive!
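After a conversion you can verify the change with desc formatted (a sketch; output trimmed to the relevant line):
hive (default)> desc formatted student2;
Table Type:             EXTERNAL_TABLE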
Modify table
Rename a table
ALTER TABLE table_name RENAME TO new_table_name
hive (default)> alter table dept_partition2 rename to dept_partition3;
Add / modify / replace column information
// Update a column
ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
// Add or replace columns
ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD adds new fields; the new fields are positioned after all existing columns (but before the partition columns). REPLACE replaces all fields in the table.
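For example, assuming a dept table like the one sketched earlier (a sketch):
-- add a new column; it is placed after the existing columns (before any partition columns)
alter table dept add columns(deptdesc string);
-- rename deptdesc to descr, keeping the string type
alter table dept change column deptdesc descr string;
-- replace all columns with a new list (this drops descr)
alter table dept replace columns(deptno string, dname string, loc string);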
You can query the table structure through desc tablename
Modifying a table only changes metadata; it does not modify the actual files on HDFS.
Delete table
hive (default)> drop table dept;