DDL and DML data operations of Hive

catalogue

1, DDL(Data Definition Language)

1. Create database

2. Query database

3. Modify database

4. Delete database

5. Create table

Management table (internal table)

External table

Conversion between two tables

Modify table

Delete table

Clear table

2, DML data operation

1. Data import

Load data into a table (load)

Insert data into a table through a query statement (insert)

Create tables and load data in query statements (As Select)

When creating a table, specify the load data path through Location

Import data into the specified Hive table

2. Data export

Hadoop command export to local

Hive Shell command export

Export to HDFS

1, DDL(Data Definition Language)

1. Create database

CREATE DATABASE [IF NOT EXISTS] database_name     --[IF NOT EXISTS] Enhance code robustness
[COMMENT database_comment]                        --Comments for Library
[LOCATION hdfs_path]                              -- Specify the corresponding hdfs route
[WITH DBPROPERTIES (property_name=property_value, ...)]; --Properties of the library

Create Library

//The default storage path of the database on HDFS is / user / hive / warehouse / * db
hive (default)> create database db_hive; 
hive (default)> create database if not exists db_hive; //standard notation 
hive (default)> create database db_hive2 location '/db_hive2';//Specify where to store on hdfs

2. Query database

hive> show databases; //query data base
hive> show databases like 'db_hive*'; //Query the specified database
hive> desc database db_hive; //Display database information
hive> desc database extended db_hive; //Show database details
hive (default)> use db_hive; //Switch database

3. Modify database

You can only use the ALTER DATABASE command to set key value pair attribute values for DBPROPERTIES of a database to describe the attribute information of the database. Other metadata information of the database cannot be changed, including the database name and the directory location where the database is located.

hive (default)> alter database db_hive set dbproperties('createtime'='20170830');
//Set some key value pair properties

4. Delete database

hive>drop database db_hive2; //Delete empty database
hive> drop database if exists db_hive2; //standard notation 
hive> drop database db_hive cascade; //Force database deletion

5. Create table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name   --external If you do not create a table, it is an internal table
                                                              --In addition, the table created later is an external table
[(col_name data_type [COMMENT col_comment], ...)]    --Comments for column name type columns
[COMMENT table_comment]                              --Comments for table
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]  --Create table as partitioned table
[CLUSTERED BY (col_name, col_name, ...)                           --Create a bucket table
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]  --The sorting field in the bucket is divided into several buckets
[ROW FORMAT DELIMITED]                    --Data corresponding format of table
 [FIELDS TERMINATED BY char]              --Field separator  ascii 1 Sign position ^A  ctrl+v ctrl+a
 [COLLECTION ITEMS TERMINATED BY char]    --Collection separator  ascii 2 Sign position ^B  ctrl+v ctrl+b
 [MAP KEYS TERMINATED BY char]            --map of kv Separator ascii 3 Sign position ^C  ctrl+v ctrl+c
 [LINES TERMINATED BY char] ]              --Row separator default'\n'
 The above default values are very difficult to use
[STORED AS file_format]   --Format of data storage corresponding to the table  TEXTFILE .txt orc 
[LOCATION hdfs_path]      --Corresponding to table hdfs route
[TBLPROPERTIES (property_name=property_value, ...)]  --The attributes corresponding to the table are very useful
[AS select_statement]     --Create a table based on the query results. All the structures and data of the table are without separators
[LIKE table_name]         --Imitate the structure of a table without data but with delimiters

Management table (internal table)

Introduction: all the tables created by default are so-called management tables, sometimes referred to as internal tables. Because of these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by the configuration item Hive.metastore.warehouse.dir (for example, / user/hive/warehouse). When deleting a management table, Hive will also delete the data in the table. Management tables are not suitable for sharing data with other tools.

create table if not exists student(
id int, name string
)
row format delimited fields terminated by '\t'
location '/user/hive/warehouse/student'; //Normal create table
create table if not exists student as select id, name from student;//Create tables based on query results
create table if not exists student3 like student; //Create a table based on an existing table structure
desc formatted student2; //View table details
drop table student2; //Delete table

External table

Introduction: because the table is an external table, Hive does not think it has this data completely. Deleting the table will not delete the data, but the metadata information describing the table will be deleted.

create external table if not exists dept(
deptno int,
dname string,
loc int
)
row format delimited fields terminated by '\t'
location '/company/dept'; //Create an external table

Conversion between two tables

alter table student set tblproperties('EXTERNAL'='TRUE'); //Modify the internal table student to an external table
alter table student set tblproperties('EXTERNAL'='FALSE'); //Modify external table student to internal table
 be careful:('EXTERNAL'='TRUE')and('EXTERNAL'='FALSE')It is fixed and case sensitive

Modify table

ALTER TABLE table_name RENAME TO new_table_name //rename
alter table dept add columns(deptdesc string); //Add column in dept table
alter table dept change column deptdesc desc string; //Update column
alter table dept replace columns(deptno string, dname
 string, loc string); //Replace column

Delete table

drop table dept;

Clear table

truncate table student; //Truncate can only delete management tables, but cannot delete data in external tables

2, DML data operation

1. Data import

Load data into a table (load)

load data [local] inpath 'Data path' [overwrite] into table student [partition (partcol1=val1,...)];
(1)load data:Indicates loading data
(2)local:Indicates loading data from local to hive Table; Otherwise from HDFS Load data to hive surface
(3)inpath:Indicates the path to load data
(4)overwrite:It means to overwrite the existing data in the table, otherwise it means to append
(5)into table:Indicates which table to load
(6)student:Represents a specific table
(7)partition:Indicates uploading to the specified partition

example

create table student(id int, name string) row format delimited fields terminated by '\t';//Create a table
load data local inpath '/opt/module/hive/datas/student.txt' into table student;//Load local files to hive
dfs -put /opt/module/hive/datas/student.txt /user/student.txt //Upload files to HDFS
load data inpath '/user/student.txt' into table student; //Load the HDFS file to Hive, and the file disappears after uploading
load data inpath '/user/student.txt' overwrite into table student; //Loading data overwrites the existing data in the table

Insert data into a table through a query statement (insert)

insert into table  student values(1,'ll'),(2,'ss'); //Basic mode insert data
insert overwrite table student select id, name from student where id < 5;
//Insert into: insert into a table or partition by appending data. The original data will not be deleted
//insert overwrite: it will overwrite the existing data in the table
//insert does not support inserting some fields, and when followed by a select statement, it should be distinguished from as select

Create tables and load data in query statements (As Select)

create table if not exists student3 as select id, name from student; //Create a table based on the results of the query

When creating a table, specify the load data path through Location

create external table if not exists student(
              id int, name string
              )
              row format delimited fields terminated by '\t'
              location '/student';

Import data into the specified Hive table

import table student  from '/user/hive/warehouse/export/student';
//Export first, and then import the data. Moreover, because the data exported by export contains metadata, the table to be imported by import cannot exist, otherwise an error will be reported.

2. Data export

//Export query results to local
insert overwrite local directory '/opt/module/hive/datas/export/student'
select * from student;
//Format and export query results to local
insert overwrite local directory '/opt/module/hive/datas/export/student'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'            
select * from student;
//Export the query results to HDFS (no local)
insert overwrite directory '/user/student'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
select * from student;
Note: because it is overwrite,Therefore, the export path must be written specifically

Hadoop command export to local

dfs -get /user/hive/warehouse/student/student.txt
/opt/module/hive/datas/export/student.txt;

Hive Shell command export

hive -e 'select * from default.student;' > /opt/module/hive/datas/export/student4.txt;

Export to HDFS

export table default.student to '/user/hive/warehouse/export/student';
//export and import are mainly used for Hive table migration between two Hadoop platform clusters, and cannot be exported directly

Keywords: Database hive Data Warehouse

Added by jtapoling on Thu, 23 Dec 2021 19:57:52 +0200