catalogue
1, DDL(Data Definition Language)
Management table (internal table)
Insert data into a table through a query statement (insert)
Create tables and load data in query statements (As Select)
When creating a table, specify the load data path through Location
Import data into the specified Hive table
Hadoop command export to local
1, DDL(Data Definition Language)
1. Create database
CREATE DATABASE [IF NOT EXISTS] database_name --[IF NOT EXISTS] Enhance code robustness [COMMENT database_comment] --Comments for Library [LOCATION hdfs_path] -- Specify the corresponding hdfs route [WITH DBPROPERTIES (property_name=property_value, ...)]; --Properties of the library
Create Library
//The default storage path of the database on HDFS is / user / hive / warehouse / * db hive (default)> create database db_hive; hive (default)> create database if not exists db_hive; //standard notation hive (default)> create database db_hive2 location '/db_hive2';//Specify where to store on hdfs
2. Query database
hive> show databases; //query data base hive> show databases like 'db_hive*'; //Query the specified database hive> desc database db_hive; //Display database information hive> desc database extended db_hive; //Show database details hive (default)> use db_hive; //Switch database
3. Modify database
You can only use the ALTER DATABASE command to set key value pair attribute values for DBPROPERTIES of a database to describe the attribute information of the database. Other metadata information of the database cannot be changed, including the database name and the directory location where the database is located.
hive (default)> alter database db_hive set dbproperties('createtime'='20170830'); //Set some key value pair properties
4. Delete database
hive>drop database db_hive2; //Delete empty database hive> drop database if exists db_hive2; //standard notation hive> drop database db_hive cascade; //Force database deletion
5. Create table
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name --external If you do not create a table, it is an internal table --In addition, the table created later is an external table [(col_name data_type [COMMENT col_comment], ...)] --Comments for column name type columns [COMMENT table_comment] --Comments for table [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] --Create table as partitioned table [CLUSTERED BY (col_name, col_name, ...) --Create a bucket table [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] --The sorting field in the bucket is divided into several buckets [ROW FORMAT DELIMITED] --Data corresponding format of table [FIELDS TERMINATED BY char] --Field separator ascii 1 Sign position ^A ctrl+v ctrl+a [COLLECTION ITEMS TERMINATED BY char] --Collection separator ascii 2 Sign position ^B ctrl+v ctrl+b [MAP KEYS TERMINATED BY char] --map of kv Separator ascii 3 Sign position ^C ctrl+v ctrl+c [LINES TERMINATED BY char] ] --Row separator default'\n' The above default values are very difficult to use [STORED AS file_format] --Format of data storage corresponding to the table TEXTFILE .txt orc [LOCATION hdfs_path] --Corresponding to table hdfs route [TBLPROPERTIES (property_name=property_value, ...)] --The attributes corresponding to the table are very useful [AS select_statement] --Create a table based on the query results. All the structures and data of the table are without separators [LIKE table_name] --Imitate the structure of a table without data but with delimiters
Management table (internal table)
Introduction: all the tables created by default are so-called management tables, sometimes referred to as internal tables. Because of these tables, Hive controls (more or less) the life cycle of the data. By default, Hive stores the data of these tables in a subdirectory of the directory defined by the configuration item Hive.metastore.warehouse.dir (for example, / user/hive/warehouse). When deleting a management table, Hive will also delete the data in the table. Management tables are not suitable for sharing data with other tools.
create table if not exists student( id int, name string ) row format delimited fields terminated by '\t' location '/user/hive/warehouse/student'; //Normal create table create table if not exists student as select id, name from student;//Create tables based on query results create table if not exists student3 like student; //Create a table based on an existing table structure desc formatted student2; //View table details drop table student2; //Delete table
External table
Introduction: because the table is an external table, Hive does not think it has this data completely. Deleting the table will not delete the data, but the metadata information describing the table will be deleted.
create external table if not exists dept( deptno int, dname string, loc int ) row format delimited fields terminated by '\t' location '/company/dept'; //Create an external table
Conversion between two tables
alter table student set tblproperties('EXTERNAL'='TRUE'); //Modify the internal table student to an external table alter table student set tblproperties('EXTERNAL'='FALSE'); //Modify external table student to internal table be careful:('EXTERNAL'='TRUE')and('EXTERNAL'='FALSE')It is fixed and case sensitive
Modify table
ALTER TABLE table_name RENAME TO new_table_name //rename alter table dept add columns(deptdesc string); //Add column in dept table alter table dept change column deptdesc desc string; //Update column alter table dept replace columns(deptno string, dname string, loc string); //Replace column
Delete table
drop table dept;
Clear table
truncate table student; //Truncate can only delete management tables, but cannot delete data in external tables
2, DML data operation
1. Data import
Load data into a table (load)
load data [local] inpath 'Data path' [overwrite] into table student [partition (partcol1=val1,...)]; (1)load data:Indicates loading data (2)local:Indicates loading data from local to hive Table; Otherwise from HDFS Load data to hive surface (3)inpath:Indicates the path to load data (4)overwrite:It means to overwrite the existing data in the table, otherwise it means to append (5)into table:Indicates which table to load (6)student:Represents a specific table (7)partition:Indicates uploading to the specified partition
example
create table student(id int, name string) row format delimited fields terminated by '\t';//Create a table load data local inpath '/opt/module/hive/datas/student.txt' into table student;//Load local files to hive dfs -put /opt/module/hive/datas/student.txt /user/student.txt //Upload files to HDFS load data inpath '/user/student.txt' into table student; //Load the HDFS file to Hive, and the file disappears after uploading load data inpath '/user/student.txt' overwrite into table student; //Loading data overwrites the existing data in the table
Insert data into a table through a query statement (insert)
insert into table student values(1,'ll'),(2,'ss'); //Basic mode insert data insert overwrite table student select id, name from student where id < 5; //Insert into: insert into a table or partition by appending data. The original data will not be deleted //insert overwrite: it will overwrite the existing data in the table //insert does not support inserting some fields, and when followed by a select statement, it should be distinguished from as select
Create tables and load data in query statements (As Select)
create table if not exists student3 as select id, name from student; //Create a table based on the results of the query
When creating a table, specify the load data path through Location
create external table if not exists student( id int, name string ) row format delimited fields terminated by '\t' location '/student';
Import data into the specified Hive table
import table student from '/user/hive/warehouse/export/student'; //Export first, and then import the data. Moreover, because the data exported by export contains metadata, the table to be imported by import cannot exist, otherwise an error will be reported.
2. Data export
//Export query results to local insert overwrite local directory '/opt/module/hive/datas/export/student' select * from student; //Format and export query results to local insert overwrite local directory '/opt/module/hive/datas/export/student' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student; //Export the query results to HDFS (no local) insert overwrite directory '/user/student' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student; Note: because it is overwrite,Therefore, the export path must be written specifically
Hadoop command export to local
dfs -get /user/hive/warehouse/student/student.txt /opt/module/hive/datas/export/student.txt;
Hive Shell command export
hive -e 'select * from default.student;' > /opt/module/hive/datas/export/student4.txt;
Export to HDFS
export table default.student to '/user/hive/warehouse/export/student'; //export and import are mainly used for Hive table migration between two Hadoop platform clusters, and cannot be exported directly