Preliminary commands
- Initialize the metastore schema
schematool -dbType mysql -initSchema
- Start the hive metastore and hiveserver2
nohup hive --service metastore 1>/mnt/metastore.log 2>&1 &
nohup hive --service hiveserver2 1>/mnt/hiveserver2.log 2>&1 &
1, Connect to hive
- beeline connects to hive without authentication
beeline -u "jdbc:hive2://192.168.0.100:10000"
- beeline connects to hive with Kerberos authentication
// 1. Authenticate with the keytab file
kinit -kt hive.keytab hive/tdh1
// 2. beeline connects to hive
beeline -u "jdbc:hive2://192.168.0.100:10000/default;principal=hive/tdh1@TDH"
- beeline connects to hive with LDAP authentication
To use LDAP authentication, add the -n (username) and -p (password) parameters
beeline -u "jdbc:hive2://192.168.0.100:10000" -n hive -p 123456
- Specify the yarn queue for hive
set mapred.job.queue.name=root.risk; set mapreduce.job.queuename=root.tj; set mapred.queue.names=queue3;
Old versions usually use parameter names starting with mapred
New versions use names starting with mapreduce (for example CDH 5.13 with Hive 1.1)
The old and new parameter names correspond one to one
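For example, for the queue parameter the old name mapred.job.queue.name corresponds to the new name mapreduce.job.queuename, so the following two settings (queue name reused from the example above) are equivalent:
set mapred.job.queue.name=root.risk;
set mapreduce.job.queuename=root.risk;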
2, Create table
- Create a table with the same structure as the source table (only the storage format differs)
Create a table stored as parquet
create table student_parquet like student_txt stored as parquet;
Create a table stored as parquet with snappy compression
create table student_parquet like student_txt stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY');
Create a table stored as orc
create table student_orc like student_txt stored as orc;
Create a table stored as orc with snappy compression
create table student_orc like student_txt stored as orc TBLPROPERTIES ("orc.compress"="SNAPPY");
Create a table stored as rcfile
create table student_rc like student_txt stored as rcfile;
Create a table stored as sequencefile
create table student_seq like student_txt stored as sequencefile;
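To double-check which storage format a table ended up with, the generated DDL can be inspected (a quick sanity check using one of the tables above):
show create table student_parquet;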
3, Partition table
- Create partition table
Here's a tip: the clauses used when creating tables generally end in "ed", such as partitioned by, stored as, clustered by, etc.
create table <table_name> (name string,age int) partitioned by (year string);
- Create a multi partition table
create table <table_name> (name string,age int) partitioned by (year string, month string);
- View partition of hive table
show partitions <table_name>;
- Delete partition of hive table
ALTER TABLE student_txt DROP IF EXISTS PARTITION (day='2020');
- Add partition of hive table
ALTER TABLE student_txt add partition (day='2020');
- Query the data of hive partition table
select * from student_parquet where day='2021';  -- day is the partition column
4, Bucketed table
- Create a bucketed table
clustered by <bucket column>: a bucket column is required; rows are assigned to buckets by hashing the bucket key and taking the modulo of the number of buckets
create table student_bucket_parquet (name string, age int) partitioned by (year string) clustered by (age) into 16 buckets stored as parquet;
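Note: on Hive versions before 2.0, bucketing is not enforced on insert by default, so the following setting is usually needed before loading data into a bucketed table (on Hive 2.x and later it is always on):
set hive.enforce.bucketing=true;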
5, Modify hive table properties
- Modify the location of hive table
alter table ods_lhzb_lhzb_xxgl_tszs_xlxx set location 'hdfs://inceptot1/user/hive/warehouse/ods_lhzb.db/admin/ods_lhzb_lhzb_xxgl_tszs_xlxx_test';
- Modify the field delimiter of a hive table
alter table test01 set serdeproperties('field.delim'='\t');
- Modify the serialization delimiter
alter table test01 set serdeproperties('serialization.format'='\t');
- Modify the comment of a table column
alter table student CHANGE COLUMN name name string comment 'full name';
6, Import data
- Incremental import data
insert into student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;
- Overwrite imported data
insert overwrite table student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;
- Import data from the local file system
load data local inpath 'D:/app/load_data_mysql/test.txt' into table <table_name>;
- Append data from the HDFS file system
load data inpath '/app/load_data_mysql/test.txt' into table <table_name>;
- Overwrite data from the HDFS file system
load data inpath '/app/load_data_mysql/test.txt' overwrite into table <table_name>;
- Multi table insert (multi insert mode)
Note: a multi insert cannot write to the same table twice, but it can write to different partitions of the same table
from student_txt
insert overwrite table student_parquet partition(day) select name, min(age), min(day) group by name
insert into table student_parquet partition(day) select name, max(age), max(day) group by name;
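Because the partition value is not specified explicitly here, the statement relies on dynamic partitioning; the two settings listed in section 12 below normally need to be enabled first:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;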
7, desc
- View table information
desc <table_name>;
- View table structure details
You can view numFiles, totalSize and other information.
desc formatted <table_name>;
- View the attribute information of a database
desc database <database_name>;
8, explain
View the sql execution plan. explain is followed by the sql statement
explain select * from student_txt;
View extended information of execution plan
explain extended select * from student_txt;
View information about SQL data input dependencies
explain dependency select * from student_parquet;
View the permission information involved in the SQL operation
explain authorization select * from student_parquet;
9, analyze
- Collect statistics for tables
Table statistics generally include the number of files in the table (numFiles), the total file size (totalSize), the total number of rows (numRows), the number of partitions (numPartitions), and the raw uncompressed data size (rawDataSize).
analyze table <table_name> compute statistics;
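Depending on the Hive version, statistics can also be collected for a single partition or at column level (the partition value is just an example; syntax support varies by version):
analyze table student_parquet partition(day='2021') compute statistics;
analyze table student_parquet compute statistics for columns;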
10, lock
- View all current locks
show locks;
- View the locks on a specific table
show locks <table_name>;
Note: this did not work in my test on CDH 5.13; the output was the same as that of the plain show locks command. If anyone knows why, please share.
- Unlock a table
unlock table <table_name>;
- Unlock a partition
unlock table <table_name> partition(dt='2014-04-01');
Note: table locks and partition locks are two different locks; unlocking the table has no effect on partition locks, and partitions must be unlocked separately
11, Authority management
- View the permissions of a role
show grant role <role_name>;
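Related commands that may also be useful here (assuming SQL standard based authorization is enabled):
show roles;
show current roles;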
12, Summary of common hive set parameters
- Set the number of reducers
set mapred.reduce.tasks = 20;
- Turn on hive's dynamic partition mode
set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode=nonstrict;
- Set the number and size of output files
hive.merge.size.per.task sets the target size of each file after a merge job. The total size of the output of the previous job divided by this value determines the number of reducers of the merge job. The map side of the merge job is essentially an identity map; the data is then shuffled to the reducers, and each reducer writes out one file. In this way both the number and the size of the output files are controlled.
hive.merge.size.per.task (default: 256MB)
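If the merged files come out too large or too small, the target size can be set explicitly (the value below simply restates the 256MB default):
set hive.merge.size.per.task=256000000;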
- Merge small files at the end of a map-reduce job
set hive.merge.mapredfiles=true;
- Set container memory size
Taking a map task as an example, the container actually executes a script file that launches a Java child process, and this child process is the real map task. mapreduce.map.java.opts is the startup parameter passed to the JVM when it is launched; its default value of -Xmx200m is the maximum heap memory the Java program may use, and once that size is exceeded the JVM throws an OutOfMemory error and the process terminates.
mapreduce.map.memory.mb sets the upper memory limit of the container. This parameter is read and enforced by the NodeManager: when the memory used by the container exceeds this value, the NodeManager kills the container. (How the NodeManager monitors container memory, both virtual and physical, and kills containers is related to the yarn.nodemanager.vmem-pmem-ratio parameter, which is analyzed separately.)
In other words, mapreduce.map.java.opts must be smaller than mapreduce.map.memory.mb.
mapreduce.reduce.java.opts works the same way as mapreduce.map.java.opts, but for the reduce side.
mapreduce.map.java.opts
mapreduce.reduce.java.opts
set mapreduce.reduce.java.opts=-Xmx5000M -XX:MaxPermSize=1024m;
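The map side follows the same rule; as an illustration only (the values below are assumptions, size them for your own cluster), keep -Xmx below the container limit:
set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m;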
That's all for now!