[Hive] A record of Hive commands used at work

Preliminary commands

  1. Initialize the metastore schema
schematool -dbType mysql -initSchema
  2. Start the Hive metastore and HiveServer2
nohup hive --service metastore  1>/mnt/metastore.log 2>&1 &
nohup hive --service hiveserver2 1>/mnt/hiveserver2.log 2>&1 &
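
A quick sanity check that both services came up (a sketch; assumes HiveServer2 listens on its default port 10000):

ps -ef | grep -iE 'metastore|hiveserver2'
netstat -nltp | grep 10000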

1, Connect to Hive

  1. Connect to Hive with beeline (no authentication)
beeline -u "jdbc:hive2://192.168.0.100:10000"
  2. Connect to a Kerberos-secured Hive with beeline

// 1. Authenticate with the keytab file

kinit -kt hive.keytab hive/tdh1

// 2. Connect with beeline

beeline -u "jdbc:hive2://192.168.0.100:10000/default;principal=hive/tdh1@TDH"
  3. Connect to an LDAP-secured Hive with beeline

LDAP authentication requires the -n (username) and -p (password) parameters:

beeline -u "jdbc:hive2://192.168.0.100:10000" -n hive -p 123456
  4. Specify the YARN queue for Hive jobs
set mapred.job.queue.name=root.risk;
set mapreduce.job.queuename=root.tj;
set mapred.queue.names=queue3;

Older versions generally use parameter names starting with mapred;
newer versions use names starting with mapreduce (e.g., CDH 5.13 with Hive 1.1).
Each old-style parameter has a corresponding new-style one.
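
For example, to route the current session's jobs to a specific queue (a sketch reusing the root.tj queue from above):

set mapreduce.job.queuename=root.tj;
select count(*) from student_txt;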

2, Create tables

  1. Create a table with the same structure as a source table, but a different storage format

Create a table stored as Parquet

create table student_parquet like student_txt stored as parquet;

Create a table stored as Parquet with Snappy compression

create table student_parquet like student_txt stored as parquet TBLPROPERTIES('parquet.compression'='SNAPPY');

Create a table stored as ORC

create table student_orc like student_txt stored as orc;

Create a table stored as ORC with Snappy compression

create table student_orc like student_txt stored as orc TBLPROPERTIES ("orc.compress"="SNAPPY");

Create a table stored as RCFile

create table student_rc like student_txt stored as rcfile;

Create a table stored as SequenceFile

create table student_seq like student_txt stored as sequencefile;
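
To confirm the storage format and compression that were actually applied, one option is:

show create table student_parquet;
desc formatted student_parquet;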

3, Partition tables

  1. Create a partition table

A tip: clauses ending in "ed" are generally the ones used when creating tables, e.g. partitioned by, stored as, clustered by.

create table <table_name> (name string,age int) partitioned by (year string);
  2. Create a multi-partition table (month is used here because date is a reserved word in newer Hive versions)
create table <table_name> (name string,age int) partitioned by (year string, month string);
  3. View the partitions of a table
show partitions <table_name>;
  4. Drop a partition from a table
ALTER TABLE student_txt DROP IF EXISTS PARTITION (day='2020');
  5. Add a partition to a table
ALTER TABLE student_txt add partition (day='2020');
  6. Query a partitioned table by its partition column
select * from student_parquet where day='2021';  -- day is the partition column

4, Bucketed tables

  1. Create a bucketed table

clustered by <bucket column>: a bucket column is required. Hive hashes the bucket column and takes the modulo against the bucket count to decide which bucket a row goes to.

create table student_bucket_parquet (name string, age int) partitioned by (year string) clustered by (age) into 16 buckets stored as parquet;
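
One payoff of bucketing is cheap sampling: rows land in bucket hash(age) mod 16, so a single bucket is roughly a 1/16 sample. A sketch using the table above:

select * from student_bucket_parquet tablesample(bucket 1 out of 16 on age);

On Hive 1.x you may also need set hive.enforce.bucketing=true; before inserting, so that writes honor the bucket layout.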

5, Modify hive table properties

  1. Modify the location of a table
alter table ods_lhzb_lhzb_xxgl_tszs_xlxx set location 'hdfs://inceptot1/user/hive/warehouse/ods_lhzb.db/admin/ods_lhzb_lhzb_xxgl_tszs_xlxx_test';

  2. Modify the field delimiter of a table
alter table test01 set serdeproperties('field.delim'='\t');
  3. Modify the serialization delimiter
alter table test01 set serdeproperties('serialization.format'='\t');
  4. Modify the comment of a table column
alter table student CHANGE COLUMN name name string comment 'full name';

6, Import data

  1. Append (incremental) insert
insert into student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;
  2. Overwrite insert
insert overwrite table student_score select stu.s_id,stu.s_name,sc.s_score from student stu join score sc on stu.s_id = sc.s_id;
  3. Load data from the local file system
load data local inpath 'D:/app/load_data_mysql/test.txt' into table <table_name>;
  4. Append data from HDFS
load data inpath '/app/load_data_mysql/test.txt' into table <table_name>;
  5. Overwrite with data from HDFS
load data inpath '/app/load_data_mysql/test.txt' overwrite into table <table_name>;
  6. Multi-table insert (multi-insert mode)
    Note: a multi-insert cannot write to the same partition twice, but it can write to different partitions of the same table, as below.
from student_txt 
insert overwrite table student_parquet partition(day) 
select name , min(age), min(day) group by name
insert into table student_parquet partition(day) 
select name , max(age), max(day) group by name;
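
Note that writing to partition(day) without a fixed value is a dynamic partition insert, so it needs dynamic partitioning enabled first (the same parameters summarized in section 12):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;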

7, desc

  1. View table information
desc <table_name>;
  2. View detailed table structure
    This shows numFiles, totalSize, and other information.
desc formatted <table_name>;
  3. Describe the attributes of a database
desc database <database_name>;

8, explain

View the SQL execution plan; explain is followed by the SQL statement

explain select * from student_txt;

View extended information about the execution plan

explain extended select * from student_txt;

View information about SQL data input dependencies

explain dependency select * from student_parquet;

View the permissions involved in a SQL operation

explain authorization select * from student_parquet;

9, analyze

  1. Collect statistics for a table
    Table statistics typically include the number of files (numFiles), the total file size (totalSize), the total row count (numRows), the number of partitions (numPartitions), and the raw uncompressed data size (rawDataSize).
analyze table <table_name> compute statistics;
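
Column-level statistics can be collected as well (a sketch on the student_txt table from earlier; on a partitioned table you may need to add a partition spec such as partition(day)):

analyze table student_txt compute statistics for columns;
-- per-column stats can then be inspected with:
desc formatted student_txt name;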

10, lock

List current locks

show locks;

View the lock status of a table

show locks <table_name>;

Note: on CDH 5.13 this did not work in my test; the output was the same as the plain show locks command. If anyone knows why, please share.

Unlock

Unlock a table

unlock table <table_name>;

Unlock a partition

unlock table <table_name> partition(dt='2014-04-01'); 

Note: table locks and partition locks are two different locks. Unlocking the table does not release partition locks; each partition must be unlocked separately.
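
For completeness, a sketch of taking and releasing an explicit lock (assumes hive.support.concurrency=true and a lock manager, e.g. ZooKeeper-based, is configured):

lock table student_txt exclusive;
show locks student_txt;
unlock table student_txt;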

11, Permission management

View role permissions

show grant role <role_name>;
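
The corresponding grant side, as a sketch (assumes SQL-standard based authorization is enabled; the role name here is just an example):

create role analyst;
grant select on table student_txt to role analyst;
show grant role analyst;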

12, Summary of common hive set parameters

Set the number of reducers

set mapred.reduce.tasks = 20;

Enable Hive's dynamic partitioning

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

Control the number and size of output files
hive.merge.size.per.task is the target size of each file produced by a merge job (default 256MB). The total output size of the previous job divided by this value determines the number of reducers for the merge job. The map side of the merge job is an identity map; rows are shuffled to the reducers, and each reducer writes one file, which controls both the number and the size of the output files.

Merge small files at the end of a map-reduce job:
set hive.merge.mapredfiles=true;
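
Putting the merge-related parameters together (a sketch; the values shown are the usual defaults):

set hive.merge.mapfiles=true;                -- merge small files after map-only jobs
set hive.merge.mapredfiles=true;             -- merge small files after map-reduce jobs (default false)
set hive.merge.size.per.task=256000000;      -- target size of each merged file (~256MB)
set hive.merge.smallfiles.avgsize=16000000;  -- merge only when the average output file is below this (~16MB)
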
Set container and JVM memory:
Taking a map task as an example: the container actually runs a script file that launches a Java child process, and that child process is the real map task. mapreduce.map.java.opts contains the startup arguments passed to this JVM; the default -Xmx200m is the maximum heap the Java program may use, and once it is exceeded the JVM throws an OutOfMemory error and the process terminates. mapreduce.map.memory.mb is the memory ceiling of the container, read and enforced by the NodeManager: when the container's memory (virtual and physical, governed by yarn.nodemanager.vmem-pmem-ratio) exceeds this value, the NodeManager kills the container.
In other words, mapreduce.map.java.opts must be smaller than mapreduce.map.memory.mb.
mapreduce.reduce.java.opts works the same way as mapreduce.map.java.opts.

mapreduce.map.java.opts
mapreduce.reduce.java.opts
set mapreduce.reduce.java.opts=-Xmx5000M -XX:MaxPermSize=1024m;
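
For example, a common rule of thumb is to keep the heap (-Xmx) around 80% of the container limit (illustrative values, not a recommendation for any particular cluster):

set mapreduce.map.memory.mb=4096;
set mapreduce.map.java.opts=-Xmx3276m;
set mapreduce.reduce.memory.mb=8192;
set mapreduce.reduce.java.opts=-Xmx6553m;
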
That's all for now!
