Big data Hive parameter configuration

1 Hive clients and commands

1.1 Hive CLI

$HIVE_HOME/bin/hive is a shell utility, usually called Hive's first-generation client or old client. It has two main functions:
1: It runs Hive queries in interactive or batch mode. Note that as a client it needs an accessible Hive metastore service, not the hiveserver2 service.
2: It starts Hive-related services, such as the metastore service.
You can view the command-line options by running "hive -H" or "hive --help".

-e <quoted-query-string>       Execute the SQL statement given after -e, then exit
-f <filename>                  Execute the SQL file given after -f, then exit
-H,--help                      Print help information
   --hiveconf <property=value> Set a configuration property
-S,--silent                    Silent mode
-v,--verbose                   Verbose mode; echoes the executed SQL to the console
   --service service_name      Start a Hive-related service

The sections below focus on the -e, -f, --hiveconf, and --service options.

1.1.1 Batch Mode

When $HIVE_HOME/bin/hive is run with the -e or -f option, it executes SQL commands in batch mode. Batch mode means the statements are executed once and the client exits when they finish.

# -e: execute a quoted SQL statement, then exit
$HIVE_HOME/bin/hive -e 'show databases'

# -f: execute a SQL file, then exit
cd ~

# Create a SQL file containing valid SQL statements, for example:
#   show databases;
vim hive.sql

# Run the file from the local disk of the machine where the client runs
$HIVE_HOME/bin/hive -f /root/hive.sql
# SQL files can also be loaded from other file systems
$HIVE_HOME/bin/hive -f hdfs://<namenode>:<port>/hive-script.sql
$HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sql
# Use silent mode to dump the result of a query to a file
$HIVE_HOME/bin/hive -S -e 'select * from itheima.student' > a.txt
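
When batch runs are driven from a script, it can help to assemble the command line programmatically. The following is a minimal Python sketch, not part of Hive itself: the helper name hive_batch_command and the fallback install path are assumptions made for illustration, and only the -e, -f, and -S options described above are used.

```python
import os
import shlex


def hive_batch_command(query=None, script=None, silent=False, hive_home=None):
    """Build the argv for a one-shot Hive CLI run (hypothetical helper).

    Exactly one of `query` (for -e) or `script` (for -f) must be given.
    """
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    # Assumption: fall back to the install path used in this article.
    home = hive_home or os.environ.get("HIVE_HOME", "/export/server/hive")
    argv = [f"{home}/bin/hive"]
    if silent:
        argv.append("-S")
    argv += ["-e", query] if query is not None else ["-f", script]
    return argv


cmd = hive_batch_command(query="select * from itheima.student", silent=True)
# shlex.join quotes the query so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run on a machine where Hive is installed; passing a list rather than a single string avoids shell quoting problems in the SQL text.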

1.1.2 Interactive Shell Mode

In interactive mode, the client stays connected to the Hive service until the user exits the client manually.

/export/server/hive/bin/hive

hive> show databases;
OK
default
itcast
itheima
Time taken: 0.028 seconds, Fetched: 3 row(s)

hive> use itcast;
OK
Time taken: 0.027 seconds

hive> exit;

1.1.3 Start services and modify configuration

In a remote-mode deployment, the Hive metastore service must be started manually as a separate process. The Hive CLI can be used to start it, and the hiveserver2 service is started in the same way.

# --service: start a Hive-related service
$HIVE_HOME/bin/hive --service metastore
$HIVE_HOME/bin/hive --service hiveserver2

# --hiveconf: set a configuration property for this run
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console

1.2 Beeline CLI

$HIVE_HOME/bin/beeline is called the second-generation client or new client. It is a JDBC client and the command-line tool that the Hive project officially recommends. Compared with the first-generation client, it offers better performance and security. Beeline works in both embedded and remote modes.
In embedded mode, it runs an embedded Hive (similar to the Hive CLI).
In remote mode, Beeline connects to a separate HiveServer2 service through Thrift. This is also the mode officially recommended for production environments.
Common usage is shown below. The hiveserver2 service must already be running before Beeline connects remotely:

[root@node3 ~]# /export/server/hive/bin/beeline 
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://node1:10000
Connecting to jdbc:hive2://node1:10000
Enter username for jdbc:hive2://node1:10000: root
Enter password for jdbc:hive2://node1:10000: 
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node1:10000>

Beeline supports many options, which are listed in the official documentation:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline%E2%80%93NewCommandLineShell
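
Beeline can also be run without the interactive !connect step by passing the connection details as options. The sketch below builds such a command line in Python. The helper name beeline_command is hypothetical, the default binary path and port are taken from the session shown above, and the -u (JDBC URL), -n (user name), and -e (query) options are the ones documented on the page linked above.

```python
def beeline_command(host, port=10000, user=None, query=None,
                    beeline="/export/server/hive/bin/beeline"):
    """Build a non-interactive Beeline argv (hypothetical helper)."""
    # -u takes the same JDBC URL that !connect takes interactively.
    argv = [beeline, "-u", f"jdbc:hive2://{host}:{port}"]
    if user:
        argv += ["-n", user]
    if query:
        argv += ["-e", query]
    return argv


print(beeline_command("node1", user="root", query="show databases;"))
```

As with the interactive session, this only works if the hiveserver2 service is already running on the target host.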

2 Configuration Properties

2.1 Overview of configuration properties

Hive is a complex piece of data warehouse software. Beyond its default behavior, Hive lets users modify configuration properties to meet the needs of particular scenarios.
As users, we need to understand two things:
First, which Hive properties can be modified and what each one controls;
Second, which methods Hive supports for modifying them, and whether a change takes effect temporarily or permanently.
The canonical list of Hive configuration properties is maintained in the HiveConf.java class. Refer to that file for the complete list of properties available in the Hive release you are using. Starting from Hive 0.14.0, the configuration template file hive-default.xml.template is generated directly from the HiveConf.java class, so it is a reliable source for the configuration properties of the current version and their default values.
For detailed descriptions of the configuration parameters, see the Configuration Properties page on Hive's official wiki and use Ctrl+F to search within the page.
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

2.2 Methods for modifying configuration properties

2.2.1 Method 1: hive-site.xml configuration file

Under $HIVE_HOME/conf, you can add a hive-site.xml file and define in it the configuration properties you need to set or override. This file affects every service started from, and every client run from, this Hive installation, so it can be regarded as Hive's global configuration.
For example, to use MySQL as the storage for Hive metadata, the properties for connecting Hive to MySQL are configured in hive-site.xml. The same metadata store is then used whether Hive runs in local or remote mode and whether clients connect locally or remotely, so everyone works with the same metadata.

<configuration>
    <!-- MySQL settings for storing Hive metadata -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
</configuration>
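
To check what a hive-site.xml file actually defines, the name/value pairs can be read with any XML parser. The following Python sketch uses only the standard library. The function name load_properties is an assumption for illustration, and the embedded XML is a shortened copy of the example above. It also shows why the & characters in the JDBC URL must be written as &amp;amp; in the file: the parser turns them back into plain & characters.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the hive-site.xml example, for illustration only.
HIVE_SITE = """<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
</configuration>"""


def load_properties(xml_text):
    """Return the <property> entries of a Hadoop-style config as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            # Strip stray whitespace around names and values.
            props[name.strip()] = value.strip()
    return props


for key, val in load_properties(HIVE_SITE).items():
    print(f"{key} = {val}")
```

This sketch only inspects a file. It does not reproduce how Hive itself merges several configuration sources.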

2.2.2 Method 2: --hiveconf command-line parameter

--hiveconf is a command-line option for specifying configuration properties when starting the Hive CLI or Beeline. A property set this way stays in effect for the whole session and is discarded when the session ends.
For example, to see more detail while starting a Hive service, you can change the log level through the --hiveconf option:
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console
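
When several properties need to be passed at once, the option is simply repeated, once per property. A small Python sketch of that expansion follows. The helper name hiveconf_args is hypothetical, and the property values are only examples taken from this article.

```python
def hiveconf_args(properties):
    """Expand a dict into repeated --hiveconf name=value arguments."""
    args = []
    for name, value in properties.items():
        args += ["--hiveconf", f"{name}={value}"]
    return args


argv = ["hive"] + hiveconf_args({
    "hive.root.logger": "DEBUG,console",
    "hive.exec.dynamic.partition": "true",
})
print(argv)
```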

2.2.3 Method 3: set command

Running the set command in the Hive CLI or Beeline sets a configuration property for every SQL statement that follows it. This is also a session-level setting.
It is the method users rely on most in daily development. Hive encourages the principle that whoever needs a setting configures it in their own session, so that one user's changes do not affect other users.
For example, to enable Hive dynamic partitioning, set two parameters in the Hive session:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

2.2.4 Method 4: server-specific configuration files

You can set metastore-specific configuration values in hivemetastore-site.xml and HiveServer2-specific configuration values in hiveserver2-site.xml.
The Hive Metastore server reads the hive-site.xml and hivemetastore-site.xml configuration files available in $HIVE_CONF_DIR or on the classpath.
HiveServer2 reads the hive-site.xml and hiveserver2-site.xml files available in $HIVE_CONF_DIR or on the classpath.
If HiveServer2 uses the metastore in embedded mode, hivemetastore-site.xml is loaded as well.
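
The rules above can be restated as a small lookup. This Python sketch is only a model of the description in this section, not Hive's actual loading code. The function name config_files_for and the process labels "metastore" and "hiveserver2" are assumptions chosen for illustration.

```python
def config_files_for(process, embedded_metastore=False):
    """List the config files a Hive server process reads, per the rules above."""
    files = ["hive-site.xml"]
    if process == "metastore":
        files.append("hivemetastore-site.xml")
    elif process == "hiveserver2":
        if embedded_metastore:
            # HiveServer2 with an embedded metastore also loads this file.
            files.append("hivemetastore-site.xml")
        files.append("hiveserver2-site.xml")
    else:
        raise ValueError(f"unknown process: {process}")
    return files


print(config_files_for("metastore"))
print(config_files_for("hiveserver2", embedded_metastore=True))
```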

2.2.5 Summary

The configuration files are applied in the following order, and later entries take precedence over earlier ones:
hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> '--hiveconf' command-line parameters
Starting from Hive 0.14.0, the configuration template file hive-default.xml.template is generated directly from the HiveConf.java class, which makes it a reliable source for the configuration variables and their default values in the current version.
hive-default.xml.template is located in the conf directory under the installation root, and hive-site.xml should be created in the same directory.
Starting with Hive 0.14.0, you can also use the SHOW CONF command to display information about a configuration variable.
The configuration methods rank as follows, from highest to lowest priority:
set command > --hiveconf command-line parameter > hive-site.xml configuration file
That is, a value set with the set command overrides the --hiveconf command-line parameter, which in turn overrides the setting in hive-site.xml.
In daily development, unless a core property must be changed globally, it is recommended to use the set command.
In addition, Hive reads the Hadoop configuration, because Hive starts as a Hadoop client, and Hive's configuration overrides the Hadoop configuration.
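
The layering described in this summary can be modelled with Python's collections.ChainMap, which looks a key up in the first mapping that contains it. This is a sketch of the precedence rules only. The function name effective_config and all sample values are illustrative assumptions, not Hive defaults.

```python
from collections import ChainMap


def effective_config(hadoop=None, hive_site=None, hiveconf=None, session_set=None):
    """Model Hive's precedence: set > --hiveconf > hive-site.xml > Hadoop."""
    # ChainMap searches left to right, so the highest-priority layer comes first.
    return ChainMap(session_set or {}, hiveconf or {}, hive_site or {}, hadoop or {})


cfg = effective_config(
    hadoop={"fs.defaultFS": "hdfs://node1:8020"},            # illustrative value
    hive_site={"hive.exec.dynamic.partition.mode": "strict",
               "hive.root.logger": "INFO,console"},           # illustrative values
    hiveconf={"hive.root.logger": "DEBUG,console"},
    session_set={"hive.exec.dynamic.partition.mode": "nonstrict"},
)

print(cfg["hive.root.logger"])                  # taken from the --hiveconf layer
print(cfg["hive.exec.dynamic.partition.mode"])  # taken from the set layer
print(cfg["fs.defaultFS"])                      # falls through to the Hadoop layer
```

A property that appears in only one layer is returned unchanged, which matches the idea that higher layers override only the keys they actually define.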

Keywords: hive SQL

Added by Kodak07 on Tue, 21 Sep 2021 14:17:31 +0300
