Hive parameter configuration and the use of functions and operators

Chapter 1 Hive parameter configuration

1.1 CLIs and command-line clients

Hive CLI

$HIVE_HOME/bin/hive is a shell utility, usually called Hive's first-generation client or old client. It has two main functions:

  • 1: Running Hive queries in interactive or batch mode. Note that when used as a client, it requires access to the Hive metastore service, not the HiveServer2 service.
  • 2: Starting Hive-related services, such as the metastore service.

You can view command line options by running "hive -H" or "hive --help".

Batch mode

When $HIVE_HOME/bin/hive is run with the -e or -f option, it executes SQL commands in batch mode. Batch mode means the commands are executed once and the client exits as soon as they finish.

#-e
$HIVE_HOME/bin/hive -e 'show databases'

#-f
cd ~
# Create hive.sql with an editor and write valid SQL statements in it, e.g.:
vim hive.sql
show databases;
# Execute the file from the local disk of the machine where the client runs
$HIVE_HOME/bin/hive -f /root/hive.sql
# SQL files can also be loaded from other file systems and executed
$HIVE_HOME/bin/hive -f hdfs://<namenode>:<port>/hive-script.sql
$HIVE_HOME/bin/hive -f s3://mys3bucket/s3-script.sql
#Use silent mode to dump data from a query to a file
$HIVE_HOME/bin/hive -S -e 'select * from itheima.student' > a.txt
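If batch runs are driven from another program, you can assemble the same -e, -f and -S options as an argument list instead of a shell string, which avoids quoting problems. The following is a minimal Python 3.8+ sketch. hive_batch_cmd and run_to_file are hypothetical helper names. The /export/server/hive path comes from the interactive example below. Actually running the command requires a working Hive installation.

```python
import shlex
import subprocess


def hive_batch_cmd(hive_home, query=None, script=None, silent=False):
    """Build the argument list for running the Hive CLI in batch mode.

    Exactly one of `query` (passed with -e) or `script` (passed with -f)
    must be given. `hive_home` plays the role of $HIVE_HOME.
    """
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    cmd = [f"{hive_home}/bin/hive"]
    if silent:
        cmd.append("-S")  # silent mode: suppress informational messages
    if query is not None:
        cmd += ["-e", query]
    else:
        cmd += ["-f", script]
    return cmd


def run_to_file(cmd, out_path):
    """Run the command and write its stdout to out_path (like `> a.txt`)."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


cmd = hive_batch_cmd("/export/server/hive",
                     query="select * from itheima.student", silent=True)
print(shlex.join(cmd))  # the equivalent shell command line
# run_to_file(cmd, "a.txt")  # needs a working Hive installation
```

Passing a list to subprocess.run means the query string reaches Hive unchanged. No shell parses the `*` or the spaces.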

Interactive shell mode

In interactive mode, the client stays connected to the Hive service until the user exits the client manually.

/export/server/hive/bin/hive

hive> show databases;
OK
default
itcast
itheima
Time taken: 0.028 seconds, Fetched: 3 row(s)

hive> use itcast;
OK
Time taken: 0.027 seconds

hive> exit;

Start service and modify configuration

When Hive is deployed in remote mode, the Hive metastore service has to be started separately and manually. The Hive CLI can start it with the --service option, and the HiveServer2 service is started the same way. The --hiveconf option passes a configuration property at startup, for example to change the log level:

#--service
$HIVE_HOME/bin/hive --service metastore
$HIVE_HOME/bin/hive --service hiveserver2

#--hiveconf
$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console

Beeline CLI

$HIVE_HOME/bin/beeline is called the second-generation client or new client. It is a JDBC client and the Hive command-line tool officially recommended by the Hive project. Compared with the first-generation client, it offers better performance and security. Beeline can work in both embedded mode and remote mode.

In embedded mode, it runs an embedded Hive (similar to the Hive CLI).

In remote mode, Beeline connects to a separate HiveServer2 service over Thrift; this is also the mode officially recommended for production environments.

A typical usage is shown below. The HiveServer2 service must already be running before Beeline can connect remotely:

[root@node3 ~]# /export/server/hive/bin/beeline
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://node1:10000
Connecting to jdbc:hive2://node1:10000
Enter username for jdbc:hive2://node1:10000: root
Enter password for jdbc:hive2://node1:10000:
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node1:10000>

Beeline supports many more options; they are described in the official documentation:
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-Beeline%E2%80%93NewCommandLineShell
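The interactive !connect session above can also be replaced by a single non-interactive call using Beeline's -u (JDBC URL), -n (user name), -p (password) and -e (query) options. The Python 3.8+ sketch below only assembles that command line. beeline_cmd is a hypothetical helper name. The host node1 and port 10000 are taken from the transcript.

```python
import shlex


def beeline_cmd(hive_home, host, port, user, query, password=None):
    """Build a non-interactive Beeline command that connects to HiveServer2."""
    url = f"jdbc:hive2://{host}:{port}"
    cmd = [f"{hive_home}/bin/beeline", "-u", url, "-n", user]
    if password is not None:
        cmd += ["-p", password]  # omit to connect without a password
    cmd += ["-e", query]  # run the query, then exit
    return cmd


print(shlex.join(beeline_cmd("/export/server/hive", "node1", 10000,
                             "root", "show databases")))
# /export/server/hive/bin/beeline -u jdbc:hive2://node1:10000 -n root -e 'show databases'
```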

1.2 Configuration Properties

1.2.1 Overview of configuration properties

Hive is a complex piece of data warehouse software. Besides its default behavior, Hive lets users modify configuration properties to meet the needs of specific scenarios.
As users, we need to master two things:

  • First, which Hive properties can be modified, and what each one controls;
  • Second, which methods Hive supports for modifying them, and whether each modification takes effect temporarily (for one session) or permanently.

The canonical list of Hive configuration properties is managed in the HiveConf Java class, so refer to the HiveConf.java file for a complete list of the configuration properties available in the Hive release you are using. Starting with Hive 0.14.0, the configuration template file hive-default.xml.template is generated directly from the HiveConf Java class, which makes it a reliable source for the current version's configuration properties and their default values.

For the complete, detailed list of configuration parameters, see the Configuration Properties page on the Hive official website and use Ctrl+F to search it:

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties

1.2.2 Methods for modifying configuration properties

Method 1: hive-site.xml configuration file

Under $HIVE_HOME/conf you can add a hive-site.xml file containing the configuration properties you want to define or modify. This file affects every service started from this Hive installation and every client mode used with it, so it can be thought of as Hive's global configuration.
For example, if MySQL is used to store Hive metadata, the properties for connecting Hive to MySQL are configured in hive-site.xml. Then, whether the services are started in local or remote mode, and whether clients connect locally or remotely, they all access the same metadata store, so everyone uses the same metadata:
<configuration>

    <!-- MySQL settings for storing Hive metadata -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
</configuration>
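hive-site.xml is ordinary XML, so a literal & in the MySQL JDBC URL has to be written as &amp;, as in the file above. The following sketch uses only the Python standard library. read_properties is a hypothetical helper and is not part of Hive. It reads the <property> entries of such a file and shows that the XML parser turns &amp; back into a plain &. The embedded XML is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the hive-site.xml example (password entry omitted).
HIVE_SITE = """<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
</configuration>"""


def read_properties(xml_text):
    """Return the <name>/<value> pairs of a Hadoop-style config file as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name")] = prop.findtext("value") or ""
    return props


props = read_properties(HIVE_SITE)
# The parser has decoded &amp; into & inside the JDBC URL.
print(props["javax.jdo.option.ConnectionURL"])
```

You can use the same function on hive-default.xml.template to list properties and their default values.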

Method 2: --hiveconf command-line parameters

--hiveconf is a command-line option for specifying configuration properties when starting the Hive CLI or Beeline CLI. Settings made this way apply for the whole session and are discarded when the session ends.
For example, to see more details when starting a Hive service, you can change the log level with --hiveconf:

$HIVE_HOME/bin/hive --hiveconf hive.root.logger=DEBUG,console
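You can repeat --hiveconf to set several properties at once, with one key=value pair per option. When a script generates the launch command, a small helper can keep the pairs in a dictionary. The following Python 3.8+ sketch does this. hiveconf_args is a hypothetical name and is not part of Hive.

```python
import shlex


def hiveconf_args(settings):
    """Turn a dict of properties into repeated --hiveconf key=value options."""
    args = []
    for key, value in settings.items():
        args += ["--hiveconf", f"{key}={value}"]
    return args


cmd = ["/export/server/hive/bin/hive"] + hiveconf_args(
    {"hive.root.logger": "DEBUG,console"})
print(shlex.join(cmd))
# /export/server/hive/bin/hive --hiveconf hive.root.logger=DEBUG,console
```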

Method 3: set command

Running the set command in the Hive CLI or Beeline sets a configuration property for all SQL statements that follow it in the session; this is also session-level.

This is the configuration method users rely on most in day-to-day development. Hive encourages the principle that whoever needs a setting configures it for their own use, so that one user's property changes do not affect other users.

# To enable Hive dynamic partitioning, set these two parameters in the Hive session:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
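Because set only lasts for the current session, a script run with -f has to carry its own set lines. The following Python sketch prepends them to a HiveQL script. with_session_settings is a hypothetical helper, and the tables t and src in the sample statement are hypothetical as well.

```python
def with_session_settings(sql, settings):
    """Prefix a HiveQL script with one `set key=value;` line per property.

    The settings then apply to every statement that follows in the same
    session, and are gone once the session ends.
    """
    lines = [f"set {key}={value};" for key, value in settings.items()]
    return "\n".join(lines + [sql])


script = with_session_settings(
    # Hypothetical tables t and src, used only for illustration.
    "insert into table t partition (dt) select id, dt from src;",
    {
        "hive.exec.dynamic.partition": "true",
        "hive.exec.dynamic.partition.mode": "nonstrict",
    },
)
print(script)
```

You can write the resulting text to a file and run it with $HIVE_HOME/bin/hive -f.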

Method 4: server-specific configuration files

You can put metastore-specific configuration values in hivemetastore-site.xml and HiveServer2-specific configuration values in hiveserver2-site.xml.

The Hive Metastore server reads the hive-site.xml and hivemetastore-site.xml configuration files available in $HIVE_CONF_DIR or on the classpath.

HiveServer2 reads the hive-site.xml and hiveserver2-site.xml files available in $HIVE_CONF_DIR or on the classpath.

If HiveServer2 uses the metastore in embedded mode, it also loads hivemetastore-site.xml.
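The following Python sketch encodes the rules above for which site files each server reads. It is a model of the text, not Hive's loading code. site_files is a hypothetical name. The files are listed from lowest to highest precedence, following the order given in the summary.

```python
def site_files(service, embedded_metastore=False):
    """Return the site files a Hive server reads, lowest precedence first.

    service is "metastore" or "hiveserver2"; embedded_metastore only
    matters for HiveServer2.
    """
    files = ["hive-site.xml"]  # read by both servers
    if service == "metastore":
        files.append("hivemetastore-site.xml")
    elif service == "hiveserver2":
        if embedded_metastore:
            files.append("hivemetastore-site.xml")
        files.append("hiveserver2-site.xml")
    else:
        raise ValueError(f"unknown service: {service}")
    return files


for svc, embedded in [("metastore", False), ("hiveserver2", False), ("hiveserver2", True)]:
    print(svc, embedded, site_files(svc, embedded))
```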

Summary

The configuration files and the command-line option take precedence in the following order, from lowest to highest:

hive-site.xml -> hivemetastore-site.xml -> hiveserver2-site.xml -> '--hiveconf' command-line option


hive-default.xml.template is located in the conf directory under the installation root directory, and hive-site.xml should be created in the same directory.

Starting with Hive 0.14.0, you can use the SHOW CONF command to display information about configuration variables.

Across the configuration methods, precedence is as follows, from highest to lowest:

set command > --hiveconf command-line option > hive-site.xml configuration file.

That is, a property declared with set overrides the --hiveconf command-line option, and the command-line option overrides the setting in hive-site.xml.

In day-to-day development, unless a core property has to be changed globally, it is recommended to use the set command.

In addition, Hive also reads the Hadoop configuration, because Hive runs as a Hadoop client; Hive configuration settings override the Hadoop configuration.
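The following Python sketch combines the precedence rules into one lookup, from lowest to highest: Hadoop configuration, hive-site.xml, hivemetastore-site.xml, hiveserver2-site.xml, --hiveconf, and set. It models the rules stated in this section and is not how Hive implements them. effective_config and the property values are hypothetical. collections.ChainMap fits this case because it searches its maps in order and returns the first match.

```python
from collections import ChainMap

# Configuration sources from lowest to highest precedence, per the text above.
PRECEDENCE = ["hadoop", "hive-site.xml", "hivemetastore-site.xml",
              "hiveserver2-site.xml", "--hiveconf", "set"]


def effective_config(sources):
    """Merge per-source property dicts; later entries in PRECEDENCE win.

    `sources` maps a name from PRECEDENCE to a dict of properties;
    missing sources count as empty.
    """
    # ChainMap searches its first map first, so pass the highest
    # precedence source (set) first.
    maps = [sources.get(name, {}) for name in reversed(PRECEDENCE)]
    return ChainMap(*maps)


conf = effective_config({
    "hive-site.xml": {"hive.exec.dynamic.partition.mode": "strict",
                      "javax.jdo.option.ConnectionUserName": "root"},
    "--hiveconf": {"hive.root.logger": "DEBUG,console"},
    "set": {"hive.exec.dynamic.partition.mode": "nonstrict"},
})
print(conf["hive.exec.dynamic.partition.mode"])     # set overrides hive-site.xml
print(conf["javax.jdo.option.ConnectionUserName"])  # only defined in hive-site.xml
```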


Added by hukadeeze on Wed, 12 Jan 2022 11:53:00 +0200