Installation and Configuration of Hive

To explore what Hive can really do, we set out on the road of learning it; but before judging the strengths and weaknesses of this tool, we first have to install Hive...

We use MySQL to store Hive's metadata (the Metastore), so MySQL is installed first. The installation and configuration steps are as follows:

The whole process is divided into seven parts:

1. Install MySQL

2. Install Hive

3. Configure the Hive metadata (Metastore) in MySQL

4. Hadoop cluster configuration

5. Hive data warehouse location configuration

6. Post-query information display configuration (configure it or not, as you prefer)

7. Hive log file configuration

----------------------------------------------------- OK, let's get started -----------------------------------------------------

>>>>>>> 1. Install MySQL

Step 1: Check whether MySQL is already installed; if it is, uninstall the existing MySQL first

# yum list installed | grep mysql        --- check

# yum -y remove mysql-libs.x86_64        --- uninstall

Step 2: Download the repository package and install

# rpm -Uvh http://repo.mysql.com/mysql-community-release-el6-5.noarch.rpm        (repository package)

#yum install mysql-community-server -y

or, alternatively:

# yum -y install mysql mysql-server mysql-devel

# wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm

# rpm -ivh mysql-community-release-el7-5.noarch.rpm

# yum -y install mysql-community-server

Step 3: Start MySQL

#service mysqld start

Step 4: Set the root user login password

# mysqladmin -uroot password 'rootroot'

Step 5: Log on to mysql

# mysql -uroot -prootroot

Step 6: After logging in, grant access permissions so that other machines in the cluster can connect

mysql> grant all privileges on *.* to root@'%' identified by 'rootroot';

mysql> exit;        --- exit MySQL
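As a quick sanity check (optional), you can try connecting from another node in the cluster, assuming the mysql client is installed there; hadoop011 and hadoop015 are the hostnames used elsewhere in this article, and 'rootroot' is the password set above:

[root@hadoop011 ~]# mysql -h hadoop015 -uroot -prootroot -e "show databases;"

If the grant was successful, this prints the database list instead of an access-denied error.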

At this point, MySQL has been installed.

>>>>>>> 2. Installing Hive

Step 1: Download the Hive installation package apache-hive-1.2.1-bin.tar.gz and upload it to the virtual machine (hadoop011)

Press Alt+p to enter the sftp interface

sftp> put G:/Hive/apache-hive-1.2.1-bin.tar.gz

Then go to the home directory (# cd ~), find the file, and move the package to the target path

# mv apache-hive-1.2.1-bin.tar.gz /opt/soft

Step 2: Unzip the installation package to /opt/app/

# tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /opt/app/

Enter /opt/app/ and rename the extracted directory

# cd /opt/app

#mv apache-hive-1.2.1-bin apache-hive-1.2.1
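Optionally (a small convenience step, not required by the rest of this article), you can add Hive to the PATH so the hive command works from any directory; the paths below match the ones used above:

# vim /etc/profile
export HIVE_HOME=/opt/app/apache-hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
# source /etc/profile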

Step 3: Configure hive-env.sh

Enter /opt/app/apache-hive-1.2.1/conf, find the file hive-env.sh.template, and rename it

[root@hadoop011 conf]# mv hive-env.sh.template hive-env.sh

[root@hadoop011 conf]# vim hive-env.sh

Open hive-env.sh and add the HADOOP_HOME and HIVE_CONF_DIR paths:

export HADOOP_HOME=/opt/app/hadoop-2.7.2

export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi
# The heap size of the jvm stared by hive shell script can be controlled via:
# export HADOOP_HEAPSIZE=1024
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
export HADOOP_HOME=/opt/app/hadoop-2.7.2
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/opt/app/apache-hive-1.2.1/conf
# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
"hive-env.sh" 54L, 2407C Written

At this point you could already start Hive from /opt/app/apache-hive-1.2.1/bin, but the overall setup is not finished yet, so let's continue.

>>>>>>> 3. Hive metadata (Metastore) is configured to MySQL

Step 1: Upload the downloaded MySQL driver jar from the local machine to the server

Alt+p:

sftp> put G:/Hive/mysql-connector-java-5.1.37-bin.jar

Uploading mysql-connector-java-5.1.37-bin.jar to /root/mysql-connector-java-5.1.37-bin.jar

  100% 962KB    962KB/s 00:00:00    

G:/Hive/mysql-connector-java-5.1.37-bin.jar: 985603 bytes transferred in 0 seconds (962 KB/s)

sftp>

Move (or copy) mysql-connector-java-5.1.37-bin.jar to /opt/app/apache-hive-1.2.1/lib/

[root@hadoop011 ~]# mv mysql-connector-java-5.1.37-bin.jar /opt/app/apache-hive-1.2.1/lib/
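A quick check (optional) that the driver jar really ended up in Hive's lib directory; it should list mysql-connector-java-5.1.37-bin.jar:

[root@hadoop011 ~]# ls /opt/app/apache-hive-1.2.1/lib/ | grep mysql-connector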

Step 2: Configure the Metastore to use MySQL

Create hive-site.xml under /opt/app/apache-hive-1.2.1/conf/.

[root@hadoop011 ~]# cd /opt/app/apache-hive-1.2.1/conf

[root@hadoop011 conf]# touch hive-site.xml

Put the configuration below (based on https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin) into the hive-site.xml file.

[root@hadoop011 conf]# vim hive-site.xml

Configuration information:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
	<property>
	  <name>javax.jdo.option.ConnectionURL</name>
	  <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
	  <description>JDBC connect string for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionDriverName</name>
	  <value>com.mysql.jdbc.Driver</value>
	  <description>Driver class name for a JDBC metastore</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionUserName</name>
	  <value>root</value>
	  <description>username to use against metastore database</description>
	</property>

	<property>
	  <name>javax.jdo.option.ConnectionPassword</name>
	  <value>000000</value>
	  <description>password to use against metastore database</description>
	</property>
</configuration>

Two of the values need to be modified, as follows:

a. My MySQL is on hadoop015, so the host in the connection URL changes:

<value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>

-> <value>jdbc:mysql://hadoop015:3306/metastore?createDatabaseIfNotExist=true</value>

b. The MySQL password was set to 'rootroot' earlier, so the password value needs to be changed here:

<property>
	<name>javax.jdo.option.ConnectionPassword</name>
	<value>rootroot</value>
	<description>password to use against metastore database</description>
</property>

Step 3: The configuration is complete; reboot and verify

#reboot

#service mysqld start

#mysql -uroot -prootroot

mysql> show databases;

At this point, you can find the database metastore, indicating that the configuration was successful
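If you want to dig a little deeper (optional), you can also look at the tables Hive created inside the metastore database once Hive has been started at least once; the exact table list depends on the Hive version:

mysql> use metastore;
mysql> show tables;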

Of course, the cluster and Hive can also be started now.

Think it's over? No, no, no, there are still a few more configurations to go...

>>>>>>> 4. Hadoop cluster configuration

step 1: Start the cluster

[root@hadoop011 ~]# start-dfs.sh

[root@hadoop012 ~]# start-yarn.sh

Step 2: Create the directory /user/hive/warehouse on HDFS (you can choose your own path) as the Hive data warehouse

[root@hadoop011 conf]# hadoop fs -mkdir -p /user/hive/warehouse

Step 3: Modify permissions

[root@hadoop011 conf]# hadoop fs -chmod g+w /user/hive/warehouse
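You can confirm the directory and its group-write permission with (optional):

[root@hadoop011 conf]# hadoop fs -ls /user/hive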

 

>>>>>>> 5. Data Warehouse Location Configuration

Modify the default data warehouse location by copying the following property from hive-default.xml.template (under /opt/app/apache-hive-1.2.1/conf) into hive-site.xml:

<property>
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive/warehouse</value>
	<description>location of default database for the warehouse</description>
</property>
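To see the effect, you can create a database in Hive and check where it lands on HDFS; the database name test_db below is purely illustrative:

hive> create database test_db;
[root@hadoop011 ~]# hadoop fs -ls /user/hive/warehouse

A new directory /user/hive/warehouse/test_db.db should appear under the configured warehouse path.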

>>>>>>> 6. Configuration of Information Display after Query

To display the current database name and the column headers of query results, add the following configuration to hive-site.xml:

<property>
	<name>hive.cli.print.header</name>
	<value>true</value>
</property>

<property>
	<name>hive.cli.print.current.db</name>
	<value>true</value>
</property>
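After these two options take effect, the CLI prompt shows the current database and query results are printed with column headers. Roughly, it looks like this (the table emp is hypothetical):

hive (default)> select * from emp;

The first output line contains the column names, followed by the data rows.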

>>>>>>> 7. Hive log file configuration

Locate the log configuration file:

#cd /opt/app/apache-hive-1.2.1/conf 

Find the template file hive-log4j.properties.template and rename it to hive-log4j.properties.

[root@hadoop011 conf]# mv hive-log4j.properties.template hive-log4j.properties

[root@hadoop011 conf]# pwd

/opt/app/apache-hive-1.2.1/conf

Open the file and modify the log storage path

[root@hadoop011 conf]# vim hive-log4j.properties

Contents of hive-log4j.properties (after the change):

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Define some default values that can be overridden by system properties
hive.log.threshold=ALL
hive.root.logger=INFO,DRFA
hive.log.dir=/opt/app/apache-hive-1.2.1/logs
hive.log.file=hive.log

Original value: hive.log.dir=${java.io.tmpdir}/${user.name}

Modified value: hive.log.dir=/opt/app/apache-hive-1.2.1/logs
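Once Hive has run and written something to the new location, you can follow the log with (the logs directory is created when Hive first writes to it):

# tail -f /opt/app/apache-hive-1.2.1/logs/hive.log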

---------------------------------

OK, at this point the basic setup is done. Of course, you can configure other things as needed, such as additional parameters.

I think these are enough for my current study.

So, finally, start Hive

# reboot

[root@hadoop011 ~]# start-dfs.sh

[root@hadoop012 ~]# start-yarn.sh

Enter the directory /opt/app/apache-hive-1.2.1/bin

[root@hadoop011 bin]# ./hive
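If everything is in order, a few simple statements will confirm that the installation works; the table student below is purely illustrative:

hive> show databases;
hive> create table student(id int, name string);
hive> show tables;
hive> desc student;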

---------------------------------

 

 
