What is Hive?
- Open-sourced by Facebook to compute statistics over massive volumes of structured logs;
- A data warehouse tool built on Hadoop: it maps structured data files stored in HDFS to tables and provides SQL-like queries, with MapReduce (MR) doing the computation underneath;
- In essence, it translates HQL into MapReduce programs.
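One way to see the HQL-to-MapReduce translation for yourself is to ask Hive for a query plan with EXPLAIN. The sketch below is only an illustration: the table name access_logs and its level column are hypothetical, and the query is only sent to Hive when the hive command is already on your PATH.

```shell
#!/usr/bin/env bash
# Write a small HQL file. "access_logs" and "level" are hypothetical names;
# replace them with a table and column that exist in your warehouse.
hql_file="$(mktemp)"
cat > "$hql_file" <<'EOF'
EXPLAIN
SELECT level, COUNT(*) FROM access_logs GROUP BY level;
EOF

if command -v hive >/dev/null 2>&1; then
  # EXPLAIN prints the stages Hive plans for the query instead of running it.
  hive -f "$hql_file"
else
  echo "hive not found on PATH; the query is saved in $hql_file"
fi
```

With the MapReduce engine, the plan for an aggregation like this should list a map-reduce stage, which is the MR program Hive generates from the HQL.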
Preparation
- Java 1.7 or above, which Hive 1.2 and later require (mine is JDK 1.8)
- Hadoop 2.0 or above (mine is 2.8.4)
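The prerequisites can be checked before installing anything. This is a minimal sketch, assuming GNU sort (for the -V version ordering) and that java and hadoop are on PATH; version_ge is a helper name made up for this example.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (needs GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_java="1.7"     # Hive 1.2.x needs Java 1.7 or newer
min_hadoop="2.0"   # minimum Hadoop version from the list above

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, e.g.: java version "1.8.0_171"
  java_ver="$(java -version 2>&1 | awk -F'"' '/version/ && !seen {print $2; seen=1}')"
  version_ge "$java_ver" "$min_java" && echo "Java $java_ver OK" || echo "Java '$java_ver' is too old or unreadable"
else
  echo "java not found on PATH"
fi

if command -v hadoop >/dev/null 2>&1; then
  # The first line of `hadoop version` looks like: Hadoop 2.8.4
  hadoop_ver="$(hadoop version 2>/dev/null | awk 'NR==1 {print $2}')"
  version_ge "$hadoop_ver" "$min_hadoop" && echo "Hadoop $hadoop_ver OK" || echo "Hadoop '$hadoop_ver' is too old or unreadable"
else
  echo "hadoop not found on PATH"
fi
```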
Installation process
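# 1. Download and unpack the installation package
cd /usr/local
wget http://archive.apache.org/dist/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz
tar -zxvf apache-hive-1.2.1-bin.tar.gz
# Rename the unpacked directory so it matches the HIVE_HOME used below
mv apache-hive-1.2.1-bin hive-1.2.1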
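The download URL and the archive name passed to tar have to name the same release. A small sketch that derives both from one version variable keeps them in sync; HIVE_VERSION, HIVE_TARBALL, HIVE_URL, INSTALL_DIR and the RUN switch are names chosen for this example, not anything Hive defines.

```shell
#!/usr/bin/env bash
# One version variable drives both the URL and the tarball name.
HIVE_VERSION="1.2.1"
HIVE_TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
HIVE_URL="http://archive.apache.org/dist/hive/hive-${HIVE_VERSION}/${HIVE_TARBALL}"
INSTALL_DIR="/usr/local"   # same location as in the step above

echo "Download from: ${HIVE_URL}"
echo "Unpack:        ${INSTALL_DIR}/${HIVE_TARBALL}"

# Dry run by default; set RUN=1 to actually download and unpack.
if [ "${RUN:-0}" = "1" ]; then
  (
    cd "${INSTALL_DIR}" &&
    wget "${HIVE_URL}" &&
    tar -zxvf "${HIVE_TARBALL}"
  )
fi
```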
# 2. Configure environment variables
vi /etc/profile
# Add content:
# Hive
export HIVE_HOME=/usr/local/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
# Save the file, then reload it:
source /etc/profile
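After reloading /etc/profile it is worth confirming that the variables took effect in the current shell. A minimal sketch; path_contains is a helper name invented for this example.

```shell
#!/usr/bin/env bash
# path_contains DIR: succeeds when DIR is one of the colon-separated PATH entries.
path_contains() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if [ -z "${HIVE_HOME:-}" ]; then
  echo "HIVE_HOME is not set; did you run 'source /etc/profile'?"
elif path_contains "${HIVE_HOME}/bin"; then
  echo "HIVE_HOME=${HIVE_HOME} and its bin directory is on PATH"
else
  echo "HIVE_HOME is set, but ${HIVE_HOME}/bin is missing from PATH"
fi
```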
# 3. Configuration file: conf/hive-env.sh
cd /usr/local/hive-1.2.1/conf
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
# Add content:
# Hadoop && Hive
HADOOP_HOME=/usr/local/hadoop-2.8.4
export HIVE_CONF_DIR=/usr/local/hive-1.2.1/conf
# 4. Configure hive-site.xml
[root@master conf]# cp hive-default.xml.template hive-default.xml
[root@master conf]# vi hive-site.xml
## Add content (hive-site.xml is a new file):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
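If you prefer not to type the XML by hand, the same four metastore properties can be generated from shell variables. This is a sketch under assumptions: write_hive_site and the DB_* variables are names made up for this example, the values must not contain XML special characters such as & or <, and the file is written to a temporary directory so nothing is overwritten until you copy it into conf yourself.

```shell
#!/usr/bin/env bash
# write_hive_site DIR: writes DIR/hive-site.xml with the metastore connection settings.
write_hive_site() {
  local dir="$1"
  cat > "${dir}/hive-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://${DB_HOST}:${DB_PORT}/${DB_NAME}?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${DB_USER}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${DB_PASS}</value>
  </property>
</configuration>
EOF
}

# Same values as in the hand-written file above.
DB_HOST="localhost"; DB_PORT="3306"; DB_NAME="hive"
DB_USER="hive";      DB_PASS="hive"

out_dir="$(mktemp -d)"
write_hive_site "$out_dir"
echo "Generated ${out_dir}/hive-site.xml; review it, then copy it into the Hive conf directory."
```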
Install and configure MySQL
Here we use a MySQL database to store the Hive metadata, rather than the embedded Derby database that ships with Hive.
1. To install MySQL on CentOS 7, see: centos7 install MySQL
## Install MySQL
cd /usr/local
wget http://dev.mysql.com/get/Downloads/MySQL-5.5/MySQL-5.5.48-1.linux2.6.x86_64.rpm-bundle.tar
# The bundle is a plain .tar (not gzipped), so extract it without -z
tar -xvf MySQL-5.5.48-1.linux2.6.x86_64.rpm-bundle.tar
yum install perl
rpm -ivh MySQL-server-5.5.48-1.linux2.6.x86_64.rpm
rpm -ivh MySQL-client-5.5.48-1.linux2.6.x86_64.rpm
# Replace [Original database] with the name of the preinstalled database package to remove
rpm -e [Original database] --nodeps
service mysql start
/usr/bin/mysql_secure_installation
The MySQL installation did not succeed for me, so I replaced it with MariaDB. Hive currently starts without problems; I do not yet know whether issues will show up elsewhere.
## Install MariaDB
## The default CentOS 7 repositories have dropped Oracle's MySQL in favor of MariaDB, a fork of MySQL. Install MariaDB:
yum install mariadb
## Then run systemctl start mariadb. It reports:
Failed to start mariadb.service: Unit mariadb.service failed to load: No such file or directory
## The mariadb service cannot be found because the MariaDB installation itself is incomplete. Run the following command to list the MariaDB packages:
$ sudo yum search mariadb
## Install the missing packages:
$ yum install mariadb-embedded mariadb-libs mariadb-bench mariadb mariadb-server
## Then start MariaDB (systemctl start mariadb). To have MariaDB start automatically at boot, run:
$ systemctl enable mariadb
## Remember that when installing with yum install you need to append the wildcard "*":
yum install mariadb*
As for the relationship between MySQL and MariaDB, that is a story of its own: Why did CentOS 7 abandon MySQL and use MariaDB instead?
2. Download the MySQL JDBC driver package: mysql-connector-java-5.1.46.tar.gz
cd /usr/local
tar -zxvf mysql-connector-java-5.1.46.tar.gz
cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar /usr/local/hive-1.2.1/lib
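Hive can only reach the MySQL metastore if the driver jar really sits in its lib directory, so a quick check after the copy can save a confusing startup error. A minimal sketch; find_connector_jar is a helper name invented here, and the fallback path assumes the install location used in this article.

```shell
#!/usr/bin/env bash
# find_connector_jar DIR: prints the first mysql-connector-java jar found in DIR.
find_connector_jar() {
  local f
  for f in "$1"/mysql-connector-java-*.jar; do
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done
  return 1
}

lib_dir="${HIVE_HOME:-/usr/local/hive-1.2.1}/lib"
if jar="$(find_connector_jar "$lib_dir")"; then
  echo "Found JDBC driver: $jar"
else
  echo "No mysql-connector-java jar in $lib_dir"
fi
```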
3. Start and log in to the MySQL shell:
mysql -uroot -p
4. Create a new hive database:
create database hive;
5. Configure MySQL to allow hive access:
grant all on *.* to hive@localhost identified by 'hive';
flush privileges;
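The grant above gives the hive account rights on every database. As a narrower alternative (a suggestion, not what this article used), the statements can be collected in a file and limited to the hive database. The sketch only writes and prints the file; the mysql call is left commented out so nothing changes until you enable it. The GRANT ... IDENTIFIED BY form matches the MySQL 5.5 / MariaDB generation used here.

```shell
#!/usr/bin/env bash
# Collect the metastore setup statements in one file.
sql_file="$(mktemp)"
cat > "$sql_file" <<'EOF'
CREATE DATABASE IF NOT EXISTS hive;
-- Narrower than "grant all on *.*": only the hive database is covered.
GRANT ALL ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
EOF

echo "Statements to run:"
cat "$sql_file"

# Uncomment to apply them as root (you will be prompted for the password):
# mysql -uroot -p < "$sql_file"
```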
6. Start Hive
start-dfs.sh
start-yarn.sh
hive
Before you start Hive, start the Hadoop cluster.
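To confirm the cluster is actually up before launching Hive, you can inspect the output of jps, the JDK tool that lists running Java processes. The sketch below assumes a single-node setup where all four daemons run on the same machine; on a multi-node cluster the DataNode and NodeManager processes normally live on the worker nodes, so adjust the list. missing_daemons is a helper name invented for this example.

```shell
#!/usr/bin/env bash
# missing_daemons: reads `jps` output on stdin and prints each required
# Hadoop daemon that does not appear in it (one name per line).
missing_daemons() {
  local jps_out d
  jps_out="$(cat)"
  for d in NameNode DataNode ResourceManager NodeManager; do
    # -w matches whole words only, so "SecondaryNameNode" does not count as "NameNode".
    printf '%s\n' "$jps_out" | grep -qw "$d" || printf '%s\n' "$d"
  done
}

if command -v jps >/dev/null 2>&1; then
  missing="$(jps | missing_daemons)"
  if [ -z "$missing" ]; then
    echo "Hadoop daemons are up; you can now run: hive"
  else
    echo "Not running yet:"
    printf '%s\n' "$missing"
  fi
else
  echo "jps not found on PATH (it ships with the JDK)"
fi
```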
References:
https://www.zhihu.com/question/41832866