CDH 6.1: Upgrading Impala to Version 3.4 to Enable Automatic Metadata Refresh, with Problems and Solutions

On CDH 6.1 we upgraded Impala and enabled its automatic metadata refresh feature. We ran into several problems along the way and eventually solved them by checking logs, reading source code, and searching Google. This article organizes what we learned, as a way of giving back to the community.

The main reference documents are:

[1] Upgrading Impala to Apache Impala 3.4 separately on CDH 6.3

[2] 0757-6.3.3 - How to configure Impala to automatically synchronize HMS metadata

1. Impala reports that hadoop-lzo cannot be found during compilation

Cloudera has deleted the hadoop-lzo repository. Checking the build script bin/bootstrap_system.sh, we found the following comment:

#LZO is not needed to compile or run Impala, but it is needed for the data load

Since hadoop-lzo is not used in our environment, we commented out the following part of the script and ran it again, after which the build completed normally:

echo ">>> Checking out Impala-lzo"
: ${IMPALA_LZO_HOME:="${IMPALA_HOME}/../Impala-lzo"}
if ! [[ -d "$IMPALA_LZO_HOME" ]]
then
  git clone --branch master https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
fi

echo ">>> Checking out and building hadoop-lzo"

: ${HADOOP_LZO_HOME:="${IMPALA_HOME}/../hadoop-lzo"}
if ! [[ -d "$HADOOP_LZO_HOME" ]]
then
  git clone https://github.com/cloudera/hadoop-lzo.git "$HADOOP_LZO_HOME"
fi
cd "$HADOOP_LZO_HOME"
time -p ant package
cd "$IMPALA_HOME"
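If the change needs to be applied repeatably, the block can be commented out with sed instead of by hand. This is only a sketch. The function name comment_out_lzo is made up for illustration, it assumes GNU sed, and it assumes the block starts at the echo ">>> Checking out Impala-lzo" line and ends at the next cd "$IMPALA_HOME" line, as in the excerpt above. Check your own copy of bin/bootstrap_system.sh before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the Impala-lzo / hadoop-lzo section
# of a bootstrap script. Assumes GNU sed (for -i).
comment_out_lzo() {
  local script="$1"
  # Keep an untouched copy the first time the helper runs.
  [ -e "${script}.orig" ] || cp -p "$script" "${script}.orig"
  # Prefix every line from the "Checking out Impala-lzo" echo through the
  # next 'cd "$IMPALA_HOME"' with "# ". Both patterns are anchored at the
  # start of the line, so an already-commented block is left alone.
  sed -i '/^echo ">>> Checking out Impala-lzo"/,/^cd "\$IMPALA_HOME"/ s/^/# /' "$script"
}
```

A possible invocation would be comment_out_lzo "$IMPALA_HOME/bin/bootstrap_system.sh", followed by a diff against the .orig copy to confirm that only the LZO section changed.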

2. Metadata is not refreshed automatically when a database is created

2.1 Problem discovery

On CDH 6.1, create a database in Hive, for example create database test_db; and then run show databases; in Impala.

We found that test_db had not been refreshed into the Impala Catalog. Searching the Impala Catalog role log turned up the following exception:

Unexpected exception received while processing event
Java exception follows:
org.apache.impala.catalog.events.MetastoreNotificationException: EventId: 591869 EventType: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1108)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1089)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.get(MetastoreEvents.java:168)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.getFilteredEvents(MetastoreEvents.java:205)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:601)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1106)
    ... 12 more

The important part is: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration

However, we had already set this option and it did not take effect. We then looked in the metastore's notification_log table and found the message for this event:

{"server":"","servicePrincipal":"","db":"test_db","timestamp":1622247221,"location":"hdfs://nameservice1/user/hive/warehouse/test_db.db","ownerType":"USER","ownerName":"admin"}

Since reference [2] was written for a CDH 6.3.3 environment, I set up a single-node CDH 6.3.1 cluster in the test environment. Once it was running, I configured automatic metadata refresh in the new CDH 6.3.1 environment in the same way and found that the message for creating a database was:

{"server":"","servicePrincipal":"","db":"test_db","dbJson":"{\"1\":{\"str\":\"davie_test\"},\"3\":{\"str\":\"hdfs://nameservice1/user/hive/warehouse/test_db.db\"},\"6\":{\"str\":\"admin\"},\"7\":{\"i32\":1},\"9\":{\"i32\":1622248258}}","timestamp":1622248259,"location":"hdfs://nameservice1/user/hive/warehouse/davie_test.db","ownerType":"USER","ownerName":"admin"};

Comparing the two shows that in the Hive shipped with CDH 6.1.0, the dbJson field is missing from the message.
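The comparison can be reduced to a quick check on a single event message. The following is a minimal sketch: the function name db_json_status is hypothetical, and it assumes the message text has already been copied out of the notification log into a file or pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one CREATE_DATABASE event message on stdin
# and report whether it carries the dbJson field that Impala needs.
db_json_status() {
  if grep -q '"dbJson"[[:space:]]*:'; then
    echo "present"
  else
    echo "missing"
  fi
}
```

For example, db_json_status < message.json (where message.json is a placeholder file name) prints missing for the CDH 6.1 message shown earlier and present for the CDH 6.3.1 one.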

We searched for this field in the Hive source code using IDEA.

To enable Impala's automatic metadata refresh, the only option is therefore to upgrade Hive.

2.2 Upgrading Hive to version 2.1.1-cdh6.3.1

2.2.1 Compilation and packaging

The steps are to download the Cloudera Hive source code, then compile and package it:

git clone --single-branch --branch cdh6.3.1-release https://github.com/cloudera/hive.git hive

mvn clean package -DskipTests -Pdist

During compilation, some Cloudera packages could not be downloaded, so the following repositories need to be added:

<repository>
  <id>nexus-aliyun</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <name>nexus-aliyun</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<repository>
  <id>cloudera-repos</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  <name>CDH Releases Repository</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>

While Maven was downloading dependencies, the javax.el artifact that HBase depends on could not be downloaded, and the following error was reported:

Could not find artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT

Add the javax.el dependency to the pom.xml file with an explicit version, then rerun the Maven command:

<glassfish.el.version>3.0.1-b06</glassfish.el.version>

<dependency>
    <groupId>org.glassfish</groupId>
    <artifactId>javax.el</artifactId>
    <version>${glassfish.el.version}</version>
</dependency>

After a successful build, the file apache-hive-2.1.1-cdh6.3.1-bin.tar.gz appears in the hive/packaging/target directory.

2.2.2 Metadata backup

Copy the file to the CDH cluster and decompress it. Before upgrading the metadata, back it up:

mysqldump -uroot -ptest metastore > ./metastore.sql
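Before touching the schema, it is worth confirming that the dump actually finished. By default mysqldump ends a successful dump with a "-- Dump completed" comment line, so a small check like the one below can catch a truncated file. The function name backup_status is hypothetical, and the check assumes comments were not disabled (for example with --skip-comments).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a mysqldump file looks complete.
# Relies on the "-- Dump completed" trailer that mysqldump writes by default.
backup_status() {
  local dump="$1"
  # -s: file exists and is not empty; the trailer is on one of the last lines.
  if [ -s "$dump" ] && tail -n 5 "$dump" | grep -q '^-- Dump completed'; then
    echo "complete"
  else
    echo "incomplete"
  fi
}
```

Running backup_status ./metastore.sql should print complete before you continue with the upgrade.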

2.2.3 Metadata upgrade

After the backup completes, log in to the metastore database and run the following command to upgrade the CDH 6.1.0 metadata:

source $HIVE_6.3.1/scripts/metastore/upgrade/mysql/upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql

Looking at the upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql script, we found that it only adds a CREATE_TIME field to the DBS table and then updates the SCHEMA_VERSION information in CDH_VERSION. There are no major changes.

2.2.4 Updating the Hive lib directory

Then create a new lib631 directory under /opt/cloudera/parcels/CDH/lib/hive/:

mkdir /opt/cloudera/parcels/CDH/lib/hive/lib631

Copy the files under lib of the CDH 6.3.1 Hive build into the lib631 directory:

cp $HIVE_6.3.1/lib/* lib631/.

Then modify the lib directory specified in the hive script.

On line 94, change HIVE_LIB=${HIVE_HOME}/lib to HIVE_LIB=${HIVE_HOME}/lib631.
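The same edit can be scripted so that it does not depend on the exact line number. This is a sketch under a few assumptions: the function name switch_hive_lib is hypothetical, GNU sed is available, and the script contains the assignment exactly as HIVE_LIB=${HIVE_HOME}/lib. The path of the hive launcher script is passed in as an argument, since it may differ between installations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point HIVE_LIB at lib631 in the given hive script.
# Assumes GNU sed (for -i).
switch_hive_lib() {
  local script="$1"
  # Back up the original script once, so the change is easy to roll back.
  [ -e "${script}.bak" ] || cp -p "$script" "${script}.bak"
  # Match only the exact "lib" assignment; a line that already ends in
  # lib631 does not match, so running the helper twice is harmless.
  sed -i 's|^\([[:space:]]*\)HIVE_LIB=\${HIVE_HOME}/lib[[:space:]]*$|\1HIVE_LIB=${HIVE_HOME}/lib631|' "$script"
  # Show the resulting assignment for a visual check.
  grep -n 'HIVE_LIB=' "$script"
}
```

To roll back, restore the .bak copy written next to the script and restart the Hive services.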

When this is done, restart the Hive-related services in CM.

2.2.5 Hive upgrade verification

Verify that Hive works correctly, including Hive SQL execution, Hive UDFs, and the components that depend on Hive (HBase, Impala).

For details, refer to "How to install Hive 2.3.3 in a CDH cluster" and "0671-6.2.0 - How to migrate the Hive metadata of CDH 5.12 to CDH 6.2".

2.3 Verifying Impala's automatic metadata refresh

After completing the steps above, verify that Impala's automatic metadata refresh works.

Because the Catalog stops listening for events after it hits the exception on the CREATE_DATABASE event, the Impala services need to be restarted.

After the restart, run the following commands to verify Impala's automatic refresh.

Execute in Hive:

create database test_db2;

Execute in Impala:

show databases;

If test_db2 appears in the output, problem 2 has been solved.
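Because the Catalog processes metastore events on a schedule, the new database may take a few seconds to appear, so a short polling loop is more reliable than a single show databases. The sketch below makes several assumptions: the function names list_impala_dbs and wait_for_db are hypothetical, impala-shell is on the PATH and can reach an impalad without extra options (add -i host or authentication flags as your cluster requires), and the database name is in the first column of the delimited output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list databases visible to Impala, one per line,
# using delimited (-B) output. Adjust connection options for your cluster.
list_impala_dbs() {
  impala-shell -B --quiet -q 'show databases'
}

# Hypothetical helper: poll until the database shows up in Impala.
# Usage: wait_for_db NAME [TRIES] [DELAY_SECONDS]
wait_for_db() {
  local db="$1" tries="${2:-10}" delay="${3:-2}" i=0
  while [ "$i" -lt "$tries" ]; do
    # Compare the first column (the database name) exactly.
    if list_impala_dbs | awk '{print $1}' | grep -qxF -- "$db"; then
      echo "visible"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not visible"
  return 1
}
```

After running create database test_db2; in Hive, calling wait_for_db test_db2 should print visible within a few polling rounds if event processing is working.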

3. Summary

After some twists and turns, we finally have Impala 3.4 running in the CDH 6.1 environment with automatic metadata refresh enabled. The effect is remarkable: we no longer need to run invalidate metadata on a schedule to refresh metadata, which reduces resource consumption and improves Impala's stability.

Keywords: Big Data hive impala

Added by gwydionwaters on Tue, 08 Feb 2022 02:43:35 +0200