On CDH 6.1, we upgraded Impala and enabled automatic metadata refresh. We ran into several problems along the way and eventually solved them by checking logs, reading source code, and searching online. This article writes them up as a contribution back to the community.
The main reference documents are:
[1] Upgrading Impala separately to Apache Impala 3.4 on CDH 6.3
[2] 0757-6.3.3 - How to configure Impala to automatically synchronize HMS metadata
1. Impala reports that hadoop-lzo cannot be found during compilation
Cloudera has deleted the hadoop-lzo repository. Checking the build script bin/bootstrap_system.sh, we found the following comment:
#LZO is not needed to compile or run Impala, but it is needed for the data load
Since hadoop-lzo is not used in our environment, we commented out the following part of the script and ran it again, after which compilation completed normally:
echo ">>> Checking out Impala-lzo"
: ${IMPALA_LZO_HOME:="${IMPALA_HOME}/../Impala-lzo"}
if ! [[ -d "$IMPALA_LZO_HOME" ]]
then
  git clone --branch master https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"
fi

echo ">>> Checking out and building hadoop-lzo"
: ${HADOOP_LZO_HOME:="${IMPALA_HOME}/../hadoop-lzo"}
if ! [[ -d "$HADOOP_LZO_HOME" ]]
then
  git clone https://github.com/cloudera/hadoop-lzo.git "$HADOOP_LZO_HOME"
fi
cd "$HADOOP_LZO_HOME"
time -p ant package
cd "$IMPALA_HOME"
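If you would rather not edit the file by hand, the block can be commented out with sed. The following is a minimal sketch, not part of the original procedure: the function name comment_out_lzo_block is hypothetical, and it assumes the block starts at the line containing "Checking out Impala-lzo" and ends at the first following line that begins with cd "$IMPALA_HOME", as in the excerpt quoted here. A .bak copy of the script is kept so the change can be reverted.

```shell
# Hypothetical helper: comment out the LZO checkout/build block in a
# bootstrap script. Assumes the block boundaries match the excerpt above.
comment_out_lzo_block() {
  local script="$1"
  # Prefix every line in the range with '#'; keep a backup as <script>.bak.
  sed -i.bak '/Checking out Impala-lzo/,/^cd "\$IMPALA_HOME"/ s/^/#/' "$script"
}
```

Usage, assuming IMPALA_HOME points at your Impala source checkout: comment_out_lzo_block "$IMPALA_HOME/bin/bootstrap_system.sh". Running it a second time would add a second '#' to the same lines, so run it once and review the result.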
2. Metadata is not refreshed automatically when a database is created
2.1 Problem discovery
On CDH 6.1, create a database, for example with create database test_db;, then run show databases; in Impala.
test_db was not refreshed into the Impala Catalog. Searching the Impala Catalog role log turned up the following exception:
Unexpected exception received while processing event
Java exception follows:
org.apache.impala.catalog.events.MetastoreNotificationException: EventId: 591869 EventType: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1108)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1089)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.get(MetastoreEvents.java:168)
    at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEventFactory.getFilteredEvents(MetastoreEvents.java:205)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:601)
    at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:513)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
    at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191)
    at org.apache.impala.catalog.events.MetastoreEvents$CreateDatabaseEvent.<init>(MetastoreEvents.java:1106)
    ... 12 more
The key part is: CREATE_DATABASE Database object is null in the event. This could be a metastore configuration problem. Check if hive.metastore.notifications.add.thrift.objects is set to true in metastore configuration
However, we had already set this option, and it did not take effect. Looking in the metastore's notification_log table, we found the message for this event:
{"server":"","servicePrincipal":"","db":"test_db","timestamp":1622247221,"location":"hdfs://nameservice1/user/hive/warehouse/test_db.db","ownerType":"USER","ownerName":"admin"}
Since reference [2] was built on CDH 6.3.3, I set up a single-node CDH 6.3.1 cluster in the test environment. Once it was up, I configured automatic metadata refresh in the new CDH 6.3.1 environment in the same way and found that the message for creating a database was:
{"server":"","servicePrincipal":"","db":"test_db","dbJson":"{\"1\":{\"str\":\"davie_test\"},\"3\":{\"str\":\"hdfs://nameservice1/user/hive/warehouse/test_db.db\"},\"6\":{\"str\":\"admin\"},\"7\":{\"i32\":1},\"9\":{\"i32\":1622248258}}","timestamp":1622248259,"location":"hdfs://nameservice1/user/hive/warehouse/davie_test.db","ownerType":"USER","ownerName":"admin"};
Comparing the two messages shows that in Hive on CDH 6.1.0 the dbJson field is missing from the message.
A search of the Hive source code in IDEA is consistent with this.
So the only way to enable Impala's automatic metadata refresh is to upgrade Hive.
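The comparison between the two messages can also be scripted. This is a minimal sketch, assuming the message JSON has already been copied out of the notification log into a shell variable; the function name has_db_json is hypothetical, and the check is a plain text match on the key rather than a full JSON parse.

```shell
# Hypothetical helper: report whether a CREATE_DATABASE notification
# message carries the dbJson field that Impala's event processor needs.
has_db_json() {
  local message="$1"
  # Text match on the key followed by a colon; not a full JSON parse.
  if printf '%s' "$message" | grep -q '"dbJson"[[:space:]]*:'; then
    echo "present"
  else
    echo "missing"
  fi
}
```

Applied to the two messages quoted in this section, the CDH 6.1 message would report missing and the CDH 6.3.1 message would report present.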
2.2 Upgrading Hive to version 2.1.1-cdh6.3.1
2.2.1 Compilation and packaging
Download the Cloudera Hive source code, then compile and package it:
git clone --single-branch --branch cdh6.3.1-release https://github.com/cloudera/hive.git hive
cd hive
mvn clean package -DskipTests -Pdist
During compilation, some Cloudera packages could not be downloaded, so additional repositories had to be added:
<repository>
  <id>nexus-aliyun</id>
  <url>http://maven.aliyun.com/nexus/content/groups/public</url>
  <name>nexus-aliyun</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
<repository>
  <id>cloudera-repos</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  <name>CDH Releases Repository</name>
  <snapshots>
    <enabled>false</enabled>
  </snapshots>
</repository>
While Maven was downloading dependencies, the javax.el artifact that HBase depends on repeatedly failed to download, with the following error:
Could not find artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT
Add the javax.el dependency to pom.xml with an explicit version, then rerun the Maven command:
<glassfish.el.version>3.0.1-b06</glassfish.el.version>

<dependency>
  <groupId>org.glassfish</groupId>
  <artifactId>javax.el</artifactId>
  <version>${glassfish.el.version}</version>
</dependency>
After a successful build, the apache-hive-2.1.1-cdh6.3.1-bin.tar.gz file appears in the hive/packaging/target directory.
2.2.2 Metadata backup
Copy the file to the CDH cluster and decompress it. Before upgrading the metadata, back it up:
mysqldump -uroot -ptest metastore > ./metastore.sql
2.2.3 Metadata upgrade
After the backup completes, log in to the metastore database and run the following command to upgrade the CDH 6.1.0 metadata:
source $HIVE_6.3.1/scripts/metastore/upgrade/mysql/upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql
Looking at the upgrade-2.1.1-cdh6.1.0-to-2.1.1-cdh6.2.0.mysql.sql script, it only adds a create_time field to the DBS table and then updates the schema_version information in CDH_VERSION. There are no major changes.
2.2.4 Updating the Hive lib directory
Create a new lib631 directory under /opt/cloudera/parcels/CDH/lib/hive/:
mkdir /opt/cloudera/parcels/CDH/lib/hive/lib631
Copy the files under the lib directory of Hive CDH 6.3.1 into the lib631 directory:
cp $HIVE_6.3.1/lib/* lib631/.
Then modify the lib directory specified in the hive script:
On line 94, change HIVE_LIB=${HIVE_HOME}/lib to HIVE_LIB=${HIVE_HOME}/lib631
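The same edit can be made with sed instead of by hand. This is a sketch under assumptions: switch_hive_lib is a hypothetical helper, the path to the hive launcher script is passed in as an argument because it depends on your installation, and the match is on the line content rather than on line 94 so that it still works if the line number differs in your copy. A .bak copy of the script is kept.

```shell
# Hypothetical helper: point HIVE_LIB at the lib631 directory.
# $1 is the path to the hive launcher script (installation specific).
switch_hive_lib() {
  local hive_script="$1"
  # Rewrite only the exact assignment HIVE_LIB=${HIVE_HOME}/lib,
  # preserving any leading indentation; keep a backup as <script>.bak.
  sed -i.bak \
    's|^\([[:space:]]*\)HIVE_LIB=\${HIVE_HOME}/lib[[:space:]]*$|\1HIVE_LIB=${HIVE_HOME}/lib631|' \
    "$hive_script"
}
```

Because the pattern is anchored to the end of the line, running the helper again on an already modified script leaves it unchanged.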
When finished, restart the Hive-related services in CM.
2.2.5 Hive upgrade verification
Verify that Hive works, including Hive SQL execution, Hive UDF tests, and tests of related components (HBase, Impala).
For details, refer to "How to install Hive 2.3.3 in a CDH cluster" and "0671-6.2.0 - How to migrate Hive metadata from CDH 5.12 to CDH 6.2".
2.3 Verifying Impala automatic metadata refresh
After completing the steps above, verify that Impala's automatic metadata refresh works.
Because the Catalog stopped listening for events when it hit the CREATE_DATABASE exception, the Impala services need to be restarted.
After the restart, run the following commands to verify Impala's automatic refresh:
Execute in Hive:
create database test_db2;
Execute in Impala:
show databases;
If test_db2 appears in the output, problem 2 has been solved.
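The Catalog polls HMS events periodically, so a new database may take a few seconds to show up in Impala. The manual check can be wrapped in a small polling helper. This is a sketch under assumptions: wait_for_database is a hypothetical function, and the listing command is passed in by the caller and is expected to print one database per line with the name in the first column.

```shell
# Hypothetical helper: poll a "list databases" command until the given
# database appears or the attempts run out.
# Usage: wait_for_database <db> <attempts> <command> [args...]
wait_for_database() {
  local db="$1" attempts="$2" i=0
  shift 2
  while [ "$i" -lt "$attempts" ]; do
    # Succeed as soon as the first column of any output line equals the name.
    if "$@" | awk -v db="$db" '$1 == db { found = 1 } END { exit !found }'; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example, assuming impala-shell is on the PATH and connects to your impalad by default: wait_for_database test_db2 30 impala-shell -B --quiet -q 'show databases'. A zero exit status means the database created in Hive became visible in Impala within the allotted attempts.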
3. Summary
After some twists and turns, Impala 3.4 is finally running in our CDH 6.1 environment with automatic metadata refresh enabled. The effect is remarkable: we no longer need to run invalidate metadata periodically to refresh metadata, which reduces resource consumption and improves Impala's stability.