Using Presto to query Hudi data synchronized to Hive

Reminder: to complete the following tasks, make sure you have already synchronized Hudi data to Hive by some other means. If you have not, refer to the article: Using the Flink SQL client to write MySQL data to Hudi and synchronize it to Hive. The Presto queries below run against the Hive table produced by that reference article, so it is recommended to read it first. The Presto installation below uses a single node as an example.

Presto 0.261 download

Download the Presto server and the Presto CLI:

mkdir /data
mkdir /data/presto-cli
cd /data
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.261/presto-server-0.261.tar.gz
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.261/presto-cli-0.261-executable.jar

cp presto-cli-0.261-executable.jar /data/presto-cli
chmod +x /data/presto-cli/presto-cli-0.261-executable.jar
ln -s /data/presto-cli/presto-cli-0.261-executable.jar /data/presto-cli/presto
tar zxvf presto-server-0.261.tar.gz
ln -s /data/presto-server-0.261 /data/presto-server
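Optionally, verify the downloads against the SHA-1 checksums that Maven Central publishes alongside each artifact:

wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.261/presto-server-0.261.tar.gz.sha1
sha1sum presto-server-0.261.tar.gz

The hash printed by sha1sum should match the contents of the .sha1 file.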

Presto server configuration

Enter the /data/presto-server directory and perform the following operations:

Create a new etc directory and the configuration files:

cd /data/presto-server
mkdir data
mkdir etc
cd etc
touch config.properties
touch jvm.config
touch log.properties
touch node.properties
mkdir catalog
touch catalog/hive.properties
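After these commands, the etc directory has the following layout:

etc/
├── catalog/
│   └── hive.properties
├── config.properties
├── jvm.config
├── log.properties
└── node.properties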

Edit the configuration files:

vim config.properties

Fill in:

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8282
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://hadoop1:8282

The above items configure the Presto server; in this setup, the coordinator and worker run on the same host.
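If you later split the coordinator and workers across hosts, a worker's config.properties would look roughly like this (a sketch based on the Presto deployment docs; hadoop1 is the coordinator host from this example):

coordinator=false
http-server.http.port=8282
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://hadoop1:8282

Note that discovery-server.enabled=true and node-scheduler.include-coordinator=true are set only on the coordinator.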

vim jvm.config

Fill in:

-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

The above items configure the Presto server JVM.

vim log.properties

Fill in:

com.facebook.presto=INFO

The above item sets the Presto server log level.
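Logger names are hierarchical Java package names, so the level can be raised for a single component; for example, to get more detail from the Hive connector alone (assuming it logs under the com.facebook.presto.hive package):

com.facebook.presto=INFO
com.facebook.presto.hive=DEBUG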

vim node.properties

Fill in:

node.environment=production
node.id=presto1
node.data-dir=/data/presto-server/data

The above items configure the node information.

vim catalog/hive.properties

Fill in:

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop1:9083
hive.parquet.use-column-names=true
hive.config.resources=/data/presto-server/etc/catalog/core-site.xml,/data/presto-server/etc/catalog/hdfs-site.xml

The above items configure the Hive connection, where:

  • connector.name is the name of the Hive connector
  • hive.metastore.uri is the Hive metastore connection URI
  • hive.parquet.use-column-names=true is required; it makes Presto read Parquet columns by name, which avoids column-mapping problems when Presto reads Parquet data
  • hive.config.resources points to the configuration files of the HDFS cluster, which can be copied into the /data/presto-server/etc/catalog directory (see the example below)
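For example, assuming the Hadoop client configuration lives under /etc/hadoop/conf (the exact path depends on your Hadoop installation):

cp /etc/hadoop/conf/core-site.xml /data/presto-server/etc/catalog/
cp /etc/hadoop/conf/hdfs-site.xml /data/presto-server/etc/catalog/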

For more detailed Presto configuration information, refer to: https://prestodb.io/docs/current/installation/deployment.html

Presto server startup

The startup command is as follows:

/data/presto-server/bin/launcher start

After startup, you can see the corresponding log files in /data/presto-server/data/var/log.

[root@hadoop1 presto-server]# ll /data/presto-server/data/var/log
total 3208
-rw-r--r--. 1 root root 1410243 Sep 27 07:07 http-request.log
-rw-r--r--. 1 root root    2715 Sep 27 05:44 launcher.log
-rw-r--r--. 1 root root 1867319 Sep 27 06:01 server.log
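The launcher script also supports a status subcommand to check whether the server is running, and run to start it in the foreground, which is handy for debugging:

/data/presto-server/bin/launcher status
# for debugging, run in the foreground instead:
# /data/presto-server/bin/launcher run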

Presto CLI startup

/data/presto-cli/presto --server hadoop1:8282 --catalog hive --schema test

Here, --schema specifies the database name.

At this point, Presto is installed and running, and we can query the data in Hive.
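Once the CLI is connected, a couple of quick sanity checks confirm that the Hive catalog and the test schema are visible:

show schemas;
show tables;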

Querying the COW table using Presto

First, make sure that you have synchronized the Hudi COW table to Hive by some other means. If you have not, refer to the article: Using the Flink SQL client to write MySQL data to Hudi and synchronize it to Hive.

This article builds on that reference article, and the table queried below holds the data imported there.

Perform the following query operations:

select count(*) from stu_tmp_1;

select * from stu_tmp_1 limit 10;

select name from stu_tmp_1 group by name, school limit 10;
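Since the table is registered in the Hive metastore, you can also use the fully qualified catalog.schema.table form, which works regardless of the --catalog and --schema options passed to the CLI (the schema and table names here come from the reference article):

select count(*) from hive.test.stu_tmp_1;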

This is an original article by "xiaozhch5", a blogger at "From Big Data to Artificial Intelligence", licensed under CC 4.0 BY-SA. Please include the original source link and this statement when reprinting.

Original link: https://lrting.top/backend/2055/
