Reminder: before starting the tasks below, make sure you have already synchronized Hudi data to Hive by some other method. If you have not, refer to the article: Using the Flink SQL Client to Write MySQL Data to Hudi and Sync It to Hive. The Presto queries below run against the Hive table synchronized in that reference article, so it is recommended to read it first. The Presto installation below uses a single node as an example.
presto 0.261 Download
Download the Presto server and the Presto CLI:
mkdir /data
mkdir /data/presto-cli
cd /data
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.261/presto-server-0.261.tar.gz
wget https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.261/presto-cli-0.261-executable.jar
cp presto-cli-0.261-executable.jar /data/presto-cli
chmod +x /data/presto-cli/presto-cli-0.261-executable.jar
ln -s /data/presto-cli/presto-cli-0.261-executable.jar /data/presto-cli/presto
tar zxvf presto-server-0.261.tar.gz
ln -s /data/presto-server-0.261 /data/presto-server
presto server configuration
Enter the /data/presto-server directory and perform the following operations:
Create a new etc directory and the configuration files:
cd /data/presto-server
mkdir data
mkdir etc
cd etc
touch config.properties
touch jvm.config
touch log.properties
touch node.properties
mkdir catalog
touch catalog/hive.properties
Edit the configuration files:
vim config.properties
Enter the following:
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8282
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://hadoop1:8282
The above items configure the Presto server; in this single-node setup the coordinator and worker run on the same host.
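For comparison, a dedicated worker node added later would drop the coordinator role and point discovery.uri at the coordinator. A minimal sketch of such a worker's config.properties, assuming the coordinator stays on hadoop1:

coordinator=false
http-server.http.port=8282
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://hadoop1:8282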
vim jvm.config
Enter the following:
-server
-Xmx8G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
The above items are the JVM options for the Presto server.
vim log.properties
Enter the following:
com.facebook.presto=INFO
The above item sets the Presto server log level.
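Log levels can also be set per package. For example, to debug only the Hive connector while keeping everything else at INFO (a sketch; com.facebook.presto.hive is the Hive connector's package):

com.facebook.presto=INFO
com.facebook.presto.hive=DEBUG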
vim node.properties
Enter the following:
node.environment=production
node.id=presto1
node.data-dir=/data/presto-server/data
The above items identify this node; node.environment must be the same on every node in a cluster, while node.id must be unique to each node.
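For example, a second node added to the same cluster would keep the environment but use its own id — a sketch, with presto2 as a hypothetical id:

node.environment=production
node.id=presto2
node.data-dir=/data/presto-server/data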
vim catalog/hive.properties
Enter the following:
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop:9083
hive.parquet.use-column-names=true
hive.config.resources=/data/presto-server/etc/catalog/core-site.xml,/data/presto-server/etc/catalog/hdfs-site.xml
The above items are the Hive connection settings, where:
- connector.name is the name of the Hive connector
- hive.metastore.uri is the Hive metastore Thrift URI
- hive.parquet.use-column-names=true is required; it makes Presto resolve Parquet columns by name instead of by ordinal position, which avoids errors when reading the Parquet files written by Hudi
- hive.config.resources points to the HDFS cluster configuration files, which you can copy into the /data/presto-server/etc/catalog directory as shown below
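A minimal sketch of copying those files, assuming your Hadoop configuration lives under /etc/hadoop/conf (adjust the source path to your cluster):

cp /etc/hadoop/conf/core-site.xml /data/presto-server/etc/catalog/
cp /etc/hadoop/conf/hdfs-site.xml /data/presto-server/etc/catalog/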
For more detailed Presto deployment configuration, refer to: https://prestodb.io/docs/current/installation/deployment.html
presto server startup
The startup command is as follows:
/data/presto-server/bin/launcher start
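The launcher script also provides status, stop, and restart subcommands, which are handy while setting up:

/data/presto-server/bin/launcher status   # check whether the server is running
/data/presto-server/bin/launcher stop     # stop the server
/data/presto-server/bin/launcher restart  # stop and start the server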
After startup, you can see the corresponding log files in /data/presto-server/data/var/log.
[root@hadoop1 presto-server]# ll /data/presto-server/data/var/log
total 3208
-rw-r--r--. 1 root root 1410243 Sep 27 07:07 http-request.log
-rw-r--r--. 1 root root    2715 Sep 27 05:44 launcher.log
-rw-r--r--. 1 root root 1867319 Sep 27 06:01 server.log
presto cli startup
/data/presto-cli/presto --server hadoop1:8282 --catalog hive --schema test
Here, --schema specifies the database (schema) name.
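Once connected, you can confirm what the CLI sees with standard Presto statements (this assumes the Hive sync created the table under the test schema, as in the reference article):

show schemas;
show tables;
describe stu_tmp_1;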
At this point, the Presto installation and startup are complete, and we can query the data in Hive.
Querying the COW table using presto
First, make sure you have synchronized the Hudi COW table to Hive by some other method. If you have not, refer to the article: Using the Flink SQL Client to Write MySQL Data to Hudi and Sync It to Hive.
This article builds on that reference article, and the table queried below contains the data imported there.
Perform the following query operations:
select count(*) from stu_tmp_1;
select * from stu_tmp_1 limit 10;
select name from stu_tmp_1 group by name, school limit 10;
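Since the table was written by Hudi, it also carries Hudi's metadata columns (such as _hoodie_commit_time and _hoodie_record_key), which Presto can query like any other column. A quick sketch, assuming the default Hudi metadata fields are present on the synced table:

-- inspect Hudi metadata columns alongside the data
select _hoodie_commit_time, _hoodie_record_key, name from stu_tmp_1 limit 10;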
This is an original article by "xiaozhch5", a blogger at From Big Data to Artificial Intelligence, licensed under the CC 4.0 BY-SA agreement. Please include the original source link and this statement when reprinting.
Original link: https://lrting.top/backend/2055/