Apache Tez: Compilation, Installation and Verification

Basic introduction

Apache Tez is a data processing framework built on Apache Hadoop YARN that models processing as a directed acyclic graph (DAG) of tasks.

Main design themes:

  • Empowering end users
    • Expressive data flow definition API
    • Flexible Input-Processor-Output runtime model
    • Data type agnostic
    • Simplified deployment
  • Execution performance
    • Performance gains over MapReduce
    • Optimal resource management
    • Plan reconfiguration at runtime
    • Dynamic physical data flow decisions

Tez lets projects such as Apache Hive and Apache Pig run a complex DAG of tasks, so data processing that previously required multiple MapReduce jobs can now be done in a single Tez job.

Download address

https://tez.apache.org/releases/index.html

Installation and deployment

Version compatibility

Tez 0.8.3 and later require Apache Hadoop 2.6.0 or later, and Tez 0.9.0 and later require Apache Hadoop 2.7.0 or later. So before choosing a Tez version, first determine which Hadoop version you are running.
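The two rules above can be written as a small shell check. This is a minimal sketch: the function names (version_ge, min_hadoop_for_tez, check_compat) are hypothetical, only the two rules quoted here are encoded, and version comparison relies on GNU sort -V.

```bash
#!/usr/bin/env bash
# Sketch: check a Hadoop version against the minimums stated in this article.
# Only two rules are encoded: Tez >= 0.9.0 needs Hadoop >= 2.7.0,
# Tez >= 0.8.3 needs Hadoop >= 2.6.0.

# version_ge A B: succeeds if version A >= version B (GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# min_hadoop_for_tez TEZ_VERSION: prints the minimum Hadoop version,
# or fails if this article gives no rule for that Tez version.
min_hadoop_for_tez() {
  if version_ge "$1" "0.9.0"; then
    echo "2.7.0"
  elif version_ge "$1" "0.8.3"; then
    echo "2.6.0"
  else
    return 1
  fi
}

# check_compat TEZ_VERSION HADOOP_VERSION: reports whether the pair fits.
check_compat() {
  local min
  min=$(min_hadoop_for_tez "$1") || { echo "no rule for Tez $1"; return 2; }
  if version_ge "$2" "$min"; then
    echo "Tez $1 works with Hadoop $2 (needs >= $min)"
  else
    echo "Tez $1 needs Hadoop >= $min, found $2"
    return 1
  fi
}
```

For example, check_compat 0.9.2 "$(hadoop version | head -n 1 | awk '{print $2}')" compares Tez 0.9.2 against the installed Hadoop, whose version appears on the first line of the hadoop version output (e.g. "Hadoop 3.2.0").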

Compiling Tez from source against your Hadoop version

Build platform

Operating system: CentOS 7.6

CPU architecture: x86_64

Installing dependencies

  1. First, make sure the following are installed:
  • JDK 8
  • Maven 3

Installing protobuf 2.5.0

yum install protobuf protobuf-devel
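Before building, it can help to confirm that the protoc on the PATH reports the expected version. A sketch, assuming protoc --version prints a line such as "libprotoc 2.5.0"; require_protoc is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: confirm that the protoc on PATH is the version the build expects.
# Assumption: `protoc --version` prints "libprotoc <version>".
require_protoc() {
  local expected="${1:-2.5.0}" actual
  if ! command -v protoc >/dev/null 2>&1; then
    echo "protoc not found on PATH" >&2
    return 1
  fi
  # Keep only the version field, e.g. "2.5.0" from "libprotoc 2.5.0".
  actual=$(protoc --version | awk '{print $2}')
  if [ "$actual" != "$expected" ]; then
    echo "protoc $actual found, expected $expected" >&2
    return 1
  fi
  echo "protoc $actual OK"
}
```

Running require_protoc 2.5.0 after the yum install should print "protoc 2.5.0 OK" if the packaged version matches.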

Source code compilation

Once the Hadoop version is known, pick a matching Tez release to compile. This article uses:

  • tez-0.9.2
  • hadoop-3.2.0

The steps below compile tez-0.9.2 as the example.

Download and extract the source

wget https://mirror.olnevhost.net/pub/apache/tez/0.9.2/apache-tez-0.9.2-src.tar.gz
tar zxvf apache-tez-0.9.2-src.tar.gz

Build

cd apache-tez-0.9.2-src && mvn clean package -Dtar -Dhadoop.version=3.2.0 -DskipTests

After the build finishes, the distribution tarball is at tez-dist/target/tez-0.9.2.tar.gz.
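The download and build steps above can be wrapped in one parameterized function so the Tez and Hadoop versions are set in one place. This is a sketch under assumptions: build_tez and run are hypothetical helpers, the mirror is assumed to keep the same directory layout for other releases (it may not; check the download page above), and DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: download, extract and build a Tez source release for a given
# Hadoop version, mirroring the manual steps above.
# Assumption: the mirror uses <mirror>/<version>/apache-tez-<version>-src.tar.gz.
MIRROR="${MIRROR:-https://mirror.olnevhost.net/pub/apache/tez}"

# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# build_tez [TEZ_VERSION] [HADOOP_VERSION]
build_tez() {
  local tez="${1:-0.9.2}" hadoop="${2:-3.2.0}"
  local src="apache-tez-${tez}-src"
  run wget "${MIRROR}/${tez}/${src}.tar.gz" &&
    run tar zxvf "${src}.tar.gz" &&
    # Same Maven flags as above; -f points at the pom instead of cd-ing.
    run mvn -f "${src}/pom.xml" clean package -Dtar \
      "-Dhadoop.version=${hadoop}" -DskipTests &&
    echo "artifact: ${src}/tez-dist/target/tez-${tez}.tar.gz"
}
```

Source the file, preview with DRY_RUN=1 build_tez 0.9.2 3.2.0, then run build_tez 0.9.2 3.2.0 for the real build. Using mvn -f instead of cd keeps the dry run from failing before the source directory exists.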

Functional testing

First, make sure Hadoop, including HDFS and YARN, is installed and running normally.

Reference: How to install Hadoop YARN

Upload tez-0.9.2.tar.gz to the /app/tez directory on HDFS:

hdfs dfs -mkdir -p /app/tez
hdfs dfs -put tez-0.9.2.tar.gz /app/tez/

Create a local tez directory with a conf subdirectory, then copy tez-0.9.2.tar.gz into it and extract it:

mkdir -p /data/tez/conf
cp tez-0.9.2.tar.gz /data/tez
cd /data/tez && tar zxvf tez-0.9.2.tar.gz

Create tez-site.xml in /data/tez/conf with the following content:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<configuration>
<property>
<name>tez.lib.uris</name>
<value>/app/tez/tez-0.9.2.tar.gz</value>
</property>
</configuration>
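The same tez-site.xml can be generated from a script, which is convenient when the tarball path changes with the version. A sketch: write_tez_site is a hypothetical helper, the license header is omitted, and TEZ_CONF_DIR falls back to /data/tez/conf as used in this article.

```bash
#!/usr/bin/env bash
# Sketch: write tez-site.xml with tez.lib.uris pointing at the HDFS tarball.
# Assumption: TEZ_CONF_DIR defaults to /data/tez/conf as in this article.
write_tez_site() {
  local lib_uri="${1:-/app/tez/tez-0.9.2.tar.gz}"
  local conf_dir="${TEZ_CONF_DIR:-/data/tez/conf}"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/tez-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- Location of the Tez tarball on HDFS -->
    <name>tez.lib.uris</name>
    <value>${lib_uri}</value>
  </property>
</configuration>
EOF
  echo "wrote $conf_dir/tez-site.xml"
}
```

For this article's layout, write_tez_site /app/tez/tez-0.9.2.tar.gz reproduces the file shown above without the license comment.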

Append the following to /etc/profile, then run source /etc/profile to apply it:

export TEZ_CONF_DIR=/data/tez/conf
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:/data/tez/*:/data/tez/lib/*
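After sourcing /etc/profile, a quick check can catch the most common mistakes: TEZ_CONF_DIR unset, tez-site.xml missing, or the conf directory absent from HADOOP_CLASSPATH. A sketch: check_tez_env and TEZ_HOME are hypothetical names, and it assumes the extracted tarball places tez-api-*.jar directly under /data/tez.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the client-side Tez environment.
# Assumption: TEZ_HOME (hypothetical) defaults to /data/tez and holds tez-api-*.jar.
check_tez_env() {
  local tez_home="${TEZ_HOME:-/data/tez}" status=0 jar found=0
  if [ -z "${TEZ_CONF_DIR:-}" ]; then
    echo "TEZ_CONF_DIR is not set" >&2
    return 1
  fi
  [ -f "$TEZ_CONF_DIR/tez-site.xml" ] ||
    { echo "missing $TEZ_CONF_DIR/tez-site.xml" >&2; status=1; }
  # Wrap in colons so the conf dir matches as a whole classpath entry.
  case ":${HADOOP_CLASSPATH:-}:" in
    *":$TEZ_CONF_DIR:"*) ;;
    *) echo "HADOOP_CLASSPATH does not contain $TEZ_CONF_DIR" >&2; status=1 ;;
  esac
  for jar in "$tez_home"/tez-api-*.jar; do
    [ -e "$jar" ] && found=1 && break
  done
  [ "$found" = 1 ] || { echo "no tez-api jar in $tez_home" >&2; status=1; }
  return "$status"
}
```

The hadoop classpath command prints the classpath Hadoop will actually use, so /data/tez/conf should also appear in its output.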

In mapred-site.xml, change

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

to

  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn-tez</value>
  </property>
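Setting mapreduce.framework.name to yarn-tez is what makes MapReduce jobs run on Tez, and the edit can be scripted. A sketch assuming GNU sed and that the <value> line directly follows the <name> line, as in the snippet above; use_yarn_tez is a hypothetical helper, and the original file is kept as mapred-site.xml.bak.

```bash
#!/usr/bin/env bash
# Sketch: switch mapreduce.framework.name from "yarn" to "yarn-tez" in place.
# Assumptions: GNU sed; <value> is on the line right after <name>.
use_yarn_tez() {
  local file="$1"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  # On the matching <name> line, move to the next line (n) and rewrite its value.
  sed -i.bak '/<name>mapreduce\.framework\.name<\/name>/{n;s#<value>yarn</value>#<value>yarn-tez</value>#}' "$file"
}
```

Typical usage is use_yarn_tez "$HADOOP_HOME/etc/hadoop/mapred-site.xml"; other properties in the file are left untouched.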

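Run the WordCount example from the Hadoop installation directory as a test (the /test directory on HDFS must contain some text input):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /test/ output-1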
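To check the result, the job output can be compared with a word count computed locally on the same input. The sketch below assumes you have a local copy of the input files, and approximates WordCount's tokenization by splitting on spaces, tabs, carriage returns, form feeds and newlines (the default delimiters of java.util.StringTokenizer, which the example uses); local_wordcount is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print "word<TAB>count" lines for local files, sorted bytewise,
# so they can be diffed against the WordCount job output.
local_wordcount() {
  cat -- "$@" |
    tr -s ' \t\r\f\n' '\n' |   # one token per line, like StringTokenizer
    grep -v '^$' |             # drop the empty line left by leading blanks
    LC_ALL=C sort |
    uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

For example, diff <(local_wordcount input.txt) <(hdfs dfs -cat 'output-1/part-r-*' | LC_ALL=C sort) prints nothing when the counts match; the relative path output-1 resolves under your HDFS home directory (/user/<user>).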


This is an original article by the blogger "xiaozhch5" (From Big Data to Artificial Intelligence), licensed under CC BY-SA 4.0. Please include a link to the original source and this statement when reprinting.

Original link: https://lrting.top/backend/2078/

Added by bpp198 on Wed, 19 Jan 2022 08:03:17 +0200