Hadoop HDFS 07: HDFS-HA cluster configuration & ZK cluster configuration & YARN-HA configuration
(1) Cluster planning
| hadoop102 | hadoop103 | hadoop104 |
|---|---|---|
| ZK | ZK | ZK |
| JournalNode | JournalNode | JournalNode |
| NameNode | NameNode | |
| DataNode | DataNode | DataNode |
| ResourceManager | ResourceManager | |
| NodeManager | NodeManager | NodeManager |
(2) Configure the ZooKeeper cluster
- Official website: https://archive.apache.org/dist/
- Unzip the ZK installation package
  cmd+shift+p to enter sftp mode and drag the installation package over
[user02@hadoop102 software]$ tar -zxvf zookeeper-3.4.9.tar.gz -C /opt/module/
[user02@hadoop102 software]$ cd /opt/module/zookeeper-3.4.9/
[user02@hadoop102 zookeeper-3.4.9]$ mkdir -p zkData
- Configure the zoo.cfg file
[user02@hadoop102 zookeeper-3.4.9]$ mv ./conf/zoo_sample.cfg ./conf/zoo.cfg
[user02@hadoop102 conf]$ vim zoo.cfg
# Add the following configuration
dataDir=/opt/module/zookeeper-3.4.9/zkData
######cluster####
server.2=hadoop102:2888:3888
server.3=hadoop103:2888:3888
server.4=hadoop104:2888:3888
server.2=hadoop102:2888:3888 explained:
2: a number identifying this as server number 2
hadoop102: the hostname (IP address) of the server
2888: the port this server uses to exchange information with the Leader of the cluster
3888: if the Leader of the cluster goes down, a port is needed to hold a new election and choose a new Leader; this is the port the servers use to communicate with each other during that election
In cluster mode a file named myid is created in the dataDir directory. It contains a single number, e.g. 2 for server 2. When ZooKeeper starts, it reads this file and compares the value against the server entries in zoo.cfg to determine which server it is.
- ZK cluster operation
- Create a myid file in the /opt/module/zookeeper-3.4.9/zkData directory
[user02@hadoop102 zkData]$ touch myid
[user02@hadoop102 zkData]$ vim myid
# Add the number 2, corresponding to server.2
- Copy to the other nodes and change the myid content to 3 and 4 respectively
[user02@hadoop102 module]$ xsync zookeeper-3.4.9/
- Start ZK on each node (single-point start)
[user02@hadoop102 zookeeper-3.4.9]$ bin/zkServer.sh start
[user02@hadoop103 zookeeper-3.4.9]$ bin/zkServer.sh start
[user02@hadoop104 zookeeper-3.4.9]$ bin/zkServer.sh start
- View status
[user02@hadoop102 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
[user02@hadoop103 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: leader
[user02@hadoop104 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
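To avoid logging into every machine, the same status check can be run from one node over SSH. A minimal sketch, assuming passwordless SSH between the nodes is already set up (xsync relies on it too):

```
# Query zkServer.sh status on all three nodes from hadoop102
for host in hadoop102 hadoop103 hadoop104; do
    echo "===== $host ====="
    ssh $host "/opt/module/zookeeper-3.4.9/bin/zkServer.sh status"
done
```

Exactly one node should report Mode: leader; the other two report Mode: follower.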
(3) Configure HDFS-HA cluster
- Official website: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
- Copy hadoop
[user02@hadoop104 module]$ mkdir ha
[user02@hadoop104 module]$ cp -r hadoop-2.7.2/ /opt/module/ha
- Configure hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
- Configure core-site.xml

<!-- Specify the NameNode address in HDFS -->
<!-- Assemble the addresses of the two NameNodes into one cluster (nameservice) -->
<property>
    <name>fs.defaultFS</name>
    <!-- hdfs://hadoop102:9000 -->
    <value>hdfs://mycluster</value>
</property>
<!-- Specify the directory where Hadoop stores files generated at run time -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/ha/hadoop-2.7.2/data/tmp</value>
</property>
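Because fs.defaultFS now points at the logical nameservice mycluster instead of a single host, clients stop naming a specific NameNode in their paths. A small illustration (it only works once the HA cluster is running; the paths are just examples):

```
# The client resolves mycluster to whichever NameNode is currently active
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs dfs -mkdir -p /test
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs dfs -ls hdfs://mycluster/
```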
- Configure hdfs-site.xml

<!-- Logical name of the fully distributed cluster (nameservice) -->
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<!-- The NameNode nodes in the cluster -->
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<!-- RPC address of nn1 -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>hadoop102:8020</value>
</property>
<!-- HTTP address of nn1 -->
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>hadoop102:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>hadoop103:8020</value>
</property>
<!-- HTTP address of nn2 -->
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>hadoop103:50070</value>
</property>
<!-- Where the NameNode metadata (edits) is stored on the JournalNodes -->
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Fencing mechanism, so that only one NameNode can respond at a time -->
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<!-- Passwordless SSH login is required when using the sshfence mechanism -->
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/user02/.ssh/id_rsa</value>
</property>
<!-- JournalNode server storage directory -->
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/module/ha/hadoop-2.7.2/data/jn</value>
</property>
<!-- Disable permission checking -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
<!-- Access proxy class: how clients find the Active NameNode and fail over automatically -->
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
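sshfence only works if the NameNode machines can log into each other with the key configured above. A quick check before starting the cluster (a sketch, assuming the key pair for user02 already exists on both hosts):

```
# From hadoop102, reaching hadoop103 must not prompt for a password
[user02@hadoop102 ~]$ ssh -i /home/user02/.ssh/id_rsa user02@hadoop103 hostname
hadoop103
# And the reverse direction
[user02@hadoop103 ~]$ ssh -i /home/user02/.ssh/id_rsa user02@hadoop102 hostname
hadoop102
```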
- Distribute the configuration to the other nodes
[user02@hadoop104 module]$ xsync ./ha
(4) Start HDFS-HA cluster (single point)
- Start the journalnode service on each JournalNode node
  Note: start with the sbin directory under the HA installation; there are two HDFS installations on the machines (the original one and the HA copy), so make sure the right one is used.
[user02@hadoop102 ~]$ cd /opt/module/ha/hadoop-2.7.2/
[user02@hadoop103 ~]$ cd /opt/module/ha/hadoop-2.7.2/
[user02@hadoop104 ~]$ cd /opt/module/ha/hadoop-2.7.2/
[user02@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[user02@hadoop103 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
[user02@hadoop104 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start journalnode
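After the starts succeed, jps on each node should show a JournalNode process (the PIDs below are only illustrative):

```
[user02@hadoop102 hadoop-2.7.2]$ jps
3978 JournalNode
4010 Jps
```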
- Format [nn1] and start it (format once on one node, then synchronize on the other)
  Delete the data and logs folders first
[user02@hadoop102 hadoop-2.7.2]$ rm -rf data/ logs/
[user02@hadoop103 hadoop-2.7.2]$ rm -rf data/ logs/
[user02@hadoop104 hadoop-2.7.2]$ rm -rf data/ logs/
Format nn1
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format
Start nn1
[user02@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
- On [nn2], synchronize nn1's metadata and start it
[user02@hadoop103 hadoop-2.7.2]$ bin/hdfs namenode -bootstrapStandby
[user02@hadoop103 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
- View the web pages hadoop102:50070 and hadoop103:50070; both NameNodes are in standby status
- Start the DataNodes on every node
[user02@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
[user02@hadoop103 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
[user02@hadoop104 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode
- Switch [nn1] to Active
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs haadmin -transitionToActive nn1
- Check whether it is Active
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn1
active
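nn2 should still report standby, since only one NameNode can be Active at a time:

```
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn2
standby
```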
(5) Configure HDFS-HA automatic failover
- Background
  After kill -9 on the hadoop102 NameNode, trying to switch nn2 to active reports a connection-refused error. With one NameNode down the two cannot communicate, so the hadoop102 NameNode must be started again (it comes back as standby) before the switch can be made. (This is manual switching.)
- Configuration
1) hdfs-site.xml
```
<!-- Enable HDFS-HA automatic failover -->
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
</property>
```
[user02@hadoop102 hadoop]$ xsync ./hdfs-site.xml
2) core-site.xml
```
<!-- ZooKeeper quorum for HDFS-HA automatic failover -->
<property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
```
[user02@hadoop102 hadoop]$ xsync ./core-site.xml
- Startup
1) Stop all HDFS services
```
sbin/stop-dfs.sh
```
2) Start the ZK cluster (bin/zkServer.sh start)
```
[user02@hadoop102 ~]$ cd /opt/module/zookeeper-3.4.9/
[user02@hadoop102 zookeeper-3.4.9]$ bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /opt/module/zookeeper-3.4.9/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[user02@hadoop102 zookeeper-3.4.9]$ jps
3351 Jps
3326 QuorumPeerMain
# Start all three nodes the same way
```
3) Initialize the HA state in ZK (bin/hdfs zkfc -formatZK)
```
[user02@hadoop102 zookeeper-3.4.9]$ cd /opt/module/ha/hadoop-2.7.2/
[user02@hadoop102 hadoop-2.7.2]$ bin/hdfs zkfc -formatZK
```
/opt/module/zookeeper-3.4.9/bin/zkServer.sh status shows Mode: follower
4) Start the HDFS services (sbin/start-dfs.sh)
```
[user02@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
Starting namenodes on [hadoop102 hadoop103]
hadoop102: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-namenode-hadoop102.out
hadoop103: starting namenode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-namenode-hadoop103.out
hadoop103: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-datanode-hadoop103.out
hadoop102: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-datanode-hadoop102.out
hadoop104: starting datanode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-datanode-hadoop104.out
Starting journal nodes [hadoop102 hadoop103 hadoop104]
hadoop104: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-journalnode-hadoop104.out
hadoop102: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-journalnode-hadoop102.out
hadoop103: starting journalnode, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-journalnode-hadoop103.out
Starting ZK Failover Controllers on NN hosts [hadoop102 hadoop103]
hadoop103: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-zkfc-hadoop103.out
hadoop102: starting zkfc, logging to /opt/module/ha/hadoop-2.7.2/logs/hadoop-user02-zkfc-hadoop102.out
```
5) Start the DFSZKFailoverController on each NameNode node; whichever machine starts it first has the Active NameNode
```
sbin/hadoop-daemon.sh start zkfc
```
Open /opt/module/zookeeper-3.4.9/bin/zkCli.sh and run ls /; there is now an extra hadoop-ha znode
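The hadoop-ha znode mentioned above can be inspected with zkCli.sh (a sketch; the exact listing may differ slightly):

```
[user02@hadoop102 zookeeper-3.4.9]$ bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 1] ls /hadoop-ha
[mycluster]
```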
- Verification
Visit http://hadoop102:50070/dfshealth.html#tab-overview ---active
Visit http://hadoop103:50070/dfshealth.html#tab-overview ---standby
After the active NN is killed, the standby switches to active immediately.
1) Kill the Active NameNode process
```
kill -9 <namenode process id>
```
2) Disconnect the Active NameNode machine from the network
```
service network stop
```
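The switch can also be confirmed from the command line. A sketch, assuming nn1 was the Active NameNode that got killed:

```
[user02@hadoop103 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn2
active
# Querying nn1 now fails with a connection error, because its NameNode process is gone
[user02@hadoop103 hadoop-2.7.2]$ bin/hdfs haadmin -getServiceState nn1
```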
- Processes on each node
```
[user02@hadoop102 sbin]$ jps
3424 QuorumPeerMain  -------- ZK cluster process
3666 NameNode
3779 DataNode
3978 JournalNode
4170 DFSZKFailoverController
4284 Jps
[user02@hadoop103 bin]$ jps
3586 JournalNode
3491 DataNode
3412 NameNode
3326 QuorumPeerMain
3806 Jps
5188 DFSZKFailoverController
[user02@hadoop104 bin]$ jps
3553 Jps
3475 JournalNode
3380 DataNode
3305 QuorumPeerMain
```
(6) YARN-HA configuration
- Specific configuration
- yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- How the Reducer gets data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the address of the YARN ResourceManager -->
    <!--
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    -->
    <!-- yarn-ha -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>cluster-yarn1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop102</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop103</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop102:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop103:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
    </property>
    <!-- Enable automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <!-- Store the ResourceManager state information in ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <!-- Log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Retain logs for 7 days -->
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
- Distribute with xsync
- Start HDFS
- Start the journalnode service on the three journalnodes
sbin/hadoop-daemon.sh start journalnode
- On nn1, format namenode1 and start it
  First run rm -rf ./data ./logs
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
- On nn2, synchronize the metadata information of nn1
bin/hdfs namenode -bootstrapStandby
- Start nn2
sbin/hadoop-daemon.sh start namenode
- Start all DataNodes
sbin/hadoop-daemon.sh start datanode
- Switch nn1 to active
# If automatic failover is not configured:
bin/hdfs haadmin -transitionToActive nn1
# If automatic failover is configured: start the zkfc service on both NN nodes; whichever starts first becomes active
sbin/hadoop-daemon.sh start zkfc
- Start YARN
- Execute on hadoop102
sbin/start-yarn.sh
- Execute on hadoop103
sbin/yarn-daemon.sh start resourcemanager
- View service status
bin/yarn rmadmin -getServiceState rm1
From the web page hadoop102:8088 you cannot tell which ResourceManager is active and which is standby. After killing one of the RM processes, the other one switches to active.
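Checking both ResourceManagers from the command line makes the state and the switch visible. A sketch, assuming rm1 on hadoop102 is currently the active one:

```
[user02@hadoop102 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm1
active
[user02@hadoop102 hadoop-2.7.2]$ bin/yarn rmadmin -getServiceState rm2
standby
# After killing the active ResourceManager process, rerunning the commands shows the other RM as active
```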
(7) Port number
| Purpose | Port | Property |
|---|---|---|
| SecondaryNameNode HTTP address | 50090 | dfs.namenode.secondary.http-address |
| NameNode HTTP address | 50070 | dfs.namenode.http-address.mycluster.nn1 |
| NameNode RPC address | 9000 or 8020 | dfs.namenode.rpc-address.mycluster.nn1 |
| YARN web address | 8088 | yarn.resourcemanager.webapp.address.rm1 |
| ZooKeeper | 2181 | ha.zookeeper.quorum |
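A quick way to confirm from any node that the relevant ports are listening (a sketch; assumes nc is installed and checks only the ports this HA cluster uses on hadoop102, i.e. nn1 HTTP/RPC, rm1 web UI, and ZK):

```
for p in 50070 8020 8088 2181; do
    nc -zv hadoop102 $p
done
```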