Hadoop HA (High Availability) installation
Issues to be aware of in this setup
In hdfs-site.xml, the dfs.ha.fencing.methods parameter is set to shell rather than sshfence, because with sshfence a failover cannot happen when the host of the active node goes down (the whole host, not just the service). However, most Hadoop HA articles found through Baidu use the sshfence mode.
Official reference documents: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
The sshfence option SSHes to the target node and uses fuser to kill the process listening on the service's TCP port. In order for this fencing option to work, it must be able to SSH to the target node without providing a passphrase. Thus, one must also configure the dfs.ha.fencing.ssh.private-key-files option, which is a comma-separated list of SSH private key files. For example:
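The official documentation then illustrates this with roughly the following hdfs-site.xml snippet (the /home/exampleuser path is the documentation's placeholder, not a path used in this installation):

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/exampleuser/.ssh/id_rsa</value>
</property>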
In other words, when the active NameNode becomes unreachable, the standby SSHes to the former active NameNode and kills its NameNode process before taking over.
The problem is that if the host running the active NameNode goes down, the standby cannot SSH to it and the failover never completes. This kind of high availability therefore only covers the NameNode service dying, not the whole node dying. When the active node's host is down, the standby node's ZKFC log shows a java.net.NoRouteToHostException: No route to host (host unreachable) exception.
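A quick way to confirm this on the standby is to search its ZKFC log (a sketch; the log file name assumes the root user and the default log directory used later in this guide):

# Look for the failed fencing attempt in the standby node's ZKFC log
grep -i "NoRouteToHostException" /opt/hadoop-2.10.1/logs/hadoop-root-zkfc-*.log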
Note 2: Hadoop HA also relies on a quorum (majority) mechanism to prevent split-brain, tolerating at most (N - 1) / 2 failures. Therefore you need to run at least three JournalNode services, i.e. three master-capable nodes (one primary node and two standby nodes).
Original text from the official documentation:
- JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
Host infrastructure configuration
Disable the firewall and SELinux so that the nodes can communicate with each other normally, for example:
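A minimal sketch of the usual CentOS 7 commands (run on every node; adapt if your environment differs):

# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Put SELinux into permissive mode now and disable it permanently
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config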
Modify the host names (because the number of machines is small, several components share each node)
IP address     Host name        Description
20.88.10.31    emr-header-01    Primary node
20.88.10.32    emr-header-02    Standby primary node
20.88.10.33    emr-worker-01    Worker node, standby primary node, ZooKeeper node
20.88.10.34    emr-worker-02    Worker node
Modify the hosts file and hostname
# Run on each node with its own host name; for example, on emr-header-01:
hostnamectl set-hostname emr-header-01;bash
Add the following entries to the hosts file (identical on every node)
This step is required: the Hadoop configuration refers to the master and worker nodes by host name, so if the hosts entries are missing, host-name resolution will fail
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Set up passwordless SSH between the nodes (run on the emr-header-01 node)
When Hadoop starts, it connects to the other nodes over SSH; without key-based authentication it would prompt for a password
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Distribute to each host
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-header-02
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-01
sshpass -psinobase@123 ssh-copy-id -i /root/.ssh/id_dsa.pub "-o StrictHostKeyChecking=no" root@emr-worker-02
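To verify the keys were distributed correctly, a quick check such as the following can be run from emr-header-01 (a sketch; BatchMode makes ssh fail instead of prompting for a password):

for h in emr-header-01 emr-header-02 emr-worker-01 emr-worker-02; do
  ssh -o BatchMode=yes root@$h hostname
done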
Install JAVA environment
The operating system is CentOS 7 and the user is root (a non-root user also works). All packages are installed under /opt
Java must be installed on all nodes
Download the JDK matching the version you need (the download step itself is skipped here). Java official website: https://www.oracle.com/java/technologies/downloads/
tar xf jdk-8u181-linux-x64.tar.gz -C /opt/
echo 'export JAVA_HOME=/opt/jdk1.8.0_181/
export PATH=${PATH}:${JAVA_HOME}/bin' >>/etc/profile
source /etc/profile
java -version
Install ZooKeeper
zookeeper is installed on emr-header-01, emr-header-02 and emr-worker-01 nodes
Official download address of zookeeper: https://zookeeper.apache.org/releases.html
emr-header-01 node operation:
# Upload after downloading; the download step is skipped here
tar xf zookeeper-3.4.13.tar.gz -C /opt/
Modify configuration
cd /opt/zookeeper-3.4.13/conf/
vim zoo.cfg
All configurations:
minSessionTimeout=16000
maxSessionTimeout=300000
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/datalog
clientPort=2181
server.1=emr-header-01:2888:3888
server.2=emr-header-02:2888:3888
server.3=emr-worker-01:2888:3888
Add environment variables
echo 'ZK_HOME=/opt/zookeeper-3.4.13' >>/etc/profile
echo 'PATH=$PATH:${ZK_HOME}/bin/' >>/etc/profile
source /etc/profile
Distribute to other nodes
scp -r /opt/* emr-header-02:/opt/
scp -r /opt/* emr-worker-01:/opt/
scp -r /opt/* emr-worker-02:/opt/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/hosts emr-header-02:/etc/
scp /etc/hosts emr-worker-01:/etc/
scp /etc/hosts emr-worker-02:/etc/
All nodes execute
mkdir -p /data/zookeeper/data
cd /data/zookeeper/data
Note: the content written here differs per node
The number in each myid file is globally unique and cannot be chosen arbitrarily
For example, the 1 written on emr-header-01 corresponds to server.1=emr-header-01:2888:3888 in the configuration above
# emr-header-01 node
echo "1" >myid
# emr-header-02 node
echo "2" >myid
# emr-worker-01 node
echo "3" >myid
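Equivalently, the three myid files can be written in one pass from emr-header-01 over SSH (a sketch, assuming the passwordless SSH set up earlier):

i=1
for h in emr-header-01 emr-header-02 emr-worker-01; do
  ssh root@$h "mkdir -p /data/zookeeper/data && echo $i > /data/zookeeper/data/myid"
  i=$((i+1))
done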
Start ZooKeeper (execute on all three ZooKeeper nodes)
source /etc/profile;zkServer.sh start
Check status
zkServer.sh status
If two of the three nodes report Mode: follower and one reports Mode: leader, the cluster is up
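For ZooKeeper 3.4.x the status output looks roughly like the following (paths will differ):

ZooKeeper JMX enabled by default
Using config: /opt/zookeeper-3.4.13/bin/../conf/zoo.cfg
Mode: follower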
Install Hadoop
Official website address: https://archive.apache.org/dist/hadoop/common/
Here the Tsinghua mirror is used for the download: https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
emr-header-01 execution
curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.1/hadoop-2.10.1.tar.gz
tar xf hadoop-2.10.1.tar.gz -C /opt/
cd /opt/hadoop-2.10.1/etc/hadoop/
Check whether the fuser command exists; if it does not, install it (on all nodes)
In sshfence mode, the standby must SSH to the failed NameNode host and kill the NameNode process there to prevent split-brain, and fuser is the command used for that kill. Without it, even running hadoop-daemon.sh stop namenode on the master in sshfence mode will not result in a successful failover, let alone a failure of the primary node's host.
yum install -y psmisc
Start modifying configuration
If your host names or their resolution differ, modify the corresponding entries in the files
Note the yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores parameters in yarn-site.xml, which set the maximum memory and number of CPU cores this node may offer to YARN; here the memory is set to half of the host's RAM
Besides the high-availability settings, the configuration also includes other items, such as enabling log aggregation and setting the node's maximum available resources. You can add the settings individually or simply overwrite all of your previous configuration. Tested with Hadoop 2.10.1
hadoop-env.sh: insert the following as the first line, or modify the JAVA_HOME variable in the file
source /etc/profile
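Either of the following is a way to do that (a sketch assuming the install paths above; GNU sed syntax):

# Option 1: prepend the profile sourcing as the first line of hadoop-env.sh
sed -i '1i source /etc/profile' /opt/hadoop-2.10.1/etc/hadoop/hadoop-env.sh
# Option 2: set JAVA_HOME explicitly in hadoop-env.sh instead
# export JAVA_HOME=/opt/jdk1.8.0_181/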
Contents of the slaves file (delete the original localhost)
# Write all worker (slave) node host names in this file
emr-worker-01
emr-worker-02
core-site.xml contents. Note that the file must contain only one <configuration> element; do not duplicate it
<configuration>
  <!-- Assemble the NameNodes into a logical cluster named mycluster -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- Directory where Hadoop stores files generated at run time -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/tmp</value>
  </property>
  <!-- ZooKeeper cluster address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- How Reducers obtain data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Address of the YARN ResourceManager -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>emr-header-01</value>
  </property>
  <!-- Maximum number of CPU cores this node may offer -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <!-- Maximum physical memory this node may offer -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
  <!-- Maximum number of CPU cores a single task may request -->
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>4</value>
  </property>
  <!-- Maximum amount of memory a single task may request -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>3072</value>
  </property>
  <!-- Minimum number of CPU cores a single task may request -->
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
  </property>
  <!-- Minimum amount of memory a single task may request -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Keep aggregated logs for 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster id shared by the two ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn</value>
  </property>
  <!-- Logical list of ResourceManagers -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- ========== rm1 configuration ========== -->
  <!-- Host name of rm1 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>emr-header-01</value>
  </property>
  <!-- Web UI address of rm1 -->
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>emr-header-01:8088</value>
  </property>
  <!-- Internal communication address of rm1 -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>emr-header-01:8032</value>
  </property>
  <!-- Address ApplicationMasters use to request resources from rm1 -->
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>emr-header-01:8030</value>
  </property>
  <!-- Address NodeManagers use to connect to rm1 -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>emr-header-01:8031</value>
  </property>
  <!-- ========== rm2 configuration ========== -->
  <!-- Host name of rm2 -->
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>emr-header-02</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>emr-header-02:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>emr-header-02:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>emr-header-02:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>emr-header-02:8031</value>
  </property>
  <!-- ZooKeeper cluster address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>emr-header-01:2181,emr-header-02:2181,emr-worker-01:2181</value>
  </property>
  <!-- Enable automatic recovery -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store the ResourceManager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <!-- Environment variables inherited by containers -->
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <!-- Number of replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Secondary NameNode host configuration -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
  <!-- NameNode data storage directory -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/name</value>
  </property>
  <!-- DataNode data storage directory -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/data</value>
  </property>
  <!-- JournalNode data storage directory -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>${hadoop.tmp.dir}/jn</value>
  </property>
  <!-- Logical name of the HA cluster -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- NameNodes in the cluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2,nn3</value>
  </property>
  <!-- RPC addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>emr-header-01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>emr-header-02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn3</name>
    <value>emr-worker-01:8020</value>
  </property>
  <!-- HTTP addresses of the NameNodes -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>emr-header-01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>emr-header-02:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn3</name>
    <value>emr-worker-01:9870</value>
  </property>
  <!-- Location on the JournalNodes where the NameNode edit log is stored -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://emr-header-01:8485;emr-header-02:8485;emr-worker-01:8485/mycluster</value>
  </property>
  <!-- Failover proxy class: how clients determine which NameNode is active -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing method, so that only one NameNode can respond at a time -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>shell(/bin/true)</value>
  </property>
  <!-- SSH private key used by the fencing mechanism -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
  </property>
  <!-- Enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
mapred-site.xml; you first need to create it from the template: cp mapred-site.xml.template mapred-site.xml
<configuration>
  <!-- Number of replicas -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Secondary NameNode host configuration -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>emr-header-01:50090</value>
  </property>
</configuration>
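In a typical HA setup, mapred-site.xml is also where MapReduce is told to run on YARN. A minimal sketch of such an addition (an assumption, not part of the original configuration above):

<!-- Run MapReduce jobs on YARN rather than the local runner -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>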
Edit /opt/hadoop-2.10.1/sbin/start-yarn.sh
# start resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# Insert the following line so that the ResourceManager process on emr-header-02 is started at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
# start nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
# start proxyserver
#"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start proxyserver
Edit /opt/hadoop-2.10.1/sbin/stop-yarn.sh
# stop resourceManager
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# Insert the following line so that the ResourceManager process on emr-header-02 is stopped at the same time
ssh emr-header-02 "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop resourcemanager
# stop nodeManager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR stop nodemanager
# stop proxy server
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR stop proxyserver
Add environment variables
echo 'export HADOOP_HOME=/opt/hadoop-2.10.1/' >>/etc/profile
echo 'export PATH=${PATH}:${HADOOP_HOME}/sbin/:${HADOOP_HOME}/bin/' >>/etc/profile
source /etc/profile
Copy to other nodes (emr-header-01 node operation)
# Copy the SSH keys to the other nodes, because the dfs.ha.fencing.ssh.private-key-files parameter above points to the key file
scp -r /root/.ssh/* emr-header-02:/root/.ssh/
scp /etc/profile emr-header-02:/etc/
scp /etc/profile emr-worker-01:/etc/
scp /etc/profile emr-worker-02:/etc/
scp -r /opt/hadoop-2.10.1/ emr-header-02:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-01:/opt/
scp -r /opt/hadoop-2.10.1/ emr-worker-02:/opt/
Start
Refresh environment variables for all nodes
source /etc/profile
Execute on the three nodes emr-header-01, emr-header-02 and emr-worker-01
hadoop-daemon.sh start journalnode
To check the startup status, look at the log; the path is of the form /opt/hadoop-2.10.1/logs/hadoop-root-journalnode-emr-header-01.log (the file name differs per node)
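A quick way to confirm the daemon is running on each node (a sketch; jps lists the Java processes started above):

# The JournalNode process should appear in the jps output
jps | grep -i journalnode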
emr-header-01 node execution
# If this step is repeated, you will be prompted to enter y
hdfs zkfc -formatZK
hdfs namenode -format
hadoop-daemon.sh start namenode
Execute on the emr-header-02 and emr-worker-01 nodes (to synchronize the NameNode metadata)
hdfs namenode -bootstrapStandby
Run on the emr-header-01 node
stop-all.sh
Start cluster
start-all.sh
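With the configuration above, running jps on each node should show roughly the following daemons (a sketch of the expected layout, not captured output):

# emr-header-01, emr-header-02, emr-worker-01: NameNode, DFSZKFailoverController, JournalNode, QuorumPeerMain (ZooKeeper)
# emr-header-01, emr-header-02 additionally: ResourceManager
# emr-worker-01, emr-worker-02: DataNode, NodeManager
jps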
Web page access
It is recommended to add the hosts entries to your local machine as well. Access still works without them, but when you open the standby ResourceManager (i.e. the YARN web UI), it redirects to the host name of the active node
20.88.10.31 emr-header-01
20.88.10.32 emr-header-02
20.88.10.33 emr-worker-01
20.88.10.34 emr-worker-02
Access the HDFS web UI
# The HDFS web UI is available on every running NameNode
http://20.88.10.31:9870/
http://20.88.10.32:9870/
http://20.88.10.33:9870/
On the HDFS page, the first line Overview 'emr-header-01:8020' (active) indicates the active NameNode, and Overview 'emr-header-02:8020' (standby) indicates a standby one
# YARN or MapReduce
http://20.88.10.31:8088/
View node status
rm1 and rm2 come from the configuration above: rm1 is the emr-header-01 node and rm2 is the emr-header-02 node
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm1
standby
[root@emr-header-01 ~]# yarn rmadmin -getServiceState rm2
active
[root@emr-header-01 ~]#
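The HDFS side can be checked the same way, using the NameNode ids nn1/nn2/nn3 defined in hdfs-site.xml:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3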
Test
According to the output above, the active HDFS NameNode is on emr-header-01, while the active YARN ResourceManager is on emr-header-02
# On emr-header-01, stop the NameNode. Note: hadoop-daemon.sh, not hadoop-daemons.sh
hadoop-daemon.sh stop namenode
# On emr-header-02, stop the ResourceManager
yarn-daemon.sh stop resourcemanager
The YARN web UI on emr-header-02 is no longer accessible, while http://emr-header-01:8088/cluster now is
The HDFS web UI on emr-header-01 is no longer accessible; access emr-header-02 instead at http://emr-header-02:9870/dfshealth.html#tab-overview, which now shows active status
When the stopped services are restarted, they automatically come back as standby nodes. Start commands (run the start on the same node where the corresponding stop was executed):
yarn-daemon.sh start resourcemanager
hadoop-daemon.sh start namenode
Once the tests above pass, you can power off the machine of any primary node at will to simulate a failure and check whether failover occurs. With dfs.ha.fencing.methods set to the shell mode the switch works, whereas sshfence cannot switch; the reason is explained at the beginning. By comparison, the shell mode takes longer to switch between active and standby, so you need to wait a little (within about a minute in testing), while sshfence fails over quickly when only the active NameNode process dies. The trade-off is that the shell mode also tolerates the failure of the host itself.
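As a final smoke test after a failover, a small example job can be submitted to confirm that HDFS and YARN are still usable (a sketch; the jar path assumes the Hadoop 2.10.1 tarball layout and that MapReduce is configured to run on YARN):

# Compute pi with 2 mappers and 10 samples each
yarn jar /opt/hadoop-2.10.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar pi 2 10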