Building High Availability (HA)

Configure High Availability

1. Install zookeeper

2. Edit zoo.cfg in the conf folder under the ZooKeeper installation directory

If it does not exist, copy the sample file (zoo_sample.cfg) to zoo.cfg. In it, add the IP addresses of the three machines, create the data directory, create a myid file under that directory, and write 1, 2 and 3 into the myid files, corresponding to the first, second and third machine respectively (a sketch of these commands follows the config snippet below).

dataDir=/home/hadoop/apps/zkdata

server.1=192.168.80.10:2888:3888
server.2=192.168.80.11:2888:3888
server.3=192.168.80.12:2888:3888
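
As a rough sketch (assuming the dataDir above, and that the machines are numbered 1, 2 and 3 in the order of the server.N entries), the data directory and myid file can be created on each machine like this:

# run on every machine; replace N with that machine's number (1, 2 or 3)
mkdir -p /home/hadoop/apps/zkdata
echo N > /home/hadoop/apps/zkdata/myid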

3. Start the service on each machine

zkServer.sh start

If startup fails, check /etc/hosts and comment out the first line; the failure is usually caused by that entry or by a version mismatch.
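
For example, a hypothetical /etc/hosts (hostnames and addresses here are assumptions based on the rest of this guide) might end up looking like this, with the conflicting first line commented out:

# 127.0.0.1 mini01        <- comment out a line like this that maps the hostname to the loopback address
127.0.0.1     localhost
192.168.80.10 mini01
192.168.80.11 mini02
192.168.80.12 mini03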

4. View status

zkServer.sh status
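
If the quorum is healthy, one machine should report itself as leader and the other two as follower; the output looks roughly like this (the config path is only an example):

ZooKeeper JMX enabled by default
Using config: /home/hadoop/apps/zookeeper/bin/../conf/zoo.cfg
Mode: follower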

5. Configure all of the configuration files

vi core-site.xml
<configuration>
	<!-- Specify the HDFS nameservice as qf -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://qf/</value>
	</property>
	<!-- Specify the Hadoop temporary directory -->
        <property>
        	<name>hadoop.tmp.dir</name>
        	<value>/home/hadoop/hdpdata</value>
	</property>
					
	<!-- Specify the ZooKeeper quorum addresses -->
	<property>
		<name>ha.zookeeper.quorum</name>
		<value>mini01:2181,mini02:2181,mini03:2181</value>
	</property>
</configuration>
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/name1</value>
  </property>

  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/apps/hadoop/data</value>
  </property>



  <!-- Specify the HDFS nameservice (logical name) -->
  <property>
    <name>dfs.nameservices</name>
    <value>qf</value>
  </property>

  <!-- Specify the NameNodes under the nameservice -->
  <property>
    <name>dfs.ha.namenodes.qf</name>
    <value>nn1,nn2</value>
  </property>  
  <!-- Specify the NameNode RPC (internal communication) addresses -->
  <property>
    <name>dfs.namenode.rpc-address.qf.nn1</name>
    <value>mini01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.qf.nn2</name>
    <value>mini02:9000</value>
  </property>

  <!-- Specify the NameNode web UI addresses -->
  <property>
    <name>dfs.namenode.http-address.qf.nn1</name>
    <value>mini01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.qf.nn2</name>
    <value>mini02:50070</value>
  </property>

  <!-- Specify the JournalNode shared edits directory -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://mini01:8485;mini02:8485;mini03:8485/qf</value>
  </property>

   <!-- JournalNode local storage directory -->
   <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/apps/hadoop/journaldata</value>
  </property>
  <!-- Enable automatic NameNode failover -->
   <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>

   <!-- Specify the class used by clients to locate the active NameNode during failover -->
   <property>
     <name>dfs.client.failover.proxy.provider.qf</name>
     <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
   </property>
   <!-- Fencing: when a split-brain occurs (more than one active NameNode), kill one of them so that only one stays active -->
   <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>

  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>  
   <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>



</configuration>
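
Before formatting anything, a quick sanity check is to ask Hadoop which nameservice and NameNodes it resolves from the files above, for example:

hdfs getconf -confKey dfs.nameservices   # should print: qf
hdfs getconf -namenodes                  # should list mini01 and mini02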

Building YARN on top of the HDFS high-availability (HA) setup

Configure the running environment
vi mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
vi yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67

##### Configure the configuration files
vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
vi yarn-site.xml
<configuration>
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Specify the RM cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Specify the RM logical names -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Specify the address of each RM -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>mini01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>mini02</value>
  </property>
  <!-- Specify the ZooKeeper cluster addresses -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>mini01:2181,mini02:2181,mini03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
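
Later, once the cluster is running, the failover state of the two ResourceManagers can be checked like this (each should report active or standby):

yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2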
Distribution:

scp command

scp -r hadoop-2.7.7 root@hadoop02:$PWD

scp -r hadoop-2.7.7 root@hadoop03:$PWD

Passwordless SSH:

Set up passwordless SSH between the machines (see the sketch below).

Also set up passwordless SSH between the two ResourceManagers.
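
A minimal sketch of the passwordless SSH setup, assuming the hadoop user and the mini01/mini02/mini03 hostnames used above:

# run at least on mini01 and mini02 (the NameNode/ResourceManager hosts)
ssh-keygen -t rsa                 # accept the defaults, no passphrase
ssh-copy-id hadoop@mini01
ssh-copy-id hadoop@mini02
ssh-copy-id hadoop@mini03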

Startup

Start zk
zkServer.sh start

jps view
2674 Jps
2647 QuorumPeerMain

Start the journalnode service
hadoop-daemon.sh start journalnode

jps view
2739 JournalNode
2788 Jps
2647 QuorumPeerMain

Select one of the two NameNodes and format it
hdfs namenode -format

Then start that NameNode (the one that was just formatted)
hadoop-daemon.sh start namenode

Pull metadata on another namenode

hdfs namenode -bootstrapStandby

Format ZKFC (on the NameNode nodes)
hdfs zkfc -formatZK

Start all services

start-all.sh
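
To confirm that HA is working, the state of the two NameNodes can be queried; one should be active and the other standby:

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2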

Note: every time the cluster is restarted, make sure the ZooKeeper service and the JournalNodes are started before bringing the other services back up.

Also, the NameNode and ZooKeeper should not be formatted frequently; once everything has been brought up successfully, try not to reinitialize it again.

Problems encountered this time:

Only one NameNode could be started at a time, and the one on hadoop02 would also go down after the one on hadoop01 went down. It finally turned out that when the NameNode and DataNode storage paths were deleted, the myid file under the ZooKeeper data directory was deleted as well, which caused the subsequent cluster startups to fail, and the ZooKeeper processes later had to be killed forcibly. In the end ZooKeeper was restarted and reformatted, and then the cluster was restarted. Note: be sure to start the services in the correct order.
