Configure High Availability
1. Install ZooKeeper.
2. Edit zoo.cfg in the conf folder under the ZooKeeper installation directory.
If it does not exist, copy zoo_sample.cfg to zoo.cfg. Add the IP addresses of the three machines, create the data directory, and create a myid file under that directory containing 1, 2, or 3 for the first, second, and third machine respectively (see the sketch after the server list below).
dataDir=/home/hadoop/apps/zkdata
server.1=192.168.80.10:2888:3888
server.2=192.168.80.11:2888:3888
server.3=192.168.80.12:2888:3888
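A minimal sketch of creating the data directory and myid files, assuming the dataDir above and that server.1/2/3 correspond to the three machines in order; run it on each machine with that machine's own number:
# on the machine acting as server.1 (use 2 and 3 on the other two machines)
mkdir -p /home/hadoop/apps/zkdata
echo 1 > /home/hadoop/apps/zkdata/myid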
3. Start the service on each machine
zkServer.sh start
If startup fails, check /etc/hosts and comment out the first line; the failure is usually caused by the hosts file or by a version mismatch.
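For reference, a hedged sketch of what /etc/hosts might look like after the fix; mapping mini01/mini02/mini03 to the 192.168.80.x addresses from the server list above is an assumption about this particular setup:
# 127.0.0.1 ...        first line commented out, as described above
192.168.80.10 mini01
192.168.80.11 mini02
192.168.80.12 mini03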
4. View status
zkServer.sh status
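If the ensemble is healthy, the status output reports one leader and two followers, roughly like this (trimmed; exact wording may vary by ZooKeeper version):
Mode: leader      # on one of the three machines
Mode: follower    # on the other two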
5. Configure all the configuration files
vi core-site.xml
<configuration>
  <!-- Specify the HDFS nameservice (qf) -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://qf/</value>
  </property>
  <!-- Specify the Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdpdata</value>
  </property>
  <!-- Specify the zookeeper addresses -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>mini01:2181,mini02:2181,mini03:2181</value>
  </property>
</configuration>
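Note that fs.defaultFS points at the nameservice qf rather than at a single NameNode host, so clients keep working after a failover. As a hedged illustration (once the cluster is running later in this guide), a client addresses HDFS through the nameservice:
hdfs dfs -ls hdfs://qf/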
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/name1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/apps/hadoop/data</value>
  </property>
  <!-- Specify the HDFS nameservice (virtual service name) -->
  <property>
    <name>dfs.nameservices</name>
    <value>qf</value>
  </property>
  <!-- Specify the namenodes under the nameservice -->
  <property>
    <name>dfs.ha.namenodes.qf</name>
    <value>nn1,nn2</value>
  </property>
  <!-- Specify the namenode RPC addresses -->
  <property>
    <name>dfs.namenode.rpc-address.qf.nn1</name>
    <value>mini01:9000</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.qf.nn2</name>
    <value>mini02:9000</value>
  </property>
  <!-- Specify the namenode web UI addresses -->
  <property>
    <name>dfs.namenode.http-address.qf.nn1</name>
    <value>mini01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.qf.nn2</name>
    <value>mini02:50070</value>
  </property>
  <!-- Specify the journalnode shared edits directory -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://mini01:8485;mini02:8485;mini03:8485/qf</value>
  </property>
  <!-- Local storage directory for the journalnodes -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/home/hadoop/apps/hadoop/journaldata</value>
  </property>
  <!-- Enable automatic namenode failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Class used by clients to fail over between namenodes -->
  <property>
    <name>dfs.client.failover.proxy.provider.qf</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fencing: on split brain, kill one namenode so only one stays active -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
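Because sshfence logs in over SSH with the key configured above, it is worth a quick sanity check that the key allows a passwordless login from one NameNode to the other; the hadoop user and the mini02 target are assumptions taken from the configuration:
# run on mini01 as the hadoop user; it should print mini02's hostname without asking for a password
ssh -i /home/hadoop/.ssh/id_rsa hadoop@mini02 hostname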
Build YARN high availability on top of the HDFS HA setup
Configure the running environment
vi mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
vi yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
Configure the configuration files
vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
vi yarn-site.xml
<configuration>
  <!-- Enable RM high availability -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Specify the RM cluster id -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <!-- Specify the RM names -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Specify the address of each RM -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>mini01</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>mini02</value>
  </property>
  <!-- Specify the zk cluster address -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>mini01:2181,mini02:2181,mini03:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Distribution:
Copy the configured hadoop-2.7.7 directory to the other nodes with scp:
scp -r hadoop-2.7.7 root@hadoop02:$PWD
scp -r hadoop-2.7.7 root@hadoop03:$PWD
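For later configuration changes it is usually enough to resync just the files under etc/hadoop instead of the whole installation; a hedged variant of the same scp command:
scp hadoop-2.7.7/etc/hadoop/* root@hadoop02:$PWD/hadoop-2.7.7/etc/hadoop/
scp hadoop-2.7.7/etc/hadoop/* root@hadoop03:$PWD/hadoop-2.7.7/etc/hadoop/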
Passwordless SSH:
Set up passwordless SSH between the machines.
Also set up passwordless SSH between the two ResourceManagers (a sketch follows below).
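A minimal sketch of the passwordless-SSH setup, assuming the hadoop user and the mini01/mini02/mini03 hostnames used in the configuration files; repeat it from each machine that needs to log in to the others:
# generate a key pair once per user (press Enter through the prompts)
ssh-keygen -t rsa
# copy the public key to every node, including the local one
ssh-copy-id hadoop@mini01
ssh-copy-id hadoop@mini02
ssh-copy-id hadoop@mini03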
Startup
Start ZooKeeper
zkServer.sh start
Check with jps:
2674 Jps
2647 QuorumPeerMain
Start the journalnode service
hadoop-daemon.sh start journalnode
Check with jps:
2739 JournalNode
2788 Jps
2647 QuorumPeerMain
Select one of the two NameNodes and format it:
hdfs namenode -format
Then start that NameNode (the one that was just formatted):
hadoop-daemon.sh start namenode
Pull the metadata on the other NameNode:
hdfs namenode -bootstrapStandby
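As a quick check that the standby actually pulled the metadata (the /home/hadoop/name1 path is the dfs.namenode.name.dir configured above):
# run on the second namenode; a fsimage file should now be present
ls /home/hadoop/name1/current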
Format ZKFC (on a NameNode node)
hdfs zkfc -formatZK (the ZKFC daemons run on the two NameNode nodes)
Start all services
start-all.sh
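Once the services are up, the HA state can be verified with the standard admin commands below; the nn1/nn2 and rm1/rm2 ids come from the configuration above. Note that with Hadoop 2.x, start-yarn.sh normally starts a ResourceManager only on the node where it is run, so the standby RM on mini02 may need to be started by hand (an assumption about this particular setup):
# check which namenode is active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# on mini02: start the standby ResourceManager if it is not running, then check both RMs
yarn-daemon.sh start resourcemanager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2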
Note: every time the cluster is restarted, make sure the ZooKeeper and JournalNode services are started before the other services.
Also, do not format the NameNode or ZooKeeper frequently; once the cluster has been brought up successfully, try not to reinitialize it.
Problem encountered this time:
Only one NameNode would start at a time, and hadoop02 would also go down after hadoop01 went down. It finally turned out that when the NameNode and DataNode storage paths were deleted, the myid file under the ZooKeeper data directory had been deleted along with them, which made the subsequent cluster starts fail; the leftover ZooKeeper processes could only be killed forcibly. In the end ZooKeeper was restarted and the formatting was redone, and the cluster was brought back up. Note: be sure to start the services in order.