HDFS - operation and maintenance

1, Add node

  1. Operating system configuration: ① host name, network, firewall; ② SSH configuration (ssh-keygen -t rsa). The authorized_keys file from any existing node in the cluster can also be distributed to the new node.
  2. Add the host name mapping for the new node to the /etc/hosts file of all nodes.
  3. Copy the NameNode's configuration files to the new node: package the JDK with tar cf and transfer it together with the Hadoop directory via scp; also scp the environment variable file (paths such as JAVA_HOME and HADOOP_HOME).
  4. Add the new node to the slaves file of all master nodes. The slaves file does not need to be refreshed.
  5. Start the DataNode and NodeManager on the new node separately: hadoop-daemon.sh start datanode and yarn-daemon.sh start nodemanager. After startup, check the cluster status with hdfs dfsadmin -report and yarn rmadmin.
  6. Data load balancing (to avoid hot spots caused by new data being written to this node): ① try to run this operation on a DataNode node; ② it is better to limit the balancer bandwidth and specify the servers before execution. Run start-balancer.sh -threshold 5 (the threshold is the allowed difference between a single node's utilization and the cluster average).
  7. Adjust the replication factor (optional): change the value of dfs.replication in hdfs-site.xml and confirm the files are healthy with hdfs fsck /. The adjusted replication factor only affects files added later. You can also use hdfs dfs -setrep -w 3 -R /lin (the replication factor can be greater than the number of nodes; do not run this on the root directory). A command sketch for steps 5-7 follows this list.
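
A minimal command sketch consolidating steps 5-7, run on or from the new node (the balancer bandwidth value and the /lin path are illustrative, not prescriptive):

    # Start the DataNode and NodeManager daemons on the new node
    hadoop-daemon.sh start datanode
    yarn-daemon.sh start nodemanager

    # Verify that the node has joined the cluster
    hdfs dfsadmin -report
    yarn node -list

    # Optionally cap the bandwidth the balancer may use (bytes per second; example value),
    # then rebalance with at most 5% deviation from the average utilization
    hdfs dfsadmin -setBalancerBandwidth 10485760
    start-balancer.sh -threshold 5

    # Optionally adjust the replication factor of an existing directory and re-check it
    hdfs dfs -setrep -w 3 -R /lin
    hdfs fsck /lin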

2, Node offline

Typical reasons for taking a node offline: hardware failure, or tasks running on this node struggle to complete.

Make sure no tasks are running before decommissioning; otherwise the decommission will be very slow.

  1. Modify the hdfs-site.xml configuration of the Master node and add the dfs.hosts.exclude parameter
  2. Modify the yarn-site.xml configuration of the Master node and add the yarn.resourcemanager.nodes.exclude-path parameter
  3. Modify the mapred-site.xml configuration of the Master node and add the mapreduce.jobtracker.hosts.exclude.filename parameter
  4. Create a new exclude file and add the host names to be removed (the three parameters above point to this file)
  5. Execute refreshNodes for the configuration to take effect: hdfs dfsadmin -refreshNodes and yarn rmadmin -refreshNodes. A sketch of the whole flow follows this list.
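
A minimal sketch of the decommission flow, assuming the exclude file is /opt/hadoop/etc/hadoop/excludes and the host being removed is slave3 (both names are illustrative):

    # One host name per line in the exclude file referenced by the three parameters above
    echo 'slave3' >> /opt/hadoop/etc/hadoop/excludes

    # dfs.hosts.exclude, yarn.resourcemanager.nodes.exclude-path and
    # mapreduce.jobtracker.hosts.exclude.filename all point to this file

    # Apply the change without restarting the daemons
    hdfs dfsadmin -refreshNodes
    yarn rmadmin -refreshNodes

    # Watch the progress: the node moves to "Decommission in progress"
    # and finally "Decommissioned" in the report
    hdfs dfsadmin -report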

3, Federation

Background: a single Namenode is overloaded

Directories are distributed among multiple NameNodes so that workloads do not interfere with each other:

  • Assign one NameNode to manage the HBase directory
  • Assign one NameNode to manage the Hive directory
  • Allocate the other NameNodes by business line or department

Configuration steps

  1. core-site.xml
  • Use the viewfs protocol to configure the logical nameservice (NS) name that the Federation cluster exposes externally
  • Modify the core-site.xml configuration
      Reference the mount-table file cmt.xml via XInclude:
    <configuration  xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include  href="cmt.xml"/>
      <property>
        <name>fs.defaultFS</name>
        <value>viewfs://nsX</value>
      </property>
    
      <property>
        <name>hadoop.tmp.dir</name>
        <value>$HADOOP_HOME/tmp</value>
      </property>
    
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>$HADOOP_HOME/journalnode/data</value>
      </property>
    
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>slave1:2181,slave2:2181,hadoop04:2181</value>
      </property>
    </configuration>
    
  2. cmt.xml Configure the mapping between virtual paths and physical paths
    // Mount the cluster's /view_ns1 directory onto the root directory of namenode1 (nameservice ns1)
    // That is, operations on files under /view_ns1 only involve namenode1
    <property>
      <name>fs.viewfs.mounttable.nsX.link./view_ns1</name>
      <value>hdfs://ns1/</value>  // according to the plan, the master is namenode1
    </property>
    
    <property>
      <name>fs.viewfs.mounttable.nsX.link./view_ns2</name>
      <value>hdfs://ns2/</value>  // according to the plan, the master is namenode2
    </property>
    
  3. hdfs-site.xml Configure the two nameservices as well as their RPC and HTTP service addresses
    <property>
      <name>dfs.nameservices</name>
      <value>ns1,ns2</value>
    </property>
    
    ## Configure ha of nameservice1
    <property>
      <name>dfs.ha.namenodes.ns1</name>
      <value>nn1,nn2</value>
    </property>
    
    ## Configure ha of nameservice2
    <property>
      <name>dfs.ha.namenodes.ns2</name>
      <value>nn3,nn4</value>
    </property>
    
    ## Configure the RPC communication ports of these four NameNodes
    <property>
      <name>dfs.namenode.rpc-address.ns1.nn1</name>
      <value>master:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2.nn3</name>
      <value>slave1:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns1.nn2</name>
      <value>hadoop04:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.ns2.nn4</name>
      <value>slave2:9000</value>
    </property>
    
    ## Configure the HTTP communication ports of the four NameNodes
    <property>
      <name>dfs.namenode.http-address.ns1.nn1</name>
      <value>master:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.ns2.nn3</name>
      <value>slave1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.ns1.nn2</name>
      <value>hadoop04:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.ns2.nn4</name>
      <value>slave2:50070</value>
    </property>
    
    ## Configure the shared edit log (JournalNode) address of the four NameNodes;
    ## each NameNode's hdfs-site.xml keeps only the value for its own nameservice
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      ## nn1 and nn2 of ns1 use this value
      <value>qjournal://hadoop04:8485;slave1:8485;slave2:8485/ns1</value>
      ## nn3 and nn4 of ns2 use this value
      <value>qjournal://hadoop04:8485;slave1:8485;slave2:8485/ns2</value>
    </property>
    
    ## Configure client failover (proxy provider)
    <property>
      <name>dfs.client.failover.proxy.provider.ns1</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.ns2</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    
    ## Configure HA fencing
     <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
     <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
     <property>
      <name>dfs.ha.fencing.ssh.connect-timeout</name>
      <value>30000</value>
    </property>
     <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    
  4. Create the corresponding physical paths
  5. Start the services
    • Synchronize all configuration files under etc/hadoop to all NameNodes

    • Format each NameNode separately, specifying the same cluster id: hdfs namenode -format -clusterid hd260

    • Format the ZooKeeper failover state: execute hdfs zkfc -formatZK on all NameNode nodes

    • Initialize the JournalNodes: hdfs namenode -initializeSharedEdits

    • Bootstrap the standby NameNodes: after the active NameNodes have been started on the master servers, execute hdfs namenode -bootstrapStandby on all standby servers (see the sketch below)
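
      A hedged sketch of the start-up sequence above, assuming the host roles from the hdfs-site.xml sample (master/hadoop04 serve ns1, slave1/slave2 serve ns2); adapt the hosts to your own plan:

        # 1. Start the JournalNodes (on hadoop04, slave1 and slave2)
        hadoop-daemon.sh start journalnode

        # 2. Format each active NameNode with the same cluster id, then start it
        #    (on master for ns1 and on slave1 for ns2)
        hdfs namenode -format -clusterid hd260
        hadoop-daemon.sh start namenode

        # 3. Format the ZooKeeper failover state and initialize the shared edits
        hdfs zkfc -formatZK
        hdfs namenode -initializeSharedEdits

        # 4. Bootstrap and start the standby NameNodes (hadoop04 for ns1, slave2 for ns2)
        hdfs namenode -bootstrapStandby
        hadoop-daemon.sh start namenode

        # 5. Start ZKFC and the DataNodes, then check the HA state
        hadoop-daemon.sh start zkfc
        hadoop-daemons.sh start datanode
        hdfs haadmin -getServiceState nn1    # add -ns ns1 when multiple nameservices are configured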

  6. Access HDFS (see the example below)
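
      A short usage example, assuming the mount table above (local.txt is an illustrative file name):

        # Listing the viewfs root shows the mount points /view_ns1 and /view_ns2
        hdfs dfs -ls /

        # Files under /view_ns1 live in ns1, files under /view_ns2 live in ns2
        hdfs dfs -put local.txt /view_ns1/
        hdfs dfs -ls viewfs://nsX/view_ns2

        # A nameservice can still be addressed directly when needed
        hdfs dfs -ls hdfs://ns1/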

4, Common commands and tips

  1. which java / which hadoop: show the installation directories

  2. du -ms *, df -h, free: check disk and memory usage

  3. yarn rmadmin -refreshNodes: refresh the node list

  4. hdfs dfs -ls -R /lin: recursively list all files under this directory

  5. Check the status of cluster nodes with hdfs dfsadmin -report; check whether files are healthy with hdfs fsck /

  6. Stop specific HDFS services on specific hosts, executing the commands in order:

     hadoop-daemons.sh --hostnames 'slave1 slave2' stop datanode
     hadoop-daemons.sh --hostnames 'master hadoop04' stop namenode
     hadoop-daemons.sh --hostnames 'slave1 slave2 hadoop04' stop journalnode
     hadoop-daemon.sh stop zkfc

     Start-up:

     hadoop-daemons.sh --hostnames 'slave1 slave2' start namenode
     hadoop-daemons.sh start datanode      # by default, starts on all DataNodes

     hadoop-daemons.sh --hostnames 'slave1 slave2 slave*' start zkfc

5, Summary

This note records the common Hadoop operation and maintenance procedures for the convenience of future maintenance work.
