Environment:
1 core, 4 GB memory, 128 GB disk; three machines with the same configuration
192.168.242.131 192.168.242.132 192.168.242.133
——Linux CentOS 7 x64 as the system platform
——JDK, the runtime environment for the Java-based components
——Hadoop; the HBase data storage layer depends on HDFS
——ZooKeeper, for monitoring and coordination
Other dependencies:
sudo yum install -y net-tools
sudo yum install -y vim
sudo yum install -y wget
sudo yum install -y lrzsz
sudo yum install -y pcre pcre-devel
sudo yum install -y zlib zlib-devel
sudo yum install -y openssl openssl-devel
sudo yum install -y unzip
sudo yum install -y libtool
sudo yum install -y gcc-c++
sudo yum install -y telnet
sudo yum install -y tree
sudo yum install -y nano
sudo yum install -y psmisc
sudo yum install -y rsync
sudo yum install -y ntp
The JDK is installed directly with yum:
sudo yum install -y java-1.8.0-openjdk-devel.x86_64
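To confirm the installation and locate the exact JDK path (needed later for JAVA_HOME), the following quick check is one option; the version-specific directory will differ per machine:
# Verify the JDK installation
java -version
# One way to resolve the real install path behind the yum symlinks
readlink -f $(which javac) | sed 's:/bin/javac::'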
Any ZooKeeper version will do. Installation reference:
https://www.cnblogs.com/mindzone/p/15468883.html
Deployment differs between Hadoop 3 and Hadoop 2; this article covers the cluster deployment of Hadoop 2 separately.
The HBase version must match the Hadoop version; this is a common deployment pitfall.
HBase 1.3.1 works fine with Hadoop 2.7.2.
Hadoop 2.7.2 installation
Machine 1 (131) downloads the tarball:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz
Extract to the specified directory
mkdir -p /opt/module
tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
Configure environment variables for Hadoop and JDK
vim /etc/profile
Append the variables at the end (do this on the other machines as well):
# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# JAVA_HOME - check your own JDK version here; do not just copy-paste,
# use find / -name java to locate it
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export PATH=$PATH:$JAVA_HOME/bin
Make variables effective immediately:
source /etc/profile
Then test whether the variables took effect:
hadoop version
On success, Hadoop's version information is displayed:
[root@localhost ~]# hadoop version
Hadoop 2.7.2
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b165c4fe8a74265c792ce23f546c64604acf0e41
Compiled by jenkins on 2016-01-26T00:08Z
Compiled with protoc 2.5.0
From source with checksum d0fda26633fa762bff87ec759ebe689c
This command was run using /opt/module/hadoop-2.7.2/share/hadoop/common/hadoop-common-2.7.2.jar
Back up the configuration files:
# Back up the configuration files
cd /opt/module/hadoop-2.7.2/etc/hadoop/
cp -r core-site.xml core-site.xml.bak
cp -r hadoop-env.sh hadoop-env.sh.bak
cp -r hdfs-site.xml hdfs-site.xml.bak
cp -r mapred-env.sh mapred-env.sh.bak
cp -r mapred-site.xml mapred-site.xml.bak
cp -r yarn-env.sh yarn-env.sh.bak
cp -r yarn-site.xml yarn-site.xml.bak
core-site.xml
Declare the NameNode (master) address:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- appoint HDFS in NameNode Address of --> <property> <name>fs.defaultFS</name> <value>hdfs://192.168.242.131:9000</value> </property> <!-- appoint Hadoop Storage directory of files generated at run time --> <property> <name>hadoop.tmp.dir</name> <value>/opt/module/hadoop-2.7.2/data/tmp</value> </property> </configuration>
hadoop-env.sh
Just declare the JDK location
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
hdfs-site.xml
Define the replication factor and the SecondaryNameNode address:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <property> <name>dfs.replication</name> <value>3</value> </property> <!-- appoint Hadoop Secondary name node host configuration --> <property> <name>dfs.namenode.secondary.http-address</name> <value>192.168.242.133:50090</value> </property> </configuration>
mapred-env.sh
Declare JDK path
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
mapred-site.xml (if it does not exist yet, copy it from mapred-site.xml.template first)
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <!-- Put site-specific property overrides in this file. --> <configuration> <!-- appoint MR Run in Yarn upper --> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> </configuration>
yarn-env.sh can be left unchanged; the script picks up $JAVA_HOME directly.
yarn-site.xml
Specify the ResourceManager address:
<?xml version="1.0"?> <!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. --> <configuration> <!-- Site specific YARN configuration properties --> <!-- Reducer How to get data --> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!-- appoint YARN of ResourceManager Address of --> <property> <name>yarn.resourcemanager.hostname</name> <value>192.168.242.132</value> </property> </configuration>
Configure cluster node address
vim /opt/module/hadoop-2.7.2/etc/hadoop/slaves
List all the machine addresses, one per line; do not leave extra spaces or blank lines.
192.168.242.131
192.168.242.132
192.168.242.133
Then distribute Hadoop to the remaining machines
# With the xsync script
xsync /opt/module/hadoop-2.7.2

# Without the xsync script, copy with scp
scp -r /opt/module/hadoop-2.7.2 root@192.168.242.132:/opt/module/
scp -r /opt/module/hadoop-2.7.2 root@192.168.242.133:/opt/module/
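Before continuing, it can be worth confirming that the copy landed and that the environment variables also work on the other machines; a minimal check, assuming passwordless SSH is set up and /etc/profile has already been updated there as described above:
# Run from machine 1
ssh root@192.168.242.132 "ls /opt/module/ && source /etc/profile && hadoop version"
ssh root@192.168.242.133 "ls /opt/module/ && source /etc/profile && hadoop version"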
HDFS needs to be formatted for the first startup
hdfs namenode -format
If you need to format again, clear the data in the data directory first
rm -rf /opt/module/hadoop-2.7.2/data
hdfs namenode -format
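A quick way to confirm the format succeeded is to check that the NameNode metadata directory was created (its location follows from the hadoop.tmp.dir configured above):
# Should contain files such as VERSION, fsimage_* and seen_txid
ls /opt/module/hadoop-2.7.2/data/tmp/dfs/name/current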
Cluster deployment completed!
Hadoop cluster startup:
# Machine 1 starts HDFS
$HADOOP_HOME/sbin/start-dfs.sh

# Machine 2 starts YARN
$HADOOP_HOME/sbin/start-yarn.sh
Output on machine 1:
[root@192 ~]# $HADOOP_HOME/sbin/start-dfs.sh
Starting namenodes on [192.168.242.131]
192.168.242.131: starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-192.168.242.131.out
192.168.242.131: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-192.168.242.131.out
192.168.242.133: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-192.168.242.133.out
192.168.242.132: starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-192.168.242.132.out
Starting secondary namenodes [192.168.242.133]
192.168.242.133: starting secondarynamenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-192.168.242.133.out
[root@192 ~]#
Output on machine 2:
[root@192 ~]# $HADOOP_HOME/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-resourcemanager-192.168.242.132.out
192.168.242.133: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-192.168.242.133.out
192.168.242.131: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-192.168.242.131.out
192.168.242.132: starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-root-nodemanager-192.168.242.132.out
[root@192 ~]#
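With this configuration the daemons should be distributed as follows; running jps on each machine is a quick sanity check (the expected process lists below follow from the configuration above):
# Run on every machine
jps
# Expected, roughly:
# 192.168.242.131: NameNode, DataNode, NodeManager
# 192.168.242.132: ResourceManager, DataNode, NodeManager
# 192.168.242.133: SecondaryNameNode, DataNode, NodeManager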
Keep ZooKeeper running:
[root@192 ~]# zk-cluster status
---------- zookeeper 192.168.242.131 state ------------
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.7.0/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
---------- zookeeper 192.168.242.132 state ------------
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.7.0/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader
---------- zookeeper 192.168.242.133 state ------------
/usr/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/module/apache-zookeeper-3.7.0/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
[root@192 ~]#
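The zk-cluster command above is the custom script from the ZooKeeper article referenced earlier; without it, the same check can be done machine by machine (installation path assumed from the output above):
# Run on each machine
/opt/module/apache-zookeeper-3.7.0/bin/zkServer.sh status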
Hbase cluster installation:
Machine 1 downloads the tarball:
wget https://dlcdn.apache.org/hbase/stable/hbase-2.4.9-bin.tar.gz
Unpack to the specified directory
tar -zxvf hbase-2.4.9-bin.tar.gz -C /opt/module/
Back up the configuration files
cp -r /opt/module/hbase-2.4.9/conf/hbase-env.sh /opt/module/hbase-2.4.9/conf/hbase-env.sh.bak
cp -r /opt/module/hbase-2.4.9/conf/hbase-site.xml /opt/module/hbase-2.4.9/conf/hbase-site.xml.bak
cp -r /opt/module/hbase-2.4.9/conf/regionservers /opt/module/hbase-2.4.9/conf/regionservers.bak
hbase-env.sh
Append environment variable
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.322.b06-1.el7_9.x86_64
export HBASE_MANAGES_ZK=false
hbase-site.xml
1. Note that the port in hbase.rootdir must match the HDFS port in Hadoop's core-site.xml (in this article it is 9000; Ctrl+F for 9000 to verify).
2. Set the ZooKeeper dataDir to the path you actually configured for ZooKeeper; otherwise the HBase shell cannot find ZooKeeper at runtime (a quick check for this follows the snippet below).
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://192.168.242.131:9000/HBase</value>
</property>
<property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
</property>
<!-- Changed after 0.98; earlier versions did not have this port property. The default port is 60000 -->
<property>
    <name>hbase.master.port</name>
    <value>16000</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>192.168.242.131,192.168.242.132,192.168.242.133</value>
</property>
<property>
    <name>hbase.zookeeper.property.dataDir</name>
    <!-- Specify the ZooKeeper data directory -->
    <value>/opt/module/apache-zookeeper-3.7.0/zk-data</value>
</property>
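As a quick check for point 2 above, the dataDir configured here should match the one ZooKeeper actually uses (zoo.cfg path assumed from the installation above):
# The value should match hbase.zookeeper.property.dataDir above
grep dataDir /opt/module/apache-zookeeper-3.7.0/conf/zoo.cfg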
regionservers
No spaces or blank lines
192.168.242.131
192.168.242.132
192.168.242.133
Soft link Hadoop configuration file
ln -s /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml /opt/module/hbase-2.4.9/conf/core-site.xml
ln -s /opt/module/hadoop-2.7.2/etc/hadoop/hdfs-site.xml /opt/module/hbase-2.4.9/conf/hdfs-site.xml
At this point the HBase installation on machine 1 is complete; now distribute it to the remaining machines:
# With the xsync script
xsync /opt/module/hbase-2.4.9

# Without the xsync script, copy with scp
scp -r /opt/module/hbase-2.4.9 root@192.168.242.132:/opt/module/
scp -r /opt/module/hbase-2.4.9 root@192.168.242.133:/opt/module/
Server time synchronization:
Without time synchronization, the HBase shell reports an error when checking status.
Error description: the master node is reported as uninitialized. This does not block use, but the cause is inconsistent time across the cluster.
Reference:
https://blog.csdn.net/renzhewudi77/article/details/86301395
The solution is to synchronize the time across the cluster.
Use machine 1 as the unified time standard and make it the time server.
Install the ntp service (run the command once even if it is already installed):
sudo yum install -y ntp
First edit the ntp configuration on machine 1:
vim /etc/ntp.conf
Main contents:
# Change to your own network segment. For example, my network segment is 242
# (authorize all machines on the 192.168.1.0-192.168.1.255 segment to query and synchronize time from this machine)
# restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
# is changed to:
restrict 192.168.242.0 mask 255.255.255.0 nomodify notrap

# Comment out the lines below (a cluster inside a LAN should not use external Internet time)
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
# They become:
# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst
# server 2.centos.pool.ntp.org iburst
# server 3.centos.pool.ntp.org iburst

# Use the local clock as a fallback time source (when this node loses its network connection,
# it can still act as the time server and provide time synchronization for the other nodes in the cluster).
# Add the two lines below, e.g. after "# Enable writing of statistics records." / "#statistics clockstats cryptostats loopstats peerstats":
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Modify the system ntpd configuration:
vim /etc/sysconfig/ntpd
Add configuration item:
# Add the following (synchronize the hardware clock with the system time)
SYNC_HWCLOCK=yes
The rest is managing the ntpd service:
# View ntp status
service ntpd status
# Start the service
service ntpd start
# Enable start on boot
chkconfig ntpd on
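To verify that ntpd on machine 1 is serving time, list its time sources; with the configuration above, the local clock LOCAL(0) should appear:
ntpq -p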
The other machines only need a scheduled task that synchronizes with the time server.
# Create a scheduled task (run on every machine except machine 1; machine 1 acts as the time server)
crontab -e
# In the crontab editor, write the line below
# (the other machines synchronize with the time server once every 10 minutes)
*/10 * * * * /usr/sbin/ntpdate 192.168.242.131
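Before relying on the cron job, a one-off manual sync from one of the other machines confirms that machine 1 is reachable as a time server (ntpdate fails if ntpd is already running locally on that machine):
# Run on machine 2 or 3
/usr/sbin/ntpdate 192.168.242.131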
Start and stop Hbase cluster
/opt/module/hbase-2.4.9/bin/start-hbase.sh
/opt/module/hbase-2.4.9/bin/stop-hbase.sh
Startup:
[root@192 ~]# /opt/module/hbase-1.3.1/bin/start-hbase.sh
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
starting master, logging to /opt/module/hbase-1.3.1/bin/../logs/hbase-root-master-192.168.242.131.out
OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
192.168.242.133: starting regionserver, logging to /opt/module/hbase-1.3.1/bin/../logs/hbase-root-regionserver-192.168.242.133.out
192.168.242.132: starting regionserver, logging to /opt/module/hbase-1.3.1/bin/../logs/hbase-root-regionserver-192.168.242.132.out
192.168.242.131: starting regionserver, logging to /opt/module/hbase-1.3.1/bin/../logs/hbase-root-regionserver-192.168.242.131.out
192.168.242.133: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
192.168.242.133: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
192.168.242.133: OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
192.168.242.132: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
192.168.242.132: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
192.168.242.132: OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
192.168.242.131: OpenJDK 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
192.168.242.131: OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
192.168.242.131: OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N
[root@192 ~]#
Access the HBase shell and check whether it works normally.
# Open the shell
/opt/module/hbase-2.4.9/bin/hbase shell
View status
status
Output information:
hbase(main):002:0> status
1 active master, 0 backup masters, 3 servers, 1 dead, 1.0000 average load
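Beyond status, a minimal smoke test in the same shell confirms that reads and writes actually go through HBase and HDFS; the table and column-family names here are just examples:
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'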
Web UI address:
http://192.168.242.131:16010
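If a browser is not handy, a quick curl from any machine verifies that the Master web UI is reachable (an HTTP 200 means it is up):
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.242.131:16010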