Apache HBase 2.x cluster deployment

Introduction to HBase

HBase is short for Hadoop Database. The HBase project was started by Chad Walters and Jim Kellerman of Powerset at the end of 2006 and designed after the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. The first version was released in October 2007, and in May 2010 HBase was promoted from a Hadoop subproject to an Apache top-level project.

HBase is a distributed, column-oriented open-source database (more precisely, column-family oriented). Unlike a typical relational database, HBase is suited to storing unstructured data and organizes data by column rather than by row.

HDFS provides reliable underlying storage for HBase, MapReduce provides it with high-performance computing power, and ZooKeeper provides coordination services and the failover mechanism. In short, HBase is a distributed database solution for high-speed storage and retrieval of massive data on large numbers of inexpensive machines.

HBase cluster installation

In a distributed configuration, the cluster contains multiple nodes, each running one or more HBase daemons: active and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.

This advanced QuickStart adds two additional nodes to your cluster. The architecture is as follows:

Node Name   Master   ZooKeeper   RegionServer
hadoop01    yes      yes         yes
hadoop02    backup   yes         yes
hadoop03    backup   yes         yes

Configure passwordless SSH access: the HMaster must be able to connect to the other nodes over SSH in order to start their daemons.
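A minimal sketch of the passwordless-SSH setup, assuming the cluster user is root and the hostnames from the table above:

```shell
# Generate a key pair on hadoop01 (skip if one already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Copy the public key to every node, including hadoop01 itself
for host in hadoop01 hadoop02 hadoop03; do
  ssh-copy-id "root@${host}"
done

# Verify: each login should succeed without a password prompt
for host in hadoop01 hadoop02 hadoop03; do
  ssh "root@${host}" hostname
done
```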

This deployment relies on an existing Hadoop cluster; refer to: https://blog.csdn.net/networken/article/details/116407042

Modify the HBase configuration files

Modify hbase-env.sh: add the JAVA_HOME environment variable and disable HBase's built-in ZooKeeper (we use the external ZooKeeper ensemble instead):

cat >> $HBASE_HOME/conf/hbase-env.sh <<EOF
export JAVA_HOME=/opt/openjdk
export HBASE_MANAGES_ZK=false
EOF

Modify the hbase-site.xml configuration file

cat > $HBASE_HOME/conf/hbase-site.xml<<EOF
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hbase/zk</value>
  </property>
</configuration>
EOF
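Before starting the cluster, the generated XML can be checked for well-formedness to catch typos in the heredoc. This assumes xmllint (from libxml2) is installed; it is not part of HBase:

```shell
# Fails with a parse error and a non-zero exit code if the XML is malformed
xmllint --noout "$HBASE_HOME/conf/hbase-site.xml" && echo "hbase-site.xml is well-formed"
```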

Modify regionservers

cat > $HBASE_HOME/conf/regionservers <<EOF
hadoop01
hadoop02
hadoop03
EOF

Modify backup-masters

cat > $HBASE_HOME/conf/backup-masters <<EOF
hadoop02
hadoop03
EOF
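The files edited above exist only on the node where the commands were run; before starting the cluster they must be copied to the other nodes. A sketch using scp, assuming $HBASE_HOME is the same path on every node:

```shell
# Push the edited configuration files to the other cluster nodes
for host in hadoop02 hadoop03; do
  scp "$HBASE_HOME/conf/hbase-env.sh" \
      "$HBASE_HOME/conf/hbase-site.xml" \
      "$HBASE_HOME/conf/regionservers" \
      "$HBASE_HOME/conf/backup-masters" \
      "root@${host}:$HBASE_HOME/conf/"
done
```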

Start the cluster. On node-a, issue the start-hbase.sh command. Your output will be similar to the output below. (This sample output, taken from the official quickstart, shows HBase starting ZooKeeper itself; with HBASE_MANAGES_ZK=false the ZooKeeper lines will not appear, and the external ensemble must already be running.)

$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out

ZooKeeper starts first, then the master, then the RegionServers, and finally the backup master.

Verify that the processes are running. On each node of the cluster, run the jps command and verify that the correct processes are present. You may also see additional Java processes if the servers are used for other purposes.
hadoop01 jps output

[root@hadoop01 ~]# jps
8371 Jps
2580 ZooKeeperMain
3013 DFSZKFailoverController
2806 NameNode
2262 QuorumPeerMain
2953 DataNode
6539 HRegionServer
3580 NodeManager
3453 ResourceManager
6333 HMaster
2463 JournalNode

hadoop02 jps output

[root@hadoop02 ~]# jps
2160 QuorumPeerMain
2355 JournalNode
5619 HRegionServer
2676 DFSZKFailoverController
2964 ResourceManager
7124 Jps
2504 NameNode
3049 NodeManager
5771 HMaster
2620 DataNode

hadoop03 jps output

[root@hadoop03 ~]# jps
2323 JournalNode
5733 HMaster
5574 HRegionServer
2647 DFSZKFailoverController
2472 NameNode
7032 Jps
3036 ResourceManager
2124 QuorumPeerMain
3101 NodeManager
2590 DataNode
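The per-node checks above can be run from a single node in one loop, assuming the passwordless SSH configured earlier:

```shell
# Run jps on every node and prefix each output line with the hostname
for host in hadoop01 hadoop02 hadoop03; do
  ssh "root@${host}" jps | sed "s/^/${host}: /"
done
```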

A note on the ZooKeeper process name: HQuorumPeer is a ZooKeeper instance controlled and started by HBase itself. Used this way, it is limited to one instance per cluster node and is suitable only for testing. If ZooKeeper runs outside HBase, as in this deployment (HBASE_MANAGES_ZK=false), the process is called QuorumPeer. For more about ZooKeeper configuration, including using an external ZooKeeper ensemble with HBase, see the ZooKeeper section of the reference guide.
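Each member of the external ensemble can be probed with ZooKeeper's four-letter commands. This is a sketch: nc must be installed, and on ZooKeeper 3.5+ the stat command must be whitelisted via 4lw.commands.whitelist:

```shell
# Ask each ZooKeeper node for its status; exactly one should report Mode: leader
for host in hadoop01 hadoop02 hadoop03; do
  echo "== ${host} =="
  echo stat | nc "${host}" 2181 | grep 'Mode'
done
```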

Browse to the Web UI.
HBase's web UIs listen on HTTP port 16010 for the Master and 16030 for each RegionServer.
If everything is set up correctly, you should be able to connect to the UI of the active Master or a backup Master with a web browser:

http://node-a.example.com:16010/
http://node-b.example.com:16010/
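A quick headless check that the Master UI is reachable (a sketch; substitute your own hostname — hadoop01 here, from the table above):

```shell
# Expect HTTP 200 from the active Master's web UI
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop01:16010/
```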

Test what happens when a node or service disappears.
With only three nodes, the cluster will not be very resilient, but you can still test the behavior of the Master or a RegionServer by killing the relevant process and watching the logs.
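For example, Master failover can be exercised like this (a sketch, assuming the active Master runs on hadoop01 and that $HBASE_HOME is set in the remote shell):

```shell
# Find and kill the active HMaster process on hadoop01
ssh root@hadoop01 'jps | awk "/HMaster/ {print \$1}" | xargs -r kill -9'

# A backup Master (hadoop02 or hadoop03) should take over within seconds;
# watch its master log for the transition
ssh root@hadoop02 'tail -n 50 $HBASE_HOME/logs/*master*.log'
```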


Added by Sgt.Angel on Mon, 24 Jan 2022 04:30:28 +0200