Introduction to HBase
HBase is short for Hadoop Database. The HBase project was initiated by Chad Walters and Jim Kellerman of Powerset at the end of 2006 and was designed according to the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. The first version was released in October 2007. In May 2010, HBase was promoted from a Hadoop subproject to an Apache top-level project.
HBase is a distributed, column-oriented, open-source database (strictly speaking, it is column-family oriented). Unlike a typical relational database, it is suited to storing unstructured data and organizes data by column rather than by row.
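To illustrate the column-family model, here is a minimal HBase shell session; the table name user and the column families info and contact are hypothetical examples, not part of this deployment:

hbase> create 'user', 'info', 'contact'                     # a table with two column families
hbase> put 'user', 'row1', 'info:name', 'Alice'             # cells are addressed as family:qualifier
hbase> put 'user', 'row1', 'contact:email', 'alice@example.com'
hbase> get 'user', 'row1'                                   # returns all cells stored under that row key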
HDFS provides reliable underlying data storage for HBase, MapReduce provides it with high-performance computing power, and ZooKeeper provides coordination services and a failover mechanism. We can therefore describe HBase as a distributed database solution for high-speed storage and retrieval of massive data sets on large numbers of inexpensive machines.
HBase cluster installation
In a distributed configuration, the cluster contains multiple nodes, each running one or more HBase daemons: active and backup Master instances, multiple ZooKeeper nodes, and multiple RegionServer nodes.
This advanced QuickStart adds two additional nodes to your cluster. The architecture is as follows:
Node Name | Master | ZooKeeper | RegionServer |
---|---|---|---|
hadoop01 | yes | yes | yes |
hadoop02 | backup | yes | yes |
hadoop03 | backup | yes | yes |
Configure passwordless SSH access: the HMaster must be able to log in to the other nodes in order to start their daemons.
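A minimal sketch, assuming the cluster is operated as root and that the hostnames below resolve on every node:

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa      # generate a key pair if none exists yet
for host in hadoop01 hadoop02 hadoop03; do
  ssh-copy-id root@$host                      # push the public key to every node, including this one
done
ssh hadoop02 hostname                         # verify: should print hadoop02 with no password prompt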
This deployment relies on an existing Hadoop cluster; refer to: https://blog.csdn.net/networken/article/details/116407042
Modify the HBase configuration files
Modify hbase-env.sh: add the JAVA_HOME environment variable and disable HBase's built-in ZooKeeper (this deployment uses the external ZooKeeper ensemble instead):
cat >> $HBASE_HOME/conf/hbase-env.sh <<EOF
export JAVA_HOME=/opt/openjdk
export HBASE_MANAGES_ZK=false
EOF
Modify the hbase-site.xml configuration file. Note that hbase.rootdir must match the HDFS address of the underlying Hadoop cluster; if mycluster is an HA nameservice ID rather than a single NameNode host, the port is normally omitted (hdfs://mycluster/hbase):
cat > $HBASE_HOME/conf/hbase-site.xml <<EOF
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster:8020/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/opt/hbase/zk</value>
  </property>
</configuration>
EOF
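Because a typo inside the heredoc silently produces a broken file, a quick well-formedness check is worthwhile (assuming xmllint is installed on the node):

xmllint --noout $HBASE_HOME/conf/hbase-site.xml && echo "hbase-site.xml is well-formed"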
Modify the regionservers file:
cat > $HBASE_HOME/conf/regionservers <<EOF
hadoop01
hadoop02
hadoop03
EOF
Modify the backup-masters file:
cat > $HBASE_HOME/conf/backup-masters <<EOF
hadoop02
hadoop03
EOF
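The files above were edited on one node only. Assuming every node uses the same $HBASE_HOME layout, one way to propagate the configuration is:

for host in hadoop02 hadoop03; do
  scp $HBASE_HOME/conf/hbase-env.sh \
      $HBASE_HOME/conf/hbase-site.xml \
      $HBASE_HOME/conf/regionservers \
      $HBASE_HOME/conf/backup-masters \
      root@$host:$HBASE_HOME/conf/
done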
Start the cluster. On the primary Master node (node-a, i.e. hadoop01 in this deployment), run the bin/start-hbase.sh script. Your output will be similar to the following.
$ bin/start-hbase.sh
node-c.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-c.example.com.out
node-a.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-a.example.com.out
node-b.example.com: starting zookeeper, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-zookeeper-node-b.example.com.out
starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-node-a.example.com.out
node-c.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-c.example.com.out
node-b.example.com: starting regionserver, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-regionserver-node-b.example.com.out
node-b.example.com: starting master, logging to /home/hbuser/hbase-0.98.3-hadoop2/bin/../logs/hbase-hbuser-master-nodeb.example.com.out
ZooKeeper starts first, then the Master, then the RegionServers, and finally the backup Masters. Note that because this deployment sets HBASE_MANAGES_ZK=false, start-hbase.sh does not start ZooKeeper; the external ensemble must already be running, so your output will not include the "starting zookeeper" lines.
Verify that the processes are running. On each node of the cluster, run the jps command and confirm that the correct processes are running on each server. You may also see other Java processes if the servers are used for other purposes.
node-a (hadoop01) jps output:
[root@hadoop01 ~]# jps
8371 Jps
2580 ZooKeeperMain
3013 DFSZKFailoverController
2806 NameNode
2262 QuorumPeerMain
2953 DataNode
6539 HRegionServer
3580 NodeManager
3453 ResourceManager
6333 HMaster
2463 JournalNode
node-b (hadoop02) jps output:
[root@hadoop02 ~]# jps
2160 QuorumPeerMain
2355 JournalNode
5619 HRegionServer
2676 DFSZKFailoverController
2964 ResourceManager
7124 Jps
2504 NameNode
3049 NodeManager
5771 HMaster
2620 DataNode
node-c (hadoop03) jps output:
[root@hadoop03 ~]# jps
2323 JournalNode
5733 HMaster
5574 HRegionServer
2647 DFSZKFailoverController
2472 NameNode
7032 Jps
3036 ResourceManager
2124 QuorumPeerMain
3101 NodeManager
2590 DataNode
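Rather than logging in to each node in turn, the same check can be scripted over the passwordless SSH configured earlier (a sketch; it assumes jps is on the remote shell's default PATH):

for host in hadoop01 hadoop02 hadoop03; do
  echo "== $host =="
  ssh $host jps | sort -k2      # sort by process name to make the lists easy to compare
done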
A note on the ZooKeeper process name: the HQuorumPeer process is a ZooKeeper instance controlled and started by HBase. Used this way, it is limited to one instance per cluster node and is suitable only for testing. If ZooKeeper runs outside HBase, the process is called QuorumPeer (it appears as QuorumPeerMain in the jps output above). For more information about ZooKeeper configuration, including using an external ZooKeeper ensemble with HBase, see the ZooKeeper section.
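Since this deployment runs ZooKeeper externally, you can also confirm that HBase has registered itself in the ensemble (a sketch, assuming the ZooKeeper client script zkCli.sh is on the PATH):

zkCli.sh -server hadoop01:2181 ls /hbase
# expected to list znodes such as master, backup-masters, rs and meta-region-server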
Browse to the Web UI.
The HTTP ports used by the HBase Web UI are 16010 for the Master and 16030 for the RegionServer.
If everything is set up correctly, you should be able to connect to the UI of the active Master or the backup Master with a web browser:
http://node-a.example.com:16010/ and http://node-b.example.com:16010/
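Reachability can also be checked from the command line; a quick sketch using this deployment's hostnames:

curl -sI http://hadoop01:16010/master-status | head -n 1   # Master UI; expect HTTP/1.1 200 OK
curl -sI http://hadoop01:16030/rs-status | head -n 1       # RegionServer UI on the same node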
Test what happens when a node or service disappears.
With the three-node cluster you have configured, things will not be very resilient. You can still test the behavior of the Master or a RegionServer by killing the associated processes and watching the logs.
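For example, to watch Master failover (the PID below is taken from the hadoop01 jps output above; yours will differ):

kill -9 6333                                     # on hadoop01: kill the active HMaster
tail -f $HBASE_HOME/logs/hbase-*-master-*.log    # on hadoop02/hadoop03: watch a backup Master take over
# afterwards, the Web UI of the new active Master should respond on port 16010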