HBase-Shell-Data Structure-Principle

Chapter 3 HBase Shell operations

3.1 Basic Operations

1. Enter the HBase client command line

[lxl@hadoop102 hbase]$ bin/hbase shell

2. View Help Commands

hbase(main):001:0> help

3. See which tables are in the current database

hbase(main):001:0> list
TABLE                                                                                                                                                                                                            
0 row(s) in 0.2530 seconds

=> []

Operation of Table 3.2

1. Create tables

hbase(main):003:0> create 'student','info'
0 row(s) in 1.3670 seconds

=> Hbase::Table - student

2. Insert data into tables

hbase(main):003:0> put 'student','1001','info:sex','male'
hbase(main):004:0> put 'student','1001','info:age','18'
hbase(main):005:0> put 'student','1002','info:name','Janna'
hbase(main):006:0> put 'student','1002','info:sex','female'
hbase(main):007:0> put 'student','1002','info:age','20'

3. Scanning View Table Data

hbase(main):010:0>  scan 'student'
ROW                                                   COLUMN+CELL                                                                                                                                                
 1001                                                 column=info:name, timestamp=1560293882038, value=xiannv                                                                                                    
 1001                                                 column=info:sex, timestamp=1560293938473, value=female                                                                                                     
 1002                                                 column=info:name, timestamp=1560293958558, value=xiantong                                                                                                  
 1002                                                 column=info:sex, timestamp=1560293983958, value=male

hbase(main):014:0> scan 'student',{STARTROW=>'1001',STOPROW=>'1003'}
ROW                                                   COLUMN+CELL                                                                                                                                                
 1001                                                 column=info:name, timestamp=1560293882038, value=xiannv                                                                                                    
 1001                                                 column=info:sex, timestamp=1560293938473, value=female                                                                                                     
 1002                                                 column=info:name, timestamp=1560293958558, value=xiantong                                                                                                  
 1002                                                 column=info:sex, timestamp=1560293983958, value=male

hbase(main):015:0> scan 'student',{STARTROW => '1001'}
ROW                                                   COLUMN+CELL                                                                                                                                                
 1001                                                 column=info:name, timestamp=1560293882038, value=xiannv                                                                                                    
 1001                                                 column=info:sex, timestamp=1560293938473, value=female                                                                                                     
 1002                                                 column=info:name, timestamp=1560293958558, value=xiantong                                                                                                  
 1002                                                 column=info:sex, timestamp=1560293983958, value=male                                                                                                       
 1003                                                 column=info:sex, timestamp=1560294341536, value=male

4. View table structure

hbase(main):016:0> describe 'student'
Table student is ENABLED                                                                                                                                                                                         
student                                                                                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                      
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCA
CHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

5. Update data for specified fields

hbase(main):012:0> put 'student','1001','info:name','Nick'
hbase(main):013:0> put 'student','1001','info:age','100'

6. View data for "Specified rows" or "Specified column families: columns"

hbase(main):011:0> get 'student','1001'
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560293882038, value=xiannv                                                                                                                      
 info:sex                                             timestamp=1560293938473, value=female

hbase(main):012:0> get 'student','1001','info:name'
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560293882038, value=xiannv

7. Number of rows in statistical tables

hbase(main):021:0> count 'student'

8. Delete data

Delete all data of a rowkey:

hbase(main):016:0> deleteall 'student','1001'

Delete a column of data for a rowkey:

hbase(main):017:0> delete 'student','1002','info:sex'

9. Empty table data

hbase(main):018:0> truncate 'student'

Tip: The order of emptying the table is disable first, then truncate.

10. Delete tables

First, you need to make the table disable:

hbase(main):019:0> disable 'student'

Then drop the table:

hbase(main):020:0> drop 'student'

Tip: If you drop the table directly, you will report an error: ERROR: Table student is enabled. Disable it first.

11. Change Table Information

Store the data in the info column family in three versions:

hbase(main):022:0> alter 'student',{NAME=>'info',VERSIONS=>3}
hbase(main):022:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}

Operation:

hbase(main):018:0> alter 'student', {NAME=>'info',VERSIONS=>3}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9710 seconds

hbase(main):019:0> describe 'student'
Table student is ENABLED                                                                                                                                                                                         
student                                                                                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                      
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCA
CHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                                                   
1 row(s) in 0.0150 seconds

hbase(main):020:0> put 'student','1001','info:name','guangshen'
0 row(s) in 0.0080 seconds

hbase(main):021:0> get 'student','1001','info:name'
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560295375184, value=guangshen                                                                                                                   
1 row(s) in 0.0170 seconds

hbase(main):022:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560295375184, value=guangshen                                                                                                                   
 info:name                                            timestamp=1560293882038, value=xiannv                                                                                                                      
1 row(s) in 0.0480 seconds

hbase(main):023:0> put 'student','1001','info:name','chunlei'
0 row(s) in 0.0080 seconds

hbase(main):024:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560295467589, value=chunlei                                                                                                                     
 info:name                                            timestamp=1560295375184, value=guangshen                                                                                                                   
 info:name                                            timestamp=1560293882038, value=xiannv                                                                                                                      
1 row(s) in 0.0090 seconds

hbase(main):025:0> put 'student','1001','info:name','wangjun'
0 row(s) in 0.0080 seconds

hbase(main):026:0> get 'student','1001',{COLUMN=>'info:name',VERSIONS=>3}
COLUMN                                                CELL                                                                                                                                                       
 info:name                                            timestamp=1560295506023, value=wangjun                                                                                                                     
 info:name                                            timestamp=1560295467589, value=chunlei                                                                                                                     
 info:name                                            timestamp=1560295375184, value=guangshen                                                                                                                   
1 row(s) in 0.0080 seconds

Chapter 4 HBase Data Structure

4.1 RowKey

Like nosql databases, RowKey is the primary key used to retrieve records. There are only three ways to access rows in HBASE table:

1. Access through a single RowKey

2. Through RowKey's range (regular)

3. Full table scanning

RowKey can be an arbitrary string (the maximum length is 64KB, the actual length is usually 10-100 bytes), within HBASE.

RowKey is saved as an array of bytes. When stored, data is stored in RowKey's byte order. When designing RowKey, you need to fully sort and store this feature by putting together rows that are often read together. (Location dependence)

4.2 Column Family

Column family: Each column in the HBASE table belongs to a column family. Column families are part of the schema of tables (columns are not), and must be defined before using tables. Names are prefixed by clans. For example, courses:history, courses:math belong to courses.

4.3 Cell

Units uniquely determined by {rowkey, column Family:columu, version}. There are no types of data in a cell, all of which are stored in bytecode form.

Keyword: Untyped, bytecode

4.4 Time Stamp

In HBASE, a storage unit determined by rowkey and columns is called cell. Each cell holds multiple versions of the same data. Versions are indexed by a timestamp. The type of timestamp is 64-bit integer. The timestamp can be assigned by HBASE (automatically when data is written), at which point the timestamp is the current system time accurate to milliseconds. Timestamps can also be explicitly assigned by customers. If an application wants to avoid data version conflicts, it must generate its own unique timestamp. In each cell, different versions of data are sorted in reverse chronological order, i.e. the latest data ranks first.

To avoid the burden of management (including storage and indexing) caused by too many versions of data, HBASE provides two ways of data version recovery. One is to save the last n versions of the data, and the other is to save the most recent versions (for example, the last seven days). Users can set it for each column family.

4.5 Namespaces

Namespace structure:

1) Table: Tables, all tables are members of the namespace, that is, tables must belong to a namespace, if not specified, in the default default namespace.

2) RegionServer group: A namespace contains the default RegionServer Group.

3) Permission: Permissions, namespaces allow us to define access control lists (ACL s). For example, create tables, read tables, delete, update and so on.

4) Quota: Quota: Limits that force the number of region s that a namespace can contain.

Chapter 5 Principles of HBase

5.1 Reading process

The process of HBase reading data is shown in Figure 3.

The HBase read data flow shown in Figure 3

1) Client first accesses zookeeper, reads the location of the region from the meta table, and then reads the data in the meta table. The region information of user table is stored in meta.

2) Find the corresponding region information in meta-table according to namespace, table name and rowkey.

3) Find the region server corresponding to this region;

4) Find the corresponding region;

5) Find data from MemStore first, and if not, read it in BlockCache.

6) BlockCache is not yet available, then read it on StoreFile (for efficiency of reading);

7) If the data is read from StoreFile, it is not directly returned to the client, but first written to BlockCache, and then returned to the client.

5.2 Writing process

The Hbase writing process is shown in Figure 2.

Figure 2. HBase Writing Data Flow

1) Client sends write request to Hregion Server;

2) Hregion Server writes data to HLog (write ahead log). For data persistence and recovery;

3) Hregion Server writes data to memory (MemStore);

4) Feedback Client to write successfully.

5.3 Data Flush Process

1) When the MemStore data reaches the threshold (default is 128M, the old version is 64M), the data is brushed to the hard disk, the data in memory is deleted, and the historical data in HLog is deleted.

2) Store data in HDFS;

3) Make marker points in HLog.

5.4 Data merging process

1) When the data block reaches 4 blocks, Hmaster triggers the merge operation, and Region loads the data block locally to merge.

2) When the merged data exceeds 256M, the split Region is allocated to different HregionServer management.

3) When the HregionServer goes down, the hlog on the HregionServer is split up and then assigned to different HregionServer loads to modify. META.

4) Note: HLog will synchronize to HDFS.

Keywords: PHP HBase shell Database Zookeeper

Added by josephferris on Tue, 11 Jun 2019 20:44:07 +0300

Programming VIP