Table design
- Column cluster: 1-2 are recommended. If you can use one, you can't use two
- Version design: if our project does not need to save historical VERSIONS, it is OK to directly configure VERSIONS=1 according to the default configuration. If you need to save historical change information in the project, you can set VERSIONS to > 1. But setting it to greater than 1 means taking up more space
- Data compression: when creating a table, you can specify the data compression method (GZ, SNAPPY, LZO) for the column cluster. GZ mode has the highest compression ratio, about 13% of the space, but its compression and decompression speed is slower
Key actions to avoid hot spots
-
Pre partition
- When creating a table, configure some policies so that a table has multiple regions and is distributed in different hregionservers
- HBase will automatically split. If a region is too large, HBase will automatically split into two, which is to divide horizontally according to rowkey
-
rowkey design
-
Reverse: for example: mobile phone number and time stamp can reverse the mobile phone number
-
Salt: add random numbers in front of rowkey. After adding random numbers, the data will not be queried, because HBase has no secondary index by default
-
Hash: take the hash according to a part of the rowkey, because the hash has the same value every time. Therefore, we can use hash operation to obtain data
-
The core of these strategies is to break up the rowkey s and put them into the cluster nodes. Therefore, the data is no longer stored orderly, which will lead to the decline of the efficiency of scan
-
Pre partition
-
There are two strategies for pre partitioning
- Pre partition by startKey and endKey [10, 40, 50]
- Directly specify the quantity. startKey and endKey are automatically generated by hbase, and the algorithm of key needs to be specified
-
HBase data is stored in HDFS
- /hbase/data / namespace / table / column cluster / StoreFiles
Table building instruction
# 1, Namespace # 1. Create a namespace create_namespace 'MOMO_CHAT' # 2. View namespace list_namespace # 3. Delete the previous namespace drop_namespace 'MOMO_CHAT' # 4. View a specific namespace describe_namespace 'MOMO_CHAT' describe_namespace 'default' # 5. Command momo_ Create a table named MSG under the chat namespace, which contains a column cluster named C1. # Note: for tables with namespaces, use colons to join namespaces and table names together create "MOMO_CHAT:MSG", "C1" # 6. Specify how to modify the column cluster of a table and its compression method alter "MOMO_CHAT:MSG", {NAME => "C1", COMPRESSION => "GZ"} # 7. Delete the previously created table disable "MOMO_CHAT:MSG" drop "MOMO_CHAT:MSG" # 8. You need to specify a pre partition when creating a table create 'MOMO_CHAT:MSG', {NAME => "C1", COMPRESSION => "GZ"}, { NUMREGIONS => 6, SPLITALGO => 'HexStringSplit'}
You can see that there are already six region s.
Randomly generate a message
- Read data from excel file through Excel reader tool class and put it into a Map structure
- key: field name
- value: List, the data List corresponding to the field
- Create getOneMessage method, which is specially used to randomly generate an Msg entity object according to the data read by Excel
- Call ExcelReader Randomcolumn method to randomly obtain the data of a column
- Note: the message uses the current system time. The format of time is year month day hour minute second
public class MoMoMsgGen { public static void main(String[] args) { // Read data from Excel file Map<String, List<String>> resultMap = ExcelReader.readXlsx("D:\\Curriculum development\\51.V8.0_NoSQL_MQ\\2.HBase\\3.code\\momo_chat_app\\data\\Test data set.xlsx", "Momo data"); System.out.println(getOneMessage(resultMap)); } /** * Randomly generate an Msg object based on the data read from the Excel table * @param resultMap Excel Read data (Map structure) * @return An Msg object */ public static Msg getOneMessage(Map<String, List<String>> resultMap) { // one Build Msg entity class object Msg msg = new Msg(); // Set the time of the current system as the time of the message and store it in the form of month, day, hour, minute and second SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); // get SysTime Date now = new Date(); msg.setMsg_time(simpleDateFormat.format(now)); // two Call randomColumn in ExcelReader to randomly generate the data of a column // Initialize sender_nickyname field, call randomColumn to get Nick randomly_ Name set data msg.setSender_nickyname(ExcelReader.randomColumn(resultMap, "sender_nickyname")); msg.setSender_account(ExcelReader.randomColumn(resultMap, "sender_account")); msg.setSender_sex(ExcelReader.randomColumn(resultMap, "sender_sex")); msg.setSender_ip(ExcelReader.randomColumn(resultMap, "sender_ip")); msg.setSender_os(ExcelReader.randomColumn(resultMap, "sender_os")); msg.setSender_phone_type(ExcelReader.randomColumn(resultMap, "sender_phone_type")); msg.setSender_network(ExcelReader.randomColumn(resultMap, "sender_network")); msg.setSender_gps(ExcelReader.randomColumn(resultMap, "sender_gps")); msg.setReceiver_nickyname(ExcelReader.randomColumn(resultMap, "receiver_nickyname")); msg.setReceiver_ip(ExcelReader.randomColumn(resultMap, "receiver_ip")); msg.setReceiver_account(ExcelReader.randomColumn(resultMap, "receiver_account")); msg.setReceiver_os(ExcelReader.randomColumn(resultMap, "receiver_os")); msg.setReceiver_phone_type(ExcelReader.randomColumn(resultMap, "receiver_phone_type")); msg.setReceiver_network(ExcelReader.randomColumn(resultMap, "receiver_network")); msg.setReceiver_gps(ExcelReader.randomColumn(resultMap, "receiver_gps")); msg.setReceiver_sex(ExcelReader.randomColumn(resultMap, "receiver_sex")); msg.setMsg_type(ExcelReader.randomColumn(resultMap, "msg_type")); msg.setDistance(ExcelReader.randomColumn(resultMap, "distance")); msg.setMessage(ExcelReader.randomColumn(resultMap, "message")); // three Note that the time uses the current time of the system return msg; } }
Generate rowkey
- ROWKEY = MD5Hash_ Sender account_ Recipient account_ Message timestamp
- MD5Hash.getMD5AsHex generates the MD5 value. In order to shorten the rowkey, take the first 8 bits
// Generate rowkey from Msg entity object public static byte[] getRowkey(Msg msg) throws ParseException { // // ROWKEY = MD5Hash_ Sender account_ Recipient account_ Message timestamp // // Use StringBuilder to underline sender account, recipient account and message timestamp () Splice together StringBuilder builder = new StringBuilder(); builder.append(msg.getSender_account()); builder.append("_"); builder.append(msg.getReceiver_account()); builder.append("_"); // Gets the timestamp of the message String msgDateTime = msg.getMsg_time(); SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss"); Date msgDate = simpleDateFormat.parse(msgDateTime); long timestamp = msgDate.getTime(); builder.append(timestamp); // Use bytes Tobytes converts the spliced string into a byte [] array // Use md5hash Getmd5ashex generates MD5 value and takes its first 8 bits String md5AsHex = MD5Hash.getMD5AsHex(builder.toString().getBytes()); String md5Hex8bit = md5AsHex.substring(0, 8); // Then, the MD5 value and the previously spliced sender account, recipient account and message timestamp are spliced with underscores and converted into Bytes array String rowkeyString = md5Hex8bit + "_" + builder.toString(); System.out.println(rowkeyString); return Bytes.toBytes(rowkeyString); }
Push randomly generated data into HBase
public static void main(String[] args) throws ParseException, IOException { // Read data from Excel file Map<String, List<String>> resultMap = ExcelReader.readXlsx("D:\\Curriculum development\\51.V8.0_NoSQL_MQ\\2.HBase\\3.code\\momo_chat_app\\data\\Test data set.xlsx", "Momo data"); // Generate data into HBase // 1. Get Hbase connection Configuration config = HBaseConfiguration.create(); Connection connection = ConnectionFactory.createConnection(config); // 2. Obtain HBase table MOMO_CHAT:MSG Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG")); int i = 0; int MAX = 100000; while (i < MAX) { Msg msg = getOneMessage(resultMap); // 3. Initialize the variables (column cluster, column name) required for the operation of Hbase byte[] rowkey = getRowkey(msg); String cf = "C1"; String colMsg_time = "msg_time"; String colSender_nickyname = "sender_nickyname"; String colSender_account = "sender_account"; String colSender_sex = "sender_sex"; String colSender_ip = "sender_ip"; String colSender_os = "sender_os"; String colSender_phone_type = "sender_phone_type"; String colSender_network = "sender_network"; String colSender_gps = "sender_gps"; String colReceiver_nickyname = "receiver_nickyname"; String colReceiver_ip = "receiver_ip"; String colReceiver_account = "receiver_account"; String colReceiver_os = "receiver_os"; String colReceiver_phone_type = "receiver_phone_type"; String colReceiver_network = "receiver_network"; String colReceiver_gps = "receiver_gps"; String colReceiver_sex = "receiver_sex"; String colMsg_type = "msg_type"; String colDistance = "distance"; String colMessage = "message"; // 4. Build put request Put put = new Put(rowkey); // 5. Add all columns of unfamiliar messages one by one put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_time), Bytes.toBytes(msg.getMsg_time())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_nickyname), Bytes.toBytes(msg.getSender_nickyname())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_account), Bytes.toBytes(msg.getSender_account())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_sex), Bytes.toBytes(msg.getSender_sex())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_ip), Bytes.toBytes(msg.getSender_ip())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_os), Bytes.toBytes(msg.getSender_os())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_phone_type), Bytes.toBytes(msg.getSender_phone_type())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_network), Bytes.toBytes(msg.getSender_network())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colSender_gps), Bytes.toBytes(msg.getSender_gps())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_nickyname), Bytes.toBytes(msg.getReceiver_nickyname())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_ip), Bytes.toBytes(msg.getReceiver_ip())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_account), Bytes.toBytes(msg.getReceiver_account())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_os), Bytes.toBytes(msg.getReceiver_os())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_phone_type), Bytes.toBytes(msg.getReceiver_phone_type())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_network), Bytes.toBytes(msg.getReceiver_network())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_gps), Bytes.toBytes(msg.getReceiver_gps())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colReceiver_sex), Bytes.toBytes(msg.getReceiver_sex())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMsg_type), Bytes.toBytes(msg.getMsg_type())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colDistance), Bytes.toBytes(msg.getDistance())); put.addColumn(Bytes.toBytes(cf), Bytes.toBytes(colMessage), Bytes.toBytes(msg.getMessage())); // 6. Initiate put request table.put(put); // Show progress ++i; System.out.println(i + " / " + MAX); } table.close(); connection.close(); }
The number of data written here is 10w. You can see that the request is evenly distributed in the region.
Implement getMessage data service interface
Implemented using scan + filter
- Building scan objects
- Build 4 filter s (start date query, end date query, sender and recipient)
- Build a list of Msg objects
/** * * @param date Date: September 10, 2020 * @param sender Sender * @param receiver addressee * @return * @throws Exception */ @Override public List<Msg> getMessage(String date, String sender, String receiver) throws Exception { // one Building scan objects Scan scan = new Scan(); // Build two date strings with hours, minutes and seconds String startDateStr = date + " 00:00:00"; String endDateStr = date + " 23:59:59"; // two Build a range for query time, for example: 2020-10-05 00:00:00 – 2020-10-05 23:59:59 // three Build two filters for query date, greater than or equal to and less than or equal to. Use SingleColumnValueFilter to Filter a single column here. SingleColumnValueFilter startDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1") , Bytes.toBytes("msg_time") , CompareOperator.GREATER_OR_EQUAL , new BinaryComparator(Bytes.toBytes(startDateStr))); SingleColumnValueFilter endDateFilter = new SingleColumnValueFilter(Bytes.toBytes("C1") , Bytes.toBytes("msg_time") , CompareOperator.LESS_OR_EQUAL , new BinaryComparator(Bytes.toBytes(endDateStr))); // four Build sender Filter SingleColumnValueFilter senderFilter = new SingleColumnValueFilter(Bytes.toBytes("C1") , Bytes.toBytes("sender_account") , CompareOperator.EQUAL , new BinaryComparator(Bytes.toBytes(sender))); // five Build recipient Filter SingleColumnValueFilter receiverFilter = new SingleColumnValueFilter(Bytes.toBytes("C1") , Bytes.toBytes("receiver_account") , CompareOperator.EQUAL , new BinaryComparator(Bytes.toBytes(receiver))); // six Use FilterList to combine all filters FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL , startDateFilter , endDateFilter , senderFilter , receiverFilter); // seven Set scan object filter scan.setFilter(filterList); // eight Get the HTable object and call getScanner to execute Table table = connection.getTable(TableName.valueOf("MOMO_CHAT:MSG")); ResultScanner resultScanner = table.getScanner(scan); // nine Get the iterator, iterate each row, and iterate each cell at the same time Iterator<Result> iterator = resultScanner.iterator(); // Create a list to save the queried messages ArrayList<Msg> msgList = new ArrayList<>(); while(iterator.hasNext()) { // The data queried in each row is an Msg object Result result = iterator.next(); Msg msg = new Msg(); // Get rowkey String rowkey = Bytes.toString(result.getRow()); // Cell list List<Cell> cellList = result.listCells(); for (Cell cell : cellList) { // Determine according to the column name of the current cell cell and set the corresponding field String columnName = Bytes.toString(cell.getQualifierArray(), cell.getQualifierOffset(), cell.getQualifierLength()); if(columnName.equals("msg_time")) { msg.setMsg_time(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_nickyname")){ msg.setSender_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_account")){ msg.setSender_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_sex")){ msg.setSender_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_ip")){ msg.setSender_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_os")){ msg.setSender_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_phone_type")){ msg.setSender_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_network")){ msg.setSender_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("sender_gps")){ msg.setSender_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_nickyname")){ msg.setReceiver_nickyname(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_ip")){ msg.setReceiver_ip(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_account")){ msg.setReceiver_account(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_os")){ msg.setReceiver_os(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_phone_type")){ msg.setReceiver_phone_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_network")){ msg.setReceiver_network(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_gps")){ msg.setReceiver_gps(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("receiver_sex")){ msg.setReceiver_sex(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("msg_type")){ msg.setMsg_type(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("distance")){ msg.setDistance(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } if(columnName.equals("message")){ msg.setMessage(Bytes.toString(cell.getValueArray(), cell.getValueOffset(), cell.getValueLength())); } } msgList.add(msg); } // close resource resultScanner.close(); table.close(); return msgList; }