Question:
Recently, we found more than 10,000 regions in our HBase clusters stuck in the RIT (Region In Transition) state, leaving many HBase clusters unavailable.
HBase version:
2.0.1
Problem location:
1. At first I assumed it was a simple timeout, so I manually changed the region states in the hbase:meta table with a script (any *ING state -> CLOSED), performed a rolling restart of the HBase RegionServer and Master services, and finally ran a batch assign. This did not resolve the RIT situation. (Without the hbck tool, it all had to be done by hand.)
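The batch assign mentioned above can be scripted by generating one HBase shell command per region. A minimal sketch, assuming a hypothetical file rit_regions.txt that holds one encoded region name per line (e.g. copied from the Master UI's RIT page):

```shell
#!/bin/bash
# Sketch: emit one "assign" command per region name listed in the
# given file (rit_regions.txt is a hypothetical name). The output is
# meant to be reviewed and then piped into the HBase shell.
gen_assigns() {
  while read -r region; do
    echo "assign '$region'"
  done < "$1"
}
# Usage: gen_assigns rit_regions.txt | hbase shell -n
```

Reviewing the generated commands before piping them into `hbase shell -n` avoids assigning the wrong regions in bulk.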
2. Manually assigning a region timed out. The procedures for the large number of assign operations were stuck in the WAITING state, and forcibly stopping them with abort_procedure had no effect either. At this point it was clear the problem was not so simple.
3. I found the procedure IDs on the Master's procedures page and searched the Master log to see why the procedures were stuck. It turned out that the region's descriptor file could not be found: looking at HDFS, the entire table directory was gone, so the region could not be brought online. (When the business side dropped tables, only the HDFS table directories were deleted, which caused all of this!!!)
Solution:
Once the root cause is known, the fix is conceptually simple: delete all the metadata for these tables recorded in hbase:meta. In practice, however, it is hard to carry out: there were far too many tables to delete one by one (later statistics showed nearly 2,000 tables and 10,000+ regions), and since the table names were all prefix + timestamp, the business side could not provide a concrete list of names.
Given the above, the only option was to script the collection of all affected table names. The steps and scripts are as follows:
1. Get the row keys of all tables with regions in transition:

echo "scan 'hbase:meta',{COLUMNS=>'info:state'}" | hbase shell -n | grep -E 'ING|OFFLINE|CLOSED|OPEN' > meta.txt

2. Extract namespace:table_name from meta.txt:

cat meta.txt | awk -F "," '{print $1}' > table_name_and_namespace.txt

3. Extract the bare table names with a script:

sh get_table_name.sh > meta_table_name_tmp.txt
--------------------------------------------------
#!/bin/bash
# Only tables under the wz namespace are handled here
# (all the customer tables live under the wz namespace)
cat table_name_and_namespace.txt | while read line
do
  #echo $line
  res1=`echo $line | awk -F ":" '{print $1}'`
  res2=`echo $line | awk -F ":" '{print $2}'`
  if [ "$res1" == "wz" ];then
    echo $res2
  fi
done
--------------------------------------------------

4. Deduplicate the table names (sort first, since uniq only removes adjacent duplicates):

sort meta_table_name_tmp.txt | uniq > meta_table_name.txt

5. Get all table names under the specified HDFS path (/apps/hbase/data/data/wz; replace wz with the actual namespace). The awk field numbers 8 and 7 depend on the actual ls output and path depth; /opt/ceshi/hdfs_table_name.txt has permission 777:

hdfs dfs -ls /apps/hbase/data/data/wz | awk '{print $8}' | awk -F "/" '{print $7}' > /opt/ceshi/hdfs_table_name.txt

6. Compare the two lists to find the tables whose directories were deleted:

sh diff.sh
--------------------------------------------------
#!/bin/bash
cat meta_table_name.txt | while read line
do
  flag=`grep -c $line hdfs_table_name.txt`
  if [ $flag -eq 0 ];then
    echo $line
  fi
done
--------------------------------------------------

7. Spot-check several of the resulting tables to verify that the results are correct and that the tables really were deleted on the HDFS side.
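Steps 4 and 6 above can also be combined using sort and comm, which additionally avoids a substring pitfall in grep -c: a table name that is a prefix of another name (easy to hit with prefix + timestamp names) would wrongly count as still present on HDFS. A sketch, assuming the two name lists from steps 3 and 5 already exist:

```shell
#!/bin/bash
# Sketch: print names present in the meta-derived list but absent from
# the HDFS-derived list. comm requires sorted input; -23 keeps lines
# unique to the first file. Whole-line comparison avoids grep's
# substring matching (e.g. "t_202001" matching inside "t_2020011").
deleted_tables() {
  comm -23 <(sort -u "$1") <(sort -u "$2")
}
# Usage: deleted_tables meta_table_name_tmp.txt /opt/ceshi/hdfs_table_name.txt
```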
Once the table names are known, you can scan the metadata table to get all the row keys to be deleted, and then remove the dirty data with the deleteall command.
1. Get the row keys:

echo " scan 'hbase:meta'" | hbase shell -n | grep -E 'table:state|info:state' | grep -E ' table_name1,|table_name2,|...|..| table_nameN,' > zangshuju.txt

2. Delete the dirty data:

for line in `cat zangshuju.txt | awk '{print $1}'`; do echo " deleteall 'hbase:meta',\"$line\" "; done | hbase shell -n
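The deleteall generation in step 2 can be made a bit more robust by reading line by line instead of word-splitting the whole file in a for loop. A minimal sketch that only prints the commands (the row key in the usage comment is illustrative; pipe the output into `hbase shell -n` to execute):

```shell
#!/bin/bash
# Sketch: turn the first column of the captured scan output into
# deleteall commands against hbase:meta. Reading with `while read -r`
# keeps each row key intact even if later columns contain odd
# characters, and lets you review the commands before running them.
gen_deletes() {
  awk '{print $1}' "$1" | while read -r rowkey; do
    echo "deleteall 'hbase:meta',\"$rowkey\""
  done
}
# Usage: gen_deletes zangshuju.txt | hbase shell -n
```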
Finally, stop both HBase Master services and start them again; after that, the RIT regions disappear completely.
Summary:
Clearly, the root cause of this RIT problem was non-standard usage. Going forward, the business's table deletion logic must be changed to disable + drop in the HBase shell (recommended); alternatively, whenever table data is removed directly on HDFS, the corresponding metadata must be deleted as well.
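The recommended disable + drop cleanup can itself be batched when there are many prefix + timestamp tables. A sketch that only generates the shell commands from a name list (the file name is hypothetical; review the output before piping it into `hbase shell -n`):

```shell
#!/bin/bash
# Sketch: emit a "disable" + "drop" pair for every table listed in a
# file (one name per line) -- the standard way to delete HBase tables,
# so that hbase:meta and the HDFS data stay consistent.
gen_drops() {
  while read -r table; do
    echo "disable '$table'"
    echo "drop '$table'"
  done < "$1"
}
# Usage: gen_drops tables_to_delete.txt | hbase shell -n
```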