HBCK2 is the Apache HBase cluster repair tool.
Comparison between HBCK2 and hbck1
HBCK2 is the successor to HBCK, the repair tool that shipped with hbase-1.x (a.k.a. hbck1). Use HBCK2 in place of hbck1 on hbase-2.x clusters. Do not run hbck1 against an hbase-2.x install, because it may do damage. hbck1 is still bundled with hbase-2.x to minimize surprise, but it is deprecated and will be removed in hbase-3.x. Its write facilities (-fix) have been removed. It can still report on the state of an hbase-2.x cluster, but those reports will be inaccurate because hbck1 does not understand the internals of hbase-2.x.
HBCK2 does not work the way hbck1 did, even where the two tools have commands with similar names. The next section describes how they differ.
Basic concepts
Each run of HBCK2 performs a single, discrete task. HBCK2 does not analyze everything about a running cluster and then fix "all problems" it finds, which is how hbck1 worked.
HBCK2 is for fixing. To find inconsistencies or blockages in a running cluster, look elsewhere: at the logs and the UI of the cluster's Master. Once you have identified an issue, use HBCK2 to ask the Master to fix it or to skip past the bad state. Another key difference from hbck1 is that HBCK2 asks the Master to perform the repair. It does not try to repair locally, inside the repair tool. The sections below explain how this interactive repair process works.
Source download and compilation
Go to the download page:
https://hbase.apache.org/downloads.html
Download HBase Operator Tools.
Build it with the following command:
mvn clean install -DskipTests
After compilation, you can see the corresponding jar file under hbase-hbck2/target.
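Scripts that invoke HBCK2 later in this article need the path to the built jar. The sketch below locates it. It assumes the artifact is named hbase-hbck2-<version>.jar and treats any -sources, -tests or -javadoc jars as auxiliary. find_hbck2_jar is a hypothetical helper, not part of hbase-operator-tools.

```python
from pathlib import Path


def find_hbck2_jar(repo_root: str) -> Path:
    """Return the single HBCK2 jar under <repo_root>/hbase-hbck2/target.

    Assumption: the main artifact matches hbase-hbck2-*.jar; auxiliary jars
    (-sources, -tests, -javadoc), if present, are ignored.
    """
    target = Path(repo_root) / "hbase-hbck2" / "target"
    candidates = [
        p
        for p in sorted(target.glob("hbase-hbck2-*.jar"))
        if not p.name.endswith(("-sources.jar", "-tests.jar", "-javadoc.jar"))
    ]
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one HBCK2 jar in {target}, found {len(candidates)}"
        )
    return candidates[0]
```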
Run HBCK2 tool
The HBCK2 jar does not include its dependencies; it is not a fat jar. You must supply the dependencies. Operation against your deployment is smoothest if you set the hbase version in the top-level pom to match your deployment before building (see hbase.version in the parent pom.xml of hbase-operator-tools).
HBCK2 and the running cluster interact at runtime, and this can cause problems when HBCK2 is newer than your hbase deployment. In that case your hbase may not support every API that the current HBCK2 uses. When a command needs server-side support that the cluster lacks, HBCK2 should fail gracefully. If this happens, use an older HBCK2 or, if you can, upgrade your cluster.
The easiest way to "provide" HBCK2 with its dependencies is to launch it through the $HBASE_HOME/bin/hbase script. That script already knows about hbck: an hbck option appears in its help output. By default, bin/hbase hbck runs the built-in hbck1 tool. To run HBCK2 instead, point the -j option at the built HBCK2 jar, as follows:
$ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-xxx.jar
In the command above, /etc/hbase-conf is the location of the deployed configuration and the HBCK2 jar is at ~/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-xxx.jar. Run without options or arguments, the command prints the HBCK2 help:
usage: HBCK2 [OPTIONS] COMMAND <ARGS>
Options:
 -d,--debug                                       run with debug output
 -h,--help                                        output this help message
 -p,--hbase.zookeeper.property.clientPort <arg>   port of hbase ensemble
 -q,--hbase.zookeeper.quorum <arg>                hbase ensemble
 -s,--skip                                        skip hbase version check (PleaseHoldException)
 -v,--version                                     this hbck2 version
 -z,--zookeeper.znode.parent <arg>                parent znode of hbase ensemble
Command:
 addFsRegionsMissingInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -d,--force_disable aborts fix for table if disable fails.
   To be used when regions missing from hbase:meta but directories are present still in HDFS. Can happen if user has run _hbck1_ 'OfflineMetaRepair' against an hbase-2.x cluster. Needs hbase:meta to be online. For each table name passed as parameter, performs diff between regions available in hbase:meta and region dirs on HDFS. Then for dirs with no hbase:meta matches, it reads the 'regioninfo' metadata file and re-creates given region in hbase:meta. Regions are re-created in 'CLOSED' state in the hbase:meta table, but not in the Masters' cache, and they are not assigned either. To get these regions online, run the HBCK2 'assigns'command printed when this command-run completes.
   NOTE: If using hbase releases older than 2.3.0, a rolling restart of HMasters is needed prior to executing the set of 'assigns' output.
   An example adding missing regions for tables 'tbl_1' in the default namespace, 'tbl_2' in namespace 'n1' and for all tables from namespace 'n2':
     $ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
   Returns HBCK2 an 'assigns' command with all re-inserted regions.
   SEE ALSO: reportMissingRegionsInMeta
   SEE ALSO: fixMeta

 assigns [OPTIONS] <ENCODED_REGIONNAME/INPUTFILES_FOR_REGIONNAMES>...
   Options:
    -o,--override  override ownership by another procedure
    -i,--inputFiles  take one or more encoded region names
   A 'raw' assign that can be used even during Master initialization (if the -skip flag is specified). Skirts Coprocessors. Pass one or more encoded region names. 1588230740 is the hard-coded name for the hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of what a user-space encoded region name looks like. For example:
     $ HBCK2 assigns 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created AssignProcedure(s) or -1 if none.
   If -i or --inputFiles is specified, pass one or more input file names. Each file contains encoded region names, one per line. For example:
     $ HBCK2 assigns -i fileName1 fileName2

 bypass [OPTIONS] <PID>...
   Options:
    -o,--override  override if procedure is running/stuck
    -r,--recursive  bypass parent and its children. SLOW! EXPENSIVE!
    -w,--lockWait  milliseconds to wait before giving up; default=1
   Pass one (or more) procedure 'pid's to skip to procedure finish. Parent of bypassed procedure will also be skipped to the finish. Entities will be left in an inconsistent state and will require manual fixup. May need Master restart to clear locks still held. Bypass fails if procedure has children. Add 'recursive' if all you have is a parent pid to finish parent and children. This is SLOW, and dangerous so use selectively. Does not always work.

 extraRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   Options:
    -f, --fix  fix meta by removing all extra regions found.
   Reports regions present on hbase:meta, but with no related directories on the file system. Needs hbase:meta to be online. For each table name passed as parameter, performs diff between regions available in hbase:meta and region dirs on the given file system. Extra regions would get deleted from Meta if passed the --fix option.
   NOTE: Before deciding on use the "--fix" option, it's worth check if reported extra regions are overlapping with existing valid regions. If so, then "extraRegionsInMeta --fix" is indeed the optimal solution. Otherwise, "assigns" command is the simpler solution, as it recreates regions dirs in the filesystem, if not existing.
   An example triggering extra regions report for tables 'table_1' and 'table_2', under default namespace:
     $ HBCK2 extraRegionsInMeta default:table_1 default:table_2
   An example triggering extra regions report for table 'table_1' under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 extraRegionsInMeta default:table_1 ns1
   Returns list of extra regions for each table passed as parameter, or for each table on namespaces specified as parameter.

 filesystem [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix  sideline corrupt hfiles, bad links, and references.
   Report on corrupt hfiles, references, broken links, and integrity. Pass '--fix' to sideline corrupt files and links. '--fix' does NOT fix integrity issues; i.e. 'holes' or 'orphan' regions. Pass one or more tablenames to narrow checkup. Default checks all tables and restores 'hbase.version' if missing. Interacts with the filesystem only! Modified regions need to be reopened to pick-up changes.

 fixMeta
   Do a server-side fix of bad or inconsistent state in hbase:meta. Available in hbase 2.2.1/2.1.6 or newer versions. Master UI has matching, new 'HBCK Report' tab that dumps reports generated by most recent run of _catalogjanitor_ and a new 'HBCK Chore'. It is critical that hbase:meta first be made healthy before making any other repairs. Fixes 'holes', 'overlaps', etc., creating (empty) region directories in HDFS to match regions added to hbase:meta. Command is NOT the same as the old _hbck1_ command named similarily. Works against the reports generated by the last catalog_janitor and hbck chore runs. If nothing to fix, run is a noop. Otherwise, if 'HBCK Report' UI reports problems, a run of fixMeta will clear up hbase:meta issues. See 'HBase HBCK' UI for how to generate new report.
   SEE ALSO: reportMissingRegionsInMeta

 generateMissingTableDescriptorFile <TABLENAME>
   Trying to fix an orphan table by generating a missing table descriptor file. This command will have no effect if the table folder is missing or if the .tableinfo is present (we don't override existing table descriptors). This command will first check it the TableDescriptor is cached in HBase Master in which case it will recover the .tableinfo accordingly. If TableDescriptor is not cached in master then it will create a default .tableinfo file with the following items:
     - the table name
     - the column family list determined based on the file system
     - the default properties for both TableDescriptor and ColumnFamilyDescriptors
   If the .tableinfo file was generated using default parameters then make sure you check the table / column family properties later (and change them if needed). This method does not change anything in HBase, only writes the new .tableinfo file to the file system. Orphan tables can cause e.g. ServerCrashProcedures to stuck, you might need to fix these still after you generated the missing table info files.

 replication [OPTIONS] [<TABLENAME>...]
   Options:
    -f, --fix  fix any replication issues found.
   Looks for undeleted replication queues and deletes them if passed the '--fix' option. Pass a table name to check for replication barrier and purge if '--fix'.

 reportMissingRegionsInMeta <NAMESPACE|NAMESPACE:TABLENAME>...
   To be used when regions missing from hbase:meta but directories are present still in HDFS. Can happen if user has run _hbck1_ 'OfflineMetaRepair' against an hbase-2.x cluster. This is a CHECK only method, designed for reporting purposes and doesn't perform any fixes, providing a view of which regions (if any) would get re-added to hbase:meta, grouped by respective table/namespace. To effectively re-add regions in meta, run addFsRegionsMissingInMeta. This command needs hbase:meta to be online. For each namespace/table passed as parameter, it performs a diff between regions available in hbase:meta against existing regions dirs on HDFS. Region dirs with no matches are printed grouped under its related table name. Tables with no missing regions will show a 'no missing regions' message. If no namespace or table is specified, it will verify all existing regions. It accepts a combination of multiple namespace and tables. Table names should include the namespace portion, even for tables in the default namespace, otherwise it will assume as a namespace value.
   An example triggering missing regions report for tables 'table_1' and 'table_2', under default namespace:
     $ HBCK2 reportMissingRegionsInMeta default:table_1 default:table_2
   An example triggering missing regions report for table 'table_1' under default namespace, and for all tables from namespace 'ns1':
     $ HBCK2 reportMissingRegionsInMeta default:table_1 ns1
   Returns list of missing regions for each table passed as parameter, or for each table on namespaces specified as parameter.

 setRegionState <ENCODED_REGIONNAME> <STATE>
   Possible region states:
    OFFLINE, OPENING, OPEN, CLOSING, CLOSED, SPLITTING, SPLIT, FAILED_OPEN, FAILED_CLOSE, MERGING, MERGED, SPLITTING_NEW, MERGING_NEW, ABNORMALLY_CLOSED
   WARNING: This is a very risky option intended for use as last resort. Example scenarios include unassigns/assigns that can't move forward because region is in an inconsistent state in 'hbase:meta'. For example, the 'unassigns' command can only proceed if passed a region in one of the following states: SPLITTING|SPLIT|MERGING|OPEN|CLOSING
   Before manually setting a region state with this command, please certify that this region is not being handled by a running procedure, such as 'assign' or 'split'. You can get a view of running procedures in the hbase shell using the 'list_procedures' command. An example setting region 'de00010733901a05f5a2a3a382e27dd4' to CLOSING:
     $ HBCK2 setRegionState de00010733901a05f5a2a3a382e27dd4 CLOSING
   Returns "0" if region state changed and "1" otherwise.

 setTableState <TABLENAME> <STATE>
   Possible table states: ENABLED, DISABLED, DISABLING, ENABLING
   To read current table state, in the hbase shell run:
     hbase> get 'hbase:meta', '<TABLENAME>', 'table:state'
   A value of \x08\x00 == ENABLED, \x08\x01 == DISABLED, etc. Can also run a 'describe "<TABLENAME>"' at the shell prompt. An example making table name 'user' ENABLED:
     $ HBCK2 setTableState users ENABLED
   Returns whatever the previous table state was.

 scheduleRecoveries <SERVERNAME>...
   Schedule ServerCrashProcedure(SCP) for list of RegionServers. Format server name as '<HOSTNAME>,<PORT>,<STARTCODE>' (See HBase UI/logs).
   Example using RegionServer 'a.example.org,29100,1540348649479':
     $ HBCK2 scheduleRecoveries a.example.org,29100,1540348649479
   Returns the pid(s) of the created ServerCrashProcedure(s) or -1 if no procedure created (see master logs for why not). Command support added in hbase versions 2.0.3, 2.1.2, 2.2.0 or newer.

 unassigns <ENCODED_REGIONNAME>...
   Options:
    -o,--override  override ownership by another procedure
   A 'raw' unassign that can be used even during Master initialization (if the -skip flag is specified). Skirts Coprocessors. Pass one or more encoded region names. 1588230740 is the hard-coded name for the hbase:meta region and de00010733901a05f5a2a3a382e27dd4 is an example of what a userspace encoded region name looks like. For example:
     $ HBCK2 unassign 1588230740 de00010733901a05f5a2a3a382e27dd4
   Returns the pid(s) of the created UnassignProcedure(s) or -1 if none.

 SEE ALSO, org.apache.hbase.hbck1.OfflineMetaRepair, the offline hbase:meta tool. See the HBCK2 README for how to use.
When you pass the hbck argument to bin/hbase, it uses a default client setup to reach the target hbase cluster. That is enough for most HBCK2 uses. You may, however, see an exception like the following:
bin/hbase --config hbase-conf hbck
2019-08-30 05:04:54,467 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2799)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2810)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:100)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2849)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2831)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:389)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356)
    at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:361)
    at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3605)
... this happens because the HDFS jars are not on the CLASSPATH. When hbck is run through bin/hbase, the HDFS jars are not added to the CLASSPATH by default. Define HADOOP_HOME in the environment. bin/hbase can then find your local hadoop install and will load its HDFS jars.
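The invocation described above can be wrapped in a script. The following is a minimal sketch. It only assembles the bin/hbase --config ... hbck -j <jar> command line used in this article and, optionally, sets HADOOP_HOME so the HDFS jars get loaded. hbck2_invocation is a hypothetical helper, and the paths in the usage note are placeholders.

```python
import os
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def hbck2_invocation(
    hbase_home: str,
    conf_dir: str,
    jar: str,
    args: List[str],
    hadoop_home: Optional[str] = None,
) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment for running HBCK2 via bin/hbase hbck -j.

    Hypothetical helper: it only assembles the command shown in the text.
    """
    argv = [str(Path(hbase_home) / "bin" / "hbase"), "--config", conf_dir,
            "hbck", "-j", jar, *args]
    env = dict(os.environ)
    if hadoop_home is not None:
        # Lets bin/hbase find the local Hadoop install and load its HDFS jars,
        # avoiding "No FileSystem for scheme: hdfs".
        env["HADOOP_HOME"] = hadoop_home
    return argv, env
```

To run it, pass the result to subprocess.run(argv, env=env), for example with hbck2_invocation("/opt/hbase", "/etc/hbase-conf", jar_path, ["-skip", "assigns", "1588230740"], hadoop_home="/opt/hadoop"), where /opt/hbase and /opt/hadoop are placeholder paths.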
Introduction to HBCK2
HBCK2 is currently a simple tool that does only one thing at a time.
In hbase-2.x, the Master is the final arbiter of all state. For that reason, most HBCK2 commands work by asking the Master to make the repairs, which means the Master must be up before you can run them.
HBCK2 works through the HbckService hosted on the Master, which exposes a few methods for HBCK2 to call. Before running a command that depends on the Master's HbckService, HBCK2 first pokes the cluster to confirm that the service is available. The command fails if the remote server does not publish the service, or if the HbckService lacks the requested method. In the latter case, upgrade your cluster if you can to get more repair tools.
Look for problems
hbck1 performed an analysis and reported your cluster GOOD or BAD. HBCK2 is less presumptuous. In hbase-2.x, the operator works out what needs fixing and then uses tools, including HBCK2, to make the repairs. This may take several rounds of running HBCK2 and then checking cluster state.
To find cluster problems, use the following utilities and methods.
Diagnostic tools
Master Logs
The Master runs all assignments, server crash handling, cluster start and stop, and so on. In hbase-2.x, everything the Master does is implemented as a procedure running on a state machine engine. See Procedure Framework and Assignment Manager for details on how this new infrastructure works. Each procedure has a unique procedure id, its pid, which appears on every log line it emits. By following the pid, you can trace a procedure's lifecycle in the Master log from start, through each step, to completion. Some procedures spawn subprocedures, wait for them to finish, and then complete themselves. Each subprocedure logs its own pid and its ppid, the pid of its parent.
Generally everything runs smoothly. If something unforeseen happens, however, the assignment framework can be left damaged and the operator must intervene. Some such scenarios are discussed below. In the Master log they show up either as a region STUCK in a procedure, or as an entity in transition (a region or a table) that is blocked because another procedure holds an exclusive lock and will not release it.
A STUCK region-in-transition looks like this in the Master log:
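Following a pid and its subprocedures through the Master log can be scripted. The sketch below assumes that procedure log lines contain pid=<n> and that subprocedure lines also contain ppid=<parent>. That format is an assumption and is not quoted from this article. trace_procedure is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Assumed log tokens; \b keeps "pid=" from matching inside "ppid=".
_PID = re.compile(r"\bpid=(\d+)")
_PPID = re.compile(r"\bppid=(\d+)")


def trace_procedure(lines: Iterable[str], root_pid: int) -> List[str]:
    """Return log lines for root_pid and its descendants, in log order.

    A child is tracked from the first line that links it (via ppid=)
    to a pid already being tracked.
    """
    tracked = {root_pid}
    out: List[str] = []
    for line in lines:
        pid_m = _PID.search(line)
        if pid_m is None:
            continue
        pid = int(pid_m.group(1))
        ppid_m = _PPID.search(line)
        if ppid_m is not None and int(ppid_m.group(1)) in tracked:
            tracked.add(pid)
        if pid in tracked:
            out.append(line)
    return out
```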
2018-09-12 15:29:06,558 WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING, location=va1001.example.org,22101,1536173230599, table=IntegrationTestBigLinkedList_20180626110336, region=dbdb56242f17610c46ea044f7a42895b
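To collect STUCK regions from a large log, you can pull the fields out of lines like the one above. This sketch assumes the layout of that single sample line. Non-greedy matching lets the location field hold a server name that contains commas. parse_stuck_rit is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Layout assumed from the single STUCK sample line in this article.
_STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition rit=(?P<state>\w+), "
    r"location=(?P<location>.*?), table=(?P<table>[^,]+), region=(?P<region>\w+)"
)


def parse_stuck_rit(line: str) -> Optional[Dict[str, str]]:
    """Return state/location/table/region of a STUCK RIT line, else None."""
    m = _STUCK_RIT.search(line)
    return m.groupdict() if m else None
```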
Master UI: /master-status#tables
This section, in the middle of the Master UI home page, lists the tables. Its columns show whether each table is ENABLED, ENABLING, DISABLING or DISABLED, along with other attributes. Further columns count the table's regions in various transition states: OPEN, CLOSED, and so on. Reading this table helps you judge whether a table's regions are properly assigned. For example, something is wrong if a table is ENABLED, none of its regions are OPEN, and the Master log says nothing about any ongoing assignment.
Master UI: 'Procedures & Locks'
This page is reached from the 'Procedures & Locks' menu item in the header of the Master UI. It lists all ongoing procedures and locks, plus the current set of Master Procedure WALs (files named pv2-0000000000000000####.log under the MasterProcWALs directory of your hbase install). During startup of a large cluster, while heavy assignment is underway, this page fills with procedures and locks and the number of MasterProcWALs grows. The operator must intervene to clear the blockage in either of two cases after the cluster has settled: locks or procedures remain stuck, or the WAL count keeps rising and never falls.
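You can watch the "WAL count only grows" symptom with a small script. This sketch assumes MasterProcWALs is reachable as a local path, for example in a standalone install. On HDFS you would list the directory with hdfs dfs -ls instead. count_proc_wals and only_growing are hypothetical helpers. The caller records counts at intervals and checks whether they ever go down.

```python
from pathlib import Path
from typing import Sequence


def count_proc_wals(wal_dir: str) -> int:
    """Count Master procedure WAL files (pv2-*.log) in a local directory."""
    return sum(1 for _ in Path(wal_dir).glob("pv2-*.log"))


def only_growing(samples: Sequence[int]) -> bool:
    """True if successive WAL counts never decrease and end higher than they began."""
    if len(samples) < 2:
        return False
    never_down = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_down and samples[-1] > samples[0]
```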
The list of locks and procedures can also be obtained through hbase shell:
$ echo "list_locks"| hbase shell &> /tmp/locks.txt
$ echo "list_procedures"| hbase shell &> /tmp/procedures.txt
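The same shell pipelines can be called from a Python script. This sketch feeds one command to hbase shell on stdin, as echo ... | hbase shell does, and returns stdout plus stderr (the &> redirect above captures both). run_hbase_shell is a hypothetical helper, and shell_cmd is parameterized only so the helper can be exercised without HBase.

```python
import subprocess
from typing import Sequence


def run_hbase_shell(command: str, shell_cmd: Sequence[str] = ("hbase", "shell")) -> str:
    """Send one command to the hbase shell via stdin; return stdout + stderr."""
    result = subprocess.run(
        list(shell_cmd),
        input=command + "\n",
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr
```

For example, Path("/tmp/locks.txt").write_text(run_hbase_shell("list_locks")) mirrors the first command above.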
Master UI: The 'HBCK Report'
hbase 2.1.6, 2.2.1 and 2.3.0 add an 'HBCK Report' page (hbck.jsp) to the Master. The page shows the output of two checks that the Master runs at intervals. The first is the CatalogJanitor, which runs on a schedule. If it finds overlaps or holes in hbase:meta, the CatalogJanitor half of the page lists them (otherwise that half stays empty). The second is a background "chore" that compares hbase:meta with filesystem content and records any anomalies in its own section of the HBCK Report.
The 'HBCK Report' page itself explains how to force the checkers to run.
The HBase Canary Tool
The Canary tool is useful for verifying assignment state. You can point it at a single table or run it against the whole cluster.
For example, to check assignment across the cluster:
$ hbase canary -f false -t 6000000 &>/tmp/canary.log
-f false tells Canary to keep going when a region fetch fails. -t 6000000 sets its timeout to 6,000,000 ms, about 100 minutes. When it finishes, check /tmp/canary.log. Look for ERROR lines to find problematic region assignments.
You can probe like Canary from the hbase shell. For example, given a region of table testtable whose start row is d1dddd0c, run:
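Finding the ERROR lines in the Canary log takes only a simple filter. The sketch below assumes only that failing checks are logged with the word ERROR, which is what the text above says to look for. canary_errors is a hypothetical helper.

```python
from typing import Iterable, List


def canary_errors(lines: Iterable[str]) -> List[str]:
    """Return Canary log lines that contain ERROR, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if "ERROR" in line]
```

Usage: with open("/tmp/canary.log") as f: print("\n".join(canary_errors(f))).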
hbase> scan 'testtable', {STARTROW => 'd1dddd0c', LIMIT => 10}
See the RegionInfo API for an overview of how a region name breaks down into its component parts.
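As an illustration of those component parts, here is a minimal parser. It handles only the newer region-name layout <table>,<startkey>,<regionid>.<encodedname>. with a trailing dot, as in hbase:namespace,,1562733904278.9559cf72b8e81e1291c626a8e781a6ae. It assumes a 32-character hex encoded name. It does not handle the legacy hbase:meta name (hbase:meta,,1) or escaped binary start keys. parse_region_name is a hypothetical helper, not the RegionInfo API.

```python
import re
from typing import NamedTuple


class ParsedRegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int
    encoded_name: str


# Greedy start key: the region id and encoded name are anchored at the end,
# so a start key that itself contains commas is still captured whole.
_NEW_FORMAT = re.compile(
    r"^(?P<table>[^,]+),(?P<start>.*),(?P<id>\d+)\.(?P<enc>[0-9a-f]{32})\.$"
)


def parse_region_name(name: str) -> ParsedRegionName:
    m = _NEW_FORMAT.match(name)
    if m is None:
        raise ValueError(f"not a new-format region name: {name!r}")
    return ParsedRegionName(m.group("table"), m.group("start"),
                            int(m.group("id")), m.group("enc"))
```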
Other tools
To list the regions of an ENABLED or ENABLING table that are not open, read the info:state column of hbase:meta. For example, to see the state of every region of table IntegrationTestBigLinkedList_20180626064758, run:
$ echo " scan 'hbase:meta', {ROWPREFIXFILTER => 'IntegrationTestBigLinkedList_20180626064758,', COLUMN => 'info:state'}"| hbase shell > /tmp/t.txt
... then grep for regions in the OPENING or CLOSING state.
To move a region from OPENING to OPEN, consistent with the table's ENABLED state, queue a new assign procedure with the assign command in the hbase shell (watch the Master log to see the assign run). If you have many regions to assign, use HBCK2, which supports bulk assigns.
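Instead of grep, you can filter the scan output in a script. The sketch below assumes each cell prints as "<row> column=info:state, timestamp=<ts>, value=<STATE>", which is an assumption about the hbase shell scan output format. It returns the row keys (region names) whose state is OPENING or CLOSING. regions_in_states is a hypothetical helper.

```python
import re
from typing import Iterable, List, Sequence, Tuple

# Assumed hbase shell scan cell layout: "<row> column=info:state, ..., value=<STATE>"
_STATE_CELL = re.compile(
    r"^\s*(?P<row>\S+)\s+column=info:state,.*\bvalue=(?P<state>[A-Z_]+)\s*$"
)


def regions_in_states(
    scan_lines: Iterable[str], wanted: Sequence[str] = ("OPENING", "CLOSING")
) -> List[Tuple[str, str]]:
    """Return (row, state) pairs for info:state cells whose value is in wanted."""
    hits: List[Tuple[str, str]] = []
    for line in scan_lines:
        m = _STATE_CELL.match(line)
        if m and m.group("state") in wanted:
            hits.append((m.group("row"), m.group("state")))
    return hits
```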
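For bulk assigns, the usage output above describes assigns -i, which reads files with one encoded region name per line. The sketch below writes such a file, dropping blanks and duplicates. It assumes encoded names are 32-character lowercase hex, as in this article's examples, with 1588230740 (hbase:meta) as the exception. write_assigns_input is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Iterable

# Assumption: user-space encoded names are 32 lowercase hex chars;
# 1588230740 is the hard-coded hbase:meta encoded name.
_ENCODED = re.compile(r"^(?:[0-9a-f]{32}|1588230740)$")


def write_assigns_input(path: str, encoded_names: Iterable[str]) -> int:
    """Write one encoded region name per line for 'HBCK2 assigns -i <file>'.

    Returns the number of names written; raises on anything that does not
    look like an encoded region name.
    """
    stripped = (n.strip() for n in encoded_names)
    names = list(dict.fromkeys(n for n in stripped if n))  # dedupe, keep order
    bad = [n for n in names if not _ENCODED.match(n)]
    if bad:
        raise ValueError(f"not encoded region names: {bad}")
    Path(path).write_text("".join(n + "\n" for n in names))
    return len(names)
```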
Problem repair
General principles
When fixing, make sure hbase:meta is consistent before you address any other type of problem, such as filesystem deviations. Handle filesystem deviations and assignment problems only after hbase:meta has been sorted out. If hbase:meta has problems, the Master cannot correctly place adopted orphan filesystem data or assign regions.
Another general principle to keep in mind: a region in the CLOSING state cannot be assigned (and, conversely, a region in the OPENING state cannot be unassigned) without first passing through CLOSED. Regions must always move from CLOSED to OPENING to OPEN, and then to CLOSING and CLOSED.
When repairing, fix one table at a time.
Also, you cannot assign a region of a DISABLED table. The Master log will report that the assign was skipped because the table is DISABLED. You may want to assign such a region because it is stuck in OPENING and you want it CLOSED, to match the table's DISABLED state. In that situation you may have to set the table state to ENABLED temporarily so the assign can run, then set it back once the region has been unassigned. HBCK2 can do this; see the HBCK2 usage output.
What follows is a mix of notes and prescriptions drawn from experience running hbase-2.x so far. Later HBase releases fix the underlying issues that cause the states described below, so upgrade if you can to avoid them.
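The CLOSED, OPENING, OPEN, CLOSING, CLOSED cycle can be written as a tiny state table to check which intermediate states a region must pass through. This is a deliberately simplified model of only the cycle stated above. The real assignment manager has many more states (FAILED_OPEN, SPLITTING, and so on; see the setRegionState usage). is_valid_transition and path_to are hypothetical helpers.

```python
from typing import List

# Simplified model of the lifecycle stated in the text:
# CLOSED -> OPENING -> OPEN -> CLOSING -> CLOSED
_NEXT = {"CLOSED": "OPENING", "OPENING": "OPEN", "OPEN": "CLOSING", "CLOSING": "CLOSED"}


def is_valid_transition(current: str, target: str) -> bool:
    """True only for a single forward step in the simplified cycle."""
    return _NEXT.get(current) == target


def path_to(current: str, target: str) -> List[str]:
    """States a region passes through, in order, to get from current to target."""
    if current not in _NEXT or target not in _NEXT:
        raise ValueError("state outside the simplified cycle")
    path: List[str] = []
    state = current
    while state != target:
        state = _NEXT[state]
        path.append(state)
    return path
```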
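The DISABLED-table workaround maps onto HBCK2 commands from the usage output: setTableState, assigns and unassigns. The sketch below only lists those argument lists in order. disabled_table_fixup is a hypothetical helper. It assumes you run each step only after the previous procedure has finished, which you confirm in the Master log.

```python
from typing import List


def disabled_table_fixup(table: str, encoded_region: str) -> List[List[str]]:
    """HBCK2 argument lists, in order, to move a region of a DISABLED table
    through assign and unassign so it ends CLOSED.

    Hypothetical helper: run each step only after the previous procedure
    has completed (check the Master log).
    """
    return [
        ["setTableState", table, "ENABLED"],    # temporarily allow the assign
        ["assigns", encoded_region],
        ["unassigns", encoded_region],
        ["setTableState", table, "DISABLED"],   # restore the original state
    ]
```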
Assign / unassign
Generally, the Master keeps retrying an assign until it succeeds. An assign takes an exclusive lock on the region, which prevents any concurrent assign or unassign from running. An assign against a locked region waits until the lock is released. See the 'Procedures & Locks' section above for the current list of outstanding locks.
Master startup cannot progress, in holding-pattern until region onlined
This should not happen. If it does, it looks like this:
2018-10-01 22:07:42,792 WARN org.apache.hadoop.hbase.master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=CLOSING, ts=1538456302300, server=ve1017.example.org,22101,1538449648131}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
The Master cannot finish starting up because no procedure exists to assign hbase:meta (or hbase:namespace). To inject one, use HBCK2:
HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 -skip assigns 1588230740
... where 1588230740 is the encoded name of the hbase:meta region. Pass the -skip option to stop HBCK2 from checking the version of the remote Master. If the remote Master is not up, the version check gets back a 'Master is initializing' response or a PleaseHoldException and abandons the assign attempt. With -skip, the version check is skipped and the assign gets scheduled.
The same can happen to the hbase:namespace system table. Find the encoded region name of the hbase:namespace region and do what we did for hbase:meta. In this case the Master prints a helpful message, like the following:
2019-07-09 22:08:38,966 WARN [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562733904278.9559cf72b8e81e1291c626a8e781a6ae. is NOT online; state={9559cf72b8e81e1291c626a8e781a6ae state=CLOSED, ts=1562735318897, server=null}; ServerCrashProcedures=true. Master startup cannot progress, in holding-pattern until region onlined.
To schedule an assign for the hbase:namespace region named in the log line above, run:
$ ${HBASE_HOME}/bin/hbase --config /etc/hbase-conf hbck -j ./hbase-hbck2-1.0.0-SNAPSHOT.jar -skip assigns 9559cf72b8e81e1291c626a8e781a6ae
... passing the encoded name of the namespace region (the encoded name differs in every deployment).
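The encoded name and state can be pulled straight from the "is NOT online" lines shown above and turned into the -skip assigns arguments. The sketch below assumes the "is NOT online; state={<encoded> state=<STATE>, ..." layout of the two sample lines in this section. parse_not_online and skip_assigns_args are hypothetical helpers.

```python
import re
from typing import List, Optional, Tuple

# Layout assumed from the hbase:meta and hbase:namespace sample lines.
_NOT_ONLINE = re.compile(r"is NOT online; state=\{(?P<enc>\w+) state=(?P<state>\w+),")


def parse_not_online(line: str) -> Optional[Tuple[str, str]]:
    """Return (encoded_region_name, state) from an 'is NOT online' line, else None."""
    m = _NOT_ONLINE.search(line)
    return (m.group("enc"), m.group("state")) if m else None


def skip_assigns_args(line: str) -> List[str]:
    """HBCK2 arguments that schedule an assign while the Master is still starting."""
    parsed = parse_not_online(line)
    if parsed is None:
        raise ValueError("no 'is NOT online' region found in line")
    return ["-skip", "assigns", parsed[0]]
```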
Missing regions in hbase:meta region/table restore/rebuild
In some unusual cases, table regions have been removed from the hbase:meta table. Triage of these cases showed they were caused by operators who had run the obsolete hbck1 OfflineMetaRepair tool against an HBCK2 cluster. OfflineMetaRepair is a well-known tool for fixing hbase:meta problems on HBase 1.x. The original version is not compatible with HBase 2.x or later. It has since been adjusted so that, in the extreme, it can be run via HBCK2.
In most of these cases, regions go missing from hbase:meta at random, but hbase may still be operational. You can then fix the problem with the Master online, using the HBCK2 addFsRegionsMissingInMeta command. This command is less disruptive to hbase than the full hbase:meta rebuild described later, and it can even recover the namespace table region.
Extra regions in hbase:meta region/table restore/rebuild
Sometimes table regions have been removed from the filesystem but still have entries in the hbase:meta table. Possible causes include problems during splits, manual operational mistakes (such as deleting or moving region directories by hand), or even metadata loss issues such as HBASE-21843.
You can fix such problems with the Master online, using the HBCK2 extraRegionsInMeta --fix command. This command is less disruptive to hbase than the full hbase:meta rebuild described later. It is also useful on versions that do not support the fixMeta HBCK2 option (any version before 2.0.6, 2.1.6, 2.2.1, 2.3.0 or 3.0.0).
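At its core, the comparison behind both commands is a diff between the regions listed in hbase:meta and the region directories on the filesystem, as the usage output describes. The conceptual sketch below shows that diff as two set differences over encoded region names. It is not HBCK2's implementation, which reads hbase:meta and HDFS itself. diff_meta_and_fs is a hypothetical helper.

```python
from typing import Dict, Iterable, Set


def diff_meta_and_fs(
    meta_regions: Iterable[str], fs_region_dirs: Iterable[str]
) -> Dict[str, Set[str]]:
    """Conceptual meta-vs-filesystem diff over encoded region names."""
    meta, fs = set(meta_regions), set(fs_region_dirs)
    return {
        # Dirs on the filesystem with no hbase:meta entry (addFsRegionsMissingInMeta case).
        "missing_in_meta": fs - meta,
        # hbase:meta entries with no dir on the filesystem (extraRegionsInMeta case).
        "extra_in_meta": meta - fs,
    }
```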
Online hbase:meta rebuild recipe
If hbase:meta corruption is not too severe, hbase can still bring hbase:meta online. Even if the namespace region is among the missing regions, hbase:meta can still be scanned during initialization while the Master waits for the namespace to be assigned. To verify this, run an hbase:meta scan command like the one below. If it does not time out or show any errors, hbase:meta is online:
echo "scan 'hbase:meta', {COLUMN=>'info:regioninfo'}" | hbase shell
If the scan shows no errors, you can use HBCK2 addFsRegionsMissingInMeta. It reads the region metadata stored in the region directories on the filesystem and uses it to re-create the regions in hbase:meta. Because it can run while hbase is partially operational, it tries to disable the online tables affected by the problem before re-adding their regions to hbase:meta. It can check specific tables or namespaces, or all tables in all namespaces. The following example adds missing regions for table 'tbl_1' in the default namespace, table 'tbl_2' in namespace 'n1', and all tables in namespace 'n2':
$ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
Because it runs independently of the Master, further steps are needed after it completes successfully to actually assign the re-added regions:
- addFsRegionsMissingInMeta outputs an assigns command that covers all of the re-added regions. You will run this command later, so copy and save it.
- For HBase versions before 2.3.0, restart all running HBase Masters once addFsRegionsMissingInMeta has completed successfully and its output is saved.
- Once the Masters are back and hbase:meta is online (check that the Web UI is reachable), run the assigns command saved in the first step.
Note: if the namespace region is among the missing regions, add the -skip flag at the beginning of the returned assigns command.
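The -skip adjustment can be applied to the saved command in a script. This sketch assumes the saved output is a whitespace-separated command line that contains an assigns token. It inserts -skip immediately before assigns, where the global options go according to the usage line. with_skip is a hypothetical helper, and it normalizes whitespace in the returned string.

```python
from typing import List


def with_skip(assigns_cmd: str) -> str:
    """Put -skip before the 'assigns' token of a saved HBCK2 command line."""
    tokens: List[str] = assigns_cmd.split()
    if "assigns" not in tokens:
        raise ValueError("no 'assigns' token in command")
    if "-skip" in tokens or "--skip" in tokens:
        return " ".join(tokens)  # already skipping the version check
    i = tokens.index("assigns")
    return " ".join(tokens[:i] + ["-skip"] + tokens[i:])
```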
If the cluster has suffered a catastrophic loss of the hbase:meta table, you can do a rough rebuild with the following recipe. In outline:
1. Stop the cluster.
2. Run the OfflineMetaRepair tool via HBCK2. It reads the directories and metadata on the filesystem and makes a best-effort attempt to rebuild a viable hbase:meta table.
3. Restart the cluster.
4. Inject an assign to bring the system namespace table online.
5. Re-assign the user-space tables you want enabled. The rebuilt hbase:meta lists every table as offline, with no regions assigned.
Detailed rebuild recipe
Stop the cluster.
Run the hbase:meta rebuild command from HBCK2. It moves the original hbase:meta aside and puts a newly rebuilt one in its place. The example below shows how to run the tool. It adds the -details flag so the tool dumps information about the regions it finds in HDFS:
$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.hbck1.OfflineMetaRepair -details
Start the cluster. It will not come up fully. It gets stuck because the namespace table is not online and the procedure store holds no assign procedure for this situation. The hbase Master log shows this state. Here is an example of what it logs:
2019-07-10 18:30:51,090 WARN [master/localhost:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1562808216225.725a0fe6c2c869d3d0a9ed82bfa80fa3. is NOT online; state={725a0fe6c2c869d3d0a9ed82bfa80fa3 state=CLOSED, ts=1562808619952, server=null}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
You cannot assign the namespace table region from the shell. Any attempt fails with a PleaseHoldException, because the Master is not yet up: it waits for the namespace table to come online before declaring itself "started". You must use the HBCK2 assigns command, and you need the encoded name of the namespace region. It appears in the log line quoted above: in this case, 725a0fe6c2c869d3d0a9ed82bfa80fa3. You must also skip the Master version check with -skip. Without it, your HBCK2 invocation also gets the PleaseHoldException, because the Master is not yet up. Here is an example that schedules the namespace table assign:
$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
If the invocation returns 'Connection refused', check whether the Master is up. A Master that fails to initialize shuts itself down after a while. Just restart the cluster or the Master and rerun the assigns command above.
When the assign runs successfully, you will see output like the following. The '48' at the end is the pid of the scheduled assign procedure. If the returned pid is -1, the Master startup has not progressed far enough, so retry. Alternatively, the encoded region name may be wrong, so check it.
$ HBASE_CLASSPATH_PREFIX=~/checkouts/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar ./bin/hbase org.apache.hbase.HBCK2 -skip assigns 725a0fe6c2c869d3d0a9ed82bfa80fa3
18:40:43.817 [main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18:40:44.315 [main] INFO org.apache.hbase.HBCK2 - hbck support check skipped
[48]
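A wrapper script can read the pid list from the last bracketed line of this output and treat -1 as "retry or check the encoded name". The sketch below assumes the pids are printed as a line like [48], or [48, 49] for several regions. assign_pids is a hypothetical helper.

```python
import re
from typing import List

# Assumed output: a line consisting only of a bracketed, comma-separated pid list.
_PIDS = re.compile(r"^\[(-?\d+(?:,\s*-?\d+)*)\]$")


def assign_pids(output: str) -> List[int]:
    """Return the pids printed by an HBCK2 assigns run (last '[...]' line)."""
    for line in reversed(output.strip().splitlines()):
        m = _PIDS.match(line.strip())
        if m:
            return [int(p) for p in m.group(1).split(",")]
    raise ValueError("no pid list found in output")
```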
Check the Master log. The Master should now come up, and you will see pid=48 complete successfully. Look for a line like the following to confirm that the Master started successfully:
master.HMaster: Master has completed initialization 132.515sec
It may take some time to appear.
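Because the line can take a while to appear, a script can poll the Master log for it. This sketch assumes the log is a local file and searches for the text "Master has completed initialization" from the line above. It rereads the whole file on each poll, which is simple but not efficient for very large logs. wait_for_master_init is a hypothetical helper, and the timeout values are arbitrary.

```python
import time
from pathlib import Path


def wait_for_master_init(log_path: str, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll a local Master log until the initialization line appears.

    Returns True once found, False if timeout_s elapses first.
    """
    marker = "Master has completed initialization"
    deadline = time.monotonic() + timeout_s
    while True:
        p = Path(log_path)
        if p.exists() and marker in p.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```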
The hbase:meta rebuild adds user tables in the DISABLED state, with their regions CLOSED. Re-enable the tables from the shell to bring all table regions back online. Do this one table at a time, or use the enable_all ".*" command to enable all tables in one go.
The rebuilt hbase:meta may be missing edits and may need further repair and cleanup with the tools described earlier in this article.
Dangling references, missing hbase.version file and corrupt hfiles
HBCK2 can check for dangling references and corrupt hfiles. You can ask it to sideline bad files, which may be needed to get past a region that will not come online or reads that keep failing. See the filesystem command in the HBCK2 usage listing. Pass one or more table names, or none to check all tables. HBCK2 reports the bad files; add the --fix option to sideline them.
Procedure state reset
In the extreme, as a last resort, you can wipe the Master's state. Do this only if all repair attempts keep turning up locks or procedures that never complete, or the set of MasterProcWALs grows without bound, or both. Move the /hbase/MasterProcWALs/ directory of your hbase install aside and restart the Master process. It will come back with a clean slate.
If all regions were cleanly assigned or offline at the time of the wipe, the restarted Master should carry on as though nothing had happened. If regions were in transition at the time, however, the operator must intervene to bring the outstanding assigns and unassigns to completion. Read the hbase:meta info:state column, as described above, to see what needs assigning or unassigning. With the MasterProcWALs history gone, no entity should be locked, so you are free to bulk assign or unassign.
Working with orphan data
To adopt orphan regions reported by the 'HBCK Chore', see the advanced usage of the completebulkload tool in the HBase Reference Guide, under "using orphan data".
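Moving the directory aside can be scripted so that a timestamped copy is kept rather than deleted. This sketch assumes the directory is reachable as a local path and that the Master is stopped while it runs. On HDFS you would use hdfs dfs -mv instead. The .bak-<epoch> suffix is an arbitrary, hypothetical naming choice, and the sketch does not recreate an empty directory. move_aside_proc_wals is a hypothetical helper.

```python
import shutil
import time
from pathlib import Path


def move_aside_proc_wals(wal_dir: str) -> Path:
    """Rename a local MasterProcWALs directory to <name>.bak-<epoch seconds>.

    Assumes the Master is stopped. Returns the new location; nothing is deleted.
    """
    src = Path(wal_dir)
    if not src.is_dir():
        raise FileNotFoundError(src)
    dst = src.with_name(f"{src.name}.bak-{int(time.time())}")
    shutil.move(str(src), str(dst))
    return dst
```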
This article is an original article from big data to artificial intelligence blogger "xiaozhch5", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this statement for reprint.
Original link: https://lrting.top/backend/3566/