58 changes: 56 additions & 2 deletions hbase-hbck2/README.md
@@ -375,9 +375,60 @@ To schedule an assign for the hbase:namespace table noted in the above log line,
```HBASE_CLASSPATH_PREFIX=./hbase-hbck2-1.0.0-SNAPSHOT.jar hbase org.apache.hbase.HBCK2 -skip assigns 9559cf72b8e81e1291c626a8e781a6ae```
... passing the encoded name for the namespace region (the encoded name will differ per deploy).

### hbase:meta region/table restore/rebuild
### Missing Regions in META - hbase:meta region/table restore/rebuild

Should a cluster suffer a catastrophic loss of the `hbase:meta` region, a rough rebuild is possible following the below recipe. In outline: stop the cluster; run the _OfflineMetaRepair_ tool which reads directories and metadata dropped into the filesystem making a best effort at reconstructing a viable _hbase:meta_ table; restart your cluster; inject an assign to bring the system namespace table online; and then finally, re-assign userspace tables you'd like enabled (the rebuilt _hbase:meta_ creates a table with all tables offline and no regions assigned).
There have been some extraordinary cases where table regions were removed from the META table.
Some triage on such cases revealed they were operator-induced, following attempts
to run the obsolete *hbck1* _OfflineMetaRepair_ tool. _OfflineMetaRepair_ is a well-known tool
for fixing META table related issues on HBase 1.x versions. The original version is not compatible
with HBase 2.x or higher versions, and it has undergone some adjustments so that it can now be run within hbck2.

In most of these cases, regions may be missing in meta at random, but hbase may still be
operational. In such situations, the problem can be addressed with the master online, using the _addFsRegionsMissingInMeta_ command.
This command is less disruptive to hbase than the full meta rebuild covered later, and it can even be used for
recovering the _namespace_ table region.

#### Online meta rebuild recipe

If meta corruption is not too critical, hbase will still be able to bring it online. Even if the namespace region
is among the ones missing in meta, it will still be possible to scan meta during the initialization period,
while the master is waiting for namespace to be assigned. To verify this, a meta scan command can be executed
as below. If it does not time out or show any errors, _meta_ is online:

```
echo "scan 'hbase:meta', {COLUMN=>'info:regioninfo'}" | hbase shell
```

HBCK2 _addFsRegionsMissingInMeta_ can be used if the above does not show any errors. It reads the region
metadata available in the FS region dirs in order to re-create regions in META. Since it can
run with hbase partially operational, it attempts to disable online tables that are affected by the
reported problem and are going to have regions re-added to _meta_.
It can check specific tables/namespaces, or all tables
from all namespaces. An example adding missing regions for table 'tbl_1' in the default namespace,
table 'tbl_2' in namespace 'n1', and all tables from namespace 'n2':

```
$ HBCK2 addFsRegionsMissingInMeta default:tbl_1 n1:tbl_2 n2
```
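
The way each argument is interpreted (an argument containing `:` names a single table, any other argument names a whole namespace, and no arguments means every table) can be sketched in plain Java. This is a simplified illustration that uses strings instead of HBase's `TableName`; the names `TargetMatcher` and `matches` are hypothetical and are not part of HBCK2.

```java
import java.util.Arrays;
import java.util.List;

final class TargetMatcher {
  /**
   * Returns true when the table identified by namespace and qualifier is
   * selected by the given arguments. A null or empty list selects every table.
   * An argument with ':' after its first character is compared against the
   * full "namespace:table" name; any other argument is treated as a namespace.
   */
  static boolean matches(String namespace, String qualifier, List<String> args) {
    if (args == null || args.isEmpty()) {
      return true;
    }
    String fullName = namespace + ":" + qualifier;
    for (String arg : args) {
      boolean selected = arg.indexOf(':') > 0
          ? fullName.equals(arg)
          : namespace.equals(arg);
      if (selected) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] unused) {
    List<String> targets = Arrays.asList("default:tbl_1", "n1:tbl_2", "n2");
    System.out.println(matches("default", "tbl_1", targets)); // true
    System.out.println(matches("n1", "tbl_3", targets));      // false
    System.out.println(matches("n2", "any_table", targets));  // true
  }
}
```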

As it operates independently from Master, once it finishes successfully, additional steps are
required to actually have the re-added regions assigned. These are listed below:

1. _addFsRegionsMissingInMeta_ outputs an _assigns_ command with all regions that got re-added. This
command needs to be executed later, so copy and save it for convenience.

2. For HBase versions prior to 2.3.0, after _addFsRegionsMissingInMeta_ has finished successfully and its output has been saved,
restart all running HBase Masters.

3. Once the Masters are restarted and META is online (check that the Web UI is accessible), run the
_assigns_ command from the _addFsRegionsMissingInMeta_ output saved in step 1.

NOTE: If the _namespace_ region is among the missing ones, you will need to add the _--skip_ flag at the
beginning of the _assigns_ command returned.

Should a cluster suffer a catastrophic loss of the `hbase:meta` region, a rough rebuild is possible following the below recipe.
In outline: stop the cluster; run the _OfflineMetaRepair_ tool which reads directories and metadata dropped into the filesystem making a best effort at reconstructing a viable _hbase:meta_ table; restart your cluster; inject an assign to bring the system namespace table online; and then finally, re-assign userspace tables you'd like enabled (the rebuilt _hbase:meta_ creates a table with all tables offline and no regions assigned).

#### Detailed rebuild recipe
Stop the cluster.
@@ -410,3 +461,6 @@ The rebuild meta will likely be missing edits and may need subsequent repair and
### Dropped reference files, missing hbase.version file, and corrupted hfiles

HBCK2 can check for hanging references and corrupt hfiles. You can ask it to sideline bad files, which may be needed to get over humps where regions won't online or reads are failing. See the _filesystem_ command in the HBCK2 listing. Pass one or more table names (or 'none' to check all tables). It will report bad files. Pass the _--fix_ option to effect repairs.



6 changes: 6 additions & 0 deletions hbase-hbck2/pom.xml
@@ -214,6 +214,12 @@
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>2.1.0</version>
<scope>test</scope>
</dependency>
</dependencies>

<profiles>
129 changes: 129 additions & 0 deletions hbase-hbck2/src/main/java/org/apache/hbase/FsRegionsMetaRecoverer.java
@@ -0,0 +1,129 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.MetaTableAccessor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionInfo;
import org.apache.hadoop.hbase.regionserver.HRegionFileSystem;
import org.apache.hadoop.hbase.util.CommonFSUtils;
import org.apache.hadoop.hbase.util.FSUtils;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

/**
* This class implements the inner workings required to check for and recover regions that
* wrongly went missing in META.
* Normally, HBCK2 fix options rely on Master self-contained information to recover/fix
* inconsistencies, but this is an exceptional case where the META table is in a broken state.
* So, it assumes HDFS state as the source of truth; in other words, methods provided here consider
* meta information found on HDFS region dirs as the valid ones.
*/
public class FsRegionsMetaRecoverer implements Closeable {
private static final Logger LOG = LogManager.getLogger(FsRegionsMetaRecoverer.class);
private final FileSystem fs;
private final Connection conn;
private final Configuration config;

public FsRegionsMetaRecoverer(Configuration configuration) throws IOException {
this.config = configuration;
this.fs = CommonFSUtils.getRootDirFileSystem(configuration);
this.conn = ConnectionFactory.createConnection(configuration);
}

/* Initially defined for test purposes only. */
FsRegionsMetaRecoverer(Configuration configuration, Connection connection, FileSystem fileSystem){
this.config = configuration;
this.conn = connection;
this.fs = fileSystem;
}

private List<Path> getTableRegionsDirs(String table) throws Exception {
String hbaseRoot = this.config.get(HConstants.HBASE_DIR);
Path tableDir = FSUtils.getTableDir(new Path(hbaseRoot), TableName.valueOf(table));
return FSUtils.getRegionDirs(fs, tableDir);
}

public Map<TableName,List<Path>> reportTablesMissingRegions(final List<String> namespacesOrTables)
throws IOException {
final Map<TableName,List<Path>> result = new HashMap<>();
List<TableName> tableNames = MetaTableAccessor.getTableStates(this.conn).keySet().stream()
.filter(tableName -> {
if(namespacesOrTables==null || namespacesOrTables.isEmpty()){
return true;
} else {
Optional<String> findings = namespacesOrTables.stream().filter(
name -> (name.indexOf(":") > 0) ?
tableName.equals(TableName.valueOf(name)) :
tableName.getNamespaceAsString().equals(name)).findFirst();
return findings.isPresent();
}
}).collect(Collectors.toList());
tableNames.stream().forEach(tableName -> {
try {
result.put(tableName,
findMissingRegionsInMETA(tableName.getNameWithNamespaceInclAsString()));
} catch (Exception e) {
LOG.warn("Failed to find regions missing in META for table " + tableName, e);
}
});
return result;
}

List<Path> findMissingRegionsInMETA(String table) throws Exception {
final List<Path> missingRegions = new ArrayList<>();
final List<Path> regionsDirs = getTableRegionsDirs(table);
TableName tableName = TableName.valueOf(table);
List<RegionInfo> regionInfos = MetaTableAccessor.
getTableRegions(this.conn, tableName, false);
HashSet<String> regionsInMeta = regionInfos.stream().map(info ->
info.getEncodedName()).collect(Collectors.toCollection(HashSet::new));
for(final Path regionDir : regionsDirs){
if (!regionsInMeta.contains(regionDir.getName())) {
LOG.debug(regionDir + " is not in META.");
missingRegions.add(regionDir);
}
}
return missingRegions;
}

public void putRegionInfoFromHdfsInMeta(Path region) throws IOException {
RegionInfo info = HRegionFileSystem.loadRegionInfoFileContent(fs, region);
MetaTableAccessor.addRegionToMeta(conn, info);
}

@Override public void close() throws IOException {
this.conn.close();
}
}
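
The comparison at the heart of `findMissingRegionsInMETA` is a set difference: every region directory name found on the filesystem that has no matching encoded region name in meta is reported as missing. A minimal sketch of that step with plain strings and no HBase dependencies follows; `MissingRegionsSketch` and `missingInMeta` are illustrative names, not part of this change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingRegionsSketch {
  /**
   * Returns the region directory names that have no corresponding encoded
   * region name in meta, preserving the order of the directory listing.
   */
  static List<String> missingInMeta(List<String> regionDirNames,
      Collection<String> encodedNamesInMeta) {
    // A hash set keeps each membership check constant time on average.
    Set<String> inMeta = new HashSet<>(encodedNamesInMeta);
    List<String> missing = new ArrayList<>();
    for (String dirName : regionDirNames) {
      if (!inMeta.contains(dirName)) {
        missing.add(dirName);
      }
    }
    return missing;
  }

  public static void main(String[] unused) {
    // Hypothetical short names stand in for real encoded region names.
    List<String> onFs = Arrays.asList("aaa111", "bbb222", "ccc333");
    List<String> inMeta = Arrays.asList("bbb222");
    System.out.println(missingInMeta(onFs, inMeta)); // [aaa111, ccc333]
  }
}
```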