Class HFileCorruptionChecker
java.lang.Object
org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker
This class marches through all of the region's hfiles and verifies that they are all valid files.
One just needs to instantiate the class, call checkTables(Collection<Path>) and then retrieve the
corrupted hfiles (and quarantined files if in quarantining mode). The implementation currently
parallelizes at the regionDir level.
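The regionDir-level parallelism described above can be pictured with a small, self-contained sketch. It does not use HBase at all: the class RegionParallelCheckSketch, its validity rule (an empty file counts as corrupt) and the use of java.nio.file are assumptions made purely for illustration. Only the idea of one work item per region directory, with results collected in shared sets and counters, comes from the description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in, NOT the HBase class: one task per region directory.
class RegionParallelCheckSketch {
    // Thread-safe result holders, since several region tasks write to them.
    final Set<Path> corrupted = ConcurrentHashMap.newKeySet();
    final AtomicInteger filesChecked = new AtomicInteger();

    // Assumed validity rule for illustration only: an empty file is "corrupt".
    void checkFile(Path file) throws IOException {
        filesChecked.incrementAndGet();
        if (Files.size(file) == 0) {
            corrupted.add(file);
        }
    }

    // Checks every regular file below one region directory.
    void checkRegionDir(Path regionDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(regionDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            checkFile(file);
        }
    }

    // Creates one work item per region directory and waits for all of them.
    void checkTableDir(Path tableDir, ExecutorService executor)
            throws IOException, InterruptedException, ExecutionException {
        List<Callable<Void>> tasks = new ArrayList<>();
        try (Stream<Path> regions = Files.list(tableDir)) {
            regions.filter(Files::isDirectory).forEach(regionDir -> tasks.add(() -> {
                checkRegionDir(regionDir);
                return null;
            }));
        }
        for (Future<Void> result : executor.invokeAll(tasks)) {
            result.get(); // surfaces any exception thrown inside a work item
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: RegionParallelCheckSketch <tableDir>");
            return;
        }
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            RegionParallelCheckSketch sketch = new RegionParallelCheckSketch();
            sketch.checkTableDir(Paths.get(args[0]), executor);
            System.out.println(sketch.filesChecked.get() + " files checked, corrupted: "
                + sketch.corrupted);
        } finally {
            executor.shutdown();
        }
    }
}
```

Because each region directory is an independent task, a slow or very large region does not block the others. The price is that the result collections must be safe for concurrent writes, which is why the sketch uses a concurrent set and an AtomicInteger.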
-
Nested Class Summary
Modifier and Type | Class | Description
private class | HFileCorruptionChecker.MobRegionDirChecker | An individual work item for parallelized mob dir processing.
private class | HFileCorruptionChecker.RegionDirChecker | An individual work item for parallelized regiondir processing.
-
Field Summary
Modifier and Type | Field | Description
(package private) final CacheConfig | cacheConf |
(package private) final org.apache.hadoop.conf.Configuration | conf |
(package private) final Set<org.apache.hadoop.fs.Path> | corrupted |
(package private) final Set<org.apache.hadoop.fs.Path> | corruptedMobFiles |
(package private) final ExecutorService | executor |
(package private) final Set<org.apache.hadoop.fs.Path> | failureMobFiles |
(package private) final Set<org.apache.hadoop.fs.Path> | failures |
(package private) final org.apache.hadoop.fs.FileSystem | fs |
(package private) final AtomicInteger | hfilesChecked |
(package private) final boolean | inQuarantineMode |
private static final org.slf4j.Logger | LOG |
(package private) final Set<org.apache.hadoop.fs.Path> | missedMobFiles |
(package private) final Set<org.apache.hadoop.fs.Path> | missing |
(package private) final AtomicInteger | mobFilesChecked |
(package private) final Set<org.apache.hadoop.fs.Path> | quarantined |
(package private) final Set<org.apache.hadoop.fs.Path> | quarantinedMobFiles |
-
Constructor Summary
Constructor | Description
HFileCorruptionChecker(org.apache.hadoop.conf.Configuration conf, ExecutorService executor, boolean quarantine) |
-
Method Summary
Modifier and Type | Method | Description
protected void | checkColFamDir(org.apache.hadoop.fs.Path cfDir) | Check all files in a column family dir.
protected void | checkHFile(org.apache.hadoop.fs.Path p) | Checks a path to see if it is a valid hfile.
protected void | checkMobColFamDir(org.apache.hadoop.fs.Path cfDir) | Check all files in a mob column family dir.
protected void | checkMobFile(org.apache.hadoop.fs.Path p) | Checks a path to see if it is a valid mob file.
private void | checkMobRegionDir(org.apache.hadoop.fs.Path regionDir) | Checks all the mob files of a table.
protected void | checkRegionDir(org.apache.hadoop.fs.Path regionDir) | Check all column families in a region dir.
(package private) void | checkTableDir(org.apache.hadoop.fs.Path tableDir) | Check all the regiondirs in the specified tableDir (the path to a table).
void | checkTables(Collection<org.apache.hadoop.fs.Path> tables) | Check the specified table dirs for bad hfiles.
private HFileCorruptionChecker.MobRegionDirChecker | createMobRegionDirChecker(org.apache.hadoop.fs.Path tableDir) | Creates an instance of MobRegionDirChecker.
(package private) org.apache.hadoop.fs.Path | createQuarantinePath(org.apache.hadoop.fs.Path hFile) | Given a path, generates a new path to where we move a corrupted hfile (bad trailer, no trailer).
Collection<org.apache.hadoop.fs.Path> | getCorrupted() | Returns the set of corrupted file paths after checkTables is called.
Collection<org.apache.hadoop.fs.Path> | getCorruptedMobFiles() | Returns the set of corrupted mob file paths after checkTables is called.
Collection<org.apache.hadoop.fs.Path> | getFailureMobFiles() | Returns the set of check failure mob file paths after checkTables is called.
Collection<org.apache.hadoop.fs.Path> | getFailures() | Returns the set of check failure file paths after checkTables is called.
int | getHFilesChecked() | Returns number of hfiles checked in the last HFileCorruptionChecker run.
Collection<org.apache.hadoop.fs.Path> | getMissedMobFiles() |
Collection<org.apache.hadoop.fs.Path> | getMissing() |
int | getMobFilesChecked() | Returns number of mob files checked in the last HFileCorruptionChecker run.
Collection<org.apache.hadoop.fs.Path> | getQuarantined() | Returns the set of successfully quarantined paths after checkTables is called.
Collection<org.apache.hadoop.fs.Path> | getQuarantinedMobFiles() | Returns the set of successfully quarantined paths after checkTables is called.
void | report(HbckErrorReporter out) | Print a human readable summary of hfile quarantining operations.
-
Field Details
-
LOG
-
conf
-
fs
-
cacheConf
-
executor
-
corrupted
-
failures
-
quarantined
-
missing
-
corruptedMobFiles
-
failureMobFiles
-
missedMobFiles
-
quarantinedMobFiles
-
inQuarantineMode
-
hfilesChecked
-
mobFilesChecked
-
-
Constructor Details
-
HFileCorruptionChecker
public HFileCorruptionChecker(org.apache.hadoop.conf.Configuration conf, ExecutorService executor, boolean quarantine) throws IOException
- Throws:
IOException
-
-
Method Details
-
checkHFile
Checks a path to see if it is a valid hfile.
- Parameters:
p
- full Path to an HFile
- Throws:
IOException
- This is a connectivity related exception
-
createQuarantinePath
Given a path, generates a new path to where we move a corrupted hfile (bad trailer, no trailer).
- Parameters:
hFile
- Path to a corrupt hfile (assumes that it is HBASE_DIR/table/region/cf/file)
- Returns:
- path to where corrupted files are stored. This should be HBASE_DIR/.corrupt/table/region/cf/file.
- Throws:
IOException
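The path mapping described for createQuarantinePath can be illustrated with a minimal sketch. It is not the HBase implementation: QuarantinePathSketch and quarantinePath are hypothetical names, java.nio.file.Path stands in for org.apache.hadoop.fs.Path, and the sketch assumes HBASE_DIR has at least one name component. Only the HBASE_DIR/table/region/cf/file to HBASE_DIR/.corrupt/table/region/cf/file layout is taken from the text above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not the HBase implementation: it only mirrors the
// documented path layout using java.nio.file.Path.
class QuarantinePathSketch {
    // Name of the quarantine directory, taken from the Returns text above.
    static final String CORRUPT_DIR = ".corrupt";

    // Maps HBASE_DIR/table/region/cf/file to HBASE_DIR/.corrupt/table/region/cf/file.
    static Path quarantinePath(Path hFile) {
        int n = hFile.getNameCount();
        if (n < 5) {
            throw new IllegalArgumentException(
                "expected HBASE_DIR/table/region/cf/file but got " + hFile);
        }
        // The last four names are table/region/cf/file.
        Path relative = hFile.subpath(n - 4, n);
        // Strip those four trailing names to recover HBASE_DIR.
        Path hbaseDir = hFile;
        for (int i = 0; i < 4; i++) {
            hbaseDir = hbaseDir.getParent();
        }
        return hbaseDir.resolve(CORRUPT_DIR).resolve(relative);
    }

    public static void main(String[] args) {
        Path hFile = Paths.get("/hbase", "t1", "r1", "cf", "f1");
        System.out.println(hFile + " -> " + quarantinePath(hFile));
    }
}
```

Keeping the table/region/cf/file suffix intact under the quarantine directory means a quarantined file can be traced back to where it came from.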
-
checkColFamDir
Check all files in a column family dir.
- Parameters:
cfDir
- column family directory
- Throws:
IOException
-
checkMobColFamDir
Check all files in a mob column family dir.
- Parameters:
cfDir
- mob column family directory
- Throws:
IOException
-
checkMobFile
Checks a path to see if it is a valid mob file.
- Parameters:
p
- full Path to a mob file.
- Throws:
IOException
- This is a connectivity related exception
-
checkMobRegionDir
Checks all the mob files of a table.
- Parameters:
regionDir
- The mob region directory
- Throws:
IOException
-
checkRegionDir
Check all column families in a region dir.
- Parameters:
regionDir
- region directory
- Throws:
IOException
-
checkTableDir
Check all the regiondirs in the specified tableDir (the path to a table).
- Throws:
IOException
-
createMobRegionDirChecker
private HFileCorruptionChecker.MobRegionDirChecker createMobRegionDirChecker(org.apache.hadoop.fs.Path tableDir)
Creates an instance of MobRegionDirChecker.
- Parameters:
tableDir
- The current table directory.
- Returns:
- An instance of MobRegionDirChecker.
-
checkTables
Check the specified table dirs for bad hfiles.
- Throws:
IOException
-
getFailures
Returns the set of check failure file paths after checkTables is called.
-
getCorrupted
Returns the set of corrupted file paths after checkTables is called.
-
getHFilesChecked
Returns number of hfiles checked in the last HFileCorruptionChecker run.
-
getQuarantined
Returns the set of successfully quarantined paths after checkTables is called.
-
getMissing
- Returns:
- the set of paths that were missing. Likely due to deletion/moves from compaction or flushes.
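Why a checker might track missing paths separately can be shown with a small sketch: a file that was listed earlier may be gone by the time it is opened, for example because a compaction moved or deleted it, and that is not evidence of corruption. The class MissingFileSketch, its check method and the use of java.nio.file are assumptions for illustration and are not taken from the HBase source.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration, not HBase code: files that vanish between
// listing and opening are recorded as missing rather than as corrupt.
class MissingFileSketch {
    final Set<Path> missing = ConcurrentHashMap.newKeySet();

    // Tries to read the file; other I/O errors still propagate to the caller.
    void check(Path file) throws IOException {
        try {
            Files.readAllBytes(file); // placeholder for a real validity check
        } catch (NoSuchFileException e) {
            missing.add(file); // the file disappeared after it was listed
        }
    }
}
```

Separating this case keeps a concurrent compaction or flush from inflating the corrupted set.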
-
getFailureMobFiles
Returns the set of check failure mob file paths after checkTables is called.
-
getCorruptedMobFiles
Returns the set of corrupted mob file paths after checkTables is called.
-
getMobFilesChecked
Returns number of mob files checked in the last HFileCorruptionChecker run.
-
getQuarantinedMobFiles
Returns the set of successfully quarantined paths after checkTables is called.
-
getMissedMobFiles
- Returns:
- the set of paths that were missing. Likely due to table deletion or deletion/moves from compaction.
-
report
Print a human readable summary of hfile quarantining operations.
-