Class MobRefReporter
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.hbase.mob.mapreduce.MobRefReporter
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
@Private
public class MobRefReporter
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
Scans a given table and column family (CF) for all mob reference cells to get the list of backing mob files. For
each referenced file we attempt to verify that said file is on the FileSystem in a place that the
MOB system will look when attempting to resolve the actual value.
The job includes counters that can help provide a rough sketch of the mob data.
Map-Reduce Framework
    Map input records=10000
...
    Reduce output records=99
...
CELLS PER ROW
    Number of rows with 1s of cells per row=10000
MOB
    NUM_CELLS=52364
PROBLEM
    Affected rows=338
    Problem MOB files=2
ROWS WITH PROBLEMS PER FILE
    Number of HFiles with 100s of affected rows=2
SIZES OF CELLS
    Number of cells with size in the 10,000s of bytes=627
    Number of cells with size in the 100,000s of bytes=51392
    Number of cells with size in the 1,000,000s of bytes=345
SIZES OF ROWS
    Number of rows with total size in the 100,000s of bytes=6838
    Number of rows with total size in the 1,000,000s of bytes=3162
- Map-Reduce Framework:Map input records - the number of rows with mob references
- Map-Reduce Framework:Reduce output records - the number of unique hfiles referenced
- MOB:NUM_CELLS - the total number of mob reference cells
- PROBLEM:Affected rows - the number of rows that reference hfiles with an issue
- PROBLEM:Problem MOB files - the number of unique hfiles that have an issue
- CELLS PER ROW: - this counter group gives a histogram of the order of magnitude of the number of cells in a given row, grouped by the number of digits in each count. This shows more about the distribution of cells than the cell count and the row count alone. In the example above, every row has between 1 and 9 cells.
- ROWS WITH PROBLEMS PER FILE: - this counter group gives a histogram of the order of magnitude of the number of affected rows in each of the hfiles with a problem. In the example above there are 2 such hfiles, and each has between 100 and 999 affected rows.
- SIZES OF CELLS: - this counter group gives a histogram of the order of magnitude of the size of mob values according to our reference cells. In the example above, all cell sizes are between 10,000 bytes and 9,999,999 bytes. The histogram also shows that _most_ cells are 100,000 - 999,999 bytes, and that the smaller and bigger ones are outliers making up less than 2% of mob cells.
- SIZES OF ROWS: - this counter group gives a histogram of the order of magnitude of the total size of mob values across each row according to our reference cells. In the example above, all rows are between 100,000 bytes and 9,999,999 bytes, and about two thirds of them are 100,000 - 999,999 bytes.
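The shares quoted in the counter descriptions can be recomputed from the raw counter values. The following is a minimal sketch that hard-codes the numbers from the example output above; the class name `MobCounterShares` and its helper are illustrative assumptions and are not part of MobRefReporter.

```java
import java.util.Locale;

// Illustrative only: recomputes the shares quoted in the counter descriptions
// from the example counter values. Nothing here is part of MobRefReporter.
final class MobCounterShares {

  /** Returns part / total as a fraction between 0 and 1. */
  static double share(long part, long total) {
    return (double) part / (double) total;
  }

  public static void main(String[] args) {
    long numCells = 52364L;   // MOB:NUM_CELLS
    long cells10k = 627L;     // SIZES OF CELLS, 10,000s bucket
    long cells1m = 345L;      // SIZES OF CELLS, 1,000,000s bucket
    long rows = 10000L;       // Map-Reduce Framework:Map input records
    long rows100k = 6838L;    // SIZES OF ROWS, 100,000s bucket

    // Cells outside the dominant 100,000s bucket, as a share of all mob cells.
    double outlierCells = share(cells10k + cells1m, numCells);
    // Rows whose total mob size falls in the 100,000s bucket.
    double midSizedRows = share(rows100k, rows);

    System.out.printf(Locale.ROOT, "outlier cells: %.2f%%%n", 100 * outlierCells);
    System.out.printf(Locale.ROOT, "rows in the 100,000s bucket: %.2f%%%n", 100 * midSizedRows);
  }
}
```

The first value stays below 2% and the second is a little over two thirds, which matches the descriptions of SIZES OF CELLS and SIZES OF ROWS.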
RESULT OF LOOKUP    FILE REF    comma separated, base64 encoded rows when there's a problem
e.g.
MOB DIR    09c576e28a65ed2ead0004d192ffaa382019110184b30a1c7e034573bf8580aef8393402
MISSING FILE    28e252d7f013973174750d483d358fa020191101f73536e7133f4cd3ab1065edf588d509    MmJiMjMyYzBiMTNjNzc0OTY1ZWY4NTU4ZjBmYmQ2MTUtNTIz,MmEzOGE0YTkzMTZjNDllNWE4MzM1MTdjNDVkMzEwNzAtODg=
Possible results are listed; the first three indicate things are working properly.
- MOB DIR - the reference is in the normal MOB area for the given table and CF
- HLINK TO ARCHIVE FOR SAME TABLE - the reference is present in the archive area for this table and CF
- HLINK TO ARCHIVE FOR OTHER TABLE - the reference is present in a different table and CF, either in the MOB or archive areas (e.g. from a snapshot restore or clone)
- ARCHIVE WITH HLINK BUT NOT FROM OUR TABLE - the reference is currently present in the archive area for this table and CF, but it is kept there because a _different_ table has a reference to it (e.g. from a snapshot clone). If these other tables are removed then the file will likely be deleted unless there is a snapshot also referencing it.
- ARCHIVE BUT NO HLINKS - the reference is currently present in the archive for this table and CF, but there are no references present to prevent its removal. Unless it is newer than the general TTL (default 5 minutes) or referenced in a snapshot it will be subject to cleaning.
- ARCHIVE BUT FAILURE WHILE CHECKING HLINKS - the reference is present in the archive, but the check for links that would explain why the file is being kept around failed. Check the job logs for the cause.
- MISSING FILE - We couldn't find the reference on the FileSystem. Either there is data loss due to a bug in the MOB storage system, or the MOB storage is damaged but in an edge case that allows it to work for now. You can verify which by doing a raw reference scan to get the referenced hfile and checking the underlying filesystem. See the ref guide section on mob for details.
- HLINK BUT POINT TO MISSING FILE - There is a pointer in our mob area for this table and CF to a file elsewhere on the FileSystem, however the file it points to no longer exists.
- MISSING FILE BUT FAILURE WHILE CHECKING HLINKS - We could not find the referenced file, and the check for a pointer to it in our archive or in another table's archive or mob area also failed. Check the job logs for the cause.
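A report line in the format shown above could be read back along these lines. This is a sketch under stated assumptions: the three fields are taken to be tab-separated, the rows are taken to use the standard base64 alphabet, and the class `MobRefReportLine` is a hypothetical helper that does not exist in HBase.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of HBase. Assumes tab-separated fields and
// standard base64 for the row list.
final class MobRefReportLine {

  // The first three listed results, which indicate things are working properly.
  static final Set<String> HEALTHY = Set.of(
      "MOB DIR",
      "HLINK TO ARCHIVE FOR SAME TABLE",
      "HLINK TO ARCHIVE FOR OTHER TABLE");

  final String result;      // RESULT OF LOOKUP
  final String fileRef;     // FILE REF
  final List<byte[]> rows;  // decoded affected rows, empty when there is no problem

  private MobRefReportLine(String result, String fileRef, List<byte[]> rows) {
    this.result = result;
    this.fileRef = fileRef;
    this.rows = rows;
  }

  static MobRefReportLine parse(String line) {
    String[] fields = line.split("\t", -1);
    if (fields.length < 2) {
      throw new IllegalArgumentException("expected at least 2 tab-separated fields: " + line);
    }
    List<byte[]> rows = new ArrayList<>();
    if (fields.length > 2 && !fields[2].isEmpty()) {
      for (String encoded : fields[2].split(",")) {
        // Row keys are arbitrary bytes, so they are kept as byte arrays.
        rows.add(Base64.getDecoder().decode(encoded));
      }
    }
    return new MobRefReportLine(fields[0], fields[1], rows);
  }

  boolean isHealthy() {
    return HEALTHY.contains(result);
  }

  public static void main(String[] args) {
    String line = "MISSING FILE\t"
        + "28e252d7f013973174750d483d358fa020191101f73536e7133f4cd3ab1065edf588d509\t"
        + "MmJiMjMyYzBiMTNjNzc0OTY1ZWY4NTU4ZjBmYmQ2MTUtNTIz,"
        + "MmEzOGE0YTkzMTZjNDllNWE4MzM1MTdjNDVkMzEwNzAtODg=";
    MobRefReportLine parsed = parse(line);
    System.out.println(parsed.result + " healthy=" + parsed.isHealthy()
        + " affectedRows=" + parsed.rows.size());
  }
}
```

Keeping the rows as byte arrays avoids assuming a character encoding for row keys; converting them to strings is only safe when the table's row keys are known to be text.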
-
Nested Class Summary
Modifier and Type    Class    Description
static class
static class
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
(package private) static String    log10GroupedString(long number)    Returns the string representation of the given number after grouping it into log10 buckets.
static void    main
private void    printUsage
int    run    Main method for the tool.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Field Details
-
LOG
-
NAME
-
REPORT_JOB_ID
-
REPORT_START_DATETIME
-
-
Constructor Details
-
MobRefReporter
public MobRefReporter()
-
-
Method Details
-
log10GroupedString
Returns the string representation of the given number after grouping it into log10 buckets. e.g. 0-9 -> 1, 10-99 -> 10, ..., 100,000-999,999 -> 100,000, etc.
-
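The bucketing described here can be illustrated with a short sketch. It is not the HBase implementation; the class `Log10BucketSketch` and the method `log10Bucket` are hypothetical names, and the integer-division approach and US-style digit grouping are assumptions chosen to reproduce the documented examples.

```java
import java.util.Locale;

// Illustrative sketch of log10 bucketing; not the MobRefReporter source.
final class Log10BucketSketch {

  /**
   * Maps a non-negative number to the lower bound of its power-of-ten bucket,
   * formatted with thousands separators. 0-9 share the "1" bucket.
   * Negative input is not meaningful here and also lands in "1".
   */
  static String log10Bucket(long number) {
    long bucket = 1L;
    long remaining = number;
    // Each division by 10 that still leaves a value >= 1 adds one digit.
    while (remaining >= 10) {
      remaining /= 10;
      bucket *= 10;
    }
    return String.format(Locale.US, "%,d", bucket);
  }

  public static void main(String[] args) {
    long[] samples = {0L, 9L, 10L, 99L, 100000L, 999999L};
    for (long sample : samples) {
      System.out.println(sample + " -> " + log10Bucket(sample));
    }
  }
}
```

Integer division is used instead of `Math.log10` so that 0 needs no special case and no floating-point rounding is involved.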
run
Main method for the tool.
- Specified by:
run in interface org.apache.hadoop.util.Tool
- Returns:
- 0 on success, 1 for bad arguments, 2 if the job aborted with an exception, 3 if the MapReduce job was unsuccessful
- Throws:
IOException
InterruptedException
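A wrapper script or calling program may want to turn the documented return codes into readable messages. The following is a minimal sketch; `MobRefReporterExitCodes` is a hypothetical helper, not part of HBase, and it only restates the codes listed under Returns.

```java
// Hypothetical helper, not part of HBase: maps the documented return codes
// of run to short descriptions.
final class MobRefReporterExitCodes {

  static String describe(int exitCode) {
    switch (exitCode) {
      case 0:
        return "success";
      case 1:
        return "bad arguments";
      case 2:
        return "job aborted with an exception";
      case 3:
        return "MapReduce job was unsuccessful";
      default:
        // Any other value is not documented for this tool.
        return "undocumented exit code " + exitCode;
    }
  }

  public static void main(String[] args) {
    for (int code = 0; code <= 4; code++) {
      System.out.println(code + ": " + describe(code));
    }
  }
}
```

Since the class implements org.apache.hadoop.util.Tool, the code would typically be the value returned when the tool is launched through Hadoop's ToolRunner.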
-
main
- Throws:
Exception
-
printUsage
-