@InterfaceAudience.Private public class TableSnapshotInputFormatImpl extends Object
Modifier and Type | Class and Description |
---|---|
static class | TableSnapshotInputFormatImpl.InputSplit: Implementation class for InputSplit logic common between mapred and mapreduce. |
static class | TableSnapshotInputFormatImpl.RecordReader: Implementation class for RecordReader logic common between mapred and mapreduce. |

Modifier and Type | Field and Description |
---|---|
private static float | DEFAULT_LOCALITY_CUTOFF_MULTIPLIER |
private static String | LOCALITY_CUTOFF_MULTIPLIER |
static org.slf4j.Logger | LOG |
static String | NUM_SPLITS_PER_REGION: For MapReduce jobs running multiple mappers per region, determines the number of splits to generate per region. |
protected static String | RESTORE_DIR_KEY |
static String | SNAPSHOT_INPUTFORMAT_LOCALITY_BY_REGION_LOCATION: Whether to calculate the Snapshot region location by region location from meta. |
static boolean | SNAPSHOT_INPUTFORMAT_LOCALITY_BY_REGION_LOCATION_DEFAULT |
static boolean | SNAPSHOT_INPUTFORMAT_LOCALITY_ENABLED_DEFAULT |
static String | SNAPSHOT_INPUTFORMAT_LOCALITY_ENABLED_KEY: Whether to calculate the block location for splits. |
static String | SNAPSHOT_INPUTFORMAT_ROW_LIMIT_PER_INPUTSPLIT: In some scenarios, scan a limited number of rows on each InputSplit for sampling data extraction. |
static String | SNAPSHOT_INPUTFORMAT_SCAN_METRICS_ENABLED: Whether to enable scan metrics on the Scan; defaults to true. |
static boolean | SNAPSHOT_INPUTFORMAT_SCAN_METRICS_ENABLED_DEFAULT |
static String | SNAPSHOT_INPUTFORMAT_SCANNER_READTYPE: The Scan.ReadType which should be set on the Scan to read the HBase Snapshot, default STREAM. |
static Scan.ReadType | SNAPSHOT_INPUTFORMAT_SCANNER_READTYPE_DEFAULT |
private static String | SNAPSHOT_NAME_KEY |
static String | SPLIT_ALGO: For MapReduce jobs running multiple mappers per region, determines what split algorithm we should be using to find split points for scanners. |

Constructor and Description |
---|
TableSnapshotInputFormatImpl() |
Modifier and Type | Method and Description |
---|---|
private static List<String> | calculateLocationsForInputSplit(org.apache.hadoop.conf.Configuration conf, TableDescriptor htd, HRegionInfo hri, org.apache.hadoop.fs.Path tableDir): Compute block locations for snapshot files (which will get the locations for referred hfiles) only when localityEnabled is true. |
static void | cleanRestoreDir(org.apache.hadoop.mapreduce.Job job, String snapshotName): Cleans the restore directory after the snapshot scan job. |
static Scan | extractScanFromConf(org.apache.hadoop.conf.Configuration conf) |
static List<String> | getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution) |
private static List<String> | getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution, int numTopsAtMost): This computes the locations to be passed from the InputSplit. |
static List<HRegionInfo> | getRegionInfosFromManifest(SnapshotManifest manifest) |
static SnapshotManifest | getSnapshotManifest(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem fs) |
private static String | getSnapshotName(org.apache.hadoop.conf.Configuration conf) |
static RegionSplitter.SplitAlgorithm | getSplitAlgo(org.apache.hadoop.conf.Configuration conf) |
static List<TableSnapshotInputFormatImpl.InputSplit> | getSplits(org.apache.hadoop.conf.Configuration conf) |
static List<TableSnapshotInputFormatImpl.InputSplit> | getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf) |
static List<TableSnapshotInputFormatImpl.InputSplit> | getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf, RegionSplitter.SplitAlgorithm sa, int numSplits) |
static void | setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir): Configures the job to use TableSnapshotInputFormat to read from a snapshot. |
static void | setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion): Configures the job to use TableSnapshotInputFormat to read from a snapshot. |

public static final org.slf4j.Logger LOG
private static final String SNAPSHOT_NAME_KEY
protected static final String RESTORE_DIR_KEY
private static final String LOCALITY_CUTOFF_MULTIPLIER
private static final float DEFAULT_LOCALITY_CUTOFF_MULTIPLIER
public static final String SPLIT_ALGO
public static final String NUM_SPLITS_PER_REGION
public static final String SNAPSHOT_INPUTFORMAT_LOCALITY_ENABLED_KEY
public static final boolean SNAPSHOT_INPUTFORMAT_LOCALITY_ENABLED_DEFAULT
public static final String SNAPSHOT_INPUTFORMAT_LOCALITY_BY_REGION_LOCATION
public static final boolean SNAPSHOT_INPUTFORMAT_LOCALITY_BY_REGION_LOCATION_DEFAULT
public static final String SNAPSHOT_INPUTFORMAT_ROW_LIMIT_PER_INPUTSPLIT
public static final String SNAPSHOT_INPUTFORMAT_SCAN_METRICS_ENABLED
public static final boolean SNAPSHOT_INPUTFORMAT_SCAN_METRICS_ENABLED_DEFAULT
public static final String SNAPSHOT_INPUTFORMAT_SCANNER_READTYPE
The Scan.ReadType which should be set on the Scan to read the HBase Snapshot, default STREAM.
public static final Scan.ReadType SNAPSHOT_INPUTFORMAT_SCANNER_READTYPE_DEFAULT
public TableSnapshotInputFormatImpl()
public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(org.apache.hadoop.conf.Configuration conf) throws IOException
Throws: IOException
public static RegionSplitter.SplitAlgorithm getSplitAlgo(org.apache.hadoop.conf.Configuration conf) throws IOException
Throws: IOException
public static List<HRegionInfo> getRegionInfosFromManifest(SnapshotManifest manifest)
public static SnapshotManifest getSnapshotManifest(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem fs) throws IOException
Throws: IOException
public static Scan extractScanFromConf(org.apache.hadoop.conf.Configuration conf) throws IOException
Throws: IOException
public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf) throws IOException
Throws: IOException
public static List<TableSnapshotInputFormatImpl.InputSplit> getSplits(Scan scan, SnapshotManifest manifest, List<HRegionInfo> regionManifests, org.apache.hadoop.fs.Path restoreDir, org.apache.hadoop.conf.Configuration conf, RegionSplitter.SplitAlgorithm sa, int numSplits) throws IOException
Throws: IOException
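The `sa` and `numSplits` arguments exist so that a job can run several mappers per region, which means each region's key range has to be cut into sub-ranges. The following is only a rough sketch of that idea, not the `RegionSplitter.SplitAlgorithm` implementation: it assumes row keys are fixed-width hexadecimal strings and spaces the boundaries evenly. The class `RegionRangeSplitSketch` and the method `splitBoundaries` are hypothetical names introduced for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: this is NOT HBase code. It assumes fixed-width hex row keys.
public class RegionRangeSplitSketch {

    // Returns numSplits + 1 evenly spaced boundaries, from startHex to endHex inclusive.
    // Each adjacent pair of boundaries would be the key range of one input split.
    public static List<String> splitBoundaries(String startHex, String endHex, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        BigInteger start = new BigInteger(startHex, 16);
        BigInteger end = new BigInteger(endHex, 16);
        if (end.compareTo(start) <= 0) {
            throw new IllegalArgumentException("end key must be greater than start key");
        }
        BigInteger span = end.subtract(start);
        BigInteger parts = BigInteger.valueOf(numSplits);
        List<String> boundaries = new ArrayList<>();
        for (int i = 0; i <= numSplits; i++) {
            // start + span * i / numSplits, computed with integer arithmetic
            BigInteger point = start.add(span.multiply(BigInteger.valueOf(i)).divide(parts));
            boundaries.add(pad(point.toString(16), startHex.length()));
        }
        return boundaries;
    }

    // Left-pads a hex string with zeros so every boundary has the same width.
    private static String pad(String hex, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = hex.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    public static void main(String[] args) {
        // One region covering [00000000, 80000000) cut into 4 splits.
        System.out.println(splitBoundaries("00000000", "80000000", 4));
        // prints [00000000, 20000000, 40000000, 60000000, 80000000]
    }
}
```

With `numSplits` equal to 1 the sketch returns just the region's own start and end keys, which corresponds to one split per region.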
private static List<String> calculateLocationsForInputSplit(org.apache.hadoop.conf.Configuration conf, TableDescriptor htd, HRegionInfo hri, org.apache.hadoop.fs.Path tableDir) throws IOException
Throws: IOException
private static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution, int numTopsAtMost)
public static List<String> getBestLocations(org.apache.hadoop.conf.Configuration conf, HDFSBlocksDistribution blockDistribution)
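The `numTopsAtMost` parameter and the `LOCALITY_CUTOFF_MULTIPLIER` field suggest a cutoff heuristic: pass on the best host plus any other hosts whose locality is close enough to it. A minimal sketch of such a heuristic follows, assuming the multiplier is applied as a fraction of the top host's weight. It is not the HBase implementation: a plain `Map` stands in for `HDFSBlocksDistribution`, and the values 0.8 and 3 used in `main` are illustrative assumptions, as are the names `LocalityCutoffSketch` and `bestLocations`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain Map of host -> local block weight stands in for
// HDFSBlocksDistribution, and the cutoff semantics are an assumption.
public class LocalityCutoffSketch {

    // Returns at most numTopsAtMost hosts, best first, keeping only hosts whose
    // weight is at least cutoffMultiplier times the weight of the best host.
    public static List<String> bestLocations(Map<String, Long> hostWeights,
                                             double cutoffMultiplier,
                                             int numTopsAtMost) {
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(hostWeights.entrySet());
        // Sort hosts by weight, heaviest first.
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());

        List<String> locations = new ArrayList<>();
        if (sorted.isEmpty() || numTopsAtMost <= 0) {
            return locations;
        }
        double filterWeight = sorted.get(0).getValue() * cutoffMultiplier;
        int limit = Math.min(numTopsAtMost, sorted.size());
        for (int i = 0; i < limit; i++) {
            if (sorted.get(i).getValue() < filterWeight) {
                break; // every remaining host is below the cutoff as well
            }
            locations.add(sorted.get(i).getKey());
        }
        return locations;
    }

    public static void main(String[] args) {
        Map<String, Long> weights =
            Map.of("host-a", 1000L, "host-b", 900L, "host-c", 300L, "host-d", 850L);
        System.out.println(bestLocations(weights, 0.8, 3));
        // prints [host-a, host-b, host-d]
    }
}
```

The reason for a cutoff of this kind is that a scheduler which treats all reported locations as equal would otherwise be given hosts that hold only a small share of the region's blocks.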
private static String getSnapshotName(org.apache.hadoop.conf.Configuration conf)
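Both `setInput` methods require that `restoreDir` not be a subdirectory of the HBase root directory. A caller may want to check this before configuring the job. The sketch below shows one way to express that check, using `java.nio.file.Path` as a stand-in for `org.apache.hadoop.fs.Path` so that it runs without Hadoop on the classpath. The class `RestoreDirCheckSketch`, the method `isUnder`, and the example paths are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: java.nio paths stand in for Hadoop paths.
public class RestoreDirCheckSketch {

    // Returns true if candidate is the same as root or lies anywhere beneath it.
    // The comparison is done on whole path elements, so /hbase-restore is not under /hbase.
    public static boolean isUnder(Path root, Path candidate) {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path normalizedCandidate = candidate.toAbsolutePath().normalize();
        return normalizedCandidate.startsWith(normalizedRoot);
    }

    public static void main(String[] args) {
        Path rootDir = Paths.get("/hbase");
        System.out.println(isUnder(rootDir, Paths.get("/hbase/tmp/restore"))); // true: not a valid restoreDir
        System.out.println(isUnder(rootDir, Paths.get("/tmp/restore")));       // false: acceptable
    }
}
```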
public static void setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir) throws IOException
conf - the job to configure
snapshotName - the name of the snapshot to read from
restoreDir - a temporary directory to restore the snapshot into. The current user should have write permissions to this directory, and it should not be a subdirectory of rootdir. After the job is finished, restoreDir can be deleted.
Throws: IOException - if an error occurs
public static void setInput(org.apache.hadoop.conf.Configuration conf, String snapshotName, org.apache.hadoop.fs.Path restoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion) throws IOException
conf - the job to configure
snapshotName - the name of the snapshot to read from
restoreDir - a temporary directory to restore the snapshot into. The current user should have write permissions to this directory, and it should not be a subdirectory of rootdir. After the job is finished, restoreDir can be deleted.
numSplitsPerRegion - how many input splits to generate per region
splitAlgo - SplitAlgorithm to be used when generating InputSplits
Throws: IOException - if an error occurs
public static void cleanRestoreDir(org.apache.hadoop.mapreduce.Job job, String snapshotName) throws IOException
job - the snapshot scan job
snapshotName - the name of the snapshot to read from
Throws: IOException - if an error occurs

Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.