Class MultiTableSnapshotInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat
MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat, allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each. Internally, the input format delegates to TableSnapshotInputFormat and thus has the same performance advantages; see TableSnapshotInputFormat for more details.
Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes a map from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan is applied; the overall dataset for the job is the concatenation of the regions and tables included in each snapshot/scan pair. TableMapReduceUtil.initMultiTableSnapshotMapperJob(Map, Class, Class, Class, org.apache.hadoop.mapreduce.Job, boolean, Path) can be used to configure the job:
// Job.getInstance replaces the deprecated Job(Configuration) constructor.
Job job = Job.getInstance(conf);
// Each snapshot name maps to the scans to run against that snapshot.
Map<String, Collection<Scan>> snapshotScans = ImmutableMap.of(
    "snapshot1", ImmutableList.of(new Scan(Bytes.toBytes("a"), Bytes.toBytes("b"))),
    "snapshot2", ImmutableList.of(new Scan(Bytes.toBytes("1"), Bytes.toBytes("2")))
);
// Base directory; each snapshot is restored into a subdirectory of it.
Path restoreDir = new Path("/tmp/snapshot_restore_dir");
TableMapReduceUtil.initMultiTableSnapshotMapperJob(
    snapshotScans, MyTableMapper.class, MyMapKeyOutput.class,
    MyMapOutputValueWritable.class, job, true, restoreDir);
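The dataset definition in the description above (each scan applied to its snapshot, with the results concatenated) can be illustrated with a small HBase-free model. This is only a sketch under stated assumptions, not the HBase implementation: SnapshotSplitModel, RowRange, and planSplits are hypothetical names, row keys are modeled as Java strings rather than byte arrays, and the rule that a region contributes a split only when it overlaps the scan's row range is an assumption about how the delegate selects regions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// HBase-free model; every type and name below is a hypothetical stand-in.
final class SnapshotSplitModel {

    /** Half-open row range [start, stop). An empty stop means "unbounded". */
    static final class RowRange {
        final String start;
        final String stop;

        RowRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        /** True when the two ranges share at least one row key. */
        boolean overlaps(RowRange other) {
            boolean startsBeforeOtherStops =
                other.stop.isEmpty() || start.compareTo(other.stop) < 0;
            boolean otherStartsBeforeStop =
                stop.isEmpty() || other.start.compareTo(stop) < 0;
            return startsBeforeOtherStops && otherStartsBeforeStop;
        }

        @Override
        public String toString() {
            return "[" + start + "," + stop + ")";
        }
    }

    /**
     * Concatenates, in map order, one split label per (snapshot, scan, region)
     * triple whose region overlaps the scan's row range (assumed rule).
     */
    static List<String> planSplits(Map<String, List<RowRange>> snapshotScans,
                                   Map<String, List<RowRange>> snapshotRegions) {
        List<String> splits = new ArrayList<>();
        for (Map.Entry<String, List<RowRange>> entry : snapshotScans.entrySet()) {
            String snapshot = entry.getKey();
            List<RowRange> regions = snapshotRegions.containsKey(snapshot)
                ? snapshotRegions.get(snapshot)
                : Collections.<RowRange>emptyList();
            for (RowRange scan : entry.getValue()) {
                for (RowRange region : regions) {
                    if (scan.overlaps(region)) {
                        splits.add(snapshot + " scan=" + scan + " region=" + region);
                    }
                }
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        // Same snapshot names and row ranges as the usage example above.
        Map<String, List<RowRange>> scans = new LinkedHashMap<>();
        scans.put("snapshot1", Arrays.asList(new RowRange("a", "b")));
        scans.put("snapshot2", Arrays.asList(new RowRange("1", "2")));

        // Made-up region boundaries, two regions per snapshot.
        Map<String, List<RowRange>> regions = new LinkedHashMap<>();
        regions.put("snapshot1", Arrays.asList(new RowRange("", "m"), new RowRange("m", "")));
        regions.put("snapshot2", Arrays.asList(new RowRange("", "15"), new RowRange("15", "")));

        for (String split : planSplits(scans, regions)) {
            System.out.println(split);
        }
    }
}
```

In this model the number of map tasks grows with the number of snapshot/scan pairs and with the number of regions each scan touches, which is why narrow scan ranges keep the job small.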
Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and record readers are created as described in TableSnapshotInputFormat (one per region). See TableSnapshotInputFormat for more notes on permissioning; the same caveats apply here.
See Also:
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
TableSnapshotInputFormat.TableSnapshotRegionRecordReader, TableSnapshotInputFormat.TableSnapshotRegionSplit
Field Summary
Constructor Summary
Method Summary
Modifier and Type | Method | Description
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) |
static void | setInput(org.apache.hadoop.conf.Configuration configuration, Map<String, Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path tmpRestoreDir) |
Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
cleanRestoreDir, createRecordReader, setInput, setInput
Field Details
delegate
Constructor Details
MultiTableSnapshotInputFormat
public MultiTableSnapshotInputFormat()
Method Details
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException, InterruptedException
Overrides:
getSplits in class TableSnapshotInputFormat
Throws:
IOException
InterruptedException
setInput
public static void setInput(org.apache.hadoop.conf.Configuration configuration, Map<String, Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException
Throws:
IOException
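The tmpRestoreDir argument is the base directory under which, per the class description, each snapshot is restored into its own subdirectory. A minimal sketch of such a layout follows. It uses java.nio.file.Path instead of the Hadoop Path class so that it runs without Hadoop on the classpath; RestoreDirLayoutModel and assignRestoreDirs are hypothetical names, and the subdirectory naming scheme (snapshot name, "__", random UUID) is an assumption made for illustration, not a documented guarantee.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of "one restore subdirectory per snapshot".
final class RestoreDirLayoutModel {

    /**
     * Assigns each snapshot a distinct subdirectory of tmpRestoreDir.
     * The "<name>__<uuid>" naming is an assumed scheme for illustration.
     */
    static Map<String, Path> assignRestoreDirs(Collection<String> snapshotNames,
                                               Path tmpRestoreDir) {
        Map<String, Path> dirs = new LinkedHashMap<>();
        for (String name : snapshotNames) {
            // The random suffix keeps repeated runs from colliding.
            dirs.put(name, tmpRestoreDir.resolve(name + "__" + UUID.randomUUID()));
        }
        return dirs;
    }

    public static void main(String[] args) {
        Path base = Paths.get("/tmp/snapshot_restore_dir");
        Map<String, Path> dirs =
            assignRestoreDirs(Arrays.asList("snapshot1", "snapshot2"), base);
        for (Map.Entry<String, Path> entry : dirs.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Keeping every restore location under a single base directory means one cleanup of that directory removes all restored data once the job finishes.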