Package org.apache.hadoop.hbase.mapred
Class MultiTableSnapshotInputFormat
java.lang.Object
org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat
org.apache.hadoop.hbase.mapred.MultiTableSnapshotInputFormat
- All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
@Public
public class MultiTableSnapshotInputFormat
extends TableSnapshotInputFormat
implements org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat, allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each. Internally, the input format delegates to TableSnapshotInputFormat and thus has the same performance advantages; see TableSnapshotInputFormat for more details.
Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes in a map from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan will be applied; the overall dataset for the job is defined by the concatenation of the regions and tables included in each snapshot/scan pair.
TableMapReduceUtil.initMultiTableSnapshotMapperJob(Map, Class, Class, Class, JobConf, boolean, Path)
can be used to configure the job.
JobConf job = new JobConf(conf);
Map<String, Collection<Scan>> snapshotScans = ImmutableMap.of(
    "snapshot1", ImmutableList.of(new Scan(Bytes.toBytes("a"), Bytes.toBytes("b"))),
    "snapshot2", ImmutableList.of(new Scan(Bytes.toBytes("1"), Bytes.toBytes("2")))
);
Path restoreDir = new Path("/tmp/snapshot_restore_dir");
TableMapReduceUtil.initMultiTableSnapshotMapperJob(
    snapshotScans, MyTableMapper.class, MyMapKeyOutput.class,
    MyMapOutputValueWritable.class, job, true, restoreDir);
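MyTableMapper, MyMapKeyOutput, and MyMapOutputValueWritable above are placeholder names, not part of this API. As an illustration only, a minimal mapper for the old mapred API might simply re-emit each row unchanged, in which case the job would instead be configured with ImmutableBytesWritable.class and Result.class as the map output key and value classes:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper, shown only to illustrate the shape of a TableMap implementation.
public class MyTableMapper extends MapReduceBase
    implements TableMap<ImmutableBytesWritable, Result> {

  @Override
  public void map(ImmutableBytesWritable rowKey, Result columns,
      OutputCollector<ImmutableBytesWritable, Result> output, Reporter reporter)
      throws IOException {
    // Pass each row through unchanged; a real mapper would transform the cells here.
    output.collect(rowKey, columns);
  }
}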
Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and record readers are created as described in TableSnapshotInputFormat (one per region). See TableSnapshotInputFormat for more notes on permissioning; the same caveats apply here.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat
TableSnapshotInputFormat.TableSnapshotRecordReader, TableSnapshotInputFormat.TableSnapshotRegionSplit
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type  Method
org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result>  getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
org.apache.hadoop.mapred.InputSplit[]  getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
static void  setInput(org.apache.hadoop.conf.Configuration conf, Map<String, Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path restoreDir)
Methods inherited from class org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat
setInput, setInput
-
Field Details
-
delegate
-
-
Constructor Details
-
MultiTableSnapshotInputFormat
public MultiTableSnapshotInputFormat()
-
-
Method Details
-
getSplits
public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException
- Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
- Overrides:
getSplits in class TableSnapshotInputFormat
- Throws:
IOException
-
getRecordReader
public org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
- Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
- Overrides:
getRecordReader in class TableSnapshotInputFormat
- Throws:
IOException
-
setInput
public static void setInput(org.apache.hadoop.conf.Configuration conf, Map<String, Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path restoreDir) throws IOException
- Throws:
IOException
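As an illustrative sketch (the snapshot name, scan boundaries, and restore path are taken from the example above), setInput can also be called directly to configure the snapshots and restore directory on a job; unlike the TableMapReduceUtil helper, this does not set the mapper or output classes:

JobConf job = new JobConf(HBaseConfiguration.create());
job.setInputFormat(MultiTableSnapshotInputFormat.class);

Map<String, Collection<Scan>> snapshotScans = new HashMap<>();
snapshotScans.put("snapshot1",
    Collections.singletonList(new Scan(Bytes.toBytes("a"), Bytes.toBytes("b"))));

// JobConf extends Configuration, so it can be passed as the conf argument.
MultiTableSnapshotInputFormat.setInput(job, snapshotScans,
    new Path("/tmp/snapshot_restore_dir"));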