Class MultiTableSnapshotInputFormat

java.lang.Object

org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat

org.apache.hadoop.hbase.mapreduce.MultiTableSnapshotInputFormat

@Public public class MultiTableSnapshotInputFormat extends TableSnapshotInputFormat

MultiTableSnapshotInputFormat generalizes TableSnapshotInputFormat, allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each snapshot. Internally, the input format delegates to TableSnapshotInputFormat and thus has the same performance advantages; see TableSnapshotInputFormat for more details.

Usage is similar to TableSnapshotInputFormat, with one exception: initMultiTableSnapshotMapperJob takes a map from snapshot name to a collection of scans. For each snapshot in the map, every corresponding scan is applied; the overall dataset for the job is the concatenation of the regions and tables included in each snapshot/scan pair. TableMapReduceUtil.initMultiTableSnapshotMapperJob(Map, Class, Class, Class, org.apache.hadoop.mapreduce.Job, boolean, Path) can be used to configure the job:

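 
 // conf is an existing Configuration; MyTableMapper, MyMapKeyOutput and
 // MyMapOutputValueWritable stand for the job's own classes.
 Job job = Job.getInstance(conf);
 Map<String, Collection<Scan>> snapshotScans = ImmutableMap.of(
     "snapshot1", ImmutableList.of(new Scan().withStartRow(Bytes.toBytes("a")).withStopRow(Bytes.toBytes("b"))),
     "snapshot2", ImmutableList.of(new Scan().withStartRow(Bytes.toBytes("1")).withStopRow(Bytes.toBytes("2")))
 );
 Path restoreDir = new Path("/tmp/snapshot_restore_dir");
 // The multi-snapshot variant is the overload that accepts the snapshot-to-scans map.
 TableMapReduceUtil.initMultiTableSnapshotMapperJob(
     snapshotScans, MyTableMapper.class, MyMapKeyOutput.class,
     MyMapOutputValueWritable.class, job, true, restoreDir);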
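
To make "concatenation" concrete, the following stdlib-only Java sketch models each snapshot as a sorted set of row keys and each scan as a [startRow, stopRow) range. The names ConcatenationSketch, RowRange, readAll and demo are hypothetical; none of them is HBase API. Each scan is applied to its snapshot independently and the matches are appended in order, so under this reading a row matched by two overlapping scans of the same snapshot is emitted twice.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Stdlib-only model of a job input formed by concatenating snapshot/scan pairs (not HBase API). */
public class ConcatenationSketch {

  /** Hypothetical stand-in for a Scan: startRow inclusive, stopRow exclusive. */
  record RowRange(String startRow, String stopRow) {}

  /** Returns "snapshot/row" entries in the order the modeled job would read them. */
  static List<String> readAll(Map<String, TreeSet<String>> snapshots,
                              Map<String, List<RowRange>> snapshotScans) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, List<RowRange>> e : snapshotScans.entrySet()) {
      TreeSet<String> rows = snapshots.get(e.getKey());
      for (RowRange scan : e.getValue()) {
        // Each scan runs on its own; matches are appended, never de-duplicated.
        for (String row : rows.subSet(scan.startRow(), true, scan.stopRow(), false)) {
          out.add(e.getKey() + "/" + row);
        }
      }
    }
    return out;
  }

  /** Two overlapping scans on snapshot1 (both match row "ab") plus one scan on snapshot2. */
  static List<String> demo() {
    Map<String, TreeSet<String>> snapshots = new LinkedHashMap<>();
    snapshots.put("snapshot1", new TreeSet<>(List.of("a", "ab", "b", "c")));
    snapshots.put("snapshot2", new TreeSet<>(List.of("1", "15", "2")));

    Map<String, List<RowRange>> scans = new LinkedHashMap<>();
    scans.put("snapshot1", List.of(new RowRange("a", "b"), new RowRange("ab", "c")));
    scans.put("snapshot2", List.of(new RowRange("1", "2")));
    return readAll(snapshots, scans);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Running main prints [snapshot1/a, snapshot1/ab, snapshot1/ab, snapshot1/b, snapshot2/1, snapshot2/15]; the repeated snapshot1/ab shows why overlapping scans on one snapshot are worth avoiding if duplicates matter to the job.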

Internally, this input format restores each snapshot into a subdirectory of the given temporary restore directory. Input splits and record readers are created as described in TableSnapshotInputFormat (one per region). See TableSnapshotInputFormat for more notes on permissions; the same caveats apply here.
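
The split planning described above can be modeled as follows, assuming one restore subdirectory per snapshot (named here after the snapshot, an invented scheme) and one split per region whose key range overlaps a scan. Region, ScanRange, Split, overlaps and plan are hypothetical stand-ins rather than HBase classes, and the real implementation may handle concerns that this sketch omits.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stdlib-only model of per-region split planning over several snapshots (not HBase API). */
public class SplitPlanSketch {

  /** Region key range [startKey, endKey); an empty endKey means "to the end of the table". */
  record Region(String startKey, String endKey) {}

  /** Scan range [startRow, stopRow); an empty stopRow means "no upper bound". */
  record ScanRange(String startRow, String stopRow) {}

  /** One modeled input split: which snapshot, where it was restored, which region. */
  record Split(String snapshot, Path restoreDir, Region region) {}

  /** Half-open interval overlap test between a region and a scan. */
  static boolean overlaps(Region r, ScanRange s) {
    boolean regionEndsAfterScanStart = r.endKey().isEmpty() || r.endKey().compareTo(s.startRow()) > 0;
    boolean scanEndsAfterRegionStart = s.stopRow().isEmpty() || s.stopRow().compareTo(r.startKey()) > 0;
    return regionEndsAfterScanStart && scanEndsAfterRegionStart;
  }

  static List<Split> plan(Path tmpRestoreDir,
                          Map<String, List<Region>> regionsBySnapshot,
                          Map<String, List<ScanRange>> snapshotScans) {
    List<Split> splits = new ArrayList<>();
    for (Map.Entry<String, List<ScanRange>> e : snapshotScans.entrySet()) {
      String snapshot = e.getKey();
      // Hypothetical naming: one subdirectory per snapshot under the shared tmp dir.
      Path restoreDir = tmpRestoreDir.resolve(snapshot);
      for (ScanRange scan : e.getValue()) {
        // One split per region that the scan touches.
        for (Region region : regionsBySnapshot.get(snapshot)) {
          if (overlaps(region, scan)) {
            splits.add(new Split(snapshot, restoreDir, region));
          }
        }
      }
    }
    return splits;
  }

  static List<Split> demo() {
    Map<String, List<Region>> regions = new LinkedHashMap<>();
    regions.put("snapshot1", List.of(new Region("", "b"), new Region("b", "m"), new Region("m", "")));
    regions.put("snapshot2", List.of(new Region("", "5"), new Region("5", "")));
    Map<String, List<ScanRange>> scans = new LinkedHashMap<>();
    scans.put("snapshot1", List.of(new ScanRange("a", "c")));
    scans.put("snapshot2", List.of(new ScanRange("1", "2")));
    return plan(Paths.get("/tmp/snapshot_restore_dir"), regions, scans);
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

In the demo, the scan [a, c) touches the first two regions of snapshot1 and the scan [1, 2) touches only the first region of snapshot2, giving three splits in total.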

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
TableSnapshotInputFormat.TableSnapshotRegionRecordReader, TableSnapshotInputFormat.TableSnapshotRegionSplit

Field Summary

Modifier and Type                                  Field
private final MultiTableSnapshotInputFormatImpl    delegate

Constructor Summary

Constructor
MultiTableSnapshotInputFormat()

Method Summary

Modifier and Type                                  Method
List<org.apache.hadoop.mapreduce.InputSplit>       getSplits(org.apache.hadoop.mapreduce.JobContext jobContext)
static void                                        setInput(org.apache.hadoop.conf.Configuration configuration, Map<String,Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path tmpRestoreDir)

Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat
cleanRestoreDir, createRecordReader, setInput, setInput

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- delegate
  
  private final MultiTableSnapshotInputFormatImpl delegate

Constructor Details
- MultiTableSnapshotInputFormat
  
  public MultiTableSnapshotInputFormat()

Method Details
- getSplits
  
  public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext jobContext) throws IOException, InterruptedException
  
  Overrides:
  
  getSplits in class TableSnapshotInputFormat
  
  Throws:
  
  IOException
  
  InterruptedException
- setInput
  
  public static void setInput(org.apache.hadoop.conf.Configuration configuration, Map<String,Collection<Scan>> snapshotScans, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException
  
  Throws:
  
  IOException
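
Because setInput receives only a Configuration, the snapshot-to-scans map presumably travels to the tasks inside that configuration. The sketch below shows one way a structured map can be flattened into a single key/value property and rebuilt, with java.util.Properties standing in for Hadoop's Configuration and plain strings standing in for serialized Scan objects. The class SetInputSketch, the property name and the Base64 "snapshot:scan" encoding are assumptions for illustration, not the format MultiTableSnapshotInputFormatImpl actually writes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Hypothetical model of passing a snapshot-to-scans map through a flat configuration. */
public class SetInputSketch {
  // Invented property name; HBase's real configuration keys are internal to its implementation.
  static final String SNAPSHOT_SCANS_KEY = "example.multisnapshot.snapshot.scans";

  private static String enc(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  private static String dec(String s) {
    return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
  }

  /** Writes every snapshot/scan pair into one property as "snap:scan" entries joined by commas. */
  static void setInput(Properties conf, Map<String, ? extends Collection<String>> snapshotScans) {
    List<String> entries = new ArrayList<>();
    for (Map.Entry<String, ? extends Collection<String>> e : snapshotScans.entrySet()) {
      for (String scan : e.getValue()) {
        // Base64 output never contains ':' or ',', so the delimiters stay unambiguous.
        entries.add(enc(e.getKey()) + ":" + enc(scan));
      }
    }
    conf.setProperty(SNAPSHOT_SCANS_KEY, String.join(",", entries));
  }

  /** Rebuilds the snapshot-to-scans map, as a task-side reader of the configuration would. */
  static Map<String, List<String>> getInput(Properties conf) {
    Map<String, List<String>> result = new LinkedHashMap<>();
    String raw = conf.getProperty(SNAPSHOT_SCANS_KEY, "");
    if (raw.isEmpty()) {
      return result;
    }
    for (String entry : raw.split(",")) {
      String[] parts = entry.split(":", 2);
      result.computeIfAbsent(dec(parts[0]), k -> new ArrayList<>()).add(dec(parts[1]));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, List<String>> in = new LinkedHashMap<>();
    in.put("snapshot1", List.of("scan[a,b)"));
    in.put("snapshot2", List.of("scan[1,2)", "scan[5,6)"));
    Properties conf = new Properties();
    setInput(conf, in);
    System.out.println(getInput(conf).equals(in));
  }
}
```

A snapshot mapped to an empty collection of scans contributes no entries in this encoding, so it would not survive the round trip; a real encoding would need to decide whether that case is meaningful.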
