java.lang.Object

org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase

Direct Known Subclasses: MultiTableInputFormat

@Public public abstract class MultiTableInputFormatBase extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>

A base class for multi-table input formats such as MultiTableInputFormat. It receives a list of Scan instances that define the input tables, filters, and other scan settings. Subclasses may use other TableRecordReader implementations.

Field Summary

Fields

Modifier and Type

Field

Description

private static final org.slf4j.Logger

LOG

private List<Scan>

scans

Holds the set of scans used to define the input.

private TableRecordReader

tableRecordReader

The reader that scans the table; it can be a custom implementation.
Constructor Summary

Constructors

Constructor

Description

MultiTableInputFormatBase()
Method Summary

Modifier and Type

Method

Description

org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result>

createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)

Builds a TableRecordReader.

protected List<Scan>

getScans()

Allows subclasses to get the list of Scan objects.

List<org.apache.hadoop.mapreduce.InputSplit>

getSplits(org.apache.hadoop.mapreduce.JobContext context)

Calculates the splits that will serve as input for the map tasks.

protected boolean

includeRegionInSplit(byte[] startKey, byte[] endKey)

Test if the given region is to be included in the InputSplit while splitting the regions of a table.

protected void

setScans(List<Scan> scans)

Allows subclasses to set the list of Scan objects.

protected void

setTableRecordReader(TableRecordReader tableRecordReader)

Allows subclasses to set the TableRecordReader.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOG
  
  private static final org.slf4j.Logger LOG
- scans
  
  private List<Scan> scans
  
  Holds the set of scans used to define the input.
- tableRecordReader
  
  private TableRecordReader tableRecordReader
  
  The reader that scans the table; it can be a custom implementation.
Constructor Details
- MultiTableInputFormatBase
  
  public MultiTableInputFormatBase()
Method Details
- createRecordReader
  
  public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
  
  Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.
  Specified by:
  
  createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
  
  Parameters:
  
  split - The split to work with.
  
  context - The current context.
  
  Returns:
  
  The newly created record reader.
  
  Throws:
  
  IOException - When creating the reader fails.
  
  InterruptedException - When record reader initialization fails.
  
  See Also:
  
  InputFormat.createRecordReader(InputSplit, TaskAttemptContext)
- getSplits
  
  public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
  
  Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table.
  Specified by:
  
  getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
  
  Parameters:
  
  context - The current job context.
  
  Returns:
  
  The list of input splits.
  
  Throws:
  
  IOException - When creating the list of splits fails.
  
  See Also:
  
  InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
- includeRegionInSplit
  
  protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
  
  Test if the given region is to be included in the InputSplit while splitting the regions of a table.
  This optimization is useful when there is a specific reason to exclude an entire region from the MapReduce job, so that it contributes no InputSplit, based on the region's start and end keys.
  For example, a job that remembers the last-processed top record and continuously revisits the [last, current) interval can skip the regions that fall outside that interval. Besides reducing the number of InputSplits, this also reduces the load on the region server, because the keys are ordered.
  
  Note: endKey.length == 0 is possible for the last (most recent) region.
  Override this method to exclude regions from the MapReduce job in bulk. By default, no region is excluded (that is, all regions are included).
  
  Parameters:
  
  startKey - Start key of the region
  
  endKey - End key of the region
  
  Returns:
  
  true, if this region needs to be included as part of the input (default).
- getScans
  
  protected List<Scan> getScans()
  
  Allows subclasses to get the list of Scan objects.
- setScans
  
  protected void setScans(List<Scan> scans)
  
  Allows subclasses to set the list of Scan objects.
  
  Parameters:
  
  scans - The list of Scan used to define the input
- setTableRecordReader
  
  protected void setTableRecordReader(TableRecordReader tableRecordReader)
  
  Allows subclasses to set the TableRecordReader.
  
  Parameters:
  
  tableRecordReader - A different TableRecordReader implementation.
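
The region test described under includeRegionInSplit can be sketched as a standalone key comparison. The following is a minimal illustration only, not the HBase implementation: the class RegionFilterSketch, the method mayContainRowsFrom, and the lastProcessed parameter are hypothetical names, and the sketch assumes row keys are ordered as unsigned bytes, lexicographically. An override of includeRegionInSplit(startKey, endKey) in a subclass could delegate to a check of this kind.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RegionFilterSketch {

  /**
   * Hypothetical helper (not part of HBase): returns true when the region
   * [startKey, endKey) may still hold rows at or after lastProcessed.
   */
  public static boolean mayContainRowsFrom(byte[] lastProcessed, byte[] startKey, byte[] endKey) {
    // An empty end key marks the last region, which has no upper bound.
    if (endKey.length == 0) {
      return true;
    }
    // The end key is exclusive, so the region matters only if endKey > lastProcessed.
    // Assumption: keys sort as unsigned bytes in lexicographic order.
    return Arrays.compareUnsigned(endKey, lastProcessed) > 0;
  }

  private static byte[] key(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] last = key("row-500");
    // Region ends before the last-processed key: can be skipped.
    System.out.println(mayContainRowsFrom(last, key("row-100"), key("row-300"))); // false
    // Region straddles the last-processed key: must be included.
    System.out.println(mayContainRowsFrom(last, key("row-300"), key("row-700"))); // true
    // Last region (empty end key): always included.
    System.out.println(mayContainRowsFrom(last, key("row-700"), new byte[0])); // true
  }
}
```

In a subclass, the override would return the result of such a check, with lastProcessed supplied by whatever mechanism the job uses to remember its progress (for example, a value read from the job configuration; that choice is left open here).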
