Class MultiTableInputFormatBase
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
 
org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase
- Direct Known Subclasses:
- MultiTableInputFormat
@Public
public abstract class MultiTableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result> 
A base for MultiTableInputFormats. Receives a list of Scan instances that define the input tables, filters, etc. Subclasses may use other TableRecordReader implementations.
Field Summary
- private static final org.slf4j.Logger LOG
- private List<Scan> scans: Holds the set of scans used to define the input.
- private TableRecordReader tableRecordReader: The reader scanning the table, can be a custom one.

Constructor Summary
- MultiTableInputFormatBase()

Method Summary
- org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context): Builds a TableRecordReader.
- protected List<Scan> getScans(): Allows subclasses to get the list of Scan objects.
- List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks.
- protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey): Test if the given region is to be included in the InputSplit while splitting the regions of a table.
- protected void setScans(List<Scan> scans): Allows subclasses to set the list of Scan objects.
- protected void setTableRecordReader(TableRecordReader tableRecordReader): Allows subclasses to set the TableRecordReader.

Field Details
- LOG
- scans: Holds the set of scans used to define the input.
- tableRecordReader: The reader scanning the table, can be a custom one.

Constructor Details
- public MultiTableInputFormatBase()

Method Details

createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.
- Specified by:
- createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
- Parameters:
- split - The split to work with.
- context - The current context.
- Returns:
- The newly created record reader.
- Throws:
- IOException - When creating the reader fails.
- InterruptedException - When record reader initialization fails.
- See Also:
- InputFormat.createRecordReader(InputSplit, TaskAttemptContext)

getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table.
- Specified by:
- getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
- Parameters:
- context - The current job context.
- Returns:
- The list of input splits.
- Throws:
- IOException - When creating the list of splits fails.
- See Also:
- InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)

includeRegionInSplit
protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
Tests whether the given region is to be included in the InputSplit while splitting the regions of a table.
This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (so that it contributes no InputSplit), based on the region's start and end keys.
Useful when we need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. In addition to reducing the number of InputSplits, this also reduces the load on the region server, due to the ordering of the keys.

Note: It is possible that endKey.length == 0, for the last (most recent) region.
Override this method if you want to bulk-exclude regions altogether from M-R. By default, no region is excluded (i.e. all regions are included).
- Parameters:
- startKey - Start key of the region
- endKey - End key of the region
- Returns:
- true, if this region needs to be included as part of the input (default).
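
The decision an includeRegionInSplit override might make for the [last, current) use case can be sketched in plain Java. This is a minimal, standalone illustration and not HBase code: the class RegionFilterSketch, the method includeRegion, and the lastProcessedKey parameter are hypothetical names introduced here. It assumes row keys sort as unsigned bytes in lexicographic order, and it uses only java.util.Arrays.compareUnsigned (Java 9 or later). In a real subclass the same check would sit inside the overridden includeRegionInSplit(byte[] startKey, byte[] endKey), with the checkpoint key obtained however the job stores it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Standalone sketch of a region filter for the [last, current) pattern.
 * RegionFilterSketch, includeRegion and lastProcessedKey are hypothetical
 * names for illustration; they are not part of the HBase API.
 */
final class RegionFilterSketch {

    /**
     * Returns true if the region [startKey, endKey) can contain row keys
     * at or after lastProcessedKey. startKey is unused in this sketch
     * because only the lower bound of the interval is being checked.
     */
    static boolean includeRegion(byte[] startKey, byte[] endKey, byte[] lastProcessedKey) {
        // An empty end key marks the last region, which has no upper bound.
        if (endKey.length == 0) {
            return true;
        }
        // The end key is exclusive, so the region holds keys >= lastProcessedKey
        // only if endKey sorts strictly after lastProcessedKey.
        return Arrays.compareUnsigned(endKey, lastProcessedKey) > 0;
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] last = key("row-500"); // hypothetical checkpoint key
        System.out.println(includeRegion(key("row-100"), key("row-300"), last));
        System.out.println(includeRegion(key("row-300"), key("row-700"), last));
        System.out.println(includeRegion(key("row-700"), new byte[0], last));
    }
}
```

With the checkpoint at row-500, the first region ends before the checkpoint and is skipped, while the other two are kept. Regions that end before the checkpoint never produce an InputSplit, which is the bulk exclusion described above.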
 
getScans
protected List<Scan> getScans()
Allows subclasses to get the list of Scan objects.

setScans
protected void setScans(List<Scan> scans)
Allows subclasses to set the list of Scan objects.
- Parameters:
- scans - The list of Scan used to define the input

setTableRecordReader
protected void setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader.
- Parameters:
- tableRecordReader - A different TableRecordReader implementation.