Class MultiTableInputFormatBase
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.MultiTableInputFormatBase
- Direct Known Subclasses:
MultiTableInputFormat
@Public
public abstract class MultiTableInputFormatBase
extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base for MultiTableInputFormats. Receives a list of Scan instances that define the input tables, filters, etc. Subclasses may use other TableRecordReader implementations.
-
Field Summary
Modifier and Type | Field | Description
private static final org.slf4j.Logger | LOG |
private List<Scan> | scans | Holds the set of scans used to define the input.
private TableRecordReader | tableRecordReader | The reader scanning the table; can be a custom one.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) | Builds a TableRecordReader.
protected List<Scan> | getScans() | Allows subclasses to get the list of Scan objects.
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) | Calculates the splits that will serve as input for the map tasks.
protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey) | Test if the given region is to be included in the InputSplit while splitting the regions of a table.
protected void | setScans(List<Scan> scans) | Allows subclasses to set the list of Scan objects.
protected void | setTableRecordReader(TableRecordReader tableRecordReader) | Allows subclasses to set the TableRecordReader.
-
Field Details
-
LOG
-
scans
Holds the set of scans used to define the input.
-
tableRecordReader
The reader scanning the table; can be a custom one.
-
-
Constructor Details
-
MultiTableInputFormatBase
public MultiTableInputFormatBase()
-
-
Method Details
-
createRecordReader
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.
- Specified by:
createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
- Parameters:
split - The split to work with.
context - The current context.
- Returns:
The newly created record reader.
- Throws:
IOException - When creating the reader fails.
InterruptedException - When record reader initialization fails.
- See Also:
InputFormat.createRecordReader(InputSplit, TaskAttemptContext)
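The "use the custom reader if one was set, otherwise the default" behaviour can be pictured with a minimal, cluster-free sketch. It is not the HBase source: the class `ReaderFallbackSketch`, its type parameter, and the methods `setReader` and `createReader` are hypothetical stand-ins for `setTableRecordReader` and `createRecordReader`.

```java
import java.util.function.Supplier;

/** Hypothetical model of an input format that accepts an optional custom reader. */
public class ReaderFallbackSketch<R> {
    private R customReader;                   // stays null until a subclass supplies one
    private final Supplier<R> defaultReader;  // builds the default reader on demand

    public ReaderFallbackSketch(Supplier<R> defaultReader) {
        this.defaultReader = defaultReader;
    }

    /** Plays the role of setTableRecordReader in this sketch. */
    public void setReader(R reader) {
        this.customReader = reader;
    }

    /** Plays the role of createRecordReader: custom reader if present, else the default. */
    public R createReader() {
        return customReader != null ? customReader : defaultReader.get();
    }

    public static void main(String[] args) {
        ReaderFallbackSketch<String> format = new ReaderFallbackSketch<>(() -> "default reader");
        System.out.println(format.createReader());
        format.setReader("custom reader");
        System.out.println(format.createReader());
    }
}
```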
-
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table.
- Specified by:
getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
- Parameters:
context - The current job context.
- Returns:
The list of input splits.
- Throws:
IOException - When creating the list of splits fails.
- See Also:
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
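To make the relationship between regions and splits concrete, here is a simplified, cluster-free model for a single scan: each region that overlaps the scan's row range yields one split, clipped to that range. This is a sketch under stated assumptions, not the HBase implementation. The class `SplitSketch`, the nested `Range` type, and the method `splitsFor` are hypothetical names; an empty key is assumed to mean "unbounded", and keys are compared lexicographically as unsigned bytes with `Arrays.compareUnsigned` (Java 9+).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of per-region split calculation for one scan. */
public class SplitSketch {

    /** A half-open row-key range [start, end); an empty array means unbounded. */
    public static final class Range {
        public final byte[] start;
        public final byte[] end;

        public Range(byte[] start, byte[] end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns one clipped range per region that overlaps the scan range. */
    public static List<Range> splitsFor(List<Range> regions, Range scan) {
        List<Range> splits = new ArrayList<>();
        for (Range region : regions) {
            // The scan starts before the region ends (or one of the two bounds is open).
            boolean startsBeforeRegionEnd = scan.start.length == 0 || region.end.length == 0
                || Arrays.compareUnsigned(scan.start, region.end) < 0;
            // The scan stops after the region starts (or the scan has no stop row).
            boolean stopsAfterRegionStart = scan.end.length == 0
                || Arrays.compareUnsigned(scan.end, region.start) > 0;
            if (!startsBeforeRegionEnd || !stopsAfterRegionStart) {
                continue; // no overlap, so this region contributes no split
            }
            // Clip the split to the later of the two start keys.
            byte[] start = Arrays.compareUnsigned(region.start, scan.start) >= 0
                ? region.start : scan.start;
            // Clip the split to the earlier of the two end keys, treating empty as unbounded.
            byte[] end;
            if (region.end.length == 0) {
                end = scan.end;
            } else if (scan.end.length == 0
                || Arrays.compareUnsigned(region.end, scan.end) <= 0) {
                end = region.end;
            } else {
                end = scan.end;
            }
            splits.add(new Range(start, end));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Range> regions = List.of(
            new Range(new byte[0], "g".getBytes()),
            new Range("g".getBytes(), "p".getBytes()),
            new Range("p".getBytes(), new byte[0]));
        // A scan over [h, k) touches only the middle region.
        for (Range r : splitsFor(regions, new Range("h".getBytes(), "k".getBytes()))) {
            System.out.println(new String(r.start) + " .. " + new String(r.end));
        }
    }
}
```

In this model, a scan with empty start and stop rows produces exactly one split per region, which corresponds to the statement that the number of splits matches the number of regions.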
-
includeRegionInSplit
protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
Tests whether the given region is to be included in the InputSplit while splitting the regions of a table.
This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (so that it does not contribute to the InputSplit), given the start and end keys of that region.
Useful when we need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. In addition to reducing the number of InputSplits, this also reduces the load on the region server, due to the ordering of the keys.
Note: It is possible that endKey.length == 0 for the last (most recent) region.
Override this method if you want to bulk-exclude regions from M-R altogether. By default, no region is excluded (i.e. all regions are included).
- Parameters:
startKey - Start key of the region
endKey - End key of the region
- Returns:
true if this region needs to be included as part of the input (default).
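As an illustration of the [last, current) use case, the following is a minimal sketch of the decision an override might make. It is plain JDK code so that it runs without a cluster; in a real subclass, the body of `include` would sit inside an overridden `includeRegionInSplit(byte[] startKey, byte[] endKey)`. The class name `RegionWindowFilter`, the fields `last` and `current` (both assumed non-empty), and the use of `Arrays.compareUnsigned` (Java 9+) for lexicographic row-key comparison are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: keeps only regions that overlap the row-key window [last, current). */
public class RegionWindowFilter {
    private final byte[] last;     // inclusive lower bound of the window
    private final byte[] current;  // exclusive upper bound of the window

    public RegionWindowFilter(byte[] last, byte[] current) {
        this.last = last;
        this.current = current;
    }

    /**
     * Same shape as includeRegionInSplit(startKey, endKey): returns true when the region
     * [startKey, endKey) overlaps [last, current). An empty endKey marks the last region,
     * which has no upper bound; an empty startKey marks the first region.
     */
    public boolean include(byte[] startKey, byte[] endKey) {
        // The region ends at or before the window starts, so it can be skipped.
        boolean endsBeforeWindow =
            endKey.length != 0 && Arrays.compareUnsigned(endKey, last) <= 0;
        // The region starts at or after the window ends, so it can be skipped.
        boolean startsAfterWindow = Arrays.compareUnsigned(startKey, current) >= 0;
        return !endsBeforeWindow && !startsAfterWindow;
    }

    private static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionWindowFilter filter = new RegionWindowFilter(key("m"), key("t"));
        System.out.println(filter.include(key(""), key("g")));   // region entirely before the window
        System.out.println(filter.include(key("g"), key("p")));  // region overlapping the window
        System.out.println(filter.include(key("t"), key("")));   // last region, starts at the window end
    }
}
```

Note that the empty `endKey` case is checked explicitly before comparing, since an empty array would otherwise sort before every non-empty key.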
-
getScans
protected List<Scan> getScans()
Allows subclasses to get the list of Scan objects.
-
setScans
protected void setScans(List<Scan> scans)
Allows subclasses to set the list of Scan objects.
- Parameters:
scans - The list of Scan objects used to define the input.
-
setTableRecordReader
protected void setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader.
- Parameters:
tableRecordReader - A different TableRecordReader implementation.
-