@InterfaceAudience.Public public abstract class MultiTableInputFormatBase extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base class for MultiTableInputFormats. It receives a list of Scan instances that define the input tables, filters, and so on. Subclasses may use other TableRecordReader implementations.

Modifier and Type | Field and Description |
---|---|
private static org.slf4j.Logger | LOG |
private List<Scan> | scans: Holds the set of scans used to define the input. |
private TableRecordReader | tableRecordReader: The reader scanning the table; can be a custom one. |
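Each Scan in the list has to identify the table it reads. HBase's companion class MultiTableInputFormat documents doing this by setting the `Scan.SCAN_ATTRIBUTES_TABLE_NAME` attribute on every scan. The sketch below is a simplified, dependency-free model of the bookkeeping this implies, assuming scans are organized per table before any per-table work is done. `ScanSpec` and `ScanGrouping` are hypothetical stand-ins, not HBase classes, and string row keys replace byte arrays for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an HBase Scan: only the table name and row range matter here.
record ScanSpec(String table, String startRow, String stopRow) {}

final class ScanGrouping {
    private ScanGrouping() {}

    // Groups scans by the table they target, preserving first-seen table order.
    // A scan without a table name is rejected, since it cannot be mapped to any regions.
    static Map<String, List<ScanSpec>> groupByTable(List<ScanSpec> scans) {
        Map<String, List<ScanSpec>> byTable = new LinkedHashMap<>();
        for (ScanSpec scan : scans) {
            if (scan.table() == null) {
                throw new IllegalArgumentException("A scan did not have a table name");
            }
            byTable.computeIfAbsent(scan.table(), t -> new ArrayList<>()).add(scan);
        }
        return byTable;
    }
}
```

Several scans may target the same table (for example, two disjoint row ranges), so the grouping keeps a list per table rather than a single scan.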

Constructor and Description |
---|
MultiTableInputFormatBase() |

Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context): Builds a TableRecordReader. |
protected List<Scan> | getScans(): Allows subclasses to get the list of Scan objects. |
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks. |
protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey): Tests whether the given region is to be included in the InputSplit while splitting the regions of a table. |
protected void | setScans(List<Scan> scans): Allows subclasses to set the list of Scan objects. |
protected void | setTableRecordReader(TableRecordReader tableRecordReader): Allows subclasses to set the TableRecordReader. |
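The protected members in the table above are the extension points for subclasses. The following sketch models that pattern without HBase on the classpath: `MultiTableFormatModel`, `TableScan`, and `CommaSeparatedTablesFormat` are hypothetical, and the comma-separated table list is an invented configuration format. A subclass builds its scans (here, one full-table scan per listed table) and installs them through the `setScans` hook, after which `getScans` returns them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a full-table Scan.
record TableScan(String table) {}

// Hypothetical model of the protected scan hooks; not the HBase class itself.
abstract class MultiTableFormatModel {
    private List<TableScan> scans = Collections.emptyList();

    // Mirrors getScans(): subclasses read back the configured scans.
    protected List<TableScan> getScans() {
        return scans;
    }

    // Mirrors setScans(List<Scan>): subclasses install the scans that define the input.
    protected void setScans(List<TableScan> scans) {
        this.scans = List.copyOf(scans);
    }
}

// Hypothetical subclass: derives one scan per name from a comma-separated list.
final class CommaSeparatedTablesFormat extends MultiTableFormatModel {
    void configure(String tableList) {
        List<TableScan> scans = new ArrayList<>();
        for (String name : tableList.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                scans.add(new TableScan(trimmed));
            }
        }
        if (scans.isEmpty()) {
            throw new IllegalArgumentException("No tables configured");
        }
        setScans(scans);
    }
}
```

As far as we understand, HBase's own MultiTableInputFormat takes a comparable route, deserializing scans from its job configuration and passing them to `setScans`; check the source for your HBase version before relying on the details.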
private static final org.slf4j.Logger LOG
private TableRecordReader tableRecordReader
public MultiTableInputFormatBase()
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Builds a TableRecordReader.
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Throws:
IOException - When creating the reader fails.
InterruptedException - When record reader initialization fails.
See Also: InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)
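The `tableRecordReader` field "can be a custom one," and `setTableRecordReader` installs a different implementation. A plausible reading is that `createRecordReader` uses the custom reader when one has been set and otherwise falls back to a default TableRecordReader. The sketch below models only that selection logic, under that assumption; `RowReader` and `ReaderSelection` are hypothetical names, and the real method also has to configure the reader for the given split, which is omitted here.

```java
import java.util.function.Supplier;

// Hypothetical reader abstraction standing in for TableRecordReader.
interface RowReader {
    String name();
}

final class ReaderSelection {
    private RowReader customReader; // null unless a subclass installed one

    // Mirrors setTableRecordReader(...): lets a subclass swap in its own reader.
    void setReader(RowReader reader) {
        this.customReader = reader;
    }

    // Returns the custom reader if one was set, otherwise builds the default one.
    RowReader createReader(Supplier<RowReader> defaultReader) {
        return customReader != null ? customReader : defaultReader.get();
    }
}
```

Passing the default as a `Supplier` means a default reader is only constructed when no custom reader is present.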
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Calculates the splits that will serve as input for the map tasks.
Specified by: getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Throws:
IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
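One way to picture what `getSplits` computes: for each scan, walk the table's regions, skip any region rejected by `includeRegionInSplit`, and emit a split for every region whose [start, end) key range overlaps the scan's [startRow, stopRow) range, clipped to the scan's bounds. The model below assumes that behavior. `SplitRange` and `SplitPlanner` are hypothetical, region boundaries are passed in as arrays instead of being looked up from a cluster, and an empty byte array means "unbounded", matching the empty `endKey` of the last region. Keys are compared as unsigned bytes, which is how HBase orders row keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical, simplified split: the table plus the row range one map task would read.
record SplitRange(String table, byte[] start, byte[] stop) {}

final class SplitPlanner {
    private SplitPlanner() {}

    // Unsigned lexicographic byte order, the ordering HBase uses for row keys.
    private static int cmp(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Plans splits for one scan over one table. Region i covers [regionStarts[i], regionEnds[i]).
    // Empty arrays mean "unbounded" for both scan and region boundaries.
    static List<SplitRange> plan(String table, byte[] startRow, byte[] stopRow,
                                 byte[][] regionStarts, byte[][] regionEnds,
                                 BiPredicate<byte[], byte[]> includeRegion) {
        List<SplitRange> splits = new ArrayList<>();
        for (int i = 0; i < regionStarts.length; i++) {
            byte[] rs = regionStarts[i];
            byte[] re = regionEnds[i];
            // Hook corresponding to includeRegionInSplit(startKey, endKey).
            if (!includeRegion.test(rs, re)) {
                continue;
            }
            // The region overlaps the scan if it ends after startRow and starts before stopRow.
            boolean overlaps = (startRow.length == 0 || re.length == 0 || cmp(startRow, re) < 0)
                    && (stopRow.length == 0 || cmp(stopRow, rs) > 0);
            if (!overlaps) {
                continue;
            }
            // Clip the region's range to the scan's range.
            byte[] splitStart = (startRow.length == 0 || cmp(rs, startRow) >= 0) ? rs : startRow;
            byte[] splitStop = (stopRow.length == 0 || (re.length > 0 && cmp(re, stopRow) <= 0))
                    ? re : stopRow;
            splits.add(new SplitRange(table, splitStart, splitStop));
        }
        return splits;
    }
}
```

With regions [, g), [g, p), [p, ) and a scan over [h, q), the first region does not overlap and is skipped, while the other two yield the splits [h, p) and [p, q).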
protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
This optimization is effective when there is a specific reason to exclude an entire region from the MapReduce job (so that it contributes no InputSplit), judged from the region's start and end keys.
It is useful when you need to remember the last-processed top record and continuously revisit the [last, current) interval for MapReduce processing. Besides reducing the number of InputSplits, it also reduces the load on the region servers, because keys are ordered and an excluded region is never scanned.
Note: endKey.length == 0 is possible for the last (most recent) region.
Override this method if you want to exclude whole regions from the MapReduce job in bulk. By default, no region is excluded (that is, all regions are included).
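As an illustration of the [last, current) use case, here is override logic that keeps only the regions that can still contain rows at or after a remembered checkpoint key. `ResumeFromKeyFilter` and `lastProcessedKey` are hypothetical; in a real subclass, the body of `includeRegionInSplit` would go into an `@Override` of this protected method. A region covers [startKey, endKey), so if a non-empty endKey sorts at or before the checkpoint, none of its rows can be in range and the whole region is skipped. This particular test does not need startKey.

```java
import java.util.Arrays;

// Hypothetical override logic for includeRegionInSplit(startKey, endKey): skips every
// region that lies entirely before a remembered "last processed" row key.
final class ResumeFromKeyFilter {
    private final byte[] lastProcessedKey; // hypothetical checkpoint, e.g. saved by a previous run

    ResumeFromKeyFilter(byte[] lastProcessedKey) {
        this.lastProcessedKey = lastProcessedKey.clone();
    }

    boolean includeRegionInSplit(byte[] startKey, byte[] endKey) {
        // An empty end key marks the last region, which is unbounded above: always keep it.
        if (endKey.length == 0) {
            return true;
        }
        // The region covers [startKey, endKey). If endKey <= lastProcessedKey, every row in it
        // sorts before the checkpoint, so the whole region can be skipped.
        return Arrays.compareUnsigned(endKey, lastProcessedKey) > 0;
    }
}
```

For a checkpoint of "m", the regions [, g) and [g, m) are excluded, while [m, t) and the final region [t, ) are kept.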
Parameters:
startKey - Start key of the region.
endKey - End key of the region.
protected void setScans(List<Scan> scans)
Allows subclasses to set the list of Scan objects.
Parameters:
scans - The list of Scans used to define the input.
protected void setTableRecordReader(TableRecordReader tableRecordReader)
Allows subclasses to set the TableRecordReader.
Parameters:
tableRecordReader - A different TableRecordReader implementation.
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.