@InterfaceAudience.Public public class TableInputFormat extends TableInputFormatBase implements org.apache.hadoop.conf.Configurable
Modifier and Type | Field and Description |
---|---|
private org.apache.hadoop.conf.Configuration | conf: The configuration. |
static String | INPUT_TABLE: Job parameter that specifies the input table. |
private static org.slf4j.Logger | LOG |
static String | SCAN: Base-64 encoded scanner. |
static String | SCAN_BATCHSIZE: Set the maximum number of values to return for each call to next(). |
static String | SCAN_CACHEBLOCKS: Set to false to disable server-side caching of blocks for this scan. |
static String | SCAN_CACHEDROWS: The number of rows for caching that will be passed to scanners. |
static String | SCAN_COLUMN_FAMILY: Column family to scan. |
static String | SCAN_COLUMNS: Space-delimited list of columns and column families to scan. |
static String | SCAN_MAXVERSIONS: The maximum number of versions to return. |
static String | SCAN_ROW_START: Scan start row. |
static String | SCAN_ROW_STOP: Scan stop row. |
static String | SCAN_TIMERANGE_END: The ending timestamp used to filter columns with a specific range of versions. |
static String | SCAN_TIMERANGE_START: The starting timestamp used to filter columns with a specific range of versions. |
static String | SCAN_TIMESTAMP: The timestamp used to filter columns with a specific timestamp. |
static String | SHUFFLE_MAPS: Specify if we have to shuffle the map tasks. |
private static String | SPLIT_TABLE: If specified, use start keys of this table to split. |
Fields inherited from class TableInputFormatBase: MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION
Constructor and Description |
---|
TableInputFormat() |
Modifier and Type | Method and Description |
---|---|
private static void | addColumn(Scan scan, byte[] familyAndQualifier): Parses a combined family and qualifier and adds either both or just the family in case there is no qualifier. |
static void | addColumns(Scan scan, byte[][] columns): Adds an array of columns specified using old format, family:qualifier. |
private static void | addColumns(Scan scan, String columns): Convenience method to parse a string representation of an array of column specifiers. |
static void | configureSplitTable(org.apache.hadoop.mapreduce.Job job, TableName tableName): Sets split table in map-reduce job. |
static Scan | createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf): Sets up a Scan instance, applying settings from the configuration property constants defined in TableInputFormat. |
org.apache.hadoop.conf.Configuration | getConf(): Returns the current configuration. |
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks. |
protected Pair<byte[][],byte[][]> | getStartEndKeys() |
protected void | initialize(org.apache.hadoop.mapreduce.JobContext context): Handle subclass specific set up. |
void | setConf(org.apache.hadoop.conf.Configuration configuration): Sets the configuration. |
Methods inherited from class TableInputFormatBase: calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, createRegionSizeCalculator, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, reverseDNS, setScan, setTableRecordReader
private static final org.slf4j.Logger LOG
public static final String INPUT_TABLE
private static final String SPLIT_TABLE
public static final String SCAN
See TableMapReduceUtil.convertScanToString(Scan) for more details.
public static final String SCAN_ROW_START
public static final String SCAN_ROW_STOP
public static final String SCAN_COLUMN_FAMILY
public static final String SCAN_COLUMNS
public static final String SCAN_TIMESTAMP
public static final String SCAN_TIMERANGE_START
public static final String SCAN_TIMERANGE_END
public static final String SCAN_MAXVERSIONS
public static final String SCAN_CACHEBLOCKS
public static final String SCAN_CACHEDROWS
public static final String SCAN_BATCHSIZE
public static final String SHUFFLE_MAPS
private org.apache.hadoop.conf.Configuration conf
public TableInputFormat()
public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
See Also:
Configurable.getConf()
public void setConf(org.apache.hadoop.conf.Configuration configuration)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
Parameters:
configuration - The configuration to set.
See Also:
Configurable.setConf(org.apache.hadoop.conf.Configuration)
public static Scan createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf) throws IOException
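Sets up a Scan instance, applying settings from the configuration property constants defined in TableInputFormat. This allows specifying things such as start and stop rows, columns or column families, timestamps or time ranges, and scanner caching and batch size.
Throws:
IOException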
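The pattern described above, where a scan picks up only the properties that were actually set, can be sketched without HBase on the classpath. This is a minimal illustration, not the HBase implementation: `ScanFromConfSketch`, `ScanSettings`, and `fromConf` are hypothetical names, a plain `Map` stands in for `Configuration`, and the map keys are placeholders named after the constants because this page does not show the actual property strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for building a scan from configuration; not HBase code. */
final class ScanFromConfSketch {
  private ScanFromConfSketch() {}

  /** Minimal value object standing in for an HBase Scan. Null means "not set". */
  static final class ScanSettings {
    String startRow;
    String stopRow;
    Integer cachedRows;
    Integer batchSize;
  }

  /** Copies each property into the settings object only when it is present. */
  static ScanSettings fromConf(Map<String, String> conf) {
    ScanSettings settings = new ScanSettings();
    // Placeholder keys named after the constants; the real key strings are not shown here.
    settings.startRow = conf.get("SCAN_ROW_START");
    settings.stopRow = conf.get("SCAN_ROW_STOP");
    if (conf.containsKey("SCAN_CACHEDROWS")) {
      settings.cachedRows = Integer.parseInt(conf.get("SCAN_CACHEDROWS"));
    }
    if (conf.containsKey("SCAN_BATCHSIZE")) {
      settings.batchSize = Integer.parseInt(conf.get("SCAN_BATCHSIZE"));
    }
    return settings;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("SCAN_ROW_START", "row-100");
    conf.put("SCAN_BATCHSIZE", "50");
    ScanSettings s = fromConf(conf);
    System.out.println("start=" + s.startRow + " batch=" + s.batchSize
        + " cachedRows=" + s.cachedRows);
  }
}
```

In a real job the same idea would presumably be expressed by setting the `TableInputFormat` constants on the job's `Configuration` and passing it to `createScanFromConfiguration`.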
protected void initialize(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Description copied from class:
TableInputFormatBase
Handle subclass-specific set up. Both TableInputFormatBase.createRecordReader(InputSplit, TaskAttemptContext) and TableInputFormatBase.getSplits(JobContext) call TableInputFormatBase.initialize(JobContext) as a convenient centralized location to handle retrieving the necessary configuration information and calling TableInputFormatBase.initializeTable(Connection, TableName).
Subclasses should implement their initialize call such that it is safe to call multiple times. The current TableInputFormatBase implementation relies on a non-null table reference to decide if an initialize call is needed, but this behavior may change in the future. In particular, it is critical that initializeTable not be called multiple times since this will leak Connection instances.
Overrides:
initialize in class TableInputFormatBase
Throws:
IOException
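The requirement that initialize be safe to call repeatedly can be illustrated with a guard on the table reference. The following is a JDK-only sketch under stated assumptions: `IdempotentInitSketch` and its members are hypothetical stand-ins, a counter replaces real connections, and a `String` table name replaces the `JobContext` argument. It is not the TableInputFormatBase implementation.

```java
/** Hypothetical stand-in for a table-backed input format; not HBase code. */
final class IdempotentInitSketch {
  /** Counts how many (pretend) connections this instance has opened. */
  private int openedConnections = 0;

  /** Stand-in for the table reference held by the base class. */
  private String table;

  /** Stand-in for initializeTable: every call opens a new connection. */
  private void initializeTable(String tableName) {
    openedConnections++;
    table = tableName;
  }

  /** Safe to call repeatedly: the guard skips set up once a table is present. */
  void initialize(String tableName) {
    if (table != null) {
      return; // already initialized, so no second connection is opened
    }
    initializeTable(tableName);
  }

  int openedConnections() {
    return openedConnections;
  }

  public static void main(String[] args) {
    IdempotentInitSketch format = new IdempotentInitSketch();
    format.initialize("my_table"); // as if called from getSplits
    format.initialize("my_table"); // as if called from createRecordReader
    System.out.println("connections opened: " + format.openedConnections());
  }
}
```

Since the text above warns that the non-null check in the base class may change, a subclass would do well to keep its own guard of this kind rather than rely on the parent's behavior.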
private static void addColumn(Scan scan, byte[] familyAndQualifier)
Parameters:
scan - The Scan to update.
familyAndQualifier - family and qualifier
Throws:
IllegalArgumentException - When familyAndQualifier is invalid.
public static void addColumns(Scan scan, byte[][] columns)
Overrides previous calls to Scan.addColumn(byte[], byte[]) for any families in the input.
Parameters:
scan - The Scan to update.
columns - array of columns, formatted as family:qualifier
See Also:
Scan.addColumn(byte[], byte[])
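To make the family:qualifier format concrete, the sketch below splits a combined specifier and reports which Scan call it would correspond to: a bare family maps to Scan.addFamily, a family with a qualifier maps to Scan.addColumn. This is an illustration, not the HBase parser. `ColumnSpecSketch` is a hypothetical helper, and splitting at the first ':' and rejecting an empty family are assumptions about what counts as valid.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper illustrating the family:qualifier format; not HBase code. */
final class ColumnSpecSketch {
  private ColumnSpecSketch() {}

  /**
   * Splits a combined specifier at the first ':' (an assumption of this sketch).
   * Returns {family} when there is no ':' and {family, qualifier} otherwise.
   */
  static byte[][] split(byte[] familyAndQualifier) {
    int colon = -1;
    for (int i = 0; i < familyAndQualifier.length; i++) {
      if (familyAndQualifier[i] == ':') {
        colon = i;
        break;
      }
    }
    if (colon < 0) {
      return new byte[][] { familyAndQualifier.clone() };
    }
    byte[] family = Arrays.copyOfRange(familyAndQualifier, 0, colon);
    byte[] qualifier =
        Arrays.copyOfRange(familyAndQualifier, colon + 1, familyAndQualifier.length);
    return new byte[][] { family, qualifier };
  }

  /** Describes which Scan call the specifier would map to in this sketch. */
  static String describe(String spec) {
    byte[][] parts = split(spec.getBytes(StandardCharsets.UTF_8));
    String family = new String(parts[0], StandardCharsets.UTF_8);
    if (family.isEmpty()) {
      // Assumption: an empty family is treated as invalid input.
      throw new IllegalArgumentException("Empty column family in: " + spec);
    }
    if (parts.length == 1) {
      return "addFamily(" + family + ")";
    }
    return "addColumn(" + family + ", " + new String(parts[1], StandardCharsets.UTF_8) + ")";
  }

  public static void main(String[] args) {
    for (String spec : new String[] { "cf1:qual", "cf2" }) {
      System.out.println(spec + " -> " + describe(spec));
    }
  }
}
```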
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Overrides:
getSplits in class TableInputFormatBase
Parameters:
context - The current job context.
Throws:
IOException - When creating the list of splits fails.
See Also:
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
private static void addColumns(Scan scan, String columns)
Parameters:
scan - The Scan to update.
columns - The columns to parse.
protected Pair<byte[][],byte[][]> getStartEndKeys() throws IOException
Overrides:
getStartEndKeys in class TableInputFormatBase
Throws:
IOException
public static void configureSplitTable(org.apache.hadoop.mapreduce.Job job, TableName tableName)
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.