Class TableInputFormat
java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
org.apache.hadoop.hbase.mapreduce.TableInputFormat
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable
- Direct Known Subclasses:
RoundRobinTableInputFormat
@Public
public class TableInputFormat
extends TableInputFormatBase
implements org.apache.hadoop.conf.Configurable
Convert HBase tabular data into a format that is consumable by Map/Reduce.
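As a quick orientation before the reference material, here is a minimal sketch of wiring this class into a MapReduce job through the configuration constants documented below. The table name "mytable", the column family "cf", and the job name are placeholder assumptions; mapper and output setup are elided.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class ReadFromHBaseJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.INPUT_TABLE, "mytable");    // placeholder table name
    conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf");  // placeholder column family
    conf.set(TableInputFormat.SCAN_CACHEDROWS, "500");    // rows fetched per scanner RPC

    Job job = Job.getInstance(conf, "read-from-hbase");
    job.setJarByClass(ReadFromHBaseJob.class);
    // The record reader emits ImmutableBytesWritable (row key) / Result (row) pairs.
    job.setInputFormatClass(TableInputFormat.class);
    // ... configure the mapper and output format here, then submit.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}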
-
Field Summary
Modifier and Type / Field / Description
private org.apache.hadoop.conf.Configuration conf
    The configuration.
static final String INPUT_TABLE
    Job parameter that specifies the input table.
private static final org.slf4j.Logger LOG
static final String SCAN
    Base-64 encoded scanner.
static final String SCAN_BATCHSIZE
    Sets the maximum number of values to return for each call to next().
static final String SCAN_CACHEBLOCKS
    Set to false to disable server-side caching of blocks for this scan.
static final String SCAN_CACHEDROWS
    The number of rows for caching that will be passed to scanners.
static final String SCAN_COLUMN_FAMILY
    Column family to scan.
static final String SCAN_COLUMNS
    Space-delimited list of columns and column families to scan.
static final String SCAN_MAXVERSIONS
    The maximum number of versions to return.
static final String SCAN_ROW_START
    Scan start row.
static final String SCAN_ROW_STOP
    Scan stop row.
static final String SCAN_TIMERANGE_END
    The ending timestamp used to filter columns with a specific range of versions.
static final String SCAN_TIMERANGE_START
    The starting timestamp used to filter columns with a specific range of versions.
static final String SCAN_TIMESTAMP
    The timestamp used to filter columns with a specific timestamp.
static final String SHUFFLE_MAPS
    Specifies whether the map tasks should be shuffled.
private static final String SPLIT_TABLE
    If specified, use start keys of this table to split.
Fields inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION
-
Constructor Summary
TableInputFormat()
-
Method Summary
Modifier and Type / Method / Description
private static void addColumn(Scan scan, byte[] familyAndQualifier)
    Parses a combined family and qualifier and adds either both or just the family in case there is no qualifier.
static void addColumns(Scan scan, byte[][] columns)
    Adds an array of columns specified using the old format, family:qualifier.
private static void addColumns(Scan scan, String columns)
    Convenience method to parse a string representation of an array of column specifiers.
static void configureSplitTable(org.apache.hadoop.mapreduce.Job job, TableName tableName)
    Sets the split table in a map-reduce job.
static Scan createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf)
    Sets up a Scan instance, applying settings from the configuration property constants defined in TableInputFormat.
org.apache.hadoop.conf.Configuration getConf()
    Returns the current configuration.
List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
    Calculates the splits that will serve as input for the map tasks.
protected Pair<byte[][],byte[][]> getStartEndKeys()
protected void initialize(org.apache.hadoop.mapreduce.JobContext context)
    Handle subclass-specific set up.
void setConf(org.apache.hadoop.conf.Configuration configuration)
    Sets the configuration.
Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, createRegionSizeCalculator, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, reverseDNS, setScan, setTableRecordReader
-
Field Details
-
LOG
-
INPUT_TABLE
Job parameter that specifies the input table.
-
SPLIT_TABLE
If specified, use start keys of this table to split. This is useful when you are preparing data for a bulk load.
-
SCAN
Base-64 encoded scanner. All other SCAN_ confs are ignored if this is specified. See TableMapReduceUtil.convertScanToString(Scan) for more details.
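A minimal sketch of populating SCAN, serializing the Scan with TableMapReduceUtil.convertScanToString(Scan) as referenced above; the column family "cf" is a placeholder assumption.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;

public class SerializedScanConf {
  public static Configuration build() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf")); // placeholder column family
    // Store the serialized Scan under SCAN; once set, the individual
    // SCAN_* properties documented below are ignored.
    conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(scan));
    return conf;
  }
}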
-
SCAN_ROW_START
Scan start row.
-
SCAN_ROW_STOP
Scan stop row.
-
SCAN_COLUMN_FAMILY
Column family to scan.
-
SCAN_COLUMNS
Space-delimited list of columns and column families to scan.
-
SCAN_TIMESTAMP
The timestamp used to filter columns with a specific timestamp.
-
SCAN_TIMERANGE_START
The starting timestamp used to filter columns with a specific range of versions.
-
SCAN_TIMERANGE_END
The ending timestamp used to filter columns with a specific range of versions.
-
SCAN_MAXVERSIONS
The maximum number of versions to return.
-
SCAN_CACHEBLOCKS
Set to false to disable server-side caching of blocks for this scan.
-
SCAN_CACHEDROWS
The number of rows for caching that will be passed to scanners.
-
SCAN_BATCHSIZE
Sets the maximum number of values to return for each call to next().
-
SHUFFLE_MAPS
Specifies whether the map tasks should be shuffled.
-
conf
The configuration.
-
-
Constructor Details
-
TableInputFormat
public TableInputFormat()
-
-
Method Details
-
getConf
Returns the current configuration.
- Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
- Returns:
- The current configuration.
- See Also:
-
Configurable.getConf()
-
setConf
Sets the configuration. This is used to set the details for the table to be scanned.
- Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
- Parameters:
configuration - The configuration to set.
- See Also:
-
Configurable.setConf(org.apache.hadoop.conf.Configuration)
-
createScanFromConfiguration
public static Scan createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf) throws IOException
Sets up a Scan instance, applying settings from the configuration property constants defined in TableInputFormat. This allows specifying things such as:
- start and stop rows
- column qualifiers or families
- timestamps or timerange
- scanner caching and batch size
- Throws:
IOException
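A short sketch of driving this method purely through configuration properties; the row keys and column family are placeholder assumptions.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;

public class ScanFromConfiguration {
  public static Scan build() throws IOException {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableInputFormat.SCAN_ROW_START, "row-0000");  // placeholder start row
    conf.set(TableInputFormat.SCAN_ROW_STOP, "row-9999");   // placeholder stop row
    conf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf");    // placeholder family
    conf.set(TableInputFormat.SCAN_MAXVERSIONS, "3");
    conf.set(TableInputFormat.SCAN_CACHEDROWS, "500");
    // The returned Scan reflects the SCAN_* properties set above.
    return TableInputFormat.createScanFromConfiguration(conf);
  }
}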
-
initialize
Description copied from class: TableInputFormatBase
Handle subclass-specific set up. Each of the entry points used by the MapReduce framework, TableInputFormatBase.createRecordReader(InputSplit, TaskAttemptContext) and TableInputFormatBase.getSplits(JobContext), will call TableInputFormatBase.initialize(JobContext) as a convenient centralized location to handle retrieving the necessary configuration information and calling TableInputFormatBase.initializeTable(Connection, TableName). Subclasses should implement their initialize call such that it is safe to call multiple times. The current TableInputFormatBase implementation relies on a non-null table reference to decide if an initialize call is needed, but this behavior may change in the future. In particular, it is critical that initializeTable not be called multiple times, since this will leak Connection instances.
- Overrides:
initialize in class TableInputFormatBase
- Throws:
IOException
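A hedged sketch of an idempotent initialize override in a hypothetical subclass, guarding on the table reference as the contract above suggests; the configuration key "my.input.table" is an invented placeholder, not an HBase property, and the guard mirrors current TableInputFormatBase behavior that may change in the future.

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.mapreduce.TableInputFormatBase;
import org.apache.hadoop.mapreduce.JobContext;

public class MyTableInputFormat extends TableInputFormatBase {
  @Override
  protected void initialize(JobContext context) throws IOException {
    // Guard so repeated framework calls do not invoke initializeTable
    // twice and leak Connection instances.
    if (getTable() != null) {
      return;
    }
    TableName tableName = TableName.valueOf(
        context.getConfiguration().get("my.input.table")); // hypothetical key
    Connection connection =
        ConnectionFactory.createConnection(context.getConfiguration());
    initializeTable(connection, tableName);
  }
}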
-
addColumn
Parses a combined family and qualifier and adds either both or just the family in case there is no qualifier. This assumes the older colon-delimited notation, e.g. "family:qualifier".
- Parameters:
scan - The Scan to update.
familyAndQualifier - family and qualifier
- Throws:
IllegalArgumentException - When familyAndQualifier is invalid.
-
addColumns
Adds an array of columns specified using the old format, family:qualifier. Overrides previous calls to Scan.addColumn(byte[], byte[]) for any families in the input.
- Parameters:
scan - The Scan to update.
columns - Array of columns, formatted as family:qualifier.
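A small usage sketch, assuming a placeholder family "cf": a family:qualifier entry adds a single column, while a bare family entry adds the whole family.

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;

public class AddColumnsExample {
  public static Scan scanWithColumns() {
    Scan scan = new Scan();
    byte[][] columns = new byte[][] {
        Bytes.toBytes("cf:name"), // family:qualifier adds one column
        Bytes.toBytes("cf")       // bare family adds the whole family
    };
    TableInputFormat.addColumns(scan, columns);
    return scan;
  }
}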
-
getSplits
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table. Splits are shuffled if required.
- Overrides:
getSplits in class TableInputFormatBase
- Parameters:
context - The current job context.
- Returns:
- The list of input splits.
- Throws:
IOException - When creating the list of splits fails.
- See Also:
-
InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
-
addColumns
Convenience method to parse a string representation of an array of column specifiers.
- Parameters:
scan - The Scan to update.
columns - The columns to parse.
-
getStartEndKeys
- Overrides:
getStartEndKeys in class TableInputFormatBase
- Throws:
IOException
-
configureSplitTable
Sets the split table in a map-reduce job.
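A brief sketch, assuming a placeholder target table "target_table" whose region start keys should drive the splits, e.g. when preparing data for a bulk load into that table.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.mapreduce.Job;

public class SplitTableExample {
  public static Job configure() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "prepare-bulkload"); // placeholder job name
    // Split on the start keys of the (placeholder) target table.
    TableInputFormat.configureSplitTable(job, TableName.valueOf("target_table"));
    return job;
  }
}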
-