TableInputFormat (Apache HBase 2.0.6 API)

java.lang.Object
- org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
  - org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
    - org.apache.hadoop.hbase.mapreduce.TableInputFormat

All Implemented Interfaces:

org.apache.hadoop.conf.Configurable
```
@InterfaceAudience.Public
public class TableInputFormat
extends TableInputFormatBase
implements org.apache.hadoop.conf.Configurable
```
Converts HBase tabular data into a format that Map/Reduce can consume.

Field Summary

Fields
Modifier and Type	Field and Description
`private org.apache.hadoop.conf.Configuration`	`conf` The configuration.
`static String`	`INPUT_TABLE` Job parameter that specifies the input table.
`private static org.slf4j.Logger`	`LOG`
`static String`	`SCAN` Base-64 encoded scanner.
`static String`	`SCAN_BATCHSIZE` Set the maximum number of values to return for each call to next().
`static String`	`SCAN_CACHEBLOCKS` Set to false to disable server-side caching of blocks for this scan.
`static String`	`SCAN_CACHEDROWS` The number of rows for caching that will be passed to scanners.
`static String`	`SCAN_COLUMN_FAMILY` Column Family to Scan
`static String`	`SCAN_COLUMNS` Space delimited list of columns and column families to scan.
`static String`	`SCAN_MAXVERSIONS` The maximum number of versions to return.
`static String`	`SCAN_ROW_START` Scan start row
`static String`	`SCAN_ROW_STOP` Scan stop row
`static String`	`SCAN_TIMERANGE_END` The ending timestamp used to filter columns with a specific range of versions.
`static String`	`SCAN_TIMERANGE_START` The starting timestamp used to filter columns with a specific range of versions.
`static String`	`SCAN_TIMESTAMP` The timestamp used to filter columns with a specific timestamp.
`static String`	`SHUFFLE_MAPS` Specify if we have to shuffle the map tasks.
`private static String`	`SPLIT_TABLE` If specified, use start keys of this table to split.

Fields inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION

Constructor Summary

Constructors
Constructor and Description
`TableInputFormat()`

Method Summary

Modifier and Type	Method and Description
`private static void`	`addColumn(Scan scan, byte[] familyAndQualifier)` Parses a combined family and qualifier and adds either both or just the family in case there is no qualifier.
`static void`	`addColumns(Scan scan, byte[][] columns)` Adds an array of columns specified using old format, family:qualifier.
`private static void`	`addColumns(Scan scan, String columns)` Convenience method to parse a string representation of an array of column specifiers.
`static void`	`configureSplitTable(org.apache.hadoop.mapreduce.Job job, TableName tableName)` Sets split table in map-reduce job.
`static Scan`	`createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf)` Sets up a `Scan` instance, applying settings from the configuration property constants defined in `TableInputFormat`.
`org.apache.hadoop.conf.Configuration`	`getConf()` Returns the current configuration.
`List<org.apache.hadoop.mapreduce.InputSplit>`	`getSplits(org.apache.hadoop.mapreduce.JobContext context)` Calculates the splits that will serve as input for the map tasks.
`protected Pair<byte[][],byte[][]>`	`getStartEndKeys()`
`protected void`	`initialize(org.apache.hadoop.mapreduce.JobContext context)` Handle subclass specific set up.
`void`	`setConf(org.apache.hadoop.conf.Configuration configuration)` Sets the configuration.

Methods inherited from class org.apache.hadoop.hbase.mapreduce.TableInputFormatBase
calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, reverseDNS, setScan, setTableRecordReader

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - LOG
```
private static final org.slf4j.Logger LOG
```
  - INPUT_TABLE
```
public static final String INPUT_TABLE
```
    Job parameter that specifies the input table.
    
    See Also:
    
    Constant Field Values
  - SPLIT_TABLE
```
private static final String SPLIT_TABLE
```
    If specified, use start keys of this table to split. This is useful when you are preparing data for bulkload.
    
    See Also:
    
    Constant Field Values
  - SCAN
```
public static final String SCAN
```
    Base-64 encoded scanner. All other SCAN_ confs are ignored if this is specified. See TableMapReduceUtil.convertScanToString(Scan) for more details.
    
    See Also:
    
    Constant Field Values
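
    The precedence rule for SCAN can be modeled in plain Java. This is a minimal sketch, not HBase code: a `Map<String, String>` stands in for the Hadoop `Configuration`, and the key strings are hypothetical placeholders rather than the actual values of the `SCAN` and `SCAN_ROW_START` constants.

```java
import java.util.Map;

/** Models "a serialized scan wins over every other SCAN_ setting" with a plain map. */
final class ScanPrecedenceSketch {
    // Hypothetical placeholder keys; the real values are listed under Constant Field Values.
    static final String SCAN = "example.scan";
    static final String SCAN_ROW_START = "example.scan.row.start";

    /** Returns a label describing which source the scan would be built from. */
    static String scanSource(Map<String, String> conf) {
        String encoded = conf.get(SCAN);
        if (encoded != null) {
            // A serialized scan is present, so the individual SCAN_ keys are not consulted.
            return "serialized:" + encoded;
        }
        // Otherwise fall back to the individual settings (only the start row is modeled here).
        return "settings:startRow=" + conf.getOrDefault(SCAN_ROW_START, "");
    }
}
```

    With only the start-row key set, the sketch reports the individual settings; once the SCAN key is added, the start row no longer has any effect.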
  - SCAN_ROW_START
```
public static final String SCAN_ROW_START
```
    Scan start row
    
    See Also:
    
    Constant Field Values
  - SCAN_ROW_STOP
```
public static final String SCAN_ROW_STOP
```
    Scan stop row
    
    See Also:
    
    Constant Field Values
  - SCAN_COLUMN_FAMILY
```
public static final String SCAN_COLUMN_FAMILY
```
    Column Family to Scan
    
    See Also:
    
    Constant Field Values
  - SCAN_COLUMNS
```
public static final String SCAN_COLUMNS
```
    Space delimited list of columns and column families to scan.
    
    See Also:
    
    Constant Field Values
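
    A space-delimited column list of this kind can be broken into individual specifiers as sketched below. The class and method names are hypothetical, and tolerating repeated whitespace is a choice made by this sketch rather than documented behavior of `TableInputFormat`.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a space-delimited column list into individual column specifiers. */
final class ColumnListSketch {
    static List<String> splitColumns(String columns) {
        List<String> specs = new ArrayList<>();
        // Split on runs of whitespace and drop the empty token produced by blank input.
        for (String spec : columns.trim().split("\\s+")) {
            if (!spec.isEmpty()) {
                specs.add(spec);
            }
        }
        return specs;
    }
}
```

    For example, the value `info:name info:age meta` yields two family:qualifier specifiers followed by one bare family.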
  - SCAN_TIMESTAMP
```
public static final String SCAN_TIMESTAMP
```
    The timestamp used to filter columns with a specific timestamp.
    
    See Also:
    
    Constant Field Values
  - SCAN_TIMERANGE_START
```
public static final String SCAN_TIMERANGE_START
```
    The starting timestamp used to filter columns with a specific range of versions.
    
    See Also:
    
    Constant Field Values
  - SCAN_TIMERANGE_END
```
public static final String SCAN_TIMERANGE_END
```
    The ending timestamp used to filter columns with a specific range of versions.
    
    See Also:
    
    Constant Field Values
  - SCAN_MAXVERSIONS
```
public static final String SCAN_MAXVERSIONS
```
    The maximum number of versions to return.
    
    See Also:
    
    Constant Field Values
  - SCAN_CACHEBLOCKS
```
public static final String SCAN_CACHEBLOCKS
```
    Set to false to disable server-side caching of blocks for this scan.
    
    See Also:
    
    Constant Field Values
  - SCAN_CACHEDROWS
```
public static final String SCAN_CACHEDROWS
```
    The number of rows for caching that will be passed to scanners.
    
    See Also:
    
    Constant Field Values
  - SCAN_BATCHSIZE
```
public static final String SCAN_BATCHSIZE
```
    Set the maximum number of values to return for each call to next().
    
    See Also:
    
    Constant Field Values
  - SHUFFLE_MAPS
```
public static final String SHUFFLE_MAPS
```
    Specify if we have to shuffle the map tasks.
    
    See Also:
    
    Constant Field Values
  - conf
```
private org.apache.hadoop.conf.Configuration conf
```
    The configuration.
- Constructor Detail
  - TableInputFormat
```
public TableInputFormat()
```
- Method Detail
  - getConf
```
public org.apache.hadoop.conf.Configuration getConf()
```
    Returns the current configuration.
    
    Specified by:
    
    getConf in interface org.apache.hadoop.conf.Configurable
    
    Returns:
    
    The current configuration.
    
    See Also:
    
    Configurable.getConf()
  - setConf
```
public void setConf(org.apache.hadoop.conf.Configuration configuration)
```
    Sets the configuration. This is used to set the details for the table to be scanned.
    
    Specified by:
    
    setConf in interface org.apache.hadoop.conf.Configurable
    
    Parameters:
    
    configuration - The configuration to set.
    
    See Also:
    
    Configurable.setConf(org.apache.hadoop.conf.Configuration)
  - createScanFromConfiguration
```
public static Scan createScanFromConfiguration(org.apache.hadoop.conf.Configuration conf)
                                        throws IOException
```
    Sets up a Scan instance, applying settings from the configuration property constants defined in TableInputFormat. This allows specifying things such as:
    - start and stop rows
    - column qualifiers or families
    - timestamps or timerange
    - scanner caching and batch size
    Throws:
    
    IOException
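
    The idea of applying only the settings that are present can be sketched without HBase on the classpath. Everything here is illustrative: a `Map<String, String>` stands in for the `Configuration`, the key strings are hypothetical placeholders for the `SCAN_` constants, and the rule that a time range needs both bounds is an assumption of this sketch.

```java
import java.util.Map;

/** Holds a few scan settings read from a configuration-like map. */
final class ScanSettingsSketch {
    // Hypothetical keys standing in for the TableInputFormat constants.
    static final String ROW_START = "example.scan.row.start";
    static final String ROW_STOP = "example.scan.row.stop";
    static final String TIMERANGE_START = "example.scan.timerange.start";
    static final String TIMERANGE_END = "example.scan.timerange.end";
    static final String CACHED_ROWS = "example.scan.cachedrows";
    static final String BATCH_SIZE = "example.scan.batchsize";

    String startRow;   // null means "not configured"
    String stopRow;    // null means "not configured"
    Long minStamp;     // null means no time range
    Long maxStamp;
    Integer caching;   // null means "leave the scanner default"
    Integer batch;

    static ScanSettingsSketch fromConfiguration(Map<String, String> conf) {
        ScanSettingsSketch s = new ScanSettingsSketch();
        s.startRow = conf.get(ROW_START);
        s.stopRow = conf.get(ROW_STOP);
        // Assumption: a time range is applied only when both bounds are present.
        if (conf.containsKey(TIMERANGE_START) && conf.containsKey(TIMERANGE_END)) {
            s.minStamp = Long.parseLong(conf.get(TIMERANGE_START));
            s.maxStamp = Long.parseLong(conf.get(TIMERANGE_END));
        }
        if (conf.containsKey(CACHED_ROWS)) {
            s.caching = Integer.parseInt(conf.get(CACHED_ROWS));
        }
        if (conf.containsKey(BATCH_SIZE)) {
            s.batch = Integer.parseInt(conf.get(BATCH_SIZE));
        }
        return s;
    }
}
```

    Keys that are absent leave the corresponding field unset, so a caller can tell a configured value apart from a default.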
  - initialize
```
protected void initialize(org.apache.hadoop.mapreduce.JobContext context)
                   throws IOException
```
    Description copied from class: TableInputFormatBase
    
    Handle subclass specific set up. Each of the entry points used by the MapReduce framework, TableInputFormatBase.createRecordReader(InputSplit, TaskAttemptContext) and TableInputFormatBase.getSplits(JobContext), will call TableInputFormatBase.initialize(JobContext) as a convenient centralized location to handle retrieving the necessary configuration information and calling TableInputFormatBase.initializeTable(Connection, TableName). Subclasses should implement their initialize call such that it is safe to call multiple times. The current TableInputFormatBase implementation relies on a non-null table reference to decide if an initialize call is needed, but this behavior may change in the future. In particular, it is critical that initializeTable not be called multiple times since this will leak Connection instances.
    
    Overrides:
    
    initialize in class TableInputFormatBase
    
    Throws:
    
    IOException
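
    The requirement that initialize be safe to call multiple times can be illustrated with a guard on the table reference. This is a simplified model with hypothetical names: a plain `Object` stands in for the table, and a counter stands in for opening a `Connection`.

```java
/** Models an initialize() that is safe to call from more than one entry point. */
final class IdempotentInitSketch {
    private Object table;          // stands in for the table reference
    private int connectionsOpened; // counts how many "connections" were created

    /** Only the first call does real work; later calls return immediately. */
    void initialize() {
        if (table != null) {
            return; // already initialized, so no second connection is opened
        }
        connectionsOpened++;
        table = new Object();
    }

    int connectionsOpened() {
        return connectionsOpened;
    }
}
```

    Calling `initialize()` once from a split-calculation path and again from a record-reader path leaves the counter at one, which is the property that avoids leaking connections.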
  - addColumn
```
private static void addColumn(Scan scan,
                              byte[] familyAndQualifier)
```
    Parses a combined family and qualifier and adds either both or just the family in case there is no qualifier. This assumes the older colon divided notation, e.g. "family:qualifier".
    
    Parameters:
    
    scan - The Scan to update.
    
    familyAndQualifier - family and qualifier
    
    Throws:
    
    IllegalArgumentException - When familyAndQualifier is invalid.
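
    The colon-divided notation can be sketched as follows. This is an illustration only: it works on `String` rather than `byte[]`, the names are hypothetical, and splitting at the first colon is an assumption of the sketch.

```java
/** Splits the older "family:qualifier" notation into its parts. */
final class ColumnSpecSketch {
    /**
     * Returns {family, qualifier} when a colon is present,
     * or {family} alone when there is no qualifier.
     */
    static String[] parse(String familyAndQualifier) {
        int colon = familyAndQualifier.indexOf(':'); // assumption: first colon is the delimiter
        if (colon < 0) {
            return new String[] { familyAndQualifier };
        }
        return new String[] {
            familyAndQualifier.substring(0, colon),
            familyAndQualifier.substring(colon + 1)
        };
    }
}
```

    A one-element result corresponds to adding the whole family to the scan, and a two-element result to adding a single column.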
  - addColumns
```
public static void addColumns(Scan scan,
                              byte[][] columns)
```
    Adds an array of columns specified using old format, family:qualifier.
    Overrides previous calls to Scan.addColumn(byte[], byte[]) for any families in the input.
    
    Parameters:
    
    scan - The Scan to update.
    
    columns - array of columns, formatted as family:qualifier
    
    See Also:
    
    Scan.addColumn(byte[], byte[])
  - getSplits
```
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context)
                                                       throws IOException
```
    Calculates the splits that will serve as input for the map tasks. The number of splits matches the number of regions in a table. Splits are shuffled if required.
    
    Overrides:
    
    getSplits in class TableInputFormatBase
    
    Parameters:
    
    context - The current job context.
    
    Returns:
    
    The list of input splits.
    
    Throws:
    
    IOException - When creating the list of splits fails.
    
    See Also:
    
    InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
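
    The relationship between regions, splits, and the optional shuffle can be modeled as below. This is a sketch under stated assumptions, not the HBase implementation: strings stand in for `InputSplit` objects, the boolean parameter stands in for the `SHUFFLE_MAPS` setting, and the `Random` argument is a hypothetical addition that makes the shuffle reproducible.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Builds one split per region and shuffles the list when asked to. */
final class SplitShuffleSketch {
    static List<String> splits(List<String> regionStartKeys, boolean shuffle, Random random) {
        List<String> splits = new ArrayList<>();
        for (String startKey : regionStartKeys) {
            splits.add("split@" + startKey); // one split per region
        }
        if (shuffle) {
            // Shuffling changes the order of the splits, never their number.
            Collections.shuffle(splits, random);
        }
        return splits;
    }
}
```

    Whether or not shuffling is enabled, the result contains exactly one entry per region; only the order in which map tasks would be handed out differs.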
  - addColumns
```
private static void addColumns(Scan scan,
                               String columns)
```
    Convenience method to parse a string representation of an array of column specifiers.
    
    Parameters:
    
    scan - The Scan to update.
    
    columns - The columns to parse.
  - getStartEndKeys
```
protected Pair<byte[][],byte[][]> getStartEndKeys()
                                           throws IOException
```
    Overrides:
    
    getStartEndKeys in class TableInputFormatBase
    
    Throws:
    
    IOException
  - configureSplitTable
```
public static void configureSplitTable(org.apache.hadoop.mapreduce.Job job,
                                       TableName tableName)
```
    Sets split table in map-reduce job.
