Package org.apache.hadoop.hbase.mapred
Class TableInputFormatBase
java.lang.Object
org.apache.hadoop.hbase.mapred.TableInputFormatBase
- All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
- Direct Known Subclasses:
TableInputFormat
@Public
public abstract class TableInputFormatBase
extends Object
implements org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
A base for TableInputFormats. Receives a Table, a byte[][] of input columns and
optionally a Filter. Subclasses may use other TableRecordReader implementations.
Subclasses MUST ensure initializeTable(Connection, TableName) is called for an instance to
function properly. Each of the entry points to this class used by the MapReduce framework,
getRecordReader(InputSplit, JobConf, Reporter) and getSplits(JobConf, int),
will call initialize(JobConf) as a convenient centralized location to handle retrieving
the necessary configuration information. If your subclass overrides either of these methods,
either call the parent version or call initialize yourself.
An example of a subclass:

  class ExampleTIF extends TableInputFormatBase {

    @Override
    protected void initialize(JobConf job) throws IOException {
      // We are responsible for the lifecycle of this connection until we hand it over in
      // initializeTable.
      Connection connection =
          ConnectionFactory.createConnection(HBaseConfiguration.create(job));
      TableName tableName = TableName.valueOf("exampleTable");
      // mandatory. once passed here, TableInputFormatBase will handle closing the connection.
      initializeTable(connection, tableName);

      byte[][] inputColumns = new byte[][] { Bytes.toBytes("columnA"),
          Bytes.toBytes("columnB") };
      // mandatory
      setInputColumns(inputColumns);

      // optional, by default we'll get everything for the given columns.
      Filter exampleFilter =
          new RowFilter(CompareOp.EQUAL, new RegexStringComparator("aa.*"));
      setRowFilter(exampleFilter);
    }
  }
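For context, here is a minimal driver sketch showing how such a subclass might be wired into an old-API (mapred) job. The driver class, job name, IdentityMapper, and NullOutputFormat are illustrative assumptions, not part of this class's contract:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.lib.IdentityMapper;
  import org.apache.hadoop.mapred.lib.NullOutputFormat;

  public class ExampleDriver {
    public static void main(String[] args) throws Exception {
      JobConf job = new JobConf(HBaseConfiguration.create(), ExampleDriver.class);
      job.setJobName("example-scan");

      // Use the subclass above; the framework's calls to getSplits/getRecordReader
      // will invoke initialize(JobConf) and set up the table for us.
      job.setInputFormat(ExampleTIF.class);

      // Map-only job that simply passes rows through (illustrative choice).
      job.setMapperClass(IdentityMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(Result.class);
      job.setNumReduceTasks(0);
      job.setOutputFormat(NullOutputFormat.class);

      JobClient.runJob(job);
    }
  }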
Field Summary
- private Connection connection
- private static final String INITIALIZATION_ERROR
- private byte[][] inputColumns
- private static final org.slf4j.Logger LOG
- private static final String NOT_INITIALIZED
- private RegionLocator regionLocator
- private Filter rowFilter
- private Table table
- private TableRecordReader tableRecordReader
Constructor Summary
- TableInputFormatBase()
Method Summary
- private void close()
- protected void closeTable()
  Close the Table and related objects that were initialized via initializeTable(Connection, TableName).
- org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter)
  Builds a TableRecordReader.
- org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits)
  Calculates the splits that will serve as input for the map tasks.
- protected Table getTable()
  Allows subclasses to get the Table.
- protected void initialize(org.apache.hadoop.mapred.JobConf job)
  Handle subclass specific set up.
- protected void initializeTable(Connection connection, TableName tableName)
  Allows subclasses to initialize the table information.
- protected void setInputColumns(byte[][] inputColumns)
- protected void setRowFilter(Filter rowFilter)
  Allows subclasses to set the Filter to be used.
- protected void setTableRecordReader(TableRecordReader tableRecordReader)
  Allows subclasses to set the TableRecordReader.
Field Details
- LOG
  private static final org.slf4j.Logger LOG
- inputColumns
  private byte[][] inputColumns
- table
  private Table table
- regionLocator
  private RegionLocator regionLocator
- connection
  private Connection connection
- tableRecordReader
  private TableRecordReader tableRecordReader
- rowFilter
  private Filter rowFilter
- NOT_INITIALIZED
  private static final String NOT_INITIALIZED
  See Also: Constant Field Values
- INITIALIZATION_ERROR
  private static final String INITIALIZATION_ERROR
  See Also: Constant Field Values
Constructor Details
- TableInputFormatBase
  public TableInputFormatBase()
Method Details
- getRecordReader
  public org.apache.hadoop.mapred.RecordReader<ImmutableBytesWritable,Result> getRecordReader(org.apache.hadoop.mapred.InputSplit split, org.apache.hadoop.mapred.JobConf job, org.apache.hadoop.mapred.Reporter reporter) throws IOException
  Builds a TableRecordReader. If no TableRecordReader was provided, uses the default.
  Specified by:
    getRecordReader in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
  Throws:
    IOException
  See Also:
    InputFormat.getRecordReader(InputSplit, JobConf, Reporter)
- getSplits
  public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf job, int numSplits) throws IOException
  Calculates the splits that will serve as input for the map tasks. The number of splits created is the smaller of numSplits and the number of HRegions in the table. If the number of splits is smaller than the number of HRegions, each split spans multiple HRegions, grouped as evenly as possible. If the splits are uneven, the bigger splits are placed first in the InputSplit array.
  Specified by:
    getSplits in interface org.apache.hadoop.mapred.InputFormat<ImmutableBytesWritable,Result>
  Parameters:
    job - the map task JobConf
    numSplits - a hint to calculate the number of splits (mapred.map.tasks)
  Returns:
    the input splits
  Throws:
    IOException
  See Also:
    InputFormat.getSplits(org.apache.hadoop.mapred.JobConf, int)
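  To make the grouping concrete, here is a small sketch (not the actual HBase implementation) that computes how many regions each split would cover under the policy described above; the region count and split hint are arbitrary example values:

    import java.util.Arrays;

    public class SplitSizing {
      // Mirrors the documented policy: min(numSplits, regions) splits, regions spread
      // as evenly as possible, with the larger splits placed first.
      static int[] regionsPerSplit(int numRegions, int numSplits) {
        int splits = Math.min(numSplits, numRegions);
        int base = numRegions / splits;   // every split covers at least this many regions
        int extra = numRegions % splits;  // the first 'extra' splits cover one more region
        int[] sizes = new int[splits];
        for (int i = 0; i < splits; i++) {
          sizes[i] = base + (i < extra ? 1 : 0);
        }
        return sizes;
      }

      public static void main(String[] args) {
        // 10 regions with a numSplits hint of 4 -> [3, 3, 2, 2]
        System.out.println(Arrays.toString(regionsPerSplit(10, 4)));
      }
    }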
- initializeTable
  protected void initializeTable(Connection connection, TableName tableName) throws IOException
  Allows subclasses to initialize the table information.
  Parameters:
    connection - The Connection to the HBase cluster. MUST be unmanaged; TableInputFormatBase will close it.
    tableName - The TableName of the table to process.
  Throws:
    IOException
- setInputColumns
  protected void setInputColumns(byte[][] inputColumns)
  Parameters:
    inputColumns - the columns to be passed in the Result to the map task.
- getTable
  protected Table getTable()
  Allows subclasses to get the Table.
- setTableRecordReader
  protected void setTableRecordReader(TableRecordReader tableRecordReader)
  Allows subclasses to set the TableRecordReader in order to provide other TableRecordReader implementations.
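  As a sketch of that hook, a hypothetical CountingRecordReader could extend the stock TableRecordReader and count the rows it hands to the mapper; a subclass would then register it, for example from its initialize(JobConf) override, via setTableRecordReader(new CountingRecordReader()):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapred.TableRecordReader;

    // Hypothetical reader: delegates to TableRecordReader and keeps a row count.
    class CountingRecordReader extends TableRecordReader {
      private long rows = 0;

      @Override
      public boolean next(ImmutableBytesWritable key, Result value) throws IOException {
        boolean hasNext = super.next(key, value);
        if (hasNext) {
          rows++;
        }
        return hasNext;
      }

      public long getRowCount() {
        return rows;
      }
    }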
- setRowFilter
  protected void setRowFilter(Filter rowFilter)
  Allows subclasses to set the Filter to be used.
- initialize
  protected void initialize(org.apache.hadoop.mapred.JobConf job) throws IOException
  Handle subclass-specific set up. Each of the entry points used by the MapReduce framework, getRecordReader(InputSplit, JobConf, Reporter) and getSplits(JobConf, int), will call initialize(JobConf) as a convenient centralized location to handle retrieving the necessary configuration information and calling initializeTable(Connection, TableName).
  Subclasses should implement their initialize call such that it is safe to call multiple times. The current TableInputFormatBase implementation relies on a non-null table reference to decide if an initialize call is needed, but this behavior may change in the future. In particular, it is critical that initializeTable not be called multiple times, since this will leak Connection instances.
  Throws:
    IOException
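  A minimal sketch of such an idempotent override, guarding on the table reference as described above; the table name and column are placeholder assumptions:

    @Override
    protected void initialize(JobConf job) throws IOException {
      // Only create a Connection and hand it to initializeTable the first time through;
      // calling initializeTable twice would leak Connection instances.
      if (getTable() == null) {
        Connection connection =
            ConnectionFactory.createConnection(HBaseConfiguration.create(job));
        initializeTable(connection, TableName.valueOf("exampleTable"));
        setInputColumns(new byte[][] { Bytes.toBytes("columnA") });
      }
    }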
- closeTable
  protected void closeTable() throws IOException
  Close the Table and related objects that were initialized via initializeTable(Connection, TableName).
  Throws:
    IOException
- close
  private void close() throws IOException
  Throws:
    IOException