TableMapReduceUtil (Apache HBase 1.4.11 API)

java.lang.Object
- org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil

@InterfaceAudience.Public
@InterfaceStability.Stable
public class TableMapReduceUtil
extends Object

Utility for TableMapper and TableReducer

Constructor Summary

Constructors
Constructor and Description

TableMapReduceUtil()

Constructors
Constructor and Description
`TableMapReduceUtil()`

Method Summary

Methods
Modifier and Type	Method and Description
`static void`	`addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)` Deprecated. rely on `addDependencyJars(Job)` instead.
`static void`	`addDependencyJars(org.apache.hadoop.mapreduce.Job job)` Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.
`static void`	`addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)` Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.
`static void`	`addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf)` Add HBase and its dependencies (only) to the job configuration.
`static String`	`buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)` Returns a classpath string built from the content of the "tmpjars" value in `conf`.
`static String`	`convertScanToString(Scan scan)` Writes the given scan into a Base64 encoded string.
`static Scan`	`convertStringToScan(String base64)` Converts the given Base64 string back into a Scan instance.
`static void`	`initCredentials(org.apache.hadoop.mapreduce.Job job)`
`static void`	`initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration conf)` Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.
`static void`	`initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, String quorumAddress)` Deprecated. Since 1.2.0, use `initCredentialsForCluster(Job, Configuration)` instead.
`static void`	`initMultiTableSnapshotMapperJob(Map<String,Collection<Scan>> snapshotScans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)` Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot.
`static void`	`initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)` Use this before submitting a Multi TableMap job.
`static void`	`initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)` Use this before submitting a Multi TableMap job.
`static void`	`initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials)` Use this before submitting a Multi TableMap job.
`static void`	`initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)` Use this before submitting a TableMap job.
`static void`	`initTableMapperJob(TableName table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)` Use this before submitting a TableMap job.
`static void`	`initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job)` Use this before submitting a TableReduce job.
`static void`	`initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner)` Use this before submitting a TableReduce job.
`static void`	`initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)` Use this before submitting a TableReduce job.
`static void`	`initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars)` Use this before submitting a TableReduce job.
`static void`	`initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)` Sets up the job for reading from a table snapshot.
`static void`	`initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion)` Sets up the job for reading from a table snapshot.
`static void`	`limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)` Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.
`static void`	`resetCacheConfig(org.apache.hadoop.conf.Configuration conf)` Enable a basic on-heap cache for these jobs.
`static void`	`setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)` Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
`static void`	`setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)` Sets the number of rows to return and cache with each scanner iteration.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - TableMapReduceUtil
```
public TableMapReduceUtil()
```
- Method Detail
  - initTableMapperJob
```
public static void initTableMapperJob(String table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - The table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(TableName table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - The table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(byte[] table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - Binary representation of the table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(String table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars,
                      Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - The table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(String table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars,
                      boolean initCredentials,
                      Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - The table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    initCredentials - whether to initialize hbase auth credentials for the job
    inputFormatClass - the input format
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(byte[] table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars,
                      Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - Binary representation of the table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    inputFormatClass - The class of the input format
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(byte[] table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - Binary representation of the table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(String table,
                      Scan scan,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars)
                               throws IOException
```
    Use this before submitting a TableMap job. It will appropriately set up the job.
    
    Parameters:
    table - The table name to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException - When setting up the details fails.
  - resetCacheConfig
```
public static void resetCacheConfig(org.apache.hadoop.conf.Configuration conf)
```
    Enable a basic on-heap cache for these jobs. Any BlockCache implementation based on direct memory will likely cause the map tasks to OOM when opening the region. This is done here instead of in TableSnapshotRegionRecordReader in case an advanced user wants to override this behavior in their job.
  - initMultiTableSnapshotMapperJob
```
public static void initMultiTableSnapshotMapperJob(Map<String,Collection<Scan>> snapshotScans,
                                   Class<? extends TableMapper> mapper,
                                   Class<?> outputKeyClass,
                                   Class<?> outputValueClass,
                                   org.apache.hadoop.mapreduce.Job job,
                                   boolean addDependencyJars,
                                   org.apache.hadoop.fs.Path tmpRestoreDir)
                                            throws IOException
```
    Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot. It bypasses hbase servers and read directly from snapshot files.
    
    Parameters:
    snapshotScans - map of snapshot name to scans on that snapshot.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException
  - initTableSnapshotMapperJob
```
public static void initTableSnapshotMapperJob(String snapshotName,
                              Scan scan,
                              Class<? extends TableMapper> mapper,
                              Class<?> outputKeyClass,
                              Class<?> outputValueClass,
                              org.apache.hadoop.mapreduce.Job job,
                              boolean addDependencyJars,
                              org.apache.hadoop.fs.Path tmpRestoreDir)
                                       throws IOException
```
    Sets up the job for reading from a table snapshot. It bypasses hbase servers and read directly from snapshot files.
    
    Parameters:
    snapshotName - The name of the snapshot (of a table) to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, restore directory can be deleted.
    
    Throws:
    
    IOException - When setting up the details fails.
    See Also:
    TableSnapshotInputFormat
  - initTableSnapshotMapperJob
```
public static void initTableSnapshotMapperJob(String snapshotName,
                              Scan scan,
                              Class<? extends TableMapper> mapper,
                              Class<?> outputKeyClass,
                              Class<?> outputValueClass,
                              org.apache.hadoop.mapreduce.Job job,
                              boolean addDependencyJars,
                              org.apache.hadoop.fs.Path tmpRestoreDir,
                              RegionSplitter.SplitAlgorithm splitAlgo,
                              int numSplitsPerRegion)
                                       throws IOException
```
    Sets up the job for reading from a table snapshot. It bypasses hbase servers and read directly from snapshot files.
    
    Parameters:
    snapshotName - The name of the snapshot (of a table) to read from.
    scan - The scan instance with the columns, time range etc.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, restore directory can be deleted.
    splitAlgo - algorithm to split
    numSplitsPerRegion - how many input splits to generate per one region
    
    Throws:
    
    IOException - When setting up the details fails.
    See Also:
    TableSnapshotInputFormat
  - initTableMapperJob
```
public static void initTableMapperJob(List<Scan> scans,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job)
                               throws IOException
```
    Use this before submitting a Multi TableMap job. It will appropriately set up the job.
    
    Parameters:
    scans - The list of Scan objects to read from.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(List<Scan> scans,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars)
                               throws IOException
```
    Use this before submitting a Multi TableMap job. It will appropriately set up the job.
    
    Parameters:
    scans - The list of Scan objects to read from.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException - When setting up the details fails.
  - initTableMapperJob
```
public static void initTableMapperJob(List<Scan> scans,
                      Class<? extends TableMapper> mapper,
                      Class<?> outputKeyClass,
                      Class<?> outputValueClass,
                      org.apache.hadoop.mapreduce.Job job,
                      boolean addDependencyJars,
                      boolean initCredentials)
                               throws IOException
```
    Use this before submitting a Multi TableMap job. It will appropriately set up the job.
    
    Parameters:
    scans - The list of Scan objects to read from.
    mapper - The mapper class to use.
    outputKeyClass - The class of the output key.
    outputValueClass - The class of the output value.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    initCredentials - whether to initialize hbase auth credentials for the job
    
    Throws:
    
    IOException - When setting up the details fails.
  - initCredentials
```
public static void initCredentials(org.apache.hadoop.mapreduce.Job job)
                            throws IOException
```
    Throws:
    
    IOException
  - initCredentialsForCluster
```
@Deprecated
public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job,
                                        String quorumAddress)
                                      throws IOException
```
    Deprecated. Since 1.2.0, use initCredentialsForCluster(Job, Configuration) instead.
    
    Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job. The quorumAddress is the key to the ZK ensemble, which contains: hbase.zookeeper.quorum, hbase.zookeeper.client.port and zookeeper.znode.parent
    
    Parameters:
    job - The job that requires the permission.
    quorumAddress - string that contains the 3 required configuratins
    
    Throws:
    
    IOException - When the authentication token cannot be obtained.
  - initCredentialsForCluster
```
public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job,
                             org.apache.hadoop.conf.Configuration conf)
                                      throws IOException
```
    Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.
    
    Parameters:
    job - The job that requires the permission.
    conf - The configuration to use in connecting to the peer cluster
    
    Throws:
    
    IOException - When the authentication token cannot be obtained.
  - convertScanToString
```
public static String convertScanToString(Scan scan)
                                  throws IOException
```
    Writes the given scan into a Base64 encoded string.
    
    Parameters:
    scan - The scan to write out.
    
    Returns:
    The scan saved in a Base64 encoded string.
    
    Throws:
    
    IOException - When writing the scan fails.
  - convertStringToScan
```
public static Scan convertStringToScan(String base64)
                                throws IOException
```
    Converts the given Base64 string back into a Scan instance.
    
    Parameters:
    base64 - The scan details.
    
    Returns:
    The newly created Scan instance.
    
    Throws:
    
    IOException - When reading the scan instance fails.
  - initTableReducerJob
```
public static void initTableReducerJob(String table,
                       Class<? extends TableReducer> reducer,
                       org.apache.hadoop.mapreduce.Job job)
                                throws IOException
```
    Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
    
    Parameters:
    table - The output table.
    reducer - The reducer class to use.
    job - The current job to adjust.
    
    Throws:
    
    IOException - When determining the region count fails.
  - initTableReducerJob
```
public static void initTableReducerJob(String table,
                       Class<? extends TableReducer> reducer,
                       org.apache.hadoop.mapreduce.Job job,
                       Class partitioner)
                                throws IOException
```
    Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
    
    Parameters:
    table - The output table.
    reducer - The reducer class to use.
    job - The current job to adjust.
    partitioner - Partitioner to use. Pass null to use default partitioner.
    
    Throws:
    
    IOException - When determining the region count fails.
  - initTableReducerJob
```
public static void initTableReducerJob(String table,
                       Class<? extends TableReducer> reducer,
                       org.apache.hadoop.mapreduce.Job job,
                       Class partitioner,
                       String quorumAddress,
                       String serverClass,
                       String serverImpl)
                                throws IOException
```
    Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
    
    Parameters:
    table - The output table.
    reducer - The reducer class to use.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    partitioner - Partitioner to use. Pass null to use default partitioner.
    quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write a cluster that is other than the default; e.g. copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular. Pass <hbase.zookeeper.quorum>:< hbase.zookeeper.client.port>:<zookeeper.znode.parent> such as server,server2,server3:2181:/hbase.
    serverClass - redefined hbase.regionserver.class
    serverImpl - redefined hbase.regionserver.impl
    
    Throws:
    
    IOException - When determining the region count fails.
  - initTableReducerJob
```
public static void initTableReducerJob(String table,
                       Class<? extends TableReducer> reducer,
                       org.apache.hadoop.mapreduce.Job job,
                       Class partitioner,
                       String quorumAddress,
                       String serverClass,
                       String serverImpl,
                       boolean addDependencyJars)
                                throws IOException
```
    Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
    
    Parameters:
    table - The output table.
    reducer - The reducer class to use.
    job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
    partitioner - Partitioner to use. Pass null to use default partitioner.
    quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write a cluster that is other than the default; e.g. copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular. Pass <hbase.zookeeper.quorum>:< hbase.zookeeper.client.port>:<zookeeper.znode.parent> such as server,server2,server3:2181:/hbase.
    serverClass - redefined hbase.regionserver.class
    serverImpl - redefined hbase.regionserver.impl
    addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
    
    Throws:
    
    IOException - When determining the region count fails.
  - limitNumReduceTasks
```
public static void limitNumReduceTasks(String table,
                       org.apache.hadoop.mapreduce.Job job)
                                throws IOException
```
    Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.
    
    Parameters:
    table - The table to get the region count for.
    job - The current job to adjust.
    
    Throws:
    
    IOException - When retrieving the table details fails.
  - setNumReduceTasks
```
public static void setNumReduceTasks(String table,
                     org.apache.hadoop.mapreduce.Job job)
                              throws IOException
```
    Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
    
    Parameters:
    table - The table to get the region count for.
    job - The current job to adjust.
    
    Throws:
    
    IOException - When retrieving the table details fails.
  - setScannerCaching
```
public static void setScannerCaching(org.apache.hadoop.mapreduce.Job job,
                     int batchSize)
```
    Sets the number of rows to return and cache with each scanner iteration. Higher caching values will enable faster mapreduce jobs at the expense of requiring more heap to contain the cached rows.
    
    Parameters:
    job - The current job to adjust.
    batchSize - The number of rows to return in batch with each scanner iteration.
  - addHBaseDependencyJars
```
public static void addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf)
                                   throws IOException
```
    Add HBase and its dependencies (only) to the job configuration.
    This is intended as a low-level API, facilitating code reuse between this class and its mapred counterpart. It also of use to external tools that need to build a MapReduce job that interacts with HBase but want fine-grained control over the jars shipped to the cluster.
    
    Parameters:
    conf - The Configuration object to extend with dependencies.
    
    Throws:
    
    IOException
    See Also:
    TableMapReduceUtil, PIG-3285
  - buildDependencyClasspath
```
public static String buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)
```
    Returns a classpath string built from the content of the "tmpjars" value in conf. Also exposed to shell scripts via `bin/hbase mapredcp`.
  - addDependencyJars
```
public static void addDependencyJars(org.apache.hadoop.mapreduce.Job job)
                              throws IOException
```
    Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.
    
    Throws:
    
    IOException
  - addDependencyJars
```
@Deprecated
public static void addDependencyJars(org.apache.hadoop.conf.Configuration conf,
                                Class<?>... classes)
                              throws IOException
```
    Deprecated. rely on addDependencyJars(Job) instead.
    
    Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.
    
    Throws:
    
    IOException
  - addDependencyJarsForClasses
```
@InterfaceAudience.Private
public static void addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf,
                                                         Class<?>... classes)
                                        throws IOException
```
    Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache. N.B. that this method at most adds one jar per class given. If there is more than one jar available containing a class with the same name as a given class, we don't define which of those jars might be chosen.
    
    Parameters:
    conf - The Hadoop Configuration to modify
    classes - will add just those dependencies needed to find the given classes
    
    Throws:
    
    IOException - if an underlying library call fails.

Class TableMapReduceUtil

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Detail

TableMapReduceUtil

Method Detail

initTableMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

resetCacheConfig

initMultiTableSnapshotMapperJob

initTableSnapshotMapperJob

initTableSnapshotMapperJob

initTableMapperJob

initTableMapperJob

initTableMapperJob

initCredentials

initCredentialsForCluster

initCredentialsForCluster

convertScanToString

convertStringToScan

initTableReducerJob

initTableReducerJob

initTableReducerJob

initTableReducerJob

limitNumReduceTasks

setNumReduceTasks

setScannerCaching

addHBaseDependencyJars

buildDependencyClasspath

addDependencyJars

addDependencyJars

addDependencyJarsForClasses