Class TableMapReduceUtil
java.lang.Object
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil
Utility for TableMapper and TableReducer.
Field Summary
Fields
Constructor Summary
Constructors
Method Summary
static void addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)
    Deprecated. Since 1.3.0 and will be removed in 3.0.0.
static void addDependencyJars(org.apache.hadoop.mapreduce.Job job)
    Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.
static void addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)
    Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.
static void addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf)
    Add HBase and its dependencies (only) to the job configuration.
static String buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)
    Returns a classpath string built from the content of the "tmpjars" value in conf.
static String convertScanToString(Scan scan)
    Writes the given scan into a Base64 encoded string.
static Scan convertStringToScan(String base64)
    Converts the given Base64 string back into a Scan instance.
private static String findContainingJar(Class<?> my_class, Map<String, String> packagedClasses)
    Find a jar that contains a class of the same name, if any.
private static org.apache.hadoop.fs.Path findOrCreateJar(Class<?> my_class, org.apache.hadoop.fs.FileSystem fs, Map<String, String> packagedClasses)
    Finds the Jar for a class or creates it if it doesn't exist.
private static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getConfiguredInputFormat(org.apache.hadoop.mapreduce.Job job)
private static String getJar(Class<?> my_class)
    Invoke 'getJar' on a custom JarFinder implementation.
private static int getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName)
static void initCredentials(org.apache.hadoop.mapreduce.Job job)
static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, String quorumAddress)
    Deprecated. Since 1.2.0 and will be removed in 3.0.0.
static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration conf)
    Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.
static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a TableMap job.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(TableName table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars)
    Use this before submitting a TableReduce job.
static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from a table snapshot.
static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion)
    Sets up the job for reading from a table snapshot.
static void limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
    Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.
static void resetCacheConfig(org.apache.hadoop.conf.Configuration conf)
    Enable a basic on-heap cache for these jobs.
static void setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
    Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
static void setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)
    Sets the number of rows to return and cache with each scanner iteration.
private static void updateMap(String jar, Map<String, String> packagedClasses)
    Add entries to packagedClasses corresponding to class files contained in jar.
Field Details
LOG

TABLE_INPUT_CLASS_KEY
Constructor Details
TableMapReduceUtil

public TableMapReduceUtil()
Method Details
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
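For illustration, a minimal map-only driver built on this overload might look like the sketch below. The table name "mytable", the column family "cf", and the CountMapper class are hypothetical stand-ins, not part of the API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountJob {

  // Hypothetical mapper: emits a single counter key for every row the scan returns.
  static class CountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text("rows"), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "row-count");
    job.setJarByClass(RowCountJob.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf")); // hypothetical column family
    scan.setCaching(500);                // more rows per RPC for scan-heavy MR jobs
    scan.setCacheBlocks(false);          // MR scans should not pollute the block cache

    // Serializes the Scan into the job configuration and wires up the input format;
    // this overload also ships HBase dependency jars and initializes credentials.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, CountMapper.class,
        Text.class, IntWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class); // map-only sketch, discard output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```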
initTableMapperJob

public static void initTableMapperJob(TableName table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
inputFormatClass - The class of the input format.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
initCredentials - whether to initialize hbase auth credentials for the job
inputFormatClass - the input format
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
inputFormatClass - The class of the input format
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
getConfiguredInputFormat

private static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getConfiguredInputFormat(org.apache.hadoop.mapreduce.Job job)

Returns:
TableInputFormat.class unless the Configuration has something else at TABLE_INPUT_CLASS_KEY.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
resetCacheConfig

public static void resetCacheConfig(org.apache.hadoop.conf.Configuration conf)

Enable a basic on-heap cache for these jobs. Any BlockCache implementation based on direct memory will likely cause the map tasks to OOM when opening the region. This is done here instead of in TableSnapshotRegionRecordReader in case an advanced user wants to override this behavior in their job.
initMultiTableSnapshotMapperJob
public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException

Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotScans - map of snapshot name to scans on that snapshot.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
Throws:
IOException
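As a sketch, reusing the driver Job, imports, and CountMapper from the earlier initTableMapperJob example; the snapshot names, column family, and restore path are hypothetical:

```java
// Two snapshots, each with its own scan set: a full scan for the first,
// a single-family scan for the second ("cf" is a hypothetical family).
Map<String, Collection<Scan>> snapshotScans = new HashMap<>();
snapshotScans.put("orders_snapshot", Collections.singletonList(new Scan()));
Scan userScan = new Scan();
userScan.addFamily(Bytes.toBytes("cf"));
snapshotScans.put("users_snapshot", Collections.singletonList(userScan));

// The restore directory must be writable by the current user and must not
// sit under hbase.rootdir; it can be removed after the job completes.
TableMapReduceUtil.initMultiTableSnapshotMapperJob(snapshotScans,
    RowCountJob.CountMapper.class, Text.class, IntWritable.class, job,
    true,                                            // addDependencyJars
    new org.apache.hadoop.fs.Path("/tmp/restore-multi"));
```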
initTableSnapshotMapperJob

public static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException

Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
Throws:
IOException - When setting up the details fails.
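A corresponding single-snapshot sketch, with the same hypothetical driver; the snapshot name and restore directory are made up:

```java
Scan snapshotScan = new Scan();
snapshotScan.setCacheBlocks(false); // no region server block cache is involved anyway

// Reads HFiles directly from the restored snapshot rather than going
// through region servers, avoiding load on the live cluster.
TableMapReduceUtil.initTableSnapshotMapperJob(
    "mytable_snapshot",                              // hypothetical snapshot name
    snapshotScan, RowCountJob.CountMapper.class,
    Text.class, IntWritable.class, job,
    true,                                            // addDependencyJars
    new org.apache.hadoop.fs.Path("/tmp/restore-single"));
```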
initTableSnapshotMapperJob

public static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion) throws IOException

Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
splitAlgo - algorithm to split
numSplitsPerRegion - how many input splits to generate per one region
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
initCredentials - whether to initialize hbase auth credentials for the job
Throws:
IOException - When setting up the details fails.
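With the multi-scan overloads, each Scan carries the name of its source table as a scan attribute (Scan.SCAN_ATTRIBUTES_TABLE_NAME) so the multi-table input format can route it. A sketch, with hypothetical table names and the driver from the first example:

```java
List<Scan> scans = new ArrayList<>();
for (String tableName : new String[] { "orders", "users" }) { // hypothetical tables
  Scan scan = new Scan();
  // Tag the scan with the table it targets.
  scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
  scans.add(scan);
}
TableMapReduceUtil.initTableMapperJob(scans, RowCountJob.CountMapper.class,
    Text.class, IntWritable.class, job,
    true,   // addDependencyJars
    true);  // initCredentials
```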
initCredentials

public static void initCredentials(org.apache.hadoop.mapreduce.Job job) throws IOException

Throws:
IOException
initCredentialsForCluster

@Deprecated
public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, String quorumAddress) throws IOException

Deprecated. Since 1.2.0 and will be removed in 3.0.0. Use initCredentialsForCluster(Job, Configuration) instead.

Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job. The quorumAddress is the key to the ZK ensemble, which contains: hbase.zookeeper.quorum, hbase.zookeeper.client.port and zookeeper.znode.parent

Parameters:
job - The job that requires the permission.
quorumAddress - string that contains the 3 required configurations
Throws:
IOException - When the authentication token cannot be obtained.
initCredentialsForCluster

public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration conf) throws IOException

Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.

Parameters:
job - The job that requires the permission.
conf - The configuration to use in connecting to the peer cluster
Throws:
IOException - When the authentication token cannot be obtained.
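For a job that also writes to a second cluster (e.g. a table copy), something like the following sketch obtains a token for the peer as well; the peer quorum values are hypothetical:

```java
// Build a configuration that points at the peer cluster.
Configuration peerConf = HBaseConfiguration.create(job.getConfiguration());
peerConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");       // hypothetical hosts
peerConf.set("hbase.zookeeper.property.clientPort", "2181");
peerConf.set("zookeeper.znode.parent", "/hbase");

// Token for the source cluster, then one for the peer.
TableMapReduceUtil.initCredentials(job);
TableMapReduceUtil.initCredentialsForCluster(job, peerConf);
```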
convertScanToString

public static String convertScanToString(Scan scan) throws IOException

Writes the given scan into a Base64 encoded string.

Parameters:
scan - The scan to write out.
Returns:
The scan saved in a Base64 encoded string.
Throws:
IOException - When writing the scan fails.
convertStringToScan

public static Scan convertStringToScan(String base64) throws IOException

Converts the given Base64 string back into a Scan instance.

Parameters:
base64 - The scan details.
Returns:
The newly created Scan instance.
Throws:
IOException - When reading the scan instance fails.
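These two methods are inverses; this is how the init* helpers ship a Scan inside the job configuration. A quick round-trip sketch (in a context that can throw IOException):

```java
Scan scan = new Scan();
scan.setCaching(500);

// Scan -> Base64 string, suitable for storing as a Configuration value...
String encoded = TableMapReduceUtil.convertScanToString(scan);

// ...and back again on the task side.
Scan decoded = TableMapReduceUtil.convertStringToScan(encoded);
assert decoded.getCaching() == 500;
```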
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
Throws:
IOException - When determining the region count fails.
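A reducer-side sketch to pair with the earlier mapper example, assuming imports analogous to the first sketch plus org.apache.hadoop.hbase.client.Put and org.apache.hadoop.hbase.mapreduce.TableReducer; the "summary" table, the "cf:total" column, and SumReducer are hypothetical:

```java
// Hypothetical reducer: sums the mapper's counts and writes one cell per key.
static class SumReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("total"), Bytes.toBytes(sum));
    context.write(null, put); // TableOutputFormat takes the row key from the Put
  }
}

// In the driver: wires TableOutputFormat to the output table and sets the
// job's output key/value classes.
TableMapReduceUtil.initTableReducerJob("summary", SumReducer.class, job);
```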
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
partitioner - Partitioner to use. Pass null to use the default partitioner.
Throws:
IOException - When determining the region count fails.
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use the default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. when copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular: <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
Throws:
IOException - When determining the region count fails.
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use the default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. when copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular: <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When determining the region count fails.
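For example, to make the reduce phase write to a remote cluster instead of the one from hbase-site.xml (the ensemble below and the SumReducer class are hypothetical):

```java
TableMapReduceUtil.initTableReducerJob("summary", SumReducer.class, job,
    null,                          // partitioner: null selects the default
    "zk1,zk2,zk3:2181:/hbase",     // <quorum>:<client port>:<znode parent>
    null, null,                    // keep default server class and impl
    true);                         // addDependencyJars
```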
limitNumReduceTasks

public static void limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job) throws IOException

Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
setNumReduceTasks

public static void setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job) throws IOException

Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
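A sketch of both sizing strategies against a hypothetical output table named "summary":

```java
// Option 1: exactly one reducer per region of the output table.
TableMapReduceUtil.setNumReduceTasks("summary", job);

// Option 2: pick a count yourself, but never exceed the region count.
job.setNumReduceTasks(32);
TableMapReduceUtil.limitNumReduceTasks("summary", job);
```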
setScannerCaching

public static void setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)

Sets the number of rows to return and cache with each scanner iteration. Higher caching values will enable faster mapreduce jobs at the expense of requiring more heap to contain the cached rows.

Parameters:
job - The current job to adjust.
batchSize - The number of rows to return in batch with each scanner iteration.
addHBaseDependencyJars

public static void addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf) throws IOException

Add HBase and its dependencies (only) to the job configuration.

This is intended as a low-level API, facilitating code reuse between this class and its mapred counterpart. It is also of use to external tools that need to build a MapReduce job that interacts with HBase but want fine-grained control over the jars shipped to the cluster.

Parameters:
conf - The Configuration object to extend with dependencies.
Throws:
IOException
buildDependencyClasspath

public static String buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)

Returns a classpath string built from the content of the "tmpjars" value in conf. Also exposed to shell scripts via `bin/hbase mapredcp`.
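These two low-level helpers compose; a sketch for a caller that wants HBase jars shipped without scanning job classes (RowCountJob is the hypothetical driver from the first sketch):

```java
Configuration conf = job.getConfiguration();

// Populate "tmpjars" with HBase and its transitive dependencies only;
// the job's own classes are deliberately not scanned.
TableMapReduceUtil.addHBaseDependencyJars(conf);

// Derive a plain classpath string from that same "tmpjars" value, e.g. for
// spawning helper processes (this is what `bin/hbase mapredcp` prints).
String classpath = TableMapReduceUtil.buildDependencyClasspath(conf);

// The job jar itself still has to be supplied, e.g. via setJarByClass.
job.setJarByClass(RowCountJob.class);
```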
addDependencyJars
public static void addDependencyJars(org.apache.hadoop.mapreduce.Job job) throws IOException

Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
addDependencyJars

@Deprecated
public static void addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class<?>... classes) throws IOException

Deprecated. Since 1.3.0 and will be removed in 3.0.0. Use addDependencyJars(Job) instead.

Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
addDependencyJarsForClasses
@Private
public static void addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf, Class<?>... classes) throws IOException

Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.

N.B. this method at most adds one jar per class given. If there is more than one jar available containing a class with the same name as a given class, we don't define which of those jars might be chosen.

Parameters:
conf - The Hadoop Configuration to modify
classes - will add just those dependencies needed to find the given classes
Throws:
IOException - if an underlying library call fails.
findOrCreateJar

private static org.apache.hadoop.fs.Path findOrCreateJar(Class<?> my_class, org.apache.hadoop.fs.FileSystem fs, Map<String, String> packagedClasses) throws IOException

Finds the Jar for a class or creates it if it doesn't exist. If the class is in a directory in the classpath, it creates a Jar on the fly with the contents of the directory and returns the path to that Jar. If a Jar is created, it is created in the system temporary directory. Otherwise, returns an existing jar that contains a class of the same name. Maintains a mapping from jar contents to the tmp jar created.

Parameters:
my_class - the class to find.
fs - the FileSystem with which to qualify the returned path.
packagedClasses - a map of class name to path.
Returns:
a jar file that contains the class.
Throws:
IOException
updateMap

private static void updateMap(String jar, Map<String, String> packagedClasses) throws IOException

Add entries to packagedClasses corresponding to class files contained in jar.

Parameters:
jar - The jar whose content to list.
packagedClasses - map[class -> jar]
Throws:
IOException
findContainingJar

private static String findContainingJar(Class<?> my_class, Map<String, String> packagedClasses) throws IOException

Find a jar that contains a class of the same name, if any. It will return a jar file, even if that is not the first thing on the class path that has a class with the same name. Looks first on the classpath and then in the packagedClasses map.

Parameters:
my_class - the class to find.
Returns:
a jar file that contains the class, or null.
Throws:
IOException
getJar

private static String getJar(Class<?> my_class)

Invoke 'getJar' on a custom JarFinder implementation. Useful for some job configuration contexts (HBASE-8140) and also for testing on MRv2. Check if we have HADOOP-9426.

Parameters:
my_class - the class to find.
Returns:
a jar file that contains the class, or null.
getRegionCount

private static int getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName) throws IOException

Throws:
IOException