Package org.apache.hadoop.hbase.mapred
Class TableMapReduceUtil
java.lang.Object
org.apache.hadoop.hbase.mapred.TableMapReduceUtil
Utility for TableMap and TableReduce.
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
static void    addDependencyJars(org.apache.hadoop.mapred.JobConf job)
private static int    getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName)
static void    initCredentials(org.apache.hadoop.mapred.JobConf job)
static void    initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot.
static void    initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job)
    Use this before submitting a TableMap job.
static void    initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars)
static void    initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormat)
    Use this before submitting a TableMap job.
static void    initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job)
    Use this before submitting a TableReduce job.
static void    initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job, Class partitioner)
    Use this before submitting a TableReduce job.
static void    initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job, Class partitioner, boolean addDependencyJars)
    Use this before submitting a TableReduce job.
static void    initTableSnapshotMapJob(String snapshotName, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from a table snapshot.
static void    initTableSnapshotMapJob(String snapshotName, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf jobConf, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion)
    Sets up the job for reading from a table snapshot.
static void    limitNumMapTasks(String table, org.apache.hadoop.mapred.JobConf job)
    Ensures that the number of map tasks set in the given job configuration does not exceed the number of regions of the given table.
static void    limitNumReduceTasks(String table, org.apache.hadoop.mapred.JobConf job)
    Ensures that the number of reduce tasks set in the given job configuration does not exceed the number of regions of the given table.
static void    setNumMapTasks(String table, org.apache.hadoop.mapred.JobConf job)
    Sets the number of map tasks for the given job configuration to the number of regions the given table has.
static void    setNumReduceTasks(String table, org.apache.hadoop.mapred.JobConf job)
    Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
static void    setScannerCaching(org.apache.hadoop.mapred.JobConf job, int batchSize)
    Sets the number of rows to return and cache with each scanner iteration.
-
Field Details
-
LOG
-
-
Constructor Details
-
TableMapReduceUtil
public TableMapReduceUtil()
-
-
Method Details
-
initTableMapJob
public static void initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job)
Use this before submitting a TableMap job. It will appropriately set up the JobConf.
Parameters:
table - The table name to read from.
columns - The columns to scan.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job configuration to adjust.
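Example: a minimal sketch of a map-only scan job set up with this method. The table name, column list, and the ExampleScanJob/ExampleMapper classes are hypothetical; the mapper simply forwards each row key and Result.

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class ExampleScanJob {

  // Hypothetical identity mapper: forwards each row key and Result unchanged.
  public static class ExampleMapper extends MapReduceBase
      implements TableMap<ImmutableBytesWritable, Result> {
    public void map(ImmutableBytesWritable rowKey, Result value,
        OutputCollector<ImmutableBytesWritable, Result> output, Reporter reporter)
        throws IOException {
      output.collect(rowKey, value);
    }
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(HBaseConfiguration.create(), ExampleScanJob.class);
    job.setJobName("example-scan");
    // Columns are passed as a space-separated list of family:qualifier names.
    TableMapReduceUtil.initTableMapJob(
        "example_table",              // hypothetical table to read from
        "cf:q1 cf:q2",                // hypothetical columns to scan
        ExampleMapper.class,
        ImmutableBytesWritable.class, // output key class
        Result.class,                 // output value class
        job);
    job.setNumReduceTasks(0);         // map-only job
    job.setOutputFormat(NullOutputFormat.class);
    JobClient.runJob(job);
  }
}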
-
initTableMapJob
public static void initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars)
-
initTableMapJob
public static void initTableMapJob(String table, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapred.InputFormat> inputFormat)
Use this before submitting a TableMap job. It will appropriately set up the JobConf.
Parameters:
table - The table name to read from.
columns - The columns to scan.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job configuration to adjust.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
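Example: the same setup, but shipping dependency jars and naming the input format explicitly. A brief sketch reusing the hypothetical ExampleMapper above; the input format named here is the package's own org.apache.hadoop.hbase.mapred.TableInputFormat.

TableMapReduceUtil.initTableMapJob(
    "example_table", "cf:q1 cf:q2",
    ExampleMapper.class,
    ImmutableBytesWritable.class, Result.class,
    job,
    true,                                                  // addDependencyJars: populate tmpjars
    org.apache.hadoop.hbase.mapred.TableInputFormat.class);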
-
initMultiTableSnapshotMapperJob
public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException
Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot. It bypasses HBase servers and reads directly from snapshot files.
Parameters:
snapshotScans - map of snapshot name to scans on that snapshot.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException
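Example: a hedged sketch of reading two snapshots in one job. The snapshot names, scans, and restore path are hypothetical; ExampleMapper is the mapper sketched earlier.

import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Map<String, Collection<Scan>> snapshotScans = new HashMap<>();
// One full-family scan over the first snapshot.
snapshotScans.put("snapshot_a",
    Arrays.asList(new Scan().addFamily(Bytes.toBytes("cf"))));
// Two row ranges over the second snapshot.
snapshotScans.put("snapshot_b",
    Arrays.asList(
        new Scan().withStartRow(Bytes.toBytes("a")).withStopRow(Bytes.toBytes("m")),
        new Scan().withStartRow(Bytes.toBytes("m"))));

TableMapReduceUtil.initMultiTableSnapshotMapperJob(
    snapshotScans,
    ExampleMapper.class,
    ImmutableBytesWritable.class, Result.class,
    job,
    true,                                 // addDependencyJars
    new Path("/tmp/snapshot-restore"));   // hypothetical restore dir, writable by the current user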
-
initTableSnapshotMapJob
public static void initTableSnapshotMapJob(String snapshotName, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException
Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.
Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
columns - The columns to scan.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. The current user should have write permissions to this directory, and it should not be a subdirectory of rootdir. The restore directory can be deleted after the job is finished.
Throws:
IOException - When setting up the details fails.
See Also:
TableSnapshotInputFormat
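Example: a hedged sketch with a hypothetical snapshot name and restore directory. The restore directory must be writable by the current user and must not live under the HBase root directory.

import org.apache.hadoop.fs.Path;

TableMapReduceUtil.initTableSnapshotMapJob(
    "example_snapshot",            // hypothetical snapshot to read
    "cf:q1 cf:q2",                 // columns to scan
    ExampleMapper.class,
    ImmutableBytesWritable.class, Result.class,
    job,
    true,                          // addDependencyJars
    new Path("/tmp/snapshot-restore"));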
-
initTableSnapshotMapJob
public static void initTableSnapshotMapJob(String snapshotName, String columns, Class<? extends TableMap> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapred.JobConf jobConf, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion) throws IOException
Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.
Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
columns - The columns to scan.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
jobConf - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. The current user should have write permissions to this directory, and it should not be a subdirectory of rootdir. The restore directory can be deleted after the job is finished.
splitAlgo - algorithm to split with
numSplitsPerRegion - how many input splits to generate per region
Throws:
IOException - When setting up the details fails.
See Also:
TableSnapshotInputFormat
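Example: when a snapshot's regions are large, this overload can fan each region out into several input splits. A hedged sketch using RegionSplitter.UniformSplit, which assumes row keys are spread uniformly over the byte-key space.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.util.RegionSplitter;

TableMapReduceUtil.initTableSnapshotMapJob(
    "example_snapshot", "cf:q1 cf:q2",
    ExampleMapper.class,
    ImmutableBytesWritable.class, Result.class,
    job,
    true,
    new Path("/tmp/snapshot-restore"),
    new RegionSplitter.UniformSplit(),  // split algorithm for carving up each region's key range
    4);                                 // generate four input splits per region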
-
initTableReduceJob
public static void initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job) throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job configuration to adjust.
Throws:
IOException - When determining the region count fails.
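Example: a hedged sketch of the write side. ExampleReducer is hypothetical (nested inside a job class, like ExampleMapper above); a TableReduce implementation emits Put mutations keyed by row, which are applied to the output table.

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableReduce;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical reducer: writes one cell per row key using the first value seen.
public static class ExampleReducer extends MapReduceBase
    implements TableReduce<ImmutableBytesWritable, Result> {
  public void reduce(ImmutableBytesWritable key, Iterator<Result> values,
      OutputCollector<ImmutableBytesWritable, Put> output, Reporter reporter)
      throws IOException {
    Put put = new Put(key.get());
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"), values.next().value());
    output.collect(key, put);
  }
}

// Wire the reducer to the (hypothetical) output table before submitting:
TableMapReduceUtil.initTableReduceJob("output_table", ExampleReducer.class, job);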
-
initTableReduceJob
public static void initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job, Class partitioner) throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job configuration to adjust.
partitioner - Partitioner to use. Pass null to use default partitioner.
Throws:
IOException - When determining the region count fails.
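Example: the partitioner variant is useful when reduce tasks should line up with the output table's regions. A brief sketch using the package's HRegionPartitioner, with the hypothetical ExampleReducer from above:

// Route each reduce key to the reducer responsible for the region that will host it.
TableMapReduceUtil.initTableReduceJob(
    "output_table", ExampleReducer.class, job,
    org.apache.hadoop.hbase.mapred.HRegionPartitioner.class);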
-
initTableReduceJob
public static void initTableReduceJob(String table, Class<? extends TableReduce> reducer, org.apache.hadoop.mapred.JobConf job, Class partitioner, boolean addDependencyJars) throws IOException
Use this before submitting a TableReduce job. It will appropriately set up the JobConf.
Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job configuration to adjust.
partitioner - Partitioner to use. Pass null to use default partitioner.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When determining the region count fails.
-
initCredentials
public static void initCredentials(org.apache.hadoop.mapred.JobConf job) throws IOException
Throws:
IOException
-
limitNumReduceTasks
public static void limitNumReduceTasks(String table, org.apache.hadoop.mapred.JobConf job) throws IOException
Ensures that the number of reduce tasks set in the given job configuration does not exceed the number of regions of the given table.
Parameters:
table - The table to get the region count for.
job - The current job configuration to adjust.
Throws:
IOException - When retrieving the table details fails.
-
limitNumMapTasks
public static void limitNumMapTasks(String table, org.apache.hadoop.mapred.JobConf job) throws IOException
Ensures that the number of map tasks set in the given job configuration does not exceed the number of regions of the given table.
Parameters:
table - The table to get the region count for.
job - The current job configuration to adjust.
Throws:
IOException - When retrieving the table details fails.
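Example: a brief sketch of capping user-configured task counts so they never exceed the region count (table names hypothetical).

job.setNumMapTasks(64);
job.setNumReduceTasks(32);
// If the tables have fewer regions than requested, both counts are lowered to match.
TableMapReduceUtil.limitNumMapTasks("example_table", job);
TableMapReduceUtil.limitNumReduceTasks("output_table", job);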
-
setNumReduceTasks
public static void setNumReduceTasks(String table, org.apache.hadoop.mapred.JobConf job) throws IOException
Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
Parameters:
table - The table to get the region count for.
job - The current job configuration to adjust.
Throws:
IOException - When retrieving the table details fails.
-
setNumMapTasks
public static void setNumMapTasks(String table, org.apache.hadoop.mapred.JobConf job) throws IOException
Sets the number of map tasks for the given job configuration to the number of regions the given table has.
Parameters:
table - The table to get the region count for.
job - The current job configuration to adjust.
Throws:
IOException - When retrieving the table details fails.
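Example: in contrast to the limit* methods above, these set the task counts exactly to the region count. A brief sketch (table names hypothetical):

// One map task per region of the input table.
TableMapReduceUtil.setNumMapTasks("example_table", job);
// One reduce task per region of the output table, so each reducer feeds a single region.
TableMapReduceUtil.setNumReduceTasks("output_table", job);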
-
setScannerCaching
public static void setScannerCaching(org.apache.hadoop.mapred.JobConf job, int batchSize)
Sets the number of rows to return and cache with each scanner iteration. Higher caching values will enable faster mapreduce jobs at the expense of requiring more heap to contain the cached rows.
Parameters:
job - The current job configuration to adjust.
batchSize - The number of rows to return in batch with each scanner iteration.
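Example: a brief sketch of the trade-off. Each scanner round trip fetches this many rows, so a larger value cuts RPCs per map task at the cost of more heap for the row buffer.

// Fetch 500 rows per scanner round trip instead of the configured default.
TableMapReduceUtil.setScannerCaching(job, 500);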
-
addDependencyJars
public static void addDependencyJars(org.apache.hadoop.mapred.JobConf job) throws IOException
-
getRegionCount
private static int getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName) throws IOException
Throws:
IOException
-