Class TableMapReduceUtil
java.lang.Object
org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil
Utility for TableMapper and TableReducer.
Field Summary
Fields
Constructor Summary
Constructors
Method Summary
static void addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)
    Deprecated. Since 1.3.0 and will be removed in 3.0.0.
static void addDependencyJars(org.apache.hadoop.mapreduce.Job job)
    Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.
static void addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf, Class<?>... classes)
    Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.
static void addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf)
    Add HBase and its dependencies (only) to the job configuration.
static String buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)
    Returns a classpath string built from the content of the "tmpjars" value in conf.
static String convertScanToString(Scan scan)
    Writes the given scan into a Base64 encoded string.
static Scan convertStringToScan(String base64)
    Converts the given Base64 string back into a Scan instance.
private static String findContainingJar(Class<?> my_class, Map<String, String> packagedClasses)
    Find a jar that contains a class of the same name, if any.
private static org.apache.hadoop.fs.Path findOrCreateJar(Class<?> my_class, org.apache.hadoop.fs.FileSystem fs, Map<String, String> packagedClasses)
    Finds the Jar for a class or creates it if it doesn't exist.
private static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getConfiguredInputFormat(org.apache.hadoop.mapreduce.Job job)
private static String getJar(Class<?> my_class)
    Invoke 'getJar' on a custom JarFinder implementation.
private static int getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName)
static void initCredentials(org.apache.hadoop.mapreduce.Job job)
static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, String quorumAddress)
    Deprecated. Since 1.2.0 and will be removed in 3.0.0.
static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration conf)
    Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.
static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a TableMap job.
static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass)
    Use this before submitting a TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials)
    Use this before submitting a Multi TableMap job.
static void initTableMapperJob(TableName table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)
    Use this before submitting a TableReduce job.
static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars)
    Use this before submitting a TableReduce job.
static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir)
    Sets up the job for reading from a table snapshot.
static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion)
    Sets up the job for reading from a table snapshot.
static void limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
    Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.
static void resetCacheConfig(org.apache.hadoop.conf.Configuration conf)
    Enable a basic on-heap cache for these jobs.
static void setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job)
    Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.
static void setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)
    Sets the number of rows to return and cache with each scanner iteration.
private static void updateMap(String jar, Map<String, String> packagedClasses)
    Add entries to packagedClasses corresponding to class files contained in jar.
Field Details
LOG

TABLE_INPUT_CLASS_KEY
Constructor Details
TableMapReduceUtil

public TableMapReduceUtil()
Method Details
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
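For illustration, a minimal map-only driver built on this overload might look like the sketch below. The table name "mytable", the column family "cf", and the CountMapper class are hypothetical stand-ins, not part of the API.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RowCountJob {

  // Hypothetical mapper: emits a single counter key for every row the scan returns.
  static class CountMapper extends TableMapper<Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      context.write(new Text("rows"), ONE);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "row-count");
    job.setJarByClass(RowCountJob.class);

    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf")); // hypothetical column family
    scan.setCaching(500);                // more rows per RPC for scan-heavy MR jobs
    scan.setCacheBlocks(false);          // MR scans should not pollute the block cache

    // Serializes the Scan into the job configuration and wires up the input format;
    // this overload also ships HBase dependency jars and initializes credentials.
    TableMapReduceUtil.initTableMapperJob("mytable", scan, CountMapper.class,
        Text.class, IntWritable.class, job);

    job.setNumReduceTasks(0);
    job.setOutputFormatClass(NullOutputFormat.class); // map-only sketch, discard output
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```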
initTableMapperJob

public static void initTableMapperJob(TableName table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
inputFormatClass - The class of the input format.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
initCredentials - whether to initialize hbase auth credentials for the job
inputFormatClass - the input format
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, Class<? extends org.apache.hadoop.mapreduce.InputFormat> inputFormatClass) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
inputFormatClass - The class of the input format
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(byte[] table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - Binary representation of the table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
getConfiguredInputFormat

private static Class<? extends org.apache.hadoop.mapreduce.InputFormat> getConfiguredInputFormat(org.apache.hadoop.mapreduce.Job job)

Returns:
TableInputFormat.class unless the Configuration has something else at TABLE_INPUT_CLASS_KEY.
initTableMapperJob

public static void initTableMapperJob(String table, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a TableMap job. It will appropriately set up the job.

Parameters:
table - The table name to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
resetCacheConfig

public static void resetCacheConfig(org.apache.hadoop.conf.Configuration conf)

Enable a basic on-heap cache for these jobs. Any BlockCache implementation based on direct memory will likely cause the map tasks to OOM when opening the region. This is done here instead of in TableSnapshotRegionRecordReader in case an advanced user wants to override this behavior in their job.
initMultiTableSnapshotMapperJob
public static void initMultiTableSnapshotMapperJob(Map<String, Collection<Scan>> snapshotScans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException

Sets up the job for reading from one or more table snapshots, with one or more scans per snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotScans - map of snapshot name to scans on that snapshot.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
Throws:
IOException
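As a sketch, reusing the driver Job, imports, and CountMapper from the earlier initTableMapperJob example; the snapshot names, column family, and restore path are hypothetical:

```java
// Two snapshots, each with its own scan set: a full scan for the first,
// a single-family scan for the second ("cf" is a hypothetical family).
Map<String, Collection<Scan>> snapshotScans = new HashMap<>();
snapshotScans.put("orders_snapshot", Collections.singletonList(new Scan()));
Scan userScan = new Scan();
userScan.addFamily(Bytes.toBytes("cf"));
snapshotScans.put("users_snapshot", Collections.singletonList(userScan));

// The restore directory must be writable by the current user and must not
// sit under hbase.rootdir; it can be removed after the job completes.
TableMapReduceUtil.initMultiTableSnapshotMapperJob(snapshotScans,
    RowCountJob.CountMapper.class, Text.class, IntWritable.class, job,
    true,                                            // addDependencyJars
    new org.apache.hadoop.fs.Path("/tmp/restore-multi"));
```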
initTableSnapshotMapperJob

public static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir) throws IOException

Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
Throws:
IOException - When setting up the details fails.
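A corresponding single-snapshot sketch, with the same hypothetical driver; the snapshot name and restore directory are made up:

```java
Scan snapshotScan = new Scan();
snapshotScan.setCacheBlocks(false); // no region server block cache is involved anyway

// Reads HFiles directly from the restored snapshot rather than going
// through region servers, avoiding load on the live cluster.
TableMapReduceUtil.initTableSnapshotMapperJob(
    "mytable_snapshot",                              // hypothetical snapshot name
    snapshotScan, RowCountJob.CountMapper.class,
    Text.class, IntWritable.class, job,
    true,                                            // addDependencyJars
    new org.apache.hadoop.fs.Path("/tmp/restore-single"));
```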
initTableSnapshotMapperJob

public static void initTableSnapshotMapperJob(String snapshotName, Scan scan, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, org.apache.hadoop.fs.Path tmpRestoreDir, RegionSplitter.SplitAlgorithm splitAlgo, int numSplitsPerRegion) throws IOException

Sets up the job for reading from a table snapshot. It bypasses HBase servers and reads directly from snapshot files.

Parameters:
snapshotName - The name of the snapshot (of a table) to read from.
scan - The scan instance with the columns, time range etc.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
tmpRestoreDir - a temporary directory to copy the snapshot files into. Current user should have write permissions to this directory, and this should not be a subdirectory of rootdir. After the job is finished, the restore directory can be deleted.
splitAlgo - algorithm to split
numSplitsPerRegion - how many input splits to generate per one region
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When setting up the details fails.
initTableMapperJob

public static void initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job, boolean addDependencyJars, boolean initCredentials) throws IOException

Use this before submitting a Multi TableMap job. It will appropriately set up the job.

Parameters:
scans - The list of Scan objects to read from.
mapper - The mapper class to use.
outputKeyClass - The class of the output key.
outputValueClass - The class of the output value.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
initCredentials - whether to initialize hbase auth credentials for the job
Throws:
IOException - When setting up the details fails.
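With the multi-scan overloads, each Scan carries the name of its source table as a scan attribute (Scan.SCAN_ATTRIBUTES_TABLE_NAME) so the multi-table input format can route it. A sketch, with hypothetical table names and the driver from the first example:

```java
List<Scan> scans = new ArrayList<>();
for (String tableName : new String[] { "orders", "users" }) { // hypothetical tables
  Scan scan = new Scan();
  // Tag the scan with the table it targets.
  scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
  scans.add(scan);
}
TableMapReduceUtil.initTableMapperJob(scans, RowCountJob.CountMapper.class,
    Text.class, IntWritable.class, job,
    true,   // addDependencyJars
    true);  // initCredentials
```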
initCredentials

public static void initCredentials(org.apache.hadoop.mapreduce.Job job) throws IOException

Throws:
IOException
initCredentialsForCluster

@Deprecated
public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, String quorumAddress) throws IOException

Deprecated. Since 1.2.0 and will be removed in 3.0.0. Use initCredentialsForCluster(Job, Configuration) instead.

Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job. The quorumAddress is the key to the ZK ensemble, which contains: hbase.zookeeper.quorum, hbase.zookeeper.client.port and zookeeper.znode.parent

Parameters:
job - The job that requires the permission.
quorumAddress - string that contains the 3 required configurations
Throws:
IOException - When the authentication token cannot be obtained.
initCredentialsForCluster

public static void initCredentialsForCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration conf) throws IOException

Obtain an authentication token, for the specified cluster, on behalf of the current user and add it to the credentials for the given map reduce job.

Parameters:
job - The job that requires the permission.
conf - The configuration to use in connecting to the peer cluster
Throws:
IOException - When the authentication token cannot be obtained.
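For a job that also writes to a second cluster (e.g. a table copy), something like the following sketch obtains a token for the peer as well; the peer quorum values are hypothetical:

```java
// Build a configuration that points at the peer cluster.
Configuration peerConf = HBaseConfiguration.create(job.getConfiguration());
peerConf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");       // hypothetical hosts
peerConf.set("hbase.zookeeper.property.clientPort", "2181");
peerConf.set("zookeeper.znode.parent", "/hbase");

// Token for the source cluster, then one for the peer.
TableMapReduceUtil.initCredentials(job);
TableMapReduceUtil.initCredentialsForCluster(job, peerConf);
```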
convertScanToString

public static String convertScanToString(Scan scan) throws IOException

Writes the given scan into a Base64 encoded string.

Parameters:
scan - The scan to write out.
Returns:
The scan saved in a Base64 encoded string.
Throws:
IOException - When writing the scan fails.
convertStringToScan

public static Scan convertStringToScan(String base64) throws IOException

Converts the given Base64 string back into a Scan instance.

Parameters:
base64 - The scan details.
Returns:
The newly created Scan instance.
Throws:
IOException - When reading the scan instance fails.
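These two methods are inverses; this is how the init* helpers ship a Scan inside the job configuration. A quick round-trip sketch (in a context that can throw IOException):

```java
Scan scan = new Scan();
scan.setCaching(500);

// Scan -> Base64 string, suitable for storing as a Configuration value...
String encoded = TableMapReduceUtil.convertScanToString(scan);

// ...and back again on the task side.
Scan decoded = TableMapReduceUtil.convertStringToScan(encoded);
assert decoded.getCaching() == 500;
```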
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
Throws:
IOException - When determining the region count fails.
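A reducer-side sketch to pair with the earlier mapper example, assuming imports analogous to the first sketch plus org.apache.hadoop.hbase.client.Put and org.apache.hadoop.hbase.mapreduce.TableReducer; the "summary" table, the "cf:total" column, and SumReducer are hypothetical:

```java
// Hypothetical reducer: sums the mapper's counts and writes one cell per key.
static class SumReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("total"), Bytes.toBytes(sum));
    context.write(null, put); // TableOutputFormat takes the row key from the Put
  }
}

// In the driver: wires TableOutputFormat to the output table and sets the
// job's output key/value classes.
TableMapReduceUtil.initTableReducerJob("summary", SumReducer.class, job);
```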
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust.
partitioner - Partitioner to use. Pass null to use the default partitioner.
Throws:
IOException - When determining the region count fails.
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use the default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. when copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular: <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
Throws:
IOException - When determining the region count fails.
initTableReducerJob

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl, boolean addDependencyJars) throws IOException

Use this before submitting a TableReduce job. It will appropriately set up the JobConf.

Parameters:
table - The output table.
reducer - The reducer class to use.
job - The current job to adjust. Make sure the passed job is carrying all necessary HBase configuration.
partitioner - Partitioner to use. Pass null to use the default partitioner.
quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the zookeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. when copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular: <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent>, such as server,server2,server3:2181:/hbase.
serverClass - redefined hbase.regionserver.class
serverImpl - redefined hbase.regionserver.impl
addDependencyJars - upload HBase jars and jars for any of the configured job classes via the distributed cache (tmpjars).
Throws:
IOException - When determining the region count fails.
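For example, to make the reduce phase write to a remote cluster instead of the one from hbase-site.xml (the ensemble below and the SumReducer class are hypothetical):

```java
TableMapReduceUtil.initTableReducerJob("summary", SumReducer.class, job,
    null,                          // partitioner: null selects the default
    "zk1,zk2,zk3:2181:/hbase",     // <quorum>:<client port>:<znode parent>
    null, null,                    // keep default server class and impl
    true);                         // addDependencyJars
```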
limitNumReduceTasks

public static void limitNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job) throws IOException

Ensures that the given number of reduce tasks for the given job configuration does not exceed the number of regions for the given table.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
setNumReduceTasks

public static void setNumReduceTasks(String table, org.apache.hadoop.mapreduce.Job job) throws IOException

Sets the number of reduce tasks for the given job configuration to the number of regions the given table has.

Parameters:
table - The table to get the region count for.
job - The current job to adjust.
Throws:
IOException - When retrieving the table details fails.
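A sketch of both sizing strategies against a hypothetical output table named "summary":

```java
// Option 1: exactly one reducer per region of the output table.
TableMapReduceUtil.setNumReduceTasks("summary", job);

// Option 2: pick a count yourself, but never exceed the region count.
job.setNumReduceTasks(32);
TableMapReduceUtil.limitNumReduceTasks("summary", job);
```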
setScannerCaching

public static void setScannerCaching(org.apache.hadoop.mapreduce.Job job, int batchSize)

Sets the number of rows to return and cache with each scanner iteration. Higher caching values will enable faster mapreduce jobs at the expense of requiring more heap to contain the cached rows.

Parameters:
job - The current job to adjust.
batchSize - The number of rows to return in batch with each scanner iteration.
addHBaseDependencyJars

public static void addHBaseDependencyJars(org.apache.hadoop.conf.Configuration conf) throws IOException

Add HBase and its dependencies (only) to the job configuration.

This is intended as a low-level API, facilitating code reuse between this class and its mapred counterpart. It is also of use to external tools that need to build a MapReduce job that interacts with HBase but want fine-grained control over the jars shipped to the cluster.

Parameters:
conf - The Configuration object to extend with dependencies.
Throws:
IOException
buildDependencyClasspath

public static String buildDependencyClasspath(org.apache.hadoop.conf.Configuration conf)

Returns a classpath string built from the content of the "tmpjars" value in conf. Also exposed to shell scripts via `bin/hbase mapredcp`.
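These two low-level helpers compose; a sketch for a caller that wants HBase jars shipped without scanning job classes (RowCountJob is the hypothetical driver from the first sketch):

```java
Configuration conf = job.getConfiguration();

// Populate "tmpjars" with HBase and its transitive dependencies only;
// the job's own classes are deliberately not scanned.
TableMapReduceUtil.addHBaseDependencyJars(conf);

// Derive a plain classpath string from that same "tmpjars" value, e.g. for
// spawning helper processes (this is what `bin/hbase mapredcp` prints).
String classpath = TableMapReduceUtil.buildDependencyClasspath(conf);

// The job jar itself still has to be supplied, e.g. via setJarByClass.
job.setJarByClass(RowCountJob.class);
```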
addDependencyJars
public static void addDependencyJars(org.apache.hadoop.mapreduce.Job job) throws IOException

Add the HBase dependency jars as well as jars for any of the configured job classes to the job configuration, so that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
addDependencyJars

@Deprecated
public static void addDependencyJars(org.apache.hadoop.conf.Configuration conf, Class<?>... classes) throws IOException

Deprecated. Since 1.3.0 and will be removed in 3.0.0. Use addDependencyJars(Job) instead.

Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.

Throws:
IOException
addDependencyJarsForClasses
@Private
public static void addDependencyJarsForClasses(org.apache.hadoop.conf.Configuration conf, Class<?>... classes) throws IOException

Add the jars containing the given classes to the job's configuration such that JobClient will ship them to the cluster and add them to the DistributedCache.

N.B. this method at most adds one jar per class given. If there is more than one jar available containing a class with the same name as a given class, we don't define which of those jars might be chosen.

Parameters:
conf - The Hadoop Configuration to modify
classes - will add just those dependencies needed to find the given classes
Throws:
IOException - if an underlying library call fails.
findOrCreateJar

private static org.apache.hadoop.fs.Path findOrCreateJar(Class<?> my_class, org.apache.hadoop.fs.FileSystem fs, Map<String, String> packagedClasses) throws IOException

Finds the Jar for a class or creates it if it doesn't exist. If the class is in a directory in the classpath, it creates a Jar on the fly with the contents of the directory and returns the path to that Jar. If a Jar is created, it is created in the system temporary directory. Otherwise, returns an existing jar that contains a class of the same name. Maintains a mapping from jar contents to the tmp jar created.

Parameters:
my_class - the class to find.
fs - the FileSystem with which to qualify the returned path.
packagedClasses - a map of class name to path.
Returns:
a jar file that contains the class.
Throws:
IOException
updateMap

private static void updateMap(String jar, Map<String, String> packagedClasses) throws IOException

Add entries to packagedClasses corresponding to class files contained in jar.

Parameters:
jar - The jar whose content to list.
packagedClasses - map[class -> jar]
Throws:
IOException
findContainingJar

private static String findContainingJar(Class<?> my_class, Map<String, String> packagedClasses) throws IOException

Find a jar that contains a class of the same name, if any. It will return a jar file, even if that is not the first thing on the class path that has a class with the same name. Looks first on the classpath and then in the packagedClasses map.

Parameters:
my_class - the class to find.
Returns:
a jar file that contains the class, or null.
Throws:
IOException
getJar

private static String getJar(Class<?> my_class)

Invoke 'getJar' on a custom JarFinder implementation. Useful for some job configuration contexts (HBASE-8140) and also for testing on MRv2. Check if we have HADOOP-9426.

Parameters:
my_class - the class to find.
Returns:
a jar file that contains the class, or null.
getRegionCount

private static int getRegionCount(org.apache.hadoop.conf.Configuration conf, TableName tableName) throws IOException

Throws:
IOException