Class Import
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.hbase.mapreduce.Import
- All Implemented Interfaces:
  org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
@Public
public class Import
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
Import data written by Export.
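Import implements org.apache.hadoop.util.Tool, so it can be launched from the command line (hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>) or driven programmatically via ToolRunner. A minimal sketch of the programmatic route, assuming an illustrative table "myTable" and export directory "/export/myTable":

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.Import;
  import org.apache.hadoop.util.ToolRunner;

  public class ImportDriver {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Equivalent to: hbase org.apache.hadoop.hbase.mapreduce.Import myTable /export/myTable
      int exitCode = ToolRunner.run(conf, new Import(),
          new String[] { "myTable", "/export/myTable" });
      System.exit(exitCode);
    }
  }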
Nested Class Summary

Nested Classes
Modifier and Type | Class | Description
static class | Import.CellImporter | A mapper that just writes out KeyValues.
static class | Import.CellReducer |
static class | Import.CellSortImporter |
static class | Import.CellWritableComparable |
static class | Import.CellWritableComparablePartitioner |
static class | Import.Importer | Write table content out to files in hdfs.
Field Summary

Fields
Modifier and Type | Field | Description
private static final org.slf4j.Logger | LOG |
static final String | NAME |
static final String | CF_RENAME_PROP |
static final String | BULK_OUTPUT_CONF_KEY |
static final String | FILTER_CLASS_CONF_KEY |
static final String | FILTER_ARGS_CONF_KEY |
static final String | TABLE_NAME |
static final String | WAL_DURABILITY |
static final String | HAS_LARGE_RESULT |
private static final String | JOB_NAME_CONF_KEY |
Constructor Summary

Constructors
Constructor | Description
Import() |
Method Summary

Modifier and Type | Method | Description
static void | addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs) | Add a Filter to be instantiated on import
static void | configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap) | Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.
private static ExtendedCell | convertKv(ExtendedCell kv, Map<byte[],byte[]> cfRenameMap) |
private static Map<byte[],byte[]> | createCfRenameMap(org.apache.hadoop.conf.Configuration conf) |
static org.apache.hadoop.mapreduce.Job | createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args) | Sets up the actual job.
static ExtendedCell | filterKv(Filter filter, ExtendedCell c) | Attempt to filter out the keyvalue
static void | flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf) | If the durability is set to Durability.SKIP_WAL and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash.
static Filter | instantiateFilter(org.apache.hadoop.conf.Configuration conf) |
static void | main(String[] args) | Main entry point.
int | run(String[] args) |
private static ArrayList<byte[]> | toQuotedByteArrays(String... stringArgs) |
private static void | usage(String errorMsg) |

Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Details

LOG
private static final org.slf4j.Logger LOG

NAME
public static final String NAME
- See Also:
  - Constant Field Values

CF_RENAME_PROP
public static final String CF_RENAME_PROP
- See Also:
  - Constant Field Values

BULK_OUTPUT_CONF_KEY
public static final String BULK_OUTPUT_CONF_KEY
- See Also:
  - Constant Field Values

FILTER_CLASS_CONF_KEY
public static final String FILTER_CLASS_CONF_KEY
- See Also:
  - Constant Field Values

FILTER_ARGS_CONF_KEY
public static final String FILTER_ARGS_CONF_KEY
- See Also:
  - Constant Field Values

TABLE_NAME
public static final String TABLE_NAME
- See Also:
  - Constant Field Values

WAL_DURABILITY
public static final String WAL_DURABILITY
- See Also:
  - Constant Field Values

HAS_LARGE_RESULT
public static final String HAS_LARGE_RESULT
- See Also:
  - Constant Field Values

JOB_NAME_CONF_KEY
private static final String JOB_NAME_CONF_KEY
- See Also:
  - Constant Field Values
Constructor Details

Import
public Import()
Method Details

instantiateFilter
public static Filter instantiateFilter(org.apache.hadoop.conf.Configuration conf)
Create a Filter to apply to all incoming keys (KeyValues) to optionally not include in the job output
- Parameters:
  - conf - Configuration from which to load the filter
- Returns:
  - the filter to use for the task, or null if no filter should be used
- Throws:
  - IllegalArgumentException - if the filter is misconfigured
toQuotedByteArrays
private static ArrayList<byte[]> toQuotedByteArrays(String... stringArgs)
filterKv
public static ExtendedCell filterKv(Filter filter, ExtendedCell c) throws IOException
Attempt to filter out the keyvalue
- Parameters:
  - c - Cell on which to apply the filter
- Returns:
  - null if the key should not be written, otherwise returns the original Cell
- Throws:
  - IOException
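Broadly the pattern the import mapper follows; a minimal sketch, assuming an illustrative cell (KeyValue implements ExtendedCell) and whatever filter, if any, is configured on conf:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.ExtendedCell;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.KeyValue;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.mapreduce.Import;
  import org.apache.hadoop.hbase.util.Bytes;

  public class FilterKvExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Filter filter = Import.instantiateFilter(conf);  // null when no filter is configured
      ExtendedCell cell = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes("value"));
      ExtendedCell kept = Import.filterKv(filter, cell);
      if (kept != null) {
        // the cell passed the filter (or no filter was set) and may be written out
      }
    }
  }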
convertKv
private static ExtendedCell convertKv(ExtendedCell kv, Map<byte[],byte[]> cfRenameMap)
createCfRenameMap
private static Map<byte[],byte[]> createCfRenameMap(org.apache.hadoop.conf.Configuration conf)
configureCfRenaming
public static void configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap)
Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.

Alternately, instead of calling this function, you could set the configuration key CF_RENAME_PROP yourself. The value should look like srcCf1:destCf1,srcCf2:destCf2,.... This would have the same effect on the mapper behavior.
- Parameters:
  - conf - the Configuration in which the CF_RENAME_PROP key will be set
  - renameMap - a mapping from source CF names to destination CF names
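A minimal sketch, assuming the illustrative renaming of source family "cf1" to "cf2":

  import java.util.HashMap;
  import java.util.Map;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.Import;

  public class CfRenameExample {
    public static void main(String[] args) {
      Configuration conf = HBaseConfiguration.create();
      Map<String, String> renameMap = new HashMap<>();
      renameMap.put("cf1", "cf2");  // data from source family "cf1" lands in "cf2"
      Import.configureCfRenaming(conf, renameMap);
      // The equivalent direct form described above:
      // conf.set(Import.CF_RENAME_PROP, "cf1:cf2");
    }
  }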
addFilterAndArguments
public static void addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs) throws IOException
Add a Filter to be instantiated on import
- Parameters:
  - conf - Configuration to update (will be passed to the job)
  - clazz - Filter subclass to instantiate on the server.
  - filterArgs - List of arguments to pass to the filter on instantiation
- Throws:
  - IOException
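A minimal sketch pairing this method with instantiateFilter(org.apache.hadoop.conf.Configuration), the way a mapper would re-create the filter at setup time; PrefixFilter and its "row1" argument are illustrative:

  import java.util.Arrays;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.filter.Filter;
  import org.apache.hadoop.hbase.filter.PrefixFilter;
  import org.apache.hadoop.hbase.mapreduce.Import;

  public class ImportFilterExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      List<String> filterArgs = Arrays.asList("row1");  // raw arguments; quoting is handled internally
      Import.addFilterAndArguments(conf, PrefixFilter.class, filterArgs);
      // Re-create the configured filter from the configuration.
      Filter filter = Import.instantiateFilter(conf);
    }
  }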
createSubmittableJob
public static org.apache.hadoop.mapreduce.Job createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args) throws IOException
Sets up the actual job.
- Parameters:
  - conf - The current configuration.
  - args - The command line parameters.
- Returns:
  - The newly created job.
- Throws:
  - IOException - When setting up the job fails.
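A minimal sketch, assuming the illustrative <tablename> <inputdir> pair "myTable" and "/export/myTable", in the same order as on the command line:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.mapreduce.Import;
  import org.apache.hadoop.mapreduce.Job;

  public class SubmitImportJob {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      Job job = Import.createSubmittableJob(conf,
          new String[] { "myTable", "/export/myTable" });
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }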
usage
private static void usage(String errorMsg)
flushRegionsIfNecessary
public static void flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException
If the durability is set to Durability.SKIP_WAL and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash. This method flushes all the regions of the table in the scenario of an import of data to hbase with Durability.SKIP_WAL.
- Throws:
  - IOException
  - InterruptedException
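A minimal sketch of an import that skips the WAL and therefore flushes afterwards, broadly mirroring the flow on a successful run; table and directory names are illustrative:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Durability;
  import org.apache.hadoop.hbase.mapreduce.Import;
  import org.apache.hadoop.mapreduce.Job;

  public class SkipWalImport {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      conf.set(Import.WAL_DURABILITY, Durability.SKIP_WAL.name());  // write to memstore only
      Job job = Import.createSubmittableJob(conf,
          new String[] { "myTable", "/export/myTable" });
      if (job.waitForCompletion(true)) {
        // A no-op unless SKIP_WAL was configured and the job wrote directly to the table.
        Import.flushRegionsIfNecessary(conf);
      }
    }
  }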
run
public int run(String[] args) throws Exception
- Specified by:
  - run in interface org.apache.hadoop.util.Tool
- Throws:
  - Exception
main
public static void main(String[] args) throws Exception
Main entry point.
- Parameters:
  - args - The command line parameters.
- Throws:
  - Exception - When running the job fails.