Class Import

java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.hadoop.hbase.mapreduce.Import

All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

@Public
public class Import
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

Import data written by Export.
Nested Class Summary

Modifier and Type    Class    Description
static class                  A mapper that just writes out KeyValues.
static class
static class
static class
static class
static class                  Write table content out to files in hdfs.
Field Summary

Constructor Summary

Method Summary
Modifier and Type                        Method and Description
static void                              addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs)
                                         Add a Filter to be instantiated on import.
static void                              configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap)
                                         Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.
private static ExtendedCell              convertKv(ExtendedCell kv, Map<byte[], byte[]> cfRenameMap)
private static Map<byte[], byte[]>       createCfRenameMap(org.apache.hadoop.conf.Configuration conf)
static org.apache.hadoop.mapreduce.Job   createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args)
                                         Sets up the actual job.
static ExtendedCell                      filterKv(Filter filter, ExtendedCell c)
                                         Attempt to filter out the keyvalue.
static void                              flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf)
                                         If the durability is set to Durability.SKIP_WAL and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash.
static Filter                            instantiateFilter(org.apache.hadoop.conf.Configuration conf)
static void                              main(String[] args)
                                         Main entry point.
int                                      run(String[] args)
private static ArrayList<byte[]>         toQuotedByteArrays(String... stringArgs)
private static void                      usage(...)
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Details

LOG
NAME
CF_RENAME_PROP
BULK_OUTPUT_CONF_KEY
FILTER_CLASS_CONF_KEY
FILTER_ARGS_CONF_KEY
TABLE_NAME
WAL_DURABILITY
HAS_LARGE_RESULT
JOB_NAME_CONF_KEY
Constructor Details

Import
public Import()
Method Details
-
instantiateFilter
public static Filter instantiateFilter(org.apache.hadoop.conf.Configuration conf)
Create a Filter to apply to all incoming keys (KeyValues), optionally excluding them from the job output.
Parameters:
conf - Configuration from which to load the filter
Returns:
the filter to use for the task, or null if no filter should be used
Throws:
IllegalArgumentException - if the filter is misconfigured
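The load-a-filter-from-configuration pattern described above can be sketched in self-contained Java. Everything here (RowFilter, PrefixAccept, the import.filter.class key) is an illustrative stand-in, not the HBase API; the point is the reflective instantiation and the null-when-unconfigured contract.

```java
import java.util.Map;

public class FilterFromConf {
    /** Illustrative stand-in for an HBase Filter: decides whether a row is kept. */
    public interface RowFilter {
        boolean accept(String row);
    }

    /** Example filter implementation: keeps rows starting with a fixed prefix. */
    public static class PrefixAccept implements RowFilter {
        @Override
        public boolean accept(String row) { return row.startsWith("row-"); }
    }

    /**
     * Mimics instantiateFilter: look up a class name under a (hypothetical)
     * config key, build the filter reflectively, and return null when no
     * filter is configured.
     */
    public static RowFilter instantiateFilter(Map<String, String> conf) {
        String className = conf.get("import.filter.class"); // hypothetical key name
        if (className == null) {
            return null; // no filter configured: every cell is kept
        }
        try {
            return (RowFilter) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Matches the documented contract: a misconfigured filter is an error.
            throw new IllegalArgumentException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        RowFilter f = instantiateFilter(
            Map.of("import.filter.class", PrefixAccept.class.getName()));
        System.out.println(f.accept("row-1"));   // true
        System.out.println(f.accept("meta-1"));  // false
        System.out.println(instantiateFilter(Map.of()) == null); // true
    }
}
```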
-
toQuotedByteArrays
private static ArrayList<byte[]> toQuotedByteArrays(String... stringArgs)
-
filterKv
Attempt to filter out the keyvalue- Parameters:
c
-Cell
on which to apply the filter- Returns:
- null if the key should not be written, otherwise returns the original
Cell
- Throws:
IOException
-
convertKv
private static ExtendedCell convertKv(ExtendedCell kv, Map<byte[], byte[]> cfRenameMap)
-
createCfRenameMap
private static Map<byte[], byte[]> createCfRenameMap(org.apache.hadoop.conf.Configuration conf)
-
configureCfRenaming
public static void configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap)
Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.

Alternately, instead of calling this function, you could set the configuration key CF_RENAME_PROP yourself. The value should look like srcCf1:destCf1,srcCf2:destCf2,.... This would have the same effect on the mapper behavior.

Parameters:
conf - the Configuration in which the CF_RENAME_PROP key will be set
renameMap - a mapping from source CF names to destination CF names
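The CF_RENAME_PROP value format above can be sketched as a small encode/decode round trip. This is plain Java over String maps for illustration (the real mapper works with a byte[]-keyed map and the Hadoop Configuration API); CfRenaming and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CfRenaming {
    /** Encode a rename map in the CF_RENAME_PROP format: "srcCf1:destCf1,srcCf2:destCf2,...". */
    public static String encode(Map<String, String> renameMap) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : renameMap.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Decode the property value back into a source -> destination map. */
    public static Map<String, String> decode(String prop) {
        Map<String, String> map = new LinkedHashMap<>();
        if (prop == null || prop.isEmpty()) return map;
        for (String pair : prop.split(",")) {
            String[] parts = pair.split(":", 2); // split on the first ':' only
            map.put(parts[0], parts[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("cf_old", "cf_new");
        renames.put("a", "b");
        String prop = encode(renames);
        System.out.println(prop);          // cf_old:cf_new,a:b
        System.out.println(decode(prop));  // {cf_old=cf_new, a=b}
    }
}
```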
-
addFilterAndArguments
public static void addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs) throws IOException
Add a Filter to be instantiated on import.
Parameters:
conf - Configuration to update (will be passed to the job)
clazz - Filter subclass to instantiate on the server
filterArgs - List of arguments to pass to the filter on instantiation
Throws:
IOException
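On the command line, the same effect is typically achieved with Hadoop -D properties rather than by calling this method directly. A hypothetical invocation follows: the real property names are the values of the FILTER_CLASS_CONF_KEY and FILTER_ARGS_CONF_KEY constants listed above (shown here as placeholders), and the table name and path are examples.

```shell
# Placeholders <...> stand for the actual key strings defined by this class.
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -D<FILTER_CLASS_CONF_KEY>=org.apache.hadoop.hbase.filter.PrefixFilter \
  -D<FILTER_ARGS_CONF_KEY>=myprefix \
  mytable /import/input
```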
-
createSubmittableJob
public static org.apache.hadoop.mapreduce.Job createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args) throws IOException
Sets up the actual job.
Parameters:
conf - The current configuration.
args - The command line parameters.
Returns:
The newly created job.
Throws:
IOException - When setting up the job fails.
-
usage
-
flushRegionsIfNecessary
public static void flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException If the durability is set toDurability.SKIP_WAL
and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash. This method flushes all the regions of the table in the scenarios of import data to hbase withDurability.SKIP_WAL
- Throws:
IOException
InterruptedException
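In practice this amounts to a table-wide flush once the import job finishes. The equivalent manual step from the HBase shell (table name is an example) would be:

```shell
# Persist memstore-only data to HFiles after a SKIP_WAL import.
echo "flush 'mytable'" | hbase shell
```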
-
run
public int run(String[] args) throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception
-
main
public static void main(String[] args) throws Exception
Main entry point.
Parameters:
args - The command line parameters.
Throws:
Exception - When running the job fails.
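A typical command-line run, assuming data previously written by Export (table name and input directory are examples):

```shell
hbase org.apache.hadoop.hbase.mapreduce.Import mytable /export/output
```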
-