Class Import

java.lang.Object
  org.apache.hadoop.conf.Configured
    org.apache.hadoop.hbase.mapreduce.Import

All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

@Public
public class Import
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool

Import data written by Export.
Nested Class Summary

Modifier and Type    Class    Description
static class                  A mapper that just writes out KeyValues.
static class
static class
static class
static class
static class                  Write table content out to files in hdfs.
Field Summary

Constructor Summary

Method Summary
Modifier and Type                        Method and Description
static void                              addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs)
                                         Add a Filter to be instantiated on import.
static void                              configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap)
                                         Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.
private static ExtendedCell              convertKv(ExtendedCell kv, Map<byte[], byte[]> cfRenameMap)
private static Map<byte[], byte[]>       createCfRenameMap(org.apache.hadoop.conf.Configuration conf)
static org.apache.hadoop.mapreduce.Job   createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args)
                                         Sets up the actual job.
static ExtendedCell                      filterKv(Filter filter, ExtendedCell c)
                                         Attempt to filter out the keyvalue.
static void                              flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf)
                                         If the durability is set to Durability.SKIP_WAL and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash.
static Filter                            instantiateFilter(org.apache.hadoop.conf.Configuration conf)
static void                              main(String[] args)
                                         Main entry point.
int                                      run(String[] args)
private static ArrayList<byte[]>         toQuotedByteArrays(String... stringArgs)
private static void                      usage(...)
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
Field Details

LOG
NAME
CF_RENAME_PROP
BULK_OUTPUT_CONF_KEY
FILTER_CLASS_CONF_KEY
FILTER_ARGS_CONF_KEY
TABLE_NAME
WAL_DURABILITY
HAS_LARGE_RESULT
JOB_NAME_CONF_KEY
Constructor Details

Import
public Import()
Method Details
-
instantiateFilter
public static Filter instantiateFilter(org.apache.hadoop.conf.Configuration conf)
Create a Filter to apply to all incoming keys (KeyValues), optionally excluding them from the job output.
Parameters:
conf - Configuration from which to load the filter
Returns:
the filter to use for the task, or null if no filter should be used
Throws:
IllegalArgumentException - if the filter is misconfigured
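The load-a-filter-from-configuration pattern described above can be sketched in self-contained Java. Everything here (RowFilter, PrefixAccept, the import.filter.class key) is an illustrative stand-in, not the HBase API; the point is the reflective instantiation and the null-when-unconfigured contract.

```java
import java.util.Map;

public class FilterFromConf {
    /** Illustrative stand-in for an HBase Filter: decides whether a row is kept. */
    public interface RowFilter {
        boolean accept(String row);
    }

    /** Example filter implementation: keeps rows starting with a fixed prefix. */
    public static class PrefixAccept implements RowFilter {
        @Override
        public boolean accept(String row) { return row.startsWith("row-"); }
    }

    /**
     * Mimics instantiateFilter: look up a class name under a (hypothetical)
     * config key, build the filter reflectively, and return null when no
     * filter is configured.
     */
    public static RowFilter instantiateFilter(Map<String, String> conf) {
        String className = conf.get("import.filter.class"); // hypothetical key name
        if (className == null) {
            return null; // no filter configured: every cell is kept
        }
        try {
            return (RowFilter) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Matches the documented contract: a misconfigured filter is an error.
            throw new IllegalArgumentException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        RowFilter f = instantiateFilter(
            Map.of("import.filter.class", PrefixAccept.class.getName()));
        System.out.println(f.accept("row-1"));   // true
        System.out.println(f.accept("meta-1"));  // false
        System.out.println(instantiateFilter(Map.of()) == null); // true
    }
}
```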
-
toQuotedByteArrays
private static ArrayList<byte[]> toQuotedByteArrays(String... stringArgs)
-
filterKv
Attempt to filter out the keyvalue- Parameters:
c
-Cell
on which to apply the filter- Returns:
- null if the key should not be written, otherwise returns the original
Cell
- Throws:
IOException
-
convertKv
private static ExtendedCell convertKv(ExtendedCell kv, Map<byte[], byte[]> cfRenameMap)
-
createCfRenameMap
private static Map<byte[], byte[]> createCfRenameMap(org.apache.hadoop.conf.Configuration conf)
-
configureCfRenaming
public static void configureCfRenaming(org.apache.hadoop.conf.Configuration conf, Map<String, String> renameMap)
Sets a configuration property with key CF_RENAME_PROP in conf that tells the mapper how to rename column families.

Alternately, instead of calling this function, you could set the configuration key CF_RENAME_PROP yourself. The value should look like srcCf1:destCf1,srcCf2:destCf2,.... This would have the same effect on the mapper behavior.

Parameters:
conf - the Configuration in which the CF_RENAME_PROP key will be set
renameMap - a mapping from source CF names to destination CF names
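The CF_RENAME_PROP value format above can be sketched as a small encode/decode round trip. This is plain Java over String maps for illustration (the real mapper works with a byte[]-keyed map and the Hadoop Configuration API); CfRenaming and its method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CfRenaming {
    /** Encode a rename map in the CF_RENAME_PROP format: "srcCf1:destCf1,srcCf2:destCf2,...". */
    public static String encode(Map<String, String> renameMap) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : renameMap.entrySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Decode the property value back into a source -> destination map. */
    public static Map<String, String> decode(String prop) {
        Map<String, String> map = new LinkedHashMap<>();
        if (prop == null || prop.isEmpty()) return map;
        for (String pair : prop.split(",")) {
            String[] parts = pair.split(":", 2); // split on the first ':' only
            map.put(parts[0], parts[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> renames = new LinkedHashMap<>();
        renames.put("cf_old", "cf_new");
        renames.put("a", "b");
        String prop = encode(renames);
        System.out.println(prop);          // cf_old:cf_new,a:b
        System.out.println(decode(prop));  // {cf_old=cf_new, a=b}
    }
}
```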
-
addFilterAndArguments
public static void addFilterAndArguments(org.apache.hadoop.conf.Configuration conf, Class<? extends Filter> clazz, List<String> filterArgs) throws IOException
Add a Filter to be instantiated on import.
Parameters:
conf - Configuration to update (will be passed to the job)
clazz - Filter subclass to instantiate on the server
filterArgs - List of arguments to pass to the filter on instantiation
Throws:
IOException
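On the command line, the same effect is typically achieved with Hadoop -D properties rather than by calling this method directly. A hypothetical invocation follows: the real property names are the values of the FILTER_CLASS_CONF_KEY and FILTER_ARGS_CONF_KEY constants listed above (shown here as placeholders), and the table name and path are examples.

```shell
# Placeholders <...> stand for the actual key strings defined by this class.
hbase org.apache.hadoop.hbase.mapreduce.Import \
  -D<FILTER_CLASS_CONF_KEY>=org.apache.hadoop.hbase.filter.PrefixFilter \
  -D<FILTER_ARGS_CONF_KEY>=myprefix \
  mytable /import/input
```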
-
createSubmittableJob
public static org.apache.hadoop.mapreduce.Job createSubmittableJob(org.apache.hadoop.conf.Configuration conf, String[] args) throws IOException
Sets up the actual job.
Parameters:
conf - The current configuration.
args - The command line parameters.
Returns:
The newly created job.
Throws:
IOException - When setting up the job fails.
-
usage
-
flushRegionsIfNecessary
public static void flushRegionsIfNecessary(org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException If the durability is set toDurability.SKIP_WAL
and the data is imported to hbase, we need to flush all the regions of the table as the data is held in memory and is also not present in the Write Ahead Log to replay in scenarios of a crash. This method flushes all the regions of the table in the scenarios of import data to hbase withDurability.SKIP_WAL
- Throws:
IOException
InterruptedException
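In practice this amounts to a table-wide flush once the import job finishes. The equivalent manual step from the HBase shell (table name is an example) would be:

```shell
# Persist memstore-only data to HFiles after a SKIP_WAL import.
echo "flush 'mytable'" | hbase shell
```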
-
run
public int run(String[] args) throws Exception
Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
Exception
-
main
public static void main(String[] args) throws Exception
Main entry point.
Parameters:
args - The command line parameters.
Throws:
Exception - When running the job fails.
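A typical command-line run, assuming data previously written by Export (table name and input directory are examples):

```shell
hbase org.apache.hadoop.hbase.mapreduce.Import mytable /export/output
```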
-