org.apache.hadoop.mapreduce.OutputFormat<K,V>

org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>

org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2

Direct Known Subclasses: MultiTableHFileOutputFormat

@Public public class HFileOutputFormat2 extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>

Writes HFiles. Cells passed to the writer must arrive in sorted order. The current time is written as the sequence id of each file, and the major-compacted attribute is set on every HFile created. Calling write(null, null) forcibly rolls all HFiles currently being written.
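The ordering rule above refers to HBase key order, in which row keys compare as unsigned bytes, lexicographically. The following is a simplified, JDK-only check (Java 9+ for Arrays.compareUnsigned) that looks at row keys only; real cell ordering also takes family, qualifier, timestamp and type into account. RowOrderCheck and its methods are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified model of the "cells must arrive in order" rule, checking row keys only.
 * Row keys are compared as unsigned bytes, lexicographically.
 */
final class RowOrderCheck {
  private RowOrderCheck() {}

  /** Returns the index of the first row that sorts before its predecessor, or -1 if sorted. */
  static int firstOutOfOrder(List<byte[]> rows) {
    for (int i = 1; i < rows.size(); i++) {
      // Arrays.compareUnsigned (Java 9+) treats each byte as a value in 0..255.
      // Equal consecutive rows are allowed (several cells can share a row).
      if (Arrays.compareUnsigned(rows.get(i - 1), rows.get(i)) > 0) {
        return i;
      }
    }
    return -1;
  }

  /** Convenience helper: encodes a string row key as UTF-8 bytes. */
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

Comparing bytes as unsigned matters: a signed comparison would place 0xFF before 0x01 and reject input that HBase considers sorted.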

Using this class as part of a MapReduce job is best done using configureIncrementalLoad(Job, TableDescriptor, RegionLocator).

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

(package private) static class

HFileOutputFormat2.TableInfo

(package private) static class

HFileOutputFormat2.WriterLength

Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter

Field Summary

Fields

Modifier and Type

Field

Description

(package private) static final String

BLOCK_SIZE_FAMILIES_CONF_KEY

(package private) static Function<ColumnFamilyDescriptor,String>

blockSizeDetails

Serialize column family to block size map to configuration.

(package private) static final String

BLOOM_PARAM_FAMILIES_CONF_KEY

(package private) static final String

BLOOM_TYPE_FAMILIES_CONF_KEY

(package private) static Function<ColumnFamilyDescriptor,String>

bloomParamDetails

Serialize column family to bloom param map to configuration.

(package private) static Function<ColumnFamilyDescriptor,String>

bloomTypeDetails

Serialize column family to bloom type map to configuration.

(package private) static final String

COMPRESSION_FAMILIES_CONF_KEY

static final String

COMPRESSION_OVERRIDE_CONF_KEY

(package private) static Function<ColumnFamilyDescriptor,String>

compressionDetails

Serialize column family to compression algorithm map to configuration.

(package private) static final String

DATABLOCK_ENCODING_FAMILIES_CONF_KEY

static final String

DATABLOCK_ENCODING_OVERRIDE_CONF_KEY

(package private) static Function<ColumnFamilyDescriptor,String>

dataBlockEncodingDetails

Serialize column family to data block encoding map to configuration.

private static final boolean

DEFAULT_LOCALITY_SENSITIVE

(package private) static final boolean

EXTENDED_CELL_SERIALIZATION_ENABLED_DEFULT

static final String

EXTENDED_CELL_SERIALIZATION_ENABLED_KEY

ExtendedCell and ExtendedCellSerialization are InterfaceAudience.Private.

static final String

LOCALITY_SENSITIVE_CONF_KEY

Keep locality while generating HFiles for bulkload.

private static final org.slf4j.Logger

LOG

(package private) static final String

MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY

(package private) static final String

OUTPUT_TABLE_NAME_CONF_KEY

static final String

REMOTE_CLUSTER_CONF_PREFIX

static final String

REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY

static final String

REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY

static final String

REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY

static final String

STORAGE_POLICY_PROPERTY

static final String

STORAGE_POLICY_PROPERTY_CF_PREFIX

protected static final byte[]

tableSeparator

Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, COMPRESS, COMPRESS_CODEC, COMPRESS_TYPE, OUTDIR, PART

Constructor Summary

Constructors

Constructor

Description

HFileOutputFormat2()

Method Summary

Modifier and Type

Method

Description

protected static byte[]

combineTableNameSuffix(byte[] tableName, byte[] suffix)

(package private) static void

configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls)

static void

configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator)

Configure a MapReduce Job to perform an incremental load into the given table.

static void

configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator)

Configure a MapReduce Job to perform an incremental load into the given table.

static void

configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor)

(package private) static void

configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables)

Configure job with a TotalOrderPartitioner, partitioning against splitPoints.

static void

configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf)

Configure the HBase cluster key of a remote cluster from which region locations are loaded for locality-sensitive HFile generation, if locality sensitivity is enabled.

(package private) static void

configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath)

Configure block storage policy for CF after the directory is created.

(package private) static Map<byte[],Integer>

createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)

Runs inside the task to deserialize column family to block size map from the configuration.

(package private) static Map<byte[],String>

createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf)

Runs inside the task to deserialize column family to bloom filter param map from the configuration.

(package private) static Map<byte[],BloomType>

createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)

Runs inside the task to deserialize column family to bloom filter type map from the configuration.

(package private) static Map<byte[],Compression.Algorithm>

createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)

Runs inside the task to deserialize column family to compression algorithm map from the configuration.

private static Map<byte[],String>

createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)

Runs inside the task to deserialize column family to given conf value map.

(package private) static Map<byte[],DataBlockEncoding>

createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)

Runs inside the task to deserialize column family to data block encoding type map from the configuration.

(package private) static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V>

createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer)

org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell>

getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)

private static List<ImmutableBytesWritable>

getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables)

Return the start keys of all regions of the given tables, as a list of ImmutableBytesWritable.

protected static byte[]

getTableNameSuffixedWithFamily(byte[] tableName, byte[] family)

private static void

mergeSerializations(org.apache.hadoop.conf.Configuration conf)

(package private) static String

serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables)

private static void

writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables)

Write out a SequenceFile, readable by TotalOrderPartitioner, that contains the split points in startKeys.

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOG
  
  private static final org.slf4j.Logger LOG
- tableSeparator
  
  protected static final byte[] tableSeparator
- COMPRESSION_FAMILIES_CONF_KEY
  
  static final String COMPRESSION_FAMILIES_CONF_KEY
  See Also:
  
  Constant Field Values
- BLOOM_TYPE_FAMILIES_CONF_KEY
  
  static final String BLOOM_TYPE_FAMILIES_CONF_KEY
  See Also:
  
  Constant Field Values
- BLOOM_PARAM_FAMILIES_CONF_KEY
  
  static final String BLOOM_PARAM_FAMILIES_CONF_KEY
  See Also:
  
  Constant Field Values
- BLOCK_SIZE_FAMILIES_CONF_KEY
  
  static final String BLOCK_SIZE_FAMILIES_CONF_KEY
  See Also:
  
  Constant Field Values
- DATABLOCK_ENCODING_FAMILIES_CONF_KEY
  
  static final String DATABLOCK_ENCODING_FAMILIES_CONF_KEY
  See Also:
  
  Constant Field Values
- DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
  
  public static final String DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
  See Also:
  
  Constant Field Values
- COMPRESSION_OVERRIDE_CONF_KEY
  
  public static final String COMPRESSION_OVERRIDE_CONF_KEY
  See Also:
  
  Constant Field Values
- LOCALITY_SENSITIVE_CONF_KEY
  
  public static final String LOCALITY_SENSITIVE_CONF_KEY
  
  Keep locality while generating HFiles for bulkload. See HBASE-12596
  See Also:
  
  Constant Field Values
- DEFAULT_LOCALITY_SENSITIVE
  
  private static final boolean DEFAULT_LOCALITY_SENSITIVE
  See Also:
  
  Constant Field Values
- OUTPUT_TABLE_NAME_CONF_KEY
  
  static final String OUTPUT_TABLE_NAME_CONF_KEY
  See Also:
  
  Constant Field Values
- MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY
  
  static final String MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY
  See Also:
  
  Constant Field Values
- EXTENDED_CELL_SERIALIZATION_ENABLED_KEY
  
  @Private public static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY
  
  ExtendedCell and ExtendedCellSerialization are InterfaceAudience.Private. We expose this config for internal use in jobs such as WALPlayer, which need features of ExtendedCell.
  See Also:
  
  Constant Field Values
- EXTENDED_CELL_SERIALIZATION_ENABLED_DEFULT
  
  static final boolean EXTENDED_CELL_SERIALIZATION_ENABLED_DEFULT
  See Also:
  
  Constant Field Values
- REMOTE_CLUSTER_CONF_PREFIX
  
  public static final String REMOTE_CLUSTER_CONF_PREFIX
  See Also:
  
  Constant Field Values
- REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
  
  public static final String REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
  See Also:
  
  Constant Field Values
- REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
  
  public static final String REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
  See Also:
  
  Constant Field Values
- REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
  
  public static final String REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
  See Also:
  
  Constant Field Values
- STORAGE_POLICY_PROPERTY
  
  public static final String STORAGE_POLICY_PROPERTY
  See Also:
  
  Constant Field Values
- STORAGE_POLICY_PROPERTY_CF_PREFIX
  
  public static final String STORAGE_POLICY_PROPERTY_CF_PREFIX
  See Also:
  
  Constant Field Values
- compressionDetails
  
  @Private static Function<ColumnFamilyDescriptor,String> compressionDetails
  
  Serialize column family to compression algorithm map to configuration. Invoked while configuring the MR job for incremental load.
- blockSizeDetails
  
  @Private static Function<ColumnFamilyDescriptor,String> blockSizeDetails
  
  Serialize column family to block size map to configuration. Invoked while configuring the MR job for incremental load.
- bloomTypeDetails
  
  @Private static Function<ColumnFamilyDescriptor,String> bloomTypeDetails
  
  Serialize column family to bloom type map to configuration. Invoked while configuring the MR job for incremental load.
- bloomParamDetails
  
  @Private static Function<ColumnFamilyDescriptor,String> bloomParamDetails
  
  Serialize column family to bloom param map to configuration. Invoked while configuring the MR job for incremental load.
- dataBlockEncodingDetails
  
  @Private static Function<ColumnFamilyDescriptor,String> dataBlockEncodingDetails
  
  Serialize column family to data block encoding map to configuration. Invoked while configuring the MR job for incremental load.
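The *Details functions above each reduce a ColumnFamilyDescriptor to one string; serializeColumnFamilyAttribute packs those strings into a single configuration value, and the createFamily*Map methods unpack it inside each task. The following JDK-only sketch (Java 10+) models that round trip with plain String maps. The URL-encoded name=value pairs joined by '&' are an assumption about the format made for illustration, and FamilyConfCodec is a hypothetical name, not part of HBase.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of packing a per-family attribute map into one configuration string and
 * unpacking it inside a task. The "name=value&name=value" format is assumed.
 */
final class FamilyConfCodec {
  private FamilyConfCodec() {}

  /** URL-encodes both sides of each pair so '&' and '=' inside names or values stay unambiguous. */
  static String serialize(Map<String, String> familyToValue) {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<String, String> e : familyToValue.entrySet()) {
      if (sb.length() > 0) {
        sb.append('&');
      }
      sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
          .append('=')
          .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
    }
    return sb.toString();
  }

  /** Parses the packed string back into a sorted map, skipping empty or malformed entries. */
  static Map<String, String> deserialize(String packed) {
    Map<String, String> result = new TreeMap<>();
    for (String pair : packed.split("&")) {
      String[] parts = pair.split("=");
      if (parts.length != 2) {
        continue; // empty input, empty value, or malformed entry
      }
      result.put(URLDecoder.decode(parts[0], StandardCharsets.UTF_8),
          URLDecoder.decode(parts[1], StandardCharsets.UTF_8));
    }
    return result;
  }
}
```

For example, a sorted map {"cf 2" -> "NONE", "cf1" -> "GZ"} packs to cf+2=NONE&cf1=GZ, and deserializing that string returns an equal map.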
Constructor Details
- HFileOutputFormat2
  
  public HFileOutputFormat2()
Method Details
- combineTableNameSuffix
  
  protected static byte[] combineTableNameSuffix(byte[] tableName, byte[] suffix)
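  combineTableNameSuffix joins a table name and a suffix (such as a column family) into one byte array, presumably so that per-family settings from several tables can share one map. A minimal sketch, assuming the separator is the single byte ';' (verify against tableSeparator in your HBase version); TableFamilyKey is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of joining a table name and a suffix with a separator byte. */
final class TableFamilyKey {
  private TableFamilyKey() {}

  // Assumed separator for this sketch, not read from HBase.
  static final byte[] SEPARATOR = ";".getBytes(StandardCharsets.UTF_8);

  /** Returns tableName + SEPARATOR + suffix as a new array. */
  static byte[] combine(byte[] tableName, byte[] suffix) {
    byte[] out = new byte[tableName.length + SEPARATOR.length + suffix.length];
    System.arraycopy(tableName, 0, out, 0, tableName.length);
    System.arraycopy(SEPARATOR, 0, out, tableName.length, SEPARATOR.length);
    System.arraycopy(suffix, 0, out, tableName.length + SEPARATOR.length, suffix.length);
    return out;
  }
}
```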
- getRecordWriter
  
  public org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
  
  Specified by:
  
  getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
  
  Throws:
  
  IOException
  
  InterruptedException
- getTableNameSuffixedWithFamily
  
  protected static byte[] getTableNameSuffixedWithFamily(byte[] tableName, byte[] family)
- createRecordWriter
  
  static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer) throws IOException
  
  Throws:
  
  IOException
- configureStoragePolicy
  
  static void configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath)
  
  Configure block storage policy for CF after the directory is created.
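  One plausible reading of how the policy is chosen, based on STORAGE_POLICY_PROPERTY and STORAGE_POLICY_PROPERTY_CF_PREFIX: a per-family key (the prefix plus the table-and-family name) takes precedence, and the table-wide property is the fallback. The sketch models the Configuration as a plain Map; StoragePolicyLookup and the key names used with it are hypothetical stand-ins, not the real constant values.

```java
import java.util.Map;

/** Sketch: resolve a storage policy, preferring a per-family key over a table-wide default. */
final class StoragePolicyLookup {
  private StoragePolicyLookup() {}

  /**
   * Returns the value of cfPrefix + tableAndFamily if set, otherwise the value of defaultKey,
   * or null when neither is set (meaning no policy would be applied).
   */
  static String resolve(Map<String, String> conf, String defaultKey, String cfPrefix,
      String tableAndFamily) {
    String perFamily = conf.get(cfPrefix + tableAndFamily);
    return perFamily != null ? perFamily : conf.get(defaultKey);
  }
}
```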
- getRegionStartKeys
  
  private static List<ImmutableBytesWritable> getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables) throws IOException
  
  Return the start keys of all regions of the given tables, as a list of ImmutableBytesWritable.
  
  Throws:
  
  IOException
- writePartitions
  
  private static void writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables) throws IOException
  
  Write out a SequenceFile, readable by TotalOrderPartitioner, that contains the split points in startKeys.
  
  Throws:
  
  IOException
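  The split points written here come from region start keys. The JDK-only model below, with hypothetical names, shows the derivation assumed here: the first region of a table starts at the empty key and no row can sort before it, so that key is dropped, and the remaining N - 1 start keys separate N partitions, one per region (and therefore one per reducer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Sketch: turn region start keys into split points for a total-order partitioner. */
final class SplitPoints {
  private SplitPoints() {}

  static List<byte[]> fromStartKeys(List<byte[]> startKeys) {
    if (startKeys.isEmpty()) {
      throw new IllegalArgumentException("No regions passed");
    }
    // Sort (and de-duplicate) start keys in unsigned lexicographic order.
    TreeSet<byte[]> sorted = new TreeSet<byte[]>((a, b) -> Arrays.compareUnsigned(a, b));
    sorted.addAll(startKeys);
    byte[] first = sorted.pollFirst();
    if (first.length != 0) {
      throw new IllegalArgumentException("First region should have an empty start key");
    }
    // Every remaining start key is a boundary between two partitions.
    return new ArrayList<>(sorted);
  }
}
```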
- configureIncrementalLoad
  
  public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator) throws IOException
  Configure a MapReduce Job to perform an incremental load into the given table. This method:
  
  - Inspects the table to configure a total order partitioner
  - Uploads the partitions file to the cluster and adds it to the DistributedCache
  - Sets the number of reduce tasks to match the current number of regions
  - Sets the output key/value class to match HFileOutputFormat2's requirements
  - Sets up the reducer to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)
  - Sets the HBase cluster key used to load region locations for locality-sensitive HFile generation
  
  The user should be sure to set the map output value class to either KeyValue or Put before calling this method.
  Throws:
  
  IOException
- configureIncrementalLoad
  
  public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator) throws IOException
  Configure a MapReduce Job to perform an incremental load into the given table. This method:
  
  - Inspects the table to configure a total order partitioner
  - Uploads the partitions file to the cluster and adds it to the DistributedCache
  - Sets the number of reduce tasks to match the current number of regions
  - Sets the output key/value class to match HFileOutputFormat2's requirements
  - Sets up the reducer to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)
  
  The user should be sure to set the map output value class to either KeyValue or Put before calling this method.
  Throws:
  
  IOException
- configureIncrementalLoad
  
  static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls) throws IOException
  
  Throws:
  
  IOException
- mergeSerializations
  
  private static void mergeSerializations(org.apache.hadoop.conf.Configuration conf)
- configureIncrementalLoadMap
  
  public static void configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor) throws IOException
  
  Throws:
  
  IOException
- configureRemoteCluster
  
  public static void configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf)
  
  Configure the HBase cluster key of a remote cluster from which region locations are loaded for locality-sensitive HFile generation, if locality sensitivity is enabled. You do not need to call this method explicitly when the job configuration already holds the cluster key of the HBase cluster whose region locations should be used. Call it when the job configuration holds a different cluster's key: for example, when you read data from HBase cluster A with TableInputFormat and generate HFiles for HBase cluster B. Otherwise, HFileOutputFormat2 fetches region locations from cluster A and locality-sensitive generation does not work correctly. configureIncrementalLoad(Job, Table, RegionLocator) calls this method with Table.getConfiguration() as clusterConf. See HBASE-25608.
  Parameters:
  
  job - which has configuration to be updated
  
  clusterConf - which contains cluster key of the HBase cluster to be locality-sensitive
  
  See Also:
  
  configureIncrementalLoad(Job, Table, RegionLocator)
  
  LOCALITY_SENSITIVE_CONF_KEY
  
  REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
  
  REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
  
  REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
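  The behavior described above amounts to copying the other cluster's ZooKeeper settings into the job configuration under a dedicated prefix, and only when locality sensitivity is enabled. The JDK-only sketch below models both configurations as Maps. The ZooKeeper key names in the usage are standard HBase client keys, but the "remote." prefix and the RemoteClusterKeys class are hypothetical, and this sketch simply skips keys that the source configuration does not set.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: record another cluster's connection settings in a job config under a prefix. */
final class RemoteClusterKeys {
  private RemoteClusterKeys() {}

  static Map<String, String> withRemoteCluster(Map<String, String> jobConf,
      Map<String, String> clusterConf, boolean localitySensitive, String prefix,
      List<String> keys) {
    Map<String, String> updated = new HashMap<>(jobConf);
    if (!localitySensitive) {
      return updated; // nothing to record when locality-sensitive generation is off
    }
    for (String key : keys) {
      String value = clusterConf.get(key);
      if (value != null) {
        updated.put(prefix + key, value); // e.g. "remote.hbase.zookeeper.quorum"
      }
    }
    return updated;
  }
}
```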
- createFamilyCompressionMap
  
  @Private static Map<byte[],Compression.Algorithm> createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)
  
  Runs inside the task to deserialize column family to compression algorithm map from the configuration.
  
  Parameters:
  
  conf - to read the serialized values from
  
  Returns:
  
  a map from column family to the configured compression algorithm
- createFamilyBloomTypeMap
  
  @Private static Map<byte[],BloomType> createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)
  
  Runs inside the task to deserialize column family to bloom filter type map from the configuration.
  
  Parameters:
  
  conf - to read the serialized values from
  
  Returns:
  
  a map from column family to the configured bloom filter type
- createFamilyBloomParamMap
  
  @Private static Map<byte[],String> createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf)
  
  Runs inside the task to deserialize column family to bloom filter param map from the configuration.
  
  Parameters:
  
  conf - to read the serialized values from
  
  Returns:
  
  a map from column family to the configured bloom filter param
- createFamilyBlockSizeMap
  
  @Private static Map<byte[],Integer> createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)
  
  Runs inside the task to deserialize column family to block size map from the configuration.
  
  Parameters:
  
  conf - to read the serialized values from
  
  Returns:
  
  a map from column family to the configured block size
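  The createFamily*Map methods differ mainly in how each serialized string is turned into a typed value (an Integer block size, a bloom filter type, a compression algorithm, and so on). The generic sketch below shows only that conversion step, under hypothetical names; BloomKind is a stand-in enum, not HBase's BloomType.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Sketch: convert a family -> string map into a family -> typed value map. */
final class TypedFamilyMap {
  private TypedFamilyMap() {}

  /** Hypothetical stand-in for a typed attribute such as a bloom filter type. */
  enum BloomKind { NONE, ROW, ROWCOL }

  static <T> Map<String, T> convert(Map<String, String> raw, Function<String, T> parser) {
    Map<String, T> typed = new TreeMap<>();
    for (Map.Entry<String, String> e : raw.entrySet()) {
      // Parse failures (e.g. NumberFormatException) propagate to the caller.
      typed.put(e.getKey(), parser.apply(e.getValue()));
    }
    return typed;
  }
}
```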
- createFamilyDataBlockEncodingMap
  
  @Private static Map<byte[],DataBlockEncoding> createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)
  
  Runs inside the task to deserialize column family to data block encoding type map from the configuration.
  
  Parameters:
  
  conf - to read the serialized values from
  
  Returns:
  
  a map from column family to the configured data block encoding for that family
- createFamilyConfValueMap
  
  private static Map<byte[],String> createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)
  
  Runs inside the task to deserialize column family to given conf value map.
  
  Parameters:
  
  conf - to read the serialized values from
  
  confName - conf key to read from the configuration
  
  Returns:
  
  a map of column family to the given configuration value
- configurePartitioner
  
  static void configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables) throws IOException
  
  Configure job with a TotalOrderPartitioner, partitioning against splitPoints. Cleans up the partitions file after the job exits.
  
  Throws:
  
  IOException
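  With the split points in place, a total order partitioner sends each row key to the partition whose key range contains it. The sketch below models that lookup as a binary search with unsigned lexicographic comparison; it illustrates the effect and is not Hadoop's TotalOrderPartitioner implementation, which may use a different search structure. TotalOrderLookup is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of total-order partitioning over sorted split points. */
final class TotalOrderLookup {
  private TotalOrderLookup() {}

  /**
   * Returns a partition index in [0, splitPoints.size()], equal to the number of split
   * points that are <= rowKey. A key equal to a split point lands in the partition that
   * starts there, matching inclusive region start keys.
   */
  static int partitionFor(byte[] rowKey, List<byte[]> sortedSplitPoints) {
    int lo = 0;
    int hi = sortedSplitPoints.size();
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (Arrays.compareUnsigned(sortedSplitPoints.get(mid), rowKey) <= 0) {
        lo = mid + 1; // split point at mid is <= rowKey, so the answer is to the right
      } else {
        hi = mid;
      }
    }
    return lo;
  }
}
```

With split points "m" and "t", row "a" maps to partition 0, rows "m" and "p" to partition 1, and rows "t" and "z" to partition 2.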
- serializeColumnFamilyAttribute
  
  @Private static String serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables) throws UnsupportedEncodingException
  
  Throws:
  
  UnsupportedEncodingException
