@InterfaceAudience.Public public class HFileOutputFormat2 extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
Writes HFiles. Calling write(null,null) will forcibly roll all HFiles being written.

Using this class as part of a MapReduce job is best done using configureIncrementalLoad(Job, TableDescriptor, RegionLocator).
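For orientation, here is a minimal, hedged sketch of a bulk-load driver built around this class. BulkLoadDriver, MyCellMapper, the table name, and the paths are hypothetical placeholders; the mapper is assumed to emit ImmutableBytesWritable row keys with Put values, which configureIncrementalLoad supports.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {                     // hypothetical driver class
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hfile-bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(MyCellMapper.class);       // hypothetical mapper emitting
                                                  // (ImmutableBytesWritable, Put)
    FileInputFormat.addInputPath(job, new Path(args[0]));    // placeholder input
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HFile staging dir

    TableName table = TableName.valueOf("my_table");         // hypothetical table
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator = conn.getRegionLocator(table);
         Admin admin = conn.getAdmin()) {
      // Wires in the TotalOrderPartitioner, sort reducer, and output classes:
      HFileOutputFormat2.configureIncrementalLoad(job, admin.getDescriptor(table), locator);
    }
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

After the job completes, the staged HFiles can be handed to HBase's bulk-load tooling (for example, the completebulkload tool) to be moved into the table's regions.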
Modifier and Type | Class and Description |
---|---|
(package private) static class | HFileOutputFormat2.TableInfo |
(package private) static class | HFileOutputFormat2.WriterLength |
Constructor and Description |
---|
HFileOutputFormat2() |
Modifier and Type | Method and Description |
---|---|
protected static byte[] | combineTableNameSuffix(byte[] tableName, byte[] suffix) |
(package private) static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls) |
static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator): Configure a MapReduce Job to perform an incremental load into the given table. |
static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator): Configure a MapReduce Job to perform an incremental load into the given table. |
static void | configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor) |
(package private) static void | configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables): Configure job with a TotalOrderPartitioner, partitioning against splitPoints. |
static void | configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf): Configure the HBase cluster key for a remote cluster so region locations can be loaded when locality-sensitive writing is enabled. |
(package private) static void | configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath): Configure the block storage policy for a column family after its directory is created. |
(package private) static Map<byte[],Integer> | createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf): Runs inside the task to deserialize the column family to block size map from the configuration. |
(package private) static Map<byte[],String> | createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf): Runs inside the task to deserialize the column family to bloom filter param map from the configuration. |
(package private) static Map<byte[],BloomType> | createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf): Runs inside the task to deserialize the column family to bloom filter type map from the configuration. |
(package private) static Map<byte[],Compression.Algorithm> | createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf): Runs inside the task to deserialize the column family to compression algorithm map from the configuration. |
private static Map<byte[],String> | createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName): Runs inside the task to deserialize a column family to conf value map for the given conf key. |
(package private) static Map<byte[],DataBlockEncoding> | createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf): Runs inside the task to deserialize the column family to data block encoding type map from the configuration. |
(package private) static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> | createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer) |
org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) |
private static List<ImmutableBytesWritable> | getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables): Return the start keys of all of the regions in this table, as a list of ImmutableBytesWritable. |
protected static byte[] | getTableNameSuffixedWithFamily(byte[] tableName, byte[] family) |
(package private) static String | serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables) |
private static void | writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables): Write out a SequenceFile that can be read by TotalOrderPartitioner that contains the split points in startKeys. |
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat: checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
private static final org.slf4j.Logger LOG
protected static final byte[] tableSeparator
static final String COMPRESSION_FAMILIES_CONF_KEY
static final String BLOOM_TYPE_FAMILIES_CONF_KEY
static final String BLOOM_PARAM_FAMILIES_CONF_KEY
static final String BLOCK_SIZE_FAMILIES_CONF_KEY
static final String DATABLOCK_ENCODING_FAMILIES_CONF_KEY
public static final String DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
public static final String COMPRESSION_OVERRIDE_CONF_KEY
public static final String LOCALITY_SENSITIVE_CONF_KEY
private static final boolean DEFAULT_LOCALITY_SENSITIVE
static final String OUTPUT_TABLE_NAME_CONF_KEY
static final String MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY
public static final String REMOTE_CLUSTER_CONF_PREFIX
public static final String REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
public static final String REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
public static final String REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
public static final String STORAGE_POLICY_PROPERTY
public static final String STORAGE_POLICY_PROPERTY_CF_PREFIX
@InterfaceAudience.Private static Function<ColumnFamilyDescriptor,String> compressionDetails
@InterfaceAudience.Private static Function<ColumnFamilyDescriptor,String> blockSizeDetails
@InterfaceAudience.Private static Function<ColumnFamilyDescriptor,String> bloomTypeDetails
@InterfaceAudience.Private static Function<ColumnFamilyDescriptor,String> bloomParamDetails
@InterfaceAudience.Private static Function<ColumnFamilyDescriptor,String> dataBlockEncodingDetails
public HFileOutputFormat2()
protected static byte[] combineTableNameSuffix(byte[] tableName, byte[] suffix)
public org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Specified by: getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
Throws: IOException, InterruptedException
protected static byte[] getTableNameSuffixedWithFamily(byte[] tableName, byte[] family)
static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer) throws IOException
Throws: IOException
static void configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath)
Configure the block storage policy for a column family after its directory is created.
private static List<ImmutableBytesWritable> getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables) throws IOException
Return the start keys of all of the regions in this table, as a list of ImmutableBytesWritable.
Throws: IOException
private static void writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables) throws IOException
Write out a SequenceFile that can be read by TotalOrderPartitioner that contains the split points in startKeys.
Throws: IOException
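As a rough illustration of the file this private method produces, here is a minimal sketch that writes sorted split points into a SequenceFile. Using NullWritable values is an assumption for illustration, since TotalOrderPartitioner only consults the keys; the real implementation's sorting and first-key handling are internal details.

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;

// Writes sorted split points into a SequenceFile that TotalOrderPartitioner
// can load. Values are irrelevant to the partitioner, so NullWritable is used.
static void writeSplitPoints(Configuration conf, Path partitionsPath,
    List<ImmutableBytesWritable> sortedSplitPoints) throws IOException {
  try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
      SequenceFile.Writer.file(partitionsPath),
      SequenceFile.Writer.keyClass(ImmutableBytesWritable.class),
      SequenceFile.Writer.valueClass(NullWritable.class))) {
    for (ImmutableBytesWritable splitPoint : sortedSplitPoints) {
      writer.append(splitPoint, NullWritable.get());
    }
  }
}
```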
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table.
Throws: IOException
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table.
Throws: IOException
static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls) throws IOException
Throws: IOException
public static void configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor) throws IOException
Throws: IOException
public static void configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf)
Configure the HBase cluster key for a remote cluster so that region locations can be loaded when locality-sensitive writing is enabled. Call this method when, for example, you load data from HBase cluster A using TableInputFormat and generate HFiles for HBase cluster B; otherwise HFileOutputFormat2 fetches region locations from cluster A, and locality-sensitive writing won't work correctly.
configureIncrementalLoad(Job, Table, RegionLocator) calls this method using Table.getConfiguration() as clusterConf. See HBASE-25608.
Parameters:
job - which has the configuration to be updated
clusterConf - which contains the cluster key of the HBase cluster to be locality-sensitive
See Also: configureIncrementalLoad(Job, Table, RegionLocator), LOCALITY_SENSITIVE_CONF_KEY, REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY, REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY, REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
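A hedged sketch of the cluster A to cluster B scenario described above. The ZooKeeper quorum, port, and znode values are hypothetical placeholders; the standard HBase client configuration keys are used for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;

// The job reads from cluster A (e.g. via TableInputFormat) but the HFiles are
// destined for cluster B, so locality lookups must point at cluster B.
static void pointLocalityAtClusterB(Job job) {
  Configuration clusterBConf = HBaseConfiguration.create();
  // Hypothetical placeholder values for cluster B's ZooKeeper ensemble:
  clusterBConf.set("hbase.zookeeper.quorum", "zk1.cluster-b.example,zk2.cluster-b.example");
  clusterBConf.set("hbase.zookeeper.property.clientPort", "2181");
  clusterBConf.set("zookeeper.znode.parent", "/hbase");
  HFileOutputFormat2.configureRemoteCluster(job, clusterBConf);
}
```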
@InterfaceAudience.Private static Map<byte[],Compression.Algorithm> createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize the column family to compression algorithm map from the configuration.
Parameters: conf - to read the serialized values from

@InterfaceAudience.Private static Map<byte[],BloomType> createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize the column family to bloom filter type map from the configuration.
Parameters: conf - to read the serialized values from

@InterfaceAudience.Private static Map<byte[],String> createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize the column family to bloom filter param map from the configuration.
Parameters: conf - to read the serialized values from

@InterfaceAudience.Private static Map<byte[],Integer> createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize the column family to block size map from the configuration.
Parameters: conf - to read the serialized values from

@InterfaceAudience.Private static Map<byte[],DataBlockEncoding> createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize the column family to data block encoding type map from the configuration.
Parameters: conf - to read the serialized values from

private static Map<byte[],String> createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)
Runs inside the task to deserialize a column family to conf value map for the given conf key.
Parameters: conf - to read the serialized values from; confName - conf key to read from the configuration
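To make the shape of these helpers concrete, here is an illustrative round trip for a column family to value map. The '&'-separated, URL-encoded key=value scheme is an assumption for illustration only, not a statement of the actual internal wire format; what is grounded in this page is only that a Function over ColumnFamilyDescriptor is serialized into a conf string under a *_FAMILIES_CONF_KEY and deserialized inside the task.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.hbase.util.Bytes;

// ASSUMED encoding, for illustration: pairs joined with '&', key/value with '='.
static String serialize(Map<byte[], String> familyToValue) throws UnsupportedEncodingException {
  StringBuilder sb = new StringBuilder();
  for (Map.Entry<byte[], String> e : familyToValue.entrySet()) {
    if (sb.length() > 0) sb.append('&');
    sb.append(URLEncoder.encode(Bytes.toString(e.getKey()), "UTF-8"))
      .append('=')
      .append(URLEncoder.encode(e.getValue(), "UTF-8"));
  }
  return sb.toString(); // would be stored in the job configuration
}

static Map<byte[], String> deserialize(String serialized) throws UnsupportedEncodingException {
  // byte[] keys need an explicit comparator; Bytes.BYTES_COMPARATOR fits.
  Map<byte[], String> map = new TreeMap<>(Bytes.BYTES_COMPARATOR);
  for (String pair : serialized.split("&")) {
    if (pair.isEmpty()) continue;
    String[] kv = pair.split("=", 2);
    map.put(Bytes.toBytes(URLDecoder.decode(kv[0], "UTF-8")),
            URLDecoder.decode(kv[1], "UTF-8"));
  }
  return map;
}
```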
static void configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables) throws IOException
Configure job with a TotalOrderPartitioner, partitioning against splitPoints. Cleans up the partitions file after the job exits.
Throws: IOException
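For context, this is roughly what wiring a TotalOrderPartitioner into a job looks like with the public Hadoop API; the partitions path is a hypothetical placeholder, and the real method additionally arranges cleanup of the partitions file.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

static void useTotalOrderPartitioner(Job job, Path partitionsPath) {
  // partitionsPath must already contain the sorted split points
  // (see the writePartitions sketch above).
  job.setPartitionerClass(TotalOrderPartitioner.class);
  TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), partitionsPath);
}
```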
@InterfaceAudience.Private static String serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables) throws UnsupportedEncodingException
Throws: UnsupportedEncodingException