@InterfaceAudience.Public @InterfaceStability.Evolving public class HFileOutputFormat2 extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
Writes HFiles. Calling write(null,null) will forcibly roll all HFiles being written. Using this class as part of a MapReduce job is best done using configureIncrementalLoad(Job, Table, RegionLocator).
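For orientation, the following driver sketch shows the recommended incremental-load setup. It is not part of the Javadoc: the table name "mytable", the tab-separated input, the column family "f", and the TsvToKeyValueMapper class are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {

  // Illustrative mapper: turns "row<TAB>qualifier<TAB>value" lines into KeyValues.
  static class TsvToKeyValueMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 3);
      if (parts.length < 3) return; // skip malformed lines
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("f"),
          Bytes.toBytes(parts[1]), Bytes.toBytes(parts[2]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hfile-bulk-load");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(TsvToKeyValueMapper.class);
    // configureIncrementalLoad picks its sort reducer from the map output
    // value class, so set the map output classes before calling it.
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    TableName name = TableName.valueOf("mytable");
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name)) {
      // Sets the TotalOrderPartitioner, reduce task count, output format,
      // and per-family compression/bloom/block-size/encoding settings.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Once the job completes, the HFiles under the output directory can be moved into the table's regions with LoadIncrementalHFiles (the completebulkload tool).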
Modifier and Type | Class and Description
---|---
(package private) static class | HFileOutputFormat2.WriterLength
Modifier and Type | Field and Description
---|---
private static String | BLOCK_SIZE_FAMILIES_CONF_KEY
private static String | BLOOM_TYPE_FAMILIES_CONF_KEY
private static String | COMPRESSION_FAMILIES_CONF_KEY
private static String | DATABLOCK_ENCODING_FAMILIES_CONF_KEY
static String | DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
private static org.apache.commons.logging.Log | LOG
Constructor and Description
---
HFileOutputFormat2()
Modifier and Type | Method and Description
---|---
(package private) static void | configureBlockSize(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) Serialize the column family to block size map to the configuration.
(package private) static void | configureBloomType(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) Serialize the column family to bloom type map to the configuration.
(package private) static void | configureCompression(org.apache.hadoop.conf.Configuration conf, HTableDescriptor tableDescriptor) Serialize the column family to compression algorithm map to the configuration.
(package private) static void | configureDataBlockEncoding(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) Serialize the column family to data block encoding map to the configuration.
static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTable table) Deprecated. Use configureIncrementalLoad(Job, Table, RegionLocator) instead.
static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTableDescriptor tableDescriptor, RegionLocator regionLocator) Configure a MapReduce Job to perform an incremental load into the given table.
(package private) static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTableDescriptor tableDescriptor, RegionLocator regionLocator, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls)
static void | configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator) Configure a MapReduce Job to perform an incremental load into the given table.
static void | configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, Table table)
(package private) static void | configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints) Configure job with a TotalOrderPartitioner, partitioning against splitPoints.
(package private) static Map<byte[],Integer> | createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf) Runs inside the task to deserialize the column family to block size map from the configuration.
(package private) static Map<byte[],BloomType> | createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf) Runs inside the task to deserialize the column family to bloom filter type map from the configuration.
(package private) static Map<byte[],Compression.Algorithm> | createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf) Runs inside the task to deserialize the column family to compression algorithm map from the configuration.
private static Map<byte[],String> | createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName) Runs inside the task to deserialize the column family to conf value map from the configuration.
(package private) static Map<byte[],DataBlockEncoding> | createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf) Runs inside the task to deserialize the column family to data block encoding type map from the configuration.
(package private) static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> | createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer)
org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> | getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
private static List<ImmutableBytesWritable> | getRegionStartKeys(RegionLocator table) Return the start keys of all of the regions in this table, as a list of ImmutableBytesWritable.
private static void | writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys) Write out a SequenceFile that can be read by TotalOrderPartitioner that contains the split points in startKeys.
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat:
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
private static final org.apache.commons.logging.Log LOG
private static final String COMPRESSION_FAMILIES_CONF_KEY
private static final String BLOOM_TYPE_FAMILIES_CONF_KEY
private static final String BLOCK_SIZE_FAMILIES_CONF_KEY
private static final String DATABLOCK_ENCODING_FAMILIES_CONF_KEY
public static final String DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
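DATABLOCK_ENCODING_OVERRIDE_CONF_KEY is the only public field; it lets a job force a single data block encoding for every HFile it writes, overriding the per-family table settings. A minimal sketch; the choice of FAST_DIFF is an illustrative assumption:

```java
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.mapreduce.Job;

public final class EncodingOverride {
  // Force one encoding for every HFile this job writes; FAST_DIFF is
  // illustrative, any DataBlockEncoding enum name should work here.
  public static void apply(Job job) {
    job.getConfiguration().set(
        HFileOutputFormat2.DATABLOCK_ENCODING_OVERRIDE_CONF_KEY,
        DataBlockEncoding.FAST_DIFF.name());
  }
}
```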
public org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Overrides:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
Throws:
IOException
InterruptedException
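Per the class description, writing a null key and value through the returned RecordWriter forcibly rolls all open HFiles. A hedged sketch of a reducer that uses this; the 1 GiB threshold and the reducer itself are illustrative assumptions, not HBase's own sort reducers:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative reducer that rolls the open HFiles after roughly 1 GiB
// of cells by emitting a null key/value, as the class description allows.
public class RollingCellReducer
    extends Reducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable, KeyValue> {

  private static final long ROLL_THRESHOLD = 1L << 30; // illustrative 1 GiB
  private long bytesWritten = 0;

  @Override
  protected void reduce(ImmutableBytesWritable row, Iterable<KeyValue> cells, Context ctx)
      throws IOException, InterruptedException {
    for (KeyValue kv : cells) {
      ctx.write(row, kv);
      bytesWritten += kv.getLength();
    }
    // Roll at a row boundary so a single row is not split across HFiles.
    if (bytesWritten > ROLL_THRESHOLD) {
      ctx.write(null, null); // forcibly rolls all HFiles being written
      bytesWritten = 0;
    }
  }
}
```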
static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer) throws IOException
Throws:
IOException
private static List<ImmutableBytesWritable> getRegionStartKeys(RegionLocator table) throws IOException
Throws:
IOException
private static void writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys) throws IOException
Write out a SequenceFile that can be read by TotalOrderPartitioner that contains the split points in startKeys.
Throws:
IOException
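For intuition, here is a simplified sketch of this step using standard Hadoop APIs. It is an assumption-laden approximation: the real method also sorts the start keys and handles the first (empty) region start key, details omitted here.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public final class SplitPointsSketch {
  // Write split points as a SequenceFile of (key, NullWritable) pairs,
  // then point TotalOrderPartitioner at it. Simplified: sorting and
  // handling of the first (empty) region start key are omitted.
  static void writeSplitPoints(Configuration conf, Path partitionsPath,
                               List<ImmutableBytesWritable> splitPoints) throws Exception {
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        SequenceFile.Writer.file(partitionsPath),
        SequenceFile.Writer.keyClass(ImmutableBytesWritable.class),
        SequenceFile.Writer.valueClass(NullWritable.class))) {
      for (ImmutableBytesWritable splitPoint : splitPoints) {
        writer.append(splitPoint, NullWritable.get());
      }
    }
    TotalOrderPartitioner.setPartitionFile(conf, partitionsPath);
  }
}
```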
@Deprecated public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTable table) throws IOException
Deprecated. Use configureIncrementalLoad(Job, Table, RegionLocator) instead.
Throws:
IOException
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table.
Throws:
IOException
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTableDescriptor tableDescriptor, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table.
Throws:
IOException
static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, HTableDescriptor tableDescriptor, RegionLocator regionLocator, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?,?>> cls) throws IOException, UnsupportedEncodingException
public static void configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, Table table) throws IOException
Throws:
IOException
static Map<byte[],Compression.Algorithm> createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - to read the serialized values from

static Map<byte[],BloomType> createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - to read the serialized values from

static Map<byte[],Integer> createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - to read the serialized values from

static Map<byte[],DataBlockEncoding> createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)
Parameters:
conf - to read the serialized values from

private static Map<byte[],String> createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)
Parameters:
conf - to read the serialized values from
confName - conf key to read from the configuration

static void configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints) throws IOException
Configure job with a TotalOrderPartitioner, partitioning against splitPoints. Cleans up the partitions file after the job exits.
Throws:
IOException
static void configureCompression(org.apache.hadoop.conf.Configuration conf, HTableDescriptor tableDescriptor) throws UnsupportedEncodingException
Parameters:
tableDescriptor - to read the properties from
conf - to persist serialized values into
Throws:
IOException - on failure to read column family descriptors
UnsupportedEncodingException
static void configureBlockSize(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) throws UnsupportedEncodingException
Parameters:
tableDescriptor - to read the properties from
conf - to persist serialized values into
Throws:
IOException - on failure to read column family descriptors
UnsupportedEncodingException
static void configureBloomType(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) throws UnsupportedEncodingException
Parameters:
tableDescriptor - to read the properties from
conf - to persist serialized values into
Throws:
IOException - on failure to read column family descriptors
UnsupportedEncodingException
static void configureDataBlockEncoding(HTableDescriptor tableDescriptor, org.apache.hadoop.conf.Configuration conf) throws UnsupportedEncodingException
Parameters:
tableDescriptor - to read the properties from
conf - to persist serialized values into
Throws:
IOException - on failure to read column family descriptors
UnsupportedEncodingException