Class HFileOutputFormat2
java.lang.Object
org.apache.hadoop.mapreduce.OutputFormat<K,V>
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2
- Direct Known Subclasses:
MultiTableHFileOutputFormat
@Public
public class HFileOutputFormat2
extends org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
Writes HFiles. Passed Cells must arrive in order. Writes the current time as the sequence id for the file. Sets the major compacted attribute on created HFiles. Calling write(null, null) will forcibly roll all HFiles being written.
Using this class as part of a MapReduce job is best done using
configureIncrementalLoad(Job, TableDescriptor, RegionLocator).
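A minimal sketch of that wiring, here via the Table overload (the TableDescriptor overload works the same way). The mapper, table name, column family, and paths are illustrative placeholders, not part of this API:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class BulkLoadDriver {

  // Illustrative mapper: parses "rowkey,value" text lines into Puts against an
  // assumed column family "cf" and qualifier "q".
  public static class CsvToPutMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "hfile-bulkload");
    job.setJarByClass(BulkLoadDriver.class);
    job.setMapperClass(CsvToPutMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.setInputPaths(job, new Path("/tmp/bulkload-input")); // placeholder
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Table table = connection.getTable(TableName.valueOf("my_table"));
         RegionLocator locator = connection.getRegionLocator(table.getName())) {
      // Sets the partitioner, reducer, reduce-task count, and output key/value
      // classes for the job (see configureIncrementalLoad in Method Details).
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    }
    HFileOutputFormat2.setOutputPath(job, new Path("/tmp/hfiles-out")); // placeholder
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The HFiles written under the output path can then be moved into the running table with a bulk load tool (for example, org.apache.hadoop.hbase.tool.BulkLoadHFiles).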
-
Nested Class Summary
Nested Classes
(package private) static class HFileOutputFormat2.TableInfo
(package private) static class HFileOutputFormat2.WriterLength
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.Counter -
Field Summary
Fields
(package private) static final String BLOCK_SIZE_FAMILIES_CONF_KEY
(package private) static Function<ColumnFamilyDescriptor,String> blockSizeDetails
  Serialize column family to block size map to configuration.
(package private) static final String BLOOM_PARAM_FAMILIES_CONF_KEY
(package private) static final String BLOOM_TYPE_FAMILIES_CONF_KEY
(package private) static Function<ColumnFamilyDescriptor,String> bloomParamDetails
  Serialize column family to bloom param map to configuration.
(package private) static Function<ColumnFamilyDescriptor,String> bloomTypeDetails
  Serialize column family to bloom type map to configuration.
(package private) static final String COMPRESSION_FAMILIES_CONF_KEY
static final String COMPRESSION_OVERRIDE_CONF_KEY
(package private) static Function<ColumnFamilyDescriptor,String> compressionDetails
  Serialize column family to compression algorithm map to configuration.
(package private) static final String DATABLOCK_ENCODING_FAMILIES_CONF_KEY
static final String DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
(package private) static Function<ColumnFamilyDescriptor,String> dataBlockEncodingDetails
  Serialize column family to data block encoding map to configuration.
private static final boolean DEFAULT_LOCALITY_SENSITIVE
private static final boolean DISK_BASED_SORTING_ENABLED_DEFAULT
static final String DISK_BASED_SORTING_ENABLED_KEY
(package private) static final boolean EXTENDED_CELL_SERIALIZATION_ENABLED_DEFULT
static final String EXTENDED_CELL_SERIALIZATION_ENABLED_KEY
  ExtendedCell and ExtendedCellSerialization are InterfaceAudience.Private.
static final String LOCALITY_SENSITIVE_CONF_KEY
  Keep locality while generating HFiles for bulkload.
private static final org.slf4j.Logger LOG
(package private) static final String MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY
(package private) static final String OUTPUT_TABLE_NAME_CONF_KEY
static final String REMOTE_CLUSTER_CONF_PREFIX
static final String REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
static final String REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
static final String REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
static final String STORAGE_POLICY_PROPERTY
static final String STORAGE_POLICY_PROPERTY_CF_PREFIX
protected static final byte[] tableSeparator
Fields inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
BASE_OUTPUT_NAME, COMPRESS, COMPRESS_CODEC, COMPRESS_TYPE, OUTDIR, PART -
Constructor Summary
Constructors
HFileOutputFormat2()
-
Method Summary
Methods
protected static byte[] combineTableNameSuffix(byte[] tableName, byte[] suffix)
(package private) static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?, ?>> cls)
static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator)
  Configure a MapReduce Job to perform an incremental load into the given table.
static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator)
  Configure a MapReduce Job to perform an incremental load into the given table.
static void configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor)
(package private) static void configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables)
  Configure job with a TotalOrderPartitioner, partitioning against splitPoints.
static void configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf)
  Configure HBase cluster key for remote cluster to load region location for locality-sensitive if it's enabled.
(package private) static void configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath)
  Configure block storage policy for CF after the directory is created.
(package private) static Map<byte[],Integer> createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)
  Runs inside the task to deserialize column family to block size map from the configuration.
(package private) static Map<byte[],String> createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf)
  Runs inside the task to deserialize column family to bloom filter param map from the configuration.
(package private) static Map<byte[],BloomType> createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)
  Runs inside the task to deserialize column family to bloom filter type map from the configuration.
(package private) static Map<byte[],Compression.Algorithm> createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)
  Runs inside the task to deserialize column family to compression algorithm map from the configuration.
private static Map<byte[],String> createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)
  Run inside the task to deserialize column family to given conf value map.
(package private) static Map<byte[],DataBlockEncoding> createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)
  Runs inside the task to deserialize column family to data block encoding type map from the configuration.
(package private) static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer)
static boolean diskBasedSortingEnabled(org.apache.hadoop.conf.Configuration conf)
org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context)
private static List<ImmutableBytesWritable> getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables)
  Return the start keys of all of the regions in this table, as a list of ImmutableBytesWritable.
protected static byte[] getTableNameSuffixedWithFamily(byte[] tableName, byte[] family)
private static void mergeSerializations(org.apache.hadoop.conf.Configuration conf)
(package private) static String serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables)
private static void writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables)
  Write out a SequenceFile that can be read by TotalOrderPartitioner, containing the split points in startKeys.
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
checkOutputSpecs, getCompressOutput, getDefaultWorkFile, getOutputCommitter, getOutputCompressorClass, getOutputName, getOutputPath, getPathForWorkFile, getUniqueFile, getWorkOutputPath, setCompressOutput, setOutputCompressorClass, setOutputName, setOutputPath
-
Field Details
-
LOG
private static final org.slf4j.Logger LOG
-
tableSeparator
protected static final byte[] tableSeparator
-
COMPRESSION_FAMILIES_CONF_KEY
- See Also:
Constant Field Values
-
BLOOM_TYPE_FAMILIES_CONF_KEY
- See Also:
Constant Field Values
-
BLOOM_PARAM_FAMILIES_CONF_KEY
- See Also:
Constant Field Values
-
BLOCK_SIZE_FAMILIES_CONF_KEY
- See Also:
Constant Field Values
-
DATABLOCK_ENCODING_FAMILIES_CONF_KEY
- See Also:
Constant Field Values
-
DATABLOCK_ENCODING_OVERRIDE_CONF_KEY
- See Also:
Constant Field Values
-
COMPRESSION_OVERRIDE_CONF_KEY
- See Also:
Constant Field Values
-
LOCALITY_SENSITIVE_CONF_KEY
Keep locality while generating HFiles for bulkload. See HBASE-12596.
- See Also:
Constant Field Values
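A job can toggle locality-sensitive HFile placement through this key; a minimal sketch using the standard Hadoop Configuration API (only the decision to disable is an assumption here):

// Disable locality-sensitive HFile placement for this job's configuration.
conf.setBoolean(HFileOutputFormat2.LOCALITY_SENSITIVE_CONF_KEY, false);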
-
DEFAULT_LOCALITY_SENSITIVE
- See Also:
Constant Field Values
-
OUTPUT_TABLE_NAME_CONF_KEY
- See Also:
Constant Field Values
-
MULTI_TABLE_HFILEOUTPUTFORMAT_CONF_KEY
- See Also:
Constant Field Values
-
EXTENDED_CELL_SERIALIZATION_ENABLED_KEY
ExtendedCell and ExtendedCellSerialization are InterfaceAudience.Private. We expose this config for internal usage in jobs like WALPlayer which need to use features of ExtendedCell.
- See Also:
Constant Field Values
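For an internal job of that kind, opting in is a single configuration flip; a minimal sketch (general user jobs should leave this alone, per the note above):

// Enable ExtendedCellSerialization for a job that must round-trip ExtendedCell features.
conf.setBoolean(HFileOutputFormat2.EXTENDED_CELL_SERIALIZATION_ENABLED_KEY, true);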
-
EXTENDED_CELL_SERIALIZATION_ENABLED_DEFULT
- See Also:
Constant Field Values
-
DISK_BASED_SORTING_ENABLED_KEY
- See Also:
Constant Field Values
-
DISK_BASED_SORTING_ENABLED_DEFAULT
- See Also:
Constant Field Values
-
REMOTE_CLUSTER_CONF_PREFIX
- See Also:
Constant Field Values
-
REMOTE_CLUSTER_ZOOKEEPER_QUORUM_CONF_KEY
- See Also:
Constant Field Values
-
REMOTE_CLUSTER_ZOOKEEPER_CLIENT_PORT_CONF_KEY
- See Also:
Constant Field Values
-
REMOTE_CLUSTER_ZOOKEEPER_ZNODE_PARENT_CONF_KEY
- See Also:
Constant Field Values
-
STORAGE_POLICY_PROPERTY
- See Also:
Constant Field Values
-
STORAGE_POLICY_PROPERTY_CF_PREFIX
- See Also:
Constant Field Values
-
compressionDetails
Serialize column family to compression algorithm map to configuration. Invoked while configuring the MR job for incremental load.
-
blockSizeDetails
Serialize column family to block size map to configuration. Invoked while configuring the MR job for incremental load.
-
bloomTypeDetails
Serialize column family to bloom type map to configuration. Invoked while configuring the MR job for incremental load.
-
bloomParamDetails
Serialize column family to bloom param map to configuration. Invoked while configuring the MR job for incremental load.
-
dataBlockEncodingDetails
Serialize column family to data block encoding map to configuration. Invoked while configuring the MR job for incremental load.
-
-
Constructor Details
-
HFileOutputFormat2
public HFileOutputFormat2()
-
-
Method Details
-
combineTableNameSuffix
protected static byte[] combineTableNameSuffix(byte[] tableName, byte[] suffix)
-
getRecordWriter
public org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,Cell> getRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
- Specified by:
getRecordWriter in class org.apache.hadoop.mapreduce.lib.output.FileOutputFormat<ImmutableBytesWritable,Cell>
- Throws:
IOException
InterruptedException
-
getTableNameSuffixedWithFamily
protected static byte[] getTableNameSuffixedWithFamily(byte[] tableName, byte[] family)
-
createRecordWriter
static <V extends Cell> org.apache.hadoop.mapreduce.RecordWriter<ImmutableBytesWritable,V> createRecordWriter(org.apache.hadoop.mapreduce.TaskAttemptContext context, org.apache.hadoop.mapreduce.OutputCommitter committer) throws IOException
- Throws:
IOException
-
configureStoragePolicy
static void configureStoragePolicy(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, byte[] tableAndFamily, org.apache.hadoop.fs.Path cfPath)
Configure block storage policy for CF after the directory is created.
-
getRegionStartKeys
private static List<ImmutableBytesWritable> getRegionStartKeys(List<RegionLocator> regionLocators, boolean writeMultipleTables) throws IOException
Return the start keys of all of the regions in this table, as a list of ImmutableBytesWritable.
- Throws:
IOException
-
writePartitions
private static void writePartitions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path partitionsPath, List<ImmutableBytesWritable> startKeys, boolean writeMultipleTables) throws IOException
Write out a SequenceFile that can be read by TotalOrderPartitioner, containing the split points in startKeys.
- Throws:
IOException
-
configureIncrementalLoad
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, Table table, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table. This:
- Inspects the table to configure a total order partitioner
- Uploads the partitions file to the cluster and adds it to the DistributedCache
- Sets the number of reduce tasks to match the current number of regions
- Sets the output key/value class to match HFileOutputFormat2's requirements
- Sets the reducer up to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)
- Sets the HBase cluster key to load region locations for locality-sensitive
- Throws:
IOException
-
configureIncrementalLoad
public static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor, RegionLocator regionLocator) throws IOException
Configure a MapReduce Job to perform an incremental load into the given table. This:
- Inspects the table to configure a total order partitioner
- Uploads the partitions file to the cluster and adds it to the DistributedCache
- Sets the number of reduce tasks to match the current number of regions
- Sets the output key/value class to match HFileOutputFormat2's requirements
- Sets the reducer up to perform the appropriate sorting (either KeyValueSortReducer or PutSortReducer)
- Throws:
IOException
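A minimal sketch of this variant, useful when only a descriptor (not a live Table) is needed at configure time. It assumes the conf and job from the driver sketch in the class description; the table name is a placeholder, and Admin and TableDescriptor come from org.apache.hadoop.hbase.client:

try (Connection conn = ConnectionFactory.createConnection(conf);
     Admin admin = conn.getAdmin();
     RegionLocator locator = conn.getRegionLocator(TableName.valueOf("my_table"))) {
  // Fetch the descriptor once, then configure the job against it.
  TableDescriptor descriptor = admin.getDescriptor(TableName.valueOf("my_table"));
  HFileOutputFormat2.configureIncrementalLoad(job, descriptor, locator);
}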
-
diskBasedSortingEnabled
public static boolean diskBasedSortingEnabled(org.apache.hadoop.conf.Configuration conf)
-
configureIncrementalLoad
static void configureIncrementalLoad(org.apache.hadoop.mapreduce.Job job, List<HFileOutputFormat2.TableInfo> multiTableInfo, Class<? extends org.apache.hadoop.mapreduce.OutputFormat<?, ?>> cls) throws IOException
- Throws:
IOException
-
mergeSerializations
private static void mergeSerializations(org.apache.hadoop.conf.Configuration conf)
-
configureIncrementalLoadMap
public static void configureIncrementalLoadMap(org.apache.hadoop.mapreduce.Job job, TableDescriptor tableDescriptor) throws IOException
- Throws:
IOException
-
configureRemoteCluster
public static void configureRemoteCluster(org.apache.hadoop.mapreduce.Job job, org.apache.hadoop.conf.Configuration clusterConf) throws IOException
Configure the HBase cluster key for a remote cluster, used to load region locations when locality-sensitive HFile generation is enabled. It is not necessary to call this method explicitly when the cluster key of the HBase cluster used for region locations is already configured in the job configuration. Call this method when a different HBase cluster key is configured in the job configuration; for example, when you load data from HBase cluster A using TableInputFormat and generate hfiles for HBase cluster B. Otherwise, HFileOutputFormat2 fetches locations from cluster A and locality-sensitive placement won't work correctly. configureIncrementalLoad(Job, Table, RegionLocator) calls this method using Table.getConfiguration() as clusterConf. See HBASE-25608.
- Parameters:
job - which has configuration to be updated
clusterConf - which contains cluster key of the HBase cluster to be locality-sensitive
- Throws:
IOException
- See Also:
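A minimal sketch of the cross-cluster case described above: the job reads from cluster A through its default configuration while the HFiles target cluster B, so locality lookups are pointed at cluster B explicitly. The quorum hosts are placeholder assumptions:

// Configuration for the destination cluster B (quorum values are placeholders).
Configuration clusterBConf = HBaseConfiguration.create();
clusterBConf.set("hbase.zookeeper.quorum", "zk1.cluster-b.example,zk2.cluster-b.example");
// Without this call, region locations would come from cluster A and
// locality-sensitive HFile placement would not work correctly.
HFileOutputFormat2.configureRemoteCluster(job, clusterBConf);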
-
createFamilyCompressionMap
@Private
static Map<byte[],Compression.Algorithm> createFamilyCompressionMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize column family to compression algorithm map from the configuration.
- Parameters:
conf - to read the serialized values from
- Returns:
a map from column family to the configured compression algorithm
-
createFamilyBloomTypeMap
@Private
static Map<byte[],BloomType> createFamilyBloomTypeMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize column family to bloom filter type map from the configuration.
- Parameters:
conf - to read the serialized values from
- Returns:
a map from column family to the configured bloom filter type
-
createFamilyBloomParamMap
@Private
static Map<byte[],String> createFamilyBloomParamMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize column family to bloom filter param map from the configuration.
- Parameters:
conf - to read the serialized values from
- Returns:
a map from column family to the configured bloom filter param
-
createFamilyBlockSizeMap
@Private
static Map<byte[],Integer> createFamilyBlockSizeMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize column family to block size map from the configuration.
- Parameters:
conf - to read the serialized values from
- Returns:
a map from column family to the configured block size
-
createFamilyDataBlockEncodingMap
@Private
static Map<byte[],DataBlockEncoding> createFamilyDataBlockEncodingMap(org.apache.hadoop.conf.Configuration conf)
Runs inside the task to deserialize column family to data block encoding type map from the configuration.
- Parameters:
conf - to read the serialized values from
- Returns:
a map from column family to HFileDataBlockEncoder for the configured data block type for the family
-
createFamilyConfValueMap
private static Map<byte[],String> createFamilyConfValueMap(org.apache.hadoop.conf.Configuration conf, String confName)
Run inside the task to deserialize column family to given conf value map.
- Parameters:
conf - to read the serialized values from
confName - conf key to read from the configuration
- Returns:
a map of column family to the given configuration value
-
configurePartitioner
static void configurePartitioner(org.apache.hadoop.mapreduce.Job job, List<ImmutableBytesWritable> splitPoints, boolean writeMultipleTables) throws IOException
Configure job with a TotalOrderPartitioner, partitioning against splitPoints. Cleans up the partitions file after the job exits.
- Throws:
IOException
-
serializeColumnFamilyAttribute
@Private
static String serializeColumnFamilyAttribute(Function<ColumnFamilyDescriptor,String> fn, List<TableDescriptor> allTables) throws UnsupportedEncodingException
- Throws:
UnsupportedEncodingException
-