Package org.apache.hadoop.hbase.io.hfile
Class HFileWriterImpl
java.lang.Object
org.apache.hadoop.hbase.io.hfile.HFileWriterImpl
- All Implemented Interfaces:
Closeable, AutoCloseable, HFile.Writer, CellSink, ShipperListener
Common functionality needed by all versions of HFile writers. -
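A minimal, hedged usage sketch (not part of the original Javadoc): it assumes the HFile.getWriterFactory and HFileContextBuilder entry points and a hypothetical output path. Writers are normally obtained through the factory rather than by instantiating HFileWriterImpl directly.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class HFileWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/example.hfile"); // hypothetical output location

    // File-wide settings (block size, compression, ...) travel in an HFileContext.
    HFileContext context = new HFileContextBuilder().withBlockSize(64 * 1024).build();

    // Obtain a writer through the factory; HFileWriterImpl is the implementation behind it.
    HFile.Writer writer = HFile.getWriterFactory(conf, new CacheConfig(conf))
        .withPath(fs, path)
        .withFileContext(context)
        .create();
    try {
      // Cells must be appended in comparator order.
      writer.append(new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes("v1")));
      writer.append(new KeyValue(Bytes.toBytes("row2"), Bytes.toBytes("f"),
          Bytes.toBytes("q"), Bytes.toBytes("v2")));
    } finally {
      writer.close(); // flushes the last block, file info, block index and trailer
    }
  }
}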
Field Summary
Fields (Modifier and Type / Field / Description):
- private List<HFileBlock.BlockWritable> additionalLoadOnOpenData: Additional data items to be written to the "load-on-open" section.
- protected final HFileDataBlockEncoder blockEncoder: The data block encoding which will be used.
- protected HFileBlock.Writer blockWriter: block writer
- protected final CacheConfig cacheConf: Cache configuration for caching data on write.
- protected final boolean closeOutputStream: True if we opened the outputStream (and so will close it).
- protected final org.apache.hadoop.conf.Configuration conf
- private HFileBlockIndex.BlockIndexWriter dataBlockIndexWriter
- private long earliestPutTs
- private final int encodedBlockSizeLimit: Block size limit after encoding, used to unify the encoded block cache entry size.
- protected long entryCount: Total # of key/value entries, i.e. how many times add() was called.
- protected HFileInfo fileInfo: A "file info" block: a key-value map of file-wide metadata.
- protected ExtendedCell firstCellInBlock: First cell in a block.
- private long firstDataBlockOffset: The offset of the first data block or -1 if the file is empty.
- protected final HFileContext hFileContext
- protected final HFileIndexBlockEncoder indexBlockEncoder
- private List<InlineBlockWriter> inlineBlockWriters: Inline block writers for multi-level block index and compound Blooms.
- static final int KEY_VALUE_VER_WITH_MEMSTORE: Version for KeyValue which includes memstore timestamp.
- static final byte[] KEY_VALUE_VERSION: KeyValue version in FileInfo.
- protected byte[] keyOfBiggestCell: Key of the biggest cell.
- protected ExtendedCell lastCell: The Cell previously appended.
- private ExtendedCell lastCellOfPreviousBlock: The last (stop) Cell of the previous data block.
- protected long lastDataBlockOffset: The offset of the last data block or 0 if the file is empty.
- protected long lenOfBiggestCell: Length of the biggest cell.
- private static final org.slf4j.Logger LOG
- protected long maxMemstoreTS
- private int maxTagsLength
- private HFileBlockIndex.BlockIndexWriter metaBlockIndexWriter
- protected List<org.apache.hadoop.io.Writable> metaData: Writables representing meta block data.
- protected List<byte[]> metaNames: Meta block names.
- protected final String name: Name for this object used when logging or in toString.
- protected org.apache.hadoop.fs.FSDataOutputStream outputStream: FileSystem stream to write into.
- protected final org.apache.hadoop.fs.Path path: May be null if we were passed a stream.
- private final TimeRangeTracker timeRangeTracker
- private Supplier<TimeRangeTracker> timeRangeTrackerForTiering
- protected long totalKeyLength: Used for calculating the average key length.
- protected long totalUncompressedBytes: Total uncompressed bytes, maybe calculate a compression ratio later.
- protected long totalValueLength: Used for calculating the average value length.
- static final String UNIFIED_ENCODED_BLOCKSIZE_RATIO: If this feature is enabled, pre-calculate the encoded data size before the real encoding happens.
- private static final long UNSET
Fields inherited from interface org.apache.hadoop.hbase.io.hfile.HFile.Writer:
MAX_MEMSTORE_TS_KEY -
Constructor Summary
Constructors:
HFileWriterImpl(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataOutputStream outputStream, HFileContext fileContext) -
Method Summary
Methods (Modifier and Type / Method / Description):
- private void addBloomFilter(BloomFilterWriter bfw, BlockType blockType)
- void addDeleteFamilyBloomFilter(BloomFilterWriter bfw): Store delete family Bloom filter in the file, which is only supported in HFile V2.
- void addGeneralBloomFilter(BloomFilterWriter bfw): Store general Bloom filter in the file.
- void addInlineBlockWriter(InlineBlockWriter ibw): Adds an inline block writer such as a multi-level block index writer or a compound Bloom filter writer.
- void append(ExtendedCell cell): Add key/value to file.
- void appendCustomCellTimestampsToMetadata(TimeRangeTracker timeRangeTracker): Add custom cell timestamp to Metadata.
- void appendFileInfo(byte[] k, byte[] v): Add to the file info.
- void appendMetaBlock(String metaBlockName, org.apache.hadoop.io.Writable content): Add a meta block to the end of the file.
- void appendTrackedTimestampsToMetadata(): Add TimestampRange and earliest put timestamp to Metadata.
- void beforeShipped(): The action that needs to be performed before Shipper.shipped() is performed.
- private BlockCacheKey buildCacheBlockKey(long offset, BlockType blockType)
- protected void checkBlockBoundary(): At a block boundary, writes all the inline blocks and opens a new block.
- protected boolean checkKey(Cell cell): Checks that the given Cell's key does not violate the key order.
- protected void checkValue(byte[] value, int offset, int length): Checks the given value for validity.
- void close()
- static Compression.Algorithm compressionByName(String algoName)
- protected static org.apache.hadoop.fs.FSDataOutputStream createOutputStream(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, InetSocketAddress[] favoredNodes): A helper method to create HFile output streams in constructors.
- private void doCacheOnWrite(long offset): Caches the last written HFile block.
- private void finishBlock(): Clean up the data block that is currently being written.
- protected void finishClose(FixedFileTrailer trailer)
- protected void finishFileInfo()
- protected void finishInit(org.apache.hadoop.conf.Configuration conf): Additional initialization steps.
- HFileContext getFileContext(): Return the file context for the HFile this writer belongs to.
- private String getLexicalErrorMessage(Cell cell)
- protected int getMajorVersion()
- static ExtendedCell getMidpoint(CellComparator comparator, ExtendedCell left, ExtendedCell right): Try to return a Cell that falls between left and right but that is shorter, i.e. takes up less space.
- private static byte[] getMinimumMidpointArray(byte[] leftArray, int leftOffset, int leftLength, byte[] rightArray, int rightOffset, int rightLength): Try to get a byte array that falls between left and right, as short as possible, in lexicographical order.
- private static byte[] getMinimumMidpointArray(ByteBuffer left, int leftOffset, int leftLength, ByteBuffer right, int rightOffset, int rightLength): Try to create a new byte array that falls between left and right, as short as possible, in lexicographical order.
- protected int getMinorVersion()
- org.apache.hadoop.fs.Path getPath(): Returns Path or null if we were passed a stream rather than a Path.
- long getPos()
- protected void newBlock(): Ready a new block for writing.
- void setTimeRangeTrackerForTiering(Supplier<TimeRangeTracker> timeRangeTrackerForTiering)
- private boolean shouldCacheBlock(BlockCache cache, BlockCacheKey key)
- String toString()
- private void trackTimestamps(ExtendedCell cell): Record the earliest Put timestamp.
- protected final void writeFileInfo(FixedFileTrailer trailer, DataOutputStream out): Sets the file info offset in the trailer, finishes up populating fields in the file info, and writes the file info into the given data output.
- private void writeInlineBlocks(boolean closing): Gives inline block writers an opportunity to contribute blocks.
-
Field Details
-
LOG
-
UNSET
-
UNIFIED_ENCODED_BLOCKSIZE_RATIO
If this feature is enabled, pre-calculate the encoded data size before the real encoding happens.
-
encodedBlockSizeLimit
Block size limit after encoding, used to unify the encoded block cache entry size. -
lastCell
The Cell previously appended. Becomes the last cell in the file. -
outputStream
FileSystem stream to write into. -
closeOutputStream
True if we opened the outputStream (and so will close it). -
fileInfo
A "file info" block: a key-value map of file-wide metadata. -
entryCount
Total # of key/value entries, i.e. how many times add() was called. -
totalKeyLength
Used for calculating the average key length. -
totalValueLength
Used for calculating the average value length. -
lenOfBiggestCell
Length of the biggest cell. -
keyOfBiggestCell
Key of the biggest cell. -
totalUncompressedBytes
Total uncompressed bytes, maybe calculate a compression ratio later. -
metaNames
Meta block names. -
metaData
Writables representing meta block data. -
firstCellInBlock
First cell in a block. This reference should be short-lived since we write hfiles in a burst. -
path
May be null if we were passed a stream. -
conf
-
cacheConf
Cache configuration for caching data on write. -
timeRangeTrackerForTiering
-
name
Name for this object used when logging or in toString. Is either the result of a toString on stream or else name of passed file Path. -
blockEncoder
The data block encoding which will be used. NoOpDataBlockEncoder.INSTANCE if there is no encoding. -
indexBlockEncoder
-
hFileContext
-
maxTagsLength
-
KEY_VALUE_VERSION
KeyValue version in FileInfo -
KEY_VALUE_VER_WITH_MEMSTORE
Version for KeyValue which includes memstore timestamp. -
-
inlineBlockWriters
Inline block writers for multi-level block index and compound Blooms. -
blockWriter
block writer -
dataBlockIndexWriter
-
metaBlockIndexWriter
-
firstDataBlockOffset
The offset of the first data block or -1 if the file is empty. -
lastDataBlockOffset
The offset of the last data block or 0 if the file is empty. -
lastCellOfPreviousBlock
The last (stop) Cell of the previous data block. This reference should be short-lived since we write hfiles in a burst. -
additionalLoadOnOpenData
Additional data items to be written to the "load-on-open" section. -
maxMemstoreTS
-
timeRangeTracker
-
earliestPutTs
-
-
Constructor Details
-
HFileWriterImpl
public HFileWriterImpl(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf, org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataOutputStream outputStream, HFileContext fileContext)
-
-
Method Details
-
setTimeRangeTrackerForTiering
-
appendFileInfo
Add to the file info. All added key/value pairs can be obtained using HFile.Reader.getHFileInfo(). - Specified by:
appendFileInfo in interface HFile.Writer - Parameters:
k - Key
v - Value
- Throws:
IOException - in case the key or the value are invalid
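A hedged illustration of the contract above, reusing the conf, fs, path and writer names from the earlier sketch and assuming the HFile.createReader(fs, path, cacheConf, primaryReplicaReader, conf) overload; the file-info key is made up. Pairs written with appendFileInfo become readable through HFile.Reader.getHFileInfo().

// Writer side: record a small piece of file-wide metadata before close().
writer.appendFileInfo(Bytes.toBytes("MY_APP_VERSION"), Bytes.toBytes("1.7")); // hypothetical key
writer.close();

// Reader side: the pair is retrievable from the file info block.
HFile.Reader reader = HFile.createReader(fs, path, new CacheConfig(conf), true, conf);
byte[] value = reader.getHFileInfo().get(Bytes.toBytes("MY_APP_VERSION"));
reader.close();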
-
writeFileInfo
protected final void writeFileInfo(FixedFileTrailer trailer, DataOutputStream out) throws IOException
Sets the file info offset in the trailer, finishes up populating fields in the file info, and writes the file info into the given data output. The reason the data output is not always outputStream is that we store file info as a block in version 2. - Parameters:
trailer - fixed file trailer
out - the data output to write the file info to
- Throws:
IOException
-
getPos
- Throws:
IOException
-
checkKey
Checks that the given Cell's key does not violate the key order. - Parameters:
cell - Cell whose key to check. - Returns:
- true if the key is a duplicate
- Throws:
IOException- if the key or the key order is wrong
-
getLexicalErrorMessage
-
checkValue
Checks the given value for validity. - Throws:
IOException
-
getPath
Returns Path or null if we were passed a stream rather than a Path. - Specified by:
getPath in interface HFile.Writer
-
toString
-
compressionByName
-
createOutputStream
protected static org.apache.hadoop.fs.FSDataOutputStream createOutputStream(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, InetSocketAddress[] favoredNodes) throws IOException
A helper method to create HFile output streams in constructors. - Throws:
IOException
-
finishInit
Additional initialization steps -
checkBlockBoundary
At a block boundary, writes all the inline blocks and opens a new block. - Throws:
IOException
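The boundary check is driven by the block size carried in the HFileContext; a hedged configuration sketch, assuming the HFileContextBuilder methods shown:

// With this context, checkBlockBoundary() finishes the current data block and
// flushes any ready inline blocks roughly every 64 KB of appended cell data.
HFileContext context = new HFileContextBuilder()
    .withBlockSize(64 * 1024)
    .withCompression(Compression.Algorithm.GZ)
    .build();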
-
finishBlock
Clean up the data block that is currently being written. - Throws:
IOException
-
getMidpoint
public static ExtendedCell getMidpoint(CellComparator comparator, ExtendedCell left, ExtendedCell right)
Try to return a Cell that falls between left and right but that is shorter, i.e. takes up less space. This trick is used when building the HFile block index. It's an optimization; it does not always work, and in that case we'll just return the right cell. - Returns:
- A cell that sorts between left and right.
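An illustrative, hedged call (row values made up; KeyValue, Bytes and CellComparatorImpl assumed imported) showing the kind of shortening the optimization aims for:

// Two adjacent cells whose rows share a prefix.
ExtendedCell left = new KeyValue(Bytes.toBytes("the quick brown fox"),
    Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
ExtendedCell right = new KeyValue(Bytes.toBytes("the who"),
    Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));

// The result sorts between left and right; when the optimization applies, its row
// is shorter than right's (for these rows, something like "the r"), which keeps
// block index entries small. If no shorter key can be found, right is returned.
ExtendedCell mid = HFileWriterImpl.getMidpoint(CellComparatorImpl.COMPARATOR, left, right);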
-
getMinimumMidpointArray
private static byte[] getMinimumMidpointArray(byte[] leftArray, int leftOffset, int leftLength, byte[] rightArray, int rightOffset, int rightLength)
Try to get a byte array that falls between left and right, as short as possible, in lexicographical order. - Returns:
- A new array that is between left and right and minimally sized, or null if left == right.
-
getMinimumMidpointArray
private static byte[] getMinimumMidpointArray(ByteBuffer left, int leftOffset, int leftLength, ByteBuffer right, int rightOffset, int rightLength)
Try to create a new byte array that falls between left and right, as short as possible, in lexicographical order. - Returns:
- A new array that is between left and right and minimally sized, or null if left == right.
-
writeInlineBlocks
Gives inline block writers an opportunity to contribute blocks. - Throws:
IOException
-
doCacheOnWrite
Caches the last written HFile block. - Parameters:
offset - the offset of the block we want to cache. Used to determine the cache key.
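Cache-on-write is controlled through CacheConfig rather than by calling this private method directly; a hedged sketch, assuming the CacheConfig.CACHE_BLOCKS_ON_WRITE_KEY setting and a block cache wired into the CacheConfig passed to the writer:

Configuration conf = HBaseConfiguration.create();
conf.setBoolean(CacheConfig.CACHE_BLOCKS_ON_WRITE_KEY, true);
CacheConfig cacheConf = new CacheConfig(conf);
// A writer constructed with this cacheConf (and an actual BlockCache available)
// caches each finished data block via doCacheOnWrite() as it is written.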
-
buildCacheBlockKey
-
shouldCacheBlock
-
newBlock
Ready a new block for writing. - Throws:
IOException
-
appendMetaBlock
Add a meta block to the end of the file. Call before close(). Metadata blocks are expensive. Fill one with a bunch of serialized data rather than do a metadata block per metadata instance. If metadata is small, consider adding to file info using appendFileInfo(byte[], byte[]). - Parameters:
metaBlockName - name of the block
content - will call readFields to get data later (DO NOT REUSE)
- Specified by:
appendMetaBlock in interface HFile.Writer
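A hedged sketch of the trade-off described above; the block name and payload are made up, and org.apache.hadoop.io.Text stands in for any Writable:

// Larger, structured metadata: write it once as a named meta block before close().
// The Writable's write()/readFields() pair round-trips the content.
writer.appendMetaBlock("MY_META", new org.apache.hadoop.io.Text("bulk metadata payload"));

// Small metadata: prefer the file info map over a dedicated meta block.
writer.appendFileInfo(Bytes.toBytes("SMALL_KEY"), Bytes.toBytes("small value"));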
-
close
- Specified by:
close in interface AutoCloseable - Specified by:
close in interface Closeable - Throws:
IOException
-
addInlineBlockWriter
Description copied from interface: HFile.Writer
Adds an inline block writer such as a multi-level block index writer or a compound Bloom filter writer. - Specified by:
addInlineBlockWriter in interface HFile.Writer
-
addGeneralBloomFilter
Description copied from interface: HFile.Writer
Store general Bloom filter in the file. This does not deal with Bloom filter internals but is necessary, since Bloom filters are stored differently in HFile version 1 and version 2. - Specified by:
addGeneralBloomFilter in interface HFile.Writer
-
addDeleteFamilyBloomFilter
Description copied from interface: HFile.Writer
Store delete family Bloom filter in the file, which is only supported in HFile V2. - Specified by:
addDeleteFamilyBloomFilter in interface HFile.Writer
-
addBloomFilter
-
getFileContext
Description copied from interface: HFile.Writer
Return the file context for the HFile this writer belongs to. - Specified by:
getFileContext in interface HFile.Writer
-
append
Add key/value to file. Keys must be added in an order that agrees with the Comparator passed on construction. - Specified by:
append in interface CellSink - Parameters:
cell - the cell to be added. Cannot be empty nor null.
- Throws:
IOException
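A hedged sketch of the ordering contract (fam, qual and val are assumed byte[] constants):

writer.append(new KeyValue(Bytes.toBytes("row-a"), fam, qual, val));
writer.append(new KeyValue(Bytes.toBytes("row-b"), fam, qual, val)); // ok: sorts after row-a

// Appending a cell that sorts before the previous one violates the key order and
// is rejected by checkKey() with an IOException:
// writer.append(new KeyValue(Bytes.toBytes("row-0"), fam, qual, val));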
-
beforeShipped
Description copied from interface: ShipperListener
The action that needs to be performed before Shipper.shipped() is performed. - Specified by:
beforeShipped in interface ShipperListener - Throws:
IOException
-
getLastCell
-
finishFileInfo
- Throws:
IOException
-
getMajorVersion
-
getMinorVersion
-
finishClose
- Throws:
IOException
-
appendTrackedTimestampsToMetadata
Add TimestampRange and earliest put timestamp to Metadata. - Specified by:
appendTrackedTimestampsToMetadata in interface HFile.Writer - Throws:
IOException
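A hedged sketch of where this call sits in the write flow:

// append() has been tracking cell timestamps via trackTimestamps(); before closing,
// persist the tracked TimestampRange and earliest put timestamp into the file info.
writer.appendTrackedTimestampsToMetadata();
writer.close();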
-
appendCustomCellTimestampsToMetadata
public void appendCustomCellTimestampsToMetadata(TimeRangeTracker timeRangeTracker) throws IOException
Description copied from interface: HFile.Writer
Add custom cell timestamp to Metadata. - Specified by:
appendCustomCellTimestampsToMetadata in interface HFile.Writer - Throws:
IOException
-
trackTimestamps
Record the earliest Put timestamp. If the timeRangeTracker is not set, update the TimeRangeTracker to include the timestamp of this key.
-