Package org.apache.hadoop.hbase.io.hfile
Class HFileBlock.Writer
java.lang.Object
org.apache.hadoop.hbase.io.hfile.HFileBlock.Writer
- All Implemented Interfaces:
ShipperListener
- Enclosing class:
- HFileBlock
Unified version 2 HFile block writer. The intended usage pattern is as follows:
- Construct an HFileBlock.Writer, providing a compression algorithm.
- Call startWriting(org.apache.hadoop.hbase.io.hfile.BlockType) and get a data stream to write to.
- Write your data into the stream.
- Call Writer#writeHeaderAndData(FSDataOutputStream) as many times as you need to store the serialized block into an external stream.
- Repeat to write more blocks.
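The usage pattern above can be sketched with a small stand-in class. This is a simplified illustration only, not the HBase implementation: the class name SketchBlockWriter, its State enum, the 4-byte length-only header, and the demo() helper are all assumptions made for the example, and compression, encoding, and checksums are left out entirely.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Toy stand-in for the usage protocol; NOT the HBase implementation. */
class SketchBlockWriter {
  enum State { INIT, WRITING, BLOCK_READY }

  private State state = State.INIT;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private DataOutputStream userStream;
  private byte[] finishedBlock; // toy header + data, set once the block is ready

  /** Starts a new block and returns the stream to write into; old data is discarded. */
  DataOutputStream startWriting() {
    buffer.reset();
    finishedBlock = null;
    userStream = new DataOutputStream(buffer);
    state = State.WRITING;
    return userStream;
  }

  /** WRITING -> BLOCK_READY; does nothing if the block is already finished. */
  void ensureBlockReady() throws IOException {
    if (state == State.BLOCK_READY) {
      return;
    }
    if (state != State.WRITING) {
      throw new IllegalStateException("Expected WRITING but was " + state);
    }
    userStream.flush();
    byte[] data = buffer.toByteArray();
    ByteArrayOutputStream block = new ByteArrayOutputStream();
    DataOutputStream blockOut = new DataOutputStream(block);
    blockOut.writeInt(data.length); // assumed toy header: payload length only
    blockOut.write(data);
    blockOut.flush();
    finishedBlock = block.toByteArray();
    state = State.BLOCK_READY;
  }

  /** Writes the finished block to out; may be called repeatedly for the same block. */
  void writeHeaderAndData(OutputStream out) throws IOException {
    ensureBlockReady();
    out.write(finishedBlock);
  }

  boolean isWriting() {
    return state == State.WRITING;
  }

  /** Runs the usage steps for two blocks and returns the bytes sent to the external stream. */
  static byte[] demo() {
    try {
      SketchBlockWriter writer = new SketchBlockWriter();
      ByteArrayOutputStream external = new ByteArrayOutputStream();
      for (long value = 1; value <= 2; value++) {
        DataOutputStream stream = writer.startWriting(); // start a block
        stream.writeLong(value);                         // write data into it
        writer.writeHeaderAndData(external);             // store the serialized block
      }
      return external.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("bytes written: " + demo().length);
  }
}
```

The point of the sketch is the ordering constraint: data can only be written between startWriting and the first call that finishes the block, and the finished bytes stay available until the next startWriting.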
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| private final ByteBuffAllocator | allocator | |
| private ByteArrayOutputStream | baosInMemory | The stream we use to accumulate data into a block in an uncompressed format. |
| private BlockType | blockType | Current block type. |
| private BlockCompressedSizePredicator | compressedSizePredicator | |
| private final HFileDataBlockEncoder | dataBlockEncoder | Data block encoder used for data blocks |
| private HFileBlockEncodingContext | dataBlockEncodingCtx | |
| private HFileBlockDefaultEncodingContext | defaultBlockEncodingCtx | block encoding context for non-data blocks |
| private HFileContext | fileContext | Meta data that holds information about the hfileblock |
| private int | maxSizeUnCompressed | |
| private ByteArrayOutputStream | onDiskBlockBytesWithHeader | Bytes to be written to the file system, including the header. |
| private byte[] | onDiskChecksum | The size of the checksum data on disk. |
| private long | prevOffset | The offset of the previous block of the same type |
| private long[] | prevOffsetByType | Offset of previous block by block type. |
| private long | startOffset | Current block's start offset in the HFile. |
| private HFileBlock.Writer.State | state | Writer state. |
| private DataOutputStream | userDataStream | A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory. |
-
Constructor Summary
| Constructor | Description |
| --- | --- |
| Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext) | |
| Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext, ByteBuffAllocator allocator, int maxSizeUnCompressed) | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | beforeShipped | The action that needs to be performed before Shipper.shipped() is performed |
| int | blockSizeWritten | Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. |
| boolean | checkBoundariesWithPredicate | |
| private ByteBuff | cloneOnDiskBufferWithHeader | Clones the header followed by the on-disk (compressed/encoded/encrypted) data. |
| (package private) ByteBuff | cloneUncompressedBufferWithHeader | Clones the header followed by the uncompressed data, even if using compression. |
| int | encodedBlockSizeWritten | Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. |
| (package private) void | ensureBlockReady | Transitions the block writer from the "writing" state to the "block ready" state. |
| private void | expectState(HFileBlock.Writer.State expectedState) | |
| private void | finishBlock | Finish up writing of the block. |
| protected void | finishBlockAndWriteHeaderAndData | Writes the header and the compressed data of this block (or uncompressed data when not using compression) into the given stream. |
| (package private) HFileBlock | getBlockForCaching(CacheConfig cacheConf) | Creates a new HFileBlock. |
| (package private) EncodingState | getEncodingState | |
| (package private) byte[] | getHeaderAndDataForTest | Returns the header and the compressed data (or uncompressed data when not using compression) as a byte array. |
| (package private) int | getOnDiskSizeWithHeader() | Returns the on-disk size of the block. |
| (package private) int | getOnDiskSizeWithoutHeader | Returns the on-disk size of the data portion of the block. |
| int | getUncompressedSizeWithHeader | The uncompressed size of the block data, including header size. |
| (package private) int | getUncompressedSizeWithoutHeader | The uncompressed size of the block data. |
| (package private) boolean | isWriting() | Returns true if a block is being written |
| private void | putHeader(byte[] dest, int offset, int onDiskSize, int uncompressedSize, int onDiskDataSize) | Put the header into the given byte array at the given offset. |
| private void | putHeader(ByteArrayOutputStream dest, int onDiskSize, int uncompressedSize, int onDiskDataSize) | |
| private void | putHeader | |
| (package private) void | release() | Releases resources used by this writer. |
| (package private) DataOutputStream | startWriting(BlockType newBlockType) | Starts writing into the block. |
| (package private) void | write(ExtendedCell cell) | Writes the Cell to this block |
| (package private) void | writeBlock(HFileBlock.BlockWritable bw, org.apache.hadoop.fs.FSDataOutputStream out) | Takes the given HFileBlock.BlockWritable instance, creates a new block of its appropriate type, writes the writable into this block, and flushes the block into the output stream. |
| (package private) void | writeHeaderAndData(org.apache.hadoop.fs.FSDataOutputStream out) | Similar to finishBlockAndWriteHeaderAndData, but records the offset of this block so that it can be referenced in the next block of the same type. |
-
Field Details
-
maxSizeUnCompressed
-
compressedSizePredicator
-
state
Writer state. Used to ensure the correct usage protocol. -
dataBlockEncoder
Data block encoder used for data blocks -
dataBlockEncodingCtx
-
defaultBlockEncodingCtx
block encoding context for non-data blocks -
baosInMemory
The stream we use to accumulate data into a block in an uncompressed format. We reset this stream at the end of each block and reuse it. The header is written as the first HConstants.HFILEBLOCK_HEADER_SIZE bytes into this stream.
-
blockType
Current block type. Set in startWriting(BlockType). Could be changed in finishBlock() from BlockType.DATA to BlockType.ENCODED_DATA.
-
userDataStream
A stream that we write uncompressed bytes to, which compresses them and writes them to baosInMemory.
-
onDiskBlockBytesWithHeader
Bytes to be written to the file system, including the header. Compressed if compression is turned on. It also includes the checksum data that immediately follows the block data. (header + data + checksums) -
onDiskChecksum
The size of the checksum data on disk. It is used only if data is not compressed. If data is compressed, then the checksums are already part of onDiskBlockBytesWithHeader. If data is uncompressed, then this variable stores the checksum data for this block.
-
startOffset
Current block's start offset in the HFile. Set in writeHeaderAndData(FSDataOutputStream).
-
prevOffsetByType
Offset of previous block by block type. Updated when the next block is started. -
prevOffset
The offset of the previous block of the same type -
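The relationship between startOffset, prevOffsetByType, and prevOffset can be sketched as follows. This is a minimal illustration under stated assumptions, not the HBase code: the Kind enum is a hypothetical subset of block types, UNSET is an assumed sentinel for "no previous block", and the method names startBlock and blockWrittenAt are invented for the example.

```java
import java.util.Arrays;

/** Toy tracker for "offset of the previous block of the same type"; NOT the HBase code. */
class PrevOffsetTracker {
  /** Hypothetical subset of block types, for illustration only. */
  enum Kind { DATA, LEAF_INDEX, BLOOM_CHUNK }

  static final long UNSET = -1L; // assumed sentinel meaning "no previous block"

  private final long[] prevOffsetByType = new long[Kind.values().length];
  private long startOffset = UNSET;
  private Kind current;

  PrevOffsetTracker() {
    Arrays.fill(prevOffsetByType, UNSET);
  }

  /**
   * Called when the next block is started. Records where the block that was just
   * written began, then returns the offset of the previous block of the new type.
   */
  long startBlock(Kind next) {
    if (current != null && startOffset != UNSET) {
      prevOffsetByType[current.ordinal()] = startOffset;
    }
    current = next;
    startOffset = UNSET;
    return prevOffsetByType[next.ordinal()];
  }

  /** Called when the current block is written out at the given file position. */
  void blockWrittenAt(long offset) {
    startOffset = offset;
  }

  public static void main(String[] args) {
    PrevOffsetTracker tracker = new PrevOffsetTracker();
    tracker.startBlock(Kind.DATA);
    tracker.blockWrittenAt(0L);
    tracker.startBlock(Kind.LEAF_INDEX);
    tracker.blockWrittenAt(100L);
    System.out.println("previous DATA block at " + tracker.startBlock(Kind.DATA));
  }
}
```

Keeping one slot per block type is what lets each block carry a back-pointer to the previous block of its own type, even when blocks of other types were written in between.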
fileContext
Meta data that holds information about the hfileblock -
allocator
-
-
Constructor Details
-
Writer
public Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext)
- Parameters:
dataBlockEncoder
- data block encoding algorithm to use
-
Writer
public Writer(org.apache.hadoop.conf.Configuration conf, HFileDataBlockEncoder dataBlockEncoder, HFileContext fileContext, ByteBuffAllocator allocator, int maxSizeUnCompressed)
-
-
Method Details
-
beforeShipped
Description copied from interface: ShipperListener
The action that needs to be performed before Shipper.shipped() is performed
- Specified by:
beforeShipped in interface ShipperListener
-
getEncodingState
-
startWriting
Starts writing into the block. The previous block's data is discarded.
- Returns:
- the stream the user can write their data into
- Throws:
IOException
-
write
Writes the Cell to this block
- Throws:
IOException
-
ensureBlockReady
Transitions the block writer from the "writing" state to the "block ready" state. Does nothing if a block is already finished.
- Throws:
IOException
-
checkBoundariesWithPredicate
-
finishBlock
Finish up writing of the block. Flushes the compressing stream (if using compression), fills out the header, does any compression/encryption of bytes to flush out to disk, and manages the cache on write content, if applicable. Sets block write state to "block ready".
- Throws:
IOException
-
putHeader
private void putHeader(byte[] dest, int offset, int onDiskSize, int uncompressedSize, int onDiskDataSize)
Put the header into the given byte array at the given offset.
- Parameters:
onDiskSize
- size of the block on disk (header + data + checksum)
uncompressedSize
- size of the block after decompression (but before optional data block decoding), including header
onDiskDataSize
- size of the block on disk with header and data but not including the checksums
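The three size parameters differ only in which parts of the block they count. The arithmetic can be sketched as below. BlockHeaderSizes and its factory method are hypothetical names for illustration, not HBase classes, and the numbers used in main are arbitrary example values rather than real header or checksum sizes.

```java
/** Hypothetical holder for the three size parameters of putHeader; not an HBase class. */
final class BlockHeaderSizes {
  final int onDiskSize;       // header + on-disk data + checksum
  final int uncompressedSize; // header + data after decompression
  final int onDiskDataSize;   // header + on-disk data, checksums excluded

  private BlockHeaderSizes(int onDiskSize, int uncompressedSize, int onDiskDataSize) {
    this.onDiskSize = onDiskSize;
    this.uncompressedSize = uncompressedSize;
    this.onDiskDataSize = onDiskDataSize;
  }

  /** Derives the three sizes from the byte counts of the individual parts. */
  static BlockHeaderSizes of(int headerBytes, int onDiskDataBytes,
      int uncompressedDataBytes, int checksumBytes) {
    if (headerBytes < 0 || onDiskDataBytes < 0
        || uncompressedDataBytes < 0 || checksumBytes < 0) {
      throw new IllegalArgumentException("byte counts must be non-negative");
    }
    int onDiskDataSize = headerBytes + onDiskDataBytes;
    int onDiskSize = onDiskDataSize + checksumBytes;
    int uncompressedSize = headerBytes + uncompressedDataBytes;
    return new BlockHeaderSizes(onDiskSize, uncompressedSize, onDiskDataSize);
  }

  public static void main(String[] args) {
    // Arbitrary example values: 33-byte header, 1000 bytes on disk,
    // 4096 bytes uncompressed, 16 bytes of checksums.
    BlockHeaderSizes sizes = of(33, 1000, 4096, 16);
    System.out.println(sizes.onDiskSize + " " + sizes.uncompressedSize
        + " " + sizes.onDiskDataSize);
  }
}
```

Note that onDiskSize and onDiskDataSize always differ by exactly the checksum length, which is why a reader can locate the checksums from the header alone.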
-
putHeader
-
putHeader
private void putHeader(ByteArrayOutputStream dest, int onDiskSize, int uncompressedSize, int onDiskDataSize) -
writeHeaderAndData
Similar to finishBlockAndWriteHeaderAndData, but records the offset of this block so that it can be referenced in the next block of the same type.
- Throws:
IOException
-
finishBlockAndWriteHeaderAndData
Writes the header and the compressed data of this block (or uncompressed data when not using compression) into the given stream. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state.
- Parameters:
out
- the output stream to write the block to
- Throws:
IOException
-
getHeaderAndDataForTest
Returns the header and the compressed data (or uncompressed data when not using compression) as a byte array. Can be called in the "writing" state or in the "block ready" state. If called in the "writing" state, transitions the writer to the "block ready" state. This returns the header + data + checksums stored on disk.
- Returns:
- header and data as they would be stored on disk in a byte array
- Throws:
IOException
-
release
void release()
Releases resources used by this writer.
-
getOnDiskSizeWithoutHeader
Returns the on-disk size of the data portion of the block. This is the compressed size if compression is enabled. Can only be called in the "block ready" state. Header is not compressed, and its size is not included in the return value.
- Returns:
- the on-disk size of the block, not including the header.
-
getOnDiskSizeWithHeader
int getOnDiskSizeWithHeader()
Returns the on-disk size of the block. Can only be called in the "block ready" state.
- Returns:
- the on-disk size of the block ready to be written, including the header size, the data and the checksum data.
-
getUncompressedSizeWithoutHeader
The uncompressed size of the block data. Does not include header size. -
getUncompressedSizeWithHeader
The uncompressed size of the block data, including header size. -
isWriting
boolean isWriting()
Returns true if a block is being written
-
encodedBlockSizeWritten
Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
- Returns:
- the number of bytes written
-
blockSizeWritten
Returns the number of bytes written into the current block so far, or zero if not writing the block at the moment. Note that this will return zero in the "block ready" state as well.
- Returns:
- the number of bytes written
-
cloneUncompressedBufferWithHeader
Clones the header followed by the uncompressed data, even if using compression. This is needed for storing uncompressed blocks in the block cache. Can be called in the "writing" state or the "block ready" state. Returns only the header and data, does not include checksum data.
- Returns:
- Returns an uncompressed block ByteBuff for caching on write
-
cloneOnDiskBufferWithHeader
Clones the header followed by the on-disk (compressed/encoded/encrypted) data. This is needed for storing packed blocks in the block cache. Returns only the header and data, does not include checksum data.
- Returns:
- Returns a copy of block bytes for caching on write
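Both clone methods return the header and data but leave the checksum out. A minimal sketch of that idea, assuming for illustration that the block sits in one byte array laid out as header, then data, then checksums (the class and method names here are hypothetical, and the real writer works with ByteBuff instances rather than raw arrays):

```java
import java.util.Arrays;

/** Toy illustration of copying header + data without the trailing checksums. */
class HeaderAndDataCopy {
  /**
   * Returns an independent copy of the leading header + data bytes.
   * Assumes the layout header | data | checksums in a single array.
   */
  static byte[] cloneWithoutChecksum(byte[] blockBytes, int checksumLength) {
    if (checksumLength < 0 || checksumLength > blockBytes.length) {
      throw new IllegalArgumentException("invalid checksum length: " + checksumLength);
    }
    return Arrays.copyOf(blockBytes, blockBytes.length - checksumLength);
  }

  public static void main(String[] args) {
    byte[] block = {1, 2, 3, 4, 9, 9}; // last two bytes stand in for checksums
    byte[] copy = cloneWithoutChecksum(block, 2);
    System.out.println(Arrays.toString(copy));
  }
}
```

Because the result is a copy, a cache can hold on to it after the writer resets its internal buffers for the next block.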
-
expectState
-
writeBlock
void writeBlock(HFileBlock.BlockWritable bw, org.apache.hadoop.fs.FSDataOutputStream out) throws IOException
Takes the given HFileBlock.BlockWritable instance, creates a new block of its appropriate type, writes the writable into this block, and flushes the block into the output stream. The writer is instructed not to buffer uncompressed bytes for cache-on-write.
- Parameters:
bw
- the block-writable object to write as a block
out
- the file system output stream
- Throws:
IOException
-
getBlockForCaching
Creates a new HFileBlock. Checksums have already been validated, so the byte buffer passed into the constructor of this newly created block does not have checksum data even though the header minor version is MINOR_VERSION_WITH_CHECKSUM. This is indicated by setting a 0 value in bytesPerChecksum. This method copies the on-disk or uncompressed data to build the HFileBlock which is used only while writing blocks and caching.
TODO: Should there be an option where a cache can ask that HBase preserve block checksums for checking after a block comes out of the cache? Otherwise, the cache is responsible for blocks being wholesome (ECC memory or, if file-backed, it does checksumming).
-