@InterfaceAudience.Private public class HFileBlock extends Object implements Cacheable
Cacheable blocks of an HFile version 2 file. Version 2 was introduced in hbase-0.92.0.
Version 1 was the original file block. Version 2 was introduced when we changed the hbase file format to support multi-level block indexes and compound bloom filters (HBASE-3857). Support for Version 1 was removed in hbase-1.3.0.
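A block starts with a header whose first field is the BlockType magic (8 bytes), e.g. DATABLK*.
The block data follows the header. If compression is NONE, this is just raw, serialized Cells.
For how blocks are cached, see Cacheable.serialize(ByteBuffer, boolean) and Cacheable.getDeserializer().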
TODO: Should we cache the checksums? Down in Writer#getBlockForCaching(CacheConfig) where we make a block to cache-on-write, there is an attempt at turning off checksums. This is not the only place we get blocks to cache. We will also cache the raw return from an hdfs read. In this case, the checksums may be present. If the cache is backed by something that doesn't do ECC, say an SSD, we might want to preserve checksums. For now this is an open question.
TODO: Over in BucketCache, we save a block allocation by doing a custom serialization. Be sure to change it if serialization changes in here. Could we add a method here that takes an IOEngine and that then serializes to it rather than expose our internals over in BucketCache? IOEngine is in the bucket subpackage. Pull it up? Then this class knows about bucketcache. Ugh.
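The header description above can be illustrated with a small decoder. This is a minimal sketch, not HBase code: the class `BlockHeaderSketch` and its methods are hypothetical, and the order and widths of the fields after the 8-byte magic (two ints, then one long, big-endian) are an assumption inferred from the constructor parameters and the "first 4 header fields" mentioned for overwriteHeader().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of HBase): decodes the leading fields of a block header. */
final class BlockHeaderSketch {
  static final int MAGIC_LENGTH = 8; // BlockType magic, e.g. "DATABLK*"

  /** Assumed layout of the first four header fields. */
  static final class Header {
    final String magic;
    final int onDiskSizeWithoutHeader;
    final int uncompressedSizeWithoutHeader;
    final long prevBlockOffset;

    Header(String magic, int onDisk, int uncompressed, long prev) {
      this.magic = magic;
      this.onDiskSizeWithoutHeader = onDisk;
      this.uncompressedSizeWithoutHeader = uncompressed;
      this.prevBlockOffset = prev;
    }
  }

  /** Parses via a duplicate so the caller's position and limit stay untouched. */
  static Header parse(ByteBuffer buf) {
    ByteBuffer dup = buf.duplicate();
    byte[] magic = new byte[MAGIC_LENGTH];
    dup.get(magic);
    int onDisk = dup.getInt();
    int uncompressed = dup.getInt();
    long prev = dup.getLong();
    return new Header(new String(magic, StandardCharsets.US_ASCII), onDisk, uncompressed, prev);
  }

  /** Builds an illustrative header with made-up sizes. */
  static ByteBuffer sample() {
    ByteBuffer b = ByteBuffer.allocate(MAGIC_LENGTH + 4 + 4 + 8);
    b.put("DATABLK*".getBytes(StandardCharsets.US_ASCII));
    b.putInt(4096).putInt(8192).putLong(-1L);
    b.flip();
    return b;
  }

  public static void main(String[] args) {
    Header h = parse(sample());
    System.out.println(h.magic + " onDisk=" + h.onDiskSizeWithoutHeader
        + " uncompressed=" + h.uncompressedSizeWithoutHeader
        + " prev=" + h.prevBlockOffset);
  }
}
```

The sizes and the previous-block offset in `sample()` are invented values for the demonstration only.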
Modifier and Type | Class and Description |
---|---|
(package private) static interface | HFileBlock.BlockIterator: Iterator for HFileBlocks. |
(package private) static interface | HFileBlock.BlockWritable: Something that can be written into a block. |
(package private) static interface | HFileBlock.FSReader: An HFile block reader with iteration ability. |
(package private) static class | HFileBlock.FSReaderImpl: Reads version 2 HFile blocks from the filesystem. |
(package private) static class | HFileBlock.Header |
private static class | HFileBlock.PrefetchedHeader: Data structure used for caching the header of the NEXT block. |
(package private) static class | HFileBlock.Writer: Unified version 2 HFile block writer. |
Nested classes/interfaces inherited from interface Cacheable: Cacheable.MemoryType
Modifier and Type | Field and Description |
---|---|
(package private) static CacheableDeserializer<Cacheable> | BLOCK_DESERIALIZER: Used for deserializing blocks from the cache. |
(package private) static int | BLOCK_METADATA_SPACE: Space for metadata on a block that gets stored along with the block when we cache it. |
private BlockType | blockType: Type of block. |
private ByteBuff | buf: The in-memory representation of the hfile block. |
(package private) static int | CHECKSUM_SIZE: Each checksum value is an integer that can be stored in 4 bytes. |
(package private) static int | CHECKSUM_VERIFICATION_NUM_IO_THRESHOLD: On a checksum failure, do this many succeeding read requests using hdfs checksums before auto-reenabling hbase checksum verification. |
private static int | DESERIALIZER_IDENTIFIER |
static boolean | DONT_FILL_HEADER |
(package private) static byte[] | DUMMY_HEADER_NO_CHECKSUM |
private HFileContext | fileContext: Metadata that holds information about the hfileblock. |
static boolean | FILL_HEADER |
private static org.slf4j.Logger | LOG |
private Cacheable.MemoryType | memType |
static int | MULTI_BYTE_BUFFER_HEAP_SIZE |
private int | nextBlockOnDiskSize: The on-disk size of the next block, including the header and checksums if present. |
private long | offset: The offset of this block in the file. |
private int | onDiskDataSizeWithHeader: Size on disk of header + data. |
private int | onDiskSizeWithoutHeader: Size on disk excluding header, including checksum. |
private long | prevBlockOffset: The offset of the previous block on disk. |
private int | uncompressedSizeWithoutHeader: Size of pure data. |
private static int | UNSET |
Modifier | Constructor and Description |
---|---|
 | HFileBlock(BlockType blockType, int onDiskSizeWithoutHeader, int uncompressedSizeWithoutHeader, long prevBlockOffset, ByteBuffer b, boolean fillHeader, long offset, int nextBlockOnDiskSize, int onDiskDataSizeWithHeader, HFileContext fileContext): Creates a new HFile block from the given fields. |
(package private) | HFileBlock(ByteBuff buf, boolean usesHBaseChecksum, Cacheable.MemoryType memType, long offset, int nextBlockOnDiskSize, HFileContext fileContext): Creates a block from an existing buffer starting with a header. |
private | HFileBlock(HFileBlock that): Copy constructor. |
private | HFileBlock(HFileBlock that, boolean bufCopy): Copy constructor. |
Modifier and Type | Method and Description |
---|---|
private ByteBuffer | addMetaData(ByteBuffer destination, boolean includeNextBlockMetadata): Adds metadata at current position (position is moved forward). |
private void | allocateBuffer(): Always allocates a new buffer of the correct size. |
HFileBlock | deepClone() |
boolean | equals(Object comparison) |
BlockType | getBlockType() |
ByteBuff | getBufferReadOnly(): Returns a read-only duplicate of the buffer this block stores internally ready to be read. |
ByteBuff | getBufferWithoutHeader(): Returns a buffer that does not include the header or checksum. |
(package private) int | getBytesPerChecksum() |
(package private) DataInputStream | getByteStream() |
(package private) byte | getChecksumType() |
(package private) DataBlockEncoding | getDataBlockEncoding() |
(package private) short | getDataBlockEncodingId() |
CacheableDeserializer<Cacheable> | getDeserializer(): Returns CacheableDeserializer instance which reconstructs original object from ByteBuffer. |
(package private) byte[] | getDummyHeaderForVersion(): Returns the appropriate DUMMY_HEADER for the minor version. |
private static byte[] | getDummyHeaderForVersion(boolean usesHBaseChecksum): Returns the appropriate DUMMY_HEADER for the minor version. |
(package private) HFileContext | getHFileContext() |
Cacheable.MemoryType | getMemoryType() |
ByteBuffer | getMetaData(): For use by bucketcache. |
(package private) int | getNextBlockOnDiskSize() |
(package private) long | getOffset(): Cannot be UNSET. |
(package private) int | getOnDiskDataSizeWithHeader() |
int | getOnDiskSizeWithHeader() |
private static int | getOnDiskSizeWithHeader(ByteBuffer headerBuf, boolean verifyChecksum): Parse total on disk size including header and checksum. |
(package private) int | getOnDiskSizeWithoutHeader() |
(package private) long | getPrevBlockOffset() |
int | getSerializedLength(): Returns the length of the ByteBuffer required to serialize the object. |
(package private) int | getUncompressedSizeWithoutHeader() |
int | hashCode() |
int | headerSize(): Returns the size of this block header. |
static int | headerSize(boolean usesHBaseChecksum): Maps a minor version to the size of the header. |
long | heapSize() |
private void | init(BlockType blockType, int onDiskSizeWithoutHeader, int uncompressedSizeWithoutHeader, long prevBlockOffset, long offset, int onDiskDataSizeWithHeader, int nextBlockOnDiskSize, HFileContext fileContext): Called from constructors. |
boolean | isUnpacked(): Returns true when this block's buffer has been unpacked, false otherwise. |
private void | overwriteHeader(): Rewinds buf and writes first 4 header fields. |
(package private) static boolean | positionalReadWithExtra(org.apache.hadoop.fs.FSDataInputStream in, long position, byte[] buf, int bufOffset, int necessaryLen, int extraLen): Read from an input stream at least necessaryLen and, if possible, extraLen also if available. |
(package private) static boolean | readWithExtra(InputStream in, byte[] buf, int bufOffset, int necessaryLen, int extraLen): Read from an input stream at least necessaryLen and, if possible, extraLen also if available. |
(package private) void | sanityCheck(): Checks if the block is internally consistent. |
private void | sanityCheckAssertion(BlockType valueFromBuf, BlockType valueFromField) |
private void | sanityCheckAssertion(long valueFromBuf, long valueFromField, String fieldName) |
(package private) void | sanityCheckUncompressed(): An additional sanity-check in case no compression or encryption is being used. |
(package private) void | sanityCheckUncompressedSize(): An additional sanity-check in case no compression or encryption is being used. |
void | serialize(ByteBuffer destination, boolean includeNextBlockMetadata): Serializes its data into destination. |
String | toString() |
(package private) static String | toStringHeader(ByteBuff buf): Convert the contents of the block header into a human readable string. |
(package private) int | totalChecksumBytes(): Calculate the number of bytes required to store all the checksums for this block. |
(package private) HFileBlock | unpack(HFileContext fileContext, HFileBlock.FSReader reader): Retrieves the decompressed/decrypted view of this block. |
(package private) boolean | usesSharedMemory() |
private static final org.slf4j.Logger LOG
private int onDiskSizeWithoutHeader
private int uncompressedSizeWithoutHeader
private long prevBlockOffset
private int onDiskDataSizeWithHeader
Size on disk of header + data; can be calculated from onDiskSizeWithoutHeader when using HDFS checksum.
private ByteBuff buf
The in-memory representation of the hfile block. Be careful reading from this buf. Duplicate and work on the duplicate or, if not, be sure to reset position and limit, else there will be trouble down the road.
TODO: Make this read-only once made.
We are using the ByteBuff type. ByteBuffer is not extensible yet we need to be able to have a ByteBuffer-like API across multiple ByteBuffers reading from a cache such as BucketCache. So, we have this ByteBuff type. Unfortunately, it is spread all about HFileBlock. Would be good if could be confined to cache-use only but hard-to-do.
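The advice to duplicate before reading can be shown with a plain java.nio.ByteBuffer. This is a minimal sketch: ByteBuffer stands in for the HBase ByteBuff type here, and the class `DuplicateReadSketch` is hypothetical.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo: reading through a duplicate leaves the shared buffer's cursor alone. */
final class DuplicateReadSketch {
  /** Sums all remaining bytes using a duplicate, so shared keeps its position and limit. */
  static int sumViaDuplicate(ByteBuffer shared) {
    ByteBuffer dup = shared.duplicate(); // same content, independent position/limit
    int sum = 0;
    while (dup.hasRemaining()) {
      sum += dup.get();
    }
    return sum;
  }

  public static void main(String[] args) {
    ByteBuffer shared = ByteBuffer.wrap(new byte[] {1, 2, 3, 4});
    int sum = sumViaDuplicate(shared);
    System.out.println("sum=" + sum + " position=" + shared.position()
        + " remaining=" + shared.remaining());
  }
}
```

Reading directly from the shared buffer instead would advance its position, and a later reader would then start in the wrong place unless the position and limit were reset.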
private HFileContext fileContext
private long offset
private Cacheable.MemoryType memType
private int nextBlockOnDiskSize
static final int CHECKSUM_VERIFICATION_NUM_IO_THRESHOLD
private static int UNSET
public static final boolean FILL_HEADER
public static final boolean DONT_FILL_HEADER
public static final int MULTI_BYTE_BUFFER_HEAP_SIZE
static final int BLOCK_METADATA_SPACE
static final int CHECKSUM_SIZE
static final byte[] DUMMY_HEADER_NO_CHECKSUM
static final CacheableDeserializer<Cacheable> BLOCK_DESERIALIZER
++++++++++++++
+ HFileBlock +
++++++++++++++
+ Checksums + <= Optional
++++++++++++++
+ Metadata! + <= See note on BLOCK_METADATA_SPACE above.
++++++++++++++
See serialize(ByteBuffer, boolean).
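The layout in the diagram above (block bytes, optional checksums, then metadata) might be serialized along the following lines. This is a sketch under stated assumptions, not the HBase implementation: the class `BlockMetadataSketch` is hypothetical, and the metadata contents (a one-byte checksum flag, the 8-byte file offset, and the 4-byte next-block size) are assumed for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of appending cache metadata after the block bytes. */
final class BlockMetadataSketch {
  // Assumed metadata layout: 1 flag byte + 8-byte offset + 4-byte next-block size.
  static final int METADATA_SPACE = 1 + Long.BYTES + Integer.BYTES;

  /** Writes the block (with any trailing checksums) followed by the metadata. */
  static ByteBuffer serialize(byte[] blockWithChecksums, boolean usesHBaseChecksum,
      long offset, int nextBlockOnDiskSize) {
    ByteBuffer dest = ByteBuffer.allocate(blockWithChecksums.length + METADATA_SPACE);
    dest.put(blockWithChecksums);
    dest.put((byte) (usesHBaseChecksum ? 1 : 0)); // position moves forward with each put
    dest.putLong(offset);
    dest.putInt(nextBlockOnDiskSize);
    dest.flip();
    return dest;
  }

  /** Reads the file offset back from the trailing metadata using absolute gets. */
  static long readOffset(ByteBuffer serialized) {
    int metaStart = serialized.limit() - METADATA_SPACE;
    return serialized.getLong(metaStart + 1); // skip the flag byte
  }

  public static void main(String[] args) {
    ByteBuffer s = serialize(new byte[10], true, 1234L, 77);
    System.out.println("length=" + s.limit() + " offset=" + readOffset(s));
  }
}
```

Keeping the offset in the metadata matches the note on getOffset(), which says the offset is used when re-making the BlockCacheKey.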
private static final int DESERIALIZER_IDENTIFIER
private HFileBlock(HFileBlock that)
Copy constructor. Creates a shallow copy of that's buffer.
private HFileBlock(HFileBlock that, boolean bufCopy)
Copy constructor. Creates a shallow or deep copy of that's buffer as per the boolean param.
public HFileBlock(BlockType blockType, int onDiskSizeWithoutHeader, int uncompressedSizeWithoutHeader, long prevBlockOffset, ByteBuffer b, boolean fillHeader, long offset, int nextBlockOnDiskSize, int onDiskDataSizeWithHeader, HFileContext fileContext)
Creates a new HFile block from the given fields. This constructor is used only while writing blocks and caching, when the block is sitting in a byte buffer and we want to stuff it into the cache. See HFileBlock.Writer.getBlockForCaching(CacheConfig).
TODO: The caller presumes no checksumming required of this block instance since going into cache; checksum already verified on underlying block data pulled in from filesystem. Is that correct? What if cache is SSD?
Parameters:
blockType - the type of this block, see BlockType
onDiskSizeWithoutHeader - see onDiskSizeWithoutHeader
uncompressedSizeWithoutHeader - see uncompressedSizeWithoutHeader
prevBlockOffset - see prevBlockOffset
b - block header (HConstants.HFILEBLOCK_HEADER_SIZE bytes)
fillHeader - when true, write the first 4 header fields into passed buffer.
offset - the file offset the block was read from
onDiskDataSizeWithHeader - see onDiskDataSizeWithHeader
fileContext - HFile meta data
HFileBlock(ByteBuff buf, boolean usesHBaseChecksum, Cacheable.MemoryType memType, long offset, int nextBlockOnDiskSize, HFileContext fileContext) throws IOException
Creates a block from an existing buffer starting with a header.
Parameters:
buf - Has header, content, and trailing checksums if present.
Throws: IOException
private void init(BlockType blockType, int onDiskSizeWithoutHeader, int uncompressedSizeWithoutHeader, long prevBlockOffset, long offset, int onDiskDataSizeWithHeader, int nextBlockOnDiskSize, HFileContext fileContext)
private static int getOnDiskSizeWithHeader(ByteBuffer headerBuf, boolean verifyChecksum)
Parse total on disk size including header and checksum.
Parameters:
headerBuf - Header ByteBuffer. Presumed exact size of header.
verifyChecksum - true if checksum verification is in use.
int getNextBlockOnDiskSize()
public BlockType getBlockType()
Specified by: getBlockType in interface Cacheable
short getDataBlockEncodingId()
public int getOnDiskSizeWithHeader()
int getOnDiskSizeWithoutHeader()
int getUncompressedSizeWithoutHeader()
long getPrevBlockOffset()
private void overwriteHeader()
Rewinds buf and writes first 4 header fields. buf position is modified as side-effect.
public ByteBuff getBufferWithoutHeader()
Returns a buffer that does not include the header or checksum.
public ByteBuff getBufferReadOnly()
Returns a read-only duplicate of the buffer this block stores internally ready to be read. Used in CompoundBloomFilter to avoid object creation on every Bloom filter lookup, but has to be used with caution. Buffer holds header, block content, and any follow-on checksums if present.
private void sanityCheckAssertion(long valueFromBuf, long valueFromField, String fieldName) throws IOException
Throws: IOException
private void sanityCheckAssertion(BlockType valueFromBuf, BlockType valueFromField) throws IOException
Throws: IOException
void sanityCheck() throws IOException
Checks if the block is internally consistent, i.e. the first HConstants.HFILEBLOCK_HEADER_SIZE bytes of the buffer contain a valid header consistent with the fields. Assumes a packed block structure. This function is primarily for testing and debugging, and is not thread-safe, because it alters the internal buffer pointer. Used by tests only.
Throws: IOException
HFileBlock unpack(HFileContext fileContext, HFileBlock.FSReader reader) throws IOException
Retrieves the decompressed/decrypted view of this block.
Throws: IOException
private void allocateBuffer()
public boolean isUnpacked()
void sanityCheckUncompressedSize() throws IOException
An additional sanity-check in case no compression or encryption is being used.
Throws: IOException
long getOffset()
Cannot be UNSET. Must be a legitimate value. Used for re-making the BlockCacheKey when block is returned to the cache.
DataInputStream getByteStream()
public long heapSize()
static boolean readWithExtra(InputStream in, byte[] buf, int bufOffset, int necessaryLen, int extraLen) throws IOException
Read from an input stream at least necessaryLen and, if possible, extraLen also if available. Analogous to IOUtils.readFully(InputStream, byte[], int, int), but specifies a number of "extra" bytes to also optionally read.
Parameters:
in - the input stream to read from
buf - the buffer to read into
bufOffset - the destination offset in the buffer
necessaryLen - the number of bytes that are absolutely necessary to read
extraLen - the number of extra bytes that would be nice to read
Throws: IOException - if failed to read the necessary bytes
static boolean positionalReadWithExtra(org.apache.hadoop.fs.FSDataInputStream in, long position, byte[] buf, int bufOffset, int necessaryLen, int extraLen) throws IOException
Read from an input stream at least necessaryLen and, if possible, extraLen also if available. Analogous to IOUtils.readFully(InputStream, byte[], int, int), but uses positional read and specifies a number of "extra" bytes that would be desirable but not absolutely necessary to read.
Parameters:
in - the input stream to read from
position - the position within the stream from which to start reading
buf - the buffer to read into
bufOffset - the destination offset in the buffer
necessaryLen - the number of bytes that are absolutely necessary to read
extraLen - the number of extra bytes that would be nice to read
Throws: IOException - if failed to read the necessary bytes
void sanityCheckUncompressed() throws IOException
An additional sanity-check in case no compression or encryption is being used.
Throws: IOException
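The "necessary plus extra" read contract described for readWithExtra above can be sketched against a plain InputStream. This is an illustrative sketch, not the HBase source: the class `ReadWithExtraSketch` and the helper `outcome` are hypothetical, and the return value (true only when all extra bytes were read too) is an assumption, since the return semantics are not stated in this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a read that requires some bytes and opportunistically takes more. */
final class ReadWithExtraSketch {
  /** Reads at least necessaryLen bytes; returns true only if extraLen more were read too. */
  static boolean readWithExtra(InputStream in, byte[] buf, int bufOffset,
      int necessaryLen, int extraLen) throws IOException {
    int remaining = necessaryLen + extraLen;
    while (remaining > 0) {
      int n = in.read(buf, bufOffset, remaining);
      if (n < 0) {
        if (remaining <= extraLen) {
          break; // only optional bytes are missing, which is acceptable
        }
        throw new IOException("Premature EOF: " + (remaining - extraLen)
            + " necessary bytes still unread");
      }
      bufOffset += n;
      remaining -= n;
    }
    return remaining <= 0;
  }

  /** Demo helper: runs the read against an in-memory stream of streamLen bytes. */
  static String outcome(int streamLen, int necessaryLen, int extraLen) {
    byte[] buf = new byte[necessaryLen + extraLen];
    try {
      boolean gotExtra = readWithExtra(
          new ByteArrayInputStream(new byte[streamLen]), buf, 0, necessaryLen, extraLen);
      return gotExtra ? "necessary+extra" : "necessary only";
    } catch (IOException e) {
      return "failed";
    }
  }

  public static void main(String[] args) {
    System.out.println(outcome(12, 8, 4)); // stream holds everything requested
    System.out.println(outcome(10, 8, 4)); // extra bytes only partly available
    System.out.println(outcome(5, 8, 4));  // not even the necessary bytes
  }
}
```

A positional variant would follow the same loop but pass an explicit, advancing file position to each read instead of relying on the stream's cursor.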
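public int getSerializedLength()
Description copied from interface: Cacheable
Returns the length of the ByteBuffer required to serialize the object.
Specified by: getSerializedLength in interface Cacheable
public void serialize(ByteBuffer destination, boolean includeNextBlockMetadata)
Description copied from interface: Cacheable
Serializes its data into destination.
public ByteBuffer getMetaData()
For use by bucketcache.
private ByteBuffer addMetaData(ByteBuffer destination, boolean includeNextBlockMetadata)
Adds metadata at current position (position is moved forward).
Returns: destination with metadata added.
public CacheableDeserializer<Cacheable> getDeserializer()
Description copied from interface: Cacheable
Returns CacheableDeserializer instance which reconstructs original object from ByteBuffer.
Specified by: getDeserializer in interface Cacheable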
DataBlockEncoding getDataBlockEncoding()
byte getChecksumType()
int getBytesPerChecksum()
int getOnDiskDataSizeWithHeader()
int totalChecksumBytes()
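The checksum space for a block can be estimated from CHECKSUM_SIZE (4 bytes per checksum) and the bytes-per-checksum setting. This is a sketch under assumptions: the class `ChecksumBytesSketch` is hypothetical, and it assumes one checksum per chunk of bytesPerChecksum bytes over the header plus data, with a final partial chunk also getting a checksum, and no checksum bytes when HBase checksums are not in use.

```java
/** Hypothetical sketch: bytes needed to store one 4-byte checksum per chunk. */
final class ChecksumBytesSketch {
  static final int CHECKSUM_SIZE = 4; // each checksum value fits in 4 bytes

  /** Number of chunks, rounding up so a trailing partial chunk is covered. */
  static long numChunks(long dataSize, int bytesPerChecksum) {
    long chunks = dataSize / bytesPerChecksum;
    if (dataSize % bytesPerChecksum != 0) {
      chunks++;
    }
    return chunks;
  }

  /** Total checksum bytes; zero when checksums are off (assumed behaviour). */
  static long totalChecksumBytes(boolean usesHBaseChecksum, long onDiskDataSizeWithHeader,
      int bytesPerChecksum) {
    if (!usesHBaseChecksum || bytesPerChecksum <= 0) {
      return 0;
    }
    return numChunks(onDiskDataSizeWithHeader, bytesPerChecksum) * CHECKSUM_SIZE;
  }

  public static void main(String[] args) {
    // Illustrative sizes only: 65569 bytes of header + data, 16384 bytes per checksum.
    System.out.println(totalChecksumBytes(true, 65569L, 16384));
  }
}
```

With the illustrative sizes in `main`, the data spans four full chunks and one partial chunk, so five checksums are needed.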
public int headerSize()
public static int headerSize(boolean usesHBaseChecksum)
byte[] getDummyHeaderForVersion()
private static byte[] getDummyHeaderForVersion(boolean usesHBaseChecksum)
HFileContext getHFileContext()
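public Cacheable.MemoryType getMemoryType()
Specified by: getMemoryType in interface Cacheable
Returns: the MemoryType of this Cacheable
boolean usesSharedMemory()
static String toStringHeader(ByteBuff buf) throws IOException
Convert the contents of the block header into a human readable string.
Throws: IOException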
public HFileBlock deepClone()
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.