Class HFileBlockIndex.BlockIndexReader

java.lang.Object
org.apache.hadoop.hbase.io.hfile.HFileBlockIndex.BlockIndexReader
All Implemented Interfaces:
HeapSize
Direct Known Subclasses:
HFileBlockIndex.ByteArrayKeyBlockIndexReader, HFileBlockIndex.CellBasedKeyBlockIndexReader
Enclosing class:
HFileBlockIndex

abstract static class HFileBlockIndex.BlockIndexReader extends Object implements HeapSize
The reader always holds the root-level index in memory. In practice, index blocks at all other levels are cached in the LRU cache, although this API does not enforce that.

All non-root (leaf and intermediate) index blocks contain what we call a "secondary index": an array of offsets to the entries within the block. This allows us to do binary search for the entry corresponding to the given key without having to deserialize the block.
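The following is a minimal sketch of the secondary-index idea, not the HBase implementation. It assumes a simplified, hypothetical block layout (an entry count, then an array of relative offsets, then raw key bytes), and the class and method names are invented for illustration. It shows how the offset array lets a binary search read only the keys it compares against, using absolute reads that leave the buffer position alone.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: a simplified index block with a secondary index of entry offsets. */
final class SecondaryIndexSketch {

  /**
   * Assumed layout: int numEntries, then (numEntries + 1) int offsets relative to
   * the start of the entries area, then the raw key bytes of each entry.
   */
  static ByteBuffer write(String... sortedKeys) {
    int n = sortedKeys.length;
    byte[][] raw = new byte[n][];
    int total = 0;
    for (int i = 0; i < n; i++) {
      raw[i] = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
      total += raw[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (n + 2) + total);
    buf.putInt(n);
    int offset = 0;
    for (byte[] k : raw) {
      buf.putInt(offset);
      offset += k.length;
    }
    buf.putInt(offset); // end marker, so entry i spans offsets[i] .. offsets[i + 1]
    for (byte[] k : raw) {
      buf.put(k);
    }
    buf.flip();
    return buf;
  }

  /** Reads the i-th key with absolute gets only; the buffer position is not changed. */
  static byte[] keyAt(ByteBuffer block, int i) {
    int n = block.getInt(0);
    int entriesStart = Integer.BYTES * (n + 2);
    int start = block.getInt(Integer.BYTES * (i + 1));
    int end = block.getInt(Integer.BYTES * (i + 2));
    byte[] key = new byte[end - start];
    for (int j = 0; j < key.length; j++) {
      key[j] = block.get(entriesStart + start + j);
    }
    return key;
  }

  /** Returns i such that keys[i] <= key < keys[i + 1], or -1 if key sorts before keys[0]. */
  static int floorIndex(ByteBuffer block, byte[] key) {
    int low = 0;
    int high = block.getInt(0) - 1;
    int result = -1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (Arrays.compareUnsigned(keyAt(block, mid), key) <= 0) {
        result = mid; // candidate; a later entry may still be <= key
        low = mid + 1;
      } else {
        high = mid - 1;
      }
    }
    return result;
  }
}
```

With a block written from the keys "apple", "mango" and "pear", searching for "banana" touches only the entries the search visits and yields index 0, while a key that sorts before "apple" yields -1.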

  • Field Details

  • Constructor Details

  • Method Details

    • isEmpty

      public abstract boolean isEmpty()
      Returns true if the block index is empty.
    • ensureNonEmpty

      public void ensureNonEmpty()
      Verifies that the block index is non-empty and throws an IllegalStateException otherwise.
    • seekToDataBlock

      public HFileBlock seekToDataBlock(ExtendedCell key, HFileBlock currentBlock, boolean cacheBlocks, boolean pread, boolean isCompaction, DataBlockEncoding expectedDataBlockEncoding, HFile.CachingBlockReader cachingBlockReader) throws IOException
      Returns the data block that contains the given key. This method is only called when the HFile version is greater than 1.
      Parameters:
      key - the key we are looking for
      currentBlock - the current block, to avoid re-reading the same block
      expectedDataBlockEncoding - the data block encoding the caller is expecting the data block to be in, or null to not perform this check and return the block irrespective of the encoding
      cachingBlockReader - a basic way to load blocks
      Returns:
      the data block that contains the given key
      Throws:
      IOException
    • loadDataBlockWithScanInfo

      public abstract BlockWithScanInfo loadDataBlockWithScanInfo(ExtendedCell key, HFileBlock currentBlock, boolean cacheBlocks, boolean pread, boolean isCompaction, DataBlockEncoding expectedDataBlockEncoding, HFile.CachingBlockReader cachingBlockReader) throws IOException
      Returns the BlockWithScanInfo, a data structure that holds the data HFileBlock together with other scan info, such as the key that starts the next HFileBlock. This method is only called when the HFile version is greater than 1.
      Parameters:
      key - the key we are looking for
      currentBlock - the current block, to avoid re-reading the same block
      expectedDataBlockEncoding - the data block encoding the caller is expecting the data block to be in, or null to not perform this check and return the block irrespective of the encoding.
      Returns:
      the BlockWithScanInfo which contains the DataBlock with other scan info such as nextIndexedKey.
      Throws:
      IOException
    • midkey

      public abstract Cell midkey(HFile.CachingBlockReader cachingBlockReader) throws IOException
      An approximation to the HFile's mid-key. Operates on block boundaries, and does not go inside blocks. In other words, returns the first key of the middle block of the file.
      Returns:
      the first key of the middle block
      Throws:
      IOException
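To make the "approximation" concrete, here is a small sketch under stated assumptions: blocks are modelled as in-memory lists of string keys, the middle block is taken to be the one at index size / 2, and the class name is hypothetical. It is not the HBase code path, which works against index blocks rather than key lists.

```java
import java.util.List;

/** Illustrative only: a block-boundary mid-key over in-memory "blocks" of sorted keys. */
final class MidKeySketch {

  /** Returns the first key of the middle block; never looks at other keys inside a block. */
  static String midKey(List<List<String>> blocks) {
    if (blocks.isEmpty()) {
      throw new IllegalStateException("no blocks, so no mid-key");
    }
    List<String> middleBlock = blocks.get(blocks.size() / 2); // assumed choice of "middle"
    return middleBlock.get(0);
  }
}
```

For the three blocks [a], [b, c, d, e, f, g] and [h], this returns "b" even though the true median key is closer to "d", because the result is tied to a block boundary.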
    • getRootBlockOffset

      public long getRootBlockOffset(int i)
      Parameters:
      i - zero-based index of a root-level block, from 0 to getRootBlockCount() - 1
    • getRootBlockDataSize

      public int getRootBlockDataSize(int i)
      Parameters:
      i - zero-based index of a root-level block
      Returns:
      the on-disk size of the root-level block for version 2, or the uncompressed size for version 1
    • getRootBlockCount

      public int getRootBlockCount()
      Returns the number of root-level blocks in this block index.
    • rootBlockContainingKey

      public abstract int rootBlockContainingKey(byte[] key, int offset, int length, CellComparator comp)
      Finds the root-level index block containing the given key.
      Parameters:
      key - the key to find
      comp - the comparator to be used
      Returns:
      Index of the block containing key (between 0 and the number of blocks - 1), or -1 if this file does not contain the requested key.
    • rootBlockContainingKey

      public int rootBlockContainingKey(byte[] key, int offset, int length)
      Finds the root-level index block containing the given key.
      Parameters:
      key - the key to find
      Returns:
      Index of the block containing key (between 0 and the number of blocks - 1), or -1 if this file does not contain the requested key.
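The return convention can be sketched with the standard library's binary search. The class below is hypothetical and keeps the root-level entries in plain parallel arrays, which is an assumption rather than a statement about the real fields. It converts the insertion point returned by `Arrays.binarySearch` into the index of the block whose first key is the greatest key not larger than the target, and then uses that index to fetch the block's offset and size.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: root-level lookup over in-memory parallel arrays. */
final class RootIndexLookupSketch {
  private final byte[][] blockKeys;   // first key of each root-level block, sorted
  private final long[] blockOffsets;  // file offset of each block
  private final int[] blockDataSizes; // size of each block

  RootIndexLookupSketch(byte[][] blockKeys, long[] blockOffsets, int[] blockDataSizes) {
    this.blockKeys = blockKeys;
    this.blockOffsets = blockOffsets;
    this.blockDataSizes = blockDataSizes;
  }

  /** Index of the block covering key[offset, offset + length), or -1 if key precedes all blocks. */
  int rootBlockContainingKey(byte[] key, int offset, int length) {
    byte[] target = Arrays.copyOfRange(key, offset, offset + length);
    Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
    int pos = Arrays.binarySearch(blockKeys, target, unsignedOrder);
    if (pos >= 0) {
      return pos; // exact match on a block's first key
    }
    int insertionPoint = -pos - 1;
    return insertionPoint - 1; // the block just before the insertion point; -1 if there is none
  }

  long getRootBlockOffset(int i) {
    return blockOffsets[i];
  }

  int getRootBlockDataSize(int i) {
    return blockDataSizes[i];
  }
}
```

A caller in this sketch checks for -1 first and only then passes the index to `getRootBlockOffset` and `getRootBlockDataSize`.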
    • rootBlockContainingKey

      public abstract int rootBlockContainingKey(Cell key)
      Finds the root-level index block containing the given key.
      Parameters:
      key - the key to find
    • getNonRootIndexedKey

      static byte[] getNonRootIndexedKey(ByteBuff nonRootIndex, int i)
      Returns the indexed key at the ith position in nonRootIndex. Positions start at 0.
      Parameters:
      i - the ith position
      Returns:
      The indexed key at the ith position in the nonRootIndex.
    • binarySearchNonRootIndex

      static int binarySearchNonRootIndex(Cell key, ByteBuff nonRootIndex, CellComparator comparator)
      Performs a binary search over a non-root level index block. Utilizes the secondary index, which records the offsets of (offset, onDiskSize, firstKey) tuples of all entries.
      Parameters:
      key - the key we are searching for
      nonRootIndex - the non-root index block buffer, starting with the secondary index (the offsets to individual entries in the block index buffer). The position is ignored.
      Returns:
      the index i in [0, numEntries - 1] such that keys[i] <= key < keys[i + 1], if keys is the array of all keys being searched, or -1 otherwise
    • locateNonRootIndexEntry

      static int locateNonRootIndexEntry(ByteBuff nonRootBlock, Cell key, CellComparator comparator)
      Searches for one key using the secondary index in a non-root block. On success, positions the provided buffer at the entry of interest, where the file offset and the on-disk size can be read.
      Parameters:
      nonRootBlock - a non-root block without header. Initial position does not matter.
      key - the key to search for
      Returns:
      the index position where the given key was found, or -1 if the given key is before the first key
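As a rough illustration of the "positions the provided buffer" contract, the sketch below assumes a simplified layout (entry count, relative offsets, then (offset, onDiskSize, firstKey) entries) and hypothetical names. It covers only the positioning step for an entry index that has already been found. The caller then reads the file offset and on-disk size with ordinary relative reads.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative only: positioning a buffer at one (offset, onDiskSize, firstKey) entry. */
final class EntryPositionSketch {

  /** Assumed layout: int n, then (n + 1) int relative offsets, then the n entries. */
  static ByteBuffer write(long[] fileOffsets, int[] onDiskSizes, String[] firstKeys) {
    int n = firstKeys.length;
    int[] relative = new int[n + 1];
    byte[][] keys = new byte[n][];
    for (int i = 0; i < n; i++) {
      keys[i] = firstKeys[i].getBytes(StandardCharsets.UTF_8);
      relative[i + 1] = relative[i] + Long.BYTES + Integer.BYTES + keys[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * (n + 2) + relative[n]);
    buf.putInt(n);
    for (int r : relative) {
      buf.putInt(r);
    }
    for (int i = 0; i < n; i++) {
      buf.putLong(fileOffsets[i]).putInt(onDiskSizes[i]).put(keys[i]);
    }
    buf.flip();
    return buf;
  }

  /** Moves the buffer position to the start of entry i, whatever the position was before. */
  static void positionAtEntry(ByteBuffer block, int i) {
    int n = block.getInt(0);
    int entriesStart = Integer.BYTES * (n + 2);
    int relative = block.getInt(Integer.BYTES * (i + 1)); // absolute read of the secondary index
    block.position(entriesStart + relative);
  }
}
```

After `positionAtEntry(block, i)`, a call to `block.getLong()` followed by `block.getInt()` yields the file offset and on-disk size stored for entry i in this sketch.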
    • readRootIndex

      public void readRootIndex(DataInput in, int numEntries) throws IOException
      Read in the root-level index from the given input stream. Must match what was written into the root level by HFileBlockIndex.BlockIndexWriter.writeIndexBlocks(FSDataOutputStream) at the offset that function returned.
      Parameters:
      in - the buffered input stream or wrapped byte input stream
      numEntries - the number of root-level index entries
      Throws:
      IOException
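The requirement that the reader match what the writer produced can be sketched as a round trip. Everything below is an assumption made for illustration: the record format (a long offset, an int size, then an int-length-prefixed key) is not claimed to be the on-disk HFile format, the class is hypothetical, and the `initialize` and `add` methods only borrow the names of the abstract hooks listed on this page to suggest one plausible division of labour.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative only: reading numEntries (offset, dataSize, key) records from a DataInput. */
final class RootIndexReadSketch {
  long[] blockOffsets;
  int[] blockDataSizes;
  byte[][] blockKeys;
  private int count;

  /** Sizes the in-memory arrays once the entry count is known. */
  void initialize(int numEntries) {
    blockOffsets = new long[numEntries];
    blockDataSizes = new int[numEntries];
    blockKeys = new byte[numEntries][];
    count = 0;
  }

  /** Appends one root-level entry. */
  void add(byte[] key, long offset, int dataSize) {
    blockKeys[count] = key;
    blockOffsets[count] = offset;
    blockDataSizes[count] = dataSize;
    count++;
  }

  /** Reads exactly numEntries records in the format assumed by this sketch. */
  void readRootIndex(DataInput in, int numEntries) throws IOException {
    initialize(numEntries);
    for (int i = 0; i < numEntries; i++) {
      long offset = in.readLong();
      int dataSize = in.readInt();
      byte[] key = new byte[in.readInt()]; // assumed: plain 4-byte length prefix
      in.readFully(key);
      add(key, offset, dataSize);
    }
  }

  /** Helper: writes records in the same assumed format and reads them back. */
  static RootIndexReadSketch roundTrip(byte[][] keys, long[] offsets, int[] sizes) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      for (int i = 0; i < keys.length; i++) {
        out.writeLong(offsets[i]);
        out.writeInt(sizes[i]);
        out.writeInt(keys[i].length);
        out.write(keys[i]);
      }
      out.flush();
      RootIndexReadSketch index = new RootIndexReadSketch();
      index.readRootIndex(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), keys.length);
      return index;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

If the reader consumed fields in a different order or with a different length encoding than the writer used, every later record would be misread, which is why the two sides have to agree exactly.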
    • initialize

      protected abstract void initialize(int numEntries)
    • add

      protected abstract void add(byte[] key, long offset, int dataSize)
    • readRootIndex

      public DataInputStream readRootIndex(HFileBlock blk, int numEntries) throws IOException
      Read in the root-level index from the given block. Must match what was written into the root level by HFileBlockIndex.BlockIndexWriter.writeIndexBlocks(FSDataOutputStream) at the offset that function returned.
      Parameters:
      blk - the HFile block
      numEntries - the number of root-level index entries
      Returns:
      the buffered input stream or wrapped byte input stream
      Throws:
      IOException
    • readMultiLevelIndexRoot

      public void readMultiLevelIndexRoot(HFileBlock blk, int numEntries) throws IOException
      Read the root-level metadata of a multi-level block index. Based on readRootIndex(DataInput, int), but also reads metadata necessary to compute the mid-key in a multi-level index.
      Parameters:
      blk - the HFile block
      numEntries - the number of root-level index entries
      Throws:
      IOException
    • heapSize

      public long heapSize()
      Description copied from interface: HeapSize
      Return the approximate 'exclusive deep size' of implementing object. Includes count of payload and hosting object sizings.
      Specified by:
      heapSize in interface HeapSize
    • calculateHeapSizeForBlockKeys

      protected abstract long calculateHeapSizeForBlockKeys(long heapSize)