Class LruBlockCache

java.lang.Object
org.apache.hadoop.hbase.io.hfile.LruBlockCache
All Implemented Interfaces:
Iterable<CachedBlock>, HeapSize, BlockCache, FirstLevelBlockCache, ResizableBlockCache
Direct Known Subclasses:
IndexOnlyLruBlockCache

@Private public class LruBlockCache extends Object implements FirstLevelBlockCache
A block cache implementation that is memory-aware using HeapSize, memory-bound using an LRU eviction algorithm, and concurrent: backed by a ConcurrentHashMap and with a non-blocking eviction thread giving constant-time cacheBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, org.apache.hadoop.hbase.io.hfile.Cacheable, boolean) and getBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, boolean, boolean, boolean) operations.

Contains three levels of block priority to allow for scan-resistance and in-memory column families (see ColumnFamilyDescriptorBuilder.setInMemory(boolean); an in-memory column family is a column family that should be served from memory if possible): single-access, multi-access, and in-memory priority. A block is added with an in-memory priority flag if ColumnFamilyDescriptor.isInMemory() is set; otherwise a block becomes single-access priority the first time it is read into this block cache. If a block is accessed again while in the cache, it is marked as a multi-access priority block. This delineation of blocks is used to prevent scans from thrashing the cache, adding a least-frequently-used element to the eviction algorithm.

Each priority is given its own chunk of the total cache to ensure fairness during eviction. Each priority will retain close to its maximum size, however, if any priority is not using its entire chunk the others are able to grow beyond their chunk size.
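
As a rough illustration of the chunk split (the factor values below are assumed for the example only, not taken from this class's actual defaults), each per-priority chunk is simply a fraction of the total cache size:

    // Illustration only: the factors are assumed example values, not LruBlockCache defaults.
    long maxSize = 1024L * 1024 * 1024;                 // 1 GB total cache
    float singleFactor = 0.25f;                         // assumed share for single-access blocks
    float multiFactor  = 0.50f;                         // assumed share for multi-access blocks
    float memoryFactor = 0.25f;                         // assumed share for in-memory blocks

    long singleSize = (long) (maxSize * singleFactor);  // chunk for single-access priority
    long multiSize  = (long) (maxSize * multiFactor);   // chunk for multi-access priority
    long memorySize = (long) (maxSize * memoryFactor);  // chunk for in-memory priority
    // A priority that does not fill its chunk leaves room the other priorities may borrow.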

Instantiated at a minimum with the total size and average block size. All sizes are in bytes. The block size is not especially important as this cache is fully dynamic in its sizing of blocks. It is only used for pre-allocating data structures and in initial heap estimation of the map.

The detailed constructor defines the sizes for the three priorities (they should total to the maximum size defined). It also sets the levels that trigger and control the eviction thread.

The acceptable size is the cache size level that triggers the eviction process. Eviction then frees enough blocks to bring the cache size back down to the specified minimum size.

Eviction happens in a separate thread and involves a single full scan of the map. It determines how many bytes must be freed to reach the minimum size, and while scanning collects the fewest least-recently-used blocks necessary from each of the three priorities (which could amount to three times the bytes to free). It then uses the priority chunk sizes to evict fairly according to the priorities' relative sizes and usage.
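
As a sketch of the trigger and target levels described above (the factor values are again assumed for illustration, not documented defaults):

    // Illustration only: acceptableFactor and minFactor are assumed example values.
    long maxSize = 1024L * 1024 * 1024;
    float acceptableFactor = 0.99f;             // assumed: occupancy that triggers eviction
    float minFactor = 0.95f;                    // assumed: occupancy eviction frees down to

    long acceptableSize = (long) (maxSize * acceptableFactor);
    long minSize = (long) (maxSize * minFactor);

    long currentSize = 1_020L * 1024 * 1024;    // example current occupancy
    if (currentSize > acceptableSize) {
      long bytesToFree = currentSize - minSize; // freed across the three priorities,
                                                // weighted by each one's overflow
    }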

  • Field Details

  • Constructor Details

    • LruBlockCache

      public LruBlockCache(long maxSize, long blockSize)
      Default constructor. Specify maximum size and expected average block size (approximation is fine).

      All other factors will be calculated based on defaults specified in this class.

      Parameters:
      maxSize - maximum size of cache, in bytes
      blockSize - approximate size of each block, in bytes
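
      For example (the 256 MB / 64 KB values are arbitrary example sizes, not recommendations):

        long maxSize = 256L * 1024 * 1024;   // 256 MB cache, example value
        long blockSize = 64 * 1024;          // ~64 KB expected average block, example value
        LruBlockCache cache = new LruBlockCache(maxSize, blockSize);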
    • LruBlockCache

      public LruBlockCache(long maxSize, long blockSize, boolean evictionThread)
      Constructor used for testing. Allows disabling of the eviction thread.
    • LruBlockCache

      public LruBlockCache(long maxSize, long blockSize, boolean evictionThread, org.apache.hadoop.conf.Configuration conf)
    • LruBlockCache

      public LruBlockCache(long maxSize, long blockSize, org.apache.hadoop.conf.Configuration conf)
    • LruBlockCache

      public LruBlockCache(long maxSize, long blockSize, boolean evictionThread, int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel, float minFactor, float acceptableFactor, float singleFactor, float multiFactor, float memoryFactor, float hardLimitFactor, boolean forceInMemory, long maxBlockSize)
      Configurable constructor. Use this constructor if not using defaults.
      Parameters:
      maxSize - maximum size of this cache, in bytes
      blockSize - expected average size of blocks, in bytes
      evictionThread - whether to run evictions in a background thread or not
      mapInitialSize - initial size of backing ConcurrentHashMap
      mapLoadFactor - initial load factor of backing ConcurrentHashMap
      mapConcurrencyLevel - initial concurrency factor for the backing ConcurrentHashMap
      minFactor - percentage of total size that eviction will evict until
      acceptableFactor - percentage of total size that triggers eviction
      singleFactor - percentage of total size for single-access blocks
      multiFactor - percentage of total size for multiple-access blocks
      memoryFactor - percentage of total size for in-memory blocks
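
      For example, spelling out every parameter with illustrative values (example numbers only; note that singleFactor, multiFactor and memoryFactor should sum to 1.0 so the chunks cover the whole cache):

        LruBlockCache cache = new LruBlockCache(
            1024L * 1024 * 1024, // maxSize: 1 GB
            64 * 1024,           // blockSize: ~64 KB expected average block
            true,                // evictionThread: evict in a background thread
            16_384,              // mapInitialSize: initial capacity of the backing ConcurrentHashMap
            0.75f,               // mapLoadFactor
            16,                  // mapConcurrencyLevel
            0.95f,               // minFactor: eviction frees down to 95% of maxSize
            0.99f,               // acceptableFactor: eviction triggers at 99% of maxSize
            0.25f,               // singleFactor: chunk for single-access blocks
            0.50f,               // multiFactor: chunk for multi-access blocks
            0.25f,               // memoryFactor: chunk for in-memory blocks
            1.2f,                // hardLimitFactor
            false,               // forceInMemory
            16L * 1024 * 1024);  // maxBlockSize, in bytes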
  • Method Details

    • setVictimCache

      public void setVictimCache(BlockCache victimCache)
      Description copied from interface: FirstLevelBlockCache
      Specifies the secondary cache. An entry that is evicted from this cache due to a size constraint will be inserted into the victim cache.
      Specified by:
      setVictimCache in interface FirstLevelBlockCache
      Parameters:
      victimCache - the second level cache
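
      For example, chaining this cache to a second-level cache (how the victim cache itself is built is omitted; createSecondLevelCache() below is a hypothetical helper standing in for whatever L2 BlockCache is configured):

        LruBlockCache l1 = new LruBlockCache(256L * 1024 * 1024, 64 * 1024);
        BlockCache l2 = createSecondLevelCache(); // hypothetical helper returning some BlockCache
        l1.setVictimCache(l2);
        // Blocks evicted from l1 due to size pressure are now offered to l2.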
    • setMaxSize

      public void setMaxSize(long maxSize)
      Description copied from interface: ResizableBlockCache
      Sets the max heap size that can be used by the BlockCache.
      Specified by:
      setMaxSize in interface ResizableBlockCache
      Parameters:
      maxSize - The max heap size.
    • asReferencedHeapBlock

      The block cached in LruBlockCache will always be a heap block: on the one hand, heap access is faster than off-heap, so the small index and meta blocks cached in CombinedBlockCache benefit a lot; on the other hand, the LruBlockCache size is always calculated based on the total heap size, so caching an off-heap block in LruBlockCache would throw the heap accounting off. Here we clone the block into a heap block if it is an off-heap block, otherwise we just use the original block. The key point is to maintain the refCnt of the block (HBASE-22127):
      1. if we cache the cloned heap block, its refCnt is a completely new one, which is easy to handle;
      2. if we cache the original heap block, we know it is not tracked in ByteBuffAllocator's reservoir; once both the RPC path and the LruBlockCache release the block, it can be garbage collected by the JVM, so we need a retain here.
      Parameters:
      buf - the original block
      Returns:
      a block backed by heap memory
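
      A minimal sketch of that logic, assuming HFileBlock.isSharedMem(), HFileBlock.deepCloneOnHeap(HFileBlock) and Cacheable.retain() are available (treat this as illustrative, not as the exact implementation):

        private Cacheable asReferencedHeapBlock(Cacheable buf) {
          if (buf instanceof HFileBlock) {
            HFileBlock blk = (HFileBlock) buf;
            if (blk.isSharedMem()) {
              // Off-heap/shared block: clone it onto the heap so heap-based size
              // accounting stays correct; the clone starts with a fresh refCnt.
              return HFileBlock.deepCloneOnHeap(blk);
            }
          }
          // Exclusive heap block: the cache will hold a reference, so retain it.
          return buf.retain();
        }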
    • cacheBlock

      public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory)
      Cache the block with the specified name and buffer.

      It is assumed this will NOT be called on an already cached block. In rare cases (HBASE-8547) this can happen, for which we compare the buffer contents.

      Specified by:
      cacheBlock in interface BlockCache
      Parameters:
      cacheKey - block's cache key
      buf - block buffer
      inMemory - if block is in-memory
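
      For example (cache is an LruBlockCache instance; the BlockCacheKey(String, long) constructor is assumed; readBlock() is a hypothetical helper standing in for however the Cacheable was produced):

        BlockCacheKey key = new BlockCacheKey("example-hfile-name", 0L); // hfile name + block offset
        Cacheable block = readBlock();                                   // hypothetical helper
        cache.cacheBlock(key, block, false);                             // false: not an in-memory CF block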
    • assertCounterSanity

      private static void assertCounterSanity(long mapSize, long counterVal)
      Sanity-checking for parity between actual block cache content and metrics. Intended only for use with TRACE level logging and -ea JVM.
    • cacheBlock

      public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf)
      Cache the block with the specified name and buffer.

      TODO: after HBASE-22005, we may cache a block that was allocated off-heap, but our LRU cache sizing is based on heap size, so we should handle this in HBASE-22127. That work will introduce a switch controlling whether the LRU cache is kept on-heap or not; if it is, we may need to copy the memory on-heap, otherwise the caching size would be based on the off-heap allocation.

      Specified by:
      cacheBlock in interface BlockCache
      Parameters:
      cacheKey - block's cache key
      buf - block buffer
    • updateSizeMetrics

      private long updateSizeMetrics(LruCachedBlock cb, boolean evict)
      Helper function that updates the local size counter and also updates any per-column-family or per-block-type metrics it can discern from the given LruCachedBlock
    • getBlock

      public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat, boolean updateCacheMetrics)
      Get the buffer of the block with the specified name.
      Specified by:
      getBlock in interface BlockCache
      Parameters:
      cacheKey - block's cache key
      caching - true if the caller caches blocks on cache misses
      repeat - Whether this is a repeat lookup for the same block (used to avoid double counting cache misses when doing double-check locking)
      updateCacheMetrics - Whether to update cache metrics or not
      Returns:
      buffer of specified cache key, or null if not in cache
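
      A typical lookup, assuming cache is an LruBlockCache instance and key is the BlockCacheKey from the cacheBlock example above:

        Cacheable cached = cache.getBlock(key,
            true,   // caching: the caller intends to cache the block on a miss
            false,  // repeat: this is the first lookup, so a miss should be counted
            true);  // updateCacheMetrics: record the hit or miss in the cache stats
        if (cached == null) {
          // Miss: read the block from the HFile and optionally cacheBlock(...) it.
        }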
    • containsBlock

      public boolean containsBlock(BlockCacheKey cacheKey)
      Whether the cache contains the block with the specified cacheKey
      Specified by:
      containsBlock in interface FirstLevelBlockCache
      Parameters:
      cacheKey - cache key for the block
      Returns:
      true if contains the block
    • evictBlock

      public boolean evictBlock(BlockCacheKey cacheKey)
      Description copied from interface: BlockCache
      Evict block from cache.
      Specified by:
      evictBlock in interface BlockCache
      Parameters:
      cacheKey - Block to evict
      Returns:
      true if block existed and was evicted, false if not
    • evictBlocksByHfileName

      public int evictBlocksByHfileName(String hfileName)
      Evicts all blocks for a specific HFile. This is an expensive operation implemented as a linear-time search through all blocks in the cache. Ideally this should be a search in a log-access-time map.

      This is used for evict-on-close to remove all blocks of a specific HFile.

      Specified by:
      evictBlocksByHfileName in interface BlockCache
      Returns:
      the number of blocks evicted
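
      For example, as part of evict-on-close handling (cache is an LruBlockCache instance):

        // Drop every cached block belonging to an HFile that was just closed.
        int evicted = cache.evictBlocksByHfileName("example-hfile-name");
        // 'evicted' is the number of blocks removed; note the scan is linear in the cache size.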
    • evictBlock

      protected long evictBlock(LruCachedBlock block, boolean evictedByEvictionProcess)
      Evict the block; if a victim handler exists and the block may be read again later, the block is cached in the victim handler.
      Parameters:
      evictedByEvictionProcess - true if the given block is evicted by EvictionThread
      Returns:
      the heap size of evicted block
    • runEviction

      private void runEviction()
      Multi-threaded call to run the eviction process.
    • isEvictionInProgress

    • getOverhead

      long getOverhead()
    • evict

      void evict()
      Eviction method.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • getMaxSize

      public long getMaxSize()
      Get the maximum size of this cache.
      Specified by:
      getMaxSize in interface BlockCache
      Returns:
      max size in bytes
    • getCurrentSize

      public long getCurrentSize()
      Description copied from interface: BlockCache
      Returns the occupied size of the block cache, in bytes.
      Specified by:
      getCurrentSize in interface BlockCache
      Returns:
      occupied space in cache, in bytes
    • getCurrentDataSize

      public long getCurrentDataSize()
      Description copied from interface: BlockCache
      Returns the occupied size of data blocks, in bytes.
      Specified by:
      getCurrentDataSize in interface BlockCache
      Returns:
      occupied space in cache, in bytes
    • getCurrentIndexSize

      public long getCurrentIndexSize()
    • getCurrentBloomSize

      public long getCurrentBloomSize()
    • getFreeSize

      public long getFreeSize()
      Description copied from interface: BlockCache
      Returns the free size of the block cache, in bytes.
      Specified by:
      getFreeSize in interface BlockCache
      Returns:
      free space in cache, in bytes
    • size

      public long size()
      Description copied from interface: BlockCache
      Returns the total size of the block cache, in bytes.
      Specified by:
      size in interface BlockCache
      Returns:
      size of cache, in bytes
    • getBlockCount

      public long getBlockCount()
      Description copied from interface: BlockCache
      Returns the number of blocks currently cached in the block cache.
      Specified by:
      getBlockCount in interface BlockCache
      Returns:
      number of blocks in the cache
    • getDataBlockCount

      public long getDataBlockCount()
      Description copied from interface: BlockCache
      Returns the number of data blocks currently cached in the block cache.
      Specified by:
      getDataBlockCount in interface BlockCache
      Returns:
      number of blocks in the cache
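
      Together with the size accessors above, these counters allow simple occupancy reporting, for example (cache is an LruBlockCache instance):

        long max = cache.getMaxSize();
        long used = cache.getCurrentSize();      // all cached bytes, including overhead
        long data = cache.getCurrentDataSize();  // bytes held by data blocks only
        System.out.printf("cache: %d/%d bytes used (%d in data blocks), %d free, %d blocks (%d data)%n",
            used, max, data, cache.getFreeSize(), cache.getBlockCount(), cache.getDataBlockCount());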
    • getIndexBlockCount

      public long getIndexBlockCount()
    • getBloomBlockCount

      public long getBloomBlockCount()
    • getEvictionThread

    • logStats

      public void logStats()
    • getStats

      public CacheStats getStats()
      Get counter statistics for this cache.

      Includes: total accesses, hits, misses, evicted blocks, and runs of the eviction processes.

      Specified by:
      getStats in interface BlockCache
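
      For example, the returned CacheStats can drive periodic reporting (getHitCount() and getMissCount() are assumed CacheStats accessors; check the CacheStats javadoc):

        CacheStats stats = cache.getStats();
        long hits = stats.getHitCount();    // assumed CacheStats accessor
        long misses = stats.getMissCount(); // assumed CacheStats accessor
        double ratio = (hits + misses) == 0 ? 0.0 : (double) hits / (hits + misses);
        System.out.printf("block cache hit ratio: %.2f%%%n", 100 * ratio);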
    • heapSize

      public long heapSize()
      Description copied from interface: HeapSize
      Return the approximate 'exclusive deep size' of implementing object. Includes count of payload and hosting object sizings.
      Specified by:
      heapSize in interface HeapSize
    • calculateOverhead

      private static long calculateOverhead(long maxSize, long blockSize, int concurrency)
    • iterator

      public Iterator<CachedBlock> iterator()
      Description copied from interface: BlockCache
      Returns Iterator over the blocks in the cache.
      Specified by:
      iterator in interface BlockCache
      Specified by:
      iterator in interface Iterable<CachedBlock>
    • acceptableSize

    • minSize

      private long minSize()
    • singleSize

      private long singleSize()
    • multiSize

      private long multiSize()
    • memorySize

      private long memorySize()
    • shutdown

      public void shutdown()
      Description copied from interface: BlockCache
      Shutdown the cache.
      Specified by:
      shutdown in interface BlockCache
    • clearCache

      public void clearCache()
      Clears the cache. Used in tests.
    • getCachedFileNamesForTest

      Used in testing. May be very inefficient.
      Returns:
      the set of cached file names
    • getEncodingCountsForTest

    • getMapForTests

    • getBlockCaches

      Description copied from interface: BlockCache
      Returns the list of sub block caches that make up this one, or null if there are no sub caches.
      Specified by:
      getBlockCaches in interface BlockCache