Package org.apache.hadoop.hbase.io.hfile
Class LruBlockCache
java.lang.Object
org.apache.hadoop.hbase.io.hfile.LruBlockCache
- All Implemented Interfaces:
Iterable<CachedBlock>, HeapSize, BlockCache, FirstLevelBlockCache, ResizableBlockCache
- Direct Known Subclasses:
IndexOnlyLruBlockCache
A block cache implementation that is memory-aware using HeapSize, memory-bound using an LRU eviction algorithm, and concurrent: backed by a ConcurrentHashMap and with a non-blocking eviction thread giving constant-time cacheBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, org.apache.hadoop.hbase.io.hfile.Cacheable, boolean) and getBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, boolean, boolean, boolean) operations.
Contains three levels of block priority to allow for scan-resistance and in-memory families (see ColumnFamilyDescriptorBuilder.setInMemory(boolean); an in-memory column family is a column family that should be served from memory if possible): single-access, multiple-access, and in-memory priority. A block is added with an in-memory priority flag if ColumnFamilyDescriptor.isInMemory() is set; otherwise a block becomes single-access priority the first time it is read into this block cache. If a block is accessed again while in cache, it is marked as a multiple-access priority block. This delineation of blocks is used to prevent scans from thrashing the cache, adding a least-frequently-used element to the eviction algorithm.
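For example, an in-memory column family can be declared through the standard ColumnFamilyDescriptorBuilder API so that its blocks are cached with in-memory priority (a hedged sketch; the family name "hot_cf" is purely illustrative):

import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

// Blocks read from this family are cached by LruBlockCache with in-memory priority.
ColumnFamilyDescriptor hotFamily = ColumnFamilyDescriptorBuilder
    .newBuilder(Bytes.toBytes("hot_cf"))
    .setInMemory(true)
    .build();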
Each priority is given its own chunk of the total cache to ensure fairness during eviction. Each priority will retain close to its maximum size; however, if any priority is not using its entire chunk, the others are able to grow beyond their chunk size.
Instantiated at a minimum with the total size and average block size. All sizes are in bytes. The
block size is not especially important as this cache is fully dynamic in its sizing of blocks. It
is only used for pre-allocating data structures and in initial heap estimation of the map.
The detailed constructor defines the sizes for the three priorities (they should total to the
maximum size
defined). It also sets the levels that trigger and control the eviction
thread.
The acceptable size
is the cache size level which triggers the eviction process to
start. It evicts enough blocks to get the size below the minimum size specified.
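As an illustrative calculation (not code from this class; the factor values shown are typical but configurable):

long maxSize = 1024L * 1024 * 1024;                        // 1 GB cache
float acceptableFactor = 0.99f;                            // level that triggers eviction
float minFactor = 0.95f;                                   // level eviction drives the size down to
long acceptableSize = (long) (maxSize * acceptableFactor); // about 1.06e9 bytes
long minSize = (long) (maxSize * minFactor);               // about 1.02e9 bytes
long currentSize = 1_070_000_000L;                         // hypothetical usage, above the trigger
if (currentSize > acceptableSize) {
  long bytesToFree = currentSize - minSize;                // roughly 50 MB to free in this run
  System.out.println("eviction must free " + bytesToFree + " bytes");
}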
Eviction happens in a separate thread and involves a single full scan of the map. It determines how many bytes must be freed to reach the minimum size, and then, while scanning, determines the fewest least-recently-used blocks necessary from each of the three priorities (which would be up to 3 times the bytes to free). It then uses the priority chunk sizes to evict fairly according to the relative sizes and usage.
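A minimal usage sketch (hedged: the sizes and the hfile name are illustrative, and a real caller would pass an actual Cacheable implementation such as an HFileBlock to cacheBlock):

import org.apache.hadoop.hbase.io.hfile.BlockCacheKey;
import org.apache.hadoop.hbase.io.hfile.Cacheable;
import org.apache.hadoop.hbase.io.hfile.LruBlockCache;

public class LruBlockCacheSketch {
  public static void main(String[] args) {
    long maxSize = 64L * 1024 * 1024; // total cache capacity, in bytes
    long blockSize = 64L * 1024;      // expected average block size, in bytes

    LruBlockCache cache = new LruBlockCache(maxSize, blockSize);

    // A key identifies a block by hfile name and offset within that file.
    BlockCacheKey key = new BlockCacheKey("example-hfile", 0L);

    // Nothing is cached under this key yet, so the lookup returns null (a miss).
    Cacheable block = cache.getBlock(key, true /* caching */, false /* repeat */, true /* updateCacheMetrics */);
    System.out.println("cached block: " + block);

    // cache.cacheBlock(key, someCacheable, false) would add a block with single-access
    // priority; a later getBlock on the same key promotes it to multiple-access priority.

    cache.shutdown();
  }
}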
Nested Class Summary
private class: Used to group blocks into priority buckets.
(package private) static class
(package private) static class
-
Field Summary
private float acceptableFactor: Acceptable size of cache (no evictions if size < acceptable)
private long blockSize: Approximate block size
private final LongAdder bloomBlockElements: Current number of cached bloom block elements
private final LongAdder bloomBlockSize: Current size of bloom blocks
static final long CACHE_FIXED_OVERHEAD
private final AtomicLong count: Cache access count (sequential ID)
private final LongAdder dataBlockElements: Current number of cached data block elements
private final LongAdder dataBlockSize: Current size of data blocks
(package private) static final float DEFAULT_ACCEPTABLE_FACTOR
(package private) static final int DEFAULT_CONCURRENCY_LEVEL
private static final float DEFAULT_HARD_CAPACITY_LIMIT_FACTOR
private static final boolean DEFAULT_IN_MEMORY_FORCE_MODE
(package private) static final float DEFAULT_LOAD_FACTOR
private static final long DEFAULT_MAX_BLOCK_SIZE
private static final float DEFAULT_MEMORY_FACTOR
private static final float DEFAULT_MIN_FACTOR
private static final float DEFAULT_MULTI_FACTOR
private static final float DEFAULT_SINGLE_FACTOR
private final AtomicLong elements: Current number of cached elements
private boolean evictionInProgress: Volatile boolean to track if we are in an eviction process or not
private final ReentrantLock evictionLock: Eviction lock (locked when eviction in process)
private final LruBlockCache.EvictionThread evictionThread: Eviction thread
private boolean forceInMemory: Whether in-memory hfile's data block has higher priority when evicting
private float hardCapacityLimitFactor: hard capacity limit
private final LongAdder indexBlockElements: Current number of cached index block elements
private final LongAdder indexBlockSize: Current size of index blocks
private static final org.slf4j.Logger LOG
private static final String LRU_ACCEPTABLE_FACTOR_CONFIG_NAME: Acceptable size of cache (no evictions if size < acceptable)
(package private) static final String LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME: Hard capacity limit of cache, will reject any put if size > this * acceptable
private static final String LRU_IN_MEMORY_FORCE_MODE_CONFIG_NAME: Configuration key to force data blocks of an in-memory hfile to always be cached in memory (unless the in-memory blocks take up too much of the cache); unlike inMemory, which is a column-family configuration, inMemoryForceMode is a cluster-wide configuration
private static final String LRU_MAX_BLOCK_SIZE
private static final String LRU_MEMORY_PERCENTAGE_CONFIG_NAME
private static final String LRU_MIN_FACTOR_CONFIG_NAME: Percentage of total size that eviction will evict until
private static final String LRU_MULTI_PERCENTAGE_CONFIG_NAME
private static final String LRU_SINGLE_PERCENTAGE_CONFIG_NAME
private final ConcurrentHashMap<BlockCacheKey,LruCachedBlock> map: Defined the cache map as ConcurrentHashMap here, because in getBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, boolean, boolean, boolean), we need to guarantee the atomicity of map#computeIfPresent(key, func).
private final long maxBlockSize
private long maxSize: Maximum allowable size of cache (block put if size > max, evict)
private float memoryFactor: In-memory bucket size
private float minFactor: Minimum threshold of cache (when evicting, evict until size < min)
private float multiFactor: Multiple access bucket size
private long overhead: Overhead of the structure itself
private final ScheduledExecutorService scheduleThreadPool: Statistics thread schedule pool (for heavy debugging, could remove)
private float singleFactor: Single access bucket size
private final AtomicLong size: Current size of cache
private static final int STAT_THREAD_PERIOD
private final CacheStats stats: Cache statistics
private BlockCache victimHandler: Where to send victims (blocks evicted/missing from the cache).
Constructor Summary
LruBlockCache(long maxSize, long blockSize): Default constructor.
LruBlockCache(long maxSize, long blockSize, boolean evictionThread): Constructor used for testing.
LruBlockCache(long maxSize, long blockSize, boolean evictionThread, int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel, float minFactor, float acceptableFactor, float singleFactor, float multiFactor, float memoryFactor, float hardLimitFactor, boolean forceInMemory, long maxBlockSize): Configurable constructor.
LruBlockCache(long maxSize, long blockSize, boolean evictionThread, org.apache.hadoop.conf.Configuration conf)
LruBlockCache(long maxSize, long blockSize, org.apache.hadoop.conf.Configuration conf)
Method Summary
(package private) long acceptableSize()
private Cacheable asReferencedHeapBlock(Cacheable buf): The block cached in LRUBlockCache will always be a heap block: on the one hand, on-heap access is faster than off-heap, so the small index or meta blocks cached in CombinedBlockCache benefit a lot.
private static void assertCounterSanity(long mapSize, long counterVal): Sanity-checking for parity between actual block cache content and metrics.
void cacheBlock(BlockCacheKey cacheKey, Cacheable buf): Cache the block with the specified name and buffer.
void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory): Cache the block with the specified name and buffer.
private static long calculateOverhead(long maxSize, long blockSize, int concurrency)
void clearCache(): Clears the cache.
boolean containsBlock(BlockCacheKey cacheKey): Whether the cache contains block with specified cacheKey
(package private) void evict(): Eviction method.
boolean evictBlock(BlockCacheKey cacheKey): Evict block from cache.
protected long evictBlock(LruCachedBlock block, boolean evictedByEvictionProcess): Evict the block; it will be cached by the victim handler if one exists and the block may be read again later
int evictBlocksByHfileName(String hfileName): Evicts all blocks for a specific HFile.
Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat, boolean updateCacheMetrics): Get the buffer of the block with the specified name.
getBlockCaches(): Returns the list of sub blockcaches that make up this one; returns null if no sub caches.
long getBlockCount(): Returns the number of blocks currently cached in the block cache.
long getBloomBlockCount()
getCachedFileNamesForTest(): Used in testing.
long getCurrentBloomSize()
long getCurrentDataSize(): Returns the occupied size of data blocks, in bytes.
long getCurrentIndexSize()
long getCurrentSize(): Returns the occupied size of the block cache, in bytes.
long getDataBlockCount(): Returns the number of data blocks currently cached in the block cache.
getEncodingCountsForTest()
(package private) LruBlockCache.EvictionThread getEvictionThread()
long getFreeSize(): Returns the free size of the block cache, in bytes.
long getIndexBlockCount()
(package private) Map<BlockCacheKey,LruCachedBlock> getMapForTests()
long getMaxSize(): Get the maximum size of this cache.
(package private) long getOverhead()
getStats(): Get counter statistics for this cache.
long heapSize(): Return the approximate 'exclusive deep size' of implementing object.
(package private) boolean isEvictionInProgress()
iterator(): Returns Iterator over the blocks in the cache.
void logStats()
private long memorySize()
private long minSize()
private long multiSize()
private void runEviction(): Multi-threaded call to run the eviction process.
void setMaxSize(long maxSize): Sets the max heap size that can be used by the BlockCache.
void setVictimCache(BlockCache victimCache): Specifies the secondary cache.
void shutdown(): Shutdown the cache.
private long singleSize()
long size(): Returns the total size of the block cache, in bytes.
toString()
private long updateSizeMetrics(LruCachedBlock cb, boolean evict): Helper function that updates the local size counter and also updates any per-cf or per-blocktype metrics it can discern from the given LruCachedBlock
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.apache.hadoop.hbase.io.hfile.BlockCache
blockFitsIntoTheCache, cacheBlock, evictBlocksRangeByHfileName, getBlock, getBlockSize, getFullyCachedFiles, getRegionCachedInfo, isAlreadyCached, isCacheEnabled, isMetaBlock, notifyFileCachingCompleted, shouldCacheFile, waitForCacheInitialization
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
LOG
-
LRU_MIN_FACTOR_CONFIG_NAME
Percentage of total size that eviction will evict until; e.g. if set to .8, then we will keep evicting during an eviction run till the cache size is down to 80% of the total.- See Also:
-
LRU_ACCEPTABLE_FACTOR_CONFIG_NAME
Acceptable size of cache (no evictions if size < acceptable)- See Also:
-
LRU_HARD_CAPACITY_LIMIT_FACTOR_CONFIG_NAME
Hard capacity limit of cache, will reject any put if size > this * acceptable- See Also:
-
LRU_SINGLE_PERCENTAGE_CONFIG_NAME
- See Also:
-
LRU_MULTI_PERCENTAGE_CONFIG_NAME
- See Also:
-
LRU_MEMORY_PERCENTAGE_CONFIG_NAME
- See Also:
-
LRU_IN_MEMORY_FORCE_MODE_CONFIG_NAME
Configuration key to force data blocks of an in-memory hfile to always be cached in memory (unless the in-memory blocks take up too much of the cache). Unlike inMemory, which is a column-family configuration, inMemoryForceMode is a cluster-wide configuration.
- See Also:
-
DEFAULT_LOAD_FACTOR
- See Also:
-
DEFAULT_CONCURRENCY_LEVEL
- See Also:
-
DEFAULT_MIN_FACTOR
- See Also:
-
DEFAULT_ACCEPTABLE_FACTOR
- See Also:
-
DEFAULT_SINGLE_FACTOR
- See Also:
-
DEFAULT_MULTI_FACTOR
- See Also:
-
DEFAULT_MEMORY_FACTOR
- See Also:
-
DEFAULT_HARD_CAPACITY_LIMIT_FACTOR
- See Also:
-
DEFAULT_IN_MEMORY_FORCE_MODE
- See Also:
-
STAT_THREAD_PERIOD
- See Also:
-
LRU_MAX_BLOCK_SIZE
- See Also:
-
DEFAULT_MAX_BLOCK_SIZE
- See Also:
-
map
Defined the cache map as ConcurrentHashMap here, because in getBlock(org.apache.hadoop.hbase.io.hfile.BlockCacheKey, boolean, boolean, boolean), we need to guarantee the atomicity of map#computeIfPresent(key, func). Besides, the func method must execute exactly once, only when the key is present and under the lock context; otherwise the reference count will be messed up. Notice that ConcurrentSkipListMap can not guarantee that. Some code using #computeIfPresent also expects the supplier to be executed only once; ConcurrentHashMap can guarantee that, other map types may not.
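A standalone JDK illustration of the property relied on above (hedged; this is not code from LruBlockCache): ConcurrentHashMap.computeIfPresent applies the remapping function at most once and atomically for the key, so a retain performed inside the function cannot run twice for one lookup, whereas ConcurrentSkipListMap documents no such guarantee.

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfPresentSketch {
  public static void main(String[] args) {
    ConcurrentHashMap<String, AtomicInteger> refCounts = new ConcurrentHashMap<>();
    refCounts.put("block-1", new AtomicInteger(1));

    // The remapping function runs at most once and under the map's per-key lock,
    // so the increment (analogous to retaining a cached block) cannot be applied twice.
    refCounts.computeIfPresent("block-1", (key, refCnt) -> {
      refCnt.incrementAndGet();
      return refCnt;
    });

    System.out.println(refCounts.get("block-1")); // prints 2
  }
}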
evictionLock
Eviction lock (locked when eviction in process) -
maxBlockSize
-
evictionInProgress
Volatile boolean to track if we are in an eviction process or not -
evictionThread
Eviction thread -
scheduleThreadPool
Statistics thread schedule pool (for heavy debugging, could remove) -
size
Current size of cache -
dataBlockSize
Current size of data blocks -
indexBlockSize
Current size of index blocks -
bloomBlockSize
Current size of bloom blocks -
elements
Current number of cached elements -
dataBlockElements
Current number of cached data block elements -
indexBlockElements
Current number of cached index block elements -
bloomBlockElements
Current number of cached bloom block elements -
count
Cache access count (sequential ID) -
hardCapacityLimitFactor
hard capacity limit -
stats
Cache statistics -
maxSize
Maximum allowable size of cache (block put if size > max, evict) -
blockSize
Approximate block size -
acceptableFactor
Acceptable size of cache (no evictions if size < acceptable) -
minFactor
Minimum threshold of cache (when evicting, evict until size < min) -
singleFactor
Single access bucket size -
multiFactor
Multiple access bucket size -
memoryFactor
In-memory bucket size -
overhead
Overhead of the structure itself -
forceInMemory
Whether in-memory hfile's data block has higher priority when evicting -
victimHandler
Where to send victims (blocks evicted/missing from the cache). This is used only when we use an external cache as L2. Note: See org.apache.hadoop.hbase.io.hfile.MemcachedBlockCache -
CACHE_FIXED_OVERHEAD
-
-
Constructor Details
-
LruBlockCache
Default constructor. Specify maximum size and expected average block size (approximation is fine). All other factors will be calculated based on defaults specified in this class.
- Parameters:
maxSize - maximum size of cache, in bytes
blockSize - approximate size of each block, in bytes
-
LruBlockCache
Constructor used for testing. Allows disabling of the eviction thread. -
LruBlockCache
public LruBlockCache(long maxSize, long blockSize, boolean evictionThread, org.apache.hadoop.conf.Configuration conf) -
LruBlockCache
-
LruBlockCache
public LruBlockCache(long maxSize, long blockSize, boolean evictionThread, int mapInitialSize, float mapLoadFactor, int mapConcurrencyLevel, float minFactor, float acceptableFactor, float singleFactor, float multiFactor, float memoryFactor, float hardLimitFactor, boolean forceInMemory, long maxBlockSize)
Configurable constructor. Use this constructor if not using defaults.
- Parameters:
maxSize - maximum size of this cache, in bytes
blockSize - expected average size of blocks, in bytes
evictionThread - whether to run evictions in a bg thread or not
mapInitialSize - initial size of backing ConcurrentHashMap
mapLoadFactor - initial load factor of backing ConcurrentHashMap
mapConcurrencyLevel - initial concurrency factor for backing CHM
minFactor - percentage of total size that eviction will evict until
acceptableFactor - percentage of total size that triggers eviction
singleFactor - percentage of total size for single-access blocks
multiFactor - percentage of total size for multiple-access blocks
memoryFactor - percentage of total size for in-memory blocks
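For illustration, a hedged sketch of calling this constructor with explicit values (the numbers are plausible tuning choices, not mandated defaults; single + multi + memory should sum to 1.0 and minFactor should not exceed acceptableFactor):

LruBlockCache cache = new LruBlockCache(
    256L * 1024 * 1024, // maxSize: 256 MB of heap for the cache
    64L * 1024,         // blockSize: expected average block size
    true,               // evictionThread: run evictions in a background thread
    4096,               // mapInitialSize: initial capacity of the backing ConcurrentHashMap
    0.75f,              // mapLoadFactor
    16,                 // mapConcurrencyLevel
    0.95f,              // minFactor: evict down to 95% of capacity
    0.99f,              // acceptableFactor: start evicting above 99% of capacity
    0.25f,              // singleFactor: share reserved for single-access blocks
    0.50f,              // multiFactor: share reserved for multiple-access blocks
    0.25f,              // memoryFactor: share reserved for in-memory blocks
    1.2f,               // hardLimitFactor: reject puts once size exceeds this * acceptable
    false,              // forceInMemory
    16L * 1024 * 1024); // maxBlockSize: largest single block the cache will accept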
-
-
Method Details
-
setVictimCache
Description copied from interface:FirstLevelBlockCache
Specifies the secondary cache. An entry that is evicted from this cache due to a size constraint will be inserted into the victim cache.- Specified by:
setVictimCache
in interfaceFirstLevelBlockCache
- Parameters:
victimCache
- the second level cache
-
setMaxSize
Description copied from interface:ResizableBlockCache
Sets the max heap size that can be used by the BlockCache.- Specified by:
setMaxSize
in interfaceResizableBlockCache
- Parameters:
maxSize
- The max heap size.
-
asReferencedHeapBlock
The block cached in LRUBlockCache will always be a heap block: on the one hand, on-heap access is faster than off-heap, so the small index or meta blocks cached in CombinedBlockCache benefit a lot; on the other hand, the LRUBlockCache size is always calculated based on the total heap size, so caching an off-heap block in LRUBlockCache would mess up the heap-size accounting. Here we clone the block into a heap block if it is an off-heap block, otherwise we just use the original block. The key point is maintaining the refCnt of the block (HBASE-22127):
1. if we cache the cloned heap block, its refCnt is a totally new one, which is easy to handle;
2. if we cache the original heap block, we are sure that it will not be tracked in ByteBuffAllocator's reservoir; if both RPC and LRUBlockCache release the block, then it can be garbage collected by the JVM, so a retain is needed here.
- Parameters:
buf - the original block
- Returns:
- a block with a heap memory backend.
-
cacheBlock
Cache the block with the specified name and buffer. It is assumed this will NOT be called on an already cached block. In rare cases (HBASE-8547) this can happen, for which we compare the buffer contents.
- Specified by:
cacheBlock
in interfaceBlockCache
- Parameters:
cacheKey - block's cache key
buf - block buffer
inMemory - if block is in-memory
-
assertCounterSanity
Sanity-checking for parity between actual block cache content and metrics. Intended only for use with TRACE level logging and -ea JVM. -
cacheBlock
Cache the block with the specified name and buffer. TODO: after HBASE-22005, we may cache a block that was allocated off-heap, but our LRU cache sizing is based on heap size, so we should handle this in HBASE-22127. It will introduce a switch for whether to keep the LRU cache on-heap or not; if on-heap, we may need to copy the memory to the heap, otherwise the caching size is based on off-heap memory.
- Specified by:
cacheBlock
in interfaceBlockCache
- Parameters:
cacheKey - block's cache key
buf - block buffer
-
updateSizeMetrics
Helper function that updates the local size counter and also updates any per-cf or per-blocktype metrics it can discern from the given LruCachedBlock
-
getBlock
public Cacheable getBlock(BlockCacheKey cacheKey, boolean caching, boolean repeat, boolean updateCacheMetrics) Get the buffer of the block with the specified name.- Specified by:
getBlock
in interfaceBlockCache
- Parameters:
cacheKey - block's cache key
caching - true if the caller caches blocks on cache misses
repeat - Whether this is a repeat lookup for the same block (used to avoid double counting cache misses when doing double-check locking)
updateCacheMetrics - Whether to update cache metrics or not
- Returns:
- buffer of specified cache key, or null if not in cache
-
containsBlock
Whether the cache contains block with specified cacheKey- Specified by:
containsBlock
in interfaceFirstLevelBlockCache
- Parameters:
cacheKey
- cache key for the block- Returns:
- true if contains the block
-
evictBlock
Description copied from interface:BlockCache
Evict block from cache.- Specified by:
evictBlock
in interfaceBlockCache
- Parameters:
cacheKey
- Block to evict- Returns:
- true if block existed and was evicted, false if not
-
evictBlocksByHfileName
Evicts all blocks for a specific HFile. This is an expensive operation implemented as a linear-time search through all blocks in the cache. Ideally this should be a search in a log-access-time map. This is used for evict-on-close to remove all blocks of a specific HFile.
- Specified by:
evictBlocksByHfileName
in interfaceBlockCache
- Returns:
- the number of blocks evicted
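A hedged usage sketch (the cache variable and file name are illustrative; the name must match the hfile name used when the blocks' BlockCacheKeys were created):

// Evict-on-close style cleanup for a store file that is no longer needed.
int evicted = cache.evictBlocksByHfileName("example-hfile");
System.out.println("evicted " + evicted + " blocks");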
-
evictBlock
Evict the block; it will be cached by the victim handler if one exists and the block may be read again later
evictedByEvictionProcess
- true if the given block is evicted by EvictionThread
- Returns:
- the heap size of evicted block
-
runEviction
Multi-threaded call to run the eviction process. -
isEvictionInProgress
boolean isEvictionInProgress() -
getOverhead
long getOverhead() -
evict
void evict()Eviction method. -
toString
-
getMaxSize
Get the maximum size of this cache.- Specified by:
getMaxSize
in interfaceBlockCache
- Returns:
- max size in bytes
-
getCurrentSize
Description copied from interface:BlockCache
Returns the occupied size of the block cache, in bytes.- Specified by:
getCurrentSize
in interfaceBlockCache
- Returns:
- occupied space in cache, in bytes
-
getCurrentDataSize
Description copied from interface:BlockCache
Returns the occupied size of data blocks, in bytes.- Specified by:
getCurrentDataSize
in interfaceBlockCache
- Returns:
- occupied space in cache, in bytes
-
getCurrentIndexSize
-
getCurrentBloomSize
-
getFreeSize
Description copied from interface:BlockCache
Returns the free size of the block cache, in bytes.- Specified by:
getFreeSize
in interfaceBlockCache
- Returns:
- free space in cache, in bytes
-
size
Description copied from interface:BlockCache
Returns the total size of the block cache, in bytes.- Specified by:
size
in interfaceBlockCache
- Returns:
- size of cache, in bytes
-
getBlockCount
Description copied from interface:BlockCache
Returns the number of blocks currently cached in the block cache.- Specified by:
getBlockCount
in interfaceBlockCache
- Returns:
- number of blocks in the cache
-
getDataBlockCount
Description copied from interface:BlockCache
Returns the number of data blocks currently cached in the block cache.- Specified by:
getDataBlockCount
in interfaceBlockCache
- Returns:
- number of blocks in the cache
-
getIndexBlockCount
-
getBloomBlockCount
-
getEvictionThread
-
logStats
-
getStats
Get counter statistics for this cache. Includes: total accesses, hits, misses, evicted blocks, and runs of the eviction processes.
- Specified by:
getStats
in interfaceBlockCache
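For example (a hedged sketch; the CacheStats accessors shown are assumed from the public HBase API, and the cache variable is illustrative):

org.apache.hadoop.hbase.io.hfile.CacheStats stats = cache.getStats();
System.out.println("hits=" + stats.getHitCount()
    + " misses=" + stats.getMissCount()
    + " evicted=" + stats.getEvictedCount()
    + " hitRatio=" + stats.getHitRatio());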
-
heapSize
Description copied from interface:HeapSize
Return the approximate 'exclusive deep size' of implementing object. Includes count of payload and hosting object sizings. -
calculateOverhead
-
iterator
Description copied from interface:BlockCache
Returns Iterator over the blocks in the cache.- Specified by:
iterator
in interfaceBlockCache
- Specified by:
iterator
in interfaceIterable<CachedBlock>
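A hedged sketch of walking the cache contents through this Iterable view (the CachedBlock accessors shown are assumed from the public interface; the cache variable is illustrative):

for (org.apache.hadoop.hbase.io.hfile.CachedBlock cb : cache) {
  System.out.println(cb.getFilename() + " @ " + cb.getOffset()
      + " : " + cb.getSize() + " bytes, type " + cb.getBlockType());
}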
-
acceptableSize
long acceptableSize() -
minSize
-
singleSize
-
multiSize
-
memorySize
-
shutdown
Description copied from interface:BlockCache
Shutdown the cache.- Specified by:
shutdown
in interfaceBlockCache
-
clearCache
Clears the cache. Used in tests. -
getCachedFileNamesForTest
Used in testing. May be very inefficient.- Returns:
- the set of cached file names
-
getEncodingCountsForTest
-
getMapForTests
-
getBlockCaches
Description copied from interface:BlockCache
Returns The list of sub blockcaches that make up this one; returns null if no sub caches.- Specified by:
getBlockCaches
in interfaceBlockCache
-