Package org.apache.hadoop.hbase.io.hfile

Provides implementations of HFile and HFile BlockCache.


Package org.apache.hadoop.hbase.io.hfile Description

Provides implementations of HFile and HFile BlockCache. Caches are configured (and instantiated) by CacheConfig. See the head of the CacheConfig class for the constants that define cache options and the configuration keys used to set them. Cache implementations include the default, native on-heap LruBlockCache, SlabCache, and BucketCache. SlabCache can serve as an L2 for LruBlockCache; it is hosted inside the class DoubleBlockCache, which caches blocks in BOTH L1 and L2 and, on eviction, moves blocks from L1 to L2. BucketCache has several deployment modes. It can act as an L2 for LruBlockCache (when a block is evicted from LruBlockCache it goes to the BucketCache, and a block lookup checks both places), or, under CombinedBlockCache, it can host data blocks while meta blocks are kept in the LruBlockCache. BucketCache also has onheap, offheap, and file options.

Which BlockCache should I use?

BucketCache has seen more production deploys and has more deploy options. Fetching from BucketCache will always be slower than fetching from the on-heap LruBlockCache, but latencies tend to be less erratic over time (roughly because there is less GC). SlabCache tends to cause more GCs because blocks are always moved between L1 and L2, at least given the way DoubleBlockCache currently works. An apples-to-apples comparison is difficult because the hosting classes, CombinedBlockCache for BucketCache and DoubleBlockCache for SlabCache, operate so differently. See Nick Dimiduk's BlockCache 101 for some numbers. See also the description of HBASE-7404, where Chunhui Shen lists issues he found with BlockCache (inefficient use of memory, does not help with GC).

Enabling SlabCache

SlabCache is the original offheap block cache but unfortunately has seen little use. It was originally described in Caching in Apache HBase: SlabCache. To enable it, set the float hbase.offheapcache.percentage (CacheConfig.SLAB_CACHE_OFFHEAP_PERCENTAGE_KEY) to a value between 0 and 1 in your hbase-site.xml file. This enables DoubleBlockCache, a facade over LruBlockCache and SlabCache. DoubleBlockCache works as follows: when caching, it "...attempts to cache the block in both caches, while readblock reads first from the faster onheap cache before looking for the block in the off heap cache. Metrics are the combined size and hits and misses of both caches." The value set in hbase.offheapcache.percentage is multiplied by the -XX:MaxDirectMemorySize setting in your hbase-env.sh configuration file, and the result is the size SlabCache uses for its offheap store. The onheap store is the float HConstants.HFILE_BLOCK_CACHE_SIZE_KEY setting (a value between 0 and 1) times the size of the allocated Java heap.
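The sizing rule above is plain multiplication, and it can help to work it out before editing the configuration. The following is a minimal sketch of that arithmetic only; the class SlabCacheSizing and its method names are hypothetical and are not part of HBase, and the real sizes chosen at startup should still be confirmed in the logs.

```java
/** Hypothetical helper (not HBase code): estimates the two DoubleBlockCache tier sizes. */
final class SlabCacheSizing {
    private SlabCacheSizing() {}

    /** Offheap bytes = hbase.offheapcache.percentage * -XX:MaxDirectMemorySize. */
    static long offHeapBytes(double offHeapPercentage, long maxDirectMemoryBytes) {
        checkFraction(offHeapPercentage);
        return (long) (offHeapPercentage * maxDirectMemoryBytes);
    }

    /** Onheap bytes = HConstants.HFILE_BLOCK_CACHE_SIZE_KEY fraction * Java heap size. */
    static long onHeapBytes(double blockCacheFraction, long maxHeapBytes) {
        checkFraction(blockCacheFraction);
        return (long) (blockCacheFraction * maxHeapBytes);
    }

    /** Both settings are floats between 0 and 1; reject anything else. */
    private static void checkFraction(double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("expected a value between 0 and 1: " + fraction);
        }
    }
}
```

For instance, under these assumptions a percentage of 0.5 with -XX:MaxDirectMemorySize=4G would give SlabCache a 2G offheap store.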

Restart (or rolling restart) your cluster for the configs to take effect. Check logs to ensure your configurations came out as expected.

Enabling BucketCache

Ensure the SlabCache config hbase.offheapcache.percentage is not set (or is set to 0). At this point, it is probably best to read the code to learn the list of bucket cache options and how they combine (to be fixed). The options and defaults for BucketCache are listed at the head of the CacheConfig class.

Here is a simple example of how to enable a 4G offheap bucket cache with a 1G onheap cache. The onheap/offheap caches are managed by CombinedBlockCache by default. For the CombinedBlockCache (from the class comment), "The smaller lruCache is used to cache bloom blocks and index blocks, the larger bucketCache is used to cache data blocks. getBlock reads first from the smaller lruCache before looking for the block in the bucketCache. Metrics are the combined size and hits and misses of both caches." To disable CombinedBlockCache and have the BucketCache act as a strict L2 cache to the L1 LruBlockCache (i.e. on eviction from L1, blocks go to L2), set CacheConfig.BUCKET_CACHE_COMBINED_KEY to false. Unless you change it, CacheConfig.BUCKET_CACHE_COMBINED_PERCENTAGE_KEY defaults to 0.9 (see the BucketCache defaults section at the top of CacheConfig). This means that of whatever size you set for the bucket cache with CacheConfig.BUCKET_CACHE_SIZE_KEY, 90% will be used for offheap and 10% will be used by the onheap LruBlockCache.
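The routing described in the CombinedBlockCache class comment can be pictured with a toy model. This is a sketch only: CombinedCacheSketch, its Kind enum, and its methods are hypothetical names, plain HashMaps stand in for both caches, and eviction and metrics are left out. It is not how HBase implements CombinedBlockCache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (not HBase code) of the routing in the CombinedBlockCache class comment. */
final class CombinedCacheSketch {
    /** Hypothetical block categories used only for this illustration. */
    enum Kind { DATA, INDEX, BLOOM }

    // Stand-ins for the smaller onheap lruCache and the larger bucketCache.
    private final Map<String, byte[]> lruCache = new HashMap<>();
    private final Map<String, byte[]> bucketCache = new HashMap<>();

    /** Bloom and index blocks go to the LRU tier, data blocks to the bucket tier. */
    void cacheBlock(String key, Kind kind, byte[] block) {
        if (kind == Kind.DATA) {
            bucketCache.put(key, block);
        } else {
            lruCache.put(key, block);
        }
    }

    /** Reads the LRU tier first, then falls back to the bucket tier; null on a miss. */
    byte[] getBlock(String key) {
        byte[] hit = lruCache.get(key);
        return hit != null ? hit : bucketCache.get(key);
    }

    int lruBlockCount() {
        return lruCache.size();
    }

    int bucketBlockCount() {
        return bucketCache.size();
    }

    /** Combined block count across both tiers, mirroring the combined metrics idea. */
    int blockCount() {
        return lruCache.size() + bucketCache.size();
    }
}
```

In strict L2 mode the routing differs: blocks enter L1 first and reach the BucketCache only on eviction from L1, which this sketch does not model.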

Back to the example of setting an onheap cache of 1G and an offheap cache of 4G: in hbase-env.sh, ensure the java option -XX:MaxDirectMemorySize is set and is 5G in size, e.g. -XX:MaxDirectMemorySize=5G. Then in hbase-site.xml add the following configurations:

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>offheap</value>
</property>
<property>
  <name>hbase.bucketcache.percentage.in.combinedcache</name>
  <value>0.8</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>5120</value>
</property>
Above we set a cache of 5G, 80% of which will be offheap (4G) and the remaining 20% (1G) onheap. Restart (or rolling restart) your cluster for the configs to take effect. Check logs to ensure your configurations came out as expected.
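The split in this example can be checked with a few lines of arithmetic. The sketch below assumes, as the example does, that hbase.bucketcache.size is given in megabytes and that the combined percentage is the share handed to the BucketCache; the class CombinedCacheSplit and its methods are hypothetical and not part of HBase.

```java
/** Hypothetical helper (not HBase code): splits a combined cache size between tiers. */
final class CombinedCacheSplit {
    private CombinedCacheSplit() {}

    /** Share for the BucketCache: total size * hbase.bucketcache.percentage.in.combinedcache. */
    static long bucketCacheMb(long totalMb, double combinedPercentage) {
        if (combinedPercentage < 0.0 || combinedPercentage > 1.0) {
            throw new IllegalArgumentException("expected a value between 0 and 1");
        }
        return Math.round(totalMb * combinedPercentage);
    }

    /** The onheap LruBlockCache gets whatever the BucketCache does not. */
    static long lruCacheMb(long totalMb, double combinedPercentage) {
        return totalMb - bucketCacheMb(totalMb, combinedPercentage);
    }
}
```

With the values from the configuration above (5120 and 0.8), this gives 4096 MB for the BucketCache and 1024 MB for the LruBlockCache. Leaving the percentage at its 0.9 default would give 4608 MB and 512 MB instead.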

Copyright © 2014 The Apache Software Foundation. All rights reserved.