In his presentation, “Avoiding Full GCs
with MemStore-Local Allocation Buffers”, Todd Lipcon describes two cases of
stop-the-world garbage collections common in HBase, especially during loading: CMS failure
modes and old-generation heap fragmentation. To address the first, start the CMS
earlier than default by adding
-XX:CMSInitiatingOccupancyFraction and setting
it lower than the default. Start at 60 or 70 percent (the lower you set the threshold,
the more GC work is done and the more CPU is used). To address the second issue, fragmentation,
Todd added an experimental facility, MemStore-Local Allocation Buffers (MSLAB), that
must be explicitly enabled in Apache HBase 0.90.x (it is on by default in Apache HBase 0.92.x).
To enable it, set hbase.hregion.memstore.mslab.enabled to true in your
Configuration. See the cited slides for background and detail. Be aware that when MSLAB is
enabled, each MemStore instance occupies at least one MSLAB chunk of memory. If you have
thousands of regions, or many regions each with many column families, these MSLAB
allocations may account for a good portion of your heap and, in an extreme case, cause an
OutOfMemoryError (OOME). In that case, disable MSLAB, lower the amount of memory it uses, or
host fewer regions per server.
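As a sketch of where the CMS setting from the paragraph above can go, the fragment below appends it to HBASE_OPTS in hbase-env.sh. Two flags are assumptions not taken from the text: -XX:+UseConcMarkSweepGC, which selects CMS explicitly, and -XX:+UseCMSInitiatingOccupancyOnly, which makes HotSpot apply the occupancy fraction on every CMS cycle instead of treating it only as a starting hint. CMS exists only in older JVMs (it was removed in JDK 14), so this applies only to JVMs that still ship it.

```shell
# hbase-env.sh (sketch): run CMS for the old generation and start a concurrent
# cycle once the old generation reaches 70% occupancy. Per the text, try 60 or 70;
# lower values mean more frequent GC and more CPU.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC"
export HBASE_OPTS="$HBASE_OPTS -XX:CMSInitiatingOccupancyFraction=70"
# Assumption (not from the text): honor the fraction on every cycle instead of
# letting the JVM switch to its own heuristics after the first one.
export HBASE_OPTS="$HBASE_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
```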
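To see why many regions or column families can make MSLAB expensive, the arithmetic can be sketched as follows: HBase keeps one MemStore per (region, column family) pair, and the text says each MemStore occupies at least one MSLAB chunk, so the floor is regions × families × chunk size. The function name mslab_min_bytes is hypothetical, and the 2 MB default chunk size is an assumption; pass the chunk size your cluster is actually configured with.

```shell
#!/usr/bin/env bash
# Rough lower bound on heap held by MSLAB on one RegionServer:
# one MemStore per (region, column family) pair, at least one chunk each.
# ASSUMPTION: 2 MB (2097152-byte) chunks unless a third argument overrides it.
mslab_min_bytes() {
  local regions=$1
  local families=$2
  local chunk_bytes=${3:-2097152}
  echo $(( regions * families * chunk_bytes ))
}

# Hypothetical example: 1,000 regions, each with 3 column families.
bytes=$(mslab_min_bytes 1000 3)
echo "MSLAB floor: $(( bytes / 1024 / 1024 )) MB"
```

With these hypothetical numbers the sketch prints `MSLAB floor: 6000 MB`, nearly 6 GB held by MSLAB alone, which is the situation where disabling MSLAB, shrinking it, or hosting fewer regions becomes worth considering.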
If you have a write-heavy workload, check out HBASE-8163,
MemStoreChunkPool: An improvement for JAVA GC when using MSLAB. It describes
configurations that reduce young-generation GC activity under write-heavy loads. If you do not
have HBASE-8163 installed and you are trying to improve your young GC times, one trick to
consider, courtesy of our Liang Xie, is to set the GC size threshold in
hbase-env.sh above which objects are allocated directly in the old generation to a value
just smaller than the MSLAB chunk size, so that
MSLAB allocations happen in the tenured space directly rather than first in the young gen.
You would do this because these MSLAB allocations are likely to reach the old gen
anyway. Rather than paying for copies between the survivor spaces (s0 and s1) of the young gen,
followed by the promotion copy from young to old gen once the MSLAB chunks have achieved
sufficient tenure, you save a bit of YGC churn by allocating in the old gen directly.
For more information about GC logs, see Section 15.2.3, “JVM Garbage Collection Logs”.
Also consider enabling the off-heap Block Cache, which has been shown to reduce GC pause times. See Section 9.6.4, “Block Cache”.
Recent JVMs handle fragmentation better, so make sure you are running a recent release. For details, read down in the message “Identifying concurrent mode failures caused by fragmentation”.