17.4. HBase Metrics

17.4.1. Metric Setup

See Metrics for an introduction to HBase metrics and instructions for enabling metrics emission. That material still applies to HBase 0.94.x.

For HBase 0.95.x and up, see http://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html

17.4.2. Warning To Ganglia Users

By default, HBase emits a large number of metrics per RegionServer, which may swamp your Ganglia installation. Either increase Ganglia server capacity or configure HBase to emit fewer metrics.

17.4.3. Most Important RegionServer Metrics

17.4.3.1. blockCacheExpressCachingRatio (formerly blockCacheHitCachingRatio)

Block cache hit caching ratio (0 to 100). The cache-hit ratio for reads configured to look in the cache (i.e., cacheBlocks=true).

17.4.3.2. callQueueLength

Point in time length of the RegionServer call queue. If requests arrive faster than the RegionServer handlers can process them, they back up in the call queue.

17.4.3.3. compactionQueueLength (formerly compactionQueueSize)

Point in time length of the compaction queue. This is the number of Stores in the RegionServer that have been targeted for compaction.

17.4.3.4. flushQueueSize

Point in time number of regions enqueued for a MemStore flush.

17.4.3.5. hdfsBlocksLocalityIndex

Point in time percentage of HDFS blocks that are local to this RegionServer. The higher the better.

17.4.3.6. memstoreSizeMB

Point in time sum of all the memstore sizes in this RegionServer (MB). Watch for this nearing or exceeding the configured high-watermark for MemStore memory in the RegionServer.
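As a rough illustration of "nearing the high-watermark", the sketch below expresses memstoreSizeMB as a fraction of the watermark. It assumes the watermark is a fixed fraction of the RegionServer maximum heap. The function name, its parameters, and the 0.4 default fraction are assumptions made for this example; take the real fraction from your own RegionServer configuration.

```python
def memstore_pressure(memstore_size_mb: float,
                      max_heap_mb: float,
                      upper_limit: float = 0.4) -> float:
    """Return memstoreSizeMB as a fraction of the MemStore high-watermark.

    upper_limit is the (assumed) fraction of the heap that all MemStores
    together may occupy. A result close to or above 1.0 means the
    RegionServer is at or past the watermark.
    """
    if max_heap_mb <= 0 or upper_limit <= 0:
        raise ValueError("max_heap_mb and upper_limit must be positive")
    watermark_mb = max_heap_mb * upper_limit
    return memstore_size_mb / watermark_mb


# Hypothetical numbers: 1600 MB of MemStore on an 8000 MB heap.
pressure = memstore_pressure(1600, 8000)
```

A monitoring script might alert when this value stays above some threshold of its own choosing for several consecutive samples.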

17.4.3.7. numberOfOnlineRegions

Point in time number of regions served by the RegionServer. This is an important metric to track for RegionServer-Region density.

17.4.3.8. readRequestsCount

Number of read requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.
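Because the counter is 32 bits wide, a monitoring script that subtracts two consecutive samples can see a large negative jump when the counter wraps. The following is a minimal sketch of a wrap-tolerant difference. The function names are hypothetical, and the sketch assumes the counter wrapped at most once between the two samples.

```python
def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Return the increase between two samples of a fixed-width counter.

    Taking the difference modulo 2**bits gives the true increase whether
    the counter is reported as a signed or an unsigned value, as long as
    it wrapped at most once between the two samples.
    """
    return (current - previous) % (1 << bits)


def average_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Average requests per second over the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

The same helper applies to any other counter of the same width, such as writeRequestsCount.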

17.4.3.9. slowHLogAppendCount

Number of slow HLog append writes for this RegionServer since startup, where "slow" means longer than 1 second. This is a good "canary" metric for HDFS.

17.4.3.10. usedHeapMB

Point in time amount of memory used by the RegionServer (MB).

17.4.3.11. writeRequestsCount

Number of write requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll over.

17.4.4. Other RegionServer Metrics

17.4.4.1. blockCacheCount

Point in time block cache item count in memory. This is the number of blocks of StoreFiles (HFiles) in the cache.

17.4.4.2. blockCacheEvictedCount

Number of blocks evicted from the block cache because of heap size constraints since RegionServer startup.

17.4.4.3. blockCacheFreeMB

Point in time block cache memory available (MB).

17.4.4.4. blockCacheHitCount

Number of blocks of StoreFiles (HFiles) read from the cache since RegionServer startup.

17.4.4.5. blockCacheHitRatio

Block cache hit ratio (0 to 100) since RegionServer startup. This includes all read requests. Reads with cacheBlocks=false always go to disk and are counted as a "cache miss", so full-scan MapReduce jobs can affect this metric significantly.
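To see how much a full scan can move this ratio, the sketch below recomputes it from blockCacheHitCount and blockCacheMissCount. It assumes the ratio is simply hits divided by hits plus misses, scaled to 0 to 100. That formula, the function name, and the sample numbers are assumptions made for this example, not taken from the HBase source.

```python
def hit_ratio_percent(hit_count: int, miss_count: int) -> float:
    """Cache hit ratio on a 0 to 100 scale, or 0.0 when nothing was read."""
    total = hit_count + miss_count
    if total == 0:
        return 0.0
    return 100.0 * hit_count / total


# Hypothetical workload: 900 hits and 100 misses from ordinary reads.
before_scan = hit_ratio_percent(900, 100)

# A full scan with cacheBlocks=false then adds 9000 reads, all counted as misses.
after_scan = hit_ratio_percent(900, 100 + 9000)
```

Under these made-up numbers the ratio falls from 90 to 9, even though the ordinary reads are served from the cache exactly as well as before.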

17.4.4.6. blockCacheMissCount

Number of blocks of StoreFiles (HFiles) requested but not read from the cache since RegionServer startup.

17.4.4.7. blockCacheSizeMB

Point in time block cache size in memory (MB), i.e., the memory in use by the BlockCache.

17.4.4.8. fsPreadLatency*

There are several filesystem positional read latency (ms) metrics, all measured from RegionServer startup.

17.4.4.9. fsReadLatency*

There are several filesystem read latency (ms) metrics, all measured from RegionServer startup. They are hard to interpret because ALL reads go into them (e.g., single-record Gets, full table Scans), including reads required for compactions. These metrics are only interesting "over time", when comparing major releases of HBase or versions of your own code.

17.4.4.10. fsWriteLatency*

There are several filesystem write latency (ms) metrics, all measured from RegionServer startup. They are hard to interpret because ALL writes go into them (e.g., single-record Puts, full table re-writes due to compaction). These metrics are only interesting "over time", when comparing major releases of HBase or versions of your own code.

17.4.4.11. NumberOfStores

Point in time number of Stores open on the RegionServer. A Store corresponds to a ColumnFamily. For example, if a table (which contains the column family) has 3 regions on a RegionServer, there will be 3 stores open for that column family.
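The arithmetic in that example generalizes to one Store per column family per region hosted on the RegionServer. A minimal sketch follows; the function name and the table names in the usage lines are hypothetical.

```python
from typing import Mapping


def expected_open_stores(regions_on_server: Mapping[str, int],
                         families_per_table: Mapping[str, int]) -> int:
    """Estimate the Stores open on one RegionServer.

    regions_on_server maps table name to the number of that table's
    regions hosted on this RegionServer; families_per_table maps table
    name to its number of column families.
    """
    return sum(regions * families_per_table[table]
               for table, regions in regions_on_server.items())


# One table with a single column family and 3 regions on this server.
single = expected_open_stores({"t1": 3}, {"t1": 1})

# Adding a second table with 2 column families and 5 regions on this server.
combined = expected_open_stores({"t1": 3, "t2": 5}, {"t1": 1, "t2": 2})
```

The first call reproduces the 3 stores from the example above; the second gives 3 + 5 * 2 = 13.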

17.4.4.12. NumberOfStorefiles

Point in time number of StoreFiles open on the RegionServer. A store may have more than one StoreFile (HFile).

17.4.4.13. requestsPerSecond

Point in time number of read and write requests. Requests correspond to RegionServer RPC calls: a single Get results in 1 request, but a Scan with caching set to 1000 results in 1 request for each 'next' call (i.e., not for each row). A bulk-load request constitutes 1 request per HFile. Because this metric is periodic, it is less interesting than readRequestsCount and writeRequestsCount for measuring activity.
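The difference between rows and requests can be made concrete with a little arithmetic. The sketch below counts only the 'next' calls that return rows. It is a simplification: it ignores any additional RPCs a client makes to open or close the scanner, and the function name is hypothetical.

```python
def scan_next_calls(rows: int, caching: int) -> int:
    """Number of 'next' calls needed to return `rows` rows in batches of `caching`."""
    if rows < 0:
        raise ValueError("rows must not be negative")
    if caching <= 0:
        raise ValueError("caching must be positive")
    # Integer ceiling division: a final, partially filled batch still costs one call.
    return -(-rows // caching)


# Scanning one million rows with caching set to 1000 versus caching set to 1.
batched = scan_next_calls(1_000_000, 1000)
unbatched = scan_next_calls(1_000_000, 1)
```

Under this simplification the batched scan contributes 1000 requests and the unbatched scan contributes 1,000,000, although both read the same rows.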

17.4.4.14. storeFileIndexSizeMB

Point in time sum of all the StoreFile index sizes in this RegionServer (MB).
