Class StochasticLoadBalancer

java.lang.Object
org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer
org.apache.hadoop.hbase.master.balancer.StochasticLoadBalancer
All Implemented Interfaces:
ConfigurationObserver, LoadBalancer, Stoppable
Direct Known Subclasses:
CacheAwareLoadBalancer, FavoredStochasticBalancer

@LimitedPrivate("Configuration") public class StochasticLoadBalancer extends BaseLoadBalancer

This is a best-effort load balancer. Given a cost function F(C) => x, it randomly tries to mutate the cluster state C into Cprime. If F(Cprime) < F(C), the new cluster state becomes the plan. It includes cost functions to compute the cost of:

  • Region Load
  • Table Load
  • Data Locality
  • Memstore Sizes
  • Storefile Sizes
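
The accept-if-cheaper search described above can be illustrated with a toy model. The sketch below is not the HBase implementation: the class name StochasticSearchSketch, the region-count cost function, and the single-region move are all assumptions made for illustration. It only shows the core idea of proposing a random mutation and keeping it when F(Cprime) < F(C).

```java
import java.util.Arrays;
import java.util.Random;

/** Toy illustration of an accept-if-cheaper random search. Not HBase code. */
final class StochasticSearchSketch {

  /** Hypothetical cost: imbalance of region counts across servers, scaled into [0, 1]. */
  static double cost(int[] regionsPerServer) {
    int servers = regionsPerServer.length;
    int total = Arrays.stream(regionsPerServer).sum();
    if (servers < 2 || total == 0) {
      return 0.0; // nothing to balance
    }
    double mean = (double) total / servers;
    double deviation = 0.0;
    for (int count : regionsPerServer) {
      deviation += Math.abs(count - mean);
    }
    // Worst case: every region sits on a single server.
    double worst = (total - mean) + (servers - 1) * mean;
    return deviation / worst;
  }

  /** Tries random single-region moves and keeps only those that lower the cost. */
  static int[] balance(int[] initial, int steps, long seed) {
    Random random = new Random(seed);
    int[] current = initial.clone();
    double currentCost = cost(current);
    for (int i = 0; i < steps; i++) {
      int from = random.nextInt(current.length);
      int to = random.nextInt(current.length);
      if (from == to || current[from] == 0) {
        continue; // not a valid move
      }
      current[from]--;
      current[to]++;
      double candidateCost = cost(current);
      if (candidateCost < currentCost) {
        currentCost = candidateCost; // F(Cprime) < F(C): keep the mutation
      } else {
        current[from]++; // otherwise revert it
        current[to]--;
      }
    }
    return current;
  }
}
```

Because a mutation is kept only when it strictly lowers the cost, the result is never worse than the starting state, but it is not guaranteed to be optimal, which is what "best effort" implies.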

Every cost function returns a number between 0 and 1 inclusive, where 0 is the lowest cost (the best solution) and 1 is the highest possible cost (the worst solution). The computed costs are scaled by their respective multipliers:

  • hbase.master.balancer.stochastic.regionLoadCost
  • hbase.master.balancer.stochastic.moveCost
  • hbase.master.balancer.stochastic.tableLoadCost
  • hbase.master.balancer.stochastic.localityCost
  • hbase.master.balancer.stochastic.memstoreSizeCost
  • hbase.master.balancer.stochastic.storefileSizeCost
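
As a rough illustration of how multipliers scale the individual costs, the following sketch combines several [0, 1] costs into one weighted total. The class name WeightedCostSketch and the plain weighted sum are assumptions; the actual balancer may combine or normalize the scaled costs differently.

```java
/** Toy illustration of scaling per-function costs by multipliers. Not HBase code. */
final class WeightedCostSketch {

  /**
   * Returns the sum of multiplier[i] * cost[i].
   * Each cost must lie in [0, 1], matching the contract described above.
   */
  static double totalCost(double[] costs, double[] multipliers) {
    if (costs.length != multipliers.length) {
      throw new IllegalArgumentException("costs and multipliers must have the same length");
    }
    double total = 0.0;
    for (int i = 0; i < costs.length; i++) {
      if (costs[i] < 0.0 || costs[i] > 1.0) {
        throw new IllegalArgumentException("cost out of range [0, 1]: " + costs[i]);
      }
      total += multipliers[i] * costs[i];
    }
    return total;
  }
}
```

With hypothetical multipliers of 500 and 7, a cost of 0.5 on the first function contributes 250 to the total while a cost of 0.2 on the second contributes only 1.4, so a larger multiplier makes the balancer weigh that concern more heavily.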

You can also add custom cost functions by setting the following configuration value:

  • hbase.master.balancer.stochastic.additionalCostFunctions

All custom cost functions need to extend CostFunction.

In addition to the above configurations, the balancer can be tuned by the following configuration values:

  • hbase.master.balancer.stochastic.maxMoveRegions, which controls the maximum number of regions that can be moved in a single invocation of this balancer.
  • hbase.master.balancer.stochastic.stepsPerRegion, the coefficient by which the number of regions is multiplied to get the number of times the balancer will try to mutate all servers.
  • hbase.master.balancer.stochastic.maxSteps, which controls the maximum number of times the balancer will try to mutate all the servers. The balancer uses the minimum of this value and the stepsPerRegion computation above.
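
The interaction between stepsPerRegion and maxSteps can be sketched as a simple minimum. This follows only the description above; the class name StepBudgetSketch and its parameters are hypothetical, and the real computation may include further factors that this page does not describe.

```java
/** Toy illustration of the step budget described above. Not HBase code. */
final class StepBudgetSketch {

  /**
   * Returns min(maxSteps, numRegions * stepsPerRegion).
   * Math.multiplyExact throws ArithmeticException rather than silently overflowing.
   */
  static long stepBudget(long numRegions, long stepsPerRegion, long maxSteps) {
    long computed = Math.multiplyExact(numRegions, stepsPerRegion);
    return Math.min(maxSteps, computed);
  }
}
```

For example, with illustrative values of stepsPerRegion = 800 and maxSteps = 1000000, a cluster of 100 regions gets a budget of 80000 steps, while a cluster of 5000 regions is capped at 1000000.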

This balancer is best used with hbase.master.loadbalance.bytable set to false so that the balancer gets the full picture of all loads on the cluster.