Class StochasticLoadBalancer
- All Implemented Interfaces:
ConfigurationObserver, LoadBalancer, Stoppable
- Direct Known Subclasses:
CacheAwareLoadBalancer, FavoredStochasticBalancer
This is a best-effort load balancer. Given a cost function F(C) => x, it randomly tries to mutate the cluster state C into a new state Cprime. If F(Cprime) < F(C), the new cluster state becomes the plan. It includes cost functions that compute the cost of:
- Region Load
- Table Load
- Data Locality
- Memstore Sizes
- Storefile Sizes
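The search described above can be sketched as a simple randomized hill climb. This is a minimal, self-contained illustration and not the HBase implementation: the cluster is modeled as an array of region counts per server, the single cost function is a hypothetical normalized imbalance measure, and every name (`StochasticSearchSketch`, `cost`, `search`) is invented for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative only: a toy randomized search, not the HBase balancer. */
class StochasticSearchSketch {

  /** Hypothetical cost in [0, 1]: 0 for an even spread, 1 when one server holds every region. */
  static double cost(int[] regionsPerServer) {
    int total = Arrays.stream(regionsPerServer).sum();
    int servers = regionsPerServer.length;
    if (total == 0 || servers < 2) {
      return 0.0;
    }
    double mean = (double) total / servers;
    double deviation = 0.0;
    for (int count : regionsPerServer) {
      deviation += Math.abs(count - mean);
    }
    // Deviation of the worst case, where a single server holds all regions.
    double worst = (total - mean) + (servers - 1) * mean;
    return deviation / worst;
  }

  /** Tries random single-region moves and keeps a move only if it lowers the cost. */
  static int[] search(int[] initial, long steps, long seed) {
    Random random = new Random(seed);
    int[] current = initial.clone();
    double currentCost = cost(current);
    for (long step = 0; step < steps; step++) {
      int from = random.nextInt(current.length);
      int to = random.nextInt(current.length);
      if (from == to || current[from] == 0) {
        continue;
      }
      // Mutate C into Cprime by moving one region.
      current[from]--;
      current[to]++;
      double newCost = cost(current);
      if (newCost < currentCost) {
        currentCost = newCost; // F(Cprime) < F(C): keep the new state.
      } else {
        current[from]++; // Otherwise undo the move.
        current[to]--;
      }
    }
    return current;
  }

  public static void main(String[] args) {
    int[] start = {12, 0, 0, 0};
    int[] result = search(start, 10_000, 42L);
    System.out.println(Arrays.toString(start) + " cost=" + cost(start));
    System.out.println(Arrays.toString(result) + " cost=" + cost(result));
  }
}
```

Because a move is kept only when the cost strictly drops, the sketch can never end in a state that is worse than the one it started from.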
Every cost function returns a number between 0 and 1 inclusive, where 0 is the lowest cost (the best solution) and 1 is the highest possible cost (the worst solution). The computed costs are scaled by their respective multipliers:
- hbase.master.balancer.stochastic.regionLoadCost
- hbase.master.balancer.stochastic.moveCost
- hbase.master.balancer.stochastic.tableLoadCost
- hbase.master.balancer.stochastic.localityCost
- hbase.master.balancer.stochastic.memstoreSizeCost
- hbase.master.balancer.stochastic.storefileSizeCost
You can also add custom cost functions by setting the following configuration value:
- hbase.master.balancer.stochastic.additionalCostFunctions
All custom cost functions need to extend CostFunction.
In addition to the above configurations, the balancer can be tuned by the following configuration values:
- hbase.master.balancer.stochastic.maxMoveRegions, which controls the maximum number of regions that can be moved in a single invocation of this balancer.
- hbase.master.balancer.stochastic.stepsPerRegion, the coefficient by which the number of regions is multiplied to get the number of times the balancer will try to mutate all servers.
- hbase.master.balancer.stochastic.maxSteps, which controls the maximum number of times that the balancer will try to mutate all the servers. The balancer uses the minimum of this value and the result of the stepsPerRegion computation.
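The relationship between stepsPerRegion and maxSteps can be written out as a small calculation. This is a sketch of the rule as stated in the list above, assuming no other factors enter the computation (the real balancer may scale the product further, for example by the number of servers). The class name, method name, and the numeric values are hypothetical.

```java
/** Illustrative only: derives a step budget from the two settings described above. */
class StepBudgetSketch {

  /**
   * Multiplies the region count by stepsPerRegion and caps the result at maxSteps.
   * Assumption: nothing else contributes to the computed value.
   */
  static long stepBudget(long numRegions, long stepsPerRegion, long maxSteps) {
    long computed = Math.multiplyExact(numRegions, stepsPerRegion);
    return Math.min(maxSteps, computed);
  }

  public static void main(String[] args) {
    // Small cluster: the computed value is below the cap.
    System.out.println(stepBudget(100, 800, 1_000_000));
    // Large cluster: the cap applies.
    System.out.println(stepBudget(5_000, 800, 1_000_000));
  }
}
```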
This balancer is best used with hbase.master.loadbalance.bytable set to false so that the balancer gets the full picture of all loads on the cluster.
-
Field Summary
Fields
- private final BalancerConditionals balancerConditionals
- protected Map<Class<? extends CandidateGenerator>,CandidateGenerator> candidateGenerators
- protected static final String COST_FUNCTIONS_COST_FUNCTIONS_KEY
- protected List<CostFunction> costFunctions
- private double[] curFunctionCosts
- private double curOverallCost
- protected static final int DEFAULT_KEEP_REGION_LOADS
- protected static final long DEFAULT_MAX_RUNNING_TIME
- protected static final int DEFAULT_MAX_STEPS
- protected static final float DEFAULT_MIN_COST_NEED_BALANCE
- protected static final boolean DEFAULT_RUN_MAX_STEPS
- protected static final int DEFAULT_STEPS_PER_REGION
- protected static final String KEEP_REGION_LOADS
- (package private) Map<String,Deque<BalancerRegionLoad>> loads
- private LocalityBasedCandidateGenerator localityCandidateGenerator
- private ServerLocalityCostFunction localityCost
- private static final org.slf4j.Logger LOG
- protected static final String MAX_RUNNING_TIME_KEY
- protected static final String MAX_STEPS_KEY
- private long maxRunningTime
- private int maxSteps
- protected static final String MIN_COST_NEED_BALANCE_KEY
- private float minCostNeedBalance
- private int numRegionLoadsToRemember
- static final String OVERALL_COST_FUNCTION_NAME
- private RackLocalityCostFunction rackLocalityCost
- (package private) Map<String,Pair<ServerName, Float>> regionCacheRatioOnOldServerMap
- private RegionReplicaHostCostFunction regionReplicaHostCostFunction
- private RegionReplicaRackCostFunction regionReplicaRackCostFunction
- protected static final String RUN_MAX_STEPS_KEY
- private boolean runMaxSteps
- protected final Supplier<List<Class<? extends CandidateGenerator>>> shuffledGeneratorClasses
- protected static final String STEPS_PER_REGION_KEY
- private int stepsPerRegion
- private float sumMultiplier
- private static final String TABLE_FUNCTION_SEP
- private double[] tempFunctionCosts
- private final Map<Class<? extends CandidateGenerator>,Double> weightsOfGenerators
Fields inherited from class org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer
BALANCER_DECISION_BUFFER_ENABLED, BALANCER_REJECTION_BUFFER_ENABLED, clusterStatus, DEFAULT_BALANCER_DECISION_BUFFER_ENABLED, DEFAULT_BALANCER_REJECTION_BUFFER_ENABLED, DEFAULT_HBASE_MASTER_LOADBALANCE_BYTABLE, isByTable, masterServerName, metricsBalancer, MIN_SERVER_BALANCE, provider, rackManager, regionFinder, REGIONS_SLOP_DEFAULT, REGIONS_SLOP_KEY, slop, useRegionFinder
Fields inherited from interface org.apache.hadoop.hbase.master.LoadBalancer
BOGUS_SERVER_NAME, HBASE_RSGROUP_LOADBALANCER_CLASS, MOVE_THROTTLING, MOVE_THROTTLING_DEFAULT
-
Constructor Summary
Constructors
- StochasticLoadBalancer(): The constructor that passes a MetricsStochasticBalancer to BaseLoadBalancer to replace its default MetricsBalancer.
- StochasticLoadBalancer(MetricsStochasticBalancer metricsStochasticBalancer)
-
Method Summary
- private void addCostFunction(List<CostFunction> costFunctions, CostFunction costFunction)
- private boolean areSomeRegionReplicasColocatedOnHost
- private boolean areSomeRegionReplicasColocatedOnRack
- protected List<RegionPlan> balanceTable(TableName tableName, Map<ServerName, List<RegionInfo>> loadOfOneTable): Given the cluster state, this will try to approach an optimal balance.
- private long calculateMaxSteps(BalancerClusterState cluster)
- (package private) static String composeAttributeName(String tableName, String costFunctionName): A helper function to compose the attribute name from the table name and the cost function name.
- (package private) double computeCost(BalancerClusterState cluster, double previousCost): This is the main cost function.
- protected Map<Class<? extends CandidateGenerator>,CandidateGenerator> createCandidateGenerators(org.apache.hadoop.conf.Configuration conf)
- private static CostFunction createCostFunction(Class<? extends CostFunction> clazz, org.apache.hadoop.conf.Configuration conf)
- protected List<CostFunction> createCostFunctions(org.apache.hadoop.conf.Configuration conf)
- private List<RegionPlan> createRegionPlans(BalancerClusterState cluster): Create all of the RegionPlans needed to move from the initial cluster state to the desired state.
- protected String functionCost
- private String getBalanceReason(double total, double sumMultiplier)
- (package private) Map<Class<? extends CandidateGenerator>,CandidateGenerator> getCandidateGenerators()
- (package private) String[] getCostFunctionNames: Get the names of the cost functions.
- (package private) List<CostFunction> getCostFunctions
- protected CandidateGenerator getRandomGenerator(BalancerClusterState cluster): Select the candidate generator to use based on the cost of cost functions.
- (package private) void initCosts(BalancerClusterState cluster)
- protected void loadConf(org.apache.hadoop.conf.Configuration conf)
- private void loadCustomCostFunctions(org.apache.hadoop.conf.Configuration conf)
- (package private) boolean needsBalance(TableName tableName, BalancerClusterState cluster)
- (package private) Pair<CandidateGenerator,BalanceAction> nextAction(BalancerClusterState cluster)
- private CandidateGenerator pickAnyGenerator(List<Class<? extends CandidateGenerator>> generatorClasses)
- private void sendRegionPlansToRingBuffer(List<RegionPlan> plans, double currentCost, double initCost, String initFunctionTotalCosts, long step)
- protected void sendRejectionReasonToRingBuffer(Supplier<String> reason, List<CostFunction> costFunctions)
- (package private) void setRackManager(RackManager rackManager)
- private String totalCostsPerFunc
- void updateBalancerLoadInfo(Map<TableName, Map<ServerName, List<RegionInfo>>> loadOfAllTable): In some scenarios, the balancer needs to update its internal status or information according to the current table load.
- private void updateBalancerTableLoadInfo(TableName tableName, Map<ServerName, List<RegionInfo>> loadOfOneTable)
- void updateClusterMetrics: Set the current cluster status.
- (package private) void updateCostsAndWeightsWithAction(BalancerClusterState cluster, BalanceAction action): Update both the costs of the cost functions and the weights of the candidate generators.
- (package private) void updateMetricsSize(int size): Update the number of metrics that are reported to JMX.
- private void updateRegionLoad: Store the current region loads.
- private void updateStochasticCosts(TableName tableName, double overall, double[] subCosts): Update costs to JMX.
Methods inherited from class org.apache.hadoop.hbase.master.balancer.BaseLoadBalancer
balanceCluster, getConf, getDefaultSlop, idleRegionServerExist, initialize, isStopped, onConfigurationChange, postMasterStartupInitialize, preBalanceCluster, randomAssignment, regionOffline, regionOnline, retainAssignment, roundRobinAssignment, setClusterInfoProvider, sloppyRegionServerExist, stop, toEnsumbleTableLoad, updateBalancerStatus
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.hbase.master.LoadBalancer
throttle
-
Field Details
-
LOG
-
STEPS_PER_REGION_KEY
-
DEFAULT_STEPS_PER_REGION
-
MAX_STEPS_KEY
-
DEFAULT_MAX_STEPS
-
RUN_MAX_STEPS_KEY
-
DEFAULT_RUN_MAX_STEPS
-
MAX_RUNNING_TIME_KEY
-
DEFAULT_MAX_RUNNING_TIME
-
KEEP_REGION_LOADS
-
DEFAULT_KEEP_REGION_LOADS
-
TABLE_FUNCTION_SEP
-
MIN_COST_NEED_BALANCE_KEY
-
DEFAULT_MIN_COST_NEED_BALANCE
-
COST_FUNCTIONS_COST_FUNCTIONS_KEY
-
OVERALL_COST_FUNCTION_NAME
-
loads
-
maxSteps
-
runMaxSteps
-
stepsPerRegion
-
maxRunningTime
-
numRegionLoadsToRemember
-
minCostNeedBalance
-
regionCacheRatioOnOldServerMap
-
costFunctions
-
sumMultiplier
-
curOverallCost
-
tempFunctionCosts
-
curFunctionCosts
-
localityCandidateGenerator
-
localityCost
-
rackLocalityCost
-
regionReplicaHostCostFunction
-
regionReplicaRackCostFunction
-
weightsOfGenerators
-
candidateGenerators
-
shuffledGeneratorClasses
-
balancerConditionals
-
-
Constructor Details
-
StochasticLoadBalancer
public StochasticLoadBalancer()
The constructor that passes a MetricsStochasticBalancer to BaseLoadBalancer to replace its default MetricsBalancer.
-
StochasticLoadBalancer
-
-
Method Details
-
createCostFunction
private static CostFunction createCostFunction(Class<? extends CostFunction> clazz, org.apache.hadoop.conf.Configuration conf) -
loadCustomCostFunctions
-
getCandidateGenerators
Map<Class<? extends CandidateGenerator>,CandidateGenerator> getCandidateGenerators() -
createCandidateGenerators
protected Map<Class<? extends CandidateGenerator>,CandidateGenerator> createCandidateGenerators(org.apache.hadoop.conf.Configuration conf) -
createCostFunctions
-
loadConf
- Overrides:
loadConf in class BaseLoadBalancer
-
updateClusterMetrics
Description copied from interface: LoadBalancer
Set the current cluster status. This allows a LoadBalancer to map host name to a server.
- Specified by:
updateClusterMetrics in interface LoadBalancer
- Overrides:
updateClusterMetrics in class BaseLoadBalancer
-
updateBalancerTableLoadInfo
private void updateBalancerTableLoadInfo(TableName tableName, Map<ServerName, List<RegionInfo>> loadOfOneTable) -
updateBalancerLoadInfo
Description copied from interface: LoadBalancer
In some scenarios, the balancer needs to update its internal status or information according to the current table load.
- Parameters:
loadOfAllTable - region load of servers for all tables
-
updateMetricsSize
Update the number of metrics that are reported to JMX -
areSomeRegionReplicasColocatedOnHost
-
areSomeRegionReplicasColocatedOnRack
-
getBalanceReason
-
needsBalance
-
nextAction
-
getRandomGenerator
Select the candidate generator to use based on the costs of the cost functions. The chance of selecting a candidate generator is proportional to the cost of the cost functions that benefit from it, taken as a share of the cost of all cost functions. -
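The proportional selection described for getRandomGenerator can be illustrated with a roulette-wheel pick. This is a generic sketch of weighted random choice, not the balancer's own code: the class, the `pick` method, the string keys standing in for generators, and the example weights are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: roulette-wheel selection over weighted candidates. */
class WeightedPickSketch {

  /**
   * Picks a key with probability weight / sumOfWeights.
   * roll must be in [0, 1); pass Random.nextDouble() in real use.
   */
  static String pick(Map<String, Double> weights, double roll) {
    double total = 0.0;
    for (double weight : weights.values()) {
      total += weight;
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("at least one weight must be positive");
    }
    double threshold = roll * total;
    double cumulative = 0.0;
    String last = null;
    for (Map.Entry<String, Double> entry : weights.entrySet()) {
      cumulative += entry.getValue();
      last = entry.getKey();
      if (threshold < cumulative) {
        return last;
      }
    }
    return last; // Guards against floating-point rounding at the upper edge.
  }

  public static void main(String[] args) {
    // Hypothetical weights: the first candidate is twice as likely as each of the others.
    Map<String, Double> weights = new LinkedHashMap<>();
    weights.put("localityGenerator", 2.0);
    weights.put("loadGenerator", 1.0);
    weights.put("randomGenerator", 1.0);
    System.out.println(pick(weights, 0.10));
    System.out.println(pick(weights, 0.60));
    System.out.println(pick(weights, 0.90));
  }
}
```

Passing the roll in as a parameter keeps the sketch deterministic, which makes the mapping from a roll to a candidate easy to check by hand.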
pickAnyGenerator
private CandidateGenerator pickAnyGenerator(List<Class<? extends CandidateGenerator>> generatorClasses) -
setRackManager
-
calculateMaxSteps
-
balanceTable
protected List<RegionPlan> balanceTable(TableName tableName, Map<ServerName, List<RegionInfo>> loadOfOneTable)
Given the cluster state, this will try to approach an optimal balance. This should always approach the optimal state given enough steps.
- Specified by:
balanceTable in class BaseLoadBalancer
- Parameters:
tableName - the table to be balanced
loadOfOneTable - region load of servers for the one specific table
- Returns:
- List of plans
-
sendRejectionReasonToRingBuffer
protected void sendRejectionReasonToRingBuffer(Supplier<String> reason, List<CostFunction> costFunctions) -
sendRegionPlansToRingBuffer
private void sendRegionPlansToRingBuffer(List<RegionPlan> plans, double currentCost, double initCost, String initFunctionTotalCosts, long step) -
updateStochasticCosts
Update costs to JMX -
addCostFunction
-
functionCost
-
getCostFunctions
-
totalCostsPerFunc
-
createRegionPlans
Create all of the RegionPlans needed to move from the initial cluster state to the desired state.
- Parameters:
cluster - The state of the cluster
- Returns:
- List of RegionPlans that represent the moves needed to get to the desired final state.
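Deriving moves from an initial and a desired state amounts to a diff of two assignments. The sketch below shows that idea under simplifying assumptions: assignments are plain maps from region name to server name, and `Move` is a hypothetical stand-in for RegionPlan. None of these names come from HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: computes the moves that turn one assignment into another. */
class RegionPlanSketch {

  /** Hypothetical stand-in for a region plan: one region moving between two servers. */
  static final class Move {
    final String region;
    final String from;
    final String to;

    Move(String region, String from, String to) {
      this.region = region;
      this.from = from;
      this.to = to;
    }

    @Override
    public String toString() {
      return region + ": " + from + " -> " + to;
    }
  }

  /** Emits one move for every region whose server differs between the two states. */
  static List<Move> plansFor(Map<String, String> initial, Map<String, String> desired) {
    List<Move> moves = new ArrayList<>();
    // TreeMap gives a deterministic iteration order for the example.
    for (Map.Entry<String, String> entry : new TreeMap<>(desired).entrySet()) {
      String before = initial.get(entry.getKey());
      if (!entry.getValue().equals(before)) {
        moves.add(new Move(entry.getKey(), before, entry.getValue()));
      }
    }
    return moves;
  }

  public static void main(String[] args) {
    Map<String, String> initial = Map.of("r1", "s1", "r2", "s1", "r3", "s2");
    Map<String, String> desired = Map.of("r1", "s1", "r2", "s2", "r3", "s2");
    System.out.println(plansFor(initial, desired));
  }
}
```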
-
updateRegionLoad
Store the current region loads. -
initCosts
-
updateCostsAndWeightsWithAction
Update both the costs of the cost functions and the weights of the candidate generators -
getCostFunctionNames
Get the names of the cost functions -
computeCost
This is the main cost function. It computes a cost associated with a proposed cluster state. All the individual costs are combined with their multipliers to produce a single double cost.
- Parameters:
cluster - The state of the cluster
previousCost - the previous cost. This is used as an early out.
- Returns:
- a double cost associated with the proposed cluster state. This cost is an aggregate of all individual cost functions.
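One way to read the description above is as a weighted sum that stops accumulating once it already exceeds the previous cost. The sketch below assumes exactly that and nothing more: costs and multipliers are passed as parallel arrays, all values are non-negative so a partial sum is a lower bound on the full sum, and the class and method names are hypothetical. Any normalization the real method may apply is not modeled.

```java
/** Illustrative only: weighted aggregation of costs with an early out. */
class AggregateCostSketch {

  /**
   * Sums multiplier * cost over all entries. Accumulation stops as soon as the
   * running total exceeds previousCost, because the candidate state is then
   * already known to be worse than the state it is compared against.
   * Assumption: costs and multipliers are non-negative.
   */
  static double aggregate(double[] costs, double[] multipliers, double previousCost) {
    if (costs.length != multipliers.length) {
      throw new IllegalArgumentException("costs and multipliers must have the same length");
    }
    double total = 0.0;
    for (int i = 0; i < costs.length; i++) {
      total += multipliers[i] * costs[i];
      if (total > previousCost) {
        break; // Early out: no need to evaluate the remaining cost functions.
      }
    }
    return total;
  }

  public static void main(String[] args) {
    double[] costs = {0.5, 0.25};
    double[] multipliers = {2.0, 4.0};
    // No useful bound: every term is added.
    System.out.println(aggregate(costs, multipliers, Double.MAX_VALUE));
    // Tight bound: the loop stops after the first term.
    System.out.println(aggregate(costs, multipliers, 0.5));
  }
}
```

With an early out, the returned value is only guaranteed to be exact when it does not exceed previousCost; otherwise it is a partial sum that is sufficient to reject the candidate.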
-
composeAttributeName
A helper function to compose the attribute name from the table name and the cost function name
-