Class ServerCrashProcedure
- All Implemented Interfaces:
Comparable<Procedure<MasterProcedureEnv>>,ServerProcedureInterface
- Direct Known Subclasses:
HBCKServerCrashProcedure
The procedure flow varies depending on whether meta is assigned and whether we are to split logs.
We come in here after ServerManager has noticed that a server has expired. Procedures queued on the RPC should have been notified of the failure and should concurrently be getting themselves ready to assign elsewhere.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
StateMachineProcedure.Flow
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.procedure2.Procedure
Procedure.LockState
Nested classes/interfaces inherited from interface org.apache.hadoop.hbase.master.procedure.ServerProcedureInterface
ServerProcedureInterface.ServerOperationType
-
Field Summary
Fields (Modifier and Type, Field, Description)
private boolean carryingMeta
private org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState currentRunningState
static final boolean DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT - Default value of MASTER_SCP_RETAIN_ASSIGNMENT
private static final org.slf4j.Logger LOG
static final String MASTER_SCP_RETAIN_ASSIGNMENT - Configuration parameter to enable/disable the retain region assignment during ServerCrashProcedure.
private boolean notifiedDeadServer - Whether DeadServer knows that we are processing it.
private List<RegionInfo> regionsOnCrashedServer - Regions that were on the crashed server.
private ServerName serverName - Name of the crashed server to process.
private boolean shouldSplitWal
private MonitoredTask status
Fields inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
stateCount
Fields inherited from class org.apache.hadoop.hbase.procedure2.Procedure
NO_PROC_ID, NO_TIMEOUT
-
Constructor Summary
Constructors (Constructor, Description)
ServerCrashProcedure() - Used when deserializing from a procedure store; we'll construct one of these then call #deserializeStateData(InputStream).
ServerCrashProcedure(MasterProcedureEnv env, ServerName serverName, boolean shouldSplitWal, boolean carryingMeta) - Call this constructor queuing up a Procedure.
-
Method Summary
Methods (Modifier and Type, Method, Description)
protected boolean abort(MasterProcedureEnv env) - The abort() call is asynchronous and each procedure must decide how to deal with it, if they want to be abortable.
protected Procedure.LockState acquireLock - The user should override this method if they need a lock on an Entity.
private void assignRegions(MasterProcedureEnv env, List<RegionInfo> regions) - Assign the regions on the crashed RS to other RSes.
private void cleanupSplitDir
private Procedure[] createSplittingWalProcedures(MasterProcedureEnv env, boolean splitMeta)
protected void deserializeStateData(ProcedureStateSerializer serializer) - Called on store load to allow the user to decode the previously serialized state.
protected StateMachineProcedure.Flow executeFromState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) - Called to perform a single step of the specified 'state' of the procedure.
private boolean filterDefaultMetaRegions
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getInitialState - Return the initial state object that will be used for the first call to executeFromState().
protected ProcedureMetrics getProcedureMetrics - Override this method to provide procedure specific counters for submitted count, failed count and time histogram.
(package private) List<RegionInfo> getRegionsOnCrashedServer - Returns the list of regions on the crashed server.
ServerName getServerName - Returns the name of this server instance.
ServerProcedureInterface.ServerOperationType getServerOperationType - Given an operation type we can take decisions about what to do with pending operations.
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getState(int stateId) - Convert an ordinal (or state id) to an Enum (or more descriptive) state object.
protected int getStateId(org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) - Convert the Enum (or more descriptive) state object to an ordinal (or state id).
boolean hasMetaTableRegion - Returns true if this server has an hbase:meta table region.
protected boolean holdLock - Used to keep the procedure lock even when the procedure is yielding or suspended.
private boolean isDefaultMetaRegion
boolean isInRecoverMetaState
protected boolean isMatchingRegionLocation - Moved out here so that it can be overridden by the HBCK fix-up SCP to be less strict about what it will tolerate as a 'match'.
private boolean isSplittingDone(MasterProcedureEnv env, boolean splitMeta)
protected void releaseLock - The user should override this method, and release lock if necessary.
protected void rollbackState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) - Called to perform the rollback of the specified state.
protected void serializeStateData(ProcedureStateSerializer serializer) - The user-level code of the procedure may have some state to persist (e.g. input arguments or current position in the processing state) to be able to resume on failure.
protected boolean shouldWaitClientAck - By default, the executor will keep the procedure result around until the eviction TTL is expired.
void toStringClassDetails - Extend the toString() information with the procedure details, e.g. className and parameters.
(package private) void updateProgress(boolean updateState)
static void updateProgress(MasterProcedureEnv env, long parentId)
private void zkCoordinatedSplitLogs - Split logs using 'classic' zk-based coordination.
private void zkCoordinatedSplitMetaLogs - Split hbase:meta logs using 'classic' zk-based coordination.
Methods inherited from class org.apache.hadoop.hbase.procedure2.StateMachineProcedure
addChildProcedure, execute, failIfAborted, getCurrentState, getCurrentStateId, getCycles, isEofState, isRollbackSupported, isRollbackSupported, isYieldAfterExecutionStep, isYieldBeforeExecuteFromState, rollback, setNextState, toStringState
Methods inherited from class org.apache.hadoop.hbase.procedure2.Procedure
addStackIndex, afterReplay, beforeReplay, bypass, compareTo, completionCleanup, doExecute, doRollback, elapsedTime, getChildrenLatch, getException, getLastUpdate, getNonceKey, getOwner, getParentProcId, getProcId, getProcIdHashCode, getResult, getRootProcedureId, getRootProcId, getStackIndexes, getState, getSubmittedTime, getTimeout, getTimeoutTimestamp, hasChildren, hasException, hasLock, hasOwner, hasParent, hasTimeout, haveSameParent, incChildrenLatch, isBypass, isFailed, isFinished, isInitializing, isLockedWhenLoading, isRunnable, isSuccess, isWaiting, removeStackIndex, setAbortFailure, setChildrenLatch, setExecuted, setFailure, setFailure, setLastUpdate, setNonceKey, setOwner, setOwner, setParentProcId, setProcId, setResult, setRootProcId, setStackIndexes, setState, setSubmittedTime, setTimeout, setTimeoutFailure, skipPersistence, suspend, toString, toStringClass, toStringDetails, toStringSimpleSB, updateMetricsOnFinish, updateMetricsOnSubmit, updateTimestamp, waitInitialized, wasExecuted
-
Field Details
-
LOG
-
MASTER_SCP_RETAIN_ASSIGNMENT
Configuration parameter to enable/disable the retain region assignment during ServerCrashProcedure.
By default, retain assignment is disabled, which makes failover faster and improves availability; this is useful for cloud scenarios where region block locality is not important. Enable it when RegionServers are deployed on the same hosts as the DataNodes; this improves read performance through local reads.
See HBASE-24900 for more details.
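As a rough illustration of how a boolean switch like this is typically resolved against its default, here is a minimal, self-contained sketch. It uses java.util.Properties as a stand-in for the HBase configuration object, and the key string "hbase.master.scp.retain.assignment" is an assumption: this page gives only the constant's name, not its value. The helper isRetainAssignment is hypothetical.

```java
import java.util.Properties;

public class RetainAssignmentSketch {
    // Assumed key string; this page documents only the constant's name.
    static final String MASTER_SCP_RETAIN_ASSIGNMENT = "hbase.master.scp.retain.assignment";
    // Retain assignment is disabled by default, per the description above.
    static final boolean DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT = false;

    /** Hypothetical helper: returns the configured flag, or the default when unset. */
    static boolean isRetainAssignment(Properties conf) {
        String raw = conf.getProperty(MASTER_SCP_RETAIN_ASSIGNMENT);
        return raw == null
            ? DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT
            : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println("unset   -> " + isRetainAssignment(conf));
        // Co-located RegionServers and DataNodes: opt in to retain assignment.
        conf.setProperty(MASTER_SCP_RETAIN_ASSIGNMENT, "true");
        System.out.println("enabled -> " + isRetainAssignment(conf));
    }
}
```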
-
DEFAULT_MASTER_SCP_RETAIN_ASSIGNMENT
Default value of MASTER_SCP_RETAIN_ASSIGNMENT.
-
serverName
Name of the crashed server to process.
-
notifiedDeadServer
Whether DeadServer knows that we are processing it.
-
regionsOnCrashedServer
Regions that were on the crashed server.
-
carryingMeta
-
shouldSplitWal
-
status
-
currentRunningState
private org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState currentRunningState
-
-
Constructor Details
-
ServerCrashProcedure
public ServerCrashProcedure(MasterProcedureEnv env, ServerName serverName, boolean shouldSplitWal, boolean carryingMeta)
Call this constructor queuing up a Procedure.
- Parameters:
serverName - Name of the crashed server.
shouldSplitWal - True if we should split WALs as part of crashed server processing.
carryingMeta - True if carrying hbase:meta table region.
-
ServerCrashProcedure
public ServerCrashProcedure()
Used when deserializing from a procedure store; we'll construct one of these then call #deserializeStateData(InputStream). Do not use directly.
-
-
Method Details
-
isInRecoverMetaState
-
executeFromState
protected StateMachineProcedure.Flow executeFromState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) throws ProcedureSuspendedException, ProcedureYieldException
Description copied from class: StateMachineProcedure
Called to perform a single step of the specified 'state' of the procedure.
- Specified by:
executeFromState in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - state to execute
- Returns:
- Flow.NO_MORE_STATE if the procedure is completed, Flow.HAS_MORE_STATE if there is another step.
- Throws:
ProcedureSuspendedException
ProcedureYieldException
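The step-at-a-time contract can be pictured with a toy state machine. The sketch below is self-contained and does not use HBase classes: the Flow enum mirrors the two return values named above, while DemoState, its four values, and the transitions between them are hypothetical and far simpler than the real ServerCrashState flow. It only illustrates how the path can vary with a flag such as shouldSplitWal.

```java
import java.util.ArrayList;
import java.util.List;

public class FlowSketch {
    enum Flow { HAS_MORE_STATE, NO_MORE_STATE }

    // Hypothetical states; not the generated ServerCrashState values.
    enum DemoState { START, SPLIT_LOGS, ASSIGN, FINISH }

    private final boolean shouldSplitWal;
    private DemoState nextState = DemoState.START;
    private final List<DemoState> visited = new ArrayList<>();

    public FlowSketch(boolean shouldSplitWal) {
        this.shouldSplitWal = shouldSplitWal;
    }

    /** Performs one step for the given state and records where to go next. */
    Flow executeFromState(DemoState state) {
        visited.add(state);
        switch (state) {
            case START:
                // The path depends on whether logs need splitting.
                nextState = shouldSplitWal ? DemoState.SPLIT_LOGS : DemoState.ASSIGN;
                return Flow.HAS_MORE_STATE;
            case SPLIT_LOGS:
                nextState = DemoState.ASSIGN;
                return Flow.HAS_MORE_STATE;
            case ASSIGN:
                nextState = DemoState.FINISH;
                return Flow.HAS_MORE_STATE;
            case FINISH:
                return Flow.NO_MORE_STATE;
            default:
                throw new IllegalStateException("unknown state " + state);
        }
    }

    /** Stand-in for the executor loop: keep stepping until no states remain. */
    public List<DemoState> run() {
        while (executeFromState(nextState) == Flow.HAS_MORE_STATE) {
            // each iteration is one call to executeFromState
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(new FlowSketch(true).run());
        System.out.println(new FlowSketch(false).run());
    }
}
```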
-
getRegionsOnCrashedServer
Returns the list of regions on the crashed server.
-
cleanupSplitDir
-
isSplittingDone
-
createSplittingWalProcedures
private Procedure[] createSplittingWalProcedures(MasterProcedureEnv env, boolean splitMeta) throws IOException
- Throws:
IOException
-
filterDefaultMetaRegions
-
isDefaultMetaRegion
-
zkCoordinatedSplitMetaLogs
Split hbase:meta logs using 'classic' zk-based coordination. Superseded by procedure-based WAL splitting.
- Throws:
IOException
-
zkCoordinatedSplitLogs
Split logs using 'classic' zk-based coordination. Superseded by procedure-based WAL splitting.
- Throws:
IOException
-
updateProgress
-
rollbackState
protected void rollbackState(MasterProcedureEnv env, org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state) throws IOException
Description copied from class: StateMachineProcedure
Called to perform the rollback of the specified state.
- Specified by:
rollbackState in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - state to rollback
- Throws:
IOException - temporary failure, the rollback will retry later
-
getState
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getState(int stateId)
Description copied from class: StateMachineProcedure
Convert an ordinal (or state id) to an Enum (or more descriptive) state object.
- Specified by:
getState in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
stateId - the ordinal() of the state enum (or state id)
- Returns:
- the state enum object
-
getStateId
protected int getStateId(org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState state)
Description copied from class: StateMachineProcedure
Convert the Enum (or more descriptive) state object to an ordinal (or state id).
- Specified by:
getStateId in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
state - the state enum object
- Returns:
- the ordinal() of the state enum (or state id)
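getState and getStateId are inverses of each other: one maps an integer id to the enum value, the other maps back. A minimal sketch of that round trip follows, using a hypothetical DemoState enum with explicit ids in place of the protobuf-generated ServerCrashState. The names and numbers are illustrative assumptions only.

```java
public class StateIdSketch {
    /** Hypothetical states with explicit ids; not the real ServerCrashState values. */
    enum DemoState {
        START(1), SPLIT_LOGS(2), ASSIGN(3), FINISH(4);

        private final int id;

        DemoState(int id) {
            this.id = id;
        }

        int getId() {
            return id;
        }

        /** Looks up the state for an id, failing loudly on unknown ids. */
        static DemoState forId(int id) {
            for (DemoState s : values()) {
                if (s.id == id) {
                    return s;
                }
            }
            throw new IllegalArgumentException("unknown state id " + id);
        }
    }

    /** id -> enum, the direction getState(int) describes. */
    static DemoState getState(int stateId) {
        return DemoState.forId(stateId);
    }

    /** enum -> id, the direction getStateId(state) describes. */
    static int getStateId(DemoState state) {
        return state.getId();
    }

    public static void main(String[] args) {
        DemoState state = getState(3);
        System.out.println(state + " <-> " + getStateId(state));
    }
}
```

Persisting the integer id rather than the enum name keeps the stored form compact, at the cost of needing the ids to stay stable across versions.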
-
getInitialState
protected org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState getInitialState()
Description copied from class: StateMachineProcedure
Return the initial state object that will be used for the first call to executeFromState().
- Specified by:
getInitialState in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Returns:
- the initial state enum object
-
abort
Description copied from class: Procedure
The abort() call is asynchronous and each procedure must decide how to deal with it, if they want to be abortable. The simplest implementation is to have an AtomicBoolean set in the abort() method and then the execute() will check if the abort flag is set or not. abort() may be called multiple times from the client, so the implementation must be idempotent.
NOTE: abort() is not like Thread.interrupt(). It is just a notification that allows the procedure implementor to abort.
- Overrides:
abort in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
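The "simplest implementation" mentioned in the inherited description can be sketched as follows. This is a self-contained illustration of the AtomicBoolean pattern only; AbortFlagSketch and its methods are hypothetical and say nothing about how ServerCrashProcedure itself responds to abort().

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for a procedure; not the HBase Procedure class. */
public class AbortFlagSketch {
    private final AtomicBoolean aborted = new AtomicBoolean(false);
    private int stepsRun = 0;

    /** Only records the request. Idempotent: repeated calls leave the flag set. */
    public boolean abort() {
        aborted.set(true);
        return true;
    }

    /** Runs one step; returns false once work is done or an abort was requested. */
    public boolean executeStep(int totalSteps) {
        if (aborted.get() || stepsRun >= totalSteps) {
            return false;
        }
        stepsRun++;
        return true;
    }

    public int getStepsRun() {
        return stepsRun;
    }

    public static void main(String[] args) {
        AbortFlagSketch proc = new AbortFlagSketch();
        proc.executeStep(5);
        proc.executeStep(5);
        proc.abort();
        proc.abort(); // a second call is harmless
        boolean ranAgain = proc.executeStep(5);
        System.out.println("steps=" + proc.getStepsRun() + " ranAgain=" + ranAgain);
    }
}
```

Unlike Thread.interrupt(), nothing here stops a step that is already running; the flag is only consulted at the next step boundary.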
-
acquireLock
Description copied from class: Procedure
The user should override this method if they need a lock on an Entity. A lock can be anything, and it is up to the implementor. The Procedure Framework will call this method just before it invokes Procedure.execute(Object). It calls Procedure.releaseLock(Object) after the call to execute. If you need to hold the lock for the life of the Procedure -- i.e. you do not want any other Procedure interfering while this Procedure is running, see Procedure.holdLock(Object). Example: in our Master we can execute requests in parallel for different tables. We can create t1 and create t2 and these creates can be executed at the same time. Anything else on t1/t2 is queued, waiting for that specific table create to happen. There are 3 LockStates:
- LOCK_ACQUIRED should be returned when the proc has the lock and the proc is ready to execute.
- LOCK_YIELD_WAIT should be returned when the proc does not have the lock and the framework should take care of re-adding the procedure to the runnable set for retry.
- LOCK_EVENT_WAIT should be returned when the proc does not have the lock and someone will take care of re-adding the procedure to the runnable set when the lock is available.
- Overrides:
acquireLock in class Procedure<MasterProcedureEnv>
- Returns:
- the lock state as described above.
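The t1/t2 example above can be sketched with a toy per-table lock. The LockState enum mirrors the three values listed; everything else (the class, the set of locked table names, and the choice to return LOCK_EVENT_WAIT on contention) is a hypothetical simplification, not the Master's actual scheduler.

```java
import java.util.HashSet;
import java.util.Set;

public class LockStateSketch {
    public enum LockState { LOCK_ACQUIRED, LOCK_YIELD_WAIT, LOCK_EVENT_WAIT }

    // Hypothetical lock table: names of the tables currently locked.
    private final Set<String> lockedTables = new HashSet<>();

    /** Tries to lock a table; contended requests wait for a release event. */
    public LockState acquireLock(String table) {
        // Set.add returns false when the table is already locked.
        return lockedTables.add(table)
            ? LockState.LOCK_ACQUIRED
            : LockState.LOCK_EVENT_WAIT;
    }

    public void releaseLock(String table) {
        lockedTables.remove(table);
    }

    public static void main(String[] args) {
        LockStateSketch locks = new LockStateSketch();
        System.out.println("create t1: " + locks.acquireLock("t1"));
        System.out.println("create t2: " + locks.acquireLock("t2")); // runs in parallel
        System.out.println("alter t1:  " + locks.acquireLock("t1")); // queued behind create
        locks.releaseLock("t1");
        System.out.println("alter t1:  " + locks.acquireLock("t1"));
    }
}
```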
-
releaseLock
Description copied from class: Procedure
The user should override this method, and release lock if necessary.
- Overrides:
releaseLock in class Procedure<MasterProcedureEnv>
-
toStringClassDetails
Description copied from class: Procedure
Extend the toString() information with the procedure details, e.g. className and parameters.
- Overrides:
toStringClassDetails in class Procedure<MasterProcedureEnv>
- Parameters:
sb - the string builder to use to append the proc specific information
-
getProcName
- Overrides:
getProcName in class Procedure<MasterProcedureEnv>
-
serializeStateData
Description copied from class: Procedure
The user-level code of the procedure may have some state to persist (e.g. input arguments or current position in the processing state) to be able to resume on failure.
- Overrides:
serializeStateData in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
serializer - stores the serializable state
- Throws:
IOException
-
deserializeStateData
Description copied from class: Procedure
Called on store load to allow the user to decode the previously serialized state.
- Overrides:
deserializeStateData in class StateMachineProcedure<MasterProcedureEnv,org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProcedureProtos.ServerCrashState>
- Parameters:
serializer - contains the serialized state
- Throws:
IOException
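The serialize/deserialize pair exists so that a procedure can be rebuilt from the store and resume where it left off. The sketch below shows the round trip in miniature. It swaps in plain java.io data streams for the ProcedureStateSerializer named above, so the wire format, the class, and the choice of persisted fields (taken from the field list on this page) are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StateRoundTripSketch {
    final String serverName;
    final boolean carryingMeta;
    final boolean shouldSplitWal;

    public StateRoundTripSketch(String serverName, boolean carryingMeta, boolean shouldSplitWal) {
        this.serverName = serverName;
        this.carryingMeta = carryingMeta;
        this.shouldSplitWal = shouldSplitWal;
    }

    /** Writes the state needed to resume after a restart. */
    public byte[] serialize() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(serverName);
            out.writeBoolean(carryingMeta);
            out.writeBoolean(shouldSplitWal);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an instance, reading fields in the order they were written. */
    public static StateRoundTripSketch deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new StateRoundTripSketch(in.readUTF(), in.readBoolean(), in.readBoolean());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative server name only.
        StateRoundTripSketch before = new StateRoundTripSketch("rs1.example.org,16020,1", true, false);
        StateRoundTripSketch after = deserialize(before.serialize());
        System.out.println(after.serverName + " meta=" + after.carryingMeta
            + " splitWal=" + after.shouldSplitWal);
    }
}
```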
-
getServerName
Description copied from interface: ServerProcedureInterface
Returns the name of this server instance.
- Specified by:
getServerName in interface ServerProcedureInterface
-
hasMetaTableRegion
Description copied from interface: ServerProcedureInterface
Returns true if this server has an hbase:meta table region.
- Specified by:
hasMetaTableRegion in interface ServerProcedureInterface
-
getServerOperationType
Description copied from interface: ServerProcedureInterface
Given an operation type we can take decisions about what to do with pending operations. For example, if we get a crash handler and we have some assignment operation pending, we can abort those operations.
- Specified by:
getServerOperationType in interface ServerProcedureInterface
- Returns:
- the operation type that the procedure is executing.
-
shouldWaitClientAck
Description copied from class: Procedure
By default, the executor will keep the procedure result around until the eviction TTL is expired. The client can cut down the waiting time by requesting that the result is removed from the executor. In the case of a system-started procedure, we can force the executor to auto-ack.
- Overrides:
shouldWaitClientAck in class Procedure<MasterProcedureEnv>
- Parameters:
env - the environment passed to the ProcedureExecutor
- Returns:
- true if the executor should wait for the client ack for the result. Defaults to return true.
-
isMatchingRegionLocation
Moved out here so that it can be overridden by the HBCK fix-up SCP to be less strict about what it will tolerate as a 'match'.
- Returns:
- True if the region location in rsn matches that of this crashed server.
-
assignRegions
Assign the regions on the crashed RS to other RSes. In this method we go through all the RegionStateNodes of the given regions to find out whether there is already a TRSP for the region; if so, we interrupt it and let it retry on another server; otherwise we schedule a TRSP to bring the region online. We also check whether the table for a region is enabled; if not, we skip assigning it.
- Throws:
IOException
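The per-region decision described above can be summarized in a small, self-contained sketch. It models each region with a hypothetical Region class and returns a list of action strings instead of scheduling real procedures. The order of the checks (existing TRSP first, then table state) is an assumption read off the prose, not confirmed against the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class AssignRegionsSketch {
    /** Hypothetical stand-in for a RegionStateNode. */
    public static final class Region {
        final String name;
        final String table;
        final boolean hasPendingTrsp;

        public Region(String name, String table, boolean hasPendingTrsp) {
            this.name = name;
            this.table = table;
            this.hasPendingTrsp = hasPendingTrsp;
        }
    }

    /** Decides, per region, whether to interrupt, skip, or schedule. */
    public static List<String> assignRegions(List<Region> regions, Set<String> disabledTables) {
        List<String> actions = new ArrayList<>();
        for (Region r : regions) {
            if (r.hasPendingTrsp) {
                // A TRSP already exists: interrupt it so it retries on another server.
                actions.add("interrupt " + r.name);
            } else if (disabledTables.contains(r.table)) {
                // Table is not enabled: do not assign.
                actions.add("skip " + r.name);
            } else {
                // No procedure yet: schedule one to bring the region online.
                actions.add("schedule " + r.name);
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("r1", "t1", true),
            new Region("r2", "t2", false),
            new Region("r3", "t3", false));
        System.out.println(assignRegions(regions, Collections.singleton("t2")));
    }
}
```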
-
getProcedureMetrics
Description copied from class: Procedure
Override this method to provide procedure specific counters for submitted count, failed count and time histogram.
- Overrides:
getProcedureMetrics in class Procedure<MasterProcedureEnv>
- Parameters:
env - The environment passed to the procedure executor
- Returns:
- Container object for procedure related metric
-
holdLock
Description copied from class: Procedure
Used to keep the procedure lock even when the procedure is yielding or suspended.
- Overrides:
holdLock in class Procedure<MasterProcedureEnv>
- Returns:
- true if the procedure should hold on the lock until completionCleanup()
-
updateProgress
-