Class ServerManager
- All Implemented Interfaces:
ConfigurationObserver
Maintains lists of online and dead servers. Processes the startups, shutdowns, and deaths of region servers.
Servers are distinguished in two different ways. A given server has a location, specified by hostname and port, and of which there can only be one online at any given time. A server instance is specified by the location (hostname and port) as well as the startcode (timestamp from when the server was started). This is used to differentiate a restarted instance of a given server from the original instance.
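The difference between a location and an instance can be modeled with a small value type. This is a minimal sketch, not HBase's ServerName class: `Instance`, `sameLocation`, and `isRestartOf` are hypothetical names, and the hostname/port/startcode triple is represented as a plain Java record.

```java
/**
 * Hypothetical, simplified model of region server identity (not HBase's ServerName).
 * A location is hostname + port; an instance additionally carries the startcode.
 */
final class ServerIdentitySketch {
  // Records (Java 16+) give value-based equals/hashCode over all three components.
  record Instance(String hostname, int port, long startcode) {
    /** The location part of the identity: hostname and port only. */
    String location() {
      return hostname + ":" + port;
    }
  }

  /** True when both instances occupy the same location (same hostname and port). */
  static boolean sameLocation(Instance a, Instance b) {
    return a.hostname().equals(b.hostname()) && a.port() == b.port();
  }

  /** True when b is a different instance (a restart) at a's location. */
  static boolean isRestartOf(Instance a, Instance b) {
    return sameLocation(a, b) && a.startcode() != b.startcode();
  }
}
```

Because the record's equality covers all three components, a restarted server at the same location never compares equal to the instance it replaced, even though `sameLocation` is true for both.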
If a server is known not to be running any more, it is called dead. The dead server needs to be handled by a ServerShutdownHandler. If the handler is not enabled yet, the server can't be handled right away, so it is queued up. After the handler is enabled, the server is submitted to the handler. However, the handler may be only partially enabled. If so, the server cannot be fully processed and is queued up for further processing. A server is fully processed only after the handler is fully enabled and has completed the handling.
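The deferred handling described above behaves like a small state machine. The sketch below is a hypothetical model, not the ServerShutdownHandler API: the three `HandlerState` values and the queue and processed collections are assumptions chosen to mirror the prose (not enabled, partially enabled, fully enabled).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of deferred dead-server handling; not an HBase API. */
final class DeadServerQueueSketch {
  enum HandlerState { NOT_ENABLED, PARTIALLY_ENABLED, FULLY_ENABLED }

  private HandlerState state = HandlerState.NOT_ENABLED;
  private final Deque<String> queued = new ArrayDeque<>();
  private final Set<String> partiallyProcessed = new LinkedHashSet<>();
  private final List<String> fullyProcessed = new ArrayList<>();

  /** A server is known to be dead: queue it, then process whatever the state allows. */
  synchronized void serverDied(String server) {
    queued.addLast(server);
    drain();
  }

  synchronized void setHandlerState(HandlerState newState) {
    state = newState;
    drain();
  }

  private void drain() {
    switch (state) {
      case NOT_ENABLED:
        return; // cannot be handled yet: everything stays queued
      case PARTIALLY_ENABLED:
        // Submitted but only partially processed: stays queued for further processing.
        partiallyProcessed.addAll(queued);
        return;
      case FULLY_ENABLED:
        while (!queued.isEmpty()) {
          String server = queued.pollFirst();
          partiallyProcessed.remove(server);
          fullyProcessed.add(server); // handling completed
        }
        return;
      default:
        throw new IllegalStateException("unexpected state " + state);
    }
  }

  synchronized List<String> queued() { return List.copyOf(queued); }

  synchronized Set<String> partiallyProcessed() { return Set.copyOf(partiallyProcessed); }

  synchronized List<String> fullyProcessed() { return List.copyOf(fullyProcessed); }
}
```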
-
Nested Class Summary
Modifier and Type | Class | Description
private class
static enum
-
Field Summary
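Modifier and Type | Field | Description
private AtomicBoolean | clusterShutdown |
private final DeadServer | deadservers |
private final ArrayList<ServerName> | drainingServers | List of region servers that should not get any more new regions.
static final String | FLUSHEDSEQUENCEID_FLUSHER_INTERVAL |
static final int | FLUSHEDSEQUENCEID_FLUSHER_INTERVAL_DEFAULT |
private final ConcurrentNavigableMap<byte[], Long> | flushedSequenceIdByRegion | The last flushed sequence id for a region.
private boolean | isFlushSeqIdPersistInProgress |
private static final String | LAST_FLUSHED_SEQ_ID_FILE | File on HDFS that stores the last flushed sequence id of each region.
private List<ServerListener> | listeners | Listeners that are called on server events.
private static final org.slf4j.Logger | LOG |
private final MasterServices | master |
static final String | MAX_CLOCK_SKEW_MS |
private final long | maxSkew |
private final ConcurrentNavigableMap<ServerName, ServerMetrics> | onlineServers | Map of registered servers to their current load.
static final String | PERSIST_FLUSHEDSEQUENCEID | See HBASE-20727. If set to true, flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion are persisted to HDFS and loaded when the master restarts, to speed up log splitting.
static final boolean | PERSIST_FLUSHEDSEQUENCEID_DEFAULT |
private boolean | persistFlushedSequenceId |
private boolean | rejectDecommissionedHostsConfig | Configured value of HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY.
private final RegionServerList | storage |
private final ConcurrentNavigableMap<byte[], ConcurrentNavigableMap<byte[], Long>> | storeFlushedSequenceIdsByRegion | The last flushed sequence id for a store in a region.
static final String | WAIT_ON_REGIONSERVERS_INTERVAL |
static final String | WAIT_ON_REGIONSERVERS_MAXTOSTART |
static final String | WAIT_ON_REGIONSERVERS_MINTOSTART |
static final String | WAIT_ON_REGIONSERVERS_TIMEOUT |
private final long | warningSkew |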
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
boolean | addServerToDrainList | Add the server to the drain list.
boolean | areDeadServersInProgress | Checks if any dead servers are currently in progress.
(package private) boolean | checkAndRecordNewServer(ServerName serverName, ServerMetrics sl) | Checks whether a server with the same host and port already exists; if not, or if the existing one has a smaller start code, records it.
private void | checkClockSkew(ServerName serverName, long serverCurrentTime) | Checks the clock skew between the server and the master.
private void | checkIsDead(ServerName serverName, String what) | Called when a RegionServer first reports in for duty and thereafter each time it heartbeats, to make sure it has not been declared dead.
private void | checkRejectableDecommissionedStatus | Checks if the Master is configured to reject decommissioned hosts or not.
(package private) void | clearDeadServersWithSameHostNameAndPortOfOnlineServer | Clears any dead server that has the same host name and port as an online server.
static void | closeRegionSilentlyAndWait(AsyncClusterConnection connection, ServerName server, RegionInfo region, long timeout) | Contacts a region server and waits up to timeout ms to close the region.
int | countOfRegionServers | Returns the count of active regionservers.
 | createDestinationServersList() | Calls createDestinationServersList(java.util.List<org.apache.hadoop.hbase.ServerName>) without a server to exclude.
 | createDestinationServersList(List<ServerName> serversToExclude) | Creates a list of possible destinations for a region.
long | expireServer(ServerName serverName) | Expire the passed server.
(package private) long | expireServer(ServerName serverName, boolean force) |
(package private) void | findDeadServersAndProcess(Set<ServerName> deadServersFromPE, Set<ServerName> liveServersFromWALDir) | Find out the region servers that crashed between the crash of the previous master instance and the current master instance, and schedule SCP for them.
 | findServerWithSameHostnamePortWithLock(ServerName serverName) | Assumes onlineServers is locked.
double | getAverageLoad | Compute the average load across all region servers.
 | getDeadServers |
 | getDrainingServersList | Returns a copy of the internal list of draining servers.
ConcurrentNavigableMap<byte[], Long> | getFlushedSequenceIdByRegion |
int | getInfoPort(ServerName serverName) |
org.apache.hadoop.hbase.shaded.protobuf.generated.ClusterStatusProtos.RegionStoreSequenceIds | getLastFlushedSequenceId(byte[] encodedRegionName) |
 | getLoad(ServerName serverName) | Returns ServerMetrics if serverName is known, else null.
private int | getMinToStart | Calculate min necessary to start.
 | getOnlineServers | Returns a read-only map of servers to serverinfo.
 | getOnlineServersList | Returns a copy of the internal list of online servers.
 | getOnlineServersListWithPredicator(List<ServerName> keys, Predicate<ServerMetrics> idleServerPredicator) |
boolean | getRejectDecommissionedHostsConfig(org.apache.hadoop.conf.Configuration conf) | Reads the value of HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY from the config and returns it.
private String | getStrForMax(int max) |
 | getVersion(ServerName serverName) | May return "0.0.0" when the server is not online.
int | getVersionNumber(ServerName serverName) | May return 0 when the server is not online.
boolean | isClusterShutdown |
boolean | isRegionInServerManagerStates |
boolean | isServerDead(ServerName serverName) | Check if a server is known to be dead.
 | isServerKnownAndOnline(ServerName serverName) | Returns whether the server is online, dead, or unknown.
boolean | isServerOnline(ServerName serverName) |
boolean | isServerUnknown(ServerName serverName) | Check if a server is unknown.
(package private) void | letRegionServersShutdown |
void | loadLastFlushedSequenceIds | Load the last flushed sequence id of each region from HDFS, if persisted.
void | moveFromOnlineToDeadServers | Called when server has expired.
void | onConfigurationChange(org.apache.hadoop.conf.Configuration conf) | Implementation of the ConfigurationObserver interface.
private void | persistRegionLastFlushedSequenceIds | Persist the last flushed sequence id of each region to HDFS.
(package private) void | recordNewServerWithLock(ServerName serverName, ServerMetrics sl) | Adds the server to the onlineServers list.
void | regionServerReport |
(package private) ServerName | regionServerStartup(org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos.RegionServerStartupRequest request, int versionNumber, String version, InetAddress ia) | Let the server manager know a new regionserver has come online.
void | registerListener(ServerListener listener) | Add the listener to the notification list.
void | removeDeletedRegionFromLoadedFlushedSequenceIds | Regions may have been removed between the latest persist of FlushedSequenceIds and the master abort.
void | removeRegion(RegionInfo regionInfo) | Called by delete table and similar to notify the ServerManager that a region was removed.
void | removeRegions(List<RegionInfo> regions) | Called by delete table and similar to notify the ServerManager that a region was removed.
boolean | removeServerFromDrainList |
void | shutdownCluster |
void | startChore | Start the chore in ServerManager.
void | stop() | Stop the ServerManager.
boolean | unregisterListener(ServerListener listener) | Remove the listener from the notification list.
private void | updateLastFlushedSequenceIds | Updates the last flushed sequence ids for the regions on server sn.
void | waitForRegionServers(MonitoredTask status) | Wait for the region servers to report in.
-
Field Details
-
WAIT_ON_REGIONSERVERS_MAXTOSTART
-
WAIT_ON_REGIONSERVERS_MINTOSTART
-
WAIT_ON_REGIONSERVERS_TIMEOUT
-
WAIT_ON_REGIONSERVERS_INTERVAL
-
PERSIST_FLUSHEDSEQUENCEID
See HBASE-20727. If set to true, flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion are persisted to HDFS and loaded when the master restarts, to speed up log splitting.
-
PERSIST_FLUSHEDSEQUENCEID_DEFAULT
-
FLUSHEDSEQUENCEID_FLUSHER_INTERVAL
-
FLUSHEDSEQUENCEID_FLUSHER_INTERVAL_DEFAULT
-
MAX_CLOCK_SKEW_MS
-
LOG
-
clusterShutdown
-
flushedSequenceIdByRegion
The last flushed sequence id for a region. -
persistFlushedSequenceId
-
isFlushSeqIdPersistInProgress
-
LAST_FLUSHED_SEQ_ID_FILE
File on HDFS that stores the last flushed sequence id of each region.
-
flushedSeqIdFlusher
-
storeFlushedSequenceIdsByRegion
private final ConcurrentNavigableMap<byte[], ConcurrentNavigableMap<byte[], Long>> storeFlushedSequenceIdsByRegion
The last flushed sequence id for a store in a region. -
onlineServers
Map of registered servers to their current load -
drainingServers
List of region servers that should not get any more new regions. -
master
-
storage
-
deadservers
-
maxSkew
-
warningSkew
-
listeners
Listeners that are called on server events. -
rejectDecommissionedHostsConfig
Configured value of HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY
-
-
Constructor Details
-
ServerManager
Constructor.
-
-
Method Details
-
onConfigurationChange
Implementation of the ConfigurationObserver interface. We are interested in live-loading the configuration value of HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY.
- Specified by:
onConfigurationChange in interface ConfigurationObserver
- Parameters:
conf
- Server configuration instance
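Live-loading a single setting on a configuration change usually amounts to re-reading one key into a field that other threads can see. In this sketch a `Map<String, String>` stands in for Hadoop's Configuration, and `REJECT_KEY` is a hypothetical placeholder for HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY (its real string value is not given here); the default of false is also an assumption.

```java
import java.util.Map;

/** Hypothetical sketch of live-reloading one boolean setting; not HBase's ConfigurationObserver. */
final class LiveConfigSketch {
  // Placeholder key standing in for HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY.
  static final String REJECT_KEY = "example.reject.decommissioned.hosts";
  static final boolean REJECT_DEFAULT = false; // assumed default for this sketch

  // volatile so threads handling RegionServer requests see the new value without locking
  private volatile boolean rejectDecommissionedHosts;

  LiveConfigSketch(Map<String, String> conf) {
    rejectDecommissionedHosts = read(conf);
  }

  /** Reads the key, falling back to the default when it is absent. */
  static boolean read(Map<String, String> conf) {
    String value = conf.get(REJECT_KEY);
    return value == null ? REJECT_DEFAULT : Boolean.parseBoolean(value.trim());
  }

  /** Called on configuration reload; only the key of interest is re-read. */
  void onConfigurationChange(Map<String, String> newConf) {
    rejectDecommissionedHosts = read(newConf);
  }

  boolean rejectDecommissionedHosts() {
    return rejectDecommissionedHosts;
  }
}
```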
-
getRejectDecommissionedHostsConfig
Reads the value of HConstants.REJECT_DECOMMISSIONED_HOSTS_KEY from the config and returns it.
- Parameters:
conf
- Configuration instance of the Master
-
registerListener
Add the listener to the notification list.
- Parameters:
listener
- The ServerListener to register
-
unregisterListener
Remove the listener from the notification list.
- Parameters:
listener
- The ServerListener to unregister
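A listener list that is read far more often than it is modified is commonly kept in a copy-on-write list. The `Listener` interface below is hypothetical (it is not HBase's ServerListener); it only illustrates the register, unregister, and notify pattern, with `unregisterListener` returning a boolean as in the method summary.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical listener registry mirroring registerListener/unregisterListener. */
final class ListenerRegistrySketch {
  /** Hypothetical callback interface for this sketch. */
  interface Listener {
    void serverAdded(String server);

    void serverRemoved(String server);
  }

  // Copy-on-write: notification loops iterate a snapshot, so concurrent
  // register/unregister calls never cause ConcurrentModificationException.
  private final List<Listener> listeners = new CopyOnWriteArrayList<>();

  void registerListener(Listener listener) {
    listeners.add(listener);
  }

  /** Returns true if the listener was registered and has now been removed. */
  boolean unregisterListener(Listener listener) {
    return listeners.remove(listener);
  }

  void fireServerAdded(String server) {
    for (Listener l : listeners) {
      l.serverAdded(server);
    }
  }

  void fireServerRemoved(String server) {
    for (Listener l : listeners) {
      l.serverRemoved(server);
    }
  }
}
```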
-
regionServerStartup
ServerName regionServerStartup(org.apache.hadoop.hbase.shaded.protobuf.generated.RegionServerStatusProtos.RegionServerStartupRequest request, int versionNumber, String version, InetAddress ia) throws IOException
Let the server manager know a new regionserver has come online.
- Parameters:
request
- the startup request
versionNumber
- the version number of the new regionserver
version
- the version of the new regionserver, could contain strings like "SNAPSHOT"
ia
- the InetAddress from which the request is received
- Returns:
- The ServerName we know this server as.
- Throws:
IOException
-
updateLastFlushedSequenceIds
Updates the last flushed sequence ids for the regions on server sn. -
regionServerReport
- Throws:
YouAreDeadException
-
checkRejectableDecommissionedStatus
private void checkRejectableDecommissionedStatus(ServerName sn) throws DecommissionedHostRejectedException
Checks if the Master is configured to reject decommissioned hosts or not. When it is configured to do so, any RegionServer trying to join the cluster has its host checked against the list of hosts of currently decommissioned servers and may be prevented from reporting for duty; otherwise, we do nothing and let it pass to the next check. See HBASE-28342 for details.
- Parameters:
sn
- The ServerName to check for
- Throws:
DecommissionedHostRejectedException
- if the Master is configured to reject decommissioned hosts and this host exists in the list of the decommissioned servers
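The decommissioned-host check reduces to two conditions: the feature flag and a host-name match. In this sketch hosts are plain strings and `HostRejectedException` is a hypothetical stand-in for DecommissionedHostRejectedException; matching on hostname only (ignoring port and startcode) follows the description above, which says the joining server's host is checked.

```java
import java.util.Set;

/** Hypothetical sketch of rejecting decommissioned hosts at startup. */
final class DecommissionCheckSketch {
  /** Stand-in for DecommissionedHostRejectedException in this sketch. */
  static final class HostRejectedException extends Exception {
    HostRejectedException(String message) {
      super(message);
    }
  }

  /**
   * Rejects a joining server only when rejection is enabled AND its hostname matches a
   * currently decommissioned host; otherwise it returns so the next check can run.
   */
  static void checkRejectable(boolean rejectEnabled, String joiningHost,
      Set<String> decommissionedHosts) throws HostRejectedException {
    if (!rejectEnabled) {
      return; // feature off: let the server pass to the next check
    }
    if (decommissionedHosts.contains(joiningHost)) {
      throw new HostRejectedException("Host " + joiningHost + " is decommissioned");
    }
  }
}
```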
-
checkAndRecordNewServer
Checks whether a server with the same host and port already exists; if not, or if the existing one has a smaller start code, records the new server.
- Parameters:
serverName
- the server to check and record
sl
- the server load on the server
- Returns:
- true if the server is recorded, otherwise false
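The record-if-newer rule can be illustrated with a map keyed by location. `Server`, `onlineByLocation`, and `online` are hypothetical names; the sketch also assumes that an equal startcode counts as "not newer" and is therefore not recorded, a case the description above does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of "record if absent or newer", keyed by host:port. */
final class RecordNewServerSketch {
  /** Hypothetical server identity: location (host, port) plus startcode. */
  record Server(String host, int port, long startcode) {
    String location() {
      return host + ":" + port;
    }
  }

  // One entry per location, so at most one instance per host:port is tracked as online.
  private final Map<String, Server> onlineByLocation = new HashMap<>();

  /**
   * Records s if no server with the same location exists, or if the existing one has a
   * smaller startcode. Returns true if s was recorded.
   */
  synchronized boolean checkAndRecordNewServer(Server s) {
    Server existing = onlineByLocation.get(s.location());
    if (existing != null && existing.startcode() >= s.startcode()) {
      return false; // incoming instance is not newer (equal startcode treated as not newer)
    }
    onlineByLocation.put(s.location(), s); // replaces an older instance, if there was one
    return true;
  }

  synchronized Server online(String location) {
    return onlineByLocation.get(location);
  }
}
```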
-
findDeadServersAndProcess
void findDeadServersAndProcess(Set<ServerName> deadServersFromPE, Set<ServerName> liveServersFromWALDir)
Find out the region servers that crashed between the crash of the previous master instance and the current master instance, and schedule SCP for them. Since the RegionServerTracker has already helped us construct the online servers set by scanning ZooKeeper, we can now compare the online servers with liveServersFromWALDir to find out whether there are servers which are already dead. Must be called inside the initialization method of RegionServerTracker to avoid concurrency issues.
- Parameters:
deadServersFromPE
- the region servers which already have an SCP associated.
liveServersFromWALDir
- the live region servers from the WAL directory.
-
checkClockSkew
private void checkClockSkew(ServerName serverName, long serverCurrentTime) throws ClockOutOfSyncException
Checks the clock skew between the server and the master. If the clock skew exceeds the configured max, it throws an exception; if it exceeds the configured warning threshold, it logs a warning but the server starts normally.
- Parameters:
serverName
- Incoming server's name
- Throws:
ClockOutOfSyncException
- if the skew exceeds the configured max value
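The two-threshold skew check can be sketched as follows. The master's time is passed in explicitly so the check is deterministic, warnings are collected in a list instead of being logged, and `ClockOutOfSyncSketchException` is a hypothetical stand-in for ClockOutOfSyncException; the thresholds in the usage are illustrative, not HBase defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical clock-skew check with a hard limit and a warning threshold. */
final class ClockSkewSketch {
  /** Stand-in for ClockOutOfSyncException in this sketch. */
  static final class ClockOutOfSyncSketchException extends Exception {
    ClockOutOfSyncSketchException(String message) {
      super(message);
    }
  }

  private final long maxSkewMs;
  private final long warningSkewMs;
  final List<String> warnings = new ArrayList<>(); // collected instead of logged, for testability

  ClockSkewSketch(long maxSkewMs, long warningSkewMs) {
    this.maxSkewMs = maxSkewMs;
    this.warningSkewMs = warningSkewMs;
  }

  /** masterTimeMs is passed in (rather than read from the clock) so the check is deterministic. */
  void checkClockSkew(String server, long serverTimeMs, long masterTimeMs)
      throws ClockOutOfSyncSketchException {
    long skew = Math.abs(masterTimeMs - serverTimeMs);
    if (skew > maxSkewMs) {
      throw new ClockOutOfSyncSketchException(
          server + " clock skew " + skew + "ms exceeds max " + maxSkewMs + "ms");
    }
    if (skew > warningSkewMs) {
      warnings.add(server + " clock skew " + skew + "ms exceeds warning " + warningSkewMs + "ms");
    }
  }
}
```

With a 30000 ms limit and a 10000 ms warning threshold, a 5000 ms skew passes silently, a 15000 ms skew is admitted with a warning, and a 40000 ms skew is rejected.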
-
checkIsDead
Called when a RegionServer first reports in for duty and thereafter each time it heartbeats, to make sure it has not been declared dead. If this server is on the dead list, reject it with a YouAreDeadException. If it was dead but came back with a new start code, remove the old entry from the dead list.
- Parameters:
what
- START or REPORT
- Throws:
YouAreDeadException
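The dead-list logic can be sketched with a map from location to the startcode of the instance that died. The class, method, and exception names are hypothetical; the point is that a matching startcode means the same (dead) instance is still reporting, while a different startcode means a new instance has come back and the stale entry can be dropped.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the dead-list check on startup and heartbeat. */
final class CheckIsDeadSketch {
  /** Stand-in for YouAreDeadException in this sketch. */
  static final class YouAreDeadSketchException extends Exception {
    YouAreDeadSketchException(String message) {
      super(message);
    }
  }

  // Location ("host:port") -> startcode of the instance that was declared dead.
  private final Map<String, Long> deadByLocation = new HashMap<>();

  synchronized void markDead(String location, long startcode) {
    deadByLocation.put(location, startcode);
  }

  /** what is "START" or "REPORT", used only in the rejection message. */
  synchronized void checkIsDead(String location, long startcode, String what)
      throws YouAreDeadSketchException {
    Long deadStartcode = deadByLocation.get(location);
    if (deadStartcode == null) {
      return; // not on the dead list
    }
    if (deadStartcode.longValue() == startcode) {
      // The very instance we declared dead is still talking to us: reject it.
      throw new YouAreDeadSketchException(
          what + " rejected; " + location + "," + startcode + " is on the dead list");
    }
    // Same location, new startcode: a fresh instance, so drop the stale dead entry.
    deadByLocation.remove(location);
  }

  synchronized boolean isDead(String location) {
    return deadByLocation.containsKey(location);
  }
}
```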
-
findServerWithSameHostnamePortWithLock
Assumes onlineServers is locked.
- Returns:
- ServerName with matching hostname and port.
-
recordNewServerWithLock
Adds the server to the onlineServers list. onlineServers should be locked.
- Parameters:
serverName
- The remote server's name.
-
getFlushedSequenceIdByRegion
-
getLastFlushedSequenceId
public org.apache.hadoop.hbase.shaded.protobuf.generated.ClusterStatusProtos.RegionStoreSequenceIds getLastFlushedSequenceId(byte[] encodedRegionName) -
getLoad
Returns ServerMetrics if serverName is known else null -
getAverageLoad
Compute the average load across all region servers. Currently, this uses a very naive computation: it just uses the number of regions being served, ignoring stats about the number of requests.
- Returns:
- the average load
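The naive average described above is total regions divided by the number of servers. The map from server name to region count is a hypothetical input shape, and returning 0 for an empty cluster is an assumption of this sketch.

```java
import java.util.Map;

/** Hypothetical sketch of the naive average: regions served per region server. */
final class AverageLoadSketch {
  static double averageLoad(Map<String, Integer> regionCountByServer) {
    if (regionCountByServer.isEmpty()) {
      return 0.0; // assumption: an empty cluster has zero average load
    }
    long totalRegions = 0;
    for (int count : regionCountByServer.values()) {
      totalRegions += count; // only region counts are used; request stats are ignored
    }
    return (double) totalRegions / regionCountByServer.size();
  }
}
```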
-
countOfRegionServers
Returns the count of active regionservers -
getOnlineServers
Returns Read-only map of servers to serverinfo -
getDeadServers
-
areDeadServersInProgress
Checks if any dead servers are currently in progress.
- Returns:
- true if any RS are being processed as dead, false if not
- Throws:
IOException
-
letRegionServersShutdown
void letRegionServersShutdown() -
getRegionServersInZK
private List<String> getRegionServersInZK(ZKWatcher zkw) throws org.apache.zookeeper.KeeperException - Throws:
org.apache.zookeeper.KeeperException
-
expireServer
Expire the passed server. Add it to the list of dead servers and queue shutdown processing.
- Returns:
- pid if we queued a ServerCrashProcedure, else Procedure.NO_PROC_ID if we did not (this could happen for many reasons, including that it is this server that is going down, that we already have queued an SCP for this server, or that SCP processing is currently disabled because we are in the startup phase).
-
expireServer
-
moveFromOnlineToDeadServers
Called when server has expired. -
removeServerFromDrainList
-
addServerToDrainList
Add the server to the drain list.
- Returns:
- True if the server is added or the server is already on the drain list.
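The drain list's add semantics (true when added or already present) can be sketched with a plain list. The requirement that a server be online before it can be added is an assumption of this sketch, not something stated above, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a drain list: servers that should get no new regions. */
final class DrainListSketch {
  private final List<String> draining = new ArrayList<>();
  private final Set<String> online;

  DrainListSketch(Set<String> online) {
    this.online = online;
  }

  /** Returns true if the server was added, or was already on the drain list. */
  synchronized boolean addServerToDrainList(String server) {
    if (draining.contains(server)) {
      return true; // already draining counts as success
    }
    if (!online.contains(server)) {
      return false; // assumption of this sketch: only online servers can be drained
    }
    draining.add(server);
    return true;
  }

  synchronized boolean removeServerFromDrainList(String server) {
    return draining.remove(server);
  }

  /** Returns a copy, so callers cannot mutate the internal list. */
  synchronized List<String> getDrainingServersList() {
    return new ArrayList<>(draining);
  }
}
```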
-
closeRegionSilentlyAndWait
public static void closeRegionSilentlyAndWait(AsyncClusterConnection connection, ServerName server, RegionInfo region, long timeout) throws IOException, InterruptedException
Contacts a region server and waits up to timeout ms to close the region. This bypasses the active hmaster. Pass -1 as timeout if you do not want to wait on the result.
- Throws:
IOException
InterruptedException
-
getMinToStart
Calculate min necessary to start. This is not an absolute. It is just a friction that will cause us to hang around a bit longer waiting on RegionServers to check in. -
waitForRegionServers
Wait for the region servers to report in. We will wait until one of these conditions is met:
- the master is stopped
- the 'hbase.master.wait.on.regionservers.maxtostart' number of region servers is reached
- the 'hbase.master.wait.on.regionservers.mintostart' is reached AND no new region server has checked in for 'hbase.master.wait.on.regionservers.interval' time AND the 'hbase.master.wait.on.regionservers.timeout' is reached
- Throws:
InterruptedException
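The stop conditions listed above can be expressed as a single predicate. The parameter names are hypothetical and "reached" is interpreted as greater than or equal; a real wait loop would re-evaluate such a predicate periodically, which this sketch leaves out.

```java
/** Hypothetical predicate for ending the startup wait on region servers. */
final class WaitForRegionServersSketch {
  /**
   * True when waiting should end: the master is stopped; or maxToStart servers have
   * checked in; or minToStart is reached AND no new check-in for intervalMs AND the
   * total wait has reached timeoutMs.
   */
  static boolean shouldStopWaiting(boolean masterStopped, int checkedIn, int minToStart,
      int maxToStart, long elapsedMs, long timeoutMs, long msSinceLastCheckIn, long intervalMs) {
    if (masterStopped) {
      return true;
    }
    if (checkedIn >= maxToStart) {
      return true;
    }
    // All three must hold together for the "enough servers, quiet, and timed out" case.
    return checkedIn >= minToStart && msSinceLastCheckIn >= intervalMs && elapsedMs >= timeoutMs;
  }
}
```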
-
getStrForMax
-
getOnlineServersList
Returns A copy of the internal list of online servers. -
getOnlineServersListWithPredicator
public List<ServerName> getOnlineServersListWithPredicator(List<ServerName> keys, Predicate<ServerMetrics> idleServerPredicator)
- Parameters:
keys
- The target server name
idleServerPredicator
- Evaluates the server on the given load
- Returns:
- A copy of the internal list of online servers matched by the predicator
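Filtering a caller-supplied key list against the online map with a predicate might look like the following. `Metrics` is a hypothetical stand-in for ServerMetrics; keys that are not online are skipped, and returning an empty list when `keys` or the predicate is null is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of selecting online servers by a metrics predicate. */
final class PredicateFilterSketch {
  /** Hypothetical per-server metrics: only a region count. */
  record Metrics(int regionCount) {}

  /** Returns the keys that are online and whose metrics satisfy the predicate, in key order. */
  static List<String> onlineServersMatching(Map<String, Metrics> online, List<String> keys,
      Predicate<Metrics> predicate) {
    List<String> result = new ArrayList<>();
    if (keys == null || predicate == null) {
      return result; // assumption: nothing to match
    }
    for (String key : keys) {
      Metrics metrics = online.get(key);
      if (metrics != null && predicate.test(metrics)) { // skip servers that are not online
        result.add(key);
      }
    }
    return result;
  }
}
```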
-
getDrainingServersList
Returns A copy of the internal list of draining servers. -
isServerOnline
-
isServerKnownAndOnline
Returns whether the server is online, dead, or unknown. -
isServerDead
Check if a server is known to be dead. A server can be online, or known to be dead, or unknown to this manager (i.e., not online, not known to be dead either; it is simply not tracked by the master any more, for example, a very old previous instance). -
isServerUnknown
Check if a server is unknown. A server can be online, or known to be dead, or unknown to this manager (i.e., not online, not known to be dead either; it is simply not tracked by the master any more, for example, a very old previous instance). -
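The three-way classification (online, known dead, unknown) can be modeled with two sets. The `LiveState` enum and the string server names (host, port, and startcode joined with commas) are hypothetical; a server is unknown simply when it appears in neither set, for example an old instance the manager no longer tracks.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of online / dead / unknown server classification. */
final class ServerLiveStateSketch {
  enum LiveState { LIVE, DEAD, UNKNOWN }

  private final Set<String> online = new HashSet<>();
  private final Set<String> dead = new HashSet<>();

  synchronized void markOnline(String server) {
    dead.remove(server);
    online.add(server);
  }

  synchronized void markDead(String server) {
    online.remove(server);
    dead.add(server);
  }

  /** Stop tracking a server entirely; it becomes UNKNOWN. */
  synchronized void forget(String server) {
    online.remove(server);
    dead.remove(server);
  }

  synchronized LiveState state(String server) {
    if (online.contains(server)) {
      return LiveState.LIVE;
    }
    if (dead.contains(server)) {
      return LiveState.DEAD;
    }
    return LiveState.UNKNOWN; // neither online nor known dead
  }
}
```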
shutdownCluster
-
isClusterShutdown
-
startChore
Start the chore in ServerManager. -
stop
Stop the ServerManager. -
createDestinationServersList
Creates a list of possible destinations for a region. It contains the online servers, but not the draining or dying servers.
- Parameters:
serversToExclude
- can be null if there is no server to exclude
-
createDestinationServersList
Calls createDestinationServersList(java.util.List<org.apache.hadoop.hbase.ServerName>) without a server to exclude. -
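The destination list is the online servers minus the draining and dying ones, minus any explicit exclusions. This sketch uses strings for server names and passes the online, draining, and dying sets explicitly (hypothetical parameters); the shorter overload is modeled as a call with a null exclusion list, mirroring the two overloads described above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch of building the list of candidate destination servers. */
final class DestinationListSketch {
  /** Online servers minus draining, dying, and (if non-null) explicitly excluded servers. */
  static List<String> createDestinationServersList(List<String> online,
      Collection<String> draining, Collection<String> dying, List<String> serversToExclude) {
    List<String> result = new ArrayList<>(online); // copy so the input is not modified
    result.removeAll(draining);
    result.removeAll(dying);
    if (serversToExclude != null) {
      result.removeAll(serversToExclude);
    }
    return result;
  }

  /** Overload without a server to exclude. */
  static List<String> createDestinationServersList(List<String> online,
      Collection<String> draining, Collection<String> dying) {
    return createDestinationServersList(online, draining, dying, null);
  }
}
```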
clearDeadServersWithSameHostNameAndPortOfOnlineServer
Clears any dead server that has the same host name and port as an online server. -
removeRegion
Called by delete table and similar to notify the ServerManager that a region was removed. -
isRegionInServerManagerStates
-
removeRegions
Called by delete table and similar to notify the ServerManager that a region was removed. -
getVersionNumber
May return 0 when server is not online. -
getVersion
May return "0.0.0" when server is not online -
getInfoPort
-
persistRegionLastFlushedSequenceIds
Persist the last flushed sequence id of each region to HDFS.
- Throws:
IOException
- if persisting to HDFS fails
-
loadLastFlushedSequenceIds
Load the last flushed sequence id of each region from HDFS, if persisted.
- Throws:
IOException
-
removeDeletedRegionFromLoadedFlushedSequenceIds
Regions may have been removed between the latest persist of FlushedSequenceIds and the master abort. So, after loading FlushedSequenceIds from the file and after meta is loaded, we need to remove the deleted regions according to RegionStates.
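Pruning entries for deleted regions after loading the persisted sequence ids amounts to dropping map keys that no longer correspond to a known region. In this sketch the set of existing encoded region names is a hypothetical stand-in for RegionStates, keys are decoded as UTF-8 strings for the comparison, and an unsigned lexicographic comparator stands in for HBase's own byte[] comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of pruning flushed sequence ids of deleted regions. */
final class PruneFlushedSeqIdsSketch {
  // byte[] has identity equality, so the map needs an explicit content-based ordering.
  private static final Comparator<byte[]> BYTES_ORDER = Arrays::compareUnsigned;

  static ConcurrentNavigableMap<byte[], Long> newMap() {
    return new ConcurrentSkipListMap<>(BYTES_ORDER);
  }

  /** Removes entries whose region no longer exists; returns how many were removed. */
  static int removeDeleted(ConcurrentNavigableMap<byte[], Long> flushedSeqIdByRegion,
      Set<String> existingRegions) {
    int before = flushedSeqIdByRegion.size();
    // Removing through the key-set view removes the corresponding map entries.
    flushedSeqIdByRegion.keySet().removeIf(
        key -> !existingRegions.contains(new String(key, StandardCharsets.UTF_8)));
    return before - flushedSeqIdByRegion.size();
  }
}
```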
-