@InterfaceAudience.Private public class SplitLogManager extends Object
SplitLogManager monitors the tasks that it creates using the timeoutMonitor thread. If a task's progress is slow, SplitLogManagerCoordination.checkTasks() takes the task away from its owner SplitLogWorker and the task is up for grabs again. When a task is done, SplitLogManager deletes it.
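The timeout check described above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: `TimeoutMonitorSketch`, `TaskState`, `assign`, `heartbeat`, and the millisecond timestamps are hypothetical names chosen for illustration and are not part of the HBase API. The real check runs through the coordination layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the timeout check; none of these types are HBase classes.
class TimeoutMonitorSketch {
    private static final class TaskState {
        String owner;          // worker holding the task, or null when unassigned
        long lastUpdateMillis; // last time the owner reported progress

        TaskState(String owner, long lastUpdateMillis) {
            this.owner = owner;
            this.lastUpdateMillis = lastUpdateMillis;
        }
    }

    private final Map<String, TaskState> tasks = new TreeMap<>();
    private final long timeoutMillis;

    TimeoutMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // A worker grabs the task.
    void assign(String path, String worker, long now) {
        tasks.put(path, new TaskState(worker, now));
    }

    // The owner reports progress, which resets the timeout clock.
    void heartbeat(String path, long now) {
        TaskState t = tasks.get(path);
        if (t != null) {
            t.lastUpdateMillis = now;
        }
    }

    // A finished task is deleted from the manager's view.
    void done(String path) {
        tasks.remove(path);
    }

    String ownerOf(String path) {
        TaskState t = tasks.get(path);
        return t == null ? null : t.owner;
    }

    // Takes slow tasks away from their owners and returns the paths that are up for grabs again.
    List<String> checkTasks(long now) {
        List<String> resubmitted = new ArrayList<>();
        for (Map.Entry<String, TaskState> e : tasks.entrySet()) {
            TaskState t = e.getValue();
            if (t.owner != null && now - t.lastUpdateMillis > timeoutMillis) {
                t.owner = null;
                resubmitted.add(e.getKey());
            }
        }
        return resubmitted;
    }
}
```

In this model a task whose owner has not reported progress within the timeout loses its owner, while a task with a recent heartbeat keeps it.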
Clients call splitLogDistributed(Path) to split a region server's log files. The caller thread waits in this method until all the log files have been split.
All the coordination calls made by this class are asynchronous. This is mainly to help reduce response time seen by the callers.
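One way to picture a caller that blocks while the underlying work completes asynchronously is a batch counter guarded by a monitor. The sketch below is an assumption-laden illustration, not the actual SplitLogManager.TaskBatch: the class name `TaskBatchSketch` and the counters `installed`, `done`, and `error` are hypothetical.

```java
// Hypothetical model of a batch that a caller waits on; not the HBase TaskBatch class.
class TaskBatchSketch {
    private int installed; // tasks submitted as part of this batch
    private int done;      // tasks that finished successfully
    private int error;     // tasks that finished with an error

    synchronized void taskInstalled() {
        installed++;
    }

    // Called from an asynchronous callback when a task reaches a terminal state.
    synchronized void taskFinished(boolean success) {
        if (success) {
            done++;
        } else {
            error++;
        }
        notifyAll(); // wake the caller blocked in awaitCompletion()
    }

    // Blocks until every installed task has finished; returns false if interrupted.
    synchronized boolean awaitCompletion() {
        try {
            while (done + error < installed) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    synchronized int done() {
        return done;
    }

    synchronized int errors() {
        return error;
    }
}
```

The submitting thread installs its tasks and then parks in `awaitCompletion()`, while completion callbacks arrive on other threads and call `taskFinished`. The wait sits in a loop so that a spurious wakeup does not end it early.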
There is a race in this design between the SplitLogManager and the SplitLogWorker. SplitLogManager might re-queue a task that has in reality already been completed by a SplitLogWorker. We rely on the idempotency of the log splitting task for correctness.
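The reliance on idempotency can be illustrated with a toy model in which replaying the same log a second time leaves the output unchanged. Everything here is an assumption made for illustration: `IdempotentSplitSketch`, `Edit`, and the keying by region and sequence id are hypothetical and do not describe how HBase actually writes recovered edits.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of an idempotent split; not the HBase WAL splitter.
class IdempotentSplitSketch {
    static final class Edit {
        final String region;
        final long seqId;
        final String payload;

        Edit(String region, long seqId, String payload) {
            this.region = region;
            this.seqId = seqId;
            this.payload = payload;
        }
    }

    // region -> (sequence id -> payload)
    private final Map<String, SortedMap<Long, String>> recovered = new TreeMap<>();

    // Keyed writes overwrite instead of appending, so a repeated run changes nothing.
    void split(List<Edit> walEntries) {
        for (Edit e : walEntries) {
            recovered.computeIfAbsent(e.region, r -> new TreeMap<>()).put(e.seqId, e.payload);
        }
    }

    // Deep copy of the current output, for comparing runs.
    Map<String, SortedMap<Long, String>> snapshot() {
        Map<String, SortedMap<Long, String>> copy = new TreeMap<>();
        for (Map.Entry<String, SortedMap<Long, String>> e : recovered.entrySet()) {
            copy.put(e.getKey(), new TreeMap<>(e.getValue()));
        }
        return copy;
    }
}
```

If a re-queued task runs `split` again on the same entries, the snapshot after the second run equals the snapshot after the first, which is the property the design depends on.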
It is also assumed that every log splitting task is unique and that, once completed (either with success or with error), it will not be submitted again. If a task is resubmitted, there is a risk that an old "delete task" operation can delete the re-submission.
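The hazard of resubmitting a task under the same name can be sketched with a queue of pending deletes that are applied later. This is a simplified model under assumptions: `StaleDeleteSketch`, the `nodes` set standing in for coordination entries, and the explicit `applyPendingDeletes` step are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of asynchronous deletes racing with a resubmission; not HBase code.
class StaleDeleteSketch {
    private final Set<String> nodes = new HashSet<>();             // stand-in for task entries
    private final Deque<String> pendingDeletes = new ArrayDeque<>(); // deletes requested, not yet applied

    void submit(String path) {
        nodes.add(path);
    }

    // The task finished; its delete is requested asynchronously and applied later.
    void markDone(String path) {
        pendingDeletes.add(path);
    }

    // Applies deletes by path only, with no way to tell an old submission from a new one.
    void applyPendingDeletes() {
        String path;
        while ((path = pendingDeletes.poll()) != null) {
            nodes.remove(path);
        }
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }
}
```

Submitting `wal-1`, marking it done, and submitting `wal-1` again before the delete is applied leaves no entry for the second submission once the delete runs. A task with a different name is unaffected, which is why the design assumes unique task names.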
Modifier and Type | Class and Description |
---|---|
static class | SplitLogManager.ResubmitDirective |
static class | SplitLogManager.Task: in memory state of an active task. |
static class | SplitLogManager.TaskBatch: Keeps track of the batch of tasks submitted together by a caller in splitLogDistributed(). |
static class | SplitLogManager.TerminationStatus |
private class | SplitLogManager.TimeoutMonitor: Periodically checks all active tasks and resubmits the ones that have timed out |
Modifier and Type | Field and Description |
---|---|
private long | checkRecoveringTimeThreshold |
private ChoreService | choreService |
private org.apache.hadoop.conf.Configuration | conf |
private Set<ServerName> | deadWorkers |
private Object | deadWorkersLock |
static int | DEFAULT_UNASSIGNED_TIMEOUT |
private List<Pair<Set<ServerName>,Boolean>> | failedRecoveringRegionDeletions |
private long | lastTaskCreateTime |
private static org.apache.commons.logging.Log | LOG |
protected ReentrantLock | recoveringRegionLock: In distributedLogReplay mode, we need to touch both splitlog and recovering-regions znodes in one operation. |
private Server | server |
private Stoppable | stopper |
private ConcurrentMap<String,SplitLogManager.Task> | tasks |
private SplitLogManager.TimeoutMonitor | timeoutMonitor |
private long | unassignedTimeout |
Constructor and Description |
---|
SplitLogManager(Server server, org.apache.hadoop.conf.Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName): It's OK to construct this object even when region-servers are not online. |
Modifier and Type | Method and Description |
---|---|
private int | activeTasks(SplitLogManager.TaskBatch batch) |
private SplitLogManager.Task | createTaskIfAbsent(String path, SplitLogManager.TaskBatch batch) |
(package private) boolean | enqueueSplitTask(String taskname, SplitLogManager.TaskBatch batch): Add a task entry to coordination if it is not already there. |
(package private) SplitLogManager.Task | findOrCreateOrphanTask(String path) |
static org.apache.hadoop.fs.FileStatus[] | getFileList(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter): Get a list of paths that need to be split given a set of server-specific directories and optionally a filter. |
private org.apache.hadoop.fs.FileStatus[] | getFileList(List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter) |
org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.SplitLogTask.RecoveryMode | getRecoveryMode() |
(package private) ConcurrentMap<String,SplitLogManager.Task> | getTasks() |
(package private) void | handleDeadWorker(ServerName workerName) |
(package private) void | handleDeadWorkers(Set<ServerName> serverNames) |
boolean | isLogReplaying() |
boolean | isLogSplitting() |
void | markRegionsRecovering(ServerName server, Set<HRegionInfo> userRegions) |
private void | removeRecoveringRegions(Set<ServerName> serverNames, Boolean isMetaRecovery): Removes recovering regions under /hbase/recovering-regions/[encoded region name] so that the region server hosting the region can allow reads to the recovered region. |
(package private) void | removeStaleRecoveringRegions(Set<ServerName> failedServers): Removes stale recovering regions under /hbase/recovering-regions/[encoded region name] during the master initialization phase. |
void | setRecoveryMode(boolean isForInitialization): Sets the recovery mode from outstanding split log tasks left over from before, or from the current configuration setting. |
long | splitLogDistributed(List<org.apache.hadoop.fs.Path> logDirs): The caller will block until all the log files of the given region server have been processed (successfully split or an error is encountered) by an available worker region server. |
long | splitLogDistributed(org.apache.hadoop.fs.Path logDir) |
long | splitLogDistributed(Set<ServerName> serverNames, List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter): The caller will block until all the hbase:meta log files of the given region server have been processed (successfully split or an error is encountered) by an available worker region server. |
void | stop() |
private void | waitForSplittingCompletion(SplitLogManager.TaskBatch batch, MonitoredTask status) |
private static final org.apache.commons.logging.Log LOG
private Server server
private final Stoppable stopper
private final org.apache.hadoop.conf.Configuration conf
private final ChoreService choreService
public static final int DEFAULT_UNASSIGNED_TIMEOUT
private long unassignedTimeout
private long lastTaskCreateTime
private long checkRecoveringTimeThreshold
private final List<Pair<Set<ServerName>,Boolean>> failedRecoveringRegionDeletions
protected final ReentrantLock recoveringRegionLock
private final ConcurrentMap<String,SplitLogManager.Task> tasks
private SplitLogManager.TimeoutMonitor timeoutMonitor
private volatile Set<ServerName> deadWorkers
private final Object deadWorkersLock
public SplitLogManager(Server server, org.apache.hadoop.conf.Configuration conf, Stoppable stopper, MasterServices master, ServerName serverName) throws IOException
server - the server instance
conf - the HBase configuration
stopper - the stoppable in case anything is wrong
master - the master services
serverName - the master server name
IOException
private org.apache.hadoop.fs.FileStatus[] getFileList(List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter) throws IOException
IOException
public static org.apache.hadoop.fs.FileStatus[] getFileList(org.apache.hadoop.conf.Configuration conf, List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter) throws IOException
See DefaultWALProvider.getServerNameFromWALDirectoryName(org.apache.hadoop.conf.Configuration, java.lang.String) for more info on directory layout.
Should be package-private, but is needed by WALSplitter.split(Path, Path, Path, FileSystem, Configuration, WALFactory) for tests.
IOException
public long splitLogDistributed(org.apache.hadoop.fs.Path logDir) throws IOException
logDir - one region server WAL dir path in .logs
IOException - if there was an error while splitting any log file
IOException
public long splitLogDistributed(List<org.apache.hadoop.fs.Path> logDirs) throws IOException
logDirs - List of log dirs to split
IOException - If there was an error while splitting any log file
public long splitLogDistributed(Set<ServerName> serverNames, List<org.apache.hadoop.fs.Path> logDirs, org.apache.hadoop.fs.PathFilter filter) throws IOException
logDirs - List of log dirs to split
filter - the Path filter used to select specific files to consider
IOException - If there was an error while splitting any log file
boolean enqueueSplitTask(String taskname, SplitLogManager.TaskBatch batch)
taskname - the path of the log to be split
batch - the batch this task belongs to
private void waitForSplittingCompletion(SplitLogManager.TaskBatch batch, MonitoredTask status)
ConcurrentMap<String,SplitLogManager.Task> getTasks()
private int activeTasks(SplitLogManager.TaskBatch batch)
private void removeRecoveringRegions(Set<ServerName> serverNames, Boolean isMetaRecovery)
serverNames - servers which have just been recovered
isMetaRecovery - whether the current recovery is for the meta region on serverNames
void removeStaleRecoveringRegions(Set<ServerName> failedServers) throws IOException, InterruptedIOException
failedServers - A set of known failed servers
IOException
InterruptedIOException
private SplitLogManager.Task createTaskIfAbsent(String path, SplitLogManager.TaskBatch batch)
path -
batch -
SplitLogManager.Task findOrCreateOrphanTask(String path)
public void stop()
void handleDeadWorker(ServerName workerName)
void handleDeadWorkers(Set<ServerName> serverNames)
public void setRecoveryMode(boolean isForInitialization) throws IOException
isForInitialization -
IOException - if it's impossible to set recovery mode
public void markRegionsRecovering(ServerName server, Set<HRegionInfo> userRegions) throws InterruptedIOException, IOException
InterruptedIOException
IOException
public boolean isLogReplaying()
public boolean isLogSplitting()
public org.apache.hadoop.hbase.protobuf.generated.ZooKeeperProtos.SplitLogTask.RecoveryMode getRecoveryMode()
Copyright © 2007–2019 The Apache Software Foundation. All rights reserved.