Class SyncReplicationReplayWALManager
java.lang.Object
org.apache.hadoop.hbase.master.replication.SyncReplicationReplayWALManager
The manager for replaying remote WALs.
First, it is used to balance the replay work across all the region servers. We record the region
servers that are already being used for replaying WALs and do not send new replay work to them
until the previous replay work is done, at which point we remove the region server from the used
worker set. See the comment for UsedReplayWorkersForPeer for more details.
Second, the logic for managing the remote WAL directory is kept here. Before replaying the WALs,
we rename the remote WAL directory to a 'replay' directory, see renameToPeerReplayWALDir(String).
This prevents further writes of remote WALs, which is very important for keeping consistency.
Then we start replaying all the WALs. Once a WAL has been replayed, we truncate the file, so that
if a crash happens we do not need to replay all the WALs again, see finishReplayWAL(String) and
isReplayWALFinished(String). After replaying all the WALs, we rename the 'replay' directory to a
'snapshot' directory. This directory keeps the names of all the replayed WALs, since all the
files should have been truncated. Later, when we transition the originally ACTIVE cluster to
STANDBY and region servers crash, we check the WALs in this directory to determine whether a WAL
should be split and replayed or not. See the code in SplitLogWorker for more details.
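The directory lifecycle described above can be sketched on a local file system with java.nio.file. This is a minimal illustration under stated assumptions, not the HBase implementation: the real manager works against org.apache.hadoop.fs.FileSystem, and the class name, the method names and the "-replay" and "-snapshot" suffixes below are hypothetical choices made for the example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReplayDirLifecycleSketch {

    // Hypothetical suffixes; the real directory names are not stated on this page.
    static final String REPLAY_SUFFIX = "-replay";
    static final String SNAPSHOT_SUFFIX = "-snapshot";

    // Step 1: rename the remote WAL directory so nothing can keep writing under the old name.
    static Path renameToReplayDir(Path remoteWalDir) throws IOException {
        Path replayDir = remoteWalDir.resolveSibling(remoteWalDir.getFileName() + REPLAY_SUFFIX);
        return Files.move(remoteWalDir, replayDir);
    }

    // Step 2: mark a WAL as replayed by truncating it to zero bytes (the file name is kept).
    static void finishReplayWal(Path wal) throws IOException {
        Files.write(wal, new byte[0], StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    // In this sketch a WAL counts as finished when it is empty.
    static boolean isReplayWalFinished(Path wal) throws IOException {
        return Files.size(wal) == 0;
    }

    // After a crash, only the WALs that are not yet truncated need to be replayed again.
    static List<Path> pendingWals(Path replayDir) throws IOException {
        List<Path> all;
        try (Stream<Path> files = Files.list(replayDir)) {
            all = files.sorted().collect(Collectors.toList());
        }
        List<Path> pending = new ArrayList<>();
        for (Path wal : all) {
            if (!isReplayWalFinished(wal)) {
                pending.add(wal);
            }
        }
        return pending;
    }

    // Step 3: once every WAL is replayed, rename the 'replay' directory to 'snapshot'.
    static Path renameToSnapshotDir(Path replayDir) throws IOException {
        String name = replayDir.getFileName().toString();
        if (!name.endsWith(REPLAY_SUFFIX)) {
            throw new IllegalArgumentException("Not a replay directory: " + replayDir);
        }
        String base = name.substring(0, name.length() - REPLAY_SUFFIX.length());
        return Files.move(replayDir, replayDir.resolveSibling(base + SNAPSHOT_SUFFIX));
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("remote-wals");
        Path peerDir = Files.createDirectory(root.resolve("peer1"));
        Files.write(peerDir.resolve("wal-1"), new byte[] {1, 2, 3});
        Files.write(peerDir.resolve("wal-2"), new byte[] {4, 5});

        Path replayDir = renameToReplayDir(peerDir);
        for (Path wal : pendingWals(replayDir)) {
            // A real replay would apply the WAL entries here before truncating.
            finishReplayWal(wal);
        }
        Path snapshotDir = renameToSnapshotDir(replayDir);
        System.out.println("Snapshot directory: " + snapshotDir);
    }
}
```

Truncating instead of deleting keeps the file names available in the 'snapshot' directory, which is what the later split-or-replay decision relies on.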
Nested Class Summary
private static final class UsedReplayWorkersForPeer
    This class is used to record the used workers (region servers) for a replication peer.
-
Field Summary
private final org.apache.hadoop.fs.FileSystem fs
private static final org.slf4j.Logger LOG
private final org.apache.hadoop.fs.Path remoteWALDir
private final ServerManager serverManager
private final org.apache.hadoop.fs.Path walRootDir
-
Constructor Summary
SyncReplicationReplayWALManager(MasterServices services)
-
Method Summary
ServerName acquirePeerWorker(String peerId, Procedure<?> proc)
    Get a worker for replaying remote WAL for a given peer.
void addUsedPeerWorker(String peerId, ServerName worker)
    Will only be called when loading procedures, where we need to construct the used worker set for each peer.
void createPeerRemoteWALDir(String peerId)
private void deleteDir
void finishReplayWAL(String wal)
org.apache.hadoop.fs.Path getRemoteWALDir
List<org.apache.hadoop.fs.Path> getReplayWALsAndCleanUpUnusedFiles(String peerId)
boolean isReplayWALFinished(String)
void registerPeer(String peerId)
void releasePeerWorker(String peerId, ServerName worker, MasterProcedureScheduler scheduler)
void removePeerRemoteWALs(String peerId)
removeWALRootPath(org.apache.hadoop.fs.Path path)
private void rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst, String peerId)
void renameToPeerReplayWALDir(String peerId)
void renameToPeerSnapshotWALDir(String peerId)
void unregisterPeer(String peerId)
-
Field Details
-
LOG
-
serverManager
-
fs
-
walRootDir
-
remoteWALDir
-
usedWorkersByPeer
-
-
Constructor Details
-
SyncReplicationReplayWALManager
public SyncReplicationReplayWALManager(MasterServices services) throws IOException, ReplicationException
- Throws:
IOException
ReplicationException
-
-
Method Details
-
registerPeer
-
unregisterPeer
-
acquirePeerWorker
public ServerName acquirePeerWorker(String peerId, Procedure<?> proc) throws ProcedureSuspendedException
Get a worker for replaying remote WAL for a given peer. If no worker is available, i.e., all the region servers are being used by others, a ProcedureSuspendedException will be thrown to suspend the procedure. The procedure will be woken up later when a worker becomes available, either because another procedure releases a worker or because a new region server joins the cluster.
- Throws:
ProcedureSuspendedException
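The acquire and release behaviour described here can be illustrated with a small in-memory set of used workers. This is only a sketch under assumptions: the class and method names are hypothetical, servers are plain strings rather than ServerName, and an empty Optional stands in for the point where the real method throws ProcedureSuspendedException and later wakes the procedure.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class UsedWorkersSketch {

    // Region servers currently busy replaying WALs for one peer.
    private final Set<String> usedWorkers = new HashSet<>();

    // Pick the first online server that is not already in use and mark it as used.
    // An empty result means every server is busy, so the caller would have to wait.
    public synchronized Optional<String> acquire(List<String> onlineServers) {
        for (String server : onlineServers) {
            if (usedWorkers.add(server)) {
                return Optional.of(server);
            }
        }
        return Optional.empty();
    }

    // Called when the replay work on a server is done; the server becomes available again.
    public synchronized void release(String server) {
        usedWorkers.remove(server);
    }

    // Rebuild the used set after a restart, similar in spirit to addUsedPeerWorker.
    public synchronized void addUsed(String server) {
        usedWorkers.add(server);
    }

    public static void main(String[] args) {
        UsedWorkersSketch workers = new UsedWorkersSketch();
        List<String> online = List.of("rs1", "rs2");
        System.out.println(workers.acquire(online));
        System.out.println(workers.acquire(online));
        System.out.println(workers.acquire(online));
        workers.release("rs1");
        System.out.println(workers.acquire(online));
    }
}
```

Because a server stays in the set until it is released, no server receives a second replay task for the same peer while the first is still running, which is how the work gets spread across region servers.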
-
releasePeerWorker
-
addUsedPeerWorker
Will only be called when loading procedures, where we need to construct the used worker set for each peer.
-
createPeerRemoteWALDir
- Throws:
IOException
-
rename
private void rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst, String peerId) throws IOException
- Throws:
IOException
-
renameToPeerReplayWALDir
- Throws:
IOException
-
renameToPeerSnapshotWALDir
- Throws:
IOException
-
getReplayWALsAndCleanUpUnusedFiles
public List<org.apache.hadoop.fs.Path> getReplayWALsAndCleanUpUnusedFiles(String peerId) throws IOException
- Throws:
IOException
-
deleteDir
- Throws:
IOException
-
removePeerRemoteWALs
- Throws:
IOException
-
removeWALRootPath
-
finishReplayWAL
- Throws:
IOException
-
isReplayWALFinished
- Throws:
IOException
-
getRemoteWALDir
-