org.apache.hadoop.hbase.master.replication.SyncReplicationReplayWALManager

@Private public class SyncReplicationReplayWALManager extends Object

The manager for replaying remote WALs.

First, it is used to balance the replay work across all the region servers. It records the region servers that are already being used for replaying WALs and does not send new replay work to them until their previous replay work is done, at which point the region server is removed from the used worker set. See the comment for UsedReplayWorkersForPeer for more details.
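The bookkeeping described above can be sketched in plain Java. This is a simplified model, not the HBase implementation: the class `UsedWorkersSketch`, its method names, and the use of plain strings for servers and callers are all hypothetical, and an empty `Optional` stands in for suspending the calling procedure.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Queue;
import java.util.Set;

// Hypothetical, simplified model of a per-peer used worker set.
final class UsedWorkersSketch {
  // Region servers that are currently replaying WALs for this peer.
  private final Set<String> usedWorkers = new HashSet<>();
  // Callers that found no free worker and wait to be woken up.
  private final Queue<String> suspended = new ArrayDeque<>();

  // Returns the first online server that is not in use, marking it as used.
  // If every server is in use, remembers the caller and returns empty.
  synchronized Optional<String> acquire(String caller, List<String> onlineServers) {
    for (String server : onlineServers) {
      if (usedWorkers.add(server)) {
        return Optional.of(server);
      }
    }
    suspended.add(caller);
    return Optional.empty();
  }

  // Removes the server from the used set and returns the next waiting
  // caller, if any, so that it can retry.
  synchronized Optional<String> release(String server) {
    usedWorkers.remove(server);
    return Optional.ofNullable(suspended.poll());
  }

  // Rebuilds the used set without going through acquire, e.g. after a restart.
  synchronized void addUsed(String server) {
    usedWorkers.add(server);
  }

  public static void main(String[] args) {
    UsedWorkersSketch workers = new UsedWorkersSketch();
    List<String> online = List.of("rs1", "rs2");
    System.out.println(workers.acquire("procA", online));
    System.out.println(workers.acquire("procB", online));
    System.out.println(workers.acquire("procC", online)); // no free server left
    System.out.println(workers.release("rs1"));           // wakes the waiting caller
    System.out.println(workers.acquire("procC", online));
  }
}
```

A server is handed out at most once until it is released, which is what spreads concurrent replay work over different region servers.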

Second, the logic for managing the remote WAL directory is kept here. Before replaying the WALs, we rename the remote WAL directory to the 'replay' directory, see renameToPeerReplayWALDir(String). This prevents further writing of remote WALs, which is very important for keeping consistency. Then we start replaying all the WALs. Once a WAL has been replayed, we truncate the file, so that if a crash happens we do not need to replay all the WALs again, see finishReplayWAL(String) and isReplayWALFinished(String). After replaying all the WALs, we rename the 'replay' directory to the 'snapshot' directory. This directory keeps only the names of the WALs that were replayed, since all the files should have been truncated. When the originally ACTIVE cluster is later transitioned to STANDBY and region servers crash, we check the WALs in this directory to determine whether a WAL should be split and replayed. See the code in SplitLogWorker for more details.
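The directory lifecycle above can be illustrated on a local file system with java.nio.file. This is only a sketch under stated assumptions: the real class works on a Hadoop FileSystem, the class `ReplayDirSketch` and its methods are hypothetical, the `-replay` and `-snapshot` suffixes are assumed names, and treating a zero-length file as "replay finished" is an inference from the truncation step described above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-file-system model of the remote WAL directory lifecycle.
final class ReplayDirSketch {
  // Assumed directory suffixes, for illustration only.
  static final String REPLAY_SUFFIX = "-replay";
  static final String SNAPSHOT_SUFFIX = "-snapshot";

  // Step 1: rename the remote WAL dir so that writers can no longer find it.
  static Path renameToReplayDir(Path remoteWalDir) throws IOException {
    Path replay = remoteWalDir.resolveSibling(remoteWalDir.getFileName() + REPLAY_SUFFIX);
    return Files.move(remoteWalDir, replay);
  }

  // Step 2: mark one WAL as replayed by truncating it to zero bytes.
  static void finishReplay(Path wal) throws IOException {
    try (FileChannel channel = FileChannel.open(wal, StandardOpenOption.WRITE)) {
      channel.truncate(0);
    }
  }

  // A truncated (empty) file means the WAL does not need to be replayed again.
  static boolean isReplayFinished(Path wal) throws IOException {
    return Files.size(wal) == 0;
  }

  // Step 3: once every WAL is replayed, keep the file names as a snapshot.
  static Path renameToSnapshotDir(Path replayDir) throws IOException {
    String name = replayDir.getFileName().toString();
    String base = name.substring(0, name.length() - REPLAY_SUFFIX.length());
    return Files.move(replayDir, replayDir.resolveSibling(base + SNAPSHOT_SUFFIX));
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("remote-wals");
    Path peerDir = Files.createDirectory(root.resolve("peer1"));
    Files.write(peerDir.resolve("wal-0001"), new byte[] {1, 2, 3});

    Path replayDir = renameToReplayDir(peerDir);
    Path wal = replayDir.resolve("wal-0001");
    System.out.println("finished before replay: " + isReplayFinished(wal));
    finishReplay(wal);
    System.out.println("finished after replay: " + isReplayFinished(wal));

    Path snapshotDir = renameToSnapshotDir(replayDir);
    System.out.println("name kept in snapshot dir: "
        + Files.exists(snapshotDir.resolve("wal-0001")));
  }
}
```

Because the truncated files stay in place, a restart after a crash can skip every WAL whose file is already empty and resume with the remaining ones.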

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

private static final class

SyncReplicationReplayWALManager.UsedReplayWorkersForPeer

This class is used to record the used workers (region servers) for a replication peer.

Field Summary

Fields

Modifier and Type

Field

Description

private final org.apache.hadoop.fs.FileSystem

fs

private static final org.slf4j.Logger

LOG

private final org.apache.hadoop.fs.Path

remoteWALDir

private final ServerManager

serverManager

private final ConcurrentMap<String,SyncReplicationReplayWALManager.UsedReplayWorkersForPeer>

usedWorkersByPeer

private final org.apache.hadoop.fs.Path

walRootDir

Constructor Summary

Constructors

Constructor

Description

SyncReplicationReplayWALManager(MasterServices services)

Method Summary

Modifier and Type

Method

Description

ServerName

acquirePeerWorker(String peerId, Procedure<?> proc)

Get a worker for replaying remote WALs for a given peer.

void

addUsedPeerWorker(String peerId, ServerName worker)

Will only be called when loading procedures, where we need to construct the used worker set for each peer.

void

createPeerRemoteWALDir(String peerId)

private void

deleteDir(org.apache.hadoop.fs.Path dir, String peerId)

void

finishReplayWAL(String wal)

org.apache.hadoop.fs.Path

getRemoteWALDir()

List<org.apache.hadoop.fs.Path>

getReplayWALsAndCleanUpUnusedFiles(String peerId)

boolean

isReplayWALFinished(String wal)

void

registerPeer(String peerId)

void

releasePeerWorker(String peerId, ServerName worker, MasterProcedureScheduler scheduler)

void

removePeerRemoteWALs(String peerId)

String

removeWALRootPath(org.apache.hadoop.fs.Path path)

private void

rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst, String peerId)

void

renameToPeerReplayWALDir(String peerId)

void

renameToPeerSnapshotWALDir(String peerId)

void

unregisterPeer(String peerId)

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOG
  
  private static final org.slf4j.Logger LOG
- serverManager
  
  private final ServerManager serverManager
- fs
  
  private final org.apache.hadoop.fs.FileSystem fs
- walRootDir
  
  private final org.apache.hadoop.fs.Path walRootDir
- remoteWALDir
  
  private final org.apache.hadoop.fs.Path remoteWALDir
- usedWorkersByPeer
  
  private final ConcurrentMap<String,SyncReplicationReplayWALManager.UsedReplayWorkersForPeer> usedWorkersByPeer
Constructor Details
- SyncReplicationReplayWALManager
  
  public SyncReplicationReplayWALManager(MasterServices services) throws IOException, ReplicationException
  
  Throws:
  
  IOException
  
  ReplicationException
Method Details
- registerPeer
  
  public void registerPeer(String peerId)
- unregisterPeer
  
  public void unregisterPeer(String peerId)
- acquirePeerWorker
  
  public ServerName acquirePeerWorker(String peerId, Procedure<?> proc) throws ProcedureSuspendedException
  
  Get a worker for replaying remote WALs for a given peer. If no worker is available, i.e., all the region servers are in use by others, a ProcedureSuspendedException is thrown to suspend the procedure. The procedure is woken up later when a worker becomes available, either because another procedure releases a worker or because a new region server joins the cluster.
  
  Throws:
  
  ProcedureSuspendedException
- releasePeerWorker
  
  public void releasePeerWorker(String peerId, ServerName worker, MasterProcedureScheduler scheduler)
- addUsedPeerWorker
  
  public void addUsedPeerWorker(String peerId, ServerName worker)
  
  Will only be called when loading procedures, where we need to construct the used worker set for each peer.
- createPeerRemoteWALDir
  
  public void createPeerRemoteWALDir(String peerId) throws IOException
  
  Throws:
  
  IOException
- rename
  
  private void rename(org.apache.hadoop.fs.Path src, org.apache.hadoop.fs.Path dst, String peerId) throws IOException
  
  Throws:
  
  IOException
- renameToPeerReplayWALDir
  
  public void renameToPeerReplayWALDir(String peerId) throws IOException
  
  Throws:
  
  IOException
- renameToPeerSnapshotWALDir
  
  public void renameToPeerSnapshotWALDir(String peerId) throws IOException
  
  Throws:
  
  IOException
- getReplayWALsAndCleanUpUnusedFiles
  
  public List<org.apache.hadoop.fs.Path> getReplayWALsAndCleanUpUnusedFiles(String peerId) throws IOException
  
  Throws:
  
  IOException
- deleteDir
  
  private void deleteDir(org.apache.hadoop.fs.Path dir, String peerId) throws IOException
  
  Throws:
  
  IOException
- removePeerRemoteWALs
  
  public void removePeerRemoteWALs(String peerId) throws IOException
  
  Throws:
  
  IOException
- removeWALRootPath
  
  public String removeWALRootPath(org.apache.hadoop.fs.Path path)
- finishReplayWAL
  
  public void finishReplayWAL(String wal) throws IOException
  
  Throws:
  
  IOException
- isReplayWALFinished
  
  public boolean isReplayWALFinished(String wal) throws IOException
  
  Throws:
  
  IOException
- getRemoteWALDir
  
  public org.apache.hadoop.fs.Path getRemoteWALDir()
