Interface Summary
Interface	Description
ReplicationEndpoint	ReplicationEndpoint is a plugin which implements replication to other HBase clusters, or other systems.
ReplicationListener	The replication listener interface can be implemented if a class needs to subscribe to events generated by the ReplicationTracker.
ReplicationPeer	ReplicationPeer manages enabled / disabled state for the peer.
ReplicationPeerConfigListener
ReplicationPeers	This provides an interface for maintaining a set of peer clusters.
ReplicationQueues	This provides an interface for maintaining a region server's replication queues.
ReplicationQueuesClient	This provides an interface for clients of replication to view replication queues.
ReplicationTracker	This is the interface for a Replication Tracker.
WALCellFilter	A filter for WAL entry cells before being sent over to replication.
WALEntryFilter	A Filter for WAL entries before being sent over to replication.

Class Summary
Class	Description
BaseReplicationEndpoint	A Base implementation for `ReplicationEndpoint`s.
BaseWALEntryFilter	A base class for WALEntryFilter implementations.
BulkLoadCellFilter
ChainWALEntryFilter	A `WALEntryFilter` which contains multiple filters and applies them in chain order.
ClusterMarkingEntryFilter	Filters out entries with our peerClusterId (i.e.
HBaseReplicationEndpoint	A `BaseReplicationEndpoint` for replication endpoints whose target cluster is an HBase cluster.
HBaseReplicationEndpoint.PeerRegionServerListener	Tracks changes to the list of region servers in a peer's cluster.
ReplicationEndpoint.Context
ReplicationEndpoint.ReplicateContext	A context for `ReplicationEndpoint.replicate(ReplicateContext)` method.
ReplicationFactory	A factory class for instantiating replication objects that deal with replication state.
ReplicationLoadSink	An HBase ReplicationLoad to present MetricsSink information.
ReplicationLoadSource	An HBase ReplicationLoad to present MetricsSource information.
ReplicationPeerConfig	A configuration for the replication peer cluster.
ReplicationPeersZKImpl	This class provides an implementation of the ReplicationPeers interface using Zookeeper.
ReplicationPeerZKImpl
ReplicationQueueInfo	This class is responsible for the parsing logic for a znode representing a queue.
ReplicationQueuesClientZKImpl
ReplicationQueuesZKImpl	This class provides an implementation of the ReplicationQueues interface using Zookeeper.
ReplicationSerDeHelper
ReplicationStateZKBase	This is a base class for maintaining replication state in zookeeper.
ReplicationTrackerZKImpl	This class is a Zookeeper implementation of the ReplicationTracker interface.
ScopeWALEntryFilter	Keeps KVs that are scoped other than local
SystemTableWALEntryFilter	Skips WAL edits for all System tables including META
TableCfWALEntryFilter

Enum Summary
Enum	Description
ReplicationPeer.PeerState	State of the peer, whether it is enabled or not

Exception Summary
Exception	Description
ReplicationException	An HBase Replication exception.

Package org.apache.hadoop.hbase.replication Description

Multi Cluster Replication

This package provides replication between HBase clusters.

Status
Requirements
Deployment
Verifying Replicated Data

Status

This package is experimental quality software and is only meant to be a base for future developments. The current implementation offers the following features:

Master/Slave replication.
Master/Master replication.
Cyclic replication.
Replication of scoped families in user tables.
Start/stop replication stream.
Supports clusters of different sizes.
Handling of partitions longer than 10 minutes.
Ability to add/remove slave clusters at runtime.
MapReduce job to compare tables on two clusters.

Please report bugs on the project's Jira when found.

Requirements

Before trying out replication, make sure to review the following requirements:

Zookeeper should be managed by you, not by HBase, and must remain available throughout the deployment.
All machines from both clusters should be able to reach every other machine, since replication goes from any region server to any other one on the slave cluster. This also includes the Zookeeper clusters.
Both clusters should have the same HBase and Hadoop major revision. For example, having 0.90.1 on the master and 0.90.0 on the slave is correct, but 0.90.1 and 0.89.20100725 is not.
Every table that contains families scoped for replication should exist on every cluster with exactly the same name, and the replicated families must have the same names as well.
For multiple slaves, Master/Master, or cyclic replication, version 0.92 or greater is needed.
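
The version requirement above can be checked mechanically before attempting a deployment. The following is a minimal sketch, assuming that "major revision" means the first two dot-separated components of the version string, which is inferred from the 0.90.1 / 0.90.0 / 0.89.20100725 example and is not an official definition. The function names are hypothetical.

```python
def major_revision(version: str) -> tuple:
    """Return the first two dot-separated components, e.g. '0.90.1' -> ('0', '90').

    Assumption: these two components are what the text calls the major revision.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.MINOR, got {version!r}")
    return tuple(parts[:2])


def same_major_revision(master_version: str, slave_version: str) -> bool:
    """True when both clusters report the same major revision."""
    return major_revision(master_version) == major_revision(slave_version)


print(same_major_revision("0.90.1", "0.90.0"))         # True
print(same_major_revision("0.90.1", "0.89.20100725"))  # False
```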

Deployment

The following steps describe how to enable replication from one cluster to another.

Edit ${HBASE_HOME}/conf/hbase-site.xml on both clusters to add the following configuration:
```
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
```
deploy the files, and then restart HBase if it was running.
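
Before restarting, it can help to confirm that the property actually made it into the deployed file on each cluster. The sketch below is one possible check, using only the Python standard library. It assumes the usual Hadoop-style layout in which the property elements sit inside a `<configuration>` root element. The helper names are hypothetical, and the sample XML stands in for the contents of a real hbase-site.xml.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of ${HBASE_HOME}/conf/hbase-site.xml.
SAMPLE_SITE_XML = """<configuration>
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
</configuration>"""


def get_property(site_xml: str, name: str):
    """Return the value of the named property, or None if it is not set."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


def replication_enabled(site_xml: str) -> bool:
    """True when hbase.replication is present and set to 'true'."""
    value = get_property(site_xml, "hbase.replication")
    return value is not None and value.lower() == "true"


print(replication_enabled(SAMPLE_SITE_XML))
```

To check a deployed file, read its text and pass it to `replication_enabled`; the check needs to pass on both clusters.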
Run the following command in the master's shell while it's running:
```
add_peer 'ID' 'CLUSTER_KEY'
```
The ID must be a short integer. To compose the CLUSTER_KEY, use the following template:
```
hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent
```
This will show you the help to set up the replication stream between both clusters. If both clusters use the same Zookeeper cluster, you have to use a different zookeeper.znode.parent for each, since they can't both write under the same parent znode.
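
As an illustration of the template, the sketch below composes a CLUSTER_KEY from the three configuration values and detects the case where two clusters would share both the Zookeeper quorum and the parent znode. The host names, the port 2181, and the znode paths are example values only, and the function names are hypothetical.

```python
def compose_cluster_key(quorum: str, client_port: int, znode_parent: str) -> str:
    """Join hbase.zookeeper.quorum, hbase.zookeeper.property.clientPort and
    zookeeper.znode.parent with colons, following the template above."""
    return f"{quorum}:{client_port}:{znode_parent}"


def parse_cluster_key(key: str):
    """Split a key back into (set of quorum hosts, port, znode parent)."""
    quorum, port, znode_parent = key.rsplit(":", 2)
    return frozenset(quorum.split(",")), int(port), znode_parent


def shares_znode(key_a: str, key_b: str) -> bool:
    """True when both keys point at the same quorum, port and parent znode,
    which is the situation the text says must be avoided."""
    return parse_cluster_key(key_a) == parse_cluster_key(key_b)


# Example values only; substitute the settings of the real clusters.
master_key = compose_cluster_key("zk1.example.com,zk2.example.com", 2181, "/hbase")
slave_key = compose_cluster_key("zk1.example.com,zk2.example.com", 2181, "/hbase-slave")

print(master_key)
print(shares_znode(master_key, slave_key))
```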
Once you have a peer, you need to enable replication on your column families. One way to do it is to alter the table and to set the scope like this:
```
      disable 'your_table'
      alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
      enable 'your_table'
    
```
Currently, a scope of 0 (the default) means that the family won't be replicated and a scope of 1 means that it will be. In the future, different scopes may be used for routing policies.
To list all configured peers, run the following command in the master's shell:
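
When several families need the same scope, the shell statements can be generated rather than typed by hand. The sketch below only builds the text of the statements in the same form as the example above; it does not talk to HBase. The function name is hypothetical, and restricting the scope to 0 or 1 reflects the two values described in the text.

```python
def scope_commands(table: str, families, scope: int = 1):
    """Return the shell statements that set REPLICATION_SCOPE on each family."""
    if scope not in (0, 1):
        raise ValueError("only scopes 0 (not replicated) and 1 (replicated) are described")
    lines = [f"disable '{table}'"]
    for family in families:
        lines.append(
            f"alter '{table}', {{NAME => '{family}', REPLICATION_SCOPE => '{scope}'}}"
        )
    lines.append(f"enable '{table}'")
    return lines


for line in scope_commands("your_table", ["family_name", "other_family"]):
    print(line)
```

The printed lines can then be pasted into the master's shell.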
```
list_peers
```
(as of version 0.92)
To enable a peer that was previously disabled, run the following command in the master's shell.
```
enable_peer 'ID'
```
To disable a peer, run the following command in the master's shell. This setting causes HBase to stop sending the edits to that peer cluster, but it still keeps track of all the new WALs that it will need to replicate if and when it is re-enabled.
```
disable_peer 'ID'
```
To remove a peer, use the following command in the master's shell.
```
remove_peer 'ID'
```

You can confirm that your setup works by looking at any region server's log on the master cluster for lines like the following:

Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 10.10.1.49:62020

In this case it indicates that 1 region server from the slave cluster was chosen for replication.
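
On a cluster with many region servers it may be easier to scan the logs with a script. The following sketch extracts the chosen peers from log text, assuming the "Choosing peer host:port" wording shown above; real log lines usually carry a timestamp and logger prefix, which the pattern tolerates because it searches within each line. The function name is hypothetical.

```python
import re

# Matches the "Choosing peer <host>:<port>" message anywhere in a line.
CHOOSING_PEER = re.compile(r"Choosing peer (?P<host>[^\s:]+):(?P<port>\d+)")


def chosen_peers(log_text: str):
    """Return a list of (host, port) pairs for every chosen peer in the log."""
    return [
        (match.group("host"), int(match.group("port")))
        for match in CHOOSING_PEER.finditer(log_text)
    ]


SAMPLE_LOG = """Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 10.10.1.49:62020
"""

print(chosen_peers(SAMPLE_LOG))  # [('10.10.1.49', 62020)]
```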

Verifying Replicated Data

Verifying the replicated data on two clusters is easy to do in the shell when looking only at a few rows, but a systematic comparison requires more computing power. This is why the VerifyReplication MR job was created. It has to be run on the master cluster and needs to be provided with a peer id (the one provided when establishing a replication stream) and a table name. Other options let you specify a time range and specific families. This job's short name is "verifyrep", and that name needs to be provided when pointing "hadoop jar" to the hbase jar.
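
To make the idea of a systematic comparison concrete, the sketch below compares two small in-memory "tables" row by row and counts matching and mismatching rows, optionally restricted to specific families. This is only a conceptual illustration under simplified assumptions (rows as dictionaries keyed by (family, qualifier), no timestamps or time range); it is not the VerifyReplication job, and all names in it are hypothetical.

```python
def compare_tables(source_rows, peer_rows, families=None):
    """Return (good, bad): how many source rows match / do not match the peer.

    Only rows present in source_rows are examined, so rows that exist
    solely on the peer are not reported by this sketch.
    """
    def keep(cells):
        # Restrict a row to the requested families, if any were given.
        if families is None:
            return cells
        return {key: value for key, value in cells.items() if key[0] in families}

    good = bad = 0
    for row_key, cells in source_rows.items():
        peer_cells = peer_rows.get(row_key)
        if peer_cells is not None and keep(cells) == keep(peer_cells):
            good += 1
        else:
            bad += 1
    return good, bad


source = {
    "row1": {("cf", "a"): b"1"},
    "row2": {("cf", "a"): b"2"},
    "row3": {("cf", "a"): b"3", ("other", "x"): b"9"},
}
peer = {
    "row1": {("cf", "a"): b"1"},
    "row2": {("cf", "a"): b"stale"},
    "row3": {("cf", "a"): b"3"},
}

print(compare_tables(source, peer))                   # (1, 2)
print(compare_tables(source, peer, families={"cf"}))  # (2, 1)
```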
