18.11. Developing

18.11.1. Codelines

Most development is done on the master branch, which is named master in the Git repository. Previously, HBase used Subversion, in which the master branch was called TRUNK. Branches exist for minor releases, and important features and bug fixes are often back-ported.

18.11.2. Unit Tests

The following information is from http://blog.cloudera.com/blog/2013/09/how-to-test-hbase-applications-using-popular-tools/. The following sections discuss JUnit, Mockito, MRUnit, and HBaseTestingUtility.

18.11.2.1. JUnit

HBase uses JUnit 4 for unit tests

This example will add unit tests to the following example class:

public class MyHBaseDAO {

    public static void insertRecord(HTableInterface table, HBaseTestObj obj)
    throws Exception {
        Put put = createPut(obj);
        table.put(put);
    }
    
    private static Put createPut(HBaseTestObj obj) {
        Put put = new Put(Bytes.toBytes(obj.getRowKey()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1"),
                    Bytes.toBytes(obj.getData1()));
        put.add(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2"),
                    Bytes.toBytes(obj.getData2()));
        return put;
    }
}                
            

The first step is to add JUnit dependencies to your Maven POM file:

<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
</dependency>                
                

Next, add some unit tests to your code. Tests are annotated with @Test. Here, the unit tests are in bold.

public class TestMyHbaseDAOData {
  @Test
  public void testCreatePut() throws Exception {
  HBaseTestObj obj = new HBaseTestObj();
  obj.setRowKey("ROWKEY-1");
  obj.setData1("DATA-1");
  obj.setData2("DATA-2");
  Put put = MyHBaseDAO.createPut(obj);
  assertEquals(obj.getRowKey(), Bytes.toString(put.getRow()));
  assertEquals(obj.getData1(), Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")).get(0).getValue()));
  assertEquals(obj.getData2(), Bytes.toString(put.get(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")).get(0).getValue()));
  }
}                
            

These tests ensure that your createPut method creates, populates, and returns a Put object with expected values. Of course, JUnit can do much more than this. For an introduction to JUnit, see https://github.com/junit-team/junit/wiki/Getting-started.

18.11.2.2. Mockito

Mockito is a mocking framework. It goes further than JUnit by allowing you to test the interactions between objects without having to replicate the entire environment. You can read more about Mockito at its project site, https://code.google.com/p/mockito/.

You can use Mockito to do unit testing on smaller units. For instance, you can mock a org.apache.hadoop.hbase.Server instance or a org.apache.hadoop.hbase.master.MasterServices interface reference rather than a full-blown org.apache.hadoop.hbase.master.HMaster.

This example builds upon the example code in Section 18.11.2, “Unit Tests”, to test the insertRecord method.

First, add a dependency for Mockito to your Maven POM file.

<dependency>
    <groupId>org.mockito</groupId>
    <artifactId>mockito-all</artifactId>
    <version>1.9.5</version>
    <scope>test</scope>
</dependency>                   
                   

Next, add a @RunWith annotation to your test class, to direct it to use Mockito.

@RunWith(MockitoJUnitRunner.class)
public class TestMyHBaseDAO{
  @Mock 
  private HTableInterface table;
  @Mock
  private HTablePool hTablePool;
  @Captor
  private ArgumentCaptor putCaptor;

  @Test
  public void testInsertRecord() throws Exception {
    //return mock table when getTable is called
    when(hTablePool.getTable("tablename")).thenReturn(table);
    //create test object and make a call to the DAO that needs testing
    HBaseTestObj obj = new HBaseTestObj();
    obj.setRowKey("ROWKEY-1");
    obj.setData1("DATA-1");
    obj.setData2("DATA-2");
    MyHBaseDAO.insertRecord(table, obj);
    verify(table).put(putCaptor.capture());
    Put put = putCaptor.getValue();
  
    assertEquals(Bytes.toString(put.getRow()), obj.getRowKey());
    assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-1")));
    assert(put.has(Bytes.toBytes("CF"), Bytes.toBytes("CQ-2")));
    assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"),Bytes.toBytes("CQ-1")).get(0).getValue()), "DATA-1");
    assertEquals(Bytes.toString(put.get(Bytes.toBytes("CF"),Bytes.toBytes("CQ-2")).get(0).getValue()), "DATA-2");
  }
}                   
               

This code populates HBaseTestObj with “ROWKEY-1”, “DATA-1”, “DATA-2” as values. It then inserts the record into the mocked table. The Put that the DAO would have inserted is captured, and values are tested to verify that they are what you expected them to be.

The key here is to manage htable pool and htable instance creation outside the DAO. This allows you to mock them cleanly and test Puts as shown above. Similarly, you can now expand into other operations such as Get, Scan, or Delete.

18.11.2.3. MRUnit

Apache MRUnit is a library that allows you to unit-test MapReduce jobs. You can use it to test HBase jobs in the same way as other MapReduce jobs.

Given a MapReduce job that writes to an HBase table called MyTest, which has one column family called CF, the reducer of such a job could look like the following:

public class MyReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
   public static final byte[] CF = "CF".getBytes();
   public static final byte[] QUALIFIER = "CQ-1".getBytes();
   public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
     //bunch of processing to extract data to be inserted, in our case, lets say we are simply
     //appending all the records we receive from the mapper for this particular
     //key and insert one record into HBase
     StringBuffer data = new StringBuffer();
     Put put = new Put(Bytes.toBytes(key.toString()));
     for (Text val : values) {
         data = data.append(val);
     }
     put.add(CF, QUALIFIER, Bytes.toBytes(data.toString()));
     //write to HBase
     context.write(new ImmutableBytesWritable(Bytes.toBytes(key.toString())), put);
   }
 }                    
                

To test this code, the first step is to add a dependency to MRUnit to your Maven POM file.

<dependency>
   <groupId>org.apache.mrunit</groupId>
   <artifactId>mrunit</artifactId>
   <version>1.0.0 </version>
   <scope>test</scope>
</dependency>                    
                    

Next, use the ReducerDriver provided by MRUnit, in your Reducer job.

public class MyReducerTest {
    ReduceDriver<Text, Text, ImmutableBytesWritable, Writable> reduceDriver;
    byte[] CF = "CF".getBytes();
    byte[] QUALIFIER = "CQ-1".getBytes();

    @Before
    public void setUp() {
      MyReducer reducer = new MyReducer();
      reduceDriver = ReduceDriver.newReduceDriver(reducer);
    }
  
   @Test
   public void testHBaseInsert() throws IOException {
      String strKey = "RowKey-1", strValue = "DATA", strValue1 = "DATA1", 
strValue2 = "DATA2";
      List<Text> list = new ArrayList<Text>();
      list.add(new Text(strValue));
      list.add(new Text(strValue1));
      list.add(new Text(strValue2));
      //since in our case all that the reducer is doing is appending the records that the mapper   
      //sends it, we should get the following back
      String expectedOutput = strValue + strValue1 + strValue2;
     //Setup Input, mimic what mapper would have passed
      //to the reducer and run test
      reduceDriver.withInput(new Text(strKey), list);
      //run the reducer and get its output
      List<Pair<ImmutableBytesWritable, Writable>> result = reduceDriver.run();
    
      //extract key from result and verify
      assertEquals(Bytes.toString(result.get(0).getFirst().get()), strKey);
    
      //extract value for CF/QUALIFIER and verify
      Put a = (Put)result.get(0).getSecond();
      String c = Bytes.toString(a.get(CF, QUALIFIER).get(0).getValue());
      assertEquals(expectedOutput,c );
   }

}                    
                    

Your MRUnit test verifies that the output is as expected, the Put that is inserted into HBase has the correct value, and the ColumnFamily and ColumnQualifier have the correct values.

MRUnit includes a MapperDriver to test mapping jobs, and you can use MRUnit to test other operations, including reading from HBase, processing data, or writing to HDFS,

18.11.2.4. Integration Testing with a HBase Mini-Cluster

HBase ships with HBaseTestingUtility, which makes it easy to write integration tests using a mini-cluster. The first step is to add some dependencies to your Maven POM file. Check the versions to be sure they are appropriate.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.0.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase</artifactId>
    <version>0.98.3</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>
        
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0</version>
    <type>test-jar</type>
    <scope>test</scope>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.0.0</version>
    <scope>test</scope>
</dependency>                    
                    

This code represents an integration test for the MyDAO insert shown in Section 18.11.2, “Unit Tests”.

public class MyHBaseIntegrationTest {
    private static HBaseTestingUtility utility;
    byte[] CF = "CF".getBytes();
    byte[] QUALIFIER = "CQ-1".getBytes();
    
    @Before
    public void setup() throws Exception {
    	utility = new HBaseTestingUtility();
    	utility.startMiniCluster();
    }

    @Test
        public void testInsert() throws Exception {
       	 HTableInterface table = utility.createTable(Bytes.toBytes("MyTest"),
       			 Bytes.toBytes("CF"));
       	 HBaseTestObj obj = new HBaseTestObj();
       	 obj.setRowKey("ROWKEY-1");
       	 obj.setData1("DATA-1");
       	 obj.setData2("DATA-2");
       	 MyHBaseDAO.insertRecord(table, obj);
       	 Get get1 = new Get(Bytes.toBytes(obj.getRowKey()));
       	 get1.addColumn(CF, CQ1);
       	 Result result1 = table.get(get1);
       	 assertEquals(Bytes.toString(result1.getRow()), obj.getRowKey());
       	 assertEquals(Bytes.toString(result1.value()), obj.getData1());
       	 Get get2 = new Get(Bytes.toBytes(obj.getRowKey()));
       	 get2.addColumn(CF, CQ2);
       	 Result result2 = table.get(get2);
       	 assertEquals(Bytes.toString(result2.getRow()), obj.getRowKey());
       	 assertEquals(Bytes.toString(result2.value()), obj.getData2());
    }
}                    
                

This code creates an HBase mini-cluster and starts it. Next, it creates a table called MyTest with one column family, CF. A record is inserted, a Get is performed from the same table, and the insertion is verified.

Note

Starting the mini-cluster takes about 20-30 seconds, but that should be appropriate for integration testing.

To use an HBase mini-cluster on Microsoft Windows, you need to use a Cygwin environment.

See the paper at HBase Case-Study: Using HBaseTestingUtility for Local Testing and Development (2010) for more information about HBaseTestingUtility.

18.11.3. Code Standards

See Section 18.3.1.1, “Code Formatting” and Section 18.12.5, “Common Patch Feedback”.

18.11.3.1. Interface Classifications

Interfaces are classified both by audience and by stability level. These labels appear at the head of a class. The conventions followed by HBase are inherited by its parent project, Hadoop.

The following interface classifications are commonly used:

@InterfaceAudience

@InterfaceAudience.Public

APIs for users and HBase applications. These APIs will be deprecated through major versions of HBase.

@InterfaceAudience.Private

APIs for HBase internals developers. No guarantees on compatibility or availability in future versions. Private interfaces do not need an @InterfaceStability classification.

@InterfaceAudience.LimitedPrivate(HBaseInterfaceAudience.COPROC)

APIs for HBase coprocessor writers. As of HBase 0.92/0.94/0.96/0.98 this api is still unstable. No guarantees on compatibility with future versions.

No @InterfaceAudience Classification

Packages without an @InterfaceAudience label are considered private. Mark your new packages if publicly accessible.

Excluding Non-Public Interfaces from API Documentation

Only interfaces classified @InterfaceAudience.Public should be included in API documentation (Javadoc). Committers must add new package excludes ExcludePackageNames section of the pom.xml for new packages which do not contain public classes.

@InterfaceStability

@InterfaceStability is important for packages marked @InterfaceAudience.Public.

@InterfaceStability.Stable

Public packages marked as stable cannot be changed without a deprecation path or a very good reason.

@InterfaceStability.Unstable

Public packages marked as unstable can be changed without a deprecation path.

@InterfaceStability.Evolving

Public packages marked as evolving may be changed, but it is discouraged.

No @InterfaceStability Label

Public classes with no @InterfaceStability label are discouraged, and should be considered implicitly unstable.

If you are unclear about how to mark packages, ask on the development list.

18.11.4. Invariants

We don't have many but what we have we list below. All are subject to challenge of course but until then, please hold to the rules of the road.

18.11.4.1. No permanent state in ZooKeeper

ZooKeeper state should transient (treat it like memory). If ZooKeeper state is deleted, hbase should be able to recover and essentially be in the same state.

Exceptions

  • There are currently a few exceptions that we need to fix around whether a table is enabled or disabled.

  • Replication data is currently stored only in ZooKeeper. Deleting ZooKeeper data related to replication may cause replication to be disabled. Do not delete the replication tree, /hbase/replication/.

    Warning

    Replication may be disrupted and data loss may occur if you delete the replication tree (/hbase/replication/) from ZooKeeper. Follow progress on this issue at HBASE-10295.

18.11.5. Running In-Situ

If you are developing Apache HBase, frequently it is useful to test your changes against a more-real cluster than what you find in unit tests. In this case, HBase can be run directly from the source in local-mode. All you need to do is run:

${HBASE_HOME}/bin/start-hbase.sh

This will spin up a full local-cluster, just as if you had packaged up HBase and installed it on your machine.

Keep in mind that you will need to have installed HBase into your local maven repository for the in-situ cluster to work properly. That is, you will need to run:

mvn clean install -DskipTests

to ensure that maven can find the correct classpath and dependencies. Generally, the above command is just a good thing to try running first, if maven is acting oddly.

18.11.6. Adding Metrics

After adding a new feature a developer might want to add metrics. HBase exposes metrics using the Hadoop Metrics 2 system, so adding a new metric involves exposing that metric to the hadoop system. Unfortunately the API of metrics2 changed from hadoop 1 to hadoop 2. In order to get around this a set of interfaces and implementations have to be loaded at runtime. To get an in-depth look at the reasoning and structure of these classes you can read the blog post located here. To add a metric to an existing MBean follow the short guide below:

18.11.6.1. Add Metric name and Function to Hadoop Compat Interface.

Inside of the source interface the corresponds to where the metrics are generated (eg MetricsMasterSource for things coming from HMaster) create new static strings for metric name and description. Then add a new method that will be called to add new reading.

18.11.6.2. Add the Implementation to Both Hadoop 1 and Hadoop 2 Compat modules.

Inside of the implementation of the source (eg. MetricsMasterSourceImpl in the above example) create a new histogram, counter, gauge, or stat in the init method. Then in the method that was added to the interface wire up the parameter passed in to the histogram.

Now add tests that make sure the data is correctly exported to the metrics 2 system. For this the MetricsAssertHelper is provided.

comments powered by Disqus