Package org.apache.hadoop.hbase.client.coprocessor

Provides client classes for invoking Coprocessor RPC protocols

Overview

The coprocessor framework provides a way for custom code to run in place on the HBase region servers with each of a table's regions. These client classes enable applications to communicate with coprocessor instances via custom RPC protocols.

In order to provide a custom RPC protocol to clients, a coprocessor implementation defines an interface that extends CoprocessorProtocol. The interface can define any methods that the coprocessor wishes to expose. Using this protocol, you can communicate with the coprocessor instances via the HTable.coprocessorProxy(Class, byte[]) and HTable.coprocessorExec(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call, org.apache.hadoop.hbase.client.coprocessor.Batch.Callback) methods.

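Because CoprocessorProtocol instances are associated with individual regions within the table, each client RPC call must ultimately identify which regions should be used in the CoprocessorProtocol method invocations. Regions are seldom handled directly in client code and region names may change over time, so the coprocessor RPC calls use row keys to identify the target regions. Clients can call CoprocessorProtocol methods against either a single region, by passing a single row key to HTable.coprocessorProxy(Class, byte[]), which uses the region containing that row key as the RPC endpoint, or a range of regions, by passing a start row key and an end row key to HTable.coprocessorExec(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call), which uses the regions from the one containing the start row key through the one containing the end row key.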

Note that the row keys passed as parameters to the HTable methods are not passed to the CoprocessorProtocol implementations. They are only used to identify the regions for endpoints of the remote calls.
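To make the row-key-to-region mapping concrete, the following is a minimal sketch of how a row key, or a pair of row keys, could select regions. It is a simplified model, not HBase's implementation: the class RegionLookupSketch and its methods are hypothetical, keys are modeled as String rather than byte[], the first region is assumed to start at the empty key, the range is assumed to include the region containing the end key, and a null key is assumed to mean "no bound".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, simplified model of row-key-to-region resolution.
final class RegionLookupSketch {
    // Regions keyed by their start row key; each region covers
    // [startKey, nextStartKey). Assumes the first region starts at "".
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Single region: the one whose key range contains rowKey.
    // The row itself does not need to exist.
    String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(rowKey).getValue();
    }

    // Range of regions: from the region containing startRow through the
    // region containing endRow. A null bound is treated as unbounded.
    List<String> regionsFor(String startRow, String endRow) {
        String first = startRow == null
                ? regionsByStartKey.firstKey()
                : regionsByStartKey.floorKey(startRow);
        String last = endRow == null
                ? regionsByStartKey.lastKey()
                : regionsByStartKey.floorKey(endRow);
        return new ArrayList<>(
                regionsByStartKey.subMap(first, true, last, true).values());
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");
        table.addRegion("g", "region-2");
        table.addRegion("p", "region-3");
        System.out.println(table.regionFor("hello"));
        System.out.println(table.regionsFor("a", "h"));
        System.out.println(table.regionsFor(null, null));
    }
}
```

In this model the row keys only pick the endpoints; nothing about them is handed to the code that later runs against each selected region.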

The Batch class defines two interfaces used for CoprocessorProtocol invocations against multiple regions. Clients implement Batch.Call to call methods of the actual CoprocessorProtocol instance. The interface's call() method will be called once per selected region, passing the CoprocessorProtocol instance for the region as a parameter. Clients can optionally implement Batch.Callback to be notified of the results from each region invocation as they complete. The instance's Batch.Callback.update(byte[], byte[], Object) method will be called with the Batch.Call.call(Object) return value from each region.
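The division of labor between the two interfaces can be sketched without any HBase dependency. The code below is an illustrative model only: the nested Call and Callback interfaces are hypothetical stand-ins for Batch.Call and Batch.Callback, the callback takes just a region name and a result (the real update method also receives a row key), and regions are visited sequentially in one thread, which may not match how the client actually schedules the calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model of the Batch.Call / Batch.Callback pattern.
final class BatchPatternSketch {
    // Stand-in for Batch.Call: runs against one region's protocol instance.
    interface Call<T, R> {
        R call(T instance);
    }

    // Stand-in for Batch.Callback: notified of each region's result.
    interface Callback<R> {
        void update(String region, R result);
    }

    // Invokes the call once per region, reports each result to the optional
    // callback, and returns all results keyed by region name.
    static <T, R> Map<String, R> execute(
            Map<String, T> instancesByRegion, Call<T, R> call, Callback<R> callback) {
        Map<String, R> results = new LinkedHashMap<>();
        for (Map.Entry<String, T> entry : instancesByRegion.entrySet()) {
            R result = call.call(entry.getValue());
            results.put(entry.getKey(), result);
            if (callback != null) {
                callback.update(entry.getKey(), result);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Each region's "instance" is modeled as {rowCount, keyValueCount}.
        Map<String, long[]> regions = new LinkedHashMap<>();
        regions.put("region-1", new long[] {10, 25});
        regions.put("region-2", new long[] {4, 9});
        Map<String, Long> rowCounts = BatchPatternSketch.<long[], Long>execute(
                regions,
                stats -> stats[0],
                (region, count) -> System.out.println(region + " -> " + count));
        System.out.println(rowCounts);
    }
}
```

The callback is useful when results should be processed as they arrive; when only the final map matters, passing no callback is enough.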

Example usage

To start with, let's use a fictitious coprocessor, RowCountCoprocessor, that counts the number of rows and key-values in each region where it is running. For clients to query this information, the coprocessor defines and implements the following CoprocessorProtocol extension interface:

public interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount();
  long getRowCount(Filter filt);
  long getKeyValueCount();
}

Now we need a way to access the results that RowCountCoprocessor is making available. If we want to find the row count for all regions, we could use:

HTable table = new HTable("mytable");
// find row count keyed by region name
Map<byte[],Long> results = table.coprocessorExec(
    RowCountProtocol.class, // the protocol interface we're invoking
    null, null,             // start and end row keys
    new Batch.Call<RowCountProtocol,Long>() {
        public Long call(RowCountProtocol counter) {
          return counter.getRowCount();
        }
    });

This will return a java.util.Map of the counter.getRowCount() result for the RowCountCoprocessor instance running in each region of mytable, keyed by the region name.
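Once the per-region map is available, a table-wide figure is a simple fold over its values. The sketch below is an assumption-laden illustration rather than HBase code: RegionResultsSketch, BYTES, and total are hypothetical names, and the map is built by hand instead of coming from coprocessorExec. It also shows one way to look entries up by region name, since byte[] keys compare by identity in hash-based maps and need a content-based comparator in a sorted map.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for working with results keyed by byte[] region names.
final class RegionResultsSketch {
    // Lexicographic comparison of byte arrays, treating bytes as unsigned,
    // so that two arrays with equal contents are treated as the same key.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Sums the per-region counts into a single total.
    static long total(Map<byte[], Long> perRegion) {
        long sum = 0;
        for (long count : perRegion.values()) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<byte[], Long> perRegion = new TreeMap<>(BYTES);
        perRegion.put("region-a".getBytes(StandardCharsets.UTF_8), 5L);
        perRegion.put("region-b".getBytes(StandardCharsets.UTF_8), 7L);
        System.out.println(total(perRegion));
    }
}
```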

By implementing Batch.Call as an anonymous class, we can invoke RowCountProtocol methods directly against the Batch.Call.call(Object) method's argument. Calling HTable.coprocessorExec(Class, byte[], byte[], org.apache.hadoop.hbase.client.coprocessor.Batch.Call) will take care of invoking Batch.Call.call() against our anonymous class with the RowCountCoprocessor instance for each table region.

For this simple case, where we only want to obtain the result from a single CoprocessorProtocol method, there's also a bit of syntactic sugar we can use to cut down on the amount of code required:

HTable table = new HTable("mytable");
Batch.Call<RowCountProtocol,Long> call = Batch.forMethod(RowCountProtocol.class, "getRowCount");
Map<byte[],Long> results = table.coprocessorExec(RowCountProtocol.class, null, null, call);

Batch.forMethod(Class, String, Object...) is a simple factory method that returns a Batch.Call instance, which calls RowCountProtocol.getRowCount() for us using reflection.
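For readers curious how a reflection-based factory of this kind can work, here is a minimal sketch. It is not the HBase implementation: ForMethodSketch, its Call interface, and the Counter protocol are hypothetical, argument types are taken naively from the runtime classes of the arguments (so methods with primitive or supertype parameters would not be found), and reflective failures are wrapped in unchecked exceptions for brevity.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of a reflection-based factory in the spirit of forMethod.
final class ForMethodSketch {
    // Stand-in for Batch.Call.
    interface Call<T, R> {
        R call(T instance);
    }

    // Hypothetical protocol used only for this illustration.
    interface Counter {
        long getRowCount();
    }

    // Looks the method up once on the protocol interface, then returns a Call
    // that invokes it reflectively on whatever instance it is given.
    static <T, R> Call<T, R> forMethod(Class<T> protocol, String methodName, Object... args) {
        Class<?>[] argTypes = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            argTypes[i] = args[i].getClass();
        }
        final Method method;
        try {
            method = protocol.getMethod(methodName, argTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such protocol method: " + methodName, e);
        }
        return instance -> {
            try {
                @SuppressWarnings("unchecked")
                R result = (R) method.invoke(instance, args);
                return result;
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Invocation failed: " + methodName, e);
            }
        };
    }

    public static void main(String[] args) {
        Call<Counter, Long> call = forMethod(Counter.class, "getRowCount");
        Counter counter = () -> 42L;
        System.out.println(call.call(counter));
    }
}
```

The trade-off is the usual one for reflection: less boilerplate at the call site, but a misspelled method name surfaces only at run time instead of at compile time.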

However, if you want to perform additional processing on the results, implementing Batch.Call directly will provide more power and flexibility. For example, if you would like to combine row count and key-value count for each region:

HTable table = new HTable("mytable");
// combine row count and kv count for region
Map<byte[],Pair<Long,Long>> results = table.coprocessorExec(
    RowCountProtocol.class,
    null, null,
    new Batch.Call<RowCountProtocol,Pair<Long,Long>>() {
        public Pair<Long,Long> call(RowCountProtocol counter) {
          return new Pair<Long,Long>(counter.getRowCount(), counter.getKeyValueCount());
        }
    });

Similarly, you could average the number of key-values per row for each region:

Map<byte[],Double> results = table.coprocessorExec(
    RowCountProtocol.class,
    null, null,
    new Batch.Call<RowCountProtocol,Double>() {
        public Double call(RowCountProtocol counter) {
          return ((double)counter.getKeyValueCount()) / ((double)counter.getRowCount());
        }
    });

Copyright © 2015 The Apache Software Foundation. All rights reserved.