2.2. HBase run modes: Standalone and Distributed

HBase has two run modes: Section 2.2.1, “Standalone HBase” and Section 2.2.2, “Distributed”. Out of the box, HBase runs in standalone mode. Whatever your mode, you will need to configure HBase by editing files in the HBase conf directory. At a minimum, you must edit conf/hbase-env.sh to tell HBase which Java to use. In this file you set HBase environment variables such as the heap size and other JVM options, the preferred location for log files, and so on. Set JAVA_HOME to point at the root of your Java install.
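As a minimal sketch of the settings mentioned above, a conf/hbase-env.sh might contain lines like the following. The paths and values are illustrative assumptions, not defaults; check the commented-out entries in your own hbase-env.sh for the variables and units your release supports.

```shell
# Sketch of conf/hbase-env.sh; all paths and values below are hypothetical.

# Root of the Java install that HBase should use.
export JAVA_HOME=/usr/lib/jvm/default-java

# Heap size for the HBase JVMs (illustrative value; units depend on your release).
export HBASE_HEAPSIZE=1000

# Extra options passed to the JVM.
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError"

# Preferred location for log files.
export HBASE_LOG_DIR=/var/log/hbase
```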

2.2.1. Standalone HBase

This is the default mode. Standalone mode is what Section 1.2, “Quick Start” describes. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients may talk to HBase.

2.2.2. Distributed

Distributed mode can be subdivided into pseudo-distributed, where all daemons run on a single node, and fully-distributed, where the daemons are spread across all nodes in the cluster [10].

Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the Hadoop Distributed File System (HDFS). Fully-distributed mode can ONLY run on HDFS. See the Hadoop requirements and instructions for how to set up HDFS.

Below we describe the different distributed setups. Starting, verifying and exploring your install, whether pseudo-distributed or fully-distributed, is described in a section that follows, Section 2.2.3, “Running and Confirming Your Installation”. The same verification script applies to both deploy types.

2.2.2.1. Pseudo-distributed

Pseudo-distributed mode is simply fully-distributed mode run on a single host. Use this configuration for testing and prototyping on HBase. Do not use it for production or for evaluating HBase performance.

First, if you want to run on HDFS rather than on the local filesystem, set up your HDFS. You can also set up HDFS in pseudo-distributed mode (TODO: Add pointer to HOWTO doc; the Hadoop site no longer has one). Ensure you have a working HDFS before proceeding.

Next, configure HBase. Edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides. At a minimum, you must tell HBase to run in (pseudo-)distributed mode rather than in the default standalone mode. To do this, set the hbase.cluster.distributed property to true (its default is false). The absolute bare-minimum hbase-site.xml is therefore as follows:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

With this configuration, HBase will start up an HBase Master process, a ZooKeeper server, and a RegionServer process, all running against the local filesystem. Data is written to a directory named hbase-YOUR_USER_NAME under wherever your operating system stores temporary files.

Such a setup, using the local filesystem and writing to the operating system's temporary directory, is ephemeral. The Hadoop local filesystem, which is what HBase uses when it writes to the local filesystem, does not support sync, so unless the system is shut down properly, data will be lost. Writing to the operating system's temporary directory can also cause data loss when the machine is restarted, as this directory is usually cleared on reboot. For a more permanent setup, see the next example, which uses an instance of HDFS; HBase data will be written to the Hadoop distributed filesystem rather than to the local filesystem's tmp directory.

In this conf/hbase-site.xml example, the hbase.rootdir property points to the local HDFS instance homed on the node h-24-30.example.com.

Let HBase create ${hbase.rootdir}

Let HBase create the hbase.rootdir directory. If you don't, you'll get a warning saying HBase needs a migration run because the directory is missing files expected by HBase (it will create them if you let it).

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://h-24-30.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>

Now skip to Section 2.2.3, “Running and Confirming Your Installation” for how to start and verify your pseudo-distributed install. [11]

2.2.2.1.1. Pseudo-distributed Extras
2.2.2.1.1.1. Startup

To start up the initial HBase cluster...

% bin/start-hbase.sh

To start up an extra backup master on the same server, run...

% bin/local-master-backup.sh start 1

... the '1' means use ports 16001 & 16011, and this backup master's logfile will be at logs/hbase-${USER}-1-master-${HOSTNAME}.log.

To start up multiple backup masters, run...

% bin/local-master-backup.sh start 2 3

You can start up to 9 backup masters (10 total).

To start up more regionservers...

% bin/local-regionservers.sh start 1

... where '1' means use ports 16201 & 16301, and its logfile will be at logs/hbase-${USER}-1-regionserver-${HOSTNAME}.log.

To add 4 more regionservers in addition to the one you just started, run...

% bin/local-regionservers.sh start 2 3 4 5

This supports up to 99 extra regionservers (100 total).
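The offset arithmetic behind these commands can be sketched as a small helper. The function name local_daemon_ports is hypothetical, and the base ports (16000/16010 for masters, 16200/16300 for regionservers) are inferred from the offset-1 examples in this section rather than read from the scripts themselves.

```shell
# Print the two ports an extra local daemon would use for a given offset.
# Base ports are inferred from the examples in this section (offset 1 gives
# 16001 & 16011 for a backup master and 16201 & 16301 for a regionserver).
local_daemon_ports() {
  role=$1
  offset=$2
  case $role in
    master)       base=16000; info_base=16010; max=9 ;;
    regionserver) base=16200; info_base=16300; max=99 ;;
    *) echo "unknown role: $role" >&2; return 1 ;;
  esac
  # Reject empty, non-numeric, and zero-prefixed offsets.
  case $offset in
    ''|*[!0-9]*|0*) echo "offset must be a positive number" >&2; return 1 ;;
  esac
  if [ "$offset" -gt "$max" ]; then
    echo "offset for $role must be between 1 and $max" >&2
    return 1
  fi
  echo "$((base + offset)) $((info_base + offset))"
}
```

For example, local_daemon_ports master 1 prints 16001 16011, and local_daemon_ports regionserver 5 prints 16205 16305.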

2.2.2.1.1.2. Stop

Assuming you want to stop backup master 1, run...

% cat /tmp/hbase-${USER}-1-master.pid |xargs kill -9

Note that bin/local-master-backup.sh stop 1 will try to stop the cluster along with the master.
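If you stop backup masters often, the PID-file command above could be wrapped in a small helper that checks the PID file exists before killing anything. This is only a sketch: the function name stop_backup_master and its optional PID-directory argument are hypothetical, and it uses the same forced kill as the command above.

```shell
# Stop one local backup master by offset, using the PID file named in the text.
# The second argument (PID directory) is a hypothetical parameter that defaults
# to /tmp, the location used in the command above.
stop_backup_master() {
  offset=$1
  pid_dir=${2:-/tmp}
  pid_file="$pid_dir/hbase-${USER:-unknown}-${offset}-master.pid"
  if [ ! -f "$pid_file" ]; then
    echo "no PID file at $pid_file" >&2
    return 1
  fi
  # Same forced kill as the original command.
  kill -9 "$(cat "$pid_file")"
}
```

Running stop_backup_master 1 is then equivalent to the cat and kill pipeline shown earlier, except that it reports a missing PID file instead of calling kill with no arguments.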

To stop an individual regionserver, run...

% bin/local-regionservers.sh stop 1

2.2.2.2. Fully-distributed

To run a fully-distributed operation on more than one host, make the following configuration changes. In hbase-site.xml, add the property hbase.cluster.distributed and set it to true, and point hbase.rootdir at the appropriate HDFS NameNode and the location in HDFS where you would like HBase to write data. For example, if your NameNode were running at namenode.example.org on port 8020 and you wanted to home your HBase in HDFS at /hbase, use the following configuration.

<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone mode, with all HBase and ZooKeeper daemons in one JVM
      true: distributed mode, either pseudo-distributed or fully-distributed
    </description>
  </property>
  ...
</configuration>

2.2.2.2.1. regionservers

In addition, fully-distributed mode requires that you modify conf/regionservers. The Section 2.4.1.2, “regionservers” file lists all hosts that should run HRegionServers, one host per line (this file in HBase is like the Hadoop slaves file). All servers listed in this file will be started and stopped when HBase cluster start or stop is run.
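A minimal sketch of producing such a file follows. The helper name write_regionservers and the host names in the usage line are hypothetical; the only thing taken from the text is the one-host-per-line layout.

```shell
# Write a regionservers file with one host per line.
# Usage: write_regionservers FILE HOST [HOST...]
write_regionservers() {
  if [ "$#" -lt 2 ]; then
    echo "usage: write_regionservers FILE HOST [HOST...]" >&2
    return 1
  fi
  file=$1
  shift
  # printf repeats the format for each remaining argument, giving one host per line.
  printf '%s\n' "$@" > "$file"
}
```

For example, write_regionservers conf/regionservers rs1.example.org rs2.example.org rs3.example.org produces a three-line file.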

2.2.2.2.2. ZooKeeper and HBase

See Chapter 17, ZooKeeper for how to set up ZooKeeper for HBase.

2.2.2.2.3. HDFS Client Configuration

Of note, if you have made HDFS client configuration changes on your Hadoop cluster (that is, configuration you want HDFS clients to use, as opposed to server-side configuration), HBase will not see them unless you do one of the following:

  • Add a pointer to your HADOOP_CONF_DIR to the HBASE_CLASSPATH environment variable in hbase-env.sh.

  • Add a copy of hdfs-site.xml (or hadoop-site.xml) or, better, symlinks to them, under ${HBASE_HOME}/conf.

  • If there is only a small set of HDFS client configurations, add them to hbase-site.xml.

An example of such an HDFS client configuration is dfs.replication. If, for example, you want to run with a replication factor of 5, HBase will create files with the default of 3 unless you do one of the above to make the configuration available to HBase.
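The symlink option can be sketched as follows. The helper name link_hdfs_site is hypothetical; it assumes you pass absolute paths to the Hadoop configuration directory and to the HBase conf directory, so that the link resolves from anywhere.

```shell
# Symlink Hadoop's hdfs-site.xml into the HBase conf directory so that HBase
# picks up HDFS client settings such as dfs.replication.
# Usage: link_hdfs_site HADOOP_CONF_DIR HBASE_CONF_DIR   (absolute paths)
link_hdfs_site() {
  hadoop_conf=$1
  hbase_conf=$2
  if [ ! -f "$hadoop_conf/hdfs-site.xml" ]; then
    echo "no hdfs-site.xml in $hadoop_conf" >&2
    return 1
  fi
  # -f replaces an existing link or copy in the HBase conf directory.
  ln -sf "$hadoop_conf/hdfs-site.xml" "$hbase_conf/hdfs-site.xml"
}
```

A typical call would be link_hdfs_site "$HADOOP_CONF_DIR" "$HBASE_HOME/conf". A symlink is preferred over a copy because later edits to the Hadoop file are seen by HBase without a second step.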

2.2.3. Running and Confirming Your Installation

Make sure HDFS is running first. Start the Hadoop HDFS daemons by running bin/start-dfs.sh in the HADOOP_HOME directory (bin/stop-dfs.sh stops them). You can ensure HDFS started properly by testing the put and get of files into the Hadoop filesystem. HBase does not normally use the MapReduce daemons, so these do not need to be started.

If you are managing your own ZooKeeper, start it and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process.
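One way to test the put and get of a file is a small round-trip check like the sketch below. It assumes the hadoop command is on your PATH and that your user can write under /tmp in HDFS; the function name hdfs_roundtrip_check and the remote file name are hypothetical.

```shell
# Put a small file into HDFS, get it back, and compare the two copies.
# Returns 0 if the round trip succeeded and the contents match.
hdfs_roundtrip_check() {
  work=$(mktemp -d) || return 1
  remote="/tmp/hbase-preflight-$$.txt"   # hypothetical HDFS path
  echo "hello hdfs" > "$work/in.txt"
  status=0
  hadoop fs -put "$work/in.txt" "$remote" &&
    hadoop fs -get "$remote" "$work/out.txt" &&
    cmp -s "$work/in.txt" "$work/out.txt" || status=1
  # Clean up the remote file and the local scratch directory.
  hadoop fs -rm "$remote" >/dev/null 2>&1 || true
  rm -rf "$work"
  return "$status"
}
```

Call hdfs_roundtrip_check before starting HBase and proceed only if it returns success.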

Start HBase with the following command:

bin/start-hbase.sh

Run the above from the HBASE_HOME directory.

You should now have a running HBase instance. HBase logs can be found in the logs subdirectory. Check them out especially if HBase had trouble starting.

HBase also puts up a UI listing vital attributes. By default it is deployed on the Master host at port 16010 (HBase RegionServers listen on port 16020 by default and put up an informational HTTP server at port 16030). If the Master were running on a host named master.example.org on the default port, you would point your browser at http://master.example.org:16010 to see the Master's homepage.

Prior to HBase 0.98, the Master UI was deployed on port 60010 by default, and the HBase RegionServers would listen on port 60020 by default and put up an informational HTTP server at port 60030.

Once HBase has started, see Section 1.2.3, “Shell Exercises” for how to create tables, add data, scan your insertions, and finally disable and drop your tables.

To stop HBase after exiting the HBase shell, enter:

$ ./bin/stop-hbase.sh
stopping hbase...............

Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.



[10] The pseudo-distributed vs fully-distributed nomenclature comes from Hadoop.

[11] See Section 2.2.2.1.1, “Pseudo-distributed Extras” for notes on how to start extra Masters and RegionServers when running pseudo-distributed.
