HBase has two run modes: Section 2.2.1, “Standalone HBase” and Section 2.2.2, “Distributed”. Out of the box, HBase runs in standalone mode. Whatever your mode, you will need to configure HBase by editing files in the HBase conf directory. At a minimum, you must edit conf/hbase-env.sh to tell HBase which java to use. In this file you set HBase environment variables such as the heap size and other options for the JVM and the preferred location for log files. Set JAVA_HOME to point at the root of your Java installation.
This is the default mode. Standalone mode is what is described in Section 1.2, “Quick Start”. In standalone mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM. ZooKeeper binds to a well-known port so clients may talk to HBase.
Distributed mode can be subdivided into pseudo-distributed, where all daemons run on a single node, and fully-distributed, where the daemons are spread across all nodes in the cluster.
Pseudo-distributed mode can run against the local filesystem or it can run against an instance of the Hadoop Distributed File System (HDFS). Fully-distributed mode can ONLY run on HDFS. See the Hadoop requirements and instructions for how to set up HDFS.
Below we describe the different distributed setups. Starting, verifying and exploring your install, whether a pseudo-distributed or fully-distributed configuration, is described in a section that follows, Section 2.2.3, “Running and Confirming Your Installation”. The same verification script applies to both deploy types.
A pseudo-distributed mode is simply a fully-distributed mode run on a single host. Use this configuration for testing and prototyping on HBase. Do not use this configuration for production or for evaluating HBase performance.
First, if you want to run on HDFS rather than on the local filesystem, set up your HDFS. You can also set up HDFS in pseudo-distributed mode. Ensure you have a working HDFS before proceeding.
Next, configure HBase. Edit conf/hbase-site.xml. This is the file into which you add local customizations and overrides. At a minimum, you must tell HBase to run in (pseudo-)distributed mode rather than in the default standalone mode. To do this, set the hbase.cluster.distributed property to true (its default is false). The absolute bare-minimum hbase-site.xml is therefore as follows:
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
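If you generate this file from a script rather than editing it by hand, the bare-minimum configuration above could be produced along the following lines. This is only a sketch: the helper name render_hbase_site is hypothetical and is not part of HBase.

```python
from xml.sax.saxutils import escape


def render_hbase_site(properties):
    """Return hbase-site.xml text for a dict of property name -> value.

    Hypothetical helper for illustration; HBase itself only reads the file.
    """
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The single property needed to leave the default standalone mode.
    print(render_hbase_site({"hbase.cluster.distributed": "true"}))
```

Writing the returned text to conf/hbase-site.xml is left to the caller so the sketch never overwrites an existing configuration.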
With this configuration, HBase will start up an HBase Master process, a ZooKeeper server,
and a RegionServer process, all running against the local filesystem and writing into
a directory under wherever your operating system stores temporary files.
Such a setup, using the local filesystem and writing to the operating system's temporary directory, is ephemeral. The Hadoop local filesystem, which is what HBase uses when it is writing to the local filesystem, does not support sync, so unless the system is shut down properly, data will be lost. Writing to the operating system's temporary directory can also cause data loss when the machine is restarted, as this directory is usually cleared on reboot. For a more permanent setup, see the next example, where we make use of an instance of HDFS; HBase data will be written to the Hadoop distributed filesystem rather than to the local filesystem's tmp directory.
In this conf/hbase-site.xml example, the
hbase.rootdir property points to the local HDFS instance
homed on the node h-24-30.sfo.stumble.net.
Let HBase create the hbase.rootdir
directory. If you don't, you'll get a warning saying HBase needs a
migration run because the directory is missing files expected by
HBase (it'll create them if you let it).
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://h-24-30.sfo.stumble.net:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
Now skip to Section 2.2.3, “Running and Confirming Your Installation” for how to start and verify your pseudo-distributed install. 
To start up the initial HBase cluster...
To start an extra backup master on the same server, run...
% bin/local-master-backup.sh start 1
... the '1' means use ports 60001 & 60011, and this backup master's logfile will be at
To start multiple backup masters, run...
% bin/local-master-backup.sh start 2 3
You can start up to 9 backup masters (10 total).
To start up more regionservers...
% bin/local-regionservers.sh start 1
where '1' means use ports 60201 & 60301 and its logfile will be at
To add 4 more regionservers in addition to the one you just started, run...
% bin/local-regionservers.sh start 2 3 4 5
This supports up to 99 extra regionservers (100 total).
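The port arithmetic implied by the examples above can be sketched as follows. The base ports and the rule that the argument is simply added to each base port are assumptions inferred from the two examples given ('1' giving 60001 & 60011 for a backup master, and 60201 & 60301 for a regionserver). The function names are hypothetical.

```python
# Base ports inferred from the examples in the text (an assumption):
# offset 1 -> 60001/60011 for masters and 60201/60301 for regionservers.
MASTER_BASE_PORTS = (60000, 60010)
REGIONSERVER_BASE_PORTS = (60200, 60300)


def _offset_ports(base_ports, offset, max_offset):
    """Add the offset to each base port, enforcing the documented limit."""
    if not 1 <= offset <= max_offset:
        raise ValueError(f"offset must be between 1 and {max_offset}, got {offset}")
    return tuple(port + offset for port in base_ports)


def backup_master_ports(offset):
    """Ports for the backup master started with the given offset (1 to 9)."""
    return _offset_ports(MASTER_BASE_PORTS, offset, 9)


def extra_regionserver_ports(offset):
    """Ports for the extra regionserver started with the given offset (1 to 99)."""
    return _offset_ports(REGIONSERVER_BASE_PORTS, offset, 99)


if __name__ == "__main__":
    for n in (2, 3):
        print("backup master", n, backup_master_ports(n))
    for n in (2, 3, 4, 5):
        print("regionserver", n, extra_regionserver_ports(n))
```

A table like this is handy for checking that the ports you are about to use are free before running the local-master-backup.sh and local-regionservers.sh scripts.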
For running a fully-distributed operation on more than one
host, make the following configurations. In
hbase-site.xml, add the property
hbase.cluster.distributed and set it to
true, and point the HBase
hbase.rootdir at the appropriate HDFS NameNode
and the location in HDFS where you would like HBase to write data. For
example, if your NameNode were running at namenode.example.org on
port 8020 and you wanted to home your HBase in HDFS at
/hbase, make the following configuration:
<configuration>
  ...
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:8020/hbase</value>
    <description>The directory shared by RegionServers.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  ...
</configuration>
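To make the pieces of the hbase.rootdir value explicit, the following sketch splits it into the NameNode host, the port, and the location in HDFS, using only the Python standard library. The function name split_rootdir is hypothetical.

```python
from urllib.parse import urlsplit


def split_rootdir(rootdir):
    """Split an hbase.rootdir URI into its scheme, host, port and path."""
    parts = urlsplit(rootdir)
    return {
        "scheme": parts.scheme,   # "hdfs" when the root directory is in HDFS
        "host": parts.hostname,   # the NameNode host
        "port": parts.port,       # the NameNode port
        "path": parts.path,       # where HBase writes its data in HDFS
    }


if __name__ == "__main__":
    print(split_rootdir("hdfs://namenode.example.org:8020/hbase"))
```

A check such as split_rootdir(value)["scheme"] == "hdfs" is one simple way to catch a fully-distributed configuration that accidentally points at the local filesystem.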
In addition, a fully-distributed mode requires a file that
lists all hosts that you would have running
HRegionServers, one host per line (this
file plays the same role in HBase as the host list
file does in Hadoop). All servers listed in this file will be started and stopped
when HBase cluster start or stop is run.
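As an illustration of the one-host-per-line format, here is a minimal sketch that reads such a host list. It assumes the list lives in a file such as conf/regionservers and that blank lines should be ignored. The function name and the host names in the example are hypothetical.

```python
def parse_host_list(text):
    """Return the hosts in a one-host-per-line list, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical host names, one per line as described above.
    example = "rs1.example.org\nrs2.example.org\n\nrs3.example.org\n"
    for host in parse_host_list(example):
        print(host)
```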
See Chapter 17, ZooKeeper for ZooKeeper setup for HBase.
Of note, if you have made HDFS client configuration on your Hadoop cluster -- i.e. configuration you want HDFS clients to use as opposed to server-side configurations -- HBase will not see this configuration unless you do one of the following:
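Add a pointer to your Hadoop configuration directory to the
HBASE_CLASSPATH environment variable, or
add a copy of the HDFS client configuration file (or
hadoop-site.xml) or, better, symlinks, to the HBase conf directory, or
if only a small set of HDFS client configurations, add them to hbase-site.xml.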
An example of such an HDFS client configuration is
dfs.replication. If, for example, you want to
run with a replication factor of 5, HBase will create files with
the default of 3 unless you do the above to make the configuration
available to HBase.
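When only one or two client settings are involved, one possible approach is to place them directly in hbase-site.xml. The sketch below assumes that approach and sets dfs.replication to 5 in an existing configuration. The helper name set_property is hypothetical, and the standard-library parser used here drops any XML comments from the original text.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Return site_xml with the named property set, adding it if absent."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry with this name: append a new <property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    site = (
        "<configuration><property>"
        "<name>hbase.cluster.distributed</name><value>true</value>"
        "</property></configuration>"
    )
    print(set_property(site, "dfs.replication", "5"))
```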
Make sure HDFS is running first. Start the Hadoop HDFS
daemons by running
bin/start-dfs.sh in the
HADOOP_HOME directory. You can ensure it started
properly by testing the put and
get of files into the Hadoop filesystem. HBase does
not normally use the MapReduce daemons, so these do not need to be started.
If you are managing your own ZooKeeper, start it and confirm it is running; otherwise, HBase will start up ZooKeeper for you as part of its start process.
Start HBase with the following command:
bin/start-hbase.sh
Run the above from the HBase installation directory.
You should now have a running HBase instance. HBase logs can be
found in the
logs subdirectory. Check them out
especially if HBase had trouble starting.
HBase also puts up a UI listing vital attributes. By default it is
deployed on the Master host at port 60010 (HBase RegionServers listen
on port 60020 by default and put up an informational HTTP server at
60030). If the Master were running on a host named
master.example.org on the default port, to see the
Master's homepage you'd point your browser at http://master.example.org:60010.
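If you would rather check the UI from a script than from a browser, something like the following sketch could be used. The default ports are the ones given above. The function names are hypothetical, and the reachability check only confirms that something answers HTTP on that port, not that HBase is healthy.

```python
import urllib.error
import urllib.request

MASTER_UI_PORT = 60010        # default Master UI port, per the text above
REGIONSERVER_UI_PORT = 60030  # default RegionServer informational HTTP port


def web_ui_url(host, port=MASTER_UI_PORT):
    """Build the URL of the web UI on the given host and port."""
    return f"http://{host}:{port}/"


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP server answers at url within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except OSError:
        return False  # connection refused, DNS failure, timeout, ...


if __name__ == "__main__":
    url = web_ui_url("master.example.org")
    print(url, "reachable" if is_reachable(url) else "not reachable")
```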
Once HBase has started, see Section 1.2.3, “Shell Exercises” for how to create tables, add data, scan your insertions, and finally disable and drop your tables.
To stop HBase after exiting the HBase shell, enter
$ ./bin/stop-hbase.sh
stopping hbase...............
Shutdown can take a moment to complete. It can take longer if your cluster is comprised of many machines. If you are running a distributed operation, be sure to wait until HBase has shut down completely before stopping the Hadoop daemons.