The Apache HBase™ Reference Guide

Revision History
Revision 2.0.0-SNAPSHOT 2014-10-22T16:15

Abstract

This is the official reference guide for Apache HBase™, a distributed, versioned, big data store built on top of Apache Hadoop™ and Apache ZooKeeper™.


Table of Contents

Preface
1. Getting Started
1.1. Introduction
1.2. Quick Start - Standalone HBase
2. Apache HBase Configuration
2.1. Basic Prerequisites
2.2. HBase run modes: Standalone and Distributed
2.3. Running and Confirming Your Installation
2.4. Configuration Files
2.5. Example Configurations
2.6. The Important Configurations
2.7. Dynamic Configuration
3. Upgrading
3.1. HBase version numbers
3.2. Upgrading from 0.98.x to 1.0.x
3.3. Upgrading from 0.96.x to 0.98.x
3.4. Upgrading from 0.94.x to 0.98.x
3.5. Upgrading from 0.94.x to 0.96.x
3.6. Upgrading from 0.92.x to 0.94.x
3.7. Upgrading from 0.90.x to 0.92.x
3.8. Upgrading to HBase 0.90.x from 0.20.x or 0.89.x
4. The Apache HBase Shell
4.1. Scripting with Ruby
4.2. Running the Shell in Non-Interactive Mode
4.3. HBase Shell in OS Scripts
4.4. Read HBase Shell Commands from a Command File
4.5. Passing VM Options to the Shell
4.6. Shell Tricks
5. Data Model
5.1. Conceptual View
5.2. Physical View
5.3. Namespace
5.4. Table
5.5. Row
5.6. Column Family
5.7. Cells
5.8. Data Model Operations
5.9. Versions
5.10. Sort Order
5.11. Column Metadata
5.12. Joins
5.13. ACID
6. HBase and Schema Design
6.1. Schema Creation
6.2. On the number of column families
6.3. Rowkey Design
6.4. Number of Versions
6.5. Supported Datatypes
6.6. Joins
6.7. Time To Live (TTL)
6.8. Keeping Deleted Cells
6.9. Secondary Indexes and Alternate Query Paths
6.10. Constraints
6.11. Schema Design Case Studies
6.12. Operational and Performance Configuration Options
7. HBase and MapReduce
7.1. HBase, MapReduce, and the CLASSPATH
7.2. MapReduce Scan Caching
7.3. Bundled HBase MapReduce Jobs
7.4. HBase as a MapReduce Job Data Source and Data Sink
7.5. Writing HFiles Directly During Bulk Import
7.6. RowCounter Example
7.7. Map-Task Splitting
7.8. HBase MapReduce Examples
7.9. Accessing Other HBase Tables in a MapReduce Job
7.10. Speculative Execution
8. Secure Apache HBase
8.1. Secure Client Access to Apache HBase
8.2. Simple User Access to Apache HBase
8.3. Securing Access To Your Data
8.4. Security Configuration Example
9. Architecture
9.1. Overview
9.2. Catalog Tables
9.3. Client
9.4. Client Request Filters
9.5. Master
9.6. RegionServer
9.7. Regions
9.8. Bulk Loading
9.9. HDFS
9.10. Timeline-consistent High Available Reads
10. Apache HBase APIs
11. Apache HBase External APIs
11.1. Non-Java Languages Talking to the JVM
11.2. REST
11.3. Thrift
11.4. C/C++ Apache HBase Client
12. Thrift API and Filter Language
12.1. Filter Language
13. Apache HBase Coprocessors
13.1. Coprocessor Framework
13.2. Examples
13.3. Building A Coprocessor
13.4. Check the Status of a Coprocessor
13.5. Monitor Time Spent in Coprocessors
13.6. Status of Coprocessors in HBase
14. Apache HBase Performance Tuning
14.1. Operating System
14.2. Network
14.3. Java
14.4. HBase Configurations
14.5. ZooKeeper
14.6. Schema Design
14.7. HBase General Patterns
14.8. Writing to HBase
14.9. Reading from HBase
14.10. Deleting from HBase
14.11. HDFS
14.12. Amazon EC2
14.13. Collocating HBase and MapReduce
14.14. Case Studies
15. Troubleshooting and Debugging Apache HBase
15.1. General Guidelines
15.2. Logs
15.3. Resources
15.4. Tools
15.5. Client
15.6. MapReduce
15.7. NameNode
15.8. Network
15.9. RegionServer
15.10. Master
15.11. ZooKeeper
15.12. Amazon EC2
15.13. HBase and Hadoop version issues
15.14. IPC Configuration Conflicts with Hadoop
15.15. HBase and HDFS
15.16. Running unit or integration tests
15.17. Case Studies
15.18. Cryptographic Features
15.19. Operating System Specific Issues
15.20. JDK Issues
16. Apache HBase Case Studies
16.1. Overview
16.2. Schema Design
16.3. Performance/Troubleshooting
17. Apache HBase Operational Management
17.1. HBase Tools and Utilities
17.2. Region Management
17.3. Node Management
17.4. HBase Metrics
17.5. HBase Monitoring
17.6. Cluster Replication
17.7. HBase Backup
17.8. HBase Snapshots
17.9. Capacity Planning and Region Sizing
17.10. Table Rename
18. Building and Developing Apache HBase
18.1. Getting Involved
18.2. Apache HBase Repositories
18.3. IDEs
18.4. Building Apache HBase
18.5. Releasing Apache HBase
18.6. Voting on Release Candidates
18.7. Generating the HBase Reference Guide
18.8. Updating hbase.apache.org
18.9. Tests
18.10. Developer Guidelines
19. Unit Testing HBase Applications
19.1. JUnit
19.2. Mockito
19.3. MRUnit
19.4. Integration Testing with a HBase Mini-Cluster
20. ZooKeeper
20.1. Using existing ZooKeeper ensemble
20.2. SASL Authentication with ZooKeeper
21. Community
21.1. Decisions
21.2. Community Roles
21.3. Commit Message format
A. Contributing to Documentation
A.1. Getting Access to the Wiki
A.2. Contributing to Documentation or Other Strings
A.3. Editing the HBase Website
A.4. Editing the HBase Reference Guide
A.5. Auto-Generated Content
A.6. Multi-Page and Single-Page Output
A.7. Images in the HBase Reference Guide
A.8. Adding a New Chapter to the HBase Reference Guide
A.9. Docbook Common Issues
B. FAQ
C. hbck In Depth
C.1. Running hbck to identify inconsistencies
C.2. Inconsistencies
C.3. Localized repairs
C.4. Region Overlap Repairs
D. Access Control Matrix
E. Compression and Data Block Encoding In HBase
E.1. Which Compressor or Data Block Encoder To Use
E.2. Making use of Hadoop Native Libraries in HBase
E.3. Compressor Configuration, Installation, and Use
E.4. Enable Data Block Encoding
F. SQL over HBase
F.1. Apache Phoenix
F.2. Trafodion
G. YCSB: The Yahoo! Cloud Serving Benchmark and HBase
H. HFile format
H.1. HBase File Format (version 1)
H.2. HBase file format with inline blocks (version 2)
H.3. HBase File Format with Security Enhancements (version 3)
I. Other Information About HBase
I.1. HBase Videos
I.2. HBase Presentations (Slides)
I.3. HBase Papers
I.4. HBase Sites
I.5. HBase Books
I.6. Hadoop Books
J. HBase History
K. HBase and the Apache Software Foundation
K.1. ASF Development Process
K.2. ASF Board Reporting
L. Apache HBase Orca
M. Enabling Dapper-like Tracing in HBase
M.1. SpanReceivers
M.2. Client Modifications
M.3. Tracing from HBase Shell
N. 0.95 RPC Specification
N.1. Goals
N.2. TODO
N.3. RPC
N.4. Notes
Index

List of Figures

9.1. Region State Transitions
9.2. HFile Version 1
13.1. Coprocessor Metrics UI
17.1. Basic Info
17.2. Config
17.3. Stats
17.4. L1 and L2
17.5. Replication Architecture Overview
E.1. ColumnFamily with No Encoding
E.2. ColumnFamily with Prefix Encoding
E.3. ColumnFamily with Diff Encoding
H.1. HFile V1 Format

List of Tables

1.1. Distributed Cluster Demo Architecture
2.1. Java
2.2. Hadoop version support matrix
5.1. Table webtable
5.2. ColumnFamily anchor
5.3. ColumnFamily contents
8.1. Operation To Permission Mapping
8.2. Examples of Visibility Expressions
9.1. Stripe Sizing Settings
18.1. Release Managers
D.1. ACL Matrix

List of Examples

1.1. Example /etc/hosts File for Ubuntu
1.2. Example hbase-site.xml for Standalone HBase
1.3. node-a jps Output
1.4. node-b jps Output
1.5. node-c jps Output
2.1. Calculate the Potential Number of Open Files
2.2. Example Distributed HBase Cluster
4.1. Passing Commands to the HBase Shell
4.2. Checking the Result of a Scripted Command
4.3. Example Command File
4.4. Directing HBase Shell to Execute the Commands
5.1. Examples
5.2. Examples
5.3. Modify the Maximum Number of Versions for a Column
5.4. Modify the Minimum Number of Versions for a Column
6.1. Salting Example
6.2. Hashing Example
6.3. Change the Value of KEEP_DELETED_CELLS Using HBase Shell
6.4. Change the Value of KEEP_DELETED_CELLS Using the API
8.1. HBase Shell
8.2. API
8.3. Revoking Access To a Table
8.4. HBase Shell
8.5. API
8.6. HBase Shell
8.7. Java API
8.8. HBase Shell
8.9. Java API
8.10. HBase Shell
8.11. Java API
8.12. HBase Shell
8.13. Java API
8.14. HBase Shell
8.15. Java API
8.16. Example Security Settings in hbase-site.xml
8.17. Example Group Mapper in Hadoop core-site.xml
9.1. Pre-Creating a HConnection
10.1. Create a Table Using Java
10.2. Add, Modify, and Delete a Table
12.1. Compound Operators
12.2. Precedence Example
12.3. Example 1
12.4. Example 2
12.5. Example 3
12.6. Example 4
13.1. Example RegionObserver Configuration
13.2. Load a Coprocessor On a Table Using HBase Shell
13.3. Unload a Coprocessor From a Table Using HBase Shell
14.1. Enable Prefetch Using HBase Shell
14.2. Enable Prefetch Using the API
14.3. Hedged Reads Configuration Example
17.1. rolling-restart.sh General Usage
18.1. Code Blocks in Jira Comments
18.2. Example ~/.m2/settings.xml File
18.3. Example of Committing a Patch
B.1. Maven Dependency for HBase 0.98
B.2. Maven Dependency for HBase 0.96
B.3. Maven Dependency for HBase 0.94
E.1. Enabling Compression on a ColumnFamily of an Existing Table using HBase Shell
E.2. Creating a New Table with Compression On a ColumnFamily
E.3. Verifying a ColumnFamily's Compression Settings
E.4. LoadTestTool Usage
E.5. Example Usage of LoadTestTool
E.6. Enable Data Block Encoding On a Table
E.7. Verifying a ColumnFamily's Data Block Encoding