About

HBaseCon (founded in 2012) is the premier conference for the Apache HBase community—including committers/contributors, developers, operators, learners, and users (including some of those managing the largest deployments in the world). If you run Apache HBase in production or aspire to do so, HBaseCon has no substitute!

Apache HBase is a native distributed data store for the Apache Hadoop ecosystem. Its community works independently within the ASF to provide HBase software under the permissive Apache license.

San Francisco | May 24, 2016

The Village
969 Market St.
San Francisco, CA 94103

Attendees have exclusive access to a 15% discount on 3-day developer training for HBase, May 25-27 in San Francisco!

Note also: Attendees are invited to attend an HBase meetup on HBaseCon eve (May 23) hosted by Splice Machine, and "PhoenixCon" on May 25, hosted by Salesforce.

Program Committee

All paper proposals are evaluated and selected by a diverse cross-section of the HBase community (thanks, PC!):

Sean Busbey, Software Engineer, Cloudera / Apache HBase PMC
Elliott Clark, Engineer, Facebook / Apache HBase PMC
Lars Hofhansl, Architect, Salesforce.com / Apache HBase PMC
Matthew Hunt, Head of Open Source R&D, Bloomberg LP / Apache HBase Contributor
Francis Liu, Software Engineer, Yahoo! / Apache HBase Contributor
Carter Page, Senior Engineering Manager, Google Bigtable Team
Andrew Purtell, Architect, Salesforce.com / Apache HBase PMC Chair
Enis Söztutar, Member of Technical Staff, Hortonworks / Apache HBase PMC
Michael Stack, Software Engineer, Cloudera / Apache HBase PMC


Time Development & Internals Operations Applications
7:30am-6:30pm Registration Open
7:30am-8:30pm Exhibits Open
7:30am-8:50am Breakfast

Pre-conference Session: Apache HBase - Just the Basics
Jesse Anderson (Smoking Hand)

This early-morning session offers an overview of what HBase is, how it works, its API, and considerations for using HBase as part of a Big Data solution. It will be helpful for people who are new to HBase, and also serve as a refresher for those who may need one.

9am-10:40am Opening General Session

Keynote: Welcome Message/State of Apache HBase
Apache HBase PMC

An update about achievements by the community since HBaseCon 2015, and what's in the works.

Keynote: The Road to Apache HBase
Cesar Delgado (Apple)

The story of HBase at Apple.

Keynote: Apache HBase at Yahoo! Scale
Francis Liu (Yahoo!)

Yahoo has long been involved in HBase and its community. In 2013, HBase was offered as a hosted service at Yahoo. Since then, adoption has grown rapidly, and today HBase is used by numerous teams across the company, helping to enable a diverse set of use cases ranging from near real-time processing to data warehousing.

This was made possible thanks to HBase along with some enhancements to support multi-tenancy and scale. As our clusters continue to grow and use cases become more demanding, we are working towards supporting a million regions in a single cluster.

In this keynote, we’ll paint a picture of where Yahoo! is today, the enhancements we have made to reach today’s scale, and our work toward supporting a million regions and beyond.

Keynote: Facebook's Return to (Real) Open Source 
Elliott Clark (Facebook)

Facebook internally has a long history with Apache Hadoop and HBase. A while ago Facebook engineers essentially forked HBase and created the 0.98-fb branch. While its source was out in the open, it wasn't easy to take any code from 0.98-fb and apply the wins to HBase releases. The inverse was also very true: As time went on, the two branches drifted apart and became very different. For the past year, Facebook has been working to rejoin the Apache world. And now, it has succeeded: Our production clusters are running all open source code. I will discuss the motivations for returning, what the major differences were, and what went right/wrong on that journey. Finally, I will set out some aspirations for what we would like to see as focus areas in the coming year.

10:40am-11am Break

Apache HBase Improvements and Practices at Xiaomi
Duo Zhang and Liangliang He (Xiaomi)

In this session, we’ll discuss the various practices around HBase in use at Xiaomi, including those relating to HA, tiered compaction, multi-tenancy, and failover across data centers.

Argus Production Monitoring at Salesforce
Tom Valine and Bhinav Sura (Salesforce)

We’ll present details about Argus, a time-series monitoring and alerting platform developed at Salesforce to provide insight into the health of infrastructure as an alternative to systems such as Graphite and Seyren.

The Inevitability of Bigtable
Michael O’Reilly (Google)

Why is Bigtable the way it is? This session will be a walk-through of designing a storage system from scratch, exploring why physics pushes us toward designs that look a lot like Bigtable.


Apache HBase, Accelerated: In-Memory Flush and Compaction
Eshcar Hillel and Anastasia Braginsky (Yahoo!)

Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert/update/delete the same keys, such as for high-speed queuing and e-commerce.

Apache HBase Replication at Scale
Ashu Pachauri (Facebook)

Disaster readiness and high data availability form essential components in any production cluster, and HBase clusters at Facebook are no different in this regard. This session presents how HBase data replication forms a key component in ensuring that we achieve these objectives.

Apache HBase in the Enterprise Data Hub at Cerner
Swarnim Kulkarni (Cerner)

Cerner has been an active consumer of HBase for a very long time, storing petabytes of healthcare data in its multiple isolated HBase clusters. This talk will walk through the design of Cerner's enterprise data hub, with a focus on the multi-tenant HBase-as-a-service offering within the hub.

12:30pm-1:30pm Lunch

Tales from Taming the Long Tail
Deepankar Reddy and Ishan Chhabra (Rocket Fuel)

Rocket Fuel is a marketing technology company that participates in 120+ billion real-time bidding auctions daily to show the right ad to the right user at the right time for our clients. In this talk, we discuss our efforts to systematically identify the causes of long-tail read latencies and to reduce them.

Update on OpenTSDB and AsyncHBase
Chris Larsen (Yahoo!)

This year we'll talk about the joys of the HBase Fuzzy Row Filter, new TSDB filters, expression support, Graphite functions and running OpenTSDB on top of Google’s hosted Bigtable. AsyncHBase now includes per-RPC timeouts, append support, Kerberos auth, and a beta implementation in Go.

Apache HBase at Airbnb
Jingwei Lu and Jason Zhang (Airbnb)

AirStream is a realtime stream computation framework built on top of Spark Streaming and HBase that allows our engineers and data scientists to easily leverage HBase to get real-time insights and build real-time feedback loops. In this talk, we will introduce AirStream, and then go over a few production use cases.


Improvements to Apache HBase and Its Applications in Alibaba Search
Yu Li and Shaoxuan Wang (Alibaba)

HBase is the core storage system in Alibaba’s Search Infrastructure. In this session, we will talk about the details of how we use HBase to serve such high-throughput, low-latency, mixed workloads and the various improvements we made to HBase to meet these challenges.

Apache HBase Security at Scale
Gary Helmling (Facebook)

Building on top of Kerberos authentication, HBase security brings with it new operational burdens and failure modes, as well as new requirements in provisioning and configuration. In this talk, we will describe how we rolled out HBase security within Facebook, and some of the challenges we faced along the way.

Rolling Out Apache HBase for Mobile Offerings at Visa
Partha Saha and CW Chung (Visa)

Visa has embarked on an ambitious multi-year redesign of its entire data platform that powers its business. As part of this plan, the Apache Hadoop ecosystem, including HBase, will now become a staple in many of its solutions. Here, we will describe our journey in rolling out a high-availability NoSQL solution based on HBase behind some of our prominent mobile offerings.


Off-heaping the Apache HBase Read Path
Anoop Sam John and Ramkrishna Vasudevan (Intel)

HBase provides an LRU-based on-heap cache, but its size (and so the total data size that can be cached) is limited by Java’s max heap space. This talk highlights our work under HBASE-11425 to allow the HBase read path to work directly from the off-heap area.

Containerizing Apache HBase Clusters
David Pope and Javier Maestro (Facebook)

At Facebook, all production HBase clusters run in a containerized environment, with every daemon running inside its own LXC container. Containerization allows us to ensure isolation between services running on the same host and simplify operations, but sometimes abstractions leak and problems can't be addressed inside the container. In this talk, we will discuss how Facebook runs HBase as a stateful service inside containers.

Time-Series Apache HBase (20 mins.) / Date-tiered Compaction Policy for Time-series Data (20 mins.)
Vladimir Rodionov (Hortonworks) / Clara Xiong (Flurry/Yahoo!)

Time-series applications (sensor data, application/system logging events, user interactions, etc.) present a new set of data storage challenges: very high velocity and very high volume of data. This talk will present the recent developments in Apache HBase that make it a good fit for time-series applications.

With petabytes of data on thousands of nodes replicated across multiple data centers, growing at an accelerating rate, we have been running a workload at scale with a bottleneck of IO bandwidth. This talk covers a new compaction policy to improve efficiency for time-range scans of various look-back windows by structuring and maintaining a date-tiered store file layout for time-series data with infrequent updates and deletes.

3:50pm-4:10pm Break

Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight
Nitin Verma, Pravin Mittal, and Maxim Lukiyanov (Microsoft)

This session presents our success story of enabling a big internal customer on Microsoft Azure’s HBase service along with the methodology and tools used to meet high-throughput goals. We will also present how new features in HBase (like BucketCache and MultiWAL) are helping our customers in the medium-latency/high-bandwidth cloud-storage scenario.

Solving Multi-tenancy and G1GC in Apache HBase
Graham Baecher & Patrick Dignan (HubSpot)

At HubSpot, all HBase clusters run with G1GC and are highly multi-tenant, powering hundreds of unique APIs, Hadoop jobs, daemons, and crons. This two-part talk will cover challenges and solutions involving HBase multi-tenancy and G1GC tuning at HubSpot, including an overview of our request-by-request monitoring and analysis tools and how we identify/address G1 settings and behaviors that might be causing performance or stability problems.

Apache Kylin’s Performance Boost from Apache HBase (20 mins.) / In Search of Database Nirvana: Challenges of Delivering HTAP (20 mins.)
Hongbin Ma and Luke Han (Kyligence) / Rohit Jain (Esgyn)

Part 1:
Apache Kylin is an open source distributed analytics engine that provides a SQL interface and multi-dimensional analysis on Hadoop, supporting extremely large datasets. In the forthcoming Kylin release, we optimized query performance by exploring the potential of parallel storage on top of HBase. This talk explains how that work was done.

Part 2:
Customers are looking for one database engine to address all their varied needs--from transactional to analytical workloads--against structured, semi-structured, and unstructured data. (Gartner’s term Hybrid Transactional/Analytical Processing, or HTAP, perhaps comes closest to describing this nirvana.) But can it be achieved? The motivation of this talk is to establish a framework for assessing the maturity and capabilities of query engines on Apache Hadoop ecosystem storage engines such as HBase in meeting these diverse needs.


Apache Spark on Apache HBase: Current and Future
Ted Malaska (Cloudera), Jean-Marc Spaggiari (Cloudera), Zhan Zhang (Hortonworks)

The integration of Spark and HBase is becoming more popular in online data analytics. In this session, we briefly walk through the current offering of the HBase-Spark module in HBase at an abstract level and for RDD and DataFrames (digging into some real-world implementations and code examples), and then discuss future work.

BigBucket Cache, Texas Edition (20 mins.) / Breaking the Sound Barrier with Persistent Memory (20 mins.)
Viplava Madasu (HPE) and Michael Stack (Cloudera) / Liqi Yi and Shylaja Kokoori (Intel)

Part 1:
HBase read performance is important for HPE and Cloudera customers. As such, to fully take advantage of hardware capabilities, HBase BucketCache needs to perform and scale well. This talk covers adventures in BucketCache internals on the way to reaching 4 million ops (YCSB Workload C) in a 4U rack space.

Part 2:
A fully optimized HBase cluster could easily hit the limit of the underlying storage device’s capability, which is beyond the reach of software optimization alone. To get around this constraint, we need a new design that brings data processing and data storage closer together. In this presentation, we will look at how persistent memory will change the way large datasets are stored. We will review the hardware characteristics of 3D XPoint™, a new persistent memory technology with low latency and high capacity. We will also discuss opportunities for further improvement within the HBase framework using persistent memory.

Apache Phoenix: Use Cases and New Features
James Taylor (Salesforce) and Maryann Xue (Intel)

This talk will be broken into two parts: Phoenix use cases and new Phoenix features. Three use cases will be presented as lightning talks by individuals from 1) Sony about its social media NewsSuite app, 2) eHarmony on its matching service, and 3) Salesforce.com on its time-series metrics engine. Two new features will be discussed in detail by the engineers who developed them: ACID transactions in Phoenix through Apache Tephra, and cost-based query optimization through Apache Calcite. The focus will be on helping end users more easily develop scalable applications on top of Phoenix.

5:50pm-6:30pm Closing General Session

Keynote: The Future of Apache HBase (Panel)
Moderated by Lars Hofhansl (Salesforce), with Matteo Bertozzi (Cloudera), John Leach (Splice Machine), Maxim Lukiyanov (Microsoft), Matt Mullins (Facebook), and Carter Page (Google)

The future of HBase, via a variety of viewpoints.

6:30pm-8:30pm HBaseCon Party!


  • Anastasia Braginsky


    Anastasia is a Research Scientist at Yahoo!. She works on scalable big data and search platforms. Most recently, she focused on HBase scalability features. She received her PhD in distributed computing from Technion CS in 2015. Prior to Yahoo, she held technical positions at IBM and Intel.

  • Andrew Purtell


    Andrew is the VP/PMC Chair of Apache HBase, and an Architect at Salesforce.com working on cloud storage. Previously, Andrew worked at Intel, Trend Micro, Sparta, and McAfee.

  • Anoop Sam John


    Anoop is part of Intel’s Big Data platform team, and is an Apache HBase committer and PMC member.

  • Ashu Pachauri


    Ashu Pachauri is a software engineer at Facebook.

  • Bhinav Sura


    Bhinav completed his Master's degree in CS from the University of Illinois a couple of years ago and now works at Salesforce as a developer on Argus, where he spends most of his time implementing platform services that can scale to handle at least 1 billion events per minute.

  • Carter Page


    Carter is an Engineer and Manager on the Bigtable development team at Google in New York City. For the last 19 years, Carter has worked on high-performance distributed software across several industries, including media, finance, and education.

  • Cesar Delgado


    Cesar is a platform architect at Apple working on Siri. He has also worked on iTunes, iCloud, News and Maps. He has been involved in the Apache Hadoop community since 2008.

  • Chris Larsen


    Chris is a Software Engineer at Yahoo! working on the monitoring team to store and process time-series data at a massive scale. He coordinates development on OpenTSDB and AsyncHBase with a great community of users and contributors. Previously, he helped publish OpenTSDB 2.0 while working at Limelight Networks.

  • Clara Xiong


    Clara is a Senior Software Engineer on Flurry's Platform Team, which builds platform services for mobile data analytics applications, primarily working on Apache HBase and data processing/streaming pipelines. Previously she worked at Microsoft in various areas, including cloud storage and SQL Server scalability.

  • CW Chung


    CW has been using and building Apache Hadoop to solve Big Data problems since 2008. He worked in the Hadoop Engineering Team at Yahoo! before joining Visa. As one of the earliest Hadoop engineers at Visa, he has built Hadoop-based platforms and apps there and promotes Hadoop culture and technology internally. CW got his Master of Engineering from Cornell University, and an MBA from the Haas School of Business at UC Berkeley.

  • Daniel Pol

    Hewlett-Packard Enterprise

    Daniel Pol works on Apache HBase performance in the Big Data R&D team at Hewlett-Packard Enterprise.

  • David Pope


    David is a Production Engineer at Facebook on the Apache HBase team.

  • Duo Zhang


    Duo is an Apache HBase Committer, and a Software Engineer at Xiaomi working on storage systems like HBase and HDFS.

  • Elliott Clark


    Elliott is an Engineer at Facebook on the Apache HBase team. He's also an HBase committer and PMC member.

  • Enis Söztutar


    Enis is a Member of the Technical Staff at Hortonworks, a committer and PMC member on Apache HBase, and a member of the ASF. He has been using and developing Apache Hadoop ecosystem projects since 2007.

  • Eshcar Hillel


    Eshcar is a Research Scientist at Yahoo! working on scalable big data and search platforms with a focus on HBase scalability. Prior to Yahoo!, she held a technical position at HP Labs. She received her PhD in distributed computing from Technion CS in 2011.

  • Francis Liu


    Francis is a Software Engineer at Yahoo!, working mainly on Apache HBase. He is also an Apache Hive contributor. Prior to that, he was involved in the development of a workflow management and incremental processing platform built on top of Apache Hadoop.

  • Gary Helmling


    Gary Helmling is a Software Engineer at Facebook, and an Apache HBase committer and PMC member.

  • Graham Baecher


    Graham is a member of the Data Infrastructure team at HubSpot, helping build, scale and tune backend systems and datastores. His recent work there includes G1GC analysis and tuning for HubSpot's several Apache HBase clusters.

  • Hongbin Ma


    Hongbin is a technical partner at a startup called Kyligence that focuses on open source big data solutions. After receiving his master’s degree at Shanghai Jiaotong University in April 2014, he worked at Microsoft Research Asia on a graph database called Trinity. During his first job at eBay, he became the No. 1 committer to Apache Kylin on GitHub. His focus on Kylin includes the storage engine, query optimization, test coverage, connectivity, etc.

  • Ishan Chhabra

    Rocket Fuel

    Ishan is a Technical Lead at Rocket Fuel, with a focus on building the next generation of real-time storage and processing systems to enable key business use cases. Prior to Rocket Fuel, he worked at Bell Labs to enable privacy in large-scale recommendation systems using a truly distributed middleware. Ishan holds a bachelor's degree in Computer Science and Engineering.

  • James Taylor


    James is an architect at Salesforce in the Data Platform and Services Cloud. He leads the Apache Phoenix project, an OLTP and operational analytics database on top of HBase, and is a PMC member of Apache Calcite and the Apache Incubator. Prior to working at Salesforce, James worked at BEA Systems on projects such as federated query processing systems and event driven programming platforms and has worked at various other start-ups in the computer industry over the past 20 years.

  • Jason Zhang


    Jason is a Software Engineer on the Data Infrastructure team at Airbnb. Before Airbnb, he worked in the Distributed Data System group at LinkedIn. Jason has been an Apache Helix PMC member since 2012.

  • Javier Maestro


    Javier is a Production Engineer at Facebook on the Apache HBase team.

  • Jean-Marc Spaggiari


    Jean-Marc is a Senior Solution Architect at Cloudera with many years of experience, specializing in Apache HBase solutions. An active HBase contributor, Jean-Marc has contributed more than 50 patches to the community and participates in all release testing.

  • Jesse Anderson

    Smoking Hand

    Jesse is a Data Engineer, Creative Engineer, and CEO of Smoking Hand. He trains at companies ranging from startups to Fortune 100 companies on cutting edge technology like Apache Kafka, Apache Hadoop, and Apache Spark. He has taught thousands of students the skills to become Data Engineers. He has been covered in prestigious publications such as The Wall Street Journal, CNN, BBC, NPR, Engadget, and Wired.

  • Jingwei Lu


    Jingwei is on the Data Infrastructure team at Airbnb. He was previously a tech lead on Facebook's data infrastructure team, in charge of query processing and language for the Bumblebee project (a Hive/Hadoop replacement). Prior to Facebook, he redesigned the SCOPE (Microsoft equivalent of Hive) runtime at Microsoft.

  • John Leach

    Splice Machine

    With over 15 years of software experience under his belt, John’s expertise in analytics and BI drives his role as CTO. Prior to Splice Machine, John founded Incite Retail and led the company’s strategy and development efforts. Prior to Incite Retail, he ran the business intelligence practice at Blue Martini Software and built strategic partnerships with integration partners.

  • Lars Hofhansl


    Lars is an Apache HBase committer and PMC member. He is an Architect at Salesforce.com, where he leads HBase development efforts, recently focusing on performance, backup, and disaster recovery. In the past, Lars held engineering roles at Peoplesoft and Digital Equipment Corp.

  • Liangliang He


    Liangliang is a Software Engineer on Xiaomi's storage infrastructure team. He focuses on development and support of Apache HBase and the cloud storage services that back various large-scale Xiaomi online services.

  • Liqi Yi


    Liqi is a Senior Java Performance Engineer in Intel’s Software Solution Group. He has extensive experience with Apache HBase performance optimization, Java Garbage Collection tuning, and hardware platform characterization.

  • Liyin Tang


    Liyin is a Software Engineer on the Data Infrastructure team at Airbnb, and an Apache HBase committer and PMC member. Before Airbnb, he worked at Facebook and Dropbox. He currently focuses on building highly available and reliable storage services that can scale in the face of exponential data growth. He holds a master's degree in computer science from USC.

  • Luke Han


    Luke is Co-Founder and CEO at Kyligence, and the co-creator and VP of Apache Kylin. Prior to Kyligence, he was Big Data Product Lead at eBay managing Kylin, engaging customers, and coordinating various teams from different geographies. Prior to eBay, Luke was chief consultant at Actuate China.

  • Maryann Xue


    Maryann is a Software Engineer in the Big Data Technologies team at Intel. She is a PMC member of the Apache Phoenix project and a committer on the Apache Calcite project. Before shifting focus to open source projects, she worked on Intel's Distribution of Hadoop as a technical leader of the HBase team.

  • Matt Mullins


    Matt Mullins is a Production Engineer for the HBase team at Facebook.

  • Matteo Bertozzi


    Matteo is a Software Engineer at Cloudera, and an HBase committer/PMC member.

  • Maxim Lukiyanov


    Maxim is a Program Manager on the HDInsight team at Microsoft. He is responsible for the HBase cluster type, focusing primarily on optimizing HBase for cloud environments.

  • Michael O'Reilly


    Michael spent 20 years in the ISP industry in various forms. Eight years ago, he escaped to Google SRE and has immensely enjoyed making full use of the SI prefixes he used to make jokes about in school. He is the Director of Google SRE in Sydney, aka "The Czar of Crazy".

  • Michael Stack


    Michael is a Software Engineer at Cloudera. He is a PMC member on the HBase, Hadoop, and Arrow projects.

  • Nitin Verma


    Nitin is a Senior Software Engineer in the HDInsight group at Microsoft. He worked as a database kernel and storage developer for nearly 12 years on Microsoft SQL Server and Sybase ASE.

  • Partha Saha


    Partha Saha received a Ph.D. in Physics from MIT in 1997, where he built a laser interferometer to study gravitational waves, and since then has held technical positions at Amazon Web Services, Yahoo!, and Microsoft. At Visa, he is involved in the redesign of the data platform behind data products.

  • Patrick Dignan


    Patrick is a member of the Data Infrastructure team at HubSpot, helping build, scale and tune backend systems and datastores. His recent work there focuses on Apache HBase multi-tenancy reliability and performance.

  • Pravin Mittal


    Pravin is a Principal Development Manager in the HDInsight group at Microsoft.

  • Ramkrishna Vasudevan


    Ramkrishna is part of Intel’s Big Data platform team, and is an Apache HBase committer and PMC member.

  • Rohit Jain


    Rohit is the CTO at Esgyn working on Apache Trafodion, currently in incubation. Trafodion is a transactional SQL-on-HBase RDBMS. Rohit worked for Tandem, Compaq, and Hewlett-Packard for the last 28 of his 39 years in application and database development. His experience spans OLTP and analytic processing on distributed, massively parallel systems.

  • Shaoxuan Wang


    Shaoxuan is a Senior Manager in Alibaba's Search Infrastructure division. Prior to Alibaba, he was a senior software engineer working on social graph and core infrastructure at Facebook.

  • Shylaja Kokoori


    Shylaja is a Software Engineer in the Software and Services Group at Intel. She has 10+ years of experience working in areas like Java Virtual Machine, driver development, content security, and tools and automation. Her recent work focuses on non-volatile memory programming and enabling for Big Data frameworks. She holds a Master’s degree in Bioinformatics and one in Computing Studies from Arizona State University.

  • Swarnim Kulkarni


    Swarnim is a Lead Architect with the Big Data team at Cerner Corp. At Cerner, his team is focused on the design and development of infrastructure for ingestion of healthcare data in the cloud using Apache Hadoop technologies. He is also a contributor to the Apache Hive project with a focus on the Hive/HBase integration of the project.

  • Ted Malaska


    Ted is a Solutions Architect at Cloudera, a contributor to Apache Spark and Apache HBase, and a co-author of the O’Reilly book Hadoop Application Architectures.

  • Tom Valine


    Tom is a Silicon Valley veteran with a passion for producing technology that has a real impact on both business and individuals. Having had the good fortune to work at some of the most recognizable names in the industry, including IBM, NVIDIA, Transmeta, Sun Microsystems, and Atmel, Tom currently is Director of Infrastructure Engineering for the Diagnostics, Visibility and Analytics group at Salesforce.

  • Venkata Deepankar Reddy

    Rocket Fuel

    Venkata Deepankar is a Software Engineer at Rocket Fuel, where he builds large-scale data and serving applications. Prior to Rocket Fuel, he interned at Google and INRIA. Venkata Deepankar holds a bachelor's degree in Computer Science from IIT Bombay with a specialization in statistics.

  • Vladimir Rodionov


    Vladimir is a Senior Member of Technical Staff at Hortonworks, and an Apache HBase contributor. He holds a master’s degree in Applied Math and Physics from the Moscow Institute of Physics and Technology.

  • Yu Li


    Yu is a Technical Expert in Alibaba’s search department, with more than 5 years of experience in distributed storage and systems. He is an active contributor to Apache HBase.

  • Zhan Zhang


    Zhan is a Member of Technical Staff at Hortonworks, where he works on the Apache Hadoop ecosystem and Apache Spark. He received his BS/MS degree from Fudan University of China and a Ph.D. in Computer & Information Science & Engineering from the University of Florida. His research interests include distributed systems and large-scale machine learning platforms, with more than 10 papers published in top journals/conferences.


HBaseCon is the best opportunity in the world (literally) to access the entire Apache HBase community under one roof.

Download a prospectus here; email us at hbasecon@cloudera.com for info.

Archives

2016 - Presentations & Recordings



Session Title

General Session (Morning)    

Michael Stack, Enis Söztutar

Welcome Message/State of Apache HBase (Slides | Recording) / Apache HBase at Yahoo! Scale (Slides)

Intro to HBase

Jesse Anderson

HBase: Just the Basics (Slides | Recording)


Tom Valine and Bhinav Sura (Salesforce)

Argus Production Monitoring at Salesforce (Slides | Recording)


Chris Larsen (Yahoo!)

Update on OpenTSDB and AsyncHBase (Slides | Recording)


Graham Baecher & Patrick Dignan (HubSpot)

Solving Multi-tenancy and G1GC in Apache HBase (Slides | Recording)


Viplava Madasu (HPE) and Michael Stack (Cloudera) / Liqi Yi and Shylaja Kokoori (Intel)

BigBucket Cache, Texas Edition (Slides) / Breaking the Sound Barrier with Persistent Memory (Slides) (Recording)

Dev & Internals   

Duo Zhang and Liangliang He (Xiaomi)

Apache HBase Improvements and Practices at Xiaomi (Slides | Recording)


Anastasia Braginsky (Yahoo!)

Apache HBase: In-Memory Flush and Compaction (Slides | Recording)


Deepankar Reddy and Ishan Chhabra (Rocket Fuel)

Tales from Taming the Long Tail (Slides | Recording)


Yu Li and Shaoxuan Wang (Alibaba)

Apache HBase in Alibaba Search (Slides | Recording)


Anoop Sam John and Ramkrishna Vasudevan (Intel)

Off-heaping the Apache HBase Read Path (Slides | Recording)


Nitin Verma, Pravin Mittal, and Maxim Lukiyanov (Microsoft)    

Optimizing Apache HBase for Cloud Storage in Microsoft Azure HDInsight (Slides | Recording)


Ted Malaska (Cloudera), Jean-Marc Spaggiari (Cloudera), Zhan Zhang (Hortonworks)

Apache Spark on Apache HBase: Current and Future (Slides | Recording)


Swarnim Kulkarni (Cerner)

Apache HBase in the Enterprise Data Hub at Cerner (Slides | Recording)


Jingwei Lu and Jason Zhang (Airbnb)

Apache HBase at Airbnb (Slides | Recording)


Partha Saha and CW Chung (Visa)

Rolling Out Apache HBase for Mobile Offerings at Visa (Slides | Recording)


Vladimir Rodionov (Hortonworks) / Clara Xiong (Flurry/Yahoo!)

Time-Series Apache HBase (Slides) / Date-tiered Compaction Policy for Time-series Data (Slides) (Recording)


Hongbin Ma and Luke Han (Kyligence) / Rohit Jain (Esgyn)

Apache Kylin’s Performance Boost from Apache HBase (Slides) / In Search of Database Nirvana: Challenges of Delivering HTAP (Slides) (Recording)


James Taylor (Salesforce) and Maryann Xue (Intel)

Apache Phoenix: Use Cases and New Features (Slides | Recording)

General Session (Afternoon)

Lars Hofhansl (Salesforce), Matteo Bertozzi (Cloudera), John Leach (Splice Machine), Maxim Lukiyanov (Microsoft), Matt Mullins (Facebook), and Carter Page (Google)

Panel: Future of Apache HBase (Slides | Recording)

2015 - Presentations & Recordings



Session Title

General Session    

Andrew Purtell, Michael Stack, Enis Söztutar, Carter Page, Raghavendra Prabhu, Xun Liu, Matthew Hunt, Sudarshan Kadambi

Welcome Messages / State of HBase (Slides) / Bigtable at Google / Zen @ Pinterest (Slides) / HBase @ Bloomberg (Slides) (Recording)

Intro to HBase

Jesse Anderson

HBase: Just the Basics (Slides | Recording)


Rahul Gidwani, Ian Friedman (Yahoo!)

HBase Operations in a Flurry (Slides | Recording)


Jeremy Carroll, Tian-Ying Chang (Pinterest)

HBase at Scale in an Online and High-Demand Environment (Slides | Recording)


Chris Larsen (Yahoo!), Benoit Sigoure (Arista Networks)

OpenTSDB and AsyncHBase Update (Slides | Recording)


Francis Liu, Vandana Ayyalasomayajula, Virag Kothari (Yahoo!)

Multitenancy in HBase: Learnings from Yahoo! (Slides | Recording)


Clay Baenziger (Bloomberg), Jeremy Carroll (Pinterest), Elliott Clark (Facebook), Dave Coyle (Dropbox), Max Luebbe (Google), Joey Parsons (Flipboard)

Smooth Operators Panel (Recording)


Shaohui Liu, Jianwei Cui (Xiaomi)

HBase Operations at Xiaomi (Slides | Recording)


Cosmin Lehene (Adobe)

Elastic HBase on Mesos (Slides | Recording)


Nitin Aggarwal, Ishan Chhabra (Rocket Fuel)

DeathStar: Easy, Dynamic, Multi-tenant HBase via YARN (Slides | Recording)

Dev & Internals   

Enis Söztutar (Hortonworks), Solomon Duskis (Google)

Meet HBase 1.0 (Slides | Recording)


Matteo Bertozzi (Cloudera), Sean Busbey (Cloudera), Jingcheng Du (Intel), Lars Hofhansl (Salesforce), Jon Hsieh (Cloudera), Enis Söztutar (Hortonworks), Jimmy Xiang (Cloudera)

HBase 2.0 and Beyond: Panel (Slides | Recording)


Lars Hofhansl (Salesforce)

HBase Performance Tuning @ Salesforce (Slides | Recording)


Abraham Elmahrek (Cloudera), Colin McCabe (Cloudera)

Solving HBase Performance Problems with Apache HTrace (Slides | Recording)


Ted Malaska (Cloudera)

HBase and Spark (Slides | Recording)


David Mackenzie (Box)    

Events @ Box: Using HBase as a Message Queue (Slides | Recording)


Misty Stanley-Jones (Cloudera)

State of HBase Docs and How to Contribute (Slides | Recording)


Gary Helmling (Cask Data)

Reusable Data Access Patterns with CDAP Datasets (Slides | Recording)


Eric Kaczmarek (Intel), Liqi Yi (Intel)

Taming GC Pauses for Large Java Heap in HBase (Slides | Recording)


Ido Karavany (Intel)

HBase as an IoT Stream Analytics Platform for Parkinson's Disease Research (Slides | Recording)


James Taylor (Salesforce), Maryann Xue (Intel)

Apache Phoenix: The Evolution of a Relational Database Layer over HBase (Slides | Recording)


Swarnim Kulkarni (Cerner), Brock Noland (StreamSets), Nick Dimiduk (Hortonworks)

Analyzing HBase Data with Apache Hive (Slides | Recording)


Seshu Adunuthula (eBay)

Apache Kylin: Extreme OLAP Engine for Hadoop (Slides | Recording)


Maxim Lukiyanov (Microsoft), Ashit Gosalia (Microsoft)

Optimizing HBase for the Cloud in Microsoft Azure HDInsight (Slides | Recording)


Jimmy Lin (University of Maryland)

Warcbase: Scaling 'Out' and 'Down' HBase for Web Archiving (Slides | Recording)


Anoop Sharma (HP), Rohit Jain (HP)

Trafodion: Integrating Operational SQL into HBase (Slides | Recording)


Julian Hyde (Hortonworks), Rohit Jain (HP), Dr. Ricardo Jimenez-Peris (LeanXScale), John Leach (Splice Machine), Jacques Nadeau (MapR), James Taylor (Salesforce)

SQL-on-HBase Smackdown: Panel (Recording)

Use Cases

Sang Chi, Jason Culverhouse, Matt Blair (Flipboard)

HBase @ Flipboard (Slides | Recording)


Aaron Carreras (FINRA)     

Graph Processing of Stock Market Order Flow in HBase on AWS (Slides | Recording)


Andrey Gusev (Sift Science)

Running ML Infrastructure on HBase (Slides | Recording)


Shyam Nath (GE), Arnab Guin (GE)

Industrial Internet Case Study using HBase and TSDB (Slides | Recording)


Doyung Yoon (DaumKakao), Taejin Chin (DaumKakao)

S2Graph: A Large-scale Graph Database with HBase (Slides | Recording)


Toshihiro Suzuki (CyberAgent), Hirotaka Kakishima (CyberAgent)

HBase @ CyberAgent (Slides | Recording)


Ishan Chhabra (Rocket Fuel), Nitin Aggarwal (Rocket Fuel), Venkata Deepankar Duvvuru (Rocket Fuel)

Blackbird Collections: In-situ Stream Processing in HBase (Slides | Recording)


Alan Steckley (Salesforce), Poorna Chandra (Cask Data)

NRT Event Processing with Guaranteed Delivery of HTTP Callbacks (Slides | Recording)

2014 - Presentations & Recordings



Session Title

General Session    

Michael Stack, Amr Awadallah, Carter Page, Liyin Tang, Lars Hofhansl

Welcome Messages / Bigtable at Google / HBase @ Salesforce.com (Recording)

Intro to HBase

Jesse Anderson

HBase: Just the Basics (Slides | Recording)


Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)

Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity (Slides | Recording)


Jean-Daniel Cryans (Cloudera)

The State of HBase Replication (Slides | Recording)


Bryan Beaudreault (HubSpot)

Real-time HBase: Lessons from the Cloud (Slides | Recording)


Kevin O'Dell, Aleksandr Shulman & Kathleen Ting (Cloudera)

Tales from the Cloudera Field (Slides | Recording)


Jesse Yates (Salesforce.com), Demai Ni, Richard Ding & Jing Chen He (IBM)

HBase Backups (Slides | Recording)


Shreeganesh Ramanan and Mike Davis (Optimizely)

From MongoDB to HBase in Six Easy Months (Slides | Recording)


Moderated by Eric Sammer (Scaling Data)

"Smooth Operators" Panel: Jeremy Carroll (Pinterest), Adam Frank (Flurry), and Paul Tuckfield (Facebook) (Recording)

Features & Internals   

Enis Söztutar and Devaraj Das (Hortonworks)

HBase Read High Availability Using Timeline-Consistent Region Replicas (Slides | Recording)


Andrew Purtell and Ramkrishna Vasudevan (Intel)

New Security Features in Apache HBase 0.98: An Operator's Guide (Slides | Recording)


Eric Chang (Opower) and Jean-Daniel Cryans (Cloudera)

Bulk Loading in the Wild: Ingesting the World's Energy Data (Slides | Recording)


Liang Xie and Honghua Feng (Xiaomi)

HBase at Xiaomi (Slides | Recording)


Nick Dimiduk (Hortonworks) and Nicolas Liochon (Scaled Risk)

HBase: Where Online Meets Low Latency (Slides | Recording)


Lars Hofhansl, Andrew Purtell, Enis Söztutar, Michael Stack, and Liyin Tang    

Meet the Release Managers (Slides | Recording)


Vladimir Rodionov (bigbase.org)

HBase: Extreme Makeover (Slides | Recording)


Eli Levine, James Taylor (Salesforce.com) & Maryann Xue (Intel)

Taming HBase with Apache Phoenix and SQL (Slides | Recording)


Pete Matern and Jonathan Colt (Jive Software)

Tasmo: Building HBase Applications From Event Streams (Slides | Recording)


Jingcheng Du and Ramkrishna Vasudevan (Intel)

Cross-Site BigTable using HBase (Slides | Recording)


Jonathan Natkins (WibiData)

Design Patterns for Building 360-degree Views with HBase and Kiji (Slides | Recording)


Adam Warrington (Cloudera)

HBase Data Modeling and Access Patterns with Kite SDK (Slides | Recording)


Chris Larsen (Limelight Networks) and Benoit Sigoure (Arista Networks)

OpenTSDB 2.0 (Slides | Recording)


Manukranth Kolloju (Facebook)

Presto + HBase: A Distributed SQL Query Execution Engine on Top of HBase (No slides or recording)

Case Studies

Eric Czech and Alec Zopf (Next Big Sound)

Data Evolution in HBase (Slides | Recording)


Ishan Chhabra, Shrijeet Paliwal & Abhijit Pol (Rocket Fuel)     

Blackbird: Storing Billions of Rows a Couple of Milliseconds Away (Slides | Recording)


Daniel Nelson (Nielsen)

Content Identification using HBase (Slides | Recording)


Ron Buckley (OCLC)

Digital Library Collection Management using HBase (Slides | Recording)


Sudarshan Kadambi and Matthew Hunt (Bloomberg LP)

HBase at Bloomberg (Slides | Recording)


Francis Liu (Yahoo!)

HBase Design Patterns @ Yahoo! (Slides | Recording)


Varun Sharma (Pinterest)

Large-scale Web Apps @ Pinterest (Slides | Recording)


Chris Huang and Scott Miao (Trend Micro)

A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase (Slides | Recording)


Lars George and Jon Hsieh (Cloudera)

A Survey of HBase Application Archetypes (Slides | Recording)

2013 - Presentations & Recordings



Session Title

General Session    

Michael Stack

Welcome (Recording)


Amr Awadallah

The Apache HBase Community: Best Ever and Getting Better (Recording)


Michael Stack & Lars Hofhansl    

State of the Apache HBase Union (Recording)


Aaron Kimball

The Apache HBase Ecosystem (Recording)


Liyin Tang

Overview of Apache HBase at Facebook


Amitanand Aiyer

Reliability: More 9's for Apache HBase


Jeremy Carroll

Apache HBase Operations at Pinterest (Slides | Recording)


Jonathan Creasy & Geoff Anderson

OpenTSDB at Box (Slides | Recording)


JD Cryans & Kevin O'Dell

Apache HBase, Meet Ops. Ops, Meet HBase (Slides | Recording)


Matt Kennedy & Torben Mathiasen

Apache HBase on Flash (Slides | Recording)


Benoit Sigoure

Scalable Network Designs for Apache HBase (Slides | Recording)


Moderated by Eric Sammer

Panel: Jeremy Carroll (Pinterest), Rajiv Chittajallu (Yahoo!), Dave Latham (Flurry), Alex Levchuk (Facebook) (Recording)


Devaraj Das & Nicolas Liochon

How to Get the MTTR Below 1 Minute and More (Slides | Recording)


Jonathan Hsieh, Matteo Bertozzi & Jesse Yates

Apache HBase Table Snapshots (Slides | Recording)


Sergey Shelukhin

Compaction Improvements in Apache HBase (Slides | Recording)


Enis Söztutar

Apache HBase and HDFS: Understanding Filesystem Usage in HBase (Slides | Recording)


Chris Trezzo

Apache HBase Replication (Slides | Recording)


Ian Varley

1500 JIRAs in 20 Minutes (Slides | Recording)


John Weatherford

A Developer's Guide to Coprocessors (Slides | Recording)


Moderated by Todd Lipcon

Panel: Nick Dimiduk (Hortonworks), Jonathan Gray (Continuuity), Lars Hofhansl (Salesforce.com), Andrew Purtell (Intel) (Recording)


Dibyendu Bhattacharya

Using Coprocessors to Index Columns in an Elasticsearch Cluster (Slides | Recording)


Dan Burkert

Honeycomb: MySQL Backed by Apache HBase (Slides | Recording)


Gokhan Capan

Apache HBase for Dealing with Large Matrices (Slides | Recording)


Elliott Clark

Impala: Using SQL to Extract Value from Apache HBase (Recording)


Elliott Clark

Using Metrics to Monitor and Debug Apache HBase (Slides | Recording)


Lars George & Andrew Wang

Project Valta – A Resource Management Layer over Apache HBase (Slides | Recording)


Jacques Nadeau

Apache Drill – A Community Driven Initiative to Deliver ANSI SQL Capabilities for HBase (Slides | Recording)


Jonathan Natkins & Juliet Hougland

Real-Time Model Scoring in Recommender Systems (Slides | Recording)


Andreas Neumann & Alex Baranau

High-Throughput, Transactional Stream Processing on Apache HBase (Slides | Recording)


Steven Noels

HBase SEP: Reliable Maintenance of Auxiliary Index Structures (Slides | Recording)


Hari Shreedharan

Streaming Data into Apache HBase Using Flume (Slides | Recording)


Enis Söztutar & Ashutosh Chauhan

Integration of Apache Hive and Apache HBase (Slides | Recording)


James Taylor

How (and why) Phoenix Puts the SQL Back into NoSQL (Slides | Recording)


Maryann Xue

Full-text Indexing for Apache HBase (Slides | Recording)

Case Studies

Swati Agarwal & Raj Stanneru

Near Real Time Indexing for eBay Search (Slides | Recording)


Murtaza Doctor & Giang Nguyen    

Real-time User Segmentation using Apache HBase: Architectural Case Study (Slides | Recording)


Neil Ferguson

Mixing Low Latency with Analytical Workloads for Customer Experience Management (Slides | Recording)


Ameya Kanitkar

Deal Personalization Engine with HBase (Slides | Recording)


Manoj Khanwalkar & Govind Asawa

ETL for Apache HBase (Slides | Recording)


Doug Meil

Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond (Slides | Recording)


Jeremy Pollack

Apache HBase, Apache Hadoop, DNA and YOU! (Slides | Recording)


Robert Roland

Rebuilding for Scale on Apache HBase (Slides | Recording)


Varun Sharma

Apache HBase at Pinterest: Scaling Our Feed Storage (Slides | Recording)


Francis Liu & Sumeet Singh

Multi-tenant Apache HBase at Yahoo! (Recording)


Suman Srinivasan

Apache Hadoop and Apache HBase for Real-Time Video Analytics (Slides | Recording)


Jay Talreja

Being Smarter than the Smart Meter – Cloud Operational Grid Analytics (Slides | Recording)

2012 - Presentations & Recordings



Session Title

General Session    

Michael Stack & Mike Olson    



Amr Awadallah

The Apache HBase Community: Best Ever and Getting Better


Karthik Ranganathan

HBase at Facebook


Ryan Thiessen

Case Study of HBase Operations at Facebook


Sunil Sitaula & Madhuwanti Vaidya

HBase Backup


Lars George

HBase Coprocessors - Deploy Shared Functionality Directly on the Cluster (Slides | Recording)


Jeff Bean, Jonathan Hsieh & Kathleen Ting

Supporting HBase: How to Stabilize, Diagnose, and Repair (Recording)


Andrew Purtell

HBase Security for the Enterprise (Slides | Recording)


Moderated by Eric Sammer

Panel: Aravind Gottipati (StumbleUpon), Dave Latham (Flurry), Ryan Thiessen (Facebook) (No recording)


David Wang

Lightning Talk | Base Metrics: What They Mean to You (Slides | Recording)


Robert Berger

Lightning Talk | Orchestrating Clusters with Ironfan and Chef (Slides | Recording)


Elliott Clark

Lightning Talk | Unique Sets on HBase and Hadoop (Slides | Recording)


Rick Tucker

Lightning Talk | Developing Real Time Analytics Applications Using HBase in the Cloud (Slides | Recording)


Todd Lipcon

HBase and HDFS: Past, Present, and Future (Slides | Recording)


Lars George

HBase Filtering (Slides | Recording)


Lars Hofhansl

Learning HBase Internals (Slides | Recording)


Benoit Sigoure

Lessons Learned from OpenTSDB (Slides | Recording)


Ian Varley

HBase Schema Design (Slides | Recording)


Mikhail Bautin

HBase Performance Tuning and Optimizations (No recording)


Aaron Kimball

Lightning Talk | Living Data: Applying Adaptable Schemas to HBase (Slides | Recording)


Berk Demir

Lightning Talk | Content Addressable Storages for Fun and Profit (Slides | Recording)


Kyungseog Oh

Lightning Talk | Solbase (Slides | Recording)


Francis Liu

Lightning Talk | Relaxed Transactions for HBase (Slides | Recording)

Applications 1

Jacques Nadeau

Building a Large Search Platform on a Shoestring Budget (Slides | Recording)


Brent Halsey

Real Performance Gains With Real Time Data (Recording)


Nate Putnam

Building Mobile Infrastructure with HBase (Slides | Recording)


Cosmin Lehene

Low Latency "OLAP" with HBase (Slides | Recording)


Blake Matheny

Growing Your Inbox, HBase at Tumblr (Slides | Recording)


Alex Newman

Overcoming Data Deluge with HBase to Help Save the Environment (Slides | Recording)


Vrushali Channapattan

Lightning Talk | HBase powered Merchant Lookup Service at Intuit (Slides | Recording)


Thomas Pan

Lightning Talk | HBase, the Use Case in eBay Cassini (Slides | Recording)


Nick Dimiduk

Lightning Talk | Scaling GIS in Three Acts (Slides | Recording)

Applications 2

Gupta Gogula & Suraj Varma

Gap Inc Direct: Serving Apparel Catalog from HBase for Live Website (Slides | Recording)


Anshuman Singh

Facebook Messages Application Server Using HBase (No recording)


Dan Lynn

Storing and Manipulating Graphs in HBase (Slides | Recording)


Stanislav Barton

Mignify: A Big Data Refinery Built on HBase (Slides | Recording)


Doug Meil

Real-Time and Batch HBase for Healthcare at Explorys (Slides | Recording)


Alex Baranau

Real-time Analytics with HBase (Slides | Recording)


Satnam Alag

Lightning Talk | Leveraging HBase for the World's Largest Curated Genomic Data Collection (Slides | Recording)


Chris Niemiera

Lightning Talk | You've got HBase! How AOL Mail Handles Big Data (Slides | Recording)


Steven Noels

Lightning Talk | Getting Real about Interactive Big Data Management with Lily & HBase (Slides | Recording)


Ron Buckley

Lightning Talk | HBase for the World's Libraries (Slides | Recording)

Conduct ◆◆◆

HBaseCon is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of conference participants in any form. Conference participants violating these rules may be sanctioned or expelled from the conference without a refund at the discretion of the conference organizers.

Harassment includes offensive verbal comments related to gender, gender identity and expression, sexual orientation, disability, physical appearance, body size, race, or religion; sexual images in public spaces; deliberate intimidation, stalking, or following; harassing photography or recording; sustained disruption of talks or other events; inappropriate physical contact; and unwelcome sexual attention. Participants asked to stop any harassing behavior are expected to comply immediately.

If a participant engages in harassing behavior, the conference organizers may take any action they deem appropriate, including warning the offender or expelling them from the conference with no refund. If you are being harassed, notice that someone else is being harassed, or have any other concerns, please contact a member of conference staff immediately. Conference staff can be identified by special badges.

Conference staff will be happy to help participants contact hotel/venue security or local law enforcement, provide escorts, or otherwise assist those experiencing harassment to feel safe for the duration of the conference. We value your attendance.

We expect participants to follow these rules at all conference venues and conference-related social events.