@InterfaceAudience.Private public class RegionSplitter extends Object
RegionSplitter class provides several utilities to help in the
 administration lifecycle for developers who choose to manually split regions
 instead of having HBase handle that automatically. The most useful utilities
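 are:

- Create a table with a specified number of pre-split regions
- Execute a rolling split of all regions on an existing table

Both operations can be safely done on a live server.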
 Question: How do I turn off automatic splitting? 
 Answer: Automatic splitting is determined by the configuration value
 HConstants.HREGION_MAX_FILESIZE. It is not recommended that you set this
 to Long.MAX_VALUE in case you forget about manual splits. A suggested setting
 is 100GB, which would result in > 1hr major compactions if reached.
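
The suggested 100GB limit has to be supplied in bytes. The following is a minimal sketch of that arithmetic, assuming binary gigabytes (1024^3 bytes) and assuming the property key behind HConstants.HREGION_MAX_FILESIZE is `hbase.hregion.max.filesize`. The class name `MaxFileSizeSketch` and the helper `gigabytesToBytes` are hypothetical and not part of HBase.

```java
class MaxFileSizeSketch {
    // Assumed property key behind HConstants.HREGION_MAX_FILESIZE.
    static final String HREGION_MAX_FILESIZE = "hbase.hregion.max.filesize";

    /** Converts a size in binary gigabytes (1024^3 bytes) to bytes; throws on overflow. */
    static long gigabytesToBytes(long gigabytes) {
        return Math.multiplyExact(gigabytes, 1024L * 1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the key=value pair one would carry into the site configuration.
        long limit = gigabytesToBytes(100);
        System.out.println(HREGION_MAX_FILESIZE + "=" + limit);
    }
}
```

The printed value is what would go into the cluster configuration. The limit stays finite, so automatic splitting still acts as a safety net if manual splits are forgotten.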
 
 Question: Why did the original authors decide to manually split? 
 Answer: Specific workload characteristics of our use case allowed us
 to benefit from a manual split system.
 
 Question: Why is manual splitting good for this workload? 
 Answer: Although automated splitting is not a bad option, there are
 benefits to manual splitting.
 
 Question: What's the optimal number of pre-split regions to create? 
 Answer: Mileage will vary depending upon your application.
 
The short answer for our application is that we started with 10 pre-split regions per server and watched our data growth over time. It's better to err on the side of too few regions and perform a rolling split later.
 The more complicated answer is that this depends upon the largest storefile
 in your region. With a growing data size, this will get larger over time. You
 want the largest region to be just big enough that the HStore compact
 selection algorithm only compacts it due to a timed major compaction. If you don't, your
 cluster can be prone to compaction storms as the algorithm decides to run
 major compactions on a large series of regions all at once. Note that
 compaction storms are due to the uniform data growth, not the manual split
 decision.
 
If you pre-split your regions too thin, you can increase the major compaction interval by configuring HConstants.MAJOR_COMPACTION_PERIOD. If your data size grows too large, use this script to perform a network-IO-safe rolling split of all regions.
| Modifier and Type | Class and Description |
|---|---|
| static class | RegionSplitter.DecimalStringSplit: The format of a DecimalStringSplit region boundary is the ASCII representation of a reversed sequential number, or any other uniformly distributed decimal value. |
| static class | RegionSplitter.HexStringSplit: HexStringSplit is a well-known RegionSplitter.SplitAlgorithm for choosing region boundaries. |
| static class | RegionSplitter.NumberStringSplit |
| static interface | RegionSplitter.SplitAlgorithm: A generic interface for the RegionSplitter code to use for all its functionality. |
| static class | RegionSplitter.UniformSplit: A SplitAlgorithm that divides the space of possible keys evenly. |

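
To make the idea of evenly spaced region boundaries concrete, here is a minimal, standard-library-only sketch that divides an 8-digit hexadecimal key space into equal parts. It is an illustration, not the HexStringSplit implementation: the class `HexSplitSketch`, the key range `00000000` to `FFFFFFFF`, and the lowercase zero-padded output format are all assumptions made for this example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

class HexSplitSketch {
    // Assumed key space for this illustration: 8 hex digits.
    static final BigInteger FIRST = BigInteger.ZERO;
    static final BigInteger LAST = new BigInteger("FFFFFFFF", 16);

    /** Returns the numRegions - 1 boundary keys that cut the key space into equal parts. */
    static List<String> split(int numRegions) {
        if (numRegions < 2) {
            throw new IllegalArgumentException("need at least 2 regions to have a boundary");
        }
        // Number of distinct keys in the inclusive range [FIRST, LAST].
        BigInteger range = LAST.subtract(FIRST).add(BigInteger.ONE);
        List<String> boundaries = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            // The i-th boundary sits i/numRegions of the way through the range.
            BigInteger key = FIRST.add(
                range.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(numRegions)));
            boundaries.add(String.format("%08x", key)); // zero-pad to 8 hex digits
        }
        return boundaries;
    }

    public static void main(String[] args) {
        System.out.println(split(4)); // prints [40000000, 80000000, c0000000]
    }
}
```

Boundaries like these only balance load if row keys are themselves spread uniformly over the key space, for example when keys are hashed prefixes.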
| Modifier and Type | Field and Description |
|---|---|
| private static org.slf4j.Logger | LOG |

| Constructor and Description |
|---|
| RegionSplitter() |

| Modifier and Type | Method and Description |
|---|---|
| (package private) static void | createPresplitTable(TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo, String[] columnFamilies, org.apache.hadoop.conf.Configuration conf) |
| private static int | getRegionServerCount(Connection connection): Alternative to getCurrentNrHRS, which is no longer available. |
| (package private) static LinkedList<Pair<byte[],byte[]>> | getSplits(Connection connection, TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo) |
| private static Pair<org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path> | getTableDirAndSplitFile(org.apache.hadoop.conf.Configuration conf, TableName tableName) |
| static void | main(String[] args): The main function for the RegionSplitter application. |
| static RegionSplitter.SplitAlgorithm | newSplitAlgoInstance(org.apache.hadoop.conf.Configuration conf, String splitClassName) |
| private static byte[] | readFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) |
| (package private) static void | rollingSplit(TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo, org.apache.hadoop.conf.Configuration conf) |
| (package private) static LinkedList<Pair<byte[],byte[]>> | splitScan(LinkedList<Pair<byte[],byte[]>> regionList, Connection connection, TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo) |

private static final org.slf4j.Logger LOG

public RegionSplitter()

public static void main(String[] args) throws IOException, InterruptedException, org.apache.hbase.thirdparty.org.apache.commons.cli.ParseException
Parameters:
args - Usage: RegionSplitter <TABLE> <SPLITALGORITHM>
          <-c <# regions> -f <family:family:...> | -r
          [-o <# outstanding splits>]>
          [-D <conf.param=value>]
Throws:
IOException - HBase IO problem
InterruptedException - user requested exit
org.apache.hbase.thirdparty.org.apache.commons.cli.ParseException - problem parsing user input

static void createPresplitTable(TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo, String[] columnFamilies, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException
Throws:
IOException
InterruptedException

private static int getRegionServerCount(Connection connection) throws IOException
Parameters:
connection -
Throws:
IOException - if a remote or network exception occurs

private static byte[] readFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Throws:
IOException

static void rollingSplit(TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo, org.apache.hadoop.conf.Configuration conf) throws IOException, InterruptedException
Throws:
IOException
InterruptedException

public static RegionSplitter.SplitAlgorithm newSplitAlgoInstance(org.apache.hadoop.conf.Configuration conf, String splitClassName) throws IOException
Throws:
IOException - if the specified SplitAlgorithm class couldn't be instantiated

static LinkedList<Pair<byte[],byte[]>> splitScan(LinkedList<Pair<byte[],byte[]>> regionList, Connection connection, TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo) throws IOException, InterruptedException
Throws:
IOException
InterruptedException

private static Pair<org.apache.hadoop.fs.Path,org.apache.hadoop.fs.Path> getTableDirAndSplitFile(org.apache.hadoop.conf.Configuration conf, TableName tableName) throws IOException
Parameters:
conf -
tableName -
Throws:
IOException - if a remote or network exception occurs

static LinkedList<Pair<byte[],byte[]>> getSplits(Connection connection, TableName tableName, RegionSplitter.SplitAlgorithm splitAlgo) throws IOException
Throws:
IOException

Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.