@InterfaceAudience.Public public class RoundRobinTableInputFormat extends TableInputFormat
Processes the return from the super-class TableInputFormat (TIF) so as to undo any clumping of
InputSplits around RegionServers, spreading splits broadly to distribute read-load over the
RegionServers in the cluster. The super-class TIF returns splits in hbase:meta table order, and
adjacent or near-adjacent hbase:meta Regions can be hosted on the same RegionServer; nothing
prevents this. Placing InputSplits in hbase:meta order can therefore be lumpy: some RegionServers
end up hosting many InputSplit scans while, at the same time, other RegionServers host few or none.
This class does a pass over the return from the super-class to spread the load more evenly. See the
Flipkart blog post below for a description of the problem; the base of this code comes from it
(with permission).
https://tech.flipkart.com/is-data-locality-always-out-of-the-box-in-hadoop-not-really-2ae9c95163cb

| Modifier and Type | Field and Description |
|---|---|
| (package private) static String | HBASE_REGIONSIZECALCULATOR_ENABLE: Boolean config for whether the superclass should produce InputSplits with 'lengths'. |
| private Boolean | hbaseRegionsizecalculatorEnableOriginalValue | 
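
The two fields above pair with configure() and unconfigure(): the original value is remembered, the region-size calculator is switched off while the super-class computes splits, and the caller's setting is put back afterwards. The sketch below illustrates that save-and-restore pattern only; it is not the HBase source. Assumptions: a plain Map stands in for Hadoop's Configuration, the key string "hbase.regionsizecalculator.enable" should be checked against HBASE_REGIONSIZECALCULATOR_ENABLE in your HBase release, and the class name RegionSizeCalculatorToggle is hypothetical. A likely motivation for turning lengths off is that Hadoop's job submitter sorts splits by length, largest first; splits that all report no length keep the round-robin order.

```java
import java.util.Map;

/** Hypothetical sketch of a configure()/unconfigure() save-and-restore pair. */
class RegionSizeCalculatorToggle {
  // Assumed key string; verify against HBASE_REGIONSIZECALCULATOR_ENABLE in your HBase release.
  static final String HBASE_REGIONSIZECALCULATOR_ENABLE = "hbase.regionsizecalculator.enable";

  // Stand-in for Hadoop's Configuration so the sketch compiles on its own.
  private final Map<String, String> conf;

  // Caller's setting before configure(); null means the key was not set at all.
  private Boolean originalValue;

  RegionSizeCalculatorToggle(Map<String, String> conf) {
    this.conf = conf;
  }

  /** Remembers the caller's setting, then turns region-size RPCs off. */
  void configure() {
    String current = conf.get(HBASE_REGIONSIZECALCULATOR_ENABLE);
    originalValue = (current == null) ? null : Boolean.valueOf(current);
    conf.put(HBASE_REGIONSIZECALCULATOR_ENABLE, Boolean.FALSE.toString());
  }

  /** Restores whatever configure() found, removing the key if it was unset. */
  void unconfigure() {
    if (originalValue == null) {
      conf.remove(HBASE_REGIONSIZECALCULATOR_ENABLE);
    } else {
      conf.put(HBASE_REGIONSIZECALCULATOR_ENABLE, originalValue.toString());
    }
  }
}
```

A caller would typically wrap the super-class call as configure(); try { ... } finally { unconfigure(); } so the user's setting is restored even if split calculation throws.
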

Fields inherited from class TableInputFormat: INPUT_TABLE, SCAN, SCAN_BATCHSIZE, SCAN_CACHEBLOCKS, SCAN_CACHEDROWS, SCAN_COLUMN_FAMILY, SCAN_COLUMNS, SCAN_MAXVERSIONS, SCAN_ROW_START, SCAN_ROW_STOP, SCAN_TIMERANGE_END, SCAN_TIMERANGE_START, SCAN_TIMESTAMP, SHUFFLE_MAPS

Fields inherited from class TableInputFormatBase: MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION

| Constructor and Description |
|---|
| RoundRobinTableInputFormat() | 

| Modifier and Type | Method and Description |
|---|---|
| (package private) void | configure(): Adds a configuration to the Context that disables the remote RPCs used to figure out Region size when calculating InputSplits. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks. |
| (package private) List<org.apache.hadoop.mapreduce.InputSplit> | getSuperSplits(org.apache.hadoop.mapreduce.JobContext context): Calls the super-class's getSplits. |
| static void | main(String[] args): Pass the table name as an argument. |
| (package private) List<org.apache.hadoop.mapreduce.InputSplit> | roundRobin(List<org.apache.hadoop.mapreduce.InputSplit> inputs): Spreads the splits list so as to avoid clumping on RegionServers. |
| (package private) void | unconfigure() |
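
The roundRobin pass summarized above can be sketched as: bucket the splits by hosting RegionServer, keeping hbase:meta order inside each bucket, then emit one split per server per pass until every bucket is drained. This is an illustration of the technique under stated assumptions, not the HBase code: the Split class is a hypothetical stand-in for a TableSplit and its region location, and splits with no known location are simply emitted ahead of the interleaved ones.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of spreading splits across servers round-robin. */
class RoundRobinSketch {

  /** Stand-in for a TableSplit: a region name plus the server hosting it. */
  static final class Split {
    final String region;
    final String server; // null or empty when the location is unknown

    Split(String region, String server) {
      this.region = region;
      this.server = server;
    }

    @Override
    public String toString() {
      return region + "@" + server;
    }
  }

  static List<Split> roundRobin(List<Split> inputs) {
    if (inputs == null || inputs.isEmpty()) {
      return inputs;
    }
    List<Split> result = new ArrayList<>(inputs.size());
    // Bucket by server; LinkedHashMap keeps servers in first-seen order and
    // each deque keeps its splits in their original (hbase:meta) order.
    Map<String, Deque<Split>> byServer = new LinkedHashMap<>();
    for (Split s : inputs) {
      if (s.server == null || s.server.isEmpty()) {
        result.add(s); // unknown location: emitted ahead of the interleaved splits
      } else {
        byServer.computeIfAbsent(s.server, k -> new ArrayDeque<>()).add(s);
      }
    }
    // Each pass takes one split from every server that still has some left.
    while (!byServer.isEmpty()) {
      Iterator<Deque<Split>> it = byServer.values().iterator();
      while (it.hasNext()) {
        Deque<Split> bucket = it.next();
        result.add(bucket.poll());
        if (bucket.isEmpty()) {
          it.remove(); // server drained; skip it in later passes
        }
      }
    }
    return result;
  }
}
```

For example, meta-ordered input r1@A, r2@A, r3@A, r4@B, r5@C comes out as r1@A, r4@B, r5@C, r2@A, r3@A, so the first three scans land on three different servers instead of all on A.
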

Methods inherited from class TableInputFormat: addColumns, configureSplitTable, createScanFromConfiguration, getConf, getStartEndKeys, initialize, setConf

Methods inherited from class TableInputFormatBase: calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, createRegionSizeCalculator, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, reverseDNS, setScan, setTableRecordReader

private Boolean hbaseRegionsizecalculatorEnableOriginalValue

static String HBASE_REGIONSIZECALCULATOR_ENABLE

public RoundRobinTableInputFormat()

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Overrides: getSplits in class TableInputFormat
Parameters: context - The current job context.
Throws: IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)

List<org.apache.hadoop.mapreduce.InputSplit> getSuperSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Throws: IOException

List<org.apache.hadoop.mapreduce.InputSplit> roundRobin(List<org.apache.hadoop.mapreduce.InputSplit> inputs) throws IOException
Throws: IOException

void configure()
See Also: unconfigure()

void unconfigure()
See Also: configure()

public static void main(String[] args) throws IOException
Throws: IOException

Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.