@InterfaceAudience.Public public class RoundRobinTableInputFormat extends TableInputFormat
Processes the return from the super-class TableInputFormat (TIF) so as to undo any clumping of
InputSplits around RegionServers. Spreads splits broadly to distribute read load over the
RegionServers in the cluster. The super-class TIF returns splits in hbase:meta table order.
Adjacent or near-adjacent hbase:meta Regions can be hosted on the same RegionServer; nothing
prevents this. This hbase:meta ordering of InputSplit placement can be lumpy, so that some
RegionServers end up hosting many InputSplit scans while, at the same time, other RegionServers
host few or none. This class does a pass over the return from the super-class to better spread
the load. See the Flipkart blog post below for a description; the base of this code comes from
that post (with permission).
https://tech.flipkart.com/is-data-locality-always-out-of-the-box-in-hadoop-not-really-2ae9c95163cb
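The spreading pass described above can be illustrated with a minimal, self-contained sketch. It is not the class's actual code: `RoundRobinSketch` and `Split` are hypothetical stand-ins (a real `InputSplit` carries its locations differently), and the sketch only assumes that each split can be attributed to one host. Splits are grouped by host and then emitted one per host per pass, so consecutive splits land on different RegionServers where possible.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RoundRobinSketch {
  /** Hypothetical stand-in for an InputSplit: a label plus the host that serves it. */
  static final class Split {
    final String name;
    final String host;

    Split(String name, String host) {
      this.name = name;
      this.host = host;
    }
  }

  /** Reorders splits so that splits for the same host are spread apart. */
  static List<Split> roundRobin(List<Split> inputs) {
    // Group splits by host, keeping first-seen host order and per-host split order.
    Map<String, Deque<Split>> byHost = new LinkedHashMap<>();
    for (Split s : inputs) {
      byHost.computeIfAbsent(s.host, h -> new ArrayDeque<>()).add(s);
    }
    List<Split> result = new ArrayList<>(inputs.size());
    // Each pass takes one split from every host that still has splits left.
    while (!byHost.isEmpty()) {
      Iterator<Deque<Split>> it = byHost.values().iterator();
      while (it.hasNext()) {
        Deque<Split> queue = it.next();
        result.add(queue.poll());
        if (queue.isEmpty()) {
          it.remove(); // this host is exhausted; drop it from later passes
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Input in "meta order": three splits clumped on rs1, then rs2, then two on rs3.
    List<Split> metaOrder = List.of(
        new Split("a", "rs1"), new Split("b", "rs1"), new Split("c", "rs1"),
        new Split("d", "rs2"), new Split("e", "rs3"), new Split("f", "rs3"));
    for (Split s : roundRobin(metaOrder)) {
      System.out.println(s.name + " @ " + s.host);
    }
  }
}
```

With the sample input, the three adjacent rs1 splits end up separated by splits from rs2 and rs3; once only rs1 has splits left, its remaining splits are appended at the end.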
Modifier and Type | Field and Description |
---|---|
(package private) static String | HBASE_REGIONSIZECALCULATOR_ENABLE: Boolean config for whether superclass should produce InputSplits with 'lengths'. |
private Boolean | hbaseRegionsizecalculatorEnableOriginalValue |

Fields inherited from class TableInputFormat: INPUT_TABLE, SCAN, SCAN_BATCHSIZE, SCAN_CACHEBLOCKS, SCAN_CACHEDROWS, SCAN_COLUMN_FAMILY, SCAN_COLUMNS, SCAN_MAXVERSIONS, SCAN_ROW_START, SCAN_ROW_STOP, SCAN_TIMERANGE_END, SCAN_TIMERANGE_START, SCAN_TIMESTAMP, SHUFFLE_MAPS
Fields inherited from class TableInputFormatBase: MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION
Constructor and Description |
---|
RoundRobinTableInputFormat() |
Modifier and Type | Method and Description |
---|---|
(package private) void | configure(): Adds a setting to the Context's configuration that disables the remote RPCs used to determine Region size when calculating InputSplits. |
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context): Calculates the splits that will serve as input for the map tasks. |
(package private) List<org.apache.hadoop.mapreduce.InputSplit> | getSuperSplits(org.apache.hadoop.mapreduce.JobContext context): Calls the super-class's getSplits. |
static void | main(String[] args): Pass the table name as the argument. |
(package private) List<org.apache.hadoop.mapreduce.InputSplit> | roundRobin(List<org.apache.hadoop.mapreduce.InputSplit> inputs): Spreads the splits list so as to avoid clumping on RegionServers. |
(package private) void | unconfigure() |

Methods inherited from class TableInputFormat: addColumns, configureSplitTable, createScanFromConfiguration, getConf, getStartEndKeys, initialize, setConf
Methods inherited from class TableInputFormatBase: calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, createRegionSizeCalculator, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, reverseDNS, setScan, setTableRecordReader
private Boolean hbaseRegionsizecalculatorEnableOriginalValue
static String HBASE_REGIONSIZECALCULATOR_ENABLE
public RoundRobinTableInputFormat()
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Calculates the splits that will serve as input for the map tasks. (Description copied from class TableInputFormat.)
Overrides: getSplits in class TableInputFormat
Parameters: context - The current job context.
Throws: IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)
List<org.apache.hadoop.mapreduce.InputSplit> getSuperSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Throws: IOException
List<org.apache.hadoop.mapreduce.InputSplit> roundRobin(List<org.apache.hadoop.mapreduce.InputSplit> inputs) throws IOException
Throws: IOException
void configure()
See Also: unconfigure()
void unconfigure()
See Also: configure()
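The pairing of configure() and unconfigure(), together with the nullable field hbaseRegionsizecalculatorEnableOriginalValue, suggests a save, override, and restore pattern. The sketch below illustrates that pattern under stated assumptions and is not the class's actual code: a plain `Map` stands in for the Hadoop `Configuration`, `ConfigureSketch` is a hypothetical name, and the key string `hbase.regionsizecalculator.enable` is assumed to be the value of HBASE_REGIONSIZECALCULATOR_ENABLE.

```java
import java.util.HashMap;
import java.util.Map;

final class ConfigureSketch {
  // Assumed value of the HBASE_REGIONSIZECALCULATOR_ENABLE constant.
  static final String HBASE_REGIONSIZECALCULATOR_ENABLE = "hbase.regionsizecalculator.enable";

  // Stand-in for a Hadoop Configuration (assumption made for this sketch).
  private final Map<String, String> conf;

  // Remembers the caller's setting; null means the key was not set at all.
  private Boolean originalValue;

  ConfigureSketch(Map<String, String> conf) {
    this.conf = conf;
  }

  /** Saves the current setting, then disables region-size calculation. */
  void configure() {
    String current = conf.get(HBASE_REGIONSIZECALCULATOR_ENABLE);
    originalValue = (current == null) ? null : Boolean.valueOf(current);
    conf.put(HBASE_REGIONSIZECALCULATOR_ENABLE, "false");
  }

  /** Restores whatever the caller had before configure() ran. */
  void unconfigure() {
    if (originalValue == null) {
      conf.remove(HBASE_REGIONSIZECALCULATOR_ENABLE); // key was absent; make it absent again
    } else {
      conf.put(HBASE_REGIONSIZECALCULATOR_ENABLE, originalValue.toString());
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(HBASE_REGIONSIZECALCULATOR_ENABLE, "true");
    ConfigureSketch sketch = new ConfigureSketch(conf);
    sketch.configure();
    System.out.println("during: " + conf.get(HBASE_REGIONSIZECALCULATOR_ENABLE));
    sketch.unconfigure();
    System.out.println("after: " + conf.get(HBASE_REGIONSIZECALCULATOR_ENABLE));
  }
}
```

Keeping the saved value as a nullable `Boolean` rather than a primitive lets the restore step distinguish "was set to false" from "was never set", so the caller's configuration is left exactly as it was found.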
public static void main(String[] args) throws IOException
Throws: IOException
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.