@InterfaceAudience.Public public class RoundRobinTableInputFormat extends TableInputFormat
Processes the return from the super-class TableInputFormat (TIF) so as to undo any clumping of
InputSplits around RegionServers. Spreads splits broadly to distribute read-load over
RegionServers in the cluster. The super-class TIF returns splits in hbase:meta table order.
Adjacent or near-adjacent hbase:meta Regions can be hosted on the same RegionServer; nothing
prevents this. This hbase:meta ordering of InputSplit placement can be lumpy, so that some
RegionServers end up hosting many InputSplit scans while, at the same time, other RegionServers
host few or none. This class does a pass over the return from the super-class to better spread
the load. See the Flipkart blog post below for a description; the base of this code comes from
that post (with permission).
https://tech.flipkart.com/is-data-locality-always-out-of-the-box-in-hadoop-not-really-2ae9c95163cb
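The spreading pass described above can be illustrated with a minimal, self-contained sketch. It is not the class's actual implementation: `RoundRobinSketch`, `Split`, and the server names are hypothetical stand-ins, and a plain string takes the place of the RegionServer location that a real InputSplit carries. The sketch groups splits by hosting server and then takes one split per server in turn, so every server receives a first split before any server receives a second.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RoundRobinSketch {
  /** Hypothetical stand-in for an InputSplit: a label plus the name of the hosting server. */
  static final class Split {
    final String name;
    final String server;

    Split(String name, String server) {
      this.name = name;
      this.server = server;
    }

    @Override
    public String toString() {
      return name + "@" + server;
    }
  }

  /** Reorders splits so that consecutive entries cycle through the hosting servers. */
  static List<Split> roundRobin(List<Split> splits) {
    // Group splits by server, keeping servers in first-seen order.
    Map<String, Deque<Split>> byServer = new LinkedHashMap<>();
    for (Split s : splits) {
      byServer.computeIfAbsent(s.server, k -> new ArrayDeque<>()).add(s);
    }
    List<Split> result = new ArrayList<>(splits.size());
    // Take one split from each server per pass until every queue is drained.
    while (!byServer.isEmpty()) {
      Iterator<Deque<Split>> it = byServer.values().iterator();
      while (it.hasNext()) {
        Deque<Split> queue = it.next();
        result.add(queue.poll());
        if (queue.isEmpty()) {
          it.remove(); // this server has no splits left
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Assumed input in hbase:meta order, clumped on rs1 and rs2.
    List<Split> metaOrder = Arrays.asList(
        new Split("r1", "rs1"), new Split("r2", "rs1"), new Split("r3", "rs1"),
        new Split("r4", "rs2"), new Split("r5", "rs2"), new Split("r6", "rs3"));
    System.out.println(roundRobin(metaOrder));
  }
}
```

With the assumed input, the three rs1 splits no longer sit next to each other at the head of the list; the first three entries of the result land on three different servers.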
Fields inherited from class TableInputFormat: INPUT_TABLE, SCAN, SCAN_BATCHSIZE, SCAN_CACHEBLOCKS, SCAN_CACHEDROWS, SCAN_COLUMN_FAMILY, SCAN_COLUMNS, SCAN_MAXVERSIONS, SCAN_ROW_START, SCAN_ROW_STOP, SCAN_TIMERANGE_END, SCAN_TIMERANGE_START, SCAN_TIMESTAMP, SHUFFLE_MAPS
Fields inherited from class TableInputFormatBase: MAPREDUCE_INPUT_AUTOBALANCE, MAX_AVERAGE_REGION_SIZE, NUM_MAPPERS_PER_REGION
Constructor and Description |
---|
RoundRobinTableInputFormat() |
Modifier and Type | Method and Description |
---|---|
List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) Calculates the splits that will serve as input for the map tasks. |
static void | main(String[] args) Pass table name as argument. |
Methods inherited from class TableInputFormat: addColumns, configureSplitTable, createScanFromConfiguration, getConf, getStartEndKeys, initialize, setConf
Methods inherited from class TableInputFormatBase: calculateAutoBalancedSplits, closeTable, createNInputSplitsUniform, createRecordReader, getAdmin, getRegionLocator, getScan, getTable, includeRegionInSplit, initializeTable, setScan, setTableRecordReader
public RoundRobinTableInputFormat()
public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException
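Description copied from class TableInputFormat: Calculates the splits that will serve as input for the map tasks.
Overrides: getSplits in class TableInputFormat
Parameters: context - The current job context.
Throws: IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)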
public static void main(String[] args) throws IOException
Throws: IOException
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.