Class GroupingTableMapper
java.lang.Object
org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,KEYOUT,VALUEOUT>
org.apache.hadoop.hbase.mapreduce.TableMapper<ImmutableBytesWritable,Result>
org.apache.hadoop.hbase.mapreduce.GroupingTableMapper
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable
@Public
public class GroupingTableMapper
extends TableMapper<ImmutableBytesWritable,Result>
implements org.apache.hadoop.conf.Configurable
Extract grouping columns from input record.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper
org.apache.hadoop.mapreduce.Mapper.Context
-
Field Summary
protected byte[][] columns
    The grouping columns.
private org.apache.hadoop.conf.Configuration conf
    The current configuration.
static final String GROUP_COLUMNS
    JobConf parameter to specify the columns used to produce the key passed to collect from the map phase.
-
Constructor Summary
GroupingTableMapper()
-
Method Summary
protected ImmutableBytesWritable createGroupKey(byte[][] vals)
    Create a key by concatenating multiple column values.
protected byte[][] extractKeyValues(Result r)
    Extract column values from the current record.
org.apache.hadoop.conf.Configuration getConf()
    Returns the current configuration.
static void initJob(String table, Scan scan, String groupColumns, Class<? extends TableMapper> mapper, org.apache.hadoop.mapreduce.Job job)
    Use this before submitting a TableMap job.
void map(ImmutableBytesWritable key, Result value, org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>.Context context)
    Extract the grouping columns from value to construct a new key.
void setConf(org.apache.hadoop.conf.Configuration configuration)
    Sets the configuration.
Methods inherited from class org.apache.hadoop.mapreduce.Mapper
cleanup, run, setup
-
Field Details
-
GROUP_COLUMNS
public static final String GROUP_COLUMNS
JobConf parameter to specify the columns used to produce the key passed to collect from the map phase.
-
columns
protected byte[][] columns
The grouping columns.
-
conf
private org.apache.hadoop.conf.Configuration conf
The current configuration.
-
-
Constructor Details
-
GroupingTableMapper
public GroupingTableMapper()
-
-
Method Details
-
initJob
public static void initJob(String table, Scan scan, String groupColumns, Class<? extends TableMapper> mapper, org.apache.hadoop.mapreduce.Job job) throws IOException
Use this before submitting a TableMap job. It will appropriately set up the job.
- Parameters:
table - The table to be processed.
scan - The scan with the columns etc.
groupColumns - A space separated list of columns used to form the key used in collect.
mapper - The mapper class.
job - The current job.
- Throws:
IOException - When setting up the job fails.
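A driver might wire this up as in the sketch below. The table name, scan tuning, and grouping columns are illustrative values, not part of the API; the sketch assumes the HBase client and mapreduce artifacts are on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.GroupingTableMapper;
import org.apache.hadoop.mapreduce.Job;

public class GroupingJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "grouping-example");

    Scan scan = new Scan();
    // Optional tuning: fetch rows in batches while scanning.
    scan.setCaching(500);

    // "mytable" and the column list are hypothetical; groupColumns is a
    // space-separated list of columns used to form the map output key.
    GroupingTableMapper.initJob("mytable", scan,
        "info:country info:city", GroupingTableMapper.class, job);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```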
-
map
public void map(ImmutableBytesWritable key, Result value, org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>.Context context) throws IOException, InterruptedException
Extract the grouping columns from value to construct a new key. Pass the new key and value to reduce. If any of the grouping columns are not found in the value, the record is skipped.
- Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<ImmutableBytesWritable,Result,ImmutableBytesWritable,Result>
- Parameters:
key - The current key.
value - The current value.
context - The current context.
- Throws:
IOException - When writing the record fails.
InterruptedException - When the job is aborted.
-
extractKeyValues
protected byte[][] extractKeyValues(Result r)
Extract column values from the current record. This method returns null if any of the columns are not found. Override this method if you want to deal with nulls differently.
- Parameters:
r - The current values.
- Returns:
Array of byte values.
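The return-null-to-skip contract can be illustrated with a plain-Java analogue. The method and field names here are hypothetical, and a `Map` stands in for the HBase `Result`; only the semantics (any missing grouping column means "skip this record") mirror the documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractDemo {
  // Hypothetical analogue of extractKeyValues's contract: return the value
  // for every grouping column, or null if any is absent, so the caller can
  // skip the record.
  static byte[][] extractValues(Map<String, byte[]> record, String[] groupColumns) {
    byte[][] vals = new byte[groupColumns.length][];
    for (int i = 0; i < groupColumns.length; i++) {
      byte[] v = record.get(groupColumns[i]);
      if (v == null) {
        return null; // a grouping column is missing: signal "skip"
      }
      vals[i] = v;
    }
    return vals;
  }

  public static void main(String[] args) {
    Map<String, byte[]> row = new LinkedHashMap<>();
    row.put("info:country", "DE".getBytes());
    row.put("info:city", "Berlin".getBytes());

    // All grouping columns present: the record would be emitted.
    System.out.println(extractValues(row, new String[] {"info:country", "info:city"}) == null ? "skip" : "emit");
    // A grouping column is missing: the record would be skipped.
    System.out.println(extractValues(row, new String[] {"info:zip"}) == null ? "skip" : "emit");
  }
}
```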
-
createGroupKey
protected ImmutableBytesWritable createGroupKey(byte[][] vals)
Create a key by concatenating multiple column values. Override this function in order to produce different types of keys.
- Parameters:
vals - The current key/values.
- Returns:
A key generated by concatenating multiple column values.
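A minimal sketch of key concatenation, in plain Java rather than against the HBase API. The single-space delimiter and the method name are assumptions for illustration; consult the HBase source for the exact encoding the real createGroupKey applies to each value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class GroupKeyDemo {
  // Hypothetical stand-in for createGroupKey: flatten the grouping column
  // values into one key, separated by a single space (assumed delimiter).
  static byte[] createGroupKey(byte[][] vals) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int i = 0; i < vals.length; i++) {
      if (i > 0) {
        out.write(' ');
      }
      out.write(vals[i], 0, vals[i].length);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[][] vals = {
      "DE".getBytes(StandardCharsets.UTF_8),
      "Berlin".getBytes(StandardCharsets.UTF_8)
    };
    // prints "DE Berlin"
    System.out.println(new String(createGroupKey(vals), StandardCharsets.UTF_8));
  }
}
```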
-
getConf
public org.apache.hadoop.conf.Configuration getConf()
Returns the current configuration.
- Specified by:
getConf in interface org.apache.hadoop.conf.Configurable
- Returns:
The current configuration.
- See Also:
Configurable.getConf()
-
setConf
public void setConf(org.apache.hadoop.conf.Configuration configuration)
Sets the configuration. This is used to set up the grouping details.
- Specified by:
setConf in interface org.apache.hadoop.conf.Configurable
- Parameters:
configuration - The configuration to set.
- See Also:
Configurable.setConf(org.apache.hadoop.conf.Configuration)
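Since GROUP_COLUMNS holds a space-separated column list, setting up the grouping details plausibly amounts to splitting that string into one byte[] per column name. The sketch below is a hypothetical illustration of that step, not the HBase implementation.

```java
public class ParseColumnsDemo {
  // Hypothetical sketch of deriving the grouping columns from the
  // space-separated GROUP_COLUMNS configuration value.
  static byte[][] parseColumns(String groupColumns) {
    String[] parts = groupColumns.split(" ");
    byte[][] cols = new byte[parts.length][];
    for (int i = 0; i < parts.length; i++) {
      cols[i] = parts[i].getBytes();
    }
    return cols;
  }

  public static void main(String[] args) {
    byte[][] cols = parseColumns("info:country info:city");
    // prints 2
    System.out.println(cols.length);
  }
}
```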
-