public class WARCInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,WARCWritable>
Each record has a key of LongWritable (which is 1 for the first record in a file, 2 for the second record, etc.) and a value of WARCWritable.

Modifier and Type | Class and Description |
---|---|
private static class | WARCInputFormat.WARCReader |
Constructor and Description |
---|
WARCInputFormat() |
Modifier and Type | Method and Description |
---|---|
org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,WARCWritable> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context): Opens a WARC file (possibly compressed) for reading, and returns a RecordReader for accessing it. |
protected boolean | isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename): Always returns false, as WARC files cannot be split. |
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat: addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, listStatus, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
public WARCInputFormat()
public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.LongWritable,WARCWritable> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.LongWritable,WARCWritable>
Throws: IOException, InterruptedException
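The key numbering stated in the class description (1 for the first record in a file, 2 for the second, and so on) can be pictured with a small plain-Java sketch. It does not use Hadoop or the real WARCReader; `NumberedRecordSketch` and its `number` method are hypothetical names, and strings stand in for WARCWritable values.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a record reader: pairs each record of ONE file
// with a 1-based position, mirroring the key numbering described above.
final class NumberedRecordSketch {

    /** Returns (key, record) pairs; keys start at 1 for the given file. */
    static List<Map.Entry<Long, String>> number(List<String> recordsInFile) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long key = 0;
        for (String record : recordsInFile) {
            key++; // first record gets 1, second gets 2, ...
            out.add(new AbstractMap.SimpleImmutableEntry<>(key, record));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two illustrative "files": the numbering restarts for each one.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("record-a", "record-b", "record-c"),
                Arrays.asList("record-d"));
        for (List<String> file : files) {
            for (Map.Entry<Long, String> e : number(file)) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Since the count is per file, the same key value can occur once for every input file in a job, so the key alone should not be treated as a unique record identifier.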
protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)
Overrides: isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.LongWritable,WARCWritable>
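A practical consequence of isSplitable always returning false is that each WARC file is read by a single RecordReader, however large the file is. The sketch below is a simplified plain-Java model of the split count for one file, loosely following the split loop in Hadoop's FileInputFormat.getSplits. The class name `SplitCountSketch`, the method `countSplits`, and the 1.1 slop factor as modelled here are assumptions made for illustration, not code taken from this class.

```java
// Simplified, hypothetical model of how many input splits one file yields.
final class SplitCountSketch {

    // Assumed slop factor: the last split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    static int countSplits(long fileLength, long splitSize, boolean splitable) {
        if (fileLength < 0 || splitSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and splitSize > 0");
        }
        if (!splitable || fileLength == 0) {
            return 1; // the whole file goes to one reader
        }
        int splits = 0;
        long remaining = fileLength;
        // Carve off full-size splits while clearly more than one split remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // final, possibly slightly oversized, split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 300 MB file with a 128 MB split size:
        System.out.println(countSplits(300 * mb, 128 * mb, false)); // non-splittable, as for WARC
        System.out.println(countSplits(300 * mb, 128 * mb, true));  // what a splittable format would give
    }
}
```

Under this model the parallelism of a job is bounded by the number of input files, so many moderately sized WARC files spread work across tasks better than a few very large ones.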
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.