Class HFileInputFormat

java.lang.Object
org.apache.hadoop.mapreduce.InputFormat<K,V>
org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.NullWritable,Cell>
org.apache.hadoop.hbase.mapreduce.HFileInputFormat

@Private public class HFileInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.NullWritable,Cell>
Simple MapReduce input format for HFiles. This code was borrowed from the Apache Crunch project and updated for recent versions of HBase.
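As a quick orientation, the sketch below wires HFileInputFormat into a MapReduce driver. The class is audience-private, so treat this as an illustration rather than a supported recipe; the job name, the input-path argument, and CellCountMapper are hypothetical (a mapper sketch appears under createRecordReader below).

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.HFileInputFormat;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class HFileScanDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hfile-scan");
        job.setJarByClass(HFileScanDriver.class);

        // Read raw Cells straight out of the HFiles under the given directory.
        job.setInputFormatClass(HFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // Map-only job; output is discarded here, a real job would pick a real output format.
        job.setMapperClass(CellCountMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }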
  • Nested Class Summary

    Nested Classes

    private static class HFileInputFormat.HFileRecordReader
    Record reader for HFiles.

    Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    org.apache.hadoop.mapreduce.lib.input.FileInputFormat.Counter
  • Field Summary

    Fields

    (package private) static final org.apache.hadoop.fs.PathFilter HIDDEN_FILE_FILTER
    File filter that removes all "hidden" files.

    private static final org.slf4j.Logger LOG

    Fields inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    DEFAULT_LIST_STATUS_NUM_THREADS, INPUT_DIR, INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LIST_STATUS_NUM_THREADS, NUM_INPUT_FILES, PATHFILTER_CLASS, SPLIT_MAXSIZE, SPLIT_MINSIZE
  • Constructor Summary

    Constructors

    HFileInputFormat()
  • Method Summary

    Methods

    private static void addFilesRecursively(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.FileStatus status, List<org.apache.hadoop.fs.FileStatus> result)

    org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,Cell> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context)

    protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)

    protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext job)

    Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat

    addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, getSplits, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize, shrinkStatus

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • LOG

      private static final org.slf4j.Logger LOG
    • HIDDEN_FILE_FILTER

      static final org.apache.hadoop.fs.PathFilter HIDDEN_FILE_FILTER
      File filter that removes all "hidden" files. This might be something worth removing from a more general purpose utility; it accounts for the presence of metadata files created in the way we're doing exports.
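      For illustration, a filter of this kind typically rejects names that begin with '_' (e.g. _SUCCESS, _logs) or '.' (e.g. .crc side files). The sketch below is a stand-in, not necessarily the exact initializer used by this class.

      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.fs.PathFilter;

      final class HiddenFileFilters {
        // Accept only paths whose final name component does not look "hidden".
        static final PathFilter HIDDEN_FILE_FILTER = new PathFilter() {
          @Override
          public boolean accept(Path p) {
            String name = p.getName();
            return !name.startsWith("_") && !name.startsWith(".");
          }
        };
      }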
  • Constructor Details

    • HFileInputFormat

      public HFileInputFormat()

  • Method Details

    • listStatus

      protected List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext job) throws IOException
      Overrides:
      listStatus in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.NullWritable,Cell>
      Throws:
      IOException
    • addFilesRecursively

      private static void addFilesRecursively(org.apache.hadoop.mapreduce.JobContext job, org.apache.hadoop.fs.FileStatus status, List<org.apache.hadoop.fs.FileStatus> result) throws IOException
      Throws:
      IOException
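      A recursive expansion of this sort usually descends into directories and collects plain files, applying the hidden-file filter along the way. The sketch below takes a FileSystem and PathFilter directly instead of a JobContext, so it is an illustrative stand-in for the private helper, not its actual body.

      import java.io.IOException;
      import java.util.List;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.PathFilter;

      final class RecursiveHFileLister {
        // Walk a directory tree: recurse into subdirectories (skipping "hidden"
        // entries via the supplied filter) and collect ordinary files.
        static void addFilesRecursively(FileSystem fs, FileStatus status,
            PathFilter hiddenFilter, List<FileStatus> result) throws IOException {
          if (status.isDirectory()) {
            for (FileStatus child : fs.listStatus(status.getPath(), hiddenFilter)) {
              addFilesRecursively(fs, child, hiddenFilter, result);
            }
          } else {
            result.add(status);
          }
        }
      }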
    • createRecordReader

      public org.apache.hadoop.mapreduce.RecordReader<org.apache.hadoop.io.NullWritable,Cell> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException, InterruptedException
      Specified by:
      createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.NullWritable,Cell>
      Throws:
      IOException
      InterruptedException
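      The record reader emits NullWritable keys and HBase Cell values, so a consuming mapper declares those as its input types. The CellCountMapper below is a hypothetical example (referenced from the driver sketch above) that tallies cells per column family.

      import java.io.IOException;
      import org.apache.hadoop.hbase.Cell;
      import org.apache.hadoop.hbase.CellUtil;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class CellCountMapper extends Mapper<NullWritable, Cell, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text family = new Text();

        @Override
        protected void map(NullWritable key, Cell cell, Context context)
            throws IOException, InterruptedException {
          // Group each cell by its column family and emit a count of one.
          family.set(CellUtil.cloneFamily(cell));
          context.write(family, ONE);
        }
      }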
    • isSplitable

      protected boolean isSplitable(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path filename)
      Overrides:
      isSplitable in class org.apache.hadoop.mapreduce.lib.input.FileInputFormat<org.apache.hadoop.io.NullWritable,Cell>