Class MultiRowRangeFilter


@Public public class MultiRowRangeFilter extends FilterBase
Filter that supports scanning multiple row key ranges. It constructs the row key ranges from the passed list, which can be accessed by each region server. HBase is quite efficient when scanning a single small row key range. If a user needs to specify multiple row key ranges in one scan, the typical solutions are: 1. a FilterList, which is a list of row key Filters; 2. using a SQL layer over HBase, such as Hive or Phoenix, to join two tables. However, both solutions are inefficient: neither can use the range information to fast-forward during the scan, so the scan is quite time consuming. If the number of ranges is very large (e.g. millions), a join is a proper solution even though it is slow. However, there are cases where the user wants to specify a small number of ranges to scan (e.g. <1000 ranges), and neither solution provides satisfactory performance in such cases. MultiRowRangeFilter supports this use case (scanning multiple row key ranges): it constructs the row key ranges from a user-specified list and fast-forwards during the scan, so the scan is quite efficient.
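The fast-forwarding idea described above can be modelled without HBase. The following is a minimal JDK-only sketch, not the filter's actual code: the class RangeSkipSketch, its Range type, and the touched counter are hypothetical names introduced for illustration, row keys are modelled as Strings rather than byte arrays, and every range is assumed to be half-open [start, stop), sorted, and non-overlapping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical model of a multi-range scan with seeking; not HBase code. */
class RangeSkipSketch {
    /** Half-open row range [start, stop). Illustrative type only. */
    static final class Range {
        final String start;
        final String stop;
        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Number of rows the last scan looked at, to show the effect of seeking. */
    static int touched;

    /** Returns the rows that fall inside any of the sorted, non-overlapping ranges. */
    static List<String> scan(NavigableSet<String> rows, List<Range> sortedRanges) {
        List<String> matched = new ArrayList<>();
        touched = 0;
        int r = 0; // index of the first range that can still match
        String row = rows.isEmpty() ? null : rows.first();
        while (row != null) {
            touched++;
            // Skip ranges that end at or before the current row.
            while (r < sortedRanges.size() && row.compareTo(sortedRanges.get(r).stop) >= 0) {
                r++;
            }
            if (r == sortedRanges.size()) {
                break; // no later row can match, so the scan can end
            }
            Range range = sortedRanges.get(r);
            if (row.compareTo(range.start) < 0) {
                // Fast-forward: jump straight to the first row at or after the range start.
                row = rows.ceiling(range.start);
            } else {
                matched.add(row);
                row = rows.higher(row);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>();
        for (int i = 0; i < 100; i++) {
            rows.add(String.format("row%02d", i));
        }
        List<Range> ranges = Arrays.asList(new Range("row10", "row13"), new Range("row50", "row52"));
        System.out.println(scan(rows, ranges) + " after touching " + touched + " rows");
    }
}
```

In this model the scan jumps over the rows between ranges instead of testing each one, which is the behaviour a list of independent per-row filters cannot reproduce. Unbounded ranges and inclusive or exclusive endpoints are deliberately left out of the sketch.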
  • Field Details

  • Constructor Details

    • MultiRowRangeFilter

      Parameters:
      list - A list of RowRange
    • MultiRowRangeFilter

      public MultiRowRangeFilter(byte[][] rowKeyPrefixes)
      Constructor for creating a MultiRowRangeFilter from multiple row key prefixes. As the MultiRowRangeFilter class Javadoc explains (see solution 1 in its description), a FilterList composed of multiple PrefixFilters is an inefficient way to scan the row keys that match a set of prefixes; this constructor provides an efficient alternative.
      Parameters:
      rowKeyPrefixes - the array of row key prefixes, each a byte array
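A prefix can be treated as a row range whose start row is the prefix itself and whose stop row is the smallest key that sorts after every key beginning with that prefix. The sketch below shows one way to compute such a stop row using only the JDK. The class PrefixRangeSketch and the method stopRowForPrefix are hypothetical names, and the code is an assumption about how a prefix can be mapped to a range, not a copy of what this constructor does internally.

```java
import java.util.Arrays;

/** Hypothetical helper showing how a row key prefix maps to a half-open range. */
class PrefixRangeSketch {
    /**
     * Returns the exclusive stop row for all keys starting with the prefix.
     * An empty array means "no upper bound" in this sketch.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // A trailing 0xFF byte cannot be incremented, so drop it and carry to the left.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // empty or all-0xFF prefix: nothing sorts after it
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // smallest key beyond every key with this prefix
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = {'a', 'b'};
        System.out.println(Arrays.toString(prefix) + " -> " + Arrays.toString(stopRowForPrefix(prefix)));
    }
}
```

With this mapping, the range for prefix "ab" is ["ab", "ac") under unsigned lexicographic byte ordering.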
  • Method Details

    • createRangeListFromRowKeyPrefixes

      private static List<MultiRowRangeFilter.RowRange> createRangeListFromRowKeyPrefixes(byte[][] rowKeyPrefixes)
    • getRowRanges

    • filterAllRemaining

      public boolean filterAllRemaining()
      Description copied from class: FilterBase
      Filters that never filter all remaining can inherit this implementation that never stops the filter early. If this returns true, the scan will terminate. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterAllRemaining in class FilterBase
      Returns:
      true to end scan, false to continue.
    • filterRowKey

      public boolean filterRowKey(Cell firstRowCell)
      Description copied from class: Filter
      Filters a row based on the row key. If this returns true, the entire row will be excluded. If false, each KeyValue in the row will be passed to Filter.filterCell(Cell) below. If Filter.filterAllRemaining() returns true, then Filter.filterRowKey(Cell) should also return true. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterRowKey in class FilterBase
      Parameters:
      firstRowCell - The first cell coming in the new row
      Returns:
      true, remove entire row, false, include the row (maybe).
    • filterKeyValue

      Deprecated.
      Description copied from class: Filter
      A way to filter based on the column family, column qualifier and/or the column value. Return code is described below. This allows filters to filter only a certain number of columns, then terminate without matching every column. If filterRowKey returns true, filterKeyValue needs to be consistent with it. filterKeyValue can assume that filterRowKey has already been called for the row. If your filter returns ReturnCode.NEXT_ROW, it should return ReturnCode.NEXT_ROW until Filter.reset() is called just in case the caller calls for the next row. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterKeyValue in class Filter
      Parameters:
      ignored - the Cell in question
      Returns:
      code as described below, Filter.ReturnCode.INCLUDE by default
    • filterCell

      public Filter.ReturnCode filterCell(Cell ignored)
      Description copied from class: Filter
      A way to filter based on the column family, column qualifier and/or the column value. Return code is described below. This allows filters to filter only a certain number of columns, then terminate without matching every column. If filterRowKey returns true, filterCell needs to be consistent with it. filterCell can assume that filterRowKey has already been called for the row. If your filter returns ReturnCode.NEXT_ROW, it should return ReturnCode.NEXT_ROW until Filter.reset() is called just in case the caller calls for the next row. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterCell in class Filter
      Parameters:
      ignored - the Cell in question
      Returns:
      code as described below
    • getNextCellHint

      public Cell getNextCellHint(Cell currentKV)
      Description copied from class: FilterBase
      Filters that are not sure which key to seek to next can inherit this implementation, which by default returns a null Cell. If the filter returns the match code SEEK_NEXT_USING_HINT, then it should also tell which key it must seek to next. After receiving the match code SEEK_NEXT_USING_HINT, the QueryMatcher calls this function to find out which key it must seek to next. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      getNextCellHint in class FilterBase
      Returns:
      The KeyValue to seek to next, or null if the filter is not sure which key to seek to next.
    • toByteArray

      public byte[] toByteArray()
      Returns the filter serialized using pb.
      Overrides:
      toByteArray in class FilterBase
      Returns:
      The filter serialized using pb
    • parseFrom

      public static MultiRowRangeFilter parseFrom(byte[] pbBytes) throws DeserializationException
      Parse a serialized representation of MultiRowRangeFilter
      Parameters:
      pbBytes - A pb serialized instance
      Returns:
      An instance of MultiRowRangeFilter
      Throws:
      DeserializationException - if an error occurred
    • areSerializedFieldsEqual

      Returns true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other. Used for testing.
      Overrides:
      areSerializedFieldsEqual in class FilterBase
      Returns:
      true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other. Used for testing.
    • sortAndMerge

      Sorts the ranges and merges any ranges that overlap.
      Parameters:
      ranges - the list of ranges to sort and merge.
      Returns:
      the ranges after sorting and merging.
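Sorting and merging can be illustrated with a small JDK-only sketch. The class RangeMergeSketch and its Range type are hypothetical, row keys are modelled as Strings, and all ranges are assumed to be valid and half-open [start, stop). The real method works on MultiRowRangeFilter.RowRange objects, and its handling of inclusive endpoints and invalid ranges is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sort-and-merge over half-open String ranges; not HBase code. */
class RangeMergeSketch {
    /** Half-open range [start, stop). Illustrative type only. */
    static final class Range {
        final String start;
        final String stop;
        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Returns a new list of ranges sorted by start, with overlapping or touching ranges merged. */
    static List<Range> sortAndMerge(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range next : sorted) {
            if (merged.isEmpty()) {
                merged.add(next);
                continue;
            }
            Range last = merged.get(merged.size() - 1);
            if (next.start.compareTo(last.stop) <= 0) {
                // Overlapping or adjacent: extend the previous range to the later stop.
                String stop = last.stop.compareTo(next.stop) >= 0 ? last.stop : next.stop;
                merged.set(merged.size() - 1, new Range(last.start, stop));
            } else {
                merged.add(next);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> input = Arrays.asList(
            new Range("m", "p"), new Range("a", "c"), new Range("b", "f"), new Range("f", "g"));
        System.out.println(sortAndMerge(input));
    }
}
```

Merging up front leaves a sorted, non-overlapping list, so a scan needs to track only one current range at a time and can binary-search or walk the list when it seeks forward.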
    • throwExceptionForInvalidRanges

      private static void throwExceptionForInvalidRanges(List<MultiRowRangeFilter.RowRange> invalidRanges, boolean details)
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object