Package org.apache.hadoop.hbase.filter
Class MultiRowRangeFilter
java.lang.Object
org.apache.hadoop.hbase.filter.Filter
org.apache.hadoop.hbase.filter.FilterBase
org.apache.hadoop.hbase.filter.MultiRowRangeFilter
Filter to support scanning multiple row key ranges. It constructs the row key ranges from the
passed list, which can be accessed by each region server. HBase is quite efficient when scanning
only one small row key range. If a user needs to specify multiple row key ranges in one scan, the
typical solutions are: 1. through FilterList, which is a list of row key Filters; 2. using a SQL
layer over HBase, such as Hive or Phoenix, to join with two tables. However, both solutions are
inefficient: neither can use the range information to fast-forward during the scan, so the scan
is quite time consuming. If the number of ranges is very large (e.g. millions), a join is a
proper solution even though it is slow. However, there are cases where the user wants to specify
a small number of ranges to scan (e.g. <1000 ranges), and neither solution provides satisfactory
performance in such cases. MultiRowRangeFilter supports this use case (scanning multiple row key
ranges): it constructs the row key ranges from a user-specified list and performs
fast-forwarding during the scan, so the scan is quite efficient.
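The fast-forwarding idea can be illustrated with a minimal, stdlib-only sketch. It is not the HBase implementation: the class `RangeSeekSketch`, its `Range` type, and the `decide` and `firstCandidate` methods are hypothetical names, and the ranges are assumed to be half-open, sorted, and non-overlapping. For each row key the scan reaches, a binary search over the ranges decides whether to include the row, seek ahead to the start of the next range, or stop.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical, stdlib-only sketch; these names are not part of HBase.
final class RangeSeekSketch {
    /** Half-open row key range [start, stop). */
    static final class Range {
        final byte[] start;
        final byte[] stop;

        Range(String start, String stop) {
            this.start = start.getBytes(StandardCharsets.UTF_8);
            this.stop = stop.getBytes(StandardCharsets.UTF_8);
        }
    }

    /** Binary search: index of the first range whose stop key is greater than row. */
    static int firstCandidate(List<Range> sorted, byte[] row) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            // Row keys compare as unsigned bytes in lexicographic order.
            if (Arrays.compareUnsigned(sorted.get(mid).stop, row) <= 0) {
                lo = mid + 1; // this range ends at or before row
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Describes what a scan could do on reaching rowKey: include it, seek ahead, or stop. */
    static String decide(List<Range> sorted, String rowKey) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        int i = firstCandidate(sorted, row);
        if (i == sorted.size()) {
            return "DONE"; // row lies past every range
        }
        Range candidate = sorted.get(i);
        if (Arrays.compareUnsigned(candidate.start, row) <= 0) {
            return "INCLUDE"; // start <= row < stop
        }
        // Row falls in a gap: skip directly to the start of the next range.
        return "SEEK:" + new String(candidate.start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(new Range("b", "d"), new Range("k", "m"));
        for (String row : List.of("a", "c", "d", "k", "m")) {
            System.out.println(row + " -> " + decide(ranges, row));
        }
    }
}
```

The seek step is what lets a range-aware scan skip the rows between ranges instead of testing every one of them.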
-
Nested Class Summary
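Modifier and Type | Class | Description
private static class | MultiRowRangeFilter.BasicRowRange |
private static class | MultiRowRangeFilter.RangeIteration | Abstraction over the ranges of rows to return from this filter, regardless of forward or reverse scans being used.
private static class | MultiRowRangeFilter.ReversedRowRange | Internal RowRange that reverses the sort-order to handle reverse scans.
static class | MultiRowRangeFilter.RowRange |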
Nested classes/interfaces inherited from class org.apache.hadoop.hbase.filter.Filter
Filter.ReturnCode
-
Field Summary
Modifier and Type | Field | Description
private Filter.ReturnCode | currentReturnCode |
private boolean | done |
private int | index |
private final List<MultiRowRangeFilter.RowRange> | rangeList |
private final MultiRowRangeFilter.RangeIteration | ranges |
private static final int | ROW_BEFORE_FIRST_RANGE |
-
Constructor Summary
Constructor | Description
MultiRowRangeFilter(List<MultiRowRangeFilter.RowRange> list) |
MultiRowRangeFilter(byte[][] rowKeyPrefixes) | Constructor for creating a MultiRowRangeFilter from multiple rowkey prefixes.
-
Method Summary
Modifier and Type | Method | Description
(package private) boolean | areSerializedFieldsEqual | Returns true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other.
private static List<MultiRowRangeFilter.RowRange> | createRangeListFromRowKeyPrefixes(byte[][] rowKeyPrefixes) |
boolean | equals |
boolean | filterAllRemaining | Filters that never filter all remaining can inherit this implementation that never stops the filter early.
Filter.ReturnCode | filterCell(Cell ignored) | A way to filter based on the column family, column qualifier and/or the column value.
Filter.ReturnCode | filterKeyValue(Cell ignored) | Deprecated.
boolean | filterRowKey(Cell firstRowCell) | Filters a row based on the row key.
Cell | getNextCellHint(Cell currentKV) | Filters that are not sure which key must be next seeked to, can inherit this implementation that, by default, returns a null Cell.
int | hashCode() |
static MultiRowRangeFilter | parseFrom(byte[] pbBytes) | Parse a serialized representation of MultiRowRangeFilter.
static List<MultiRowRangeFilter.RowRange> | sortAndMerge(List<MultiRowRangeFilter.RowRange> ranges) | Sorts the ranges and, if any ranges overlap, merges them.
private static void | throwExceptionForInvalidRanges(List<MultiRowRangeFilter.RowRange> invalidRanges, boolean details) |
byte[] | toByteArray | Returns the filter serialized using pb.
Methods inherited from class org.apache.hadoop.hbase.filter.FilterBase
createFilterFromArguments, filterRow, filterRowCells, filterRowKey, hasFilterRow, isFamilyEssential, reset, toString, transformCell
Methods inherited from class org.apache.hadoop.hbase.filter.Filter
isReversed, setReversed
-
Field Details
-
ROW_BEFORE_FIRST_RANGE
-
rangeList
-
ranges
-
done
-
index
-
range
-
currentReturnCode
-
-
Constructor Details
-
MultiRowRangeFilter
- Parameters:
list - A list of RowRange
-
MultiRowRangeFilter
Constructor for creating a MultiRowRangeFilter from multiple rowkey prefixes. As the MultiRowRangeFilter class description notes (see solution 1 in its opening paragraph), a filter list that scans the row keys matching the given prefixes (e.g., a FilterList composed of multiple PrefixFilters) is inefficient; this constructor provides a way to avoid creating one.
- Parameters:
rowKeyPrefixes - the array of byte arrays, one per row key prefix
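One way to picture how a row key prefix could map onto a row range is sketched below. This is an assumption for illustration, not a statement of what this constructor does internally: the class `PrefixRangeSketch` and the method `stopKeyForPrefix` are hypothetical names. The sketch treats a prefix as the half-open range that starts at the prefix and stops at the smallest key greater than every key beginning with that prefix.

```java
import java.util.Arrays;

// Hypothetical, stdlib-only sketch; these names are not part of HBase.
final class PrefixRangeSketch {
    /**
     * Returns the smallest key that sorts after every key starting with prefix,
     * using unsigned byte order. An empty array means "no upper bound".
     */
    static byte[] stopKeyForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // prefix was empty or all 0xFF
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = {'a', 'b'};
        // Keys with prefix "ab" fall in the half-open range ["ab", stop).
        System.out.println(Arrays.toString(stopKeyForPrefix(prefix)));
    }
}
```

Under this assumption, each prefix becomes one range, and the resulting ranges can be scanned with the same fast-forwarding as any other list of row ranges.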
-
-
Method Details
-
createRangeListFromRowKeyPrefixes
private static List<MultiRowRangeFilter.RowRange> createRangeListFromRowKeyPrefixes(byte[][] rowKeyPrefixes)
-
getRowRanges
-
filterAllRemaining
Description copied from class: FilterBase
Filters that never filter all remaining can inherit this implementation that never stops the filter early. If this returns true, the scan will terminate. Concrete implementers can signal a failure condition in their code by throwing an IOException.
- Overrides:
filterAllRemaining in class FilterBase
- Returns:
true to end scan, false to continue.
-
filterRowKey
Description copied from class: Filter
Filters a row based on the row key. If this returns true, the entire row will be excluded. If false, each KeyValue in the row will be passed to Filter.filterCell(Cell) below. If Filter.filterAllRemaining() returns true, then Filter.filterRowKey(Cell) should also return true. Concrete implementers can signal a failure condition in their code by throwing an IOException.
- Overrides:
filterRowKey in class FilterBase
- Parameters:
firstRowCell - The first cell coming in the new row
- Returns:
true to remove the entire row; false to include the row (maybe).
-
filterKeyValue
Deprecated.
Description copied from class: Filter
A way to filter based on the column family, column qualifier and/or the column value. Return code is described below. This allows filters to filter only a certain number of columns, then terminate without matching every column. If filterRowKey returns true, filterKeyValue needs to be consistent with it. filterKeyValue can assume that filterRowKey has already been called for the row. If your filter returns ReturnCode.NEXT_ROW, it should return ReturnCode.NEXT_ROW until Filter.reset() is called, just in case the caller calls for the next row. Concrete implementers can signal a failure condition in their code by throwing an IOException.
- Overrides:
filterKeyValue in class Filter
- Parameters:
ignored - the Cell in question
- Returns:
code as described below, Filter.ReturnCode.INCLUDE by default
-
filterCell
Description copied from class: Filter
A way to filter based on the column family, column qualifier and/or the column value. Return code is described below. This allows filters to filter only a certain number of columns, then terminate without matching every column. If filterRowKey returns true, filterCell needs to be consistent with it. filterCell can assume that filterRowKey has already been called for the row. If your filter returns ReturnCode.NEXT_ROW, it should return ReturnCode.NEXT_ROW until Filter.reset() is called, just in case the caller calls for the next row. Concrete implementers can signal a failure condition in their code by throwing an IOException.
- Overrides:
filterCell in class Filter
- Parameters:
ignored - the Cell in question
- Returns:
code as described below
-
getNextCellHint
Description copied from class: FilterBase
Filters that are not sure which key to seek to next can inherit this implementation, which by default returns a null Cell. If the filter returns the match code SEEK_NEXT_USING_HINT, then it should also tell which key it must seek to next. After receiving the match code SEEK_NEXT_USING_HINT, the QueryMatcher calls this function to find out which key it must seek to next. Concrete implementers can signal a failure condition in their code by throwing an IOException.
- Overrides:
getNextCellHint in class FilterBase
- Returns:
the KeyValue to seek to next, or null if the filter is not sure which key to seek to next.
-
toByteArray
Returns the filter serialized using pb.
- Overrides:
toByteArray in class FilterBase
- Returns:
The filter serialized using pb
-
parseFrom
Parse a serialized representation of MultiRowRangeFilter.
- Parameters:
pbBytes - A pb serialized instance
- Returns:
An instance of MultiRowRangeFilter
- Throws:
DeserializationException - if an error occurred
-
areSerializedFieldsEqual
Returns true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other. Used for testing.
- Overrides:
areSerializedFieldsEqual in class FilterBase
- Returns:
true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other.
-
sortAndMerge
public static List<MultiRowRangeFilter.RowRange> sortAndMerge(List<MultiRowRangeFilter.RowRange> ranges)
Sorts the ranges and, if any ranges overlap, merges them.
- Parameters:
ranges - the list of ranges to sort and merge.
- Returns:
the ranges after sort and merge.
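The sort-then-merge step can be illustrated with a minimal, stdlib-only sketch. It is not the HBase implementation: `MergeSketch` and its `Range` type are hypothetical, String keys stand in for byte[] row keys, and every range is assumed to be half-open, so start/stop inclusiveness flags and empty (unbounded) keys are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, stdlib-only sketch; these names are not part of HBase.
final class MergeSketch {
    /** Half-open range [start, stop); String keys stand in for byte[] row keys. */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + stop + ")";
        }
    }

    /** Sorts by start key, then folds each overlapping or touching range into its predecessor. */
    static List<Range> sortAndMerge(List<Range> ranges) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparing((Range r) -> r.start));
        List<Range> merged = new ArrayList<>();
        for (Range next : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && next.start.compareTo(merged.get(lastIndex).stop) <= 0) {
                Range last = merged.get(lastIndex);
                // Extend the previous range only if this one reaches further.
                if (next.stop.compareTo(last.stop) > 0) {
                    merged.set(lastIndex, new Range(last.start, next.stop));
                }
            } else {
                merged.add(next); // gap before this range: start a new one
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> input = List.of(
            new Range("k", "m"), new Range("a", "c"), new Range("b", "f"), new Range("f", "g"));
        System.out.println(sortAndMerge(input));
    }
}
```

A sorted, non-overlapping list is what makes a binary search over the ranges valid during the scan.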
-
throwExceptionForInvalidRanges
private static void throwExceptionForInvalidRanges(List<MultiRowRangeFilter.RowRange> invalidRanges, boolean details)
-
equals
-
hashCode
-