Class FuzzyRowFilter

All Implemented Interfaces:
HintingFilter

@Public public class FuzzyRowFilter extends FilterBase implements HintingFilter
This is an optimized version of the standard FuzzyRowFilter. It filters data based on a fuzzy row key and performs fast-forwards during scanning. It takes pairs (row key, fuzzy info) to match row keys, where fuzzy info is a byte array with 0 or 1 as its values:
  • 0 - means that this byte in the provided row key is fixed, i.e. the row key's byte at the same position must match
  • 1 - means that this byte in the provided row key is NOT fixed, i.e. the row key's byte at this position can differ from the one in the provided row key
Example:

Let's assume the row key format is userId_actionId_year_month. The length of userId is fixed at 4 bytes, the length of actionId is 2 bytes, and year and month are 4 and 2 bytes long respectively.

Let's assume that we need to fetch all users that performed a certain action (encoded as "99") in January of any year. Then the pair (row key, fuzzy info) would be the following:

 row key = "????_99_????_01" (one can use any value instead of "?")
 fuzzy info = "\x01\x01\x01\x01\x00\x00\x00\x00\x01\x01\x01\x01\x00\x00\x00"
 
I.e. the fuzzy info says that the matching mask is "????_99_????_01", where ? can be any value.
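The fuzzy info in this example can be derived mechanically from the template by marking every "?" as 1 and every other byte as 0. The sketch below assumes ASCII row keys; the class FuzzyTemplate is a hypothetical helper, not part of HBase. A list of such (row key, fuzzy info) pairs is the format that getFuzzyKeys describes as expected by the constructor.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of HBase): derives the (row key, fuzzy info) pair from a template. */
final class FuzzyTemplate {
  // Assumption: '?' marks a byte that may vary; every other character is fixed.
  static final char WILDCARD = '?';

  /** Row key bytes: the template itself, the '?' bytes being mere placeholders. */
  static byte[] rowKey(String template) {
    return template.getBytes(StandardCharsets.US_ASCII);
  }

  /** Fuzzy info: 1 where the template has '?', 0 (fixed) everywhere else. */
  static byte[] fuzzyInfo(String template) {
    byte[] info = new byte[template.length()];
    for (int i = 0; i < template.length(); i++) {
      info[i] = (byte) (template.charAt(i) == WILDCARD ? 1 : 0);
    }
    return info;
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (byte b : fuzzyInfo("????_99_????_01")) {
      sb.append(String.format("\\x%02x", b)); // render each byte as \xNN
    }
    System.out.println(sb);
  }
}
```

Running main prints the same fuzzy info string as the example above.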
  • Field Details

    • UNSAFE_UNALIGNED

      private static final boolean UNSAFE_UNALIGNED
    • fuzzyKeysData

      private final List<Pair<byte[],byte[]>> fuzzyKeysData
    • filterRow

      private boolean filterRow
    • done

      private boolean done
    • lastFoundIndex

      private int lastFoundIndex
      The index of the last successfully found matching fuzzy string (in fuzzyKeysData). We start matching the next KV with this one. If they do not match, we fall back to the one-by-one iteration over fuzzyKeysData.
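      The lookup order described for lastFoundIndex can be sketched as follows. This is one possible implementation, not HBase's code: generic predicates stand in for fuzzy keys, and the hypothetical class LastFoundIndexLookup starts at the last matching rule and then visits the remaining rules in turn, wrapping around so that each rule is checked once.

```java
import java.util.List;
import java.util.function.Predicate;

/** Sketch (not HBase code): try the last matching rule first, then fall back to the others. */
final class LastFoundIndexLookup<T> {
  private final List<Predicate<T>> rules;
  private int lastFoundIndex = -1; // -1: nothing matched yet, or the last lookup missed

  LastFoundIndexLookup(List<Predicate<T>> rules) {
    this.rules = rules;
  }

  /** Returns the index of a matching rule, or -1 if no rule matches. */
  int find(T value) {
    int size = rules.size();
    int start = lastFoundIndex >= 0 ? lastFoundIndex : 0;
    // Visit start, start + 1, ... and wrap around, so every rule is tried exactly once.
    for (int step = 0; step < size; step++) {
      int index = (start + step) % size;
      if (rules.get(index).test(value)) {
        lastFoundIndex = index; // remember it for the next value
        return index;
      }
    }
    lastFoundIndex = -1;
    return -1;
  }

  int lastFoundIndex() {
    return lastFoundIndex;
  }

  public static void main(String[] args) {
    List<Predicate<String>> rules = List.of(s -> s.startsWith("a"), s -> s.startsWith("b"));
    LastFoundIndexLookup<String> lookup = new LastFoundIndexLookup<>(rules);
    System.out.println(lookup.find("bx")); // 1, after checking rule 0 first
    System.out.println(lookup.find("by")); // 1, found on the very first check
    System.out.println(lookup.find("zz")); // -1
  }
}
```

      Consecutive rows in a scan often match the same fuzzy key, which is why starting from the previous hit tends to save comparisons.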
    • tracker

      Row tracker (keeps all next rows after SEEK_NEXT_USING_HINT was returned)
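      The tracker can be pictured as an ordered collection of next-row candidates. Assuming a forward scan, the smallest candidate is the natural seek hint, since seeking any further could skip a matching row. The sketch below models this with a java.util.PriorityQueue ordered by unsigned byte comparison; the class NextRowTracker and its methods are hypothetical and do not mirror HBase's internal tracker.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical sketch: keeps next-row candidates and exposes the smallest one first. */
final class NextRowTracker {
  // Row keys compare as unsigned bytes, lexicographically; a proper prefix sorts first.
  private final PriorityQueue<byte[]> candidates =
      new PriorityQueue<byte[]>(Arrays::compareUnsigned);

  /** Records a candidate; null means "no next row for this rule" and is ignored. */
  void add(byte[] nextRow) {
    if (nextRow != null) {
      candidates.add(nextRow);
    }
  }

  /** The row a forward scanner would seek to next, or null when none is left. */
  byte[] nextRow() {
    return candidates.peek();
  }

  /** Removes and returns the current smallest candidate. */
  byte[] poll() {
    return candidates.poll();
  }

  boolean isEmpty() {
    return candidates.isEmpty();
  }

  public static void main(String[] args) {
    NextRowTracker tracker = new NextRowTracker();
    tracker.add(new byte[] {2});
    tracker.add(new byte[] {1, 5});
    tracker.add(new byte[] {1});
    System.out.println(Arrays.toString(tracker.nextRow())); // [1]
  }
}
```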
  • Constructor Details

  • Method Details

    • preprocessSearchKey

      private void preprocessSearchKey(Pair<byte[],byte[]> p)
    • preprocessMask

      private byte[] preprocessMask(byte[] mask)
      We need to preprocess the mask array because we treat 2s as unfixed positions and -1 (0xff) as fixed positions.
      Returns:
      mask array
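      A minimal sketch of that conversion, assuming (as the description implies) that each 0 (fixed) becomes -1 (0xff) and each 1 (unfixed) becomes 2. The check modeled on isPreprocessedMask keeps the conversion from being applied twice. The class MaskPreprocessing is hypothetical, not HBase's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the mask conversion described for preprocessMask. */
final class MaskPreprocessing {
  static final byte FIXED = -1;  // 0xff
  static final byte UNFIXED = 2;

  /** True if every byte already uses the converted encoding. */
  static boolean isPreprocessedMask(byte[] mask) {
    for (byte b : mask) {
      if (b != FIXED && b != UNFIXED) {
        return false;
      }
    }
    return true;
  }

  /** Converts 0 -> -1 and 1 -> 2 in place; an already converted mask is returned unchanged. */
  static byte[] preprocessMask(byte[] mask) {
    if (isPreprocessedMask(mask)) {
      return mask;
    }
    for (int i = 0; i < mask.length; i++) {
      if (mask[i] == 0) {
        mask[i] = FIXED;
      } else if (mask[i] == 1) {
        mask[i] = UNFIXED;
      }
    }
    return mask;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(preprocessMask(new byte[] {0, 1, 0}))); // [-1, 2, -1]
  }
}
```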
    • isPreprocessedMask

      private boolean isPreprocessedMask(byte[] mask)
    • getFuzzyKeys

      public List<Pair<byte[],byte[]>> getFuzzyKeys()
      Returns the Fuzzy keys in the format expected by the constructor.
      Returns:
      the Fuzzy keys in the format expected by the constructor
    • reset

      public void reset() throws IOException
      Description copied from class: FilterBase
      Filters that are purely stateless and do nothing in their reset() methods can inherit this null/empty implementation. Reset the state of the filter between rows. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      reset in class FilterBase
      Throws:
      IOException - in case an I/O or a filter-specific failure needs to be signaled.
    • filterRow

      public boolean filterRow() throws IOException
      Description copied from class: FilterBase
      Filters that never filter rows based on state previously gathered from Filter.filterCell(Cell) can inherit this implementation, which never filters a row. Last chance to veto a row based on previous Filter.filterCell(Cell) calls. A filter that wishes to exclude a row when, for example, a certain column is missing needs to retain state and then return the appropriate value from this call. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterRow in class FilterBase
      Returns:
      true to exclude row, false to include row.
      Throws:
      IOException - in case an I/O or a filter-specific failure needs to be signaled.
    • filterCell

      Description copied from class: Filter
      A way to filter based on the column family, column qualifier and/or the column value. Return code is described below. This allows filters to filter only a certain number of columns, then terminate without matching every column. If filterRowKey returns true, filterCell needs to be consistent with it. filterCell can assume that filterRowKey has already been called for the row. If your filter returns ReturnCode.NEXT_ROW, it should keep returning ReturnCode.NEXT_ROW until Filter.reset() is called, in case the caller asks for the next row. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterCell in class Filter
      Parameters:
      c - the Cell in question
      Returns:
      code as described below
    • getNextCellHint

      public Cell getNextCellHint(Cell currentCell)
      Description copied from class: FilterBase
      Filters that are not sure which key to seek to next can inherit this implementation, which by default returns a null Cell. If the filter returns the match code SEEK_NEXT_USING_HINT, it should also tell which key it must seek to next. After receiving the match code SEEK_NEXT_USING_HINT, the QueryMatcher calls this function to find out which key it must seek to next. Concrete implementers can signal a failure condition in their code by throwing an IOException. NOTICE: the filter is evaluated on the server side, so the returned Cell must be an ExtendedCell, even though ExtendedCell is marked as IA.Private.
      Overrides:
      getNextCellHint in class FilterBase
      Returns:
      the KeyValue to seek to next, or null if the filter is not sure which key to seek to next.
    • filterAllRemaining

      public boolean filterAllRemaining()
      Description copied from class: FilterBase
      Filters that never filter all remaining can inherit this implementation that never stops the filter early. If this returns true, the scan will terminate. Concrete implementers can signal a failure condition in their code by throwing an IOException.
      Overrides:
      filterAllRemaining in class FilterBase
      Returns:
      true to end scan, false to continue.
    • toByteArray

      public byte[] toByteArray()
      Returns the filter serialized using protobuf (pb).
      Overrides:
      toByteArray in class FilterBase
      Returns:
      The filter serialized using pb
    • parseFrom

      public static FuzzyRowFilter parseFrom(byte[] pbBytes) throws DeserializationException
      Parse a serialized representation of FuzzyRowFilter
      Parameters:
      pbBytes - A pb serialized FuzzyRowFilter instance
      Returns:
      An instance of FuzzyRowFilter made from bytes
      Throws:
      DeserializationException - if an error occurred
    • toString

      public String toString()
      Description copied from class: FilterBase
      Return filter's info for debugging and logging purpose.
      Overrides:
      toString in class FilterBase
    • satisfies

      static FuzzyRowFilter.SatisfiesCode satisfies(byte[] row, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
    • satisfies

      static FuzzyRowFilter.SatisfiesCode satisfies(boolean reverse, byte[] row, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
    • satisfies

      static FuzzyRowFilter.SatisfiesCode satisfies(boolean reverse, byte[] row, int offset, int length, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
    • satisfiesNoUnsafe

      static FuzzyRowFilter.SatisfiesCode satisfiesNoUnsafe(boolean reverse, byte[] row, int offset, int length, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
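      A simplified byte-by-byte match check in the spirit of satisfiesNoUnsafe, using the documented convention (0 = fixed, 1 = may differ). Unlike the real methods, this hypothetical FuzzyMatch.matches returns a boolean rather than a SatisfiesCode, ignores reverse scans, and assumes that a row shorter than the fuzzy key cannot match while bytes past the end of the key are unconstrained.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified match check; not HBase's satisfies implementation. */
final class FuzzyMatch {
  /** fuzzyMeta: 0 = byte must equal fuzzyKey's byte, 1 = byte may differ. */
  static boolean matches(byte[] row, byte[] fuzzyKey, byte[] fuzzyMeta) {
    if (fuzzyMeta.length != fuzzyKey.length) {
      throw new IllegalArgumentException("fuzzy key and meta must have the same length");
    }
    // Assumption: a row shorter than the fuzzy key cannot match.
    if (row.length < fuzzyKey.length) {
      return false;
    }
    for (int i = 0; i < fuzzyKey.length; i++) {
      if (fuzzyMeta[i] == 0 && row[i] != fuzzyKey[i]) {
        return false; // a fixed position differs
      }
    }
    return true; // bytes beyond the fuzzy key are not checked
  }

  public static void main(String[] args) {
    byte[] key = "????_99_????_01".getBytes(StandardCharsets.US_ASCII);
    byte[] meta = {1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0};
    byte[] row = "1234_99_2020_01".getBytes(StandardCharsets.US_ASCII);
    System.out.println(matches(row, key, meta)); // true
  }
}
```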
    • getNextForFuzzyRule

      static byte[] getNextForFuzzyRule(byte[] row, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
    • getNextForFuzzyRule

      static byte[] getNextForFuzzyRule(boolean reverse, byte[] row, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
    • getNextForFuzzyRule

      static byte[] getNextForFuzzyRule(boolean reverse, byte[] row, int offset, int length, byte[] fuzzyKeyBytes, byte[] fuzzyKeyMeta)
      Finds the closest next byte array that satisfies the fuzzy rule and comes after the given one. In the reverse case, it returns an increased byte array to make sure that the proper row is selected next.
      Returns:
      byte array which is after the given row and which satisfies the fuzzy rule if it exists, null otherwise
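      The idea of "the closest next row that satisfies the fuzzy rule" can be illustrated with a simplified, forward-only sketch. Assumptions, not HBase behavior: row, fuzzy key and meta all have the same length, meta uses 0 = fixed and 1 = may differ, and bytes compare as unsigned values. The real method also handles reverse scans, offsets and differing lengths, and trims trailing zeroes; the hypothetical FuzzyNextRow does none of that. It keeps the longest possible prefix of the row, raises one byte, and fills the rest with the smallest allowed values.

```java
import java.util.Arrays;

/** Hypothetical forward-only sketch; not HBase's getNextForFuzzyRule. */
final class FuzzyNextRow {
  /** Smallest array of the same length that is > row and satisfies the rule, or null. */
  static byte[] nextForFuzzyRule(byte[] row, byte[] fuzzyKey, byte[] fuzzyMeta) {
    int n = fuzzyKey.length;
    if (row.length != n || fuzzyMeta.length != n) {
      throw new IllegalArgumentException("sketch expects equal lengths");
    }
    // Length of the longest prefix of row that violates no fixed position.
    int okPrefix = 0;
    while (okPrefix < n && (fuzzyMeta[okPrefix] == 1 || row[okPrefix] == fuzzyKey[okPrefix])) {
      okPrefix++;
    }
    // Try to raise the byte at p, rightmost first, keeping row[0..p) unchanged.
    for (int p = Math.min(okPrefix, n - 1); p >= 0; p--) {
      int current = row[p] & 0xFF;
      int raised;
      if (fuzzyMeta[p] == 0) {        // fixed: only the key's byte is allowed here
        raised = fuzzyKey[p] & 0xFF;
        if (raised <= current) continue;
      } else {                        // free: take the next larger byte value
        if (current == 0xFF) continue;
        raised = current + 1;
      }
      byte[] result = Arrays.copyOf(row, n);
      result[p] = (byte) raised;
      // Everything after p gets the smallest allowed value.
      for (int i = p + 1; i < n; i++) {
        result[i] = fuzzyMeta[i] == 0 ? fuzzyKey[i] : 0;
      }
      return result;
    }
    return null; // no larger row of this length satisfies the rule
  }

  public static void main(String[] args) {
    byte[] next = nextForFuzzyRule(new byte[] {'a', 'y', 'c'}, new byte[] {'?', 'x', '?'}, new byte[] {1, 0, 1});
    System.out.println(Arrays.toString(next)); // [98, 120, 0], i.e. "bx\0"
  }
}
```

      Raising the rightmost possible byte is what makes the result the closest one: any valid candidate that diverges from the row earlier is necessarily larger.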
    • trimTrailingZeroes

      private static byte[] trimTrailingZeroes(byte[] result, byte[] fuzzyKeyMeta, int toInc)
      For a forward scanner, the next cell hint should not contain any trailing zeroes unless they are part of fuzzyKeyMeta. For example, the hint '\x01\x01\x01\x00\x00' would skip the valid row '\x01\x01\x01', because a row that is a proper prefix of the hint sorts before it.
      Parameters:
      toInc - position of the incremented byte
      Returns:
      trimmed version of result
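      A minimal sketch of that trimming, under stated assumptions: meta uses 0 = fixed and 1 = may differ, positions beyond the meta count as free, and the byte at toInc (with everything before it) is always kept. Only zeroes at free positions are dropped, so a fixed zero byte stays. HintTrimming is a hypothetical class, not HBase's code.

```java
import java.util.Arrays;

/** Hypothetical sketch of trailing-zero trimming for a forward-scan hint. */
final class HintTrimming {
  static byte[] trimTrailingZeroes(byte[] hint, byte[] fuzzyMeta, int toInc) {
    int end = hint.length; // exclusive end of the kept range
    while (end - 1 > toInc
        && hint[end - 1] == 0
        && (end - 1 >= fuzzyMeta.length || fuzzyMeta[end - 1] == 1)) {
      end--; // drop a trailing zero at a free position
    }
    return Arrays.copyOf(hint, end);
  }

  public static void main(String[] args) {
    byte[] trimmed = trimTrailingZeroes(new byte[] {1, 1, 1, 0, 0}, new byte[] {0, 0, 0, 1, 1}, 2);
    System.out.println(Arrays.toString(trimmed)); // [1, 1, 1]
  }
}
```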
    • areSerializedFieldsEqual

      Returns true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other. Used for testing.
      Overrides:
      areSerializedFieldsEqual in class FilterBase
      Returns:
      true if and only if the fields of the filter that are serialized are equal to the corresponding fields in other. Used for testing.
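      Because the serialized fields include byte arrays, and Java's byte[] uses identity-based equals(), a comparison of fuzzy keys has to inspect array contents. The sketch below does this element by element and in list order. Map.Entry stands in for HBase's Pair<byte[],byte[]>, and FuzzyKeysEquality is a hypothetical class, not the filter's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: content-based comparison of fuzzy key lists. */
final class FuzzyKeysEquality {
  static boolean sameFuzzyKeys(List<Map.Entry<byte[], byte[]>> a,
                               List<Map.Entry<byte[], byte[]>> b) {
    if (a.size() != b.size()) {
      return false;
    }
    for (int i = 0; i < a.size(); i++) {
      Map.Entry<byte[], byte[]> x = a.get(i);
      Map.Entry<byte[], byte[]> y = b.get(i);
      // Compare the array contents, not the array references.
      if (!Arrays.equals(x.getKey(), y.getKey()) || !Arrays.equals(x.getValue(), y.getValue())) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Map.Entry<byte[], byte[]>> first = List.of(Map.entry(new byte[] {1, 2}, new byte[] {0, 1}));
    List<Map.Entry<byte[], byte[]>> second = List.of(Map.entry(new byte[] {1, 2}, new byte[] {0, 1}));
    System.out.println(first.equals(second));         // false: arrays compared by identity
    System.out.println(sameFuzzyKeys(first, second)); // true
  }
}
```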
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object