Package org.apache.hadoop.hbase.util
Class BloomFilterUtil
java.lang.Object
org.apache.hadoop.hbase.util.BloomFilterUtil
Utility methods related to BloomFilters
-
Field Summary
Modifier and TypeFieldDescriptionstatic final byte[]
Bit-value lookup array to prevent doing the same work over and overstatic final double
Used in computing the optimal Bloom filter size.static final String
private static Random
A random number generator to use for "fake lookups" when testing to estimate the ideal false positive rate.static final String
Record separator for the Bloom filter statistics human-readable string -
Constructor Summary
ModifierConstructorDescriptionprivate
Private constructor to keep this class from being instantiated. -
Method Summary
Modifier and TypeMethodDescriptionstatic double
actualErrorRate
(long maxKeys, long bitSize, int functionCount) Computes the actual error rate for the given number of elements, number of bits, and number of hash functions.(package private) static boolean
Check if bit at specified index is 1.static long
computeBitSize
(long maxKeys, double errorRate) static int
computeFoldableByteSize
(long bitSize, int foldFactor) Increases the given byte size of a Bloom filter until it can be folded by the given factor.static long
computeMaxKeys
(long bitSize, double errorRate, int hashCount) The maximum number of keys we can put into a Bloom filter of a certain size to get the given error rate, with the given number of hash functions.static boolean
contains
(byte[] buf, int offset, int length, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount) static boolean
contains
(Cell cell, ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, BloomType type) private static <T> boolean
contains
(ByteBuff bloomBuf, int bloomOffset, int bloomSize, Hash hash, int hashCount, HashKey<T> hashKey) static BloomFilterChunk
createBySize
(int byteSizeHint, double errorRate, int hashType, int foldFactor, BloomType bloomType) Creates a Bloom filter chunk of the given size.static String
formatStats
(BloomFilterBase bloomFilter) A human-readable string with statistics for the given Bloom filter.static byte[]
getBloomFilterParam
(BloomType bloomFilterType, org.apache.hadoop.conf.Configuration conf) static long
idealMaxKeys
(long bitSize, double errorRate) The maximum number of keys we can put into a Bloom filter of a certain size to maintain the given error rate, assuming the number of hash functions is chosen optimally and does not even have to be an integer (hence the "ideal" in the function name).static int
optimalFunctionCount
(int maxKeys, long bitSize) static void
setRandomGeneratorForTest
(Random random) Sets a random generator to be used for look-ups instead of computing hashes.static String
toString
(BloomFilterChunk bloomFilter)
-
Field Details
-
STATS_RECORD_SEP
Record separator for the Bloom filter statistics human-readable string- See Also:
-
LOG2_SQUARED
Used in computing the optimal Bloom filter size. This approximately equals 0.480453. -
randomGeneratorForTest
A random number generator to use for "fake lookups" when testing to estimate the ideal false positive rate. -
PREFIX_LENGTH_KEY
- See Also:
-
bitvals
Bit-value lookup array to prevent doing the same work over and over
-
-
Constructor Details
-
BloomFilterUtil
private BloomFilterUtil()Private constructor to keep this class from being instantiated.
-
-
Method Details
-
computeBitSize
- Returns:
- the number of bits for a Bloom filter than can hold the given number of keys and provide the given error rate, assuming that the optimal number of hash functions is used and it does not have to be an integer.
-
setRandomGeneratorForTest
Sets a random generator to be used for look-ups instead of computing hashes. Can be used to simulate uniformity of accesses better in a test environment. Should not be set in a real environment where correctness matters!This gets used in
contains(ByteBuff, int, int, Hash, int, HashKey)
- Parameters:
random
- The random number source to use, or null to compute actual hashes
-
idealMaxKeys
The maximum number of keys we can put into a Bloom filter of a certain size to maintain the given error rate, assuming the number of hash functions is chosen optimally and does not even have to be an integer (hence the "ideal" in the function name).- Returns:
- maximum number of keys that can be inserted into the Bloom filter
- See Also:
-
computeMaxKeys
The maximum number of keys we can put into a Bloom filter of a certain size to get the given error rate, with the given number of hash functions.- Returns:
- the maximum number of keys that can be inserted in a Bloom filter to maintain the target error rate, if the number of hash functions is provided.
-
actualErrorRate
Computes the actual error rate for the given number of elements, number of bits, and number of hash functions. Taken directly from the Wikipedia Bloom filter article.- Returns:
- the actual error rate
-
computeFoldableByteSize
Increases the given byte size of a Bloom filter until it can be folded by the given factor.- Returns:
- Foldable byte size
-
optimalFunctionCount
-
createBySize
public static BloomFilterChunk createBySize(int byteSizeHint, double errorRate, int hashType, int foldFactor, BloomType bloomType) Creates a Bloom filter chunk of the given size.- Parameters:
byteSizeHint
- the desired number of bytes for the Bloom filter bit array. Will be increased so that folding is possible.errorRate
- target false positive rate of the Bloom filterhashType
- Bloom filter hash function type- Returns:
- the new Bloom filter of the desired size
-
contains
-
contains
-
contains
-
checkBit
Check if bit at specified index is 1.- Parameters:
pos
- index of bit- Returns:
- true if bit at specified index is 1, false if 0.
-
formatStats
A human-readable string with statistics for the given Bloom filter.- Parameters:
bloomFilter
- the Bloom filter to output statistics for;- Returns:
- a string consisting of "<key>: <value>" parts separated by
STATS_RECORD_SEP
.
-
toString
-
getBloomFilterParam
public static byte[] getBloomFilterParam(BloomType bloomFilterType, org.apache.hadoop.conf.Configuration conf) throws IllegalArgumentException - Throws:
IllegalArgumentException
-