Class FilePathStringPool
java.lang.Object
org.apache.hadoop.hbase.io.hfile.bucket.FilePathStringPool
Pool of string values encoded to integer IDs for use in BlockCacheKey. Pooling avoids duplicating the file name, region and CF strings across BlockCacheKey instances. A single hfile normally has many blocks, and every block from the same file carries the same file, region and CF names. On very large BucketCache setups (for example, a file-based cache in the TB range), avoiding these repeated strings can save a few GBs of memory.
FilePathStringPool is implemented as a singleton, since the same pool should be shared by all BlockCacheKey instances as well as by the BucketCache object itself.
The ID for an encoded string is an integer. Any new String added to the pool is assigned the next available integer ID, counting up from 0, which bounds the total pool capacity at Integer.MAX_VALUE. In the event of ID exhaustion (integer overflow when ID values reach Integer.MAX_VALUE), the encode() method restarts from 0 and iterates incrementally over int values until it finds an unused ID.
Strings can be removed from the pool using the remove() method. BucketCache should call this when evicting all blocks for a given file (see BucketCache.evictFileBlocksFromCache()).
Thread-safe implementation that maintains bidirectional mappings between strings and IDs.
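The ID assignment and wrap-around behaviour described above can be illustrated with a minimal sketch. This is not the HBase implementation: the class name SketchStringPool is hypothetical, and the use of computeIfAbsent, putIfAbsent and sign-bit masking is an assumption about one reasonable way to get the documented behaviour. Only the field names mirror those listed for this class.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a bidirectional string pool; not the HBase code. */
class SketchStringPool {
  private final ConcurrentHashMap<String, Integer> stringToId = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<Integer, String> idToString = new ConcurrentHashMap<>();
  private final AtomicInteger nextId = new AtomicInteger(0);

  /** Returns the existing ID for the string, or claims the next unused one. */
  int encode(String string) {
    // computeIfAbsent runs the mapping function at most once per absent key.
    return stringToId.computeIfAbsent(string, s -> {
      // Note: this loop never ends if every non-negative int is already taken.
      while (true) {
        // Masking the sign bit makes the counter wrap to 0 after Integer.MAX_VALUE
        // instead of turning negative.
        int candidate = nextId.getAndIncrement() & Integer.MAX_VALUE;
        // putIfAbsent claims the ID only if no other string currently holds it,
        // so IDs still in use are skipped after a wrap-around.
        if (idToString.putIfAbsent(candidate, s) == null) {
          return candidate;
        }
      }
    });
  }

  /** Returns the original string, or null if the ID is not in use. */
  String decode(int id) {
    return idToString.get(id);
  }

  /** Removes both directions of the mapping; returns false if the string was absent. */
  boolean remove(String string) {
    Integer id = stringToId.remove(string);
    if (id == null) {
      return false;
    }
    idToString.remove(id);
    return true;
  }
}
```

In this sketch, encoding the same string twice returns the same ID, and an ID released by remove() becomes eligible for reuse once the counter wraps around and reaches it again.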
-
Field Summary
Fields
Modifier and Type                                 Field        Description
private final ConcurrentHashMap<Integer,String>   idToString
private static FilePathStringPool                 instance
private static final org.slf4j.Logger             LOG
private final AtomicInteger                       nextId
private final ConcurrentHashMap<String,Integer>   stringToId
-
Constructor Summary
Constructors
private FilePathStringPool()
-
Method Summary
Modifier and Type            Method                    Description
void                         clear()                   Clears all mappings from the codec.
boolean                      contains(int id)          Checks if a given string ID is already being used.
boolean                      contains(String string)   Checks if a given string has been encoded.
String                       decode(int id)            Decodes an integer ID back to its original file name.
int                          encode(String string)     Gets or creates an integer ID for the given String.
static FilePathStringPool    getInstance()
String                       getPoolStats()            Gets statistics about memory savings from string pooling.
boolean                      remove(String string)     Removes a string value and its ID from the pool.
int                          size()                    Gets the number of unique file names currently tracked.
-
Field Details
-
LOG
-
stringToId
-
idToString
-
nextId
-
instance
-
-
Constructor Details
-
FilePathStringPool
private FilePathStringPool()
-
-
Method Details
-
getInstance
-
encode
Gets or creates an integer ID for the given String.
Parameters:
string - value for the file/region/CF name.
Returns:
the integer ID encoding this string in the pool.
-
decode
Decodes an integer ID back to its original file name.
Parameters:
id - the integer ID
Returns:
the original file name, or null if not found
-
contains
Checks if a given string ID is already being used.
Parameters:
id - the integer ID to check
Returns:
true if the ID exists
-
contains
Checks if a given string has been encoded.
Parameters:
string - the value to check
Returns:
true if the string value has been encoded
-
size
Gets the number of unique file names currently tracked.
Returns:
the number of entries in the codec
-
remove
Removes a string value and its ID from the pool. This should only be called when all blocks for a file have been evicted from the cache.
Parameters:
string - the file name to remove
Returns:
true if the file name was removed, false if it wasn't present
-
clear
Clears all mappings from the codec.
-
getPoolStats
Gets statistics about memory savings from string pooling.
Returns:
a formatted string with compression statistics
-