org.apache.hadoop.hbase.io.hfile.bucket.FilePathStringPool

@Private public class FilePathStringPool extends Object

Pool of string values encoded as integer IDs for use in BlockCacheKey. Pooling avoids duplicating the file name, region and CF strings across BlockCacheKey instances. A single hfile normally has many blocks, and every block from the same file carries the same file, region and CF names. On very large BucketCache setups (i.e. a file-based cache in the TB range), avoiding these repeated strings can save a few GBs of memory.

FilePathStringPool is implemented as a singleton, since the same pool must be shared by all BlockCacheKey instances as well as by the BucketCache object itself.

The ID for an encoded string is an integer. Any new String added to the pool is assigned the next available integer ID, starting from 0 and counting upwards, which sets the total pool capacity to Integer.MAX_VALUE. In the event of ID exhaustion (integer overflow when ID values reach Integer.MAX_VALUE), the encode() method restarts from 0 and iterates over int values incrementally until it finds an unused ID.

Strings can be removed from the pool using the remove() method. BucketCache should call this when evicting all blocks for a given file (see BucketCache.evictFileBlocksFromCache()).

Thread-safe implementation that maintains bidirectional mappings between strings and IDs.
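
The ID assignment and wrap-around behaviour described above can be illustrated with a minimal sketch. This is not the HBase source: the class name SketchStringPool, the firstId constructor parameter (added only so the wrap-around can be exercised without billions of inserts) and the use of synchronized methods are assumptions made for illustration. Only the field names and the encode/decode/remove semantics are taken from this page.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical, simplified pool; NOT the actual FilePathStringPool code. */
class SketchStringPool {
  private final ConcurrentHashMap<String, Integer> stringToId = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<Integer, String> idToString = new ConcurrentHashMap<>();
  private final AtomicInteger nextId;

  /** firstId is an assumption of this sketch; the pool described here starts at 0. */
  SketchStringPool(int firstId) {
    this.nextId = new AtomicInteger(firstId);
  }

  /** Returns the existing ID for s, or assigns the next unused one. */
  synchronized int encode(String s) {
    Integer existing = stringToId.get(s);
    if (existing != null) {
      return existing;
    }
    int id;
    do {
      // Take the current candidate, then advance; wrap to 0 after Integer.MAX_VALUE.
      id = nextId.getAndUpdate(v -> v == Integer.MAX_VALUE ? 0 : v + 1);
      // Skip IDs still held by another string. A completely full pool
      // would loop forever here; the sketch does not handle that case.
    } while (idToString.containsKey(id));
    idToString.put(id, s);
    stringToId.put(s, id);
    return id;
  }

  /** Returns the string for id, or null if the ID is not in use. */
  String decode(int id) {
    return idToString.get(id);
  }

  /** Removes both directions of the mapping; false if s was not pooled. */
  synchronized boolean remove(String s) {
    Integer id = stringToId.remove(s);
    if (id == null) {
      return false;
    }
    idToString.remove(id);
    return true;
  }

  int size() {
    return stringToId.size();
  }
}
```

Constructing the sketch with firstId set to Integer.MAX_VALUE - 1 and encoding three distinct strings yields the IDs Integer.MAX_VALUE - 1, Integer.MAX_VALUE and then 0, which is the restart-from-zero behaviour the class description refers to. Whether the real class treats Integer.MAX_VALUE itself as a usable ID is not stated on this page.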

Field Summary

Fields

Modifier and Type                                  Field          Description
private final ConcurrentHashMap<Integer,String>    idToString
private static FilePathStringPool                  instance
private static final org.slf4j.Logger              LOG
private final AtomicInteger                        nextId
private final ConcurrentHashMap<String,Integer>    stringToId
Constructor Summary

Constructors

Modifier    Constructor             Description
private     FilePathStringPool()
Method Summary

Modifier and Type            Method                     Description
void                         clear()                    Clears all mappings from the pool.
boolean                      contains(int id)           Checks if a given string ID is already being used.
boolean                      contains(String string)    Checks if a given string has been encoded.
String                       decode(int id)             Decodes an integer ID back to the original string (file, region or CF name).
int                          encode(String string)      Gets or creates an integer ID for the given String.
static FilePathStringPool    getInstance()
String                       getPoolStats()             Gets statistics about memory savings from string pooling.
boolean                      remove(String string)      Removes a string value and its ID from the pool.
int                          size()                     Gets the number of unique strings currently tracked.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- LOG
  
  private static final org.slf4j.Logger LOG
- stringToId
  
  private final ConcurrentHashMap<String,Integer> stringToId
- idToString
  
  private final ConcurrentHashMap<Integer,String> idToString
- nextId
  
  private final AtomicInteger nextId
- instance
  
  private static FilePathStringPool instance
Constructor Details
- FilePathStringPool
  
  private FilePathStringPool()
Method Details
- getInstance
  
  public static FilePathStringPool getInstance()
- encode
  
  public int encode(String string)
  
  Gets or creates an integer ID for the given String.
  
  Parameters:
  
  string - value for the file/region/CF name.
  
  Returns:
  
  the integer ID encoding this string in the pool.
- decode
  
  public String decode(int id)
  
  Decodes an integer ID back to the original string (file, region or CF name).
  
  Parameters:
  
  id - the integer ID
  
  Returns:
  
  the original string, or null if not found
- contains
  
  public boolean contains(int id)
  
  Checks if a given string ID is already being used.
  
  Parameters:
  
  id - the integer ID to check
  
  Returns:
  
  true if the ID exists
- contains
  
  public boolean contains(String string)
  
  Checks if a given string has been encoded.
  
  Parameters:
  
  string - the value to check
  
  Returns:
  
  true if the string value has been encoded
- size
  
  public int size()
  
  Gets the number of unique strings currently tracked.
  
  Returns:
  
  the number of entries in the pool
- remove
  
  public boolean remove(String string)
  
  Removes a string value and its ID from the pool. This should only be called when all blocks for a file have been evicted from the cache.
  
  Parameters:
  
  string - the string value to remove
  
  Returns:
  
  true if the string was removed, false if it wasn't present
- clear
  
  public void clear()
  
  Clears all mappings from the pool.
- getPoolStats
  
  public String getPoolStats()
  
  Gets statistics about memory savings from string pooling.
  
  Returns:
  
  a formatted string with compression statistics
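
As a rough illustration of how encode, decode and remove fit together, the sketch below stores an integer ID in a cache key instead of the file name, then drops the pooled string once the file's blocks are gone. TinyPool and SketchBlockKey are hypothetical stand-ins written for this example; they are not the real FilePathStringPool or BlockCacheKey, and the real key layout is not described on this page.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the pool, just enough to run the example. */
class TinyPool {
  private final Map<String, Integer> ids = new HashMap<>();
  private final Map<Integer, String> names = new HashMap<>();
  private int next = 0;

  /** Gets or creates the ID for s. */
  synchronized int encode(String s) {
    Integer id = ids.get(s);
    if (id == null) {
      id = next++;
      ids.put(s, id);
      names.put(id, s);
    }
    return id;
  }

  /** Returns the pooled string, or null if the ID is unknown. */
  synchronized String decode(int id) {
    return names.get(id);
  }

  /** Removes the string and its ID; false if it was not present. */
  synchronized boolean remove(String s) {
    Integer id = ids.remove(s);
    if (id == null) {
      return false;
    }
    names.remove(id);
    return true;
  }
}

/** Hypothetical cache key that holds an int ID rather than a String. */
final class SketchBlockKey {
  final int fileId;   // 4 bytes per key instead of a String reference and its contents
  final long offset;  // block offset within the file

  SketchBlockKey(TinyPool pool, String fileName, long offset) {
    this.fileId = pool.encode(fileName);
    this.offset = offset;
  }

  /** Resolves the file name on demand through the shared pool. */
  String fileName(TinyPool pool) {
    return pool.decode(fileId);
  }
}
```

Two keys built for the same file name share one ID, so the name is stored once regardless of how many blocks the file has. After remove() is called for that name, decode() on the old ID returns null, which is why the removal should happen only once no remaining key refers to the file.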
