Class FilePathStringPool

java.lang.Object
org.apache.hadoop.hbase.io.hfile.bucket.FilePathStringPool

@Private public class FilePathStringPool extends Object
Pool of string values encoded to integer IDs for use in BlockCacheKey. This avoids duplicating the string values for file, region and CF names across BlockCacheKey instances. A single hfile normally has many blocks, and all blocks from the same file share the very same file, region and CF names. On very large BucketCache setups (i.e. a file-based cache with a size in the order of TBs), avoiding the repetition of these common string values for blocks from the same file can save a few GBs of memory.

The FilePathStringPool is implemented as a singleton, since the same pool should be shared by all BlockCacheKey instances, as well as by the BucketCache object itself.

The ID for an encoded string is an integer. Any new String added to the pool is assigned the next available integer ID, starting from 0 upwards. That sets the total pool capacity to Integer.MAX_VALUE. In the event of ID exhaustion (integer overflow when ID values reach Integer.MAX_VALUE), the encode() method restarts iterating over int values incrementally from 0 until it finds an unused ID.

Strings can be removed from the pool using the remove() method. BucketCache should call this when evicting all blocks for a given file (see BucketCache.evictFileBlocksFromCache()).

Thread-safe implementation that maintains bidirectional mappings between strings and IDs.
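To put the "few GBs" figure above in perspective, here is a back-of-envelope estimate. Every number in it is an assumption chosen for illustration, not a measurement: a 1 TiB cache, 64 KiB blocks, roughly 100 bytes of file/region/CF string data held per key without pooling, and three 4-byte integer IDs per key with pooling. The class and method names are hypothetical, and the (small) memory held by the pool itself is ignored.

```java
/** Hypothetical helper for a rough estimate; not part of HBase. */
final class PoolSavingsEstimate {

  /**
   * Estimates the bytes saved when each block key stores integer IDs
   * instead of its own copies of the file/region/CF strings.
   */
  static long estimateSavedBytes(long cacheBytes, long blockBytes,
      long stringBytesPerKey, long idBytesPerKey) {
    long blockCount = cacheBytes / blockBytes; // one key per cached block
    return blockCount * (stringBytesPerKey - idBytesPerKey);
  }

  public static void main(String[] args) {
    long cacheBytes = 1L << 40;    // assumed 1 TiB cache
    long blockBytes = 64L * 1024;  // assumed 64 KiB per block
    long stringBytesPerKey = 100;  // assumed un-pooled string payload per key
    long idBytesPerKey = 3 * 4;    // assumed three int IDs per key
    long saved = estimateSavedBytes(cacheBytes, blockBytes,
        stringBytesPerKey, idBytesPerKey);
    System.out.println(saved + " bytes"); // 1476395008 bytes, about 1.375 GiB
  }
}
```

Under these assumptions the cache holds about 16.8 million blocks, so even a modest per-key saving adds up to more than a gigabyte. Actual savings depend on name lengths, block sizes and JVM object layout.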

  • Method Details

    • getInstance

    • encode

      public int encode(String string)
      Gets or creates an integer ID for the given String.
      Parameters:
      string - value for the file/region/CF name.
      Returns:
      the integer ID encoding this string in the pool.
    • decode

      public String decode(int id)
      Decodes an integer ID back to its original string value (file, region or CF name).
      Parameters:
      id - the integer ID
      Returns:
      the original string value, or null if not found
    • contains

      public boolean contains(int id)
      Checks if a given integer ID is already in use.
      Parameters:
      id - the integer ID to check
      Returns:
      true if the ID exists
    • contains

      public boolean contains(String string)
      Checks if a given string has been encoded.
      Parameters:
      string - the value to check
      Returns:
      true if the string value has been encoded
    • size

      public int size()
      Gets the number of unique string values currently tracked.
      Returns:
      the number of entries in the pool
    • remove

      public boolean remove(String string)
      Removes a string value and its ID from the pool. This should only be called when all blocks for a file have been evicted from the cache.
      Parameters:
      string - the file name to remove
      Returns:
      true if the file name was removed, false if it wasn't present
    • clear

      public void clear()
      Clears all mappings from the pool.
    • getPoolStats

      public String getPoolStats()
      Gets statistics about memory savings from string pooling.
      Returns:
      a formatted string with compression statistics
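
The ID allocation described in the class overview, together with the encode/decode/remove contract listed above, can be sketched as follows. This is a minimal illustration and not the HBase source: the class name SketchStringPool, the capacity constructor parameter (added so wrap-around can be observed with a tiny pool), the coarse synchronized locking, and the exception thrown when the pool is full are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative stand-in only; this is NOT the HBase FilePathStringPool. */
final class SketchStringPool {
  // Bidirectional mappings between strings and their integer IDs.
  private final Map<String, Integer> stringToId = new HashMap<>();
  private final Map<Integer, String> idToString = new HashMap<>();
  private final int capacity; // hypothetical knob; IDs fall in [0, capacity)
  private int nextId = 0;     // next candidate ID to probe

  SketchStringPool(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Gets the existing ID for the value, or assigns the next unused one. */
  synchronized int encode(String value) {
    Integer existing = stringToId.get(value);
    if (existing != null) {
      return existing;
    }
    if (idToString.size() >= capacity) {
      // Assumed behavior for a full pool; the source text does not specify it.
      throw new IllegalStateException("pool is full");
    }
    // Probe upwards, wrapping to 0, until an unused ID is found.
    while (idToString.containsKey(nextId)) {
      nextId = (nextId + 1) % capacity;
    }
    int id = nextId;
    nextId = (nextId + 1) % capacity;
    stringToId.put(value, id);
    idToString.put(id, value);
    return id;
  }

  /** Returns the string for the ID, or null if the ID is not in use. */
  synchronized String decode(int id) {
    return idToString.get(id);
  }

  /** Removes the value and frees its ID; returns false if it was absent. */
  synchronized boolean remove(String value) {
    Integer id = stringToId.remove(value);
    if (id == null) {
      return false;
    }
    idToString.remove(id);
    return true;
  }

  synchronized int size() {
    return stringToId.size();
  }

  public static void main(String[] args) {
    SketchStringPool pool = new SketchStringPool(3);
    System.out.println(pool.encode("file-a")); // 0
    System.out.println(pool.encode("file-b")); // 1
    System.out.println(pool.encode("file-c")); // 2
    System.out.println(pool.encode("file-a")); // 0, already pooled
    pool.remove("file-b");                     // frees ID 1
    System.out.println(pool.encode("file-d")); // 1, found after wrapping past 2
    System.out.println(pool.decode(1));        // file-d
  }
}
```

The sketch guards both maps with a single lock to keep the two directions consistent; a production implementation may well use finer-grained concurrency.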