@InterfaceAudience.Private public final class HFile extends Object
The general layout of an HFile is as follows (the description below is taken from the TFile documentation but applies also to HFile):
A file is made of data blocks followed by meta data blocks (if any), a fileinfo block, a data block index, a meta data block index, and a fixed-size trailer which records the offsets at which the file changes content type.

<data blocks><meta blocks><fileinfo><data index><meta index><trailer>

Each block has a bit of magic at its start. Blocks are comprised of key/value pairs. In data blocks, both keys and values are byte arrays. In meta blocks, the key is a String and the value is a byte array. An empty file looks like this:

<fileinfo><trailer>

That is, there are no data or meta blocks present.
TODO: Do scanners need to be able to take a start and end row?
TODO: Should BlockIndex know the name of its file? Should it have a Path that points at its file, say for the case where an index lives apart from an HFile instance?
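As a usage sketch of the read path described above, the following opens an HFile with the cache-disabled `createReader(fs, path, conf)` overload and iterates its cells. This is a minimal sketch, assuming an HBase 2.x classpath; the class name and the caller-supplied path are illustrative, and the two-argument `getScanner` form shown here varies across HBase versions.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class HFileReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // caller-supplied path to an existing HFile
    // This overload creates the reader with cache configuration disabled.
    HFile.Reader reader = HFile.createReader(fs, path, conf);
    try {
      // cacheBlocks=false, pread=false; signature is version-dependent
      HFileScanner scanner = reader.getScanner(false, false);
      if (scanner.seekTo()) { // position at the first cell; false if file is empty
        do {
          Cell cell = scanner.getCell();
          System.out.println(cell);
        } while (scanner.next());
      }
    } finally {
      reader.close();
    }
  }
}
```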
| Modifier and Type | Class and Description | 
|---|---|
| static interface  | HFile.CachingBlockReader: An abstraction used by the block index. | 
| static interface  | HFile.Reader: An interface used by clients to open and iterate an HFile. | 
| static interface  | HFile.Writer: API required to write an HFile. | 
| static class  | HFile.WriterFactory: This variety of ways to construct writers is used throughout the code, and we want to be able to swap writer implementations. | 
| Modifier and Type | Field and Description | 
|---|---|
| static String | BLOOM_FILTER_DATA_KEY: Meta data block name for bloom filter bits. | 
| (package private) static LongAdder | CHECKSUM_FAILURES | 
| static LongAdder | DATABLOCK_READ_COUNT | 
| static int | DEFAULT_BYTES_PER_CHECKSUM: The number of bytes per checksum. | 
| static String | DEFAULT_COMPRESSION: Default compression name: none. | 
| static Compression.Algorithm | DEFAULT_COMPRESSION_ALGORITHM: Default compression: none. | 
| static String | FORMAT_VERSION_KEY: The configuration key for the HFile version to use for new files. | 
| (package private) static org.slf4j.Logger | LOG | 
| static int | MAX_FORMAT_VERSION: Maximum supported HFile format version. | 
| static int | MAXIMUM_KEY_LENGTH: Maximum length of a key in an HFile. | 
| (package private) static MetricsIO | metrics: Static instance for the metrics so that HFileReaders access the same instance. | 
| static int | MIN_FORMAT_VERSION: Minimum supported HFile format version. | 
| static int | MIN_FORMAT_VERSION_WITH_TAGS: Minimum HFile format version with support for persisting cell tags. | 
| static int | MIN_NUM_HFILE_PATH_LEVELS: We assume that an HFile path ends with ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this many levels of nesting. | 
| Modifier | Constructor and Description | 
|---|---|
| private  | HFile(): Shutdown constructor. | 
| Modifier and Type | Method and Description | 
|---|---|
| static void | checkFormatVersion(int version): Checks the given HFile format version, and throws an exception if invalid. | 
| static void | checkHFileVersion(org.apache.hadoop.conf.Configuration c) | 
| static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, boolean primaryReplicaReader, org.apache.hadoop.conf.Configuration conf) | 
| static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf): Creates reader with cache configuration disabled. | 
| static HFile.Reader | createReader(ReaderContext context, HFileInfo fileInfo, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf): Method returns the reader given the specified arguments. | 
| static long | getAndResetChecksumFailuresCount(): Number of checksum verification failures. | 
| static long | getChecksumFailuresCount(): Number of checksum verification failures. | 
| static int | getFormatVersion(org.apache.hadoop.conf.Configuration conf) | 
| static List<org.apache.hadoop.fs.Path> | getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir): Returns all HFiles belonging to the given region directory. | 
| static String[] | getSupportedCompressionAlgorithms(): Get names of supported compression algorithms. | 
| static HFile.WriterFactory | getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf): Returns the factory to be used to create HFile writers. | 
| static HFile.WriterFactory | getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf): Returns the factory to be used to create HFile writers. | 
| static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus): Returns true if the specified file has a valid HFile Trailer. | 
| static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path): Returns true if the specified file has a valid HFile Trailer. | 
| (package private) static int | longToInt(long l) | 
| static void | main(String[] args) | 
| static void | updateReadLatency(long latencyMillis, boolean pread) | 
| static void | updateWriteLatency(long latencyMillis) | 
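The isHFileFormat methods lend themselves to a pre-flight check before bulk loading. Below is a minimal sketch, assuming an HBase 2.x classpath; the class name and the caller-supplied directory of candidate files are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.hfile.HFile;

public class HFileCheckExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    // Check every file in a caller-supplied directory for a valid HFile trailer.
    for (FileStatus status : fs.listStatus(new Path(args[0]))) {
      boolean ok = HFile.isHFileFormat(fs, status);
      System.out.println(status.getPath() + " -> " + (ok ? "HFile" : "not an HFile"));
    }
  }
}
```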
static final org.slf4j.Logger LOG
public static final int MAXIMUM_KEY_LENGTH
public static final Compression.Algorithm DEFAULT_COMPRESSION_ALGORITHM
public static final int MIN_FORMAT_VERSION
public static final int MAX_FORMAT_VERSION
public static final int MIN_FORMAT_VERSION_WITH_TAGS
public static final String DEFAULT_COMPRESSION
public static final String BLOOM_FILTER_DATA_KEY
public static final int MIN_NUM_HFILE_PATH_LEVELS
public static final int DEFAULT_BYTES_PER_CHECKSUM
static final LongAdder CHECKSUM_FAILURES
public static final LongAdder DATABLOCK_READ_COUNT
static final MetricsIO metrics
public static final String FORMAT_VERSION_KEY
private HFile()
public static final long getAndResetChecksumFailuresCount()
public static final long getChecksumFailuresCount()
public static final void updateReadLatency(long latencyMillis, boolean pread)
public static final void updateWriteLatency(long latencyMillis)
public static int getFormatVersion(org.apache.hadoop.conf.Configuration conf)
public static final HFile.WriterFactory getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf)
Returns the factory to be used to create HFile writers. Disables block cache access for all writers created through the returned factory.

public static final HFile.WriterFactory getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf)
Returns the factory to be used to create HFile writers.

public static HFile.Reader createReader(ReaderContext context, HFileInfo fileInfo, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) throws IOException
Method returns the reader given the specified arguments.
Parameters:
context - Reader context info
fileInfo - HFile info
cacheConf - Cache configuration values, cannot be null
conf - Configuration
Throws:
IOException - If file is invalid, will throw CorruptHFileException flavored IOException

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws IOException
Creates reader with cache configuration disabled.
Parameters:
fs - filesystem
path - Path to file to read
conf - Configuration
Throws:
IOException - Will throw a CorruptHFileException (DoNotRetryIOException subtype) if hfile is corrupt/invalid

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, boolean primaryReplicaReader, org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
fs - filesystem
path - Path to file to read
cacheConf - This must not be null. See CacheConfig.CacheConfig(Configuration)
primaryReplicaReader - true if this is a reader for a primary replica
conf - Configuration
Throws:
IOException - Will throw a CorruptHFileException (DoNotRetryIOException subtype) if hfile is corrupt/invalid

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
path - Path to file to verify
Throws:
IOException - if failed to read from the underlying stream

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
fileStatus - the file to verify
Throws:
IOException - if failed to read from the underlying stream

public static String[] getSupportedCompressionAlgorithms()
Get names of supported compression algorithms.

static int longToInt(long l)

public static List<org.apache.hadoop.fs.Path> getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) throws IOException
Returns all HFiles belonging to the given region directory.
Parameters:
fs - The file system reference.
regionDir - The region directory to scan.
Throws:
IOException - When scanning the files fails.

public static void checkFormatVersion(int version) throws IllegalArgumentException
Checks the given HFile format version, and throws an exception if invalid. Note that if the version number comes from an input file and has not been verified, the caller needs to re-throw an IOException to indicate that this is not a software error, but corrupted input.
Parameters:
version - an HFile version
Throws:
IllegalArgumentException - if the version is invalid

public static void checkHFileVersion(org.apache.hadoop.conf.Configuration c)
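To round out the write path, here is a sketch of creating a file through getWriterFactoryNoCache and appending a single cell. This assumes an HBase 2.x classpath; the class name, destination path, block size, and cell contents are all illustrative. Note that cells must be appended in key order.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileContext;
import org.apache.hadoop.hbase.io.hfile.HFileContextBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class HFileWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]); // caller-supplied destination for the new HFile
    // Block size is illustrative; compression defaults to none.
    HFileContext context = new HFileContextBuilder().withBlockSize(64 * 1024).build();
    // The no-cache factory disables block cache access for this writer.
    HFile.Writer writer = HFile.getWriterFactoryNoCache(conf)
        .withPath(fs, path)
        .withFileContext(context)
        .create();
    try {
      KeyValue kv = new KeyValue(Bytes.toBytes("row1"), Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes("value1"));
      writer.append(kv); // cells must arrive in key order
    } finally {
      writer.close();
    }
  }
}
```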
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.