@InterfaceAudience.Private public final class HFile extends Object
The memory footprint of an HFile includes the following (below is taken from the TFile documentation but applies also to HFile):
The file is made of data blocks followed by meta data blocks (if any), a fileinfo block, a data block index, a meta data block index, and a fixed-size trailer which records the offsets at which the file changes content type.
<data blocks><meta blocks><fileinfo><data index><meta index><trailer>
Each block has a bit of magic at its start. Blocks are composed of key/values. In data blocks, both are byte arrays. In metadata blocks, the key is a String and the value is a byte array. An empty file looks like this:
<fileinfo><trailer>
That is, neither data nor meta blocks are present.
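The section order described above can be illustrated with a small stand-alone sketch. It does not use any HBase classes; the class name `HFileLayoutSketch` and its methods are hypothetical, and it assumes that each index is present only when the matching kind of block is present (the text only confirms this for the fully empty file).

```java
import java.util.ArrayList;
import java.util.List;

public class HFileLayoutSketch {
    /**
     * Lists the sections in the order given in the description.
     * Assumption: an index is written only if its block kind exists.
     */
    public static List<String> layout(int dataBlocks, int metaBlocks) {
        List<String> sections = new ArrayList<>();
        if (dataBlocks > 0) sections.add("data blocks");
        if (metaBlocks > 0) sections.add("meta blocks");
        sections.add("fileinfo");                  // always present
        if (dataBlocks > 0) sections.add("data index");
        if (metaBlocks > 0) sections.add("meta index");
        sections.add("trailer");                   // always present, fixed size
        return sections;
    }

    /** Renders the sections in the same angle-bracket notation as the text. */
    public static String render(List<String> sections) {
        StringBuilder sb = new StringBuilder();
        for (String s : sections) {
            sb.append('<').append(s).append('>');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(layout(3, 1)));
        // prints <data blocks><meta blocks><fileinfo><data index><meta index><trailer>
        System.out.println(render(layout(0, 0)));
        // prints <fileinfo><trailer>
    }
}
```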
TODO: Do scanners need to be able to take a start and end row?
TODO: Should BlockIndex know the name of its file? Should it have a Path that points at its file, say for the case where an index lives apart from an HFile instance?
Modifier and Type | Class and Description |
---|---|
static interface | HFile.CachingBlockReader - An abstraction used by the block index. |
static interface | HFile.Reader - An interface used by clients to open and iterate an HFile. |
static interface | HFile.Writer - API required to write an HFile. |
static class | HFile.WriterFactory - This variety of ways to construct writers is used throughout the code, and we want to be able to swap writer implementations. |
Modifier and Type | Field and Description |
---|---|
static String | BLOOM_FILTER_DATA_KEY - Meta data block name for bloom filter bits. |
(package private) static LongAdder | CHECKSUM_FAILURES |
static LongAdder | DATABLOCK_READ_COUNT |
static int | DEFAULT_BYTES_PER_CHECKSUM - The number of bytes per checksum. |
static String | DEFAULT_COMPRESSION - Default compression name: none. |
static Compression.Algorithm | DEFAULT_COMPRESSION_ALGORITHM - Default compression: none. |
static String | FORMAT_VERSION_KEY - The configuration key for the HFile version to use for new files. |
(package private) static org.slf4j.Logger | LOG |
static int | MAX_FORMAT_VERSION - Maximum supported HFile format version. |
static int | MAXIMUM_KEY_LENGTH - Maximum length of key in HFile. |
(package private) static MetricsIO | metrics - Static instance for the metrics so that HFileReaders access the same instance. |
static int | MIN_FORMAT_VERSION - Minimum supported HFile format version. |
static int | MIN_FORMAT_VERSION_WITH_TAGS - Minimum HFile format version with support for persisting cell tags. |
static int | MIN_NUM_HFILE_PATH_LEVELS - We assume that an HFile path ends with ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this many levels of nesting. |
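
The path assumption behind MIN_NUM_HFILE_PATH_LEVELS can be sketched without HBase or Hadoop classes. The sketch below is illustrative only: the class `HFilePathLevelsSketch`, its method, and the sample path are hypothetical, and the value 5 is an assumption obtained by counting the five levels named in the description (ROOT_DIR, TABLE_NAME, REGION_NAME, CF_NAME, HFILE).

```java
import java.util.Arrays;

public class HFilePathLevelsSketch {
    // Assumed value: one level per name in ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE.
    public static final int MIN_LEVELS = 5;

    /**
     * Returns the last MIN_LEVELS components of a slash-separated path,
     * or throws if the path is too shallow to be an HFile path.
     */
    public static String[] trailingLevels(String path) {
        String[] parts = Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())          // drop empty segments from leading or doubled slashes
                .toArray(String[]::new);
        if (parts.length < MIN_LEVELS) {
            throw new IllegalArgumentException("Too few path levels: " + path);
        }
        return Arrays.copyOfRange(parts, parts.length - MIN_LEVELS, parts.length);
    }

    public static void main(String[] args) {
        String[] levels = trailingLevels("/root/t1/r1/cf/f1");
        System.out.println("table=" + levels[1] + " region=" + levels[2]
                + " family=" + levels[3] + " file=" + levels[4]);
        // prints table=t1 region=r1 family=cf file=f1
    }
}
```

A path with fewer than five components, such as `/t1/f1`, is rejected with an `IllegalArgumentException` in this sketch.
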
Modifier | Constructor and Description |
---|---|
private | HFile() - Shutdown constructor. |
Modifier and Type | Method and Description |
---|---|
static void | checkFormatVersion(int version) - Checks the given HFile format version, and throws an exception if invalid. |
static void | checkHFileVersion(org.apache.hadoop.conf.Configuration c) |
static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, boolean primaryReplicaReader, org.apache.hadoop.conf.Configuration conf) |
static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) - Creates reader with cache configuration disabled. |
static HFile.Reader | createReader(ReaderContext context, HFileInfo fileInfo, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) - Method returns the reader given the specified arguments. |
static long | getAndResetChecksumFailuresCount() - Number of checksum verification failures. |
static long | getChecksumFailuresCount() - Number of checksum verification failures. |
static int | getFormatVersion(org.apache.hadoop.conf.Configuration conf) |
static List<org.apache.hadoop.fs.Path> | getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) - Returns all HFiles belonging to the given region directory. |
static String[] | getSupportedCompressionAlgorithms() - Get names of supported compression algorithms. |
static HFile.WriterFactory | getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf) - Returns the factory to be used to create HFile writers. |
static HFile.WriterFactory | getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf) - Returns the factory to be used to create HFile writers. |
static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus) - Returns true if the specified file has a valid HFile Trailer. |
static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) - Returns true if the specified file has a valid HFile Trailer. |
(package private) static int | longToInt(long l) |
static void | main(String[] args) |
static void | updateReadLatency(long latencyMillis, boolean pread) |
static void | updateWriteLatency(long latencyMillis) |
static final org.slf4j.Logger LOG
public static final int MAXIMUM_KEY_LENGTH
public static final Compression.Algorithm DEFAULT_COMPRESSION_ALGORITHM
public static final int MIN_FORMAT_VERSION
public static final int MAX_FORMAT_VERSION
public static final int MIN_FORMAT_VERSION_WITH_TAGS
public static final String DEFAULT_COMPRESSION
public static final String BLOOM_FILTER_DATA_KEY
public static final int MIN_NUM_HFILE_PATH_LEVELS
public static final int DEFAULT_BYTES_PER_CHECKSUM
static final LongAdder CHECKSUM_FAILURES
public static final LongAdder DATABLOCK_READ_COUNT
static final MetricsIO metrics
public static final String FORMAT_VERSION_KEY
private HFile()
public static final long getAndResetChecksumFailuresCount()
public static final long getChecksumFailuresCount()
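
The difference between the two checksum-failure getters can be sketched with a plain `java.util.concurrent.atomic.LongAdder`, the type of the CHECKSUM_FAILURES field. This is a minimal sketch, not the HBase implementation: the class `ChecksumCounterSketch` is hypothetical, and it assumes, as the method names suggest, that one getter only reads the count while the other reads and clears it.

```java
import java.util.concurrent.atomic.LongAdder;

public class ChecksumCounterSketch {
    // Stand-in for a shared failure counter; LongAdder is cheap to update from many threads.
    public static final LongAdder FAILURES = new LongAdder();

    /** Reads the current count without changing it. */
    public static long getCount() {
        return FAILURES.sum();
    }

    /** Reads the current count and resets the counter to zero. */
    public static long getAndResetCount() {
        return FAILURES.sumThenReset();
    }

    public static void main(String[] args) {
        FAILURES.increment();
        FAILURES.increment();
        FAILURES.increment();
        System.out.println(getCount());          // prints 3
        System.out.println(getAndResetCount());  // prints 3
        System.out.println(getCount());          // prints 0
    }
}
```

Note that `LongAdder.sumThenReset()` is not an atomic snapshot under concurrent updates, so a reporting loop built this way may attribute an increment to the next interval.
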
public static final void updateReadLatency(long latencyMillis, boolean pread)
public static final void updateWriteLatency(long latencyMillis)
public static int getFormatVersion(org.apache.hadoop.conf.Configuration conf)
public static final HFile.WriterFactory getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf)
Returns the factory to be used to create HFile writers. Disables block cache access for all writers created through the returned factory.

public static final HFile.WriterFactory getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf)
Returns the factory to be used to create HFile writers.

public static HFile.Reader createReader(ReaderContext context, HFileInfo fileInfo, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) throws IOException
Method returns the reader given the specified arguments.
Parameters:
context - Reader context info
fileInfo - HFile info
cacheConf - Cache configuration values, cannot be null.
conf - Configuration
Throws:
IOException - If the file is invalid, will throw a CorruptHFileException flavored IOException

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, org.apache.hadoop.conf.Configuration conf) throws IOException
Creates reader with cache configuration disabled.
Parameters:
fs - filesystem
path - Path to file to read
conf - Configuration
Throws:
IOException - Will throw a CorruptHFileException (DoNotRetryIOException subtype) if the hfile is corrupt/invalid.

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, boolean primaryReplicaReader, org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
fs - filesystem
path - Path to file to read
cacheConf - This must not be null. See CacheConfig.CacheConfig(Configuration).
primaryReplicaReader - true if this is a reader for the primary replica
conf - Configuration
Throws:
IOException - Will throw a CorruptHFileException (DoNotRetryIOException subtype) if the hfile is corrupt/invalid.

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
path - Path to file to verify
Throws:
IOException - if failed to read from the underlying stream

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
fileStatus - the file to verify
Throws:
IOException - if failed to read from the underlying stream

public static String[] getSupportedCompressionAlgorithms()
Get names of supported compression algorithms.

static int longToInt(long l)

public static List<org.apache.hadoop.fs.Path> getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) throws IOException
Returns all HFiles belonging to the given region directory.
Parameters:
fs - The file system reference.
regionDir - The region directory to scan.
Throws:
IOException - When scanning the files fails.

public static void checkFormatVersion(int version) throws IllegalArgumentException
Checks the given HFile format version, and throws an exception if invalid. Note that if the version number comes from an input file and has not been verified, the caller needs to re-throw an IOException to indicate that this is not a software error, but corrupted input.
Parameters:
version - an HFile version
Throws:
IllegalArgumentException - if the version is invalid

public static void checkHFileVersion(org.apache.hadoop.conf.Configuration c)
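
The note on checkFormatVersion about re-throwing an IOException for unverified input can be sketched as follows. This is a stand-alone illustration rather than HBase code: `FormatVersionSketch`, its local `checkFormatVersion` stand-in, and `validateVersionFromFile` are hypothetical, and the bounds 2 and 3 are assumed values for the minimum and maximum supported versions.

```java
import java.io.IOException;

public class FormatVersionSketch {
    // Assumed bounds, used only for illustration.
    public static final int MIN_VERSION = 2;
    public static final int MAX_VERSION = 3;

    /** Local stand-in for a version check that signals a programming error. */
    public static void checkFormatVersion(int version) {
        if (version < MIN_VERSION || version > MAX_VERSION) {
            throw new IllegalArgumentException("Invalid HFile version: " + version
                    + " (expected " + MIN_VERSION + " to " + MAX_VERSION + ")");
        }
    }

    /**
     * Validates a version number that was read from a file. An invalid value
     * here means corrupted input, so the unchecked exception is converted
     * into an IOException with the original as its cause.
     */
    public static int validateVersionFromFile(int versionFromFile) throws IOException {
        try {
            checkFormatVersion(versionFromFile);
            return versionFromFile;
        } catch (IllegalArgumentException e) {
            throw new IOException("Corrupt input: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(validateVersionFromFile(3)); // prints 3
            validateVersionFromFile(99);                    // throws IOException
        } catch (IOException e) {
            System.out.println(e.getMessage());
            // prints Corrupt input: Invalid HFile version: 99 (expected 2 to 3)
        }
    }
}
```

Keeping the original exception as the cause preserves the detail message for whoever diagnoses the corrupt file.
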
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.