@InterfaceAudience.Private public class HFile extends Object
The memory footprint of a HFile includes the following (below is taken from the TFile documentation but applies also to HFile):
File is made of data blocks followed by meta data blocks (if any), a fileinfo block, a data block index, a meta data block index, and a fixed-size trailer which records the offsets at which the file changes content type.

<data blocks><meta blocks><fileinfo><data index><meta index><trailer>

Each block has a bit of magic at its start. Blocks are comprised of key/value pairs. In data blocks, both key and value are byte arrays. Metadata blocks have a String key and a byte array value. An empty file looks like this:

<fileinfo><trailer>

That is, neither data nor meta blocks are present.
TODO: Do scanners need to be able to take a start and end row?
TODO: Should BlockIndex know the name of its file? Should it have a Path that points at its file, say for the case where an index lives apart from an HFile instance?
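Because the trailer is fixed-size and sits at the end of the file, a reader can locate the indexes by seeking to fileLength - trailerSize and parsing from there, without scanning the data blocks. A minimal stand-alone sketch of that pattern using plain java.io (the 16-byte trailer size and the magic string are illustrative placeholders, not HFile's actual trailer format):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class TrailerSketch {
    // Illustrative fixed trailer size; HFile's real trailer is larger and versioned.
    static final int TRAILER_SIZE = 16;

    // Read the last TRAILER_SIZE bytes of a file, mimicking how a fixed-size
    // trailer lets a reader find the index offsets with a single seek.
    static byte[] readTrailer(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            if (raf.length() < TRAILER_SIZE) {
                throw new IOException("File too short to contain a trailer");
            }
            raf.seek(raf.length() - TRAILER_SIZE); // jump straight to the trailer
            byte[] trailer = new byte[TRAILER_SIZE];
            raf.readFully(trailer);
            return trailer;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("trailer", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.write("datadatadata".getBytes(StandardCharsets.US_ASCII)); // stand-in for data blocks
            byte[] t = new byte[TRAILER_SIZE];                             // stand-in trailer, zero-padded
            byte[] magic = "TRAILERBLK".getBytes(StandardCharsets.US_ASCII);
            System.arraycopy(magic, 0, t, 0, magic.length);
            raf.write(t);
        }
        byte[] trailer = readTrailer(f);
        System.out.println(new String(trailer, 0, 10, StandardCharsets.US_ASCII)); // prints "TRAILERBLK"
    }
}
```

The real trailer additionally records the format version, so a reader first parses the trailer and only then knows how to interpret the rest of the file.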
Modifier and Type | Class and Description |
---|---|
static interface | HFile.CachingBlockReader: An abstraction used by the block index. |
static class | HFile.FileInfo: Metadata for this file. |
static interface | HFile.Reader: An interface used by clients to open and iterate an HFile. |
static interface | HFile.Writer: API required to write an HFile. |
static class | HFile.WriterFactory: This variety of ways to construct writers is used throughout the code, and we want to be able to swap writer implementations. |
Modifier and Type | Field and Description |
---|---|
static String | BLOOM_FILTER_DATA_KEY: Meta data block name for bloom filter bits. |
(package private) static AtomicLong | checksumFailures |
static AtomicLong | dataBlockReadCnt |
static int | DEFAULT_BYTES_PER_CHECKSUM: The number of bytes per checksum. |
static String | DEFAULT_COMPRESSION: Default compression name: none. |
static Compression.Algorithm | DEFAULT_COMPRESSION_ALGORITHM: Default compression: none. |
static String | FORMAT_VERSION_KEY: The configuration key for the HFile version to use for new files. |
(package private) static org.apache.commons.logging.Log | LOG |
static int | MAX_FORMAT_VERSION: Maximum supported HFile format version. |
static int | MAXIMUM_KEY_LENGTH: Maximum length of key in HFile. |
static int | MIN_FORMAT_VERSION: Minimum supported HFile format version. |
static int | MIN_FORMAT_VERSION_WITH_TAGS: Minimum HFile format version with support for persisting cell tags. |
static int | MIN_NUM_HFILE_PATH_LEVELS: We assume that an HFile path ends with ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE, so it has at least this many levels of nesting. |
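The MIN_NUM_HFILE_PATH_LEVELS assumption above (ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE) amounts to a minimum-depth check on a path. A stand-alone sketch of such a check (the constant value 5 and the helper name are illustrative, not taken from the HBase source, which operates on Hadoop Path objects rather than strings):

```java
public class PathDepthSketch {
    // Illustrative: ROOT_DIR/TABLE_NAME/REGION_NAME/CF_NAME/HFILE is 5 components.
    static final int MIN_NUM_HFILE_PATH_LEVELS = 5;

    // Count non-empty path components and compare against the minimum nesting.
    static boolean looksLikeHFilePath(String path) {
        int levels = 0;
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                levels++;
            }
        }
        return levels >= MIN_NUM_HFILE_PATH_LEVELS;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHFilePath("/hbase/t1/region1/cf/hfile1")); // prints "true"
        System.out.println(looksLikeHFilePath("/hbase/t1"));                   // prints "false"
    }
}
```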
Constructor and Description |
---|
HFile() |
Modifier and Type | Method and Description |
---|---|
static void | checkFormatVersion(int version): Checks the given HFile format version, and throws an exception if invalid. |
static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) |
static HFile.Reader | createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, FSDataInputStreamWrapper fsdis, long size, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf): The sockets and file descriptors held by the passed FSDataInputStreamWrapper will be freed after use, so the caller must ensure that no other threads have access to the same reference. |
(package private) static HFile.Reader | createReaderFromStream(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataInputStream fsdis, long size, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf): This factory method is used only by unit tests. |
static long | getChecksumFailuresCount(): Number of checksum verification failures. |
static int | getFormatVersion(org.apache.hadoop.conf.Configuration conf) |
(package private) static List<org.apache.hadoop.fs.Path> | getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir): Returns all HFiles belonging to the given region directory. |
static String[] | getSupportedCompressionAlgorithms(): Get names of supported compression algorithms. |
static HFile.WriterFactory | getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf): Returns the factory to be used to create HFile writers. |
static HFile.WriterFactory | getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf): Returns the factory to be used to create HFile writers. |
static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus): Returns true if the specified file has a valid HFile Trailer. |
static boolean | isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path): Returns true if the specified file has a valid HFile Trailer. |
static boolean | isReservedFileInfoKey(byte[] key): Return true if the given file info key is reserved for internal use. |
(package private) static int | longToInt(long l) |
static void | main(String[] args) |
private static HFile.Reader | openReader(org.apache.hadoop.fs.Path path, FSDataInputStreamWrapper fsdis, long size, CacheConfig cacheConf, HFileSystem hfs, org.apache.hadoop.conf.Configuration conf): Returns the reader given the specified arguments. |
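checkFormatVersion above is a range check against the minimum and maximum supported format versions, and its detail section notes that when the version number came from an input file, the caller should re-throw as an IOException so the failure is classified as corrupt input. A stand-alone sketch of both halves (the bounds 2 and 3 are illustrative placeholders, not the actual MIN_FORMAT_VERSION/MAX_FORMAT_VERSION values, and the helper names are hypothetical):

```java
import java.io.IOException;

public class FormatVersionSketch {
    // Illustrative bounds standing in for MIN_FORMAT_VERSION / MAX_FORMAT_VERSION.
    static final int MIN_FORMAT_VERSION = 2;
    static final int MAX_FORMAT_VERSION = 3;

    // Mirrors the documented contract: throw IllegalArgumentException when invalid.
    static void checkFormatVersion(int version) {
        if (version < MIN_FORMAT_VERSION || version > MAX_FORMAT_VERSION) {
            throw new IllegalArgumentException("Invalid HFile version: " + version
                + " (expected to be between " + MIN_FORMAT_VERSION
                + " and " + MAX_FORMAT_VERSION + ")");
        }
    }

    // When the version was read from a file, re-throw as IOException so the
    // failure reads as corrupted input rather than a software error.
    static void checkVersionFromFile(int versionReadFromFile) throws IOException {
        try {
            checkFormatVersion(versionReadFromFile);
        } catch (IllegalArgumentException e) {
            throw new IOException("Corrupted trailer: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) throws IOException {
        checkFormatVersion(2);        // valid: returns normally
        try {
            checkVersionFromFile(99); // invalid value "read from a file"
        } catch (IOException e) {
            System.out.println("corrupt input: " + e.getMessage());
        }
    }
}
```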
static final org.apache.commons.logging.Log LOG
public static final int MAXIMUM_KEY_LENGTH
public static final Compression.Algorithm DEFAULT_COMPRESSION_ALGORITHM
public static final int MIN_FORMAT_VERSION
public static final int MAX_FORMAT_VERSION
public static final int MIN_FORMAT_VERSION_WITH_TAGS
public static final String DEFAULT_COMPRESSION
public static final String BLOOM_FILTER_DATA_KEY
public static final int MIN_NUM_HFILE_PATH_LEVELS
public static final int DEFAULT_BYTES_PER_CHECKSUM
static final AtomicLong checksumFailures
public static final AtomicLong dataBlockReadCnt
public static final String FORMAT_VERSION_KEY
public static final long getChecksumFailuresCount()
public static int getFormatVersion(org.apache.hadoop.conf.Configuration conf)
public static final HFile.WriterFactory getWriterFactoryNoCache(org.apache.hadoop.conf.Configuration conf)
Returns the factory to be used to create HFile writers. Disables block cache access for all writers created through the returned factory.

public static final HFile.WriterFactory getWriterFactory(org.apache.hadoop.conf.Configuration conf, CacheConfig cacheConf)
Returns the factory to be used to create HFile writers.

private static HFile.Reader openReader(org.apache.hadoop.fs.Path path, FSDataInputStreamWrapper fsdis, long size, CacheConfig cacheConf, HFileSystem hfs, org.apache.hadoop.conf.Configuration conf) throws IOException
Method returns the reader given the specified arguments.
Parameters:
path - hfile's path
fsdis - stream of path's file
size - max size of the trailer.
cacheConf - Cache configuration values, cannot be null.
hfs -
Throws:
IOException - If file is invalid, will throw CorruptHFileException flavored IOException

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, FSDataInputStreamWrapper fsdis, long size, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) throws IOException
The sockets and file descriptors held by the passed FSDataInputStreamWrapper will be freed after use, so the caller must ensure that no other threads have access to the same reference.
Parameters:
fs - A file system
path - Path to HFile
fsdis - a stream of path's file
size - max size of the trailer.
cacheConf - Cache configuration for hfile's contents
conf - Configuration
Throws:
IOException - If file is invalid, will throw CorruptHFileException flavored IOException

public static HFile.Reader createReader(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) throws IOException
Parameters:
fs - filesystem
path - Path to file to read
cacheConf - This must not be null. See CacheConfig.CacheConfig(Configuration).
Throws:
IOException - Will throw a CorruptHFileException (DoNotRetryIOException subtype) if hfile is corrupt/invalid.

static HFile.Reader createReaderFromStream(org.apache.hadoop.fs.Path path, org.apache.hadoop.fs.FSDataInputStream fsdis, long size, CacheConfig cacheConf, org.apache.hadoop.conf.Configuration conf) throws IOException
This factory method is used only by unit tests. The sockets and file descriptors held by the passed stream will be freed after use, so the caller must ensure that no other threads have access to the same reference.
Throws:
IOException

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
path - Path to file to verify
Throws:
IOException - if failed to read from the underlying stream

public static boolean isHFileFormat(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.FileStatus fileStatus) throws IOException
Returns true if the specified file has a valid HFile Trailer.
Parameters:
fs - filesystem
fileStatus - the file to verify
Throws:
IOException - if failed to read from the underlying stream

public static boolean isReservedFileInfoKey(byte[] key)
Return true if the given file info key is reserved for internal use.
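isReservedFileInfoKey distinguishes internal file-info keys from user-supplied ones; a natural way to do this is with a reserved key prefix. A stand-alone sketch of such a byte-prefix check (the "hfile." prefix and the example key are assumptions for illustration, not verified against the HBase source):

```java
import java.nio.charset.StandardCharsets;

public class ReservedKeySketch {
    // Assumed reserved prefix for internal file-info keys (illustrative).
    static final byte[] RESERVED_PREFIX = "hfile.".getBytes(StandardCharsets.US_ASCII);

    // A key is reserved when it starts with the reserved prefix bytes.
    static boolean isReservedFileInfoKey(byte[] key) {
        if (key == null || key.length < RESERVED_PREFIX.length) {
            return false;
        }
        for (int i = 0; i < RESERVED_PREFIX.length; i++) {
            if (key[i] != RESERVED_PREFIX[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] internal = "hfile.SOME_INTERNAL_KEY".getBytes(StandardCharsets.US_ASCII);
        byte[] custom = "my.custom.key".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isReservedFileInfoKey(internal)); // prints "true"
        System.out.println(isReservedFileInfoKey(custom));   // prints "false"
    }
}
```

Comparing raw bytes rather than decoded Strings avoids charset surprises, since file-info keys are stored as byte arrays.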
public static String[] getSupportedCompressionAlgorithms()
static int longToInt(long l)
static List<org.apache.hadoop.fs.Path> getStoreFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) throws IOException
Returns all HFiles belonging to the given region directory.
Parameters:
fs - The file system reference.
regionDir - The region directory to scan.
Throws:
IOException - When scanning the files fails.

public static void checkFormatVersion(int version) throws IllegalArgumentException
Checks the given HFile format version, and throws an exception if invalid. Note that if the version number comes from an input file and has not been verified, the caller needs to re-throw an IOException to indicate that this is not a software error, but corrupted input.
Parameters:
version - an HFile version
Throws:
IllegalArgumentException - if the version is invalid

Copyright © 2007–2019 The Apache Software Foundation. All rights reserved.