@InterfaceAudience.Private @InterfaceStability.Evolving public class SnapshotFileCache extends Object implements Stoppable
A cache of files is kept to avoid querying the FileSystem frequently. If there is a cache miss, the directory modification time is used to ensure that we don't rescan directories that we already have in cache. We only check the modification times of the snapshot directories (/hbase/.snapshot/[snapshot_name]) to determine if the files need to be loaded into the cache.
New snapshots will be added to the cache and deleted snapshots will be removed when we refresh the cache. If the files underneath a snapshot directory are changed, but not the snapshot itself, we will ignore updates to that snapshot's files.
This is sufficient because each snapshot has its own directory and is added via an atomic rename once, when the snapshot is created. We don't need to worry about the data in the snapshot being changed after that.
Further, the cache is periodically refreshed to ensure that files in snapshots that were deleted are also removed from the cache.
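The modification-time check described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not HBase's implementation: directory listings are modelled as a map from snapshot name to directory modification time, the `lister` function stands in for reading a snapshot's files, and the names `ModTimeSnapshotCache`, `refresh` and `directoryScans` are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not HBase code: directory state is passed in as plain maps
// so the example runs without Hadoop.
final class ModTimeSnapshotCache {

  /** Hypothetical analogue of SnapshotDirectoryInfo: mtime seen at last scan plus files found. */
  private static final class DirInfo {
    final long lastModified;
    final Set<String> files;

    DirInfo(long lastModified, Set<String> files) {
      this.lastModified = lastModified;
      this.files = files;
    }
  }

  private Map<String, DirInfo> snapshots = new HashMap<>();
  private Set<String> cache = Collections.emptySet();
  private int directoryScans = 0;

  /**
   * @param modTimes snapshot name -> current modification time of its directory
   * @param lister   reads the files under one snapshot directory (the expensive call)
   */
  void refresh(Map<String, Long> modTimes, Function<String, Set<String>> lister) {
    Map<String, DirInfo> known = new HashMap<>();
    Set<String> files = new HashSet<>();
    for (Map.Entry<String, Long> e : modTimes.entrySet()) {
      DirInfo info = snapshots.get(e.getKey());
      // Rescan only snapshots that are new or whose directory mtime changed.
      if (info == null || info.lastModified != e.getValue()) {
        info = new DirInfo(e.getValue(), new HashSet<>(lister.apply(e.getKey())));
        directoryScans++;
      }
      known.put(e.getKey(), info);
      files.addAll(info.files);
    }
    // Snapshots absent from modTimes were deleted; they are simply not carried over.
    snapshots = known;
    cache = Collections.unmodifiableSet(files);
  }

  boolean contains(String file) {
    return cache.contains(file);
  }

  int directoryScans() {
    return directoryScans;
  }
}
```

Each refresh needs only one cheap metadata lookup per snapshot; the expensive listing runs only for new snapshots or changed directories, and deleted snapshots drop out because they are not copied into the new map.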
A SnapshotFileCache.SnapshotFileInspector must be passed when creating this cache to allow extraction of files under the /hbase/.snapshot/[snapshot name] directory for each snapshot. This allows you to cache only, for instance, the logs in the .logs directory or all the files under all the regions.
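To illustrate the role of the inspector, here is a hypothetical string-based analogue. The interface `PathInspector` and the two inspectors in `Inspectors` are invented for this sketch; the real SnapshotFileCache.SnapshotFileInspector works against the Hadoop FileSystem, and the path layout used below is assumed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for SnapshotFileCache.SnapshotFileInspector: given the relative
// paths found under one snapshot directory, return the subset the cache should hold.
@FunctionalInterface
interface PathInspector {
  Collection<String> filesUnderSnapshot(List<String> pathsInSnapshot);
}

final class Inspectors {
  private Inspectors() {}

  /** Keeps only entries under an (assumed) top-level ".logs" directory of the snapshot. */
  static final PathInspector LOGS_ONLY = paths -> {
    List<String> out = new ArrayList<>();
    for (String p : paths) {
      if (p.startsWith(".logs/")) {
        out.add(p);
      }
    }
    return out;
  };

  /** Keeps every entry, i.e. all the files under all the regions. */
  static final PathInspector ALL_FILES = paths -> new ArrayList<>(paths);
}
```

Because the inspector decides what "files of a snapshot" means, the same caching logic can serve different cleaners without knowing their file types.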
This cache also considers all running snapshots (those under /hbase/.snapshot/.tmp) as valid snapshots and will attempt to cache files from those snapshots as well.
Queries about a given file are thread-safe with respect to multiple queries and cache refreshes.
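One common way to get lock-free queries that stay consistent with concurrent refreshes is to publish an immutable set through a volatile field, which matches the `volatile ... ImmutableSet<String> cache` field listed for this class. The sketch below shows that pattern with JDK collections only; the class name `PublishedFileSet` and the choice to serialize refreshes with `synchronized` are assumptions of the sketch.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of volatile publication of an immutable set.
final class PublishedFileSet {
  // Readers see either the old complete set or the new complete set, never a partial one.
  private volatile Set<String> files = Collections.emptySet();

  /** Lock-free read path; safe to call from any number of threads. */
  boolean contains(String file) {
    return files.contains(file);
  }

  /** Builds a fresh set off to the side, then swaps it in with a single volatile write. */
  synchronized void replaceAll(Collection<String> newFiles) {
    files = Collections.unmodifiableSet(new HashSet<>(newFiles));
  }
}
```

Since the set is never mutated after publication, queries need no lock; only writers coordinate with each other.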
Modifier and Type | Class and Description |
---|---|
class | SnapshotFileCache.RefreshCacheTask: Simple helper task that just periodically attempts to refresh the cache |
private static class | SnapshotFileCache.SnapshotDirectoryInfo: Information about a snapshot directory |
(package private) static interface | SnapshotFileCache.SnapshotFileInspector |
Modifier and Type | Field and Description |
---|---|
private org.apache.hbase.thirdparty.com.google.common.collect.ImmutableSet<String> | cache |
private SnapshotFileCache.SnapshotFileInspector | fileInspector |
private org.apache.hadoop.fs.FileSystem | fs |
private static int | LOCK_TIMEOUT_MS |
private static org.slf4j.Logger | LOG |
private Timer | refreshTimer |
private org.apache.hadoop.fs.Path | snapshotDir |
private org.apache.hbase.thirdparty.com.google.common.collect.ImmutableMap<String,SnapshotFileCache.SnapshotDirectoryInfo> | snapshots: This is a helper map of information about the snapshot directories so we don't need to rescan them if they haven't changed since the last time we looked. |
private boolean | stop |
private org.apache.hadoop.fs.FileSystem | workingFs |
private org.apache.hadoop.fs.Path | workingSnapshotDir |
Constructor and Description |
---|
SnapshotFileCache(org.apache.hadoop.conf.Configuration conf, long cacheRefreshPeriod, long cacheRefreshDelay, String refreshThreadName, SnapshotFileCache.SnapshotFileInspector inspectSnapshotFiles): Create a snapshot file cache for all snapshots under the specified [root]/.snapshot on the filesystem. |
SnapshotFileCache(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem workingFs, org.apache.hadoop.fs.Path workingDir, long cacheRefreshPeriod, long cacheRefreshDelay, String refreshThreadName, SnapshotFileCache.SnapshotFileInspector inspectSnapshotFiles): Create a snapshot file cache for all snapshots under the specified [root]/.snapshot on the filesystem. |
Modifier and Type | Method and Description |
---|---|
(package private) List<String> | getSnapshotsInProgress() |
Iterable<org.apache.hadoop.fs.FileStatus> | getUnreferencedFiles(Iterable<org.apache.hadoop.fs.FileStatus> files, SnapshotManager snapshotManager): Check to see if any of the passed file names is contained in any of the snapshots. |
boolean | isStopped(): Returns true if Stoppable.stop(String) has been called. |
private void | refreshCache() |
void | stop(String why): Stop this service. |
void | triggerCacheRefreshForTesting(): Trigger a cache refresh, even if it's before the next scheduled cache refresh. |
private static final org.slf4j.Logger LOG
private volatile boolean stop
private final org.apache.hadoop.fs.FileSystem fs
private final org.apache.hadoop.fs.FileSystem workingFs
private final SnapshotFileCache.SnapshotFileInspector fileInspector
private final org.apache.hadoop.fs.Path snapshotDir
private final org.apache.hadoop.fs.Path workingSnapshotDir
private volatile org.apache.hbase.thirdparty.com.google.common.collect.ImmutableSet<String> cache
private org.apache.hbase.thirdparty.com.google.common.collect.ImmutableMap<String,SnapshotFileCache.SnapshotDirectoryInfo> snapshots
private final Timer refreshTimer
private static final int LOCK_TIMEOUT_MS
public SnapshotFileCache(org.apache.hadoop.conf.Configuration conf, long cacheRefreshPeriod, long cacheRefreshDelay, String refreshThreadName, SnapshotFileCache.SnapshotFileInspector inspectSnapshotFiles) throws IOException
Immediately loads the file cache.
Parameters:
conf - used to extract the configured FileSystem where the snapshots are stored and the HBase root directory
cacheRefreshPeriod - frequency (ms) with which the cache should be refreshed
cacheRefreshDelay - amount of time to wait for the cache to be refreshed
refreshThreadName - name of the cache refresh thread
inspectSnapshotFiles - filter to apply to each snapshot to extract the files
Throws:
IOException - if the FileSystem or root directory cannot be loaded
public SnapshotFileCache(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.FileSystem workingFs, org.apache.hadoop.fs.Path workingDir, long cacheRefreshPeriod, long cacheRefreshDelay, String refreshThreadName, SnapshotFileCache.SnapshotFileInspector inspectSnapshotFiles)
Parameters:
fs - FileSystem where the snapshots are stored
rootDir - HBase root directory
workingFs - FileSystem where ongoing snapshot manifest files are stored
workingDir - location to store ongoing snapshot manifest files
cacheRefreshPeriod - period (ms) with which the cache should be refreshed
cacheRefreshDelay - amount of time to wait for the cache to be refreshed
refreshThreadName - name of the cache refresh thread
inspectSnapshotFiles - filter to apply to each snapshot to extract the files
public void triggerCacheRefreshForTesting()
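The refresh timer, triggerCacheRefreshForTesting() and stop(String) can be pictured with java.util.Timer. This is a hypothetical sketch: the class `PeriodicRefresher` and its methods are illustrative, and it assumes cacheRefreshDelay acts as the initial delay before the first scheduled refresh while cacheRefreshPeriod is the interval between refreshes.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch of periodic refresh scheduling; names are illustrative, not HBase's.
final class PeriodicRefresher {
  private final Timer timer;
  private final Runnable refresh;
  private volatile boolean stopped = false;

  PeriodicRefresher(long cacheRefreshPeriod, long cacheRefreshDelay,
                    String refreshThreadName, Runnable refresh) {
    this.refresh = refresh;
    // Daemon timer thread, so it never keeps the JVM alive on its own.
    this.timer = new Timer(refreshThreadName, true);
    // Assumption: first run after cacheRefreshDelay ms, then every cacheRefreshPeriod ms.
    timer.scheduleAtFixedRate(new TimerTask() {
      @Override
      public void run() {
        refreshNow();
      }
    }, cacheRefreshDelay, cacheRefreshPeriod);
  }

  /** Analogue of triggerCacheRefreshForTesting(): refresh now, ignoring the schedule. */
  void refreshNow() {
    if (!stopped) {
      refresh.run();
    }
  }

  /** Analogue of stop(String): cancel the timer; later refreshNow() calls do nothing. */
  void stop(String why) {
    if (!stopped) {
      stopped = true;
      timer.cancel();
    }
  }

  boolean isStopped() {
    return stopped;
  }
}
```

The sketch uses Timer.scheduleAtFixedRate; a fixed-delay Timer.schedule call would work equally well for a cache whose refresh only needs to happen eventually.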
public Iterable<org.apache.hadoop.fs.FileStatus> getUnreferencedFiles(Iterable<org.apache.hadoop.fs.FileStatus> files, SnapshotManager snapshotManager) throws IOException
Note this may lead to periodic false positives for the file being referenced. Periodically, the cache is refreshed even if there are no requests, to ensure that the false positives get removed eventually. For instance, suppose you have a file in the snapshot and it gets loaded into the cache. Then at some point later that snapshot is deleted. If the cache has not been refreshed at that point, the cache will still think the filesystem contains that file and treat it as referenced, even if it is no longer present (false positive). However, if the file was never on the filesystem, we will never find it and will always treat it as unreferenced.
Parameters:
files - files to check. NOTE: relies on the files being loaded from HDFS before this method is called (not lazy)
Throws:
IOException - if there is an unexpected error reaching the filesystem.
private void refreshCache() throws IOException
Throws:
IOException
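The lookup described for getUnreferencedFiles, including its false-positive behavior, can be sketched as follows. File names stand in for FileStatus objects and two suppliers stand in for refreshCache() and getSnapshotsInProgress(); the class `UnreferencedFilter`, refreshing at most once per call, and consulting in-progress snapshots only after a miss are assumptions of this sketch rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: strings replace FileStatus, suppliers replace filesystem reads.
final class UnreferencedFilter {
  private final Supplier<Set<String>> loadCompletedSnapshotFiles;
  private final Supplier<Set<String>> loadInProgressFiles;
  private Set<String> cache = Collections.emptySet();
  private int refreshes = 0;

  UnreferencedFilter(Supplier<Set<String>> loadCompletedSnapshotFiles,
                     Supplier<Set<String>> loadInProgressFiles) {
    this.loadCompletedSnapshotFiles = loadCompletedSnapshotFiles;
    this.loadInProgressFiles = loadInProgressFiles;
  }

  synchronized List<String> getUnreferencedFiles(Iterable<String> files) {
    List<String> unreferenced = new ArrayList<>();
    boolean refreshed = false;
    Set<String> inProgress = null; // loaded lazily, only if some file misses the cache
    for (String file : files) {
      if (cache.contains(file)) {
        continue; // referenced by a cached snapshot (possibly stale: a false positive)
      }
      if (!refreshed) {
        // First miss in this call: reload once and look again.
        cache = loadCompletedSnapshotFiles.get();
        refreshes++;
        refreshed = true;
        if (cache.contains(file)) {
          continue;
        }
      }
      if (inProgress == null) {
        inProgress = loadInProgressFiles.get();
      }
      if (!inProgress.contains(file)) {
        unreferenced.add(file);
      }
    }
    return unreferenced;
  }

  int refreshes() {
    return refreshes;
  }
}
```

A file that stays in the cache after its snapshot is deleted keeps being reported as referenced until the next refresh, which is the false positive described above; the periodic refresh bounds how long that lasts.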
List<String> getSnapshotsInProgress() throws IOException
Throws:
IOException
public void stop(String why)
Specified by:
stop in interface Stoppable
public boolean isStopped()
Specified by:
isStopped in interface Stoppable
Returns:
True if Stoppable.stop(String) has been called.
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.