Class FileLink
The Problem:
- HDFS does not support hardlinks, which makes it impossible to reference the same data blocks under different names.
- HBase stores files in one location (e.g. table/region/family/) and, when a file is no longer needed (e.g. compaction, region deletion, ...), moves it to an archive directory.
HFileLink is a more concrete implementation of the FileLink.
Back-references: To help the CleanerChore keep track of the links to a particular file, a new file is placed inside a back-reference directory during FileLink creation. There is one back-reference directory for each file that has links, and that directory contains one file per link.
HFileLink Example
- /hbase/table/region-x/cf/file-k (Original File)
- /hbase/table-cloned/region-y/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/table-2nd-cloned/region-z/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/.archive/table/region-x/.links-file-k/region-y.table-cloned (Back-reference to the link in table-cloned)
- /hbase/.archive/table/region-x/.links-file-k/region-z.table-2nd-cloned (Back-reference to the link in table-2nd-cloned)
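The names in the example follow a regular pattern, which the minimal sketch below reproduces with plain string operations. It mirrors only the simplified names used in the example; the class `LinkLayoutSketch` and its methods are hypothetical and not part of HBase, and real link names may be encoded differently.

```java
import java.nio.file.Path;

/** Hypothetical helper that reproduces the naming used in the example above. */
final class LinkLayoutSketch {
    // Prefix of the back-reference directory, as documented: .links-<hfile>/
    static final String BACK_REFERENCES_DIRECTORY_PREFIX = ".links-";

    /** Name of the link file placed in the cloned table, e.g. file-k.region-x.table. */
    static String linkName(String table, String region, String hfile) {
        return hfile + "." + region + "." + table;
    }

    /** Location of the back-reference file that records one link to the original file. */
    static Path backReference(Path archiveDir, String table, String region, String hfile,
                              String linkTable, String linkRegion) {
        return archiveDir.resolve(table)
            .resolve(region)
            .resolve(BACK_REFERENCES_DIRECTORY_PREFIX + hfile)
            .resolve(linkRegion + "." + linkTable);
    }
}
```

Each clone adds one file under `.links-file-k`, so listing that directory is enough to tell whether any link still refers to `file-k`.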
Nested Class Summary
- protected static class - FileLink InputStream that handles the switch between the original path and the alternative locations when the file is moved.
Field Summary
- static final String BACK_REFERENCES_DIRECTORY_PREFIX - Defines the back-reference directory name prefix: .links-<hfile>/
- private org.apache.hadoop.fs.Path[] locations
- private static final org.slf4j.Logger LOG
Constructor Summary
- protected FileLink()
- FileLink(Collection<org.apache.hadoop.fs.Path> locations)
- FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
Method Summary
- boolean equals(Object)
- boolean exists(org.apache.hadoop.fs.FileSystem fs) - Returns true if the file pointed to by the link exists.
- org.apache.hadoop.fs.Path getAvailablePath(org.apache.hadoop.fs.FileSystem fs) - Returns the path of the first available link.
- static String getBackReferenceFileName(org.apache.hadoop.fs.Path dirPath) - Get the referenced file name from the reference link directory path.
- static org.apache.hadoop.fs.Path getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName) - Get the directory to store the link back references.
- org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.FileSystem fs) - Get the FileStatus of the referenced file.
- org.apache.hadoop.fs.Path[] getLocations() - Returns the locations to look for the linked file.
- static org.apache.hadoop.fs.FSDataInputStream getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream) - If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target.
- private static IOException handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException) - Handles exceptions thrown while accessing the locations of a file link.
- int hashCode()
- static boolean isBackReferencesDir(org.apache.hadoop.fs.Path dirPath) - Checks if the specified directory path is a back reference links folder.
- org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs) - Open the FileLink for read.
- org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs, int bufferSize) - Open the FileLink for read.
- protected void setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths) - NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
- String toString()
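Field Details
LOG
private static final org.slf4j.Logger LOG
BACK_REFERENCES_DIRECTORY_PREFIX
static final String BACK_REFERENCES_DIRECTORY_PREFIX
Defines the back-reference directory name prefix: .links-<hfile>/
locations
private org.apache.hadoop.fs.Path[] locations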
Constructor Details
FileLink
protected FileLink()
FileLink
public FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
Parameters:
originPath - Original location of the file to link
alternativePaths - Alternative locations to look for the linked file
FileLink
FileLink(Collection<org.apache.hadoop.fs.Path> locations)
Parameters:
locations - locations to look for the linked file
Method Details
getLocations
Returns the locations to look for the linked file.
toString
exists
Returns true if the file pointed to by the link exists.
Throws:
IOException
getAvailablePath
public org.apache.hadoop.fs.Path getAvailablePath(org.apache.hadoop.fs.FileSystem fs) throws IOException
Returns the path of the first available link.
Throws:
IOException
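The "first available" lookup can be pictured as a scan over the candidate locations in order. The sketch below is a minimal illustration of that idea on the local file system with `java.nio.file`, not the HBase implementation; the class `FirstAvailableSketch` is hypothetical, and throwing `FileNotFoundException` when nothing exists is an assumption.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical stand-in that scans link locations on the local file system. */
final class FirstAvailableSketch {
    /** Returns the first candidate that exists, checking in the order given. */
    static Path firstAvailable(List<Path> locations) throws IOException {
        for (Path candidate : locations) {
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        // Assumption: a missing file is reported when no candidate exists.
        throw new FileNotFoundException("No available location in " + locations);
    }
}
```

Listing the original path first means the archive copy is consulted only after the file has been moved.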
getFileStatus
public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.FileSystem fs) throws IOException
Get the FileStatus of the referenced file.
Parameters:
fs - FileSystem on which to get the file status
Returns:
FileStatus of the referenced file.
Throws:
IOException - on unexpected error.
handleAccessLocationException
private static IOException handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException) throws IOException
Handles exceptions thrown while accessing the locations of a file link.
Parameters:
fileLink - the file link
newException - the exception caught while accessing the current location
previousException - the previous exception caught while accessing the other locations
Returns:
AccessControlException if accessing any of the locations raised one; otherwise FileNotFoundException. The AccessControlException is thrown if the user scan snapshot feature is enabled; see SnapshotScannerHDFSAclController.
Throws:
IOException - if the exception is neither AccessControlException nor FileNotFoundException
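The rule described for the return value can be sketched as a small decision function. This is an illustration under stated assumptions, not the HBase code: `java.nio.file.AccessDeniedException` stands in for Hadoop's AccessControlException, and the class `AccessExceptionSketch` and method `remember` are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

/** Hypothetical sketch of choosing which failure to report after trying a location. */
final class AccessExceptionSketch {
    /**
     * Decides which exception to remember after one location failed.
     * AccessDeniedException is a stand-in for Hadoop's AccessControlException.
     */
    static IOException remember(IOException newException, IOException previousException)
            throws IOException {
        if (newException instanceof FileNotFoundException) {
            // Missing file: keep any earlier (possibly access-related) failure.
            return previousException != null ? previousException : newException;
        }
        if (newException instanceof AccessDeniedException) {
            // An access failure takes precedence over "not found".
            return newException;
        }
        // Anything else is unexpected and is propagated immediately.
        throw newException;
    }
}
```

A caller would invoke this once per failed location and throw the remembered exception only after every location has been tried.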
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs) throws IOException
Open the FileLink for read.
It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.
Parameters:
fs - FileSystem on which to open the FileLink
Returns:
InputStream for reading the file link.
Throws:
IOException - on unexpected error.
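To illustrate what "agnostic to the location of the file" means for a reader, the sketch below tracks its own read position and looks the file up again on every read, so a move between two reads goes unnoticed by the caller. It is a simplified local-file-system illustration, not the HBase wrapper, which wraps an FSDataInputStream; the class `RelocatingReaderSketch` is hypothetical, and reopening the file on each read is a simplification chosen for brevity.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Hypothetical reader that survives the file moving between its known locations. */
final class RelocatingReaderSketch {
    private final List<Path> locations;
    private long position = 0;

    RelocatingReaderSketch(List<Path> locations) {
        this.locations = locations;
    }

    /** Reads from the current position; returns the byte count, or -1 at end of file. */
    int read(byte[] buffer) throws IOException {
        for (Path candidate : locations) {
            try (FileChannel channel = FileChannel.open(candidate, StandardOpenOption.READ)) {
                int n = channel.read(ByteBuffer.wrap(buffer), position);
                if (n > 0) {
                    position += n; // remember the offset independently of the location
                }
                return n;
            } catch (NoSuchFileException e) {
                // The file is not (or no longer) here; try the next location.
            }
        }
        throw new FileNotFoundException("No available location in " + locations);
    }
}
```

Because the offset lives in the reader rather than in an open handle, the next read resumes at the same byte whichever location currently holds the file.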
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs, int bufferSize) throws IOException
Open the FileLink for read.
It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.
Parameters:
fs - FileSystem on which to open the FileLink
bufferSize - the size of the buffer to be used.
Returns:
InputStream for reading the file link.
Throws:
IOException - on unexpected error.
getUnderlyingFileLinkInputStream
public static org.apache.hadoop.fs.FSDataInputStream getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream)
If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target. Otherwise, returns null.
setLocations
protected void setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
getBackReferencesDir
public static org.apache.hadoop.fs.Path getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName)
Get the directory to store the link back references.
To simplify the reference count process, during the FileLink creation a back-reference is added to the back-reference directory of the specified file.
Parameters:
storeDir - Root directory for the link reference folder
fileName - File Name with links
Returns:
Path for the link back references.
getBackReferenceFileName
Get the referenced file name from the reference link directory path.
Parameters:
dirPath - Link references directory path
Returns:
Name of the file referenced
isBackReferencesDir
Checks if the specified directory path is a back reference links folder.
Parameters:
dirPath - Directory path to verify
Returns:
True if the specified directory is a link references folder
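The three back-reference helpers are inverses of one another around the documented `.links-<hfile>` naming. The sketch below shows one way that round trip could look using `java.nio.file.Path` instead of Hadoop's `Path`; it is a hedged approximation, and the class `BackReferenceNamesSketch` and the null handling are assumptions rather than the HBase source.

```java
import java.nio.file.Path;

/** Hypothetical local-path version of the back-reference naming helpers. */
final class BackReferenceNamesSketch {
    static final String BACK_REFERENCES_DIRECTORY_PREFIX = ".links-";

    /** Builds <storeDir>/.links-<fileName>. */
    static Path getBackReferencesDir(Path storeDir, String fileName) {
        return storeDir.resolve(BACK_REFERENCES_DIRECTORY_PREFIX + fileName);
    }

    /** Strips the prefix from the directory name to recover the referenced file name. */
    static String getBackReferenceFileName(Path dirPath) {
        String name = dirPath.getFileName().toString();
        return name.substring(BACK_REFERENCES_DIRECTORY_PREFIX.length());
    }

    /** True when the last path element starts with the back-reference prefix. */
    static boolean isBackReferencesDir(Path dirPath) {
        if (dirPath == null || dirPath.getFileName() == null) {
            return false;
        }
        return dirPath.getFileName().toString().startsWith(BACK_REFERENCES_DIRECTORY_PREFIX);
    }
}
```

A caller is expected to check `isBackReferencesDir` before calling `getBackReferenceFileName`, since the latter assumes the prefix is present.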
equals
hashCode