Class FileLink
The Problem:
- HDFS does not support hardlinks, which makes it impossible to reference the same data blocks under different names.
- HBase stores files in one location (e.g. table/region/family/) and, when a file is no longer needed (e.g. compaction, region deletion, ...), moves it to an archive directory.
HFileLink is a more concrete implementation of the FileLink.
Back-references: To help the CleanerChore keep track of the links to a particular file,
a new file is placed inside a back-reference directory when the FileLink is created.
There is one back-reference directory for each file that has links, and that directory
contains one file per link.
HFileLink Example
- /hbase/table/region-x/cf/file-k (Original File)
- /hbase/table-cloned/region-y/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/table-2nd-cloned/region-z/cf/file-k.region-x.table (HFileLink to the original file)
- /hbase/.archive/table/region-x/.links-file-k/region-y.table-cloned (Back-reference to the link in table-cloned)
- /hbase/.archive/table/region-x/.links-file-k/region-z.table-2nd-cloned (Back-reference to the link in table-2nd-cloned)
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
protected static class | | FileLink InputStream that handles the switch between the original path and the alternative locations, when the file is moved.
Field Summary
Fields
Modifier and Type | Field | Description
static final String | BACK_REFERENCES_DIRECTORY_PREFIX | Define the Back-reference directory name prefix: .links-<hfile>/
private org.apache.hadoop.fs.Path[] | locations |
private static final org.slf4j.Logger | LOG |
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | FileLink() |
 | FileLink(Collection<org.apache.hadoop.fs.Path> locations) |
 | FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths) |
Method Summary
Modifier and Type | Method | Description
boolean | equals |
boolean | exists(org.apache.hadoop.fs.FileSystem fs) | Returns true if the file pointed to by the link exists.
org.apache.hadoop.fs.Path | getAvailablePath(org.apache.hadoop.fs.FileSystem fs) | Returns the path of the first available link.
static String | getBackReferenceFileName(org.apache.hadoop.fs.Path dirPath) | Get the referenced file name from the reference link directory path.
static org.apache.hadoop.fs.Path | getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName) | Get the directory to store the link back references.
org.apache.hadoop.fs.FileStatus | getFileStatus(org.apache.hadoop.fs.FileSystem fs) | Get the FileStatus of the referenced file.
org.apache.hadoop.fs.Path[] | getLocations | Returns the locations to look for the linked file.
static org.apache.hadoop.fs.FSDataInputStream | getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream) | If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target.
private static IOException | handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException) | Handle exceptions thrown when accessing the locations of a file link.
int | hashCode() |
static boolean | isBackReferencesDir(org.apache.hadoop.fs.Path dirPath) | Checks if the specified directory path is a back reference links folder.
org.apache.hadoop.fs.FSDataInputStream | open(org.apache.hadoop.fs.FileSystem fs) | Open the FileLink for read.
org.apache.hadoop.fs.FSDataInputStream | open(org.apache.hadoop.fs.FileSystem fs, int bufferSize) | Open the FileLink for read.
protected void | setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths) | NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
String | toString() |
-
Field Details
-
LOG
-
BACK_REFERENCES_DIRECTORY_PREFIX
Defines the back-reference directory name prefix: .links-<hfile>/
-
locations
-
-
Constructor Details
-
FileLink
protected FileLink() -
FileLink
public FileLink(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
- Parameters:
originPath - Original location of the file to link
alternativePaths - Alternative locations to look for the linked file
-
FileLink
- Parameters:
locations - Locations to look for the linked file
-
-
Method Details
-
getLocations
Returns the locations to look for the linked file. -
toString
-
exists
Returns true if the file pointed to by the link exists.
- Throws:
IOException
-
getAvailablePath
public org.apache.hadoop.fs.Path getAvailablePath(org.apache.hadoop.fs.FileSystem fs) throws IOException
Returns the path of the first available link.
- Throws:
IOException
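The "first available link" lookup can be pictured as a scan over the candidate locations in order. The following is only a minimal sketch that uses the local file system through java.nio instead of a Hadoop FileSystem. The class and method names are hypothetical, and throwing FileNotFoundException when no location exists is an assumption, not something stated above.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of HBase.
final class FirstAvailablePathSketch {

    // Returns the first location that currently exists, checking them in order.
    // Assumption: a FileNotFoundException signals that no location exists.
    static Path firstAvailable(List<Path> locations) throws IOException {
        for (Path candidate : locations) {
            if (Files.exists(candidate)) {
                return candidate;
            }
        }
        throw new FileNotFoundException("None of the locations exist: " + locations);
    }

    public static void main(String[] args) throws IOException {
        // The "archived" copy exists; the "original" location does not.
        Path archived = Files.createTempFile("file-k", ".archived");
        Path original = archived.resolveSibling("file-k.original-missing");
        try {
            System.out.println(firstAvailable(Arrays.asList(original, archived)));
        } finally {
            Files.deleteIfExists(archived);
        }
    }
}
```

The order of the list matters: the original path is tried first and the alternative locations only afterwards.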
-
getFileStatus
public org.apache.hadoop.fs.FileStatus getFileStatus(org.apache.hadoop.fs.FileSystem fs) throws IOException
Get the FileStatus of the referenced file.
- Parameters:
fs - FileSystem on which to get the file status
- Returns:
- FileStatus of the referenced file.
- Throws:
IOException - on unexpected error.
-
handleAccessLocationException
private static IOException handleAccessLocationException(FileLink fileLink, IOException newException, IOException previousException) throws IOException
Handle exceptions thrown when accessing the locations of a file link.
- Parameters:
fileLink - the file link
newException - the exception caught while accessing the current location
previousException - the previous exception caught while accessing the other locations
- Returns:
- an AccessControlException if accessing any of the locations raised one, otherwise a
FileNotFoundException. An AccessControlException is thrown if the user scan snapshot
feature is enabled; see
SnapshotScannerHDFSAclController.
- Throws:
IOException - if the exception is neither AccessControlException nor FileNotFoundException
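One way to read the contract above is as a small precedence rule: an access-denied error outranks a not-found error, and anything else is rethrown. The sketch below is a stand-alone illustration of that rule, not the HBase implementation. It substitutes java.nio.file.AccessDeniedException for Hadoop's AccessControlException so that it runs with the JDK alone, and the class and method names are hypothetical.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.AccessDeniedException;

// Hypothetical illustration; AccessDeniedException stands in for AccessControlException.
final class AccessLocationExceptionSketch {

    // Decides which exception to remember after a failed access to one location.
    static IOException handle(IOException newException, IOException previousException)
            throws IOException {
        if (newException instanceof FileNotFoundException) {
            // A missing file never replaces an exception that was remembered earlier.
            return previousException == null ? newException : previousException;
        }
        if (newException instanceof AccessDeniedException) {
            // An access-denied error always takes precedence.
            return newException;
        }
        // Neither "not found" nor "access denied": propagate immediately.
        throw newException;
    }

    public static void main(String[] args) throws IOException {
        IOException notFound = new FileNotFoundException("original location is gone");
        IOException denied = new AccessDeniedException("/archive/file-k");
        IOException remembered = handle(notFound, null);
        remembered = handle(denied, remembered);
        System.out.println(remembered.getClass().getSimpleName());
    }
}
```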
-
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs) throws IOException
Open the FileLink for read.
It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.
- Parameters:
fs - FileSystem on which to open the FileLink
- Returns:
- InputStream for reading the file link.
- Throws:
IOException - on unexpected error.
-
open
public org.apache.hadoop.fs.FSDataInputStream open(org.apache.hadoop.fs.FileSystem fs, int bufferSize) throws IOException
Open the FileLink for read.
It uses a wrapper of FSDataInputStream that is agnostic to the location of the file, even if the file switches between locations.
- Parameters:
fs - FileSystem on which to open the FileLink
bufferSize - the size of the buffer to be used.
- Returns:
- InputStream for reading the file link.
- Throws:
IOException - on unexpected error.
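The location-agnostic behaviour of open can be approximated by trying each location in turn until one opens. This is a simplified sketch on the local file system with java.nio, under the assumption that a missing file is the only reason to move on to the next location. The class name is hypothetical, and the sketch does not cover the harder case described above, where the file moves while a stream is already being read.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the HBase FileLink input stream.
final class OpenWithFallbackSketch {

    // Opens the first location that can be opened, in the given order.
    static InputStream open(List<Path> locations) throws IOException {
        IOException lastMiss = null;
        for (Path candidate : locations) {
            try {
                return Files.newInputStream(candidate);
            } catch (NoSuchFileException e) {
                lastMiss = e; // file is not here, try the next location
            }
        }
        FileNotFoundException notFound =
            new FileNotFoundException("Unable to open link at any of: " + locations);
        if (lastMiss != null) {
            notFound.initCause(lastMiss);
        }
        throw notFound;
    }

    public static void main(String[] args) throws IOException {
        Path archived = Files.createTempFile("file-k", ".archived");
        Files.write(archived, "hfile-bytes".getBytes(StandardCharsets.UTF_8));
        Path original = archived.resolveSibling("file-k.moved-away");
        try (InputStream in = open(Arrays.asList(original, archived))) {
            System.out.println((char) in.read());
        } finally {
            Files.deleteIfExists(archived);
        }
    }
}
```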
-
getUnderlyingFileLinkInputStream
public static org.apache.hadoop.fs.FSDataInputStream getUnderlyingFileLinkInputStream(org.apache.hadoop.fs.FSDataInputStream stream)
If the passed FSDataInputStream is backed by a FileLink, returns the underlying InputStream for the resolved link target. Otherwise, returns null.
setLocations
protected void setLocations(org.apache.hadoop.fs.Path originPath, org.apache.hadoop.fs.Path... alternativePaths)
NOTE: This method must be used only in the constructor! It creates a List with the specified locations for the link.
getBackReferencesDir
public static org.apache.hadoop.fs.Path getBackReferencesDir(org.apache.hadoop.fs.Path storeDir, String fileName)
Get the directory to store the link back references.
To simplify the reference count process, during the FileLink creation a back-reference is added to the back-reference directory of the specified file.
- Parameters:
storeDir - Root directory for the link reference folder
fileName - File Name with links
- Returns:
- Path for the link back references.
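The back-reference layout from the HFileLink example can be reproduced with plain path arithmetic. The sketch below uses java.nio paths rather than Hadoop paths. It assumes that the directory name is the `.links-` prefix followed by the file name, and that a back-reference file is named `<region>.<table>`, as the example paths suggest. All class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the back-reference directory layout.
final class BackReferenceLayoutSketch {

    // Prefix taken from the field description: .links-<hfile>/
    static final String BACK_REFERENCES_DIRECTORY_PREFIX = ".links-";

    // Stand-in for getBackReferencesDir(storeDir, fileName).
    static Path backReferencesDir(Path storeDir, String fileName) {
        return storeDir.resolve(BACK_REFERENCES_DIRECTORY_PREFIX + fileName);
    }

    // Assumed naming of one back-reference file: "<region>.<table>" of the link.
    static String backReferenceName(String linkRegion, String linkTable) {
        return linkRegion + "." + linkTable;
    }

    public static void main(String[] args) {
        Path storeDir = Paths.get("/hbase/.archive/table/region-x");
        Path dir = backReferencesDir(storeDir, "file-k");
        // One back-reference file per link, as in the HFileLink example.
        System.out.println(dir.resolve(backReferenceName("region-y", "table-cloned")));
        System.out.println(dir.resolve(backReferenceName("region-z", "table-2nd-cloned")));
    }
}
```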
-
getBackReferenceFileName
Get the referenced file name from the reference link directory path.
- Parameters:
dirPath - Link references directory path
- Returns:
- Name of the file referenced
-
isBackReferencesDir
Checks if the specified directory path is a back reference links folder.
- Parameters:
dirPath - Directory path to verify
- Returns:
- True if the specified directory is a link references folder
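The two name-based helpers, isBackReferencesDir and getBackReferenceFileName, can be thought of as a prefix test and a prefix strip on the last path component. The following is a sketch under that assumption, using java.nio paths. The class and method names are hypothetical, and rejecting non-matching input with IllegalArgumentException is a choice made for the example only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of recognising and parsing a back-reference directory name.
final class BackReferenceNameSketch {

    static final String PREFIX = ".links-";

    // True if the last path component starts with the back-reference prefix.
    static boolean isBackReferencesDir(Path dirPath) {
        if (dirPath == null || dirPath.getFileName() == null) {
            return false;
        }
        return dirPath.getFileName().toString().startsWith(PREFIX);
    }

    // Strips the prefix from the directory name to recover the referenced file name.
    static String referencedFileName(Path dirPath) {
        if (!isBackReferencesDir(dirPath)) {
            throw new IllegalArgumentException("Not a back-reference directory: " + dirPath);
        }
        return dirPath.getFileName().toString().substring(PREFIX.length());
    }

    public static void main(String[] args) {
        Path dir = Paths.get("/hbase/.archive/table/region-x/.links-file-k");
        System.out.println(isBackReferencesDir(dir));
        System.out.println(referencedFileName(dir));
    }
}
```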
-
equals
-
hashCode
-