Class HFileArchiver

java.lang.Object
org.apache.hadoop.hbase.backup.HFileArchiver

@Private public class HFileArchiver extends Object
Utility class to handle the removal of HFiles (or the respective StoreFiles) for an HRegion from the FileSystem. The HFiles are either archived or deleted, depending on the state of the system.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    private static class 
    Wrapper to handle file operations uniformly
    private static class 
    An HFileArchiver.File that wraps a simple Path on a FileSystem.
    private static class 
    HFileArchiver.File adapter for a HStoreFile living on a FileSystem.
    private static class 
    Adapt a type to match the HFileArchiver.File interface, which is used internally for handling archival/removal of files
    private static class 
    Convert a FileStatus to something we can manage in the archiving
    private static class 
    Convert the HStoreFile into something we can manage in the archive methods
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    private static ThreadPoolExecutor
     
    private static final int
    Number of retries in case of fs operation failure
    private static final Function<HFileArchiver.File,org.apache.hadoop.fs.Path>
     
    private static final org.slf4j.Logger
     
    private static final String
     
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    private
     
  • Method Summary

    Modifier and Type
    Method
    Description
    private static void
    archive(org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, byte[] family, Collection<HStoreFile> compactedFiles, org.apache.hadoop.fs.Path storeArchiveDir)
     
    static void
    archiveFamily(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, RegionInfo parent, org.apache.hadoop.fs.Path tableDir, byte[] family)
    Remove from the specified region the store files of the specified column family, either by archiving them or outright deletion
    static void
    archiveFamilyByFamilyDir(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, RegionInfo parent, org.apache.hadoop.fs.Path familyDir, byte[] family)
    Removes from the specified region the store files of the specified column family, either by archiving them or outright deletion
    static void
    archiveRecoveredEdits(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, byte[] family, Collection<HStoreFile> replayedEdits)
    Archive recovered edits using existing logic for archiving store files.
    static void
    archiveRegion(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo info)
    Cleans up all the files for an HRegion by archiving the HFiles to the archive directory.
    static boolean
    archiveRegion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, org.apache.hadoop.fs.Path tableDir, org.apache.hadoop.fs.Path regionDir)
    Remove an entire region from the table directory via archiving the region's hfiles.
    static void
    archiveRegions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.Path tableDir, List<org.apache.hadoop.fs.Path> regionDirList)
    Archive the specified regions in parallel.
    static void
    archiveStoreFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, org.apache.hadoop.fs.Path tableDir, byte[] family, org.apache.hadoop.fs.Path storeFile)
    Archive the store file
    static void
    archiveStoreFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, org.apache.hadoop.fs.Path tableDir, byte[] family, Collection<HStoreFile> compactedFiles)
    Remove the store files, either by archiving them or outright deletion
    private static boolean
    deleteRegionWithoutArchiving(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir)
    Without regard for backup, delete a region.
    private static void
    deleteStoreFilesWithoutArchiving(Collection<HStoreFile> compactedFiles)
    Just do a simple delete of the given store files
    static boolean
    exists(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo info)
    Returns true if the region exists in the filesystem.
    private static ThreadPoolExecutor
    getArchiveExecutor(org.apache.hadoop.conf.Configuration conf)
     
    private static ThreadFactory
    getThreadFactory()
     
    private static List<HFileArchiver.File>
    resolveAndArchive(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path baseArchiveDir, Collection<HFileArchiver.File> toArchive, long start)
    Resolve any conflict with an existing archive file via timestamp-append renaming of the existing file and then archive the passed in files.
    private static boolean
    resolveAndArchiveFile(org.apache.hadoop.fs.Path archiveDir, HFileArchiver.File currentFile, String archiveStartTime)
    Attempt to archive the passed in file to the archive directory.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

  • Constructor Details

  • Method Details

    • exists

      public static boolean exists(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo info) throws IOException
      Returns true if the region exists in the filesystem.
      Throws:
      IOException
    • archiveRegion

      public static void archiveRegion(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo info) throws IOException
      Cleans up all the files for an HRegion by archiving the HFiles to the archive directory.
      Parameters:
      conf - the configuration to use
      fs - the file system object
      info - RegionInfo for region to be deleted
      Throws:
      IOException
    • archiveRegion

      public static boolean archiveRegion(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootdir, org.apache.hadoop.fs.Path tableDir, org.apache.hadoop.fs.Path regionDir) throws IOException
      Remove an entire region from the table directory via archiving the region's hfiles.
      Parameters:
      fs - FileSystem from which to remove the region
      rootdir - Path to the root directory where hbase files are stored (for building the archive path)
      tableDir - Path to where the table is being stored (for building the archive path)
      regionDir - Path to where a region is being stored (for building the archive path)
      Returns:
      true if the region was successfully deleted. false if the filesystem operations could not complete.
      Throws:
      IOException - if the request cannot be completed
    • archiveRegions

      public static void archiveRegions(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path rootDir, org.apache.hadoop.fs.Path tableDir, List<org.apache.hadoop.fs.Path> regionDirList) throws IOException
      Archive the specified regions in parallel.
      Parameters:
      conf - the configuration to use
      fs - FileSystem from which to remove the region
      rootDir - Path to the root directory where hbase files are stored (for building the archive path)
      tableDir - Path to where the table is being stored (for building the archive path)
      regionDirList - Paths to where the regions are being stored (for building the archive path)
      Throws:
      IOException - if the request cannot be completed
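
      The parallel archiving described above can be pictured with a minimal sketch. It is not the HBase implementation: it uses java.nio.file instead of the Hadoop FileSystem API, and the class name ParallelArchiveSketch, the method archiveAll, and the threads parameter are all hypothetical names chosen for illustration. It submits one move task per region directory to a thread pool, waits for every task, and rethrows any failure as an IOException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; not part of HBase.
final class ParallelArchiveSketch {

  /** Moves each region directory under archiveDir, one pool task per region. */
  static void archiveAll(List<Path> regionDirs, Path archiveDir, int threads)
      throws IOException, InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Path>> futures = new ArrayList<>();
      for (Path regionDir : regionDirs) {
        // Assumption: archiving a region is modelled as a simple directory move.
        Callable<Path> task =
            () -> Files.move(regionDir, archiveDir.resolve(regionDir.getFileName()));
        futures.add(pool.submit(task));
      }
      // Wait for every task so that all regions are attempted before reporting.
      IOException firstFailure = null;
      for (Future<Path> future : futures) {
        try {
          future.get();
        } catch (ExecutionException e) {
          if (firstFailure == null) {
            firstFailure = new IOException("Failed to archive a region", e.getCause());
          }
        }
      }
      if (firstFailure != null) {
        throw firstFailure;
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

      In this sketch the pool is created per call for simplicity; the getArchiveExecutor method listed below suggests the real class manages a shared executor instead.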
    • getArchiveExecutor

      private static ThreadPoolExecutor getArchiveExecutor(org.apache.hadoop.conf.Configuration conf)
    • getThreadFactory

      private static ThreadFactory getThreadFactory()
    • archiveFamily

      public static void archiveFamily(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, RegionInfo parent, org.apache.hadoop.fs.Path tableDir, byte[] family) throws IOException
      Remove from the specified region the store files of the specified column family, either by archiving them or outright deletion
      Parameters:
      fs - the filesystem where the store files live
      conf - Configuration to examine to determine the archive directory
      parent - Parent region hosting the store files
      tableDir - Path to where the table is being stored (for building the archive path)
      family - the family hosting the store files
      Throws:
      IOException - if the files could not be correctly disposed.
    • archiveFamilyByFamilyDir

      public static void archiveFamilyByFamilyDir(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.conf.Configuration conf, RegionInfo parent, org.apache.hadoop.fs.Path familyDir, byte[] family) throws IOException
      Removes from the specified region the store files of the specified column family, either by archiving them or outright deletion
      Parameters:
      fs - the filesystem where the store files live
      conf - Configuration to examine to determine the archive directory
      parent - Parent region hosting the store files
      familyDir - Path to where the family is being stored
      family - the family hosting the store files
      Throws:
      IOException - if the files could not be correctly disposed.
    • archiveStoreFiles

      public static void archiveStoreFiles(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, org.apache.hadoop.fs.Path tableDir, byte[] family, Collection<HStoreFile> compactedFiles) throws IOException
      Remove the store files, either by archiving them or outright deletion
      Parameters:
      conf - Configuration to examine to determine the archive directory
      fs - the filesystem where the store files live
      regionInfo - RegionInfo of the region hosting the store files
      family - the family hosting the store files
      compactedFiles - files to be disposed of. No further reading of these files should be attempted; otherwise likely to cause an IOException
      Throws:
      IOException - if the files could not be correctly disposed.
    • archiveRecoveredEdits

      public static void archiveRecoveredEdits(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, byte[] family, Collection<HStoreFile> replayedEdits) throws IOException
      Archive recovered edits using existing logic for archiving store files. This is currently only relevant when hbase.region.archive.recovered.edits is true, as recovered edits shouldn't be kept after replay. In theory, we could use the very same method available for archiving store files, but supporting the WAL dir and store files on different FileSystems added the need for extra validation of the passed FileSystem instance and of the path where the archived edits should be placed.
      Parameters:
      conf - Configuration to determine the archive directory.
      fs - the filesystem used for storing WAL files.
      regionInfo - RegionInfo serving as a pseudo region representation for the archiving logic.
      family - a pseudo family representation for the archiving logic.
      replayedEdits - the recovered edits to be archived.
      Throws:
      IOException - if files can't be archived due to some internal error.
    • archive

      private static void archive(org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, byte[] family, Collection<HStoreFile> compactedFiles, org.apache.hadoop.fs.Path storeArchiveDir) throws IOException
      Throws:
      IOException
    • archiveStoreFile

      public static void archiveStoreFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.FileSystem fs, RegionInfo regionInfo, org.apache.hadoop.fs.Path tableDir, byte[] family, org.apache.hadoop.fs.Path storeFile) throws IOException
      Archive the store file
      Parameters:
      fs - the filesystem where the store files live
      regionInfo - region hosting the store files
      conf - Configuration to examine to determine the archive directory
      tableDir - Path to where the table is being stored (for building the archive path)
      family - the family hosting the store files
      storeFile - file to be archived
      Throws:
      IOException - if the files could not be correctly disposed.
    • resolveAndArchive

      private static List<HFileArchiver.File> resolveAndArchive(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path baseArchiveDir, Collection<HFileArchiver.File> toArchive, long start) throws IOException
      Resolve any conflict with an existing archive file via timestamp-append renaming of the existing file and then archive the passed in files.
      Parameters:
      fs - FileSystem on which to archive the files
      baseArchiveDir - base archive directory to store the files. If any of the files to archive are directories, will append the name of the directory to the base archive directory name, creating a parallel structure.
      toArchive - files/directories that need to be archived
      start - time the archiving started - used for resolving archive conflicts.
      Returns:
      the list of files that failed to archive.
      Throws:
      IOException - if an unexpected file operation exception occurred
    • resolveAndArchiveFile

      private static boolean resolveAndArchiveFile(org.apache.hadoop.fs.Path archiveDir, HFileArchiver.File currentFile, String archiveStartTime) throws IOException
      Attempt to archive the passed in file to the archive directory.

      If the same file already exists in the archive, it is moved to a timestamped directory under the archive directory and the new file is put in its place.

      Parameters:
      archiveDir - Path to the directory that stores the archives of the hfiles
      currentFile - Path to the original HFile that will be archived
      archiveStartTime - time the archiving started, to resolve naming conflicts
      Returns:
      true if the file is successfully archived. false if there was a problem, but the operation still completed.
      Throws:
      IOException - on failure to complete FileSystem operations.
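
      The conflict handling described above can be sketched as follows. This is an illustration under stated assumptions, not the HBase code: it uses java.nio.file rather than the Hadoop FileSystem API, the names ArchiveConflictSketch and archiveFile are hypothetical, and the choice to rename the existing archived file by appending "." plus the start time is an assumption about how the timestamp is applied.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of HBase.
final class ArchiveConflictSketch {

  /**
   * Moves currentFile into archiveDir. If a file with the same name is
   * already archived, that older copy is first renamed with the archive
   * start time appended, so the new file can take its place.
   */
  static Path archiveFile(Path archiveDir, Path currentFile, String archiveStartTime)
      throws IOException {
    Files.createDirectories(archiveDir);
    Path target = archiveDir.resolve(currentFile.getFileName());
    if (Files.exists(target)) {
      // Assumed naming scheme: <name>.<archiveStartTime>
      Path backup = archiveDir.resolve(currentFile.getFileName() + "." + archiveStartTime);
      Files.move(target, backup);
    }
    return Files.move(currentFile, target);
  }
}
```

      Keeping the older copy under a timestamped name means that no archived data is overwritten when two files with the same name are archived at different times.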
    • deleteRegionWithoutArchiving

      private static boolean deleteRegionWithoutArchiving(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path regionDir) throws IOException
      Without regard for backup, delete a region. Should be used with caution.
      Parameters:
      regionDir - Path to the region to be deleted.
      fs - FileSystem from which to delete the region
      Returns:
      true on successful deletion, false otherwise
      Throws:
      IOException - on filesystem operation failure
    • deleteStoreFilesWithoutArchiving

      private static void deleteStoreFilesWithoutArchiving(Collection<HStoreFile> compactedFiles) throws IOException
      Just do a simple delete of the given store files

      A best effort is made to delete each of the files, rather than bailing on the first failure.

      Parameters:
      compactedFiles - store files to delete from the file system.
      Throws:
      IOException - if a file cannot be deleted. Deletion is attempted for every file before the exception is thrown, rather than failing at the first file.
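
      The best-effort behaviour described above can be sketched as follows. This is a minimal illustration, not the HBase implementation: it deletes plain java.nio.file paths rather than HStoreFile instances, and the names BestEffortDeleteSketch and deleteAll are hypothetical. Every deletion is attempted, failures are collected, and a single IOException is thrown at the end if any deletion failed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
final class BestEffortDeleteSketch {

  /** Tries to delete every file, then reports all failures in one exception. */
  static void deleteAll(Collection<Path> files) throws IOException {
    List<IOException> errors = new ArrayList<>();
    for (Path file : files) {
      try {
        Files.delete(file);
      } catch (IOException e) {
        // Remember the failure and keep going with the remaining files.
        errors.add(e);
      }
    }
    if (!errors.isEmpty()) {
      IOException failure = new IOException(
          "Failed to delete " + errors.size() + " of " + files.size() + " files");
      errors.forEach(failure::addSuppressed);
      throw failure;
    }
  }
}
```

      Attaching the individual failures as suppressed exceptions is a design choice of this sketch; it keeps the per-file causes available to the caller without aborting the loop early.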