Package org.apache.hadoop.hbase.tool
Class BulkLoadHFilesTool
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.hbase.tool.BulkLoadHFilesTool
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, BulkLoadHFiles, org.apache.hadoop.util.Tool
@LimitedPrivate("Tools")
public class BulkLoadHFilesTool
extends org.apache.hadoop.conf.Configured
implements BulkLoadHFiles, org.apache.hadoop.util.Tool
The implementation of BulkLoadHFiles, which can also be executed from the command line as a tool.
-
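Because the class implements org.apache.hadoop.util.Tool, it can be driven programmatically as well as from the command line. A minimal sketch, assuming the Configuration-accepting constructor shown under Constructor Details; the command-line arguments are hypothetical and should match whatever usage() prints for your version (typically the HFileOutputFormat output directory followed by the table name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;
import org.apache.hadoop.util.ToolRunner;

public class BulkLoadDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Delegate argument parsing and execution to the tool itself, exactly as the
    // command-line entry point does.
    int exitCode = ToolRunner.run(conf, new BulkLoadHFilesTool(conf), args);
    System.exit(exitCode);
  }
}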
Nested Class Summary
private static interface  BulkLoadHFilesTool.BulkHFileVisitor&lt;TFamily&gt;
Nested classes/interfaces inherited from interface org.apache.hadoop.hbase.tool.BulkLoadHFiles
BulkLoadHFiles.LoadQueueItem
-
Field Summary
private boolean  assignSeqIds
static final String  BULK_LOAD_HFILES_BY_FAMILY
    HBASE-24221: support bulkLoadHFile by family to avoid long waits on bulkLoadHFile caused by compaction on the server side.
private boolean  bulkLoadByFamily
private String  bulkToken
private List<String>  clusterIds
private static final boolean  DEFAULT_LOCALITY_SENSITIVE
static final String  FAIL_IF_NEED_SPLIT_HFILE
private boolean  failIfNeedSplitHFile
private FsDelegationToken  fsDelegationToken
static final String  LOCALITY_SENSITIVE_CONF_KEY
    Keep locality while generating HFiles for bulkload.
private static final org.slf4j.Logger  LOG
private int  maxFilesPerRegionPerFamily
static final String  NAME
private int  nrThreads
private final AtomicInteger  numRetries
private boolean  replicate
(package private) static final String  TMP_DIR
private UserProvider  userProvider
private static final String  VALIDATE_HFILES
    Whether to run validation on hfiles before loading.
Fields inherited from interface org.apache.hadoop.hbase.tool.BulkLoadHFiles
ALWAYS_COPY_FILES, ASSIGN_SEQ_IDS, CREATE_TABLE_CONF_KEY, IGNORE_UNMATCHED_CF_CONF_KEY, MAX_FILES_PER_REGION_PER_FAMILY, RETRY_ON_IO_EXCEPTION
-
Constructor Summary
-
Method Summary
Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer>  bulkLoad(TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> family2Files)
    Perform a bulk load of the given directory into the given pre-existing table.
Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer>  bulkLoad(TableName tableName, org.apache.hadoop.fs.Path dir)
    Perform a bulk load of the given directory into the given pre-existing table.
protected void  bulkLoadPhase(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups, boolean copyFiles, Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> item2RegionMap)
    This takes the LQIs grouped by likely regions and attempts to bulk load them.
private boolean  checkHFilesCountPerRegionPerFamily(org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups)
private void  checkRegionIndexValid(int idx, List<Pair<byte[], byte[]>> startEndKeys, TableName tableName)
    Checks whether the region index is valid, detecting region holes and overlaps.
private void  cleanup(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, ExecutorService pool)
private static void  copyHFileHalf(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inFile, org.apache.hadoop.fs.Path outFile, Reference reference, ColumnFamilyDescriptor familyDescriptor, AsyncTableRegionLocator loc)
    Copy half of an HFile into a new HFile with favored nodes.
private ExecutorService  createExecutorService()
private void  createTable(TableName tableName, org.apache.hadoop.fs.Path hfofDir, AsyncAdmin admin)
    If the table is created for the first time, then "completebulkload" reads the files twice.
void  disableReplication()
    Disables replication for all bulkloads done via this instance, when bulkload replication is configured.
private static void  discoverLoadQueue(org.apache.hadoop.conf.Configuration conf, Deque<BulkLoadHFiles.LoadQueueItem> ret, org.apache.hadoop.fs.Path hfofDir, boolean validateHFile)
    Walk the given directory for all HFiles, and return a Queue containing all such files.
private Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer>  doBulkLoad(AsyncClusterConnection conn, TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> map, boolean silence, boolean copyFile)
    Perform a bulk load of the given map of families to hfiles into the given pre-existing table.
private Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer>  doBulkLoad(AsyncClusterConnection conn, TableName tableName, org.apache.hadoop.fs.Path hfofDir, boolean silence, boolean copyFile)
    Perform a bulk load of the given directory into the given pre-existing table.
private int  getRegionIndex(List<Pair<byte[], byte[]>> startEndKeys, byte[] key)
private String  getUniqueName()
private Map<byte[],Collection<BulkLoadHFiles.LoadQueueItem>>  groupByFamilies(Collection<BulkLoadHFiles.LoadQueueItem> itemsInRegion)
protected Pair<List<BulkLoadHFiles.LoadQueueItem>,String>  groupOrSplit(AsyncClusterConnection conn, TableName tableName, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups, BulkLoadHFiles.LoadQueueItem item, List<Pair<byte[], byte[]>> startEndKeys)
    Attempt to assign the given load queue item into its target region group.
private Pair<org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer,BulkLoadHFiles.LoadQueueItem>,Set<String>>  groupOrSplitPhase(AsyncClusterConnection conn, TableName tableName, ExecutorService pool, Deque<BulkLoadHFiles.LoadQueueItem> queue, List<Pair<byte[], byte[]>> startEndKeys)
static byte[][]  inferBoundaries(SortedMap<byte[], Integer> bdryMap)
    Infers region boundaries for a new table.
void  initialize()
private static StoreFileWriter  initStoreFileWriter(org.apache.hadoop.conf.Configuration conf, Cell cell, HFileContext hFileContext, CacheConfig cacheConf, BloomType bloomFilterType, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path outFile, AsyncTableRegionLocator loc)
private boolean  isAlwaysCopyFiles()
private boolean  isCreateTable()
boolean  isReplicationDisabled()
    Returns true if replication has been disabled.
private boolean  isSilence()
void  loadHFileQueue(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean copyFiles)
    Used by the replication sink to load the hfiles from the source cluster.
static void  main(String[] args)
private Map<BulkLoadHFiles.LoadQueueItem,ByteBuffer>  performBulkLoad(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, ExecutorService pool, boolean copyFile)
private static void  populateLoadQueue(Deque<BulkLoadHFiles.LoadQueueItem> ret, Map<byte[], List<org.apache.hadoop.fs.Path>> map)
    Populate the Queue with the given HFiles.
static void  prepareHFileQueue(org.apache.hadoop.conf.Configuration conf, AsyncClusterConnection conn, TableName tableName, org.apache.hadoop.fs.Path hfilesDir, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean validateHFile, boolean silence)
    Prepare a collection of LoadQueueItem from the list of source hfiles contained in the passed directory and validate that the prepared queue contains only valid table column families.
static void  prepareHFileQueue(AsyncClusterConnection conn, TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> map, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence)
    Prepare a collection of LoadQueueItem from the passed map of family to source hfiles and validate that the prepared queue contains only valid table column families.
int  run(String[] args)
void  setBulkToken(String bulkToken)
void  setClusterIds(List<String> clusterIds)
private static boolean  shouldCopyHFileMetaKey(byte[] key)
(package private) static void  splitStoreFile(AsyncTableRegionLocator loc, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inFile, ColumnFamilyDescriptor familyDesc, byte[] splitKey, org.apache.hadoop.fs.Path bottomOut, org.apache.hadoop.fs.Path topOut)
    Split a storefile into a top and bottom half with favored nodes, maintaining the metadata, recreating bloom filters, etc.
private List<BulkLoadHFiles.LoadQueueItem>  splitStoreFile(AsyncTableRegionLocator loc, BulkLoadHFiles.LoadQueueItem item, TableDescriptor tableDesc, byte[] splitKey)
private void  tableExists(AsyncClusterConnection conn, TableName tableName)
private void  throwAndLogTableNotFoundException
protected CompletableFuture<Collection<BulkLoadHFiles.LoadQueueItem>>  tryAtomicRegionLoad(AsyncClusterConnection conn, TableName tableName, boolean copyFiles, byte[] first, Collection<BulkLoadHFiles.LoadQueueItem> lqis)
    Attempts to do an atomic load of many hfiles into a region.
private void  usage()
private static void  validateFamiliesInHFiles(TableDescriptor tableDesc, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence)
    Checks whether there is any invalid family name in HFiles to be bulk loaded.
private static <TFamily> void  visitBulkHFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path bulkDir, BulkLoadHFilesTool.BulkHFileVisitor<TFamily> visitor, boolean validateHFile)
    Iterate over the bulkDir hfiles.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Field Details
-
LOG
-
LOCALITY_SENSITIVE_CONF_KEY
Keep locality while generating HFiles for bulkload. See HBASE-12596.
- See Also:
-
DEFAULT_LOCALITY_SENSITIVE
- See Also:
-
NAME
- See Also:
-
VALIDATE_HFILES
Whether to run validation on hfiles before loading.
- See Also:
-
BULK_LOAD_HFILES_BY_FAMILY
HBASE-24221: support bulkLoadHFile by family to avoid long waits on bulkLoadHFile caused by compaction on the server side.
- See Also:
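A minimal sketch of toggling these flags, assuming BULK_LOAD_HFILES_BY_FAMILY and LOCALITY_SENSITIVE_CONF_KEY are accessible boolean configuration keys (as their descriptions and the boolean default DEFAULT_LOCALITY_SENSITIVE suggest):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;

public class BulkLoadConfigExample {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // Assumption: both constants name boolean configuration keys.
    conf.setBoolean(BulkLoadHFilesTool.BULK_LOAD_HFILES_BY_FAMILY, true);   // send bulk load requests per family
    conf.setBoolean(BulkLoadHFilesTool.LOCALITY_SENSITIVE_CONF_KEY, true);  // keep locality while generating HFiles
    return conf;
  }
}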
-
FAIL_IF_NEED_SPLIT_HFILE
- See Also:
-
TMP_DIR
- See Also:
-
maxFilesPerRegionPerFamily
-
assignSeqIds
-
bulkLoadByFamily
-
fsDelegationToken
-
userProvider
-
nrThreads
-
numRetries
-
bulkToken
-
clusterIds
-
replicate
-
failIfNeedSplitHFile
-
-
Constructor Details
-
BulkLoadHFilesTool
-
-
Method Details
-
initialize
-
createExecutorService
-
isCreateTable
-
isSilence
-
isAlwaysCopyFiles
-
shouldCopyHFileMetaKey
-
validateFamiliesInHFiles
private static void validateFamiliesInHFiles(TableDescriptor tableDesc, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence) throws IOException
Checks whether there is any invalid family name in HFiles to be bulk loaded.
- Throws:
IOException
-
populateLoadQueue
private static void populateLoadQueue(Deque<BulkLoadHFiles.LoadQueueItem> ret, Map<byte[], List<org.apache.hadoop.fs.Path>> map)
Populate the Queue with given HFiles.
-
visitBulkHFiles
private static <TFamily> void visitBulkHFiles(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path bulkDir, BulkLoadHFilesTool.BulkHFileVisitor<TFamily> visitor, boolean validateHFile) throws IOException
Iterate over the bulkDir hfiles. Skip references, HFileLinks, and files starting with "_". Invalid hfiles are checked and skipped by default; this validation can be skipped by setting VALIDATE_HFILES to false.
- Throws:
IOException
-
discoverLoadQueue
private static void discoverLoadQueue(org.apache.hadoop.conf.Configuration conf, Deque<BulkLoadHFiles.LoadQueueItem> ret, org.apache.hadoop.fs.Path hfofDir, boolean validateHFile) throws IOException
Walk the given directory for all HFiles, and return a Queue containing all such files.
- Throws:
IOException
-
prepareHFileQueue
public static void prepareHFileQueue(AsyncClusterConnection conn, TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> map, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean silence) throws IOException
Prepare a collection of LoadQueueItem from the passed map of family to source hfiles and validate that the prepared queue contains only valid table column families.
- Parameters:
map - map of family to List of hfiles
tableName - table to which hfiles should be loaded
queue - queue which needs to be loaded into the table
silence - true to ignore unmatched column families
- Throws:
IOException - If any I/O or network error occurred
-
prepareHFileQueue
public static void prepareHFileQueue(org.apache.hadoop.conf.Configuration conf, AsyncClusterConnection conn, TableName tableName, org.apache.hadoop.fs.Path hfilesDir, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean validateHFile, boolean silence) throws IOException
Prepare a collection of LoadQueueItem from the list of source hfiles contained in the passed directory and validate that the prepared queue contains only valid table column families.
- Parameters:
hfilesDir - directory containing the list of hfiles to be loaded into the table
queue - queue which needs to be loaded into the table
validateHFile - if true, hfiles will be validated for their format
silence - true to ignore unmatched column families
- Throws:
IOException - If any I/O or network error occurred
-
loadHFileQueue
public void loadHFileQueue(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, boolean copyFiles) throws IOException
Used by the replication sink to load the hfiles from the source cluster.
- Parameters:
conn - Connection to use
tableName - Table to which these hfiles should be loaded
queue - LoadQueueItems with hfiles yet to be loaded
- Throws:
IOException
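A hedged sketch of the prepare-then-load flow used by the replication sink, assuming the caller already holds an internal AsyncClusterConnection and the Configuration-accepting constructor; the surrounding class and method names are hypothetical:

import java.util.ArrayDeque;
import java.util.Deque;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.AsyncClusterConnection;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;

public class HFileQueueLoader {
  static void prepareAndLoad(Configuration conf, AsyncClusterConnection conn, TableName table,
      Path hfilesDir) throws Exception {
    Deque<BulkLoadHFiles.LoadQueueItem> queue = new ArrayDeque<>();
    // Discover the HFiles under hfilesDir, validate their format (validateHFile = true)
    // and fail on unmatched column families (silence = false).
    BulkLoadHFilesTool.prepareHFileQueue(conf, conn, table, hfilesDir, queue, true, false);
    // Load everything that was queued; copyFiles = false does not force copying the HFiles.
    new BulkLoadHFilesTool(conf).loadHFileQueue(conn, table, queue, false);
  }
}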
-
tryAtomicRegionLoad
@Private
protected CompletableFuture<Collection<BulkLoadHFiles.LoadQueueItem>> tryAtomicRegionLoad(AsyncClusterConnection conn, TableName tableName, boolean copyFiles, byte[] first, Collection<BulkLoadHFiles.LoadQueueItem> lqis)
Attempts to do an atomic load of many hfiles into a region. If it fails, it returns a list of hfiles that need to be retried. If it is successful it will return an empty list. NOTE: To maintain row atomicity guarantees, the region server side should succeed or fail atomically.
- Parameters:
conn - Connection to use
tableName - Table to which these hfiles should be loaded
copyFiles - whether to replicate to the peer cluster while bulkloading
first - the start key of the region
lqis - hfiles to be loaded
- Returns:
- empty list if success, list of items to retry on recoverable failure
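Because the method is protected, it is only reachable from subclasses or tests; a minimal sketch of consuming its retry contract, assuming the Configuration-accepting constructor (the subclass and its helper method are hypothetical):

import java.util.Collection;
import java.util.Deque;
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.AsyncClusterConnection;
import org.apache.hadoop.hbase.tool.BulkLoadHFiles;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;

public class RetryingBulkLoad extends BulkLoadHFilesTool {
  public RetryingBulkLoad(Configuration conf) {
    super(conf);
  }

  void loadGroup(AsyncClusterConnection conn, TableName table, byte[] first,
      Collection<BulkLoadHFiles.LoadQueueItem> lqis, Deque<BulkLoadHFiles.LoadQueueItem> queue) {
    CompletableFuture<Collection<BulkLoadHFiles.LoadQueueItem>> future =
        tryAtomicRegionLoad(conn, table, false, first, lqis);
    // An empty collection signals success; anything returned is re-queued for another
    // group-or-split pass, mirroring what bulkLoadPhase does.
    future.thenAccept(toRetry -> toRetry.forEach(queue::add));
  }
}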
-
bulkLoadPhase
@Private
protected void bulkLoadPhase(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups, boolean copyFiles, Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> item2RegionMap) throws IOException
This takes the LQIs grouped by likely regions and attempts to bulk load them. Any failures are re-queued for another pass with the groupOrSplitPhase. Protected for testing.
- Throws:
IOException
-
groupByFamilies
private Map<byte[],Collection<BulkLoadHFiles.LoadQueueItem>> groupByFamilies(Collection<BulkLoadHFiles.LoadQueueItem> itemsInRegion) -
checkHFilesCountPerRegionPerFamily
private boolean checkHFilesCountPerRegionPerFamily(org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups) -
groupOrSplitPhase
private Pair<org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem>, Set<String>> groupOrSplitPhase(AsyncClusterConnection conn, TableName tableName, ExecutorService pool, Deque<BulkLoadHFiles.LoadQueueItem> queue, List<Pair<byte[], byte[]>> startEndKeys) throws IOException
- Parameters:
conn - the HBase cluster connection
tableName - the table name of the table to load into
pool - the ExecutorService
queue - the queue for LoadQueueItem
startEndKeys - start and end keys
- Returns:
- A map that groups LQI by likely bulk load region targets and Set of missing hfiles.
- Throws:
IOException
-
getUniqueName
-
splitStoreFile
private List<BulkLoadHFiles.LoadQueueItem> splitStoreFile(AsyncTableRegionLocator loc, BulkLoadHFiles.LoadQueueItem item, TableDescriptor tableDesc, byte[] splitKey) throws IOException
- Throws:
IOException
-
getRegionIndex
- Parameters:
startEndKeys - the start/end keys of the regions belonging to this table, in ascending order by start key
key - the key whose containing region needs to be found
- Returns:
- region index
-
checkRegionIndexValid
private void checkRegionIndexValid(int idx, List<Pair<byte[], byte[]>> startEndKeys, TableName tableName) throws IOException
A region hole or overlap is considered to exist under the following conditions: 1) idx < 0, meaning the first region info is lost; 2) the end key of a region is not equal to the start key of the next region; 3) the end key of the last region is not empty.
- Throws:
IOException
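A standalone illustration of those three conditions (not the private method itself), using the same startEndKeys representation of (start key, end key) pairs sorted by start key; the class name and the use of IllegalStateException instead of IOException are illustrative only:

import java.util.List;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.Pair;

final class RegionIndexCheck {
  static void check(int idx, List<Pair<byte[], byte[]>> startEndKeys) {
    if (idx < 0) {
      // Condition 1: no region contains the key, so the first region info is lost.
      throw new IllegalStateException("first region info is lost");
    }
    byte[] endKey = startEndKeys.get(idx).getSecond();
    if (idx == startEndKeys.size() - 1) {
      // Condition 3: the last region must have an empty end key.
      if (endKey.length != 0) {
        throw new IllegalStateException("last region's end key is not empty");
      }
    } else if (!Bytes.equals(endKey, startEndKeys.get(idx + 1).getFirst())) {
      // Condition 2: adjacent regions must share a boundary, otherwise there is a hole or overlap.
      throw new IllegalStateException("region hole or overlap after index " + idx);
    }
  }
}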
-
groupOrSplit
@Private
protected Pair<List<BulkLoadHFiles.LoadQueueItem>, String> groupOrSplit(AsyncClusterConnection conn, TableName tableName, org.apache.hbase.thirdparty.com.google.common.collect.Multimap<ByteBuffer, BulkLoadHFiles.LoadQueueItem> regionGroups, BulkLoadHFiles.LoadQueueItem item, List<Pair<byte[], byte[]>> startEndKeys) throws IOException
Attempt to assign the given load queue item into its target region group. If the hfile boundary no longer fits into a region, physically split the hfile such that the new bottom half will fit, and return the list of LQIs corresponding to the resulting hfiles. Protected for testing.
- Throws:
IOException
- if an IO failure is encountered
-
splitStoreFile
@Private
static void splitStoreFile(AsyncTableRegionLocator loc, org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inFile, ColumnFamilyDescriptor familyDesc, byte[] splitKey, org.apache.hadoop.fs.Path bottomOut, org.apache.hadoop.fs.Path topOut) throws IOException
Split a storefile into a top and bottom half with favored nodes, maintaining the metadata, recreating bloom filters, etc.
- Throws:
IOException
-
initStoreFileWriter
private static StoreFileWriter initStoreFileWriter(org.apache.hadoop.conf.Configuration conf, Cell cell, HFileContext hFileContext, CacheConfig cacheConf, BloomType bloomFilterType, org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path outFile, AsyncTableRegionLocator loc) throws IOException
- Throws:
IOException
-
copyHFileHalf
private static void copyHFileHalf(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path inFile, org.apache.hadoop.fs.Path outFile, Reference reference, ColumnFamilyDescriptor familyDescriptor, AsyncTableRegionLocator loc) throws IOException
Copy half of an HFile into a new HFile with favored nodes.
- Throws:
IOException
-
inferBoundaries
static byte[][] inferBoundaries(SortedMap<byte[], Integer> bdryMap)
Infers region boundaries for a new table.
Parameter: bdryMap is a map from keys to an integer belonging to {+1, -1}:
- If a key is a start key of a file, it maps to +1
- If a key is an end key of a file, it maps to -1
Algorithm:
- Poll the keys in order.
- Keep adding the mapped values of these keys (runningSum).
- Each time runningSum reaches 0, add the start key from where the runningSum began to a boundary list.
- Return the boundary list.
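A minimal sketch of that running-sum algorithm as documented (not the production implementation):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

final class BoundaryInference {
  static byte[][] inferBoundaries(SortedMap<byte[], Integer> bdryMap) {
    List<byte[]> boundaries = new ArrayList<>();
    int runningSum = 0;
    byte[] runStart = null;
    for (Map.Entry<byte[], Integer> e : bdryMap.entrySet()) {
      if (runningSum == 0) {
        runStart = e.getKey();       // the key at which this run of overlapping files begins
      }
      runningSum += e.getValue();    // +1 for a start key, -1 for an end key
      if (runningSum == 0) {
        boundaries.add(runStart);    // run closed: record its start key as a region boundary
      }
    }
    return boundaries.toArray(new byte[0][]);
  }
}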
-
createTable
private void createTable(TableName tableName, org.apache.hadoop.fs.Path hfofDir, AsyncAdmin admin) throws IOException
If the table is created for the first time, then "completebulkload" reads the files twice. More modifications are necessary if we want to avoid doing this.
- Throws:
IOException
-
performBulkLoad
private Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> performBulkLoad(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, ExecutorService pool, boolean copyFile) throws IOException
- Throws:
IOException
-
cleanup
private void cleanup(AsyncClusterConnection conn, TableName tableName, Deque<BulkLoadHFiles.LoadQueueItem> queue, ExecutorService pool) throws IOException
- Throws:
IOException
-
doBulkLoad
private Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> doBulkLoad(AsyncClusterConnection conn, TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> map, boolean silence, boolean copyFile) throws IOException
Perform a bulk load of the given map of families to hfiles into the given pre-existing table. This method is not threadsafe.
- Parameters:
map - map of family to List of hfiles
tableName - table to load the hfiles into
silence - true to ignore unmatched column families
copyFile - always copy hfiles if true
- Throws:
IOException
-
doBulkLoad
private Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> doBulkLoad(AsyncClusterConnection conn, TableName tableName, org.apache.hadoop.fs.Path hfofDir, boolean silence, boolean copyFile) throws IOException
Perform a bulk load of the given directory into the given pre-existing table. This method is not threadsafe.
- Parameters:
tableName - table to load the hfiles into
hfofDir - the directory that was provided as the output path of a job using HFileOutputFormat
silence - true to ignore unmatched column families
copyFile - always copy hfiles if true
- Throws:
IOException
-
bulkLoad
public Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> bulkLoad(TableName tableName, Map<byte[], List<org.apache.hadoop.fs.Path>> family2Files) throws IOException
Description copied from interface: BulkLoadHFiles
Perform a bulk load of the given directory into the given pre-existing table.
- Specified by:
bulkLoad in interface BulkLoadHFiles
- Parameters:
tableName - the table to load into
family2Files - map of family to List of hfiles
- Throws:
TableNotFoundException - if table does not yet exist
IOException
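A brief usage sketch for this overload, assuming the Configuration-accepting constructor; the table name and HFile paths are hypothetical:

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyMapBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // byte[] keys need a byte-wise comparator to behave sensibly in a map.
    Map<byte[], List<Path>> family2Files = new TreeMap<>(Bytes.BYTES_COMPARATOR);
    family2Files.put(Bytes.toBytes("cf"),
        Arrays.asList(new Path("/bulk/output/cf/hfile-0"), new Path("/bulk/output/cf/hfile-1")));
    new BulkLoadHFilesTool(conf).bulkLoad(TableName.valueOf("my_table"), family2Files);
  }
}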
-
bulkLoad
public Map<BulkLoadHFiles.LoadQueueItem, ByteBuffer> bulkLoad(TableName tableName, org.apache.hadoop.fs.Path dir) throws IOException
Description copied from interface: BulkLoadHFiles
Perform a bulk load of the given directory into the given pre-existing table.
- Specified by:
bulkLoad in interface BulkLoadHFiles
- Parameters:
tableName - the table to load into
dir - the directory that was provided as the output path of a job using HFileOutputFormat
- Throws:
TableNotFoundException - if table does not yet exist
IOException
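And the directory-based overload, again assuming the Configuration-accepting constructor and a hypothetical HFileOutputFormat output directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.tool.BulkLoadHFilesTool;

public class DirectoryBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Directory produced by an HFileOutputFormat job for table "my_table".
    new BulkLoadHFilesTool(conf).bulkLoad(TableName.valueOf("my_table"), new Path("/bulk/output"));
  }
}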
-
tableExists
- Throws:
TableNotFoundException
- if table does not exist
IOException
-
throwAndLogTableNotFoundException
- Throws:
TableNotFoundException
-
setBulkToken
-
setClusterIds
-
usage
-
run
- Specified by:
run in interface org.apache.hadoop.util.Tool
- Throws:
Exception
-
main
- Throws:
Exception
-
disableReplication
Description copied from interface: BulkLoadHFiles
Disables replication for all bulkloads done via this instance, when bulkload replication is configured.
- Specified by:
disableReplication in interface BulkLoadHFiles
-
isReplicationDisabled
Description copied from interface: BulkLoadHFiles
Returns true if replication has been disabled.
- Specified by:
isReplicationDisabled in interface BulkLoadHFiles
-