org.apache.hadoop.hbase.spark.HBaseRDDFunctions

GenericHBaseRDDFunctions

implicit class GenericHBaseRDDFunctions[T] extends AnyRef

These are implicit methods for an RDD that contains any type of data.

T

This is any type

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new GenericHBaseRDDFunctions(rdd: RDD[T])

    rdd

    The RDD to wrap; its elements may be of any type
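
    As a minimal sketch of how this wrapper is obtained, the snippet below imports the members of HBaseRDDFunctions (which puts the implicit conversion in scope) and also calls the constructor explicitly. The object name, method name, and sample data are hypothetical, and an existing SparkContext is assumed.

    ```scala
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object WrapperSketch {
      // Hypothetical helper: builds the wrapper by hand for a small sample RDD.
      def wrap(sc: SparkContext): GenericHBaseRDDFunctions[String] = {
        val rdd: RDD[String] = sc.parallelize(Seq("row1", "row2")) // sample data
        // Explicit construction; with the import above, the compiler applies
        // the same wrapping implicitly when an hbase* method is called on rdd.
        new GenericHBaseRDDFunctions(rdd)
      }
    }
    ```

    In practice the explicit call is rarely needed; the import alone makes the hbase* methods available on any RDD.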

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. def hbaseBulkDelete(hc: HBaseContext, tableName: TableName, f: (T) ⇒ Delete, batchSize: Int): Unit

    Implicit method that gives easy access to HBaseContext's bulk Delete. This will not return a new RDD.

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    tableName

    The tableName that the deletes will be sent to

    f

    The function that will convert each RDD value into an HBase Delete object

    batchSize

    The number of Deletes to be sent in a single batch
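
    A minimal sketch of a bulk delete, assuming an RDD of row keys held as strings. The table name "t1", the batch size of 100, and the object and method names are hypothetical.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.Delete
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkDeleteSketch {
      // Issues one row-level Delete per RDD element, sent in batches of 100.
      def deleteRows(rowKeys: RDD[String], hc: HBaseContext): Unit =
        rowKeys.hbaseBulkDelete(
          hc,
          TableName.valueOf("t1"), // hypothetical table
          (rowKey: String) => new Delete(Bytes.toBytes(rowKey)),
          100) // hypothetical batch size
    }
    ```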

  14. def hbaseBulkGet(hc: HBaseContext, tableName: TableName, batchSize: Int, f: (T) ⇒ Get): RDD[(ImmutableBytesWritable, Result)]

    Implicit method that gives easy access to HBaseContext's bulk get. This returns a new RDD. Think of it as an RDD map function: every RDD value is used to fetch a new value from HBase, and the fetched values populate the newly generated RDD.

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    tableName

    The tableName that the gets will be sent to

    batchSize

    How many gets to execute in a single batch

    f

    The function that will turn the RDD values into HBase Get objects

    returns

    A resulting RDD of (ImmutableBytesWritable, Result) pairs
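
    A minimal sketch of this overload, assuming an RDD of row keys held as strings. The table name "t1", the batch size, and the object and method names are hypothetical.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.{Get, Result}
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkGetPairsSketch {
      // Builds one Get per row key and returns the raw HBase results as pairs.
      def fetchRows(rowKeys: RDD[String],
                    hc: HBaseContext): RDD[(ImmutableBytesWritable, Result)] =
        rowKeys.hbaseBulkGet(
          hc,
          TableName.valueOf("t1"), // hypothetical table
          100,                     // hypothetical batch size
          (rowKey: String) => new Get(Bytes.toBytes(rowKey)))
    }
    ```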

  15. def hbaseBulkGet[R](hc: HBaseContext, tableName: TableName, batchSize: Int, f: (T) ⇒ Get, convertResult: (Result) ⇒ R)(implicit arg0: ClassTag[R]): RDD[R]

    Implicit method that gives easy access to HBaseContext's bulk get. This returns a new RDD. Think of it as an RDD map function: every RDD value is used to fetch a new value from HBase, and the fetched values, after conversion, populate the newly generated RDD.

    R

    The type of Object that will be coming out of the resulting RDD

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    tableName

    The tableName that the gets will be sent to

    batchSize

    How many gets to execute in a single batch

    f

    The function that will turn the RDD values into HBase Get objects

    convertResult

    The function that will convert an HBase Result object into a value that will go into the resulting RDD

    returns

    A resulting RDD with type R objects
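
    A minimal sketch of the converting overload, assuming an RDD of row keys held as strings and a single string-valued column. The table "t1", family "cf", qualifier "q", and the object and method names are hypothetical.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.{Get, Result}
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkGetConvertSketch {
      // Fetches column cf:q for every row key and converts each Result to a String.
      def fetchValues(rowKeys: RDD[String], hc: HBaseContext): RDD[String] =
        rowKeys.hbaseBulkGet[String](
          hc,
          TableName.valueOf("t1"), // hypothetical table
          100,                     // hypothetical batch size
          (rowKey: String) => new Get(Bytes.toBytes(rowKey)),
          (result: Result) => {
            // getValue returns null when the cell is absent; map that to "".
            val value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))
            if (value == null) "" else Bytes.toString(value)
          })
    }
    ```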

  16. def hbaseBulkLoad(hc: HBaseContext, tableName: TableName, flatMap: (T) ⇒ Iterator[(KeyFamilyQualifier, Array[Byte])], stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions] = ..., compactionExclude: Boolean = false, maxSize: Long = HConstants.DEFAULT_MAX_FILE_SIZE): Unit

    A Spark implementation of HBase bulk load for wide rows, or for cases where values are not already combined at the time of the map process.

    This will take the content from an existing RDD then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.

    After this function is executed the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase

    Also note that this version of bulk load differs from past versions in that it includes the qualifier as part of the sort process. The reason is to support rows with a very large number of columns.

    tableName

    The HBase table we are loading into

    flatMap

    A flatMap function that will turn every row in the RDD into N cells for the bulk load

    stagingDir

    The location on the FileSystem to bulk load into

    familyHFileWriteOptionsMap

    Options that will define how the HFile for a column family is written

    compactionExclude

    Compaction excluded for the HFiles

    maxSize

    Max size for the HFiles before they roll
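
    A minimal sketch of writing HFiles with this method, assuming each RDD record is a (rowKey, qualifier, value) triple of strings and that the target table already exists. The table "t1", family "cf", and the object and method names are hypothetical; the remaining parameters are left at their defaults.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.spark.{HBaseContext, KeyFamilyQualifier}
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkLoadSketch {
      // Emits one cell per record; the shuffle key includes the qualifier.
      def writeHFiles(cells: RDD[(String, String, String)],
                      hc: HBaseContext,
                      stagingDir: String): Unit =
        cells.hbaseBulkLoad(
          hc,
          TableName.valueOf("t1"), // hypothetical table
          (cell: (String, String, String)) => {
            val key = new KeyFamilyQualifier(
              Bytes.toBytes(cell._1),  // row key
              Bytes.toBytes("cf"),     // hypothetical column family
              Bytes.toBytes(cell._2))  // qualifier
            Iterator((key, Bytes.toBytes(cell._3)))
          },
          stagingDir)
    }
    ```

    This only produces HFiles under stagingDir. The LoadIncrementalHFiles.doBulkLoad(...) step described above is still required and is not shown here.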

  17. def hbaseBulkLoadThinRows(hc: HBaseContext, tableName: TableName, mapFunction: (T) ⇒ (ByteArrayWrapper, FamiliesQualifiersValues), stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions] = ..., compactionExclude: Boolean = false, maxSize: Long = HConstants.DEFAULT_MAX_FILE_SIZE): Unit

    Implicit method that gives easy access to HBaseContext's bulkLoadThinRows method.

    A Spark implementation of HBase bulk load for short rows, somewhere under 1000 columns. For tables with thinner rows, this bulk load should be faster than the other Spark implementation of bulk load, which puts only one value into each record going into the shuffle.

    This will take the content from an existing RDD then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.

    After this function is executed the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase

    In this implementation only the rowKey is given to the shuffle as the key, and all the columns are already linked to the rowKey before the shuffle stage. The sorting of the qualifiers is done in memory, outside of the shuffle stage.

    tableName

    The HBase table we are loading into

    mapFunction

    A function that will convert the RDD records to the key value format used for the shuffle to prep for writing to the bulk loaded HFiles

    stagingDir

    The location on the FileSystem to bulk load into

    familyHFileWriteOptionsMap

    Options that will define how the HFile for a column family is written

    compactionExclude

    Compaction excluded for the HFiles

    maxSize

    Max size for the HFiles before they roll
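
    A minimal sketch of the thin-row variant, assuming each RDD record is a row key paired with its (qualifier, value) columns and that the target table already exists. The table "t1", family "cf", and the object and method names are hypothetical; the remaining parameters are left at their defaults.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.spark.{ByteArrayWrapper, FamiliesQualifiersValues, HBaseContext}
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkLoadThinRowsSketch {
      // Groups all columns of a row into one shuffle record keyed by the row key.
      def writeHFiles(rows: RDD[(String, Seq[(String, String)])],
                      hc: HBaseContext,
                      stagingDir: String): Unit =
        rows.hbaseBulkLoadThinRows(
          hc,
          TableName.valueOf("t1"), // hypothetical table
          (row: (String, Seq[(String, String)])) => {
            val cells = new FamiliesQualifiersValues
            row._2.foreach { case (qualifier, value) =>
              // "cf" is a hypothetical column family.
              cells.+=(Bytes.toBytes("cf"), Bytes.toBytes(qualifier), Bytes.toBytes(value))
            }
            (new ByteArrayWrapper(Bytes.toBytes(row._1)), cells)
          },
          stagingDir)
    }
    ```

    As with hbaseBulkLoad, the HFiles still have to be moved into HBase with LoadIncrementalHFiles.doBulkLoad(...), which is not shown.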

  18. def hbaseBulkPut(hc: HBaseContext, tableName: TableName, f: (T) ⇒ Put): Unit

    Implicit method that gives easy access to HBaseContext's bulk put. This will not return a new RDD. Think of it like a foreach.

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    tableName

    The tableName that the put will be sent to

    f

    The function that will turn the RDD values into HBase Put objects.
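
    A minimal sketch of a bulk put, assuming an RDD of (rowKey, value) string pairs written to a single column. The table "t1", family "cf", qualifier "q", and the object and method names are hypothetical.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.Put
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object BulkPutSketch {
      // Converts every (rowKey, value) pair into a Put on column cf:q.
      def putPairs(pairs: RDD[(String, String)], hc: HBaseContext): Unit =
        pairs.hbaseBulkPut(
          hc,
          TableName.valueOf("t1"), // hypothetical table
          (pair: (String, String)) => {
            val put = new Put(Bytes.toBytes(pair._1))
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(pair._2))
            put
          })
    }
    ```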

  19. def hbaseForeachPartition(hc: HBaseContext, f: (Iterator[T], Connection) ⇒ Unit): Unit

    Implicit method that gives easy access to HBaseContext's foreachPartition method. This will act very much like a normal RDD foreachPartition method, except that you will also have an HBase connection while iterating through the values.

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    f

    This function will get an iterator for a Partition of an RDD along with a connection object to HBase
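
    A minimal sketch of using the supplied connection, here to write through a BufferedMutator. The table "t1", family "cf", qualifier "q", and the object and method names are hypothetical. The connection itself is not closed in the function, on the assumption that HBaseContext manages its lifecycle.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.{Connection, Put}
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object ForeachPartitionSketch {
      // Writes each partition through one BufferedMutator, flushing at the end.
      def putWithMutator(pairs: RDD[(String, String)], hc: HBaseContext): Unit =
        pairs.hbaseForeachPartition(hc, (it: Iterator[(String, String)], conn: Connection) => {
          val mutator = conn.getBufferedMutator(TableName.valueOf("t1")) // hypothetical table
          try {
            it.foreach { case (rowKey, value) =>
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(value))
              mutator.mutate(put)
            }
            mutator.flush()
          } finally {
            mutator.close() // close the mutator only, not the shared connection
          }
        })
    }
    ```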

  20. def hbaseMapPartitions[R](hc: HBaseContext, f: (Iterator[T], Connection) ⇒ Iterator[R])(implicit arg0: ClassTag[R]): RDD[R]

    Implicit method that gives easy access to HBaseContext's mapPartitions method. This will act very much like a normal RDD mapPartitions method, except that you will also have an HBase connection while iterating through the values.

    R

    This is the type of objects that will go into the resulting RDD

    hc

    The hbaseContext object to identify which HBase cluster connection to use

    f

    This function will get an iterator for a Partition of an RDD along with a connection object to HBase

    returns

    A resulting RDD of type R
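
    A minimal sketch that checks, per partition, whether each row key exists in a table. The table "t1" and the object and method names are hypothetical. The results are materialized before the table handle is closed, which buffers one partition's output in memory; that is a simplification made for this sketch.

    ```scala
    import org.apache.hadoop.hbase.TableName
    import org.apache.hadoop.hbase.client.{Connection, Get}
    import org.apache.hadoop.hbase.spark.HBaseContext
    import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.spark.rdd.RDD

    object MapPartitionsSketch {
      // Pairs every row key with a flag saying whether the row exists in HBase.
      def rowExists(rowKeys: RDD[String], hc: HBaseContext): RDD[(String, Boolean)] =
        rowKeys.hbaseMapPartitions[(String, Boolean)](hc, (it: Iterator[String], conn: Connection) => {
          val table = conn.getTable(TableName.valueOf("t1")) // hypothetical table
          try {
            // toList forces evaluation while the table handle is still open.
            it.map { rowKey =>
              (rowKey, table.exists(new Get(Bytes.toBytes(rowKey))))
            }.toList.iterator
          } finally {
            table.close()
          }
        })
    }
    ```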

  21. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. val rdd: RDD[T]

    The wrapped RDD; its elements may be of any type

  26. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  27. def toString(): String

    Definition Classes
    AnyRef → Any
  28. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
