GenericHBaseRDDFunctions

Instance Constructors

new GenericHBaseRDDFunctions(rdd: RDD[T])

rdd
The RDD to wrap; its elements can be of any type

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
def hbaseBulkDelete(hc: HBaseContext, tableName: TableName, f: (T) ⇒ Delete, batchSize: Int): Unit

Implicit method that gives easy access to HBaseContext's bulk delete. This will not return a new RDD.
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The tableName that the deletes will be sent to
f
The function that will convert each RDD value into an HBase Delete object
batchSize
The number of Deletes to be sent in a single batch
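A minimal sketch of calling hbaseBulkDelete, assuming the implicit class is brought into scope with `import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._`. The table name "t1", the object name, and the choice to treat each RDD value as a row key are hypothetical and only serve the illustration.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Delete
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkDeleteSketch {
  // Hypothetical converter: each RDD value is the key of a row to delete.
  def toDelete(rowKey: String): Delete = new Delete(Bytes.toBytes(rowKey))

  // Sends one Delete per RDD value to the hypothetical table "t1",
  // four Deletes per batch. Nothing is returned.
  def deleteRows(rowKeys: RDD[String], hc: HBaseContext): Unit =
    rowKeys.hbaseBulkDelete(hc, TableName.valueOf("t1"),
      (rowKey: String) => toDelete(rowKey), 4)
}
```

Keeping the converter as a separate pure function makes it possible to check it without a running cluster.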
def hbaseBulkGet(hc: HBaseContext, tableName: TableName, batchSize: Int, f: (T) ⇒ Get): RDD[(ImmutableBytesWritable, Result)]

Implicit method that gives easy access to HBaseContext's bulk get. This will return a new RDD. Think of it as an RDD map function: every RDD value is used to fetch a new value from HBase, and those fetched values populate the newly generated RDD.
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The tableName that the gets will be sent to
batchSize
How many gets to execute in a single batch
f
The function that will turn the RDD values into HBase Get objects
returns
A resulting RDD of (ImmutableBytesWritable, Result) pairs
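The four-argument hbaseBulkGet can be sketched as follows, assuming the implicit class is imported via `org.apache.hadoop.hbase.spark.HBaseRDDFunctions._`. The table name "t1", the batch size of 2, and the object and helper names are hypothetical.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Get, Result}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkGetSketch {
  // Hypothetical converter: each RDD value is the key of a row to fetch.
  def toGet(rowKey: String): Get = new Get(Bytes.toBytes(rowKey))

  // Issues one Get per RDD value against the hypothetical table "t1",
  // two Gets per batch, and returns the raw key/Result pairs.
  def fetchRows(rowKeys: RDD[String],
                hc: HBaseContext): RDD[(ImmutableBytesWritable, Result)] =
    rowKeys.hbaseBulkGet(hc, TableName.valueOf("t1"), 2,
      (rowKey: String) => toGet(rowKey))
}
```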
def hbaseBulkGet[R](hc: HBaseContext, tableName: TableName, batchSize: Int, f: (T) ⇒ Get, convertResult: (Result) ⇒ R)(implicit arg0: ClassTag[R]): RDD[R]

Implicit method that gives easy access to HBaseContext's bulk get. This will return a new RDD. Think of it as an RDD map function: every RDD value is used to fetch a new value from HBase, and those fetched values, after conversion, populate the newly generated RDD.
R
The type of Object that will be coming out of the resulting RDD
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The tableName that the gets will be sent to
batchSize
How many gets to execute in a single batch
f
The function that will turn the RDD values into HBase Get objects
convertResult
The function that will convert an HBase Result object into a value that will go into the resulting RDD
returns
A resulting RDD with type R objects
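A sketch of the generic hbaseBulkGet[R] overload, where convertResult maps each Result to the element type of the output RDD. The import of `HBaseRDDFunctions._`, the table name "t1", and the helper names are assumptions; the converter here simply extracts the row key and maps a missing row to an empty string.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Get, Result}
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkGetConvertSketch {
  // Hypothetical converter: each RDD value is the key of a row to fetch.
  def toGet(rowKey: String): Get = new Get(Bytes.toBytes(rowKey))

  // Hypothetical convertResult: keep only the row key as a String.
  // A Result with no cells has a null row, which becomes "".
  def rowKeyOf(result: Result): String =
    Option(result.getRow).map(bytes => Bytes.toString(bytes)).getOrElse("")

  // R is String here, so the resulting RDD is an RDD[String].
  def fetchRowKeys(rowKeys: RDD[String], hc: HBaseContext): RDD[String] =
    rowKeys.hbaseBulkGet[String](hc, TableName.valueOf("t1"), 2,
      (rowKey: String) => toGet(rowKey),
      (result: Result) => rowKeyOf(result))
}
```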
def hbaseBulkLoad(hc: HBaseContext, tableName: TableName, flatMap: (T) ⇒ Iterator[(KeyFamilyQualifier, Array[Byte])], stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions] = ..., compactionExclude: Boolean = false, maxSize: Long = HConstants.DEFAULT_MAX_FILE_SIZE): Unit

Spark implementation of HBase bulk load for wide rows, or for when values are not already combined at the time of the map process.
This will take the content from an existing RDD, then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.
After this function is executed, the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase.
Also note that this version of bulk load differs from past versions in that it includes the qualifier as part of the sort process. The reason for this is to be able to support rows with a very large number of columns.
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The HBase table we are loading into
flatMap
A flatMap function that will turn every row in the RDD into N cells for the bulk load
stagingDir
The location on the FileSystem to bulk load into
familyHFileWriteOptionsMap
Options that will define how the HFile for a column family is written
compactionExclude
Whether the HFiles are to be excluded from compaction
maxSize
Max size for the HFiles before they roll
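A sketch of hbaseBulkLoad for wide rows, relying on the default values for familyHFileWriteOptionsMap, compactionExclude, and maxSize. It assumes the `HBaseRDDFunctions._` import and that KeyFamilyQualifier lives in `org.apache.hadoop.hbase.spark` with a (rowKey, family, qualifier) constructor; the record shape, table name "t1", and helper names are hypothetical. The follow-up LoadIncrementalHFiles.doBulkLoad(...) call is left out.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.spark.{HBaseContext, KeyFamilyQualifier}
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkLoadSketch {
  // Hypothetical record shape: (rowKey, Seq((family, qualifier, value))).
  type WideRow = (String, Seq[(String, String, String)])

  // The flatMap function: one output cell per column of the input row.
  // The qualifier is part of the key, so it takes part in the sort.
  def toCells(row: WideRow): Iterator[(KeyFamilyQualifier, Array[Byte])] =
    row._2.iterator.map { case (family, qualifier, value) =>
      val key = new KeyFamilyQualifier(
        Bytes.toBytes(row._1), Bytes.toBytes(family), Bytes.toBytes(qualifier))
      (key, Bytes.toBytes(value))
    }

  // Writes HFiles for the hypothetical table "t1" under stagingDir.
  // Moving those files into HBase is a separate step, not shown here.
  def writeHFiles(rows: RDD[WideRow], hc: HBaseContext, stagingDir: String): Unit =
    rows.hbaseBulkLoad(hc, TableName.valueOf("t1"),
      (row: WideRow) => toCells(row), stagingDir)
}
```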
def hbaseBulkLoadThinRows(hc: HBaseContext, tableName: TableName, mapFunction: (T) ⇒ (ByteArrayWrapper, FamiliesQualifiersValues), stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions] = ..., compactionExclude: Boolean = false, maxSize: Long = HConstants.DEFAULT_MAX_FILE_SIZE): Unit

Implicit method that gives easy access to HBaseContext's bulkLoadThinRows method.
Spark implementation of HBase bulk load for short rows, somewhere under 1,000 columns. For tables with thinner rows, this bulk load should be faster than the other Spark implementation of bulk load, which puts only one value into each record going into the shuffle.
This will take the content from an existing RDD, then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.
After this function is executed, the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase.
In this implementation, only the rowKey is given to the shuffle as the key, and all the columns are already linked to the rowKey before the shuffle stage. The sorting of the qualifiers is done in memory, outside of the shuffle stage.
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The HBase table we are loading into
mapFunction
A function that will convert the RDD records to the key value format used for the shuffle to prep for writing to the bulk loaded HFiles
stagingDir
The location on the FileSystem to bulk load into
familyHFileWriteOptionsMap
Options that will define how the HFile for a column family is written
compactionExclude
Whether the HFiles are to be excluded from compaction
maxSize
Max size for the HFiles before they roll
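A sketch of hbaseBulkLoadThinRows, in which each input record becomes a single shuffle record keyed by row key. It assumes the `HBaseRDDFunctions._` import and that ByteArrayWrapper and FamiliesQualifiersValues live in `org.apache.hadoop.hbase.spark`, with FamiliesQualifiersValues exposing a three-argument `+=` method; the record shape, table name "t1", and helper names are hypothetical.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.spark.{ByteArrayWrapper, FamiliesQualifiersValues, HBaseContext}
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkLoadThinRowsSketch {
  // Hypothetical record shape: (rowKey, Seq((family, qualifier, value))).
  type ThinRow = (String, Seq[(String, String, String)])

  // The map function: all columns of a row travel together, and only
  // the row key is used as the shuffle key.
  def toShuffleRecord(row: ThinRow): (ByteArrayWrapper, FamiliesQualifiersValues) = {
    val columns = new FamiliesQualifiersValues
    row._2.foreach { case (family, qualifier, value) =>
      columns.+=(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes.toBytes(value))
    }
    (new ByteArrayWrapper(Bytes.toBytes(row._1)), columns)
  }

  // Writes HFiles for the hypothetical table "t1" under stagingDir,
  // using the default write options, compaction flag, and max size.
  def writeHFiles(rows: RDD[ThinRow], hc: HBaseContext, stagingDir: String): Unit =
    rows.hbaseBulkLoadThinRows(hc, TableName.valueOf("t1"),
      (row: ThinRow) => toShuffleRecord(row), stagingDir)
}
```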
def hbaseBulkPut(hc: HBaseContext, tableName: TableName, f: (T) ⇒ Put): Unit

Implicit method that gives easy access to HBaseContext's bulk put. This will not return a new RDD. Think of it like a foreach.
hc
The hbaseContext object to identify which HBase cluster connection to use
tableName
The tableName that the put will be sent to
f
The function that will turn the RDD values into HBase Put objects.
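A minimal sketch of hbaseBulkPut, assuming the `HBaseRDDFunctions._` import. The (rowKey, value) record shape, the table name "t1", the column family "cf", and the qualifier "q" are hypothetical.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object BulkPutSketch {
  // Hypothetical converter: a (rowKey, value) pair becomes a Put that
  // writes the value to column "cf:q" of that row.
  def toPut(record: (String, String)): Put =
    new Put(Bytes.toBytes(record._1))
      .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(record._2))

  // Sends one Put per RDD value to the hypothetical table "t1".
  // Like a foreach, this returns nothing.
  def writeRows(records: RDD[(String, String)], hc: HBaseContext): Unit =
    records.hbaseBulkPut(hc, TableName.valueOf("t1"),
      (record: (String, String)) => toPut(record))
}
```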
def hbaseForeachPartition(hc: HBaseContext, f: (Iterator[T], Connection) ⇒ Unit): Unit

Implicit method that gives easy access to HBaseContext's foreachPartition method. This will act very much like a normal RDD foreachPartition method, except that you will now have an HBase connection while iterating through the values.
hc
The hbaseContext object to identify which HBase cluster connection to use
f
This function will receive an iterator over a partition of the RDD along with a connection object to HBase
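A sketch of hbaseForeachPartition that uses the supplied connection to write through a BufferedMutator. It assumes the `HBaseRDDFunctions._` import; the record shape, table name "t1", and column "cf:q" are hypothetical. The sketch closes only the mutator it opened and leaves the connection alone, on the assumption that HBaseContext manages it.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object ForeachPartitionSketch {
  // Hypothetical converter from a (rowKey, value) pair to a Put on "cf:q".
  def toPut(record: (String, String)): Put =
    new Put(Bytes.toBytes(record._1))
      .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes(record._2))

  // Runs once per partition: opens a mutator on the hypothetical table
  // "t1", buffers one Put per record, then flushes and closes the mutator.
  def writeWithMutator(records: RDD[(String, String)], hc: HBaseContext): Unit =
    records.hbaseForeachPartition(hc, (it, connection) => {
      val mutator = connection.getBufferedMutator(TableName.valueOf("t1"))
      try {
        it.foreach(record => mutator.mutate(toPut(record)))
        mutator.flush()
      } finally {
        mutator.close()
      }
    })
}
```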
def hbaseMapPartitions[R](hc: HBaseContext, f: (Iterator[T], Connection) ⇒ Iterator[R])(implicit arg0: ClassTag[R]): RDD[R]

Implicit method that gives easy access to HBaseContext's mapPartitions method. This will act very much like a normal RDD mapPartitions method, except that you will now have an HBase connection while iterating through the values.
R
This is the type of objects that will go into the resulting RDD
hc
The hbaseContext object to identify which HBase cluster connection to use
f
This function will receive an iterator over a partition of the RDD along with a connection object to HBase
returns
A resulting RDD of type R
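A sketch of hbaseMapPartitions that checks, for each row key, whether the row exists. It assumes the `HBaseRDDFunctions._` import; the table name "t1" and the helper names are hypothetical. Because Iterator.map is lazy, the sketch materializes each partition's results before closing the table, which holds one partition's output in memory.

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._ // assumed location of the implicit class
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

object MapPartitionsSketch {
  // Hypothetical converter: each RDD value is the key of a row to look up.
  def toGet(rowKey: String): Get = new Get(Bytes.toBytes(rowKey))

  // R is (String, Boolean): the row key paired with whether the
  // hypothetical table "t1" returned any cells for it.
  def rowExists(rowKeys: RDD[String], hc: HBaseContext): RDD[(String, Boolean)] =
    rowKeys.hbaseMapPartitions[(String, Boolean)](hc, (it, connection) => {
      val table = connection.getTable(TableName.valueOf("t1"))
      try {
        // Force evaluation while the table is still open.
        it.map(rowKey => (rowKey, !table.get(toGet(rowKey)).isEmpty)).toList.iterator
      } finally {
        table.close()
      }
    })
}
```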
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
val rdd: RDD[T]

The wrapped RDD; its elements can be of any type
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

implicit class GenericHBaseRDDFunctions[T] extends AnyRef
