org.apache.hadoop.hbase.spark

JavaHBaseContext

class JavaHBaseContext extends Serializable

This is the Java wrapper over HBaseContext, which is written in Scala. This class is intended for developers who want to work with Spark or Spark Streaming in Java

Annotations
@Public()
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new JavaHBaseContext(jsc: JavaSparkContext, config: Configuration)

    jsc

    This is the JavaSparkContext that we will wrap

    config

This is the configuration information for our HBase cluster
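
    Example (a minimal sketch; the application name and ZooKeeper quorum are hypothetical placeholders):

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.hbase.HBaseConfiguration;
      import org.apache.hadoop.hbase.spark.JavaHBaseContext;
      import org.apache.spark.SparkConf;
      import org.apache.spark.api.java.JavaSparkContext;

      SparkConf sparkConf = new SparkConf().setAppName("JavaHBaseContextExample");
      JavaSparkContext jsc = new JavaSparkContext(sparkConf);
      Configuration conf = HBaseConfiguration.create();
      conf.set("hbase.zookeeper.quorum", "zk1.example.com"); // hypothetical quorum
      JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);

    The jsc and hbaseContext variables above are reused in the member examples below; standard HBase client and Spark imports (TableName, Put, Get, Delete, Scan, Bytes, Tuple2, and so on) are omitted there for brevity.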

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def bulkDelete[T](javaRdd: JavaRDD[T], tableName: TableName, f: Function[T, Delete], batchSize: Integer): Unit

A simple abstraction over the HBaseContext.foreachPartition method.

    It allows a user to take a JavaRDD, generate a Delete for each element, and send the Deletes to HBase.

    The complexity of managing the Connection is removed from the developer.

    javaRdd

    Original JavaRDD with data to iterate over

    tableName

    The name of the table to delete from

    f

Function to convert a value in the JavaRDD to an HBase Delete

    batchSize

    The number of deletes to batch before sending to HBase
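
    Example (a minimal sketch; table "t1" and the row keys are hypothetical, and each RDD element is a row key):

      JavaRDD<byte[]> rdd = jsc.parallelize(Arrays.asList(
          "row1".getBytes(), "row2".getBytes(), "row3".getBytes()));
      hbaseContext.bulkDelete(rdd, TableName.valueOf("t1"),
          rowKey -> new Delete(rowKey),   // build one Delete per element
          2);                             // send deletes in batches of 2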

  8. def bulkGet[T, U](tableName: TableName, batchSize: Integer, javaRdd: JavaRDD[T], makeGet: Function[T, Get], convertResult: Function[Result, U]): JavaRDD[U]

A simple abstraction over the HBaseContext.mapPartition method.

    It allows a user to take a JavaRDD and generate a new RDD based on Gets and the results they bring back from HBase

    tableName

    The name of the table to get from

    batchSize

The number of Gets to retrieve in a single fetch

    javaRdd

    Original JavaRDD with data to iterate over

    makeGet

Function to convert a value in the JavaRDD to an HBase Get

    convertResult

This will convert the HBase Result object to whatever the user wants to put in the resulting JavaRDD

    returns

New JavaRDD that is created from the results of the Gets to HBase
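
    Example (a minimal sketch; table "t1" and the row keys are hypothetical):

      JavaRDD<byte[]> keys = jsc.parallelize(Arrays.asList(
          "row1".getBytes(), "row2".getBytes(), "row3".getBytes()));
      JavaRDD<String> rows = hbaseContext.bulkGet(
          TableName.valueOf("t1"),
          2,                              // gets per fetch
          keys,
          rowKey -> new Get(rowKey),      // build one Get per element
          result -> Bytes.toString(result.getRow()));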

  9. def bulkLoad[T](javaRdd: JavaRDD[T], tableName: TableName, mapFunc: Function[T, Pair[KeyFamilyQualifier, Array[Byte]]], stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions], compactionExclude: Boolean, maxSize: Long): Unit

A simple abstraction over the HBaseContext.bulkLoad method. It allows a user to take a JavaRDD and convert it into a new JavaRDD[Pair] based on mapFunc; HFiles will be generated in stagingDir for bulk load

    javaRdd

    The javaRDD we are bulk loading from

    tableName

    The HBase table we are loading into

    mapFunc

A Function that will convert a value in the JavaRDD to a Pair(KeyFamilyQualifier, Array[Byte])

    stagingDir

    The location on the FileSystem to bulk load into

    familyHFileWriteOptionsMap

    Options that will define how the HFile for a column family is written

    compactionExclude

Whether the generated HFiles should be excluded from compaction

    maxSize

    Max size for the HFiles before they roll
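
    Example (a minimal sketch; the table, staging directory, family "f", and qualifier "q" are hypothetical, and each element becomes a single cell):

      JavaRDD<String> input = jsc.parallelize(Arrays.asList("row1", "row2"));
      hbaseContext.bulkLoad(input,
          TableName.valueOf("t1"),
          rowKey -> new Pair<>(
              new KeyFamilyQualifier(Bytes.toBytes(rowKey),
                  Bytes.toBytes("f"), Bytes.toBytes("q")),
              Bytes.toBytes("value-" + rowKey)),
          "/tmp/bulkLoadStaging",                          // hypothetical path
          new HashMap<byte[], FamilyHFileWriteOptions>(),  // default write options
          false,                                           // compactionExclude
          HConstants.DEFAULT_MAX_FILE_SIZE);               // maxSize

    Loading the generated HFiles into the table is a separate step, typically done with HBase's completebulkload tool.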

  10. def bulkLoadThinRows[T](javaRdd: JavaRDD[T], tableName: TableName, mapFunc: Function[T, Pair[ByteArrayWrapper, FamiliesQualifiersValues]], stagingDir: String, familyHFileWriteOptionsMap: Map[Array[Byte], FamilyHFileWriteOptions], compactionExclude: Boolean, maxSize: Long): Unit

A simple abstraction over the HBaseContext.bulkLoadThinRows method. It allows a user to take a JavaRDD and convert it into a new JavaRDD[Pair] based on mapFunc; HFiles will be generated in stagingDir for bulk load

    javaRdd

    The javaRDD we are bulk loading from

    tableName

    The HBase table we are loading into

    mapFunc

A Function that will convert a value in the JavaRDD to a Pair(ByteArrayWrapper, FamiliesQualifiersValues)

    stagingDir

    The location on the FileSystem to bulk load into

    familyHFileWriteOptionsMap

    Options that will define how the HFile for a column family is written

    compactionExclude

Whether the generated HFiles should be excluded from compaction

    maxSize

    Max size for the HFiles before they roll
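
    Example (a minimal sketch; table, staging directory, and cell values are hypothetical; all cells for a row are grouped into one FamiliesQualifiersValues, assuming its Java-friendly add method):

      JavaRDD<String> input = jsc.parallelize(Arrays.asList("row1", "row2"));
      hbaseContext.bulkLoadThinRows(input,
          TableName.valueOf("t1"),
          rowKey -> {
            FamiliesQualifiersValues fqv = new FamiliesQualifiersValues();
            // add(family, qualifier, value) collects every cell of this row
            fqv.add(Bytes.toBytes("f"), Bytes.toBytes("q"),
                Bytes.toBytes("value-" + rowKey));
            return new Pair<>(new ByteArrayWrapper(Bytes.toBytes(rowKey)), fqv);
          },
          "/tmp/bulkLoadStaging",
          new HashMap<byte[], FamilyHFileWriteOptions>(),
          false,
          HConstants.DEFAULT_MAX_FILE_SIZE);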

  11. def bulkPut[T](javaRdd: JavaRDD[T], tableName: TableName, f: Function[T, Put]): Unit

A simple abstraction over the HBaseContext.foreachPartition method.

    It allows a user to take a JavaRDD, generate a Put for each element, and send the Puts to HBase. The complexity of managing the Connection is removed from the developer

    javaRdd

    Original JavaRDD with data to iterate over

    tableName

    The name of the table to put into

    f

Function to convert a value in the JavaRDD to an HBase Put
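
    Example (a minimal sketch; table "t1", family "f", and qualifier "q" are hypothetical):

      JavaRDD<String> rdd = jsc.parallelize(Arrays.asList("row1", "row2"));
      hbaseContext.bulkPut(rdd,
          TableName.valueOf("t1"),
          value -> new Put(Bytes.toBytes(value))
              .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"),
                  Bytes.toBytes("value-" + value)));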

  12. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  16. def foreachPartition[T](javaDstream: JavaDStream[T], f: VoidFunction[(Iterator[T], Connection)]): Unit

A simple enrichment of the traditional Spark Streaming DStream foreach. This function differs from the original in that it offers the developer access to an already connected Connection object.

    Note: Do not close the Connection object. All Connection management is handled outside this method

    javaDstream

    Original DStream with data to iterate over

    f

Function to be given an iterator to iterate through the JavaDStream values and a Connection object to interact with HBase
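
    Example (a minimal sketch; assumes a hypothetical JavaDStream<String> of row keys named stream, writing through a BufferedMutator obtained from the shared Connection):

      hbaseContext.foreachPartition(stream, t -> {
        BufferedMutator mutator =
            t._2().getBufferedMutator(TableName.valueOf("t1"));
        while (t._1().hasNext()) {
          mutator.mutate(new Put(Bytes.toBytes(t._1().next()))
              .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"),
                  Bytes.toBytes("v")));
        }
        mutator.flush();
        mutator.close();             // close the mutator, not the Connection
      });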

  17. def foreachPartition[T](javaRdd: JavaRDD[T], f: VoidFunction[(Iterator[T], Connection)]): Unit

A simple enrichment of the traditional Spark javaRdd foreachPartition. This function differs from the original in that it offers the developer access to an already connected Connection object.

    Note: Do not close the Connection object. All Connection management is handled outside this method

    javaRdd

    Original javaRdd with data to iterate over

    f

Function to be given an iterator to iterate through the RDD values and a Connection object to interact with HBase
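
    Example (a minimal sketch; reads back each element of a hypothetical JavaRDD<byte[]> of row keys through the shared Connection):

      hbaseContext.foreachPartition(keys, t -> {
        Table table = t._2().getTable(TableName.valueOf("t1"));
        while (t._1().hasNext()) {
          Result r = table.get(new Get(t._1().next()));
          System.out.println(Bytes.toString(r.getRow()));
        }
        table.close();               // close the table, not the Connection
      });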

  18. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  19. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  20. val hbaseContext: HBaseContext

  21. def hbaseRDD(tableName: TableName, scans: Scan): JavaRDD[(ImmutableBytesWritable, Result)]

An overloaded version of HBaseContext hbaseRDD that defines the type of the resulting JavaRDD

    tableName

    The name of the table to scan

    scans

    The HBase scan object to use to read data from HBase

    returns

    New JavaRDD with results from scan
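
    Example (a minimal sketch; scans a hypothetical table and collects the row keys):

      Scan scan = new Scan();
      scan.setCaching(100);
      JavaRDD<Tuple2<ImmutableBytesWritable, Result>> scanRdd =
          hbaseContext.hbaseRDD(TableName.valueOf("t1"), scan);
      List<String> rowKeys = scanRdd
          .map(t -> Bytes.toString(t._2().getRow()))
          .collect();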

  22. def hbaseRDD[U](tableName: TableName, scans: Scan, f: Function[(ImmutableBytesWritable, Result), U]): JavaRDD[U]

This function will use the native HBase TableInputFormat with the given scan object to generate a new JavaRDD

    tableName

    The name of the table to scan

    scans

    The HBase scan object to use to read data from HBase

    f

Function to convert a Result object from HBase into what the user wants in the final generated JavaRDD

    returns

    New JavaRDD with results from scan
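
    Example (a minimal sketch; the same hypothetical scan, but converting each (key, Result) pair to a String row key inline):

      JavaRDD<String> rowKeys = hbaseContext.hbaseRDD(
          TableName.valueOf("t1"),
          new Scan(),
          t -> Bytes.toString(t._2().getRow()));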

  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. def mapPartitions[T, R](javaRdd: JavaRDD[T], f: FlatMapFunction[(Iterator[T], Connection), R]): JavaRDD[R]

A simple enrichment of the traditional Spark JavaRDD mapPartition. This function differs from the original in that it offers the developer access to an already connected Connection object.

    Note: Do not close the Connection object. All Connection management is handled outside this method

    Note: Make sure to partition correctly to avoid memory issues when getting data from HBase

    javaRdd

Original JavaRDD with data to iterate over

    f

Function to be given an iterator to iterate through the RDD values and a Connection object to interact with HBase

    returns

Returns a new RDD generated by the user-defined function, just like normal mapPartition
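
    Example (a minimal sketch; assumes a hypothetical JavaRDD<byte[]> of row keys and a Spark 2.x FlatMapFunction, whose call returns a java.util.Iterator):

      JavaRDD<String> values = hbaseContext.mapPartitions(keys, t -> {
        Table table = t._2().getTable(TableName.valueOf("t1"));
        List<String> out = new ArrayList<>();
        while (t._1().hasNext()) {
          Result r = table.get(new Get(t._1().next()));
          out.add(Bytes.toString(
              r.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"))));
        }
        table.close();               // close the table, not the Connection
        return out.iterator();
      });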

  25. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  26. final def notify(): Unit

    Definition Classes
    AnyRef
  27. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  28. def streamBulkDelete[T](javaDStream: JavaDStream[T], tableName: TableName, f: Function[T, Delete], batchSize: Integer): Unit

A simple abstraction over the HBaseContext.streamBulkMutation method.

    It allows a user to take a JavaDStream, generate a Delete for each element, and send the Deletes to HBase.

    The complexity of managing the Connection is removed from the developer

    javaDStream

    Original DStream with data to iterate over

    tableName

    The name of the table to delete from

    f

Function to convert a value in the JavaDStream to an HBase Delete

    batchSize

    The number of deletes to be sent at once
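
    Example (a minimal sketch; assumes a hypothetical JavaDStream<String> of row keys named stream):

      hbaseContext.streamBulkDelete(stream,
          TableName.valueOf("t1"),
          value -> new Delete(Bytes.toBytes(value)),
          4);                         // send deletes in batches of 4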

  29. def streamBulkGet[T, U](tableName: TableName, batchSize: Integer, javaDStream: JavaDStream[T], makeGet: Function[T, Get], convertResult: Function[Result, U]): JavaDStream[U]

A simple abstraction over the HBaseContext.streamMap method.

    It allows a user to take a DStream and generate a new DStream based on Gets and the results they bring back from HBase

    tableName

    The name of the table to get from

    batchSize

    The number of gets to be batched together

    javaDStream

    Original DStream with data to iterate over

    makeGet

Function to convert a value in the JavaDStream to an HBase Get

    convertResult

This will convert the HBase Result object to whatever the user wants to put in the resulting JavaDStream

    returns

New JavaDStream that is created from the results of the Gets to HBase
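
    Example (a minimal sketch; turns a hypothetical JavaDStream<String> of row keys into a DStream of the rows' f:q values):

      JavaDStream<String> values = hbaseContext.streamBulkGet(
          TableName.valueOf("t1"),
          4,                          // gets per batch
          stream,
          value -> new Get(Bytes.toBytes(value)),
          result -> Bytes.toString(
              result.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"))));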

  30. def streamBulkPut[T](javaDstream: JavaDStream[T], tableName: TableName, f: Function[T, Put]): Unit

A simple abstraction over the HBaseContext.streamMapPartition method.

    It allows a user to take a JavaDStream, generate a Put for each element, and send the Puts to HBase.

    The complexity of managing the Connection is removed from the developer

    javaDstream

    Original DStream with data to iterate over

    tableName

    The name of the table to put into

    f

Function to convert a value in the JavaDStream to an HBase Put
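
    Example (a minimal sketch; writes each record of a hypothetical JavaDStream<String> as a row in table "t1"):

      hbaseContext.streamBulkPut(stream,
          TableName.valueOf("t1"),
          value -> new Put(Bytes.toBytes(value))
              .addColumn(Bytes.toBytes("f"), Bytes.toBytes("q"),
                  Bytes.toBytes(value)));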

  31. def streamMap[T, U](javaDstream: JavaDStream[T], mp: Function[(Iterator[T], Connection), Iterator[U]]): JavaDStream[U]

A simple enrichment of the traditional Spark Streaming JavaDStream mapPartition.

    This function differs from the original in that it offers the developer access to an already connected Connection object.

    Note: Do not close the Connection object. All Connection management is handled outside this method

    Note: Make sure to partition correctly to avoid memory issues when getting data from HBase

    javaDstream

    Original JavaDStream with data to iterate over

    mp

Function to be given an iterator to iterate through the JavaDStream values and a Connection object to interact with HBase

    returns

Returns a new JavaDStream generated by the user-defined function, just like normal mapPartition
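
    Example (a minimal sketch; mirrors the mapPartitions example on a hypothetical JavaDStream<byte[]> of row keys, assuming the function accepts and returns Java-style iterators):

      JavaDStream<String> values = hbaseContext.streamMap(stream, t -> {
        Table table = t._2().getTable(TableName.valueOf("t1"));
        List<String> out = new ArrayList<>();
        while (t._1().hasNext()) {
          Result r = table.get(new Get(t._1().next()));
          out.add(Bytes.toString(
              r.getValue(Bytes.toBytes("f"), Bytes.toBytes("q"))));
        }
        table.close();               // close the table, not the Connection
        return out.iterator();
      });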

  32. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  33. def toString(): String

    Definition Classes
    AnyRef → Any
  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
