Implicit method that gives easy access to HBaseContext's bulk Delete. This will not return a new RDD.
The hbaseContext object to identify which HBase cluster connection to use
The tableName that the deletes will be sent to
The function that will convert an RDD value into an HBase Delete object
The number of Deletes to be sent in a single batch
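The batching behavior described above can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: a `mutable.Map` stands in for the HBase table, a row-key `String` stands in for a `Delete` object, and `BulkDeleteSketch.bulkDelete` is an illustrative name. The batch count it returns exists only so the behavior can be checked; the real method returns nothing.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a bulk delete; not the hbase-spark API.
object BulkDeleteSketch {
  // `table` stands in for an HBase table (row key -> value).
  // `toDelete` stands in for the user function that builds a Delete;
  // in this model it just yields the row key to remove.
  // Returns the number of batches sent (illustration only; the real
  // method does not return a new RDD or any value).
  def bulkDelete[T](
      values: Seq[T],
      table: mutable.Map[String, String],
      toDelete: T => String,
      batchSize: Int): Int = {
    require(batchSize > 0, "batchSize must be positive")
    val batches = values.map(toDelete).grouped(batchSize).toSeq
    // Each batch is applied as one round trip to the "cluster".
    batches.foreach(batch => table --= batch)
    batches.size
  }
}
```

With `batchSize = 2`, four deletes go out as two batches, and rows not named by any value are left untouched.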
Implicit method that gives easy access to HBaseContext's bulk get. This will return a new RDD. Think of it as an RDD map function: every RDD value will get a new value out of HBase, and that new value will populate the newly generated RDD.
The hbaseContext object to identify which HBase cluster connection to use
The tableName that the gets will be sent to
How many gets to execute in a single batch
The function that will turn the RDD values into HBase Get objects
A resulting RDD of (ImmutableBytesWritable, Result) pairs
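The map-like behavior of bulk get can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: an immutable `Map` stands in for the HBase table, a row-key `String` stands in for a `Get`, and the output pairs stand in for the row key / result pairs. Modeling a missing row as `None` is an assumption of this sketch. The point it shows is that every input value produces exactly one output value, in order, while the lookups are issued `batchSize` at a time.

```scala
// Hypothetical in-memory model of the map-like bulk get; not the hbase-spark API.
object BulkGetSketch {
  // `table` stands in for an HBase table; `toGet` stands in for the
  // function that turns an RDD value into a Get (here, a row key).
  // Each input yields exactly one output pair: the row key and the
  // looked-up value, or None when the row does not exist.
  def bulkGet[T](
      values: Seq[T],
      table: Map[String, String],
      batchSize: Int,
      toGet: T => String): Seq[(String, Option[String])] = {
    require(batchSize > 0, "batchSize must be positive")
    values
      .map(toGet)
      .grouped(batchSize) // gets are issued batchSize at a time
      .flatMap(batch => batch.map(key => key -> table.get(key)))
      .toSeq
  }
}
```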
Implicit method that gives easy access to HBaseContext's bulk get. This will return a new RDD. Think of it as an RDD map function: every RDD value will get a new value out of HBase, and that new value will populate the newly generated RDD.
The type of Object that will be coming out of the resulting RDD
The hbaseContext object to identify which HBase cluster connection to use
The tableName that the gets will be sent to
How many gets to execute in a single batch
The function that will turn the RDD values into HBase Get objects
The function that will convert an HBase Result object into a value that will go into the resulting RDD
A resulting RDD with type R objects
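The role of the result-conversion function can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: `Option[String]` stands in for an HBase `Result` (with `None` for a missing row, an assumption of this sketch), and `User` is an illustrative type chosen for `R`. It shows how the converter decides the element type of the resulting collection.

```scala
// Hypothetical in-memory model of bulk get with a result converter;
// not the hbase-spark API.
object BulkGetConvertSketch {
  // `convertResult` plays the role of the function that turns an HBase
  // Result into a value of type R; a missing row is modeled as None.
  def bulkGet[T, R](
      values: Seq[T],
      table: Map[String, String],
      batchSize: Int,
      toGet: T => String,
      convertResult: Option[String] => R): Seq[R] = {
    require(batchSize > 0, "batchSize must be positive")
    values
      .map(toGet)
      .grouped(batchSize) // gets are issued batchSize at a time
      .flatMap(batch => batch.map(key => convertResult(table.get(key))))
      .toSeq
  }
}

// Illustrative R type: a small domain object built from each looked-up row.
final case class User(name: String, known: Boolean)
```

Choosing `R` this way keeps HBase-specific types out of the rest of the pipeline, since downstream code only sees `User` values.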
A Spark implementation of HBase bulk load for wide rows, or for when values are not already combined at the time of the map process.
This will take the content from an existing RDD, then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.
After this function is executed, the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase.
Also note that this version of bulk load differs from past versions in that it includes the qualifier as part of the sort process. The reason for this is to support rows with a very large number of columns.
The HBase table we are loading into
A flatMap function that will turn every row in the RDD into N cells for the bulk load
The location on the FileSystem where the HFiles will be written for the bulk load
Options that will define how the HFile for a column family is written
Whether the generated HFiles are excluded from compaction
Maximum size of an HFile before the writer rolls over to a new file
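The sort-and-shuffle step of the wide-row bulk load can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: `Cell`, `regionFor`, and `prepare` are illustrative names, region splits are plain `String` start keys, and `String` ordering stands in for HBase's unsigned byte ordering (they agree for ASCII keys). It shows each record being expanded into N cells, routed to a region, and sorted by (row, family, qualifier), so the qualifier takes part in the sort.

```scala
// Hypothetical in-memory model of the wide-row bulk load's shuffle and sort;
// not the hbase-spark API.
object WideRowBulkLoadSketch {
  final case class Cell(row: String, family: String, qualifier: String, value: String)

  // A region's index is the number of split keys that are <= the row key,
  // so rows before the first split land in region 0.
  def regionFor(splits: Vector[String], row: String): Int =
    splits.count(split => split.compareTo(row) <= 0)

  // `toCells` plays the role of the flatMap function: it turns every record
  // into N cells. Cells are grouped by target region, then each region's
  // cells are sorted by (row, family, qualifier). Each inner Seq stands in
  // for the contents of one region's HFiles.
  def prepare[T](
      records: Seq[T],
      splits: Vector[String],
      toCells: T => Iterator[Cell]): Map[Int, Seq[Cell]] =
    records
      .flatMap(r => toCells(r))
      .groupBy(c => regionFor(splits, c.row))
      .map { case (region, cells) =>
        region -> cells.sortBy(c => (c.row, c.family, c.qualifier))
      }
}
```

Because the sort key includes the qualifier, no single row has to be held and sorted in memory, which is what makes this variant suitable for very wide rows.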
Implicit method that gives easy access to HBaseContext's bulkLoadThinRows method.
A Spark implementation of HBase bulk load for short rows, roughly fewer than 1,000 columns. This bulk load should be faster for tables with thinner rows than the other Spark implementation of bulk load, which puts only one value into each record going into the shuffle.
This will take the content from an existing RDD, then sort and shuffle it with respect to region splits. The result of that sort and shuffle will be written to HFiles.
After this function is executed, the user will have to call LoadIncrementalHFiles.doBulkLoad(...) to move the files into HBase.
In this implementation only the row key is given to the shuffle as the key, and all the columns are already linked to the row key before the shuffle stage. The sorting of the qualifiers is done in memory, outside of the shuffle stage.
The HBase table we are loading into
A function that will convert the RDD records to the key value format used for the shuffle to prep for writing to the bulk loaded HFiles
The location on the FileSystem where the HFiles will be written for the bulk load
Options that will define how the HFile for a column family is written
Whether the generated HFiles are excluded from compaction
Maximum size of an HFile before the writer rolls over to a new file
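The thin-row variant can be sketched the same way. This is a hypothetical in-memory model, not the hbase-spark API: `Row`, `toRow`, and `prepare` are illustrative names, and a `Map` keyed by (family, qualifier) stands in for the per-row column set. It shows the two-level ordering described above: records are sorted by row key only (standing in for the shuffle), and each row's columns are then sorted in memory.

```scala
// Hypothetical in-memory model of the thin-row bulk load; not the hbase-spark API.
object ThinRowBulkLoadSketch {
  // One record per row: the row key plus all of its (family, qualifier) -> value
  // columns, already attached before the "shuffle".
  type Row = (String, Map[(String, String), String])

  // Only the row key is used as the sort/shuffle key; the columns of each row
  // are then sorted in memory, outside the shuffle.
  def prepare[T](records: Seq[T], toRow: T => Row): Seq[(String, Seq[((String, String), String)])] =
    records
      .map(toRow)
      .sortBy(_._1) // shuffle/sort on the row key only
      .map { case (rowKey, columns) =>
        rowKey -> columns.toSeq.sortBy(_._1) // in-memory sort by (family, qualifier)
      }
}
```

Shuffling one record per row instead of one per cell means fewer, larger shuffle records, which is why this approach is expected to be faster when rows are narrow.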
Implicit method that gives easy access to HBaseContext's bulk put. This will not return a new RDD. Think of it like a foreach.
The hbaseContext object to identify which HBase cluster connection to use
The tableName that the puts will be sent to
The function that will turn the RDD values into HBase Put objects.
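The foreach-like contract of bulk put can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: a `mutable.Map` stands in for the table, a (row key, value) pair stands in for a `Put`, and overwriting on a repeated row key is a simplification of this model (HBase itself keeps cell versions). It shows that the method is called only for its side effect and returns `Unit`.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of a foreach-like bulk put; not the hbase-spark API.
object BulkPutSketch {
  // `toPut` stands in for the function that turns an RDD value into a Put;
  // in this model it yields a (row key, value) pair. Like a foreach, the
  // method is called only for its side effect on `table` and returns Unit.
  def bulkPut[T](values: Seq[T], table: mutable.Map[String, String], toPut: T => (String, String)): Unit =
    values.foreach { v =>
      val (rowKey, value) = toPut(v)
      table(rowKey) = value // in this model, a later put to the same row overwrites
    }
}
```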
Implicit method that gives easy access to HBaseContext's foreachPartition method. This will act much like a normal RDD foreach method, except that you will have an HBase connection while iterating through the values.
The hbaseContext object to identify which HBase cluster connection to use
This function will get an iterator for a Partition of an RDD along with a connection object to HBase
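The shape of the foreachPartition callback can be sketched in plain Scala. This is a hypothetical in-memory model, not the hbase-spark API: `FakeConnection` stands in for an HBase `Connection`, each inner `Seq` stands in for one RDD partition, and returning the connections exists only so the behavior can be checked. It shows a connection being obtained once per partition rather than once per element and closed after the callback finishes; whether the real implementation pools or caches connections is not modeled here.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of foreachPartition with a per-partition
// connection; not the hbase-spark API.
object ForeachPartitionSketch {
  // Stands in for an HBase Connection; records what was "written" through it.
  final class FakeConnection(val id: Int) {
    val written = mutable.Buffer.empty[String]
    var closed = false
    def close(): Unit = closed = true
  }

  // Opens one connection per partition, hands the partition's iterator and
  // the connection to `f`, and closes the connection afterwards. The
  // connections are returned only so the sketch can be inspected.
  def foreachPartition[T](partitions: Seq[Seq[T]])(f: (Iterator[T], FakeConnection) => Unit): Seq[FakeConnection] =
    partitions.zipWithIndex.map { case (partition, i) =>
      val conn = new FakeConnection(i)
      try f(partition.iterator, conn)
      finally conn.close()
      conn
    }
}
```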
Implicit method that gives easy access to HBaseContext's mapPartitions method. This will act much like a normal RDD mapPartitions method, except that you will have an HBase connection while iterating through the values.
This is the type of objects that will go into the resulting RDD
The hbaseContext object to identify which HBase cluster connection to use
This function will get an iterator for a Partition of an RDD along with a connection object to HBase
A resulting RDD of type R
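The mapPartitions variant differs from foreachPartition in that the callback returns an iterator of new values. This plain-Scala sketch is a hypothetical in-memory model, not the hbase-spark API: `FakeConnection` stands in for an HBase `Connection` and serves lookups from a `Map`, and each inner `Seq` stands in for one partition of the resulting RDD of type `R`.

```scala
// Hypothetical in-memory model of mapPartitions with a per-partition
// connection; not the hbase-spark API.
object MapPartitionsSketch {
  // Stands in for an HBase Connection that serves lookups from memory.
  final class FakeConnection(data: Map[String, Int]) {
    def get(key: String): Option[Int] = data.get(key)
  }

  // Each partition gets its own connection; `f` maps the partition's iterator
  // to an iterator of R, and the results form the new "RDD" (one Seq per partition).
  def mapPartitions[T, R](partitions: Seq[Seq[T]], data: Map[String, Int])(
      f: (Iterator[T], FakeConnection) => Iterator[R]): Seq[Seq[R]] =
    partitions.map { partition =>
      val conn = new FakeConnection(data)
      f(partition.iterator, conn).toList // materialize while the connection is in use
    }
}
```

Materializing the iterator before moving on matters in practice: an iterator that still references a connection would fail if the connection were closed before the values were consumed.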
This is for an RDD of any type.
These are implicit methods for an RDD that contains any type of data.
The RDD's element type; this can be any type
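The "implicit methods for an RDD of any type" pattern is Scala's implicit-class enrichment. In hbase-spark these methods are brought into scope with `import org.apache.hadoop.hbase.spark.HBaseRDDFunctions._`. The sketch below shows the same pattern on a plain `Seq` so it runs without Spark. `GenericSeqFunctions` and `batchedForeach` are illustrative names, not hbase-spark identifiers.

```scala
// Hypothetical sketch of the implicit-class pattern over a generic element
// type, using a plain Seq instead of a Spark RDD.
object GenericFunctionsSketch {
  // T can be any element type, mirroring an implicit class over RDD[T].
  implicit class GenericSeqFunctions[T](val xs: Seq[T]) {
    // Applies `f` to each batch of at most `batchSize` elements and returns
    // the number of batches.
    def batchedForeach(batchSize: Int)(f: Seq[T] => Unit): Int = {
      require(batchSize > 0, "batchSize must be positive")
      val batches = xs.grouped(batchSize).toList
      batches.foreach(f)
      batches.size
    }
  }
}
```

After `import GenericFunctionsSketch._`, any `Seq`, whatever its element type, gains `batchedForeach`, just as any RDD gains the HBase methods once the implicits are imported.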