Class KeyValueHeap
- All Implemented Interfaces:
Closeable, AutoCloseable, InternalScanner, KeyValueScanner, Shipper
- Direct Known Subclasses:
ReversedKeyValueHeap
Implements a heap merge across any number of KeyValueScanners. Implements KeyValueScanner itself.
This class is used at the Region level to merge across Stores and at the Store level to merge across the memstore and StoreFiles.
In the Region case, we also need InternalScanner.next(List), so this class also implements InternalScanner. WARNING: As is, if you try to use this as an InternalScanner at the Store level, you will get runtime exceptions.
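The merge this class performs can be illustrated with a minimal, self-contained Java sketch of the same idea. A PriorityQueue orders sub-scanners by the key each would return next, so the smallest key across all of them always comes off the top. The sketch assumes keys are plain Strings and scanners are sorted in-memory lists. `ListScanner` and `SimpleMergeHeap` are hypothetical names, not HBase classes. The real class compares ExtendedCells with a comparator and also handles lazy seeks.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a KeyValueScanner: a cursor over an already-sorted list of keys.
class ListScanner {
  private final List<String> keys;
  private int pos = 0;

  ListScanner(List<String> sortedKeys) {
    this.keys = sortedKeys;
  }

  // Like peek(): the next key without advancing, or null when exhausted.
  String peek() {
    return pos < keys.size() ? keys.get(pos) : null;
  }

  // Like next(): return the next key and advance past it.
  String next() {
    return pos < keys.size() ? keys.get(pos++) : null;
  }
}

// Simplified k-way merge over several sorted scanners (not the HBase implementation).
class SimpleMergeHeap {
  // Only non-empty scanners are ever stored, so peek() is never null inside the queue.
  private final PriorityQueue<ListScanner> heap =
      new PriorityQueue<>(Comparator.comparing(ListScanner::peek));

  SimpleMergeHeap(List<ListScanner> scanners) {
    for (ListScanner s : scanners) {
      if (s.peek() != null) {
        heap.add(s);
      }
    }
  }

  String peek() {
    ListScanner top = heap.peek();
    return top == null ? null : top.peek();
  }

  String next() {
    ListScanner top = heap.poll(); // remove before advancing so the queue never holds a stale key
    if (top == null) {
      return null;
    }
    String key = top.next();
    if (top.peek() != null) {
      heap.add(top); // re-insert so it is ordered by its new top key
    }
    return key;
  }

  // Convenience helper: consume everything in merged order.
  List<String> drain() {
    List<String> out = new ArrayList<>();
    for (String k = next(); k != null; k = next()) {
      out.add(k);
    }
    return out;
  }
}
```

Suppose you merge `["a", "d"]`, `["b", "c", "e"]` and an empty scanner this way. `drain()` then returns `[a, b, c, d, e]`.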
-
Nested Class Summary
-
Field Summary
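Modifier and Type / Field / Description
- protected KeyValueHeap.KVScannerComparator comparator
- protected KeyValueScanner current: The current sub-scanner, i.e. the one that contains the next key/value to return to the client.
- protected PriorityQueue<KeyValueScanner> heap
- private static final org.slf4j.Logger LOG
- protected List<KeyValueScanner> scannersForDelayedClose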
Fields inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
NO_NEXT_INDEXED_KEY
-
Constructor Summary
Constructor / Description
- KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator): Constructor.
- KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator): Constructor.
-
Method Summary
Modifier and Type / Method / Description
- void close(): Close the KeyValue scanner.
- private boolean generalizedSeek(boolean isLazy, ExtendedCell seekKey, boolean forward, boolean useBloom)
- (package private) KeyValueScanner getCurrentForTesting()
- getHeap(): Returns the current Heap.
- getNextIndexedKey()
- (package private) boolean isLatestCellFromMemstore()
- next(): Return the next Cell in this scanner, iterating the scanner.
- boolean next(List<? super ExtendedCell> result, ScannerContext scannerContext): Gets the next row of keys from the top-most scanner.
- peek(): Look at the next Cell in this scanner, but do not iterate the scanner.
- protected KeyValueScanner pollRealKV(): Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it.
- void recordBlockSize(IntConsumer blockSizeConsumer): Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer.
- boolean requestSeek(ExtendedCell key, boolean forward, boolean useBloom): Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.ExtendedCell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.ExtendedCell) if forward is true), but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter.
- boolean reseek(ExtendedCell seekKey): This function is identical to the seek(ExtendedCell) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
- boolean seek(ExtendedCell seekKey): Seeks all scanners at or below the specified seek key.
- void shipped(): Called after a batch of rows is scanned and set to be returned to the client.
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonReversedNonLazyKeyValueScanner
backwardSeek, seekToLastRow, seekToPreviousRow
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner
doRealSeek, enforceSeek, getFilePath, isFileScanner, realSeekDone, shouldUseScanner
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.hbase.regionserver.InternalScanner
next
Methods inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
backwardSeek, enforceSeek, getFilePath, getScannerOrder, isFileScanner, realSeekDone, seekToLastRow, seekToPreviousRow, shouldUseScanner
-
Field Details
-
LOG
-
heap
-
scannersForDelayedClose
-
current
The current sub-scanner, i.e. the one that contains the next key/value to return to the client. This scanner is NOT included in heap (but we frequently add it back to the heap and pull the new winner out). We maintain an invariant that the current sub-scanner has already done a real seek, and that current.peek() is always a real key/value (or null), except for the fake last-key-on-row-column supplied by the multi-column Bloom filter optimization, which is OK to propagate to StoreScanner. To ensure this, always use pollRealKV() to update current.
-
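The sketch below shows why keeping current outside heap pays off. While the same sub-scanner keeps producing the smallest key, next() touches only that scanner and compares it against heap.peek(). It pays for a heap insert and poll only when another scanner takes the lead. This is a simplified model under stated assumptions: String keys, in-memory cursors, and no lazy seeks, so there is no pollRealKV() step. `Cursor`, `CurrentTrackingHeap` and the `heapSwaps` counter are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sorted-key cursor (not an HBase class).
class Cursor {
  private final List<String> keys;
  private int pos = 0;

  Cursor(List<String> sortedKeys) {
    this.keys = sortedKeys;
  }

  String peek() {
    return pos < keys.size() ? keys.get(pos) : null;
  }

  String next() {
    return pos < keys.size() ? keys.get(pos++) : null;
  }
}

class CurrentTrackingHeap {
  private final PriorityQueue<Cursor> heap =
      new PriorityQueue<>(Comparator.comparing(Cursor::peek));
  private Cursor current; // invariant: never stored inside heap
  int heapSwaps = 0;      // how often current had to be pushed back into the heap

  CurrentTrackingHeap(List<Cursor> cursors) {
    for (Cursor c : cursors) {
      if (c.peek() != null) {
        heap.add(c);
      }
    }
    current = heap.poll();
  }

  String peek() {
    return current == null ? null : current.peek();
  }

  String next() {
    if (current == null) {
      return null;
    }
    String key = current.next();
    if (current.peek() == null) {
      current = heap.poll(); // current is exhausted: promote the next winner
    } else {
      Cursor top = heap.peek();
      if (top != null && current.peek().compareTo(top.peek()) >= 0) {
        heap.add(current); // current lost its lead: put it back and take the new winner
        current = heap.poll();
        heapSwaps++;
      }
    }
    return key;
  }
}
```

Take two cursors `["a", "b", "c", "x"]` and `["y", "z"]`. Draining them yields `abcxyz` with no heap swaps at all, because the first cursor stays in front until it runs out.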
comparator
-
-
Constructor Details
-
KeyValueHeap
public KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator) throws IOException
Constructor. This KeyValueHeap will handle closing of the passed-in KeyValueScanners.
- Throws:
IOException
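The sketch below shows one way to honor this ownership contract. Scanners that already have data are merged. Scanners that are empty at construction are set aside, but they are still closed when the heap itself is closed, so the caller never has to track them. The names `ClosableCursor`, `OwningHeap` and `delayedClose` are hypothetical. The `scannersForDelayedClose` field listed above suggests a similar bookkeeping list, but using it this way here is an assumption.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical scanner that records whether it was closed (not an HBase class).
class ClosableCursor implements Closeable {
  private final List<String> keys;
  boolean closed = false;

  ClosableCursor(List<String> keys) {
    this.keys = keys;
  }

  String peek() {
    return keys.isEmpty() ? null : keys.get(0);
  }

  @Override
  public void close() {
    closed = true;
  }
}

// The heap takes ownership of every scanner passed in, even ones it never merges.
class OwningHeap implements Closeable {
  private final PriorityQueue<ClosableCursor> heap =
      new PriorityQueue<>(Comparator.comparing(ClosableCursor::peek));
  private final List<ClosableCursor> delayedClose = new ArrayList<>();

  OwningHeap(List<ClosableCursor> scanners) {
    for (ClosableCursor s : scanners) {
      if (s.peek() != null) {
        heap.add(s);
      } else {
        delayedClose.add(s); // empty: not merged, but still closed by this heap later
      }
    }
  }

  int heapSize() {
    return heap.size();
  }

  @Override
  public void close() {
    for (ClosableCursor s : delayedClose) {
      s.close();
    }
    delayedClose.clear();
    ClosableCursor s;
    while ((s = heap.poll()) != null) {
      s.close();
    }
  }
}
```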
-
KeyValueHeap
KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator) throws IOException
Constructor.
- Throws:
IOException
-
-
Method Details
-
peek
Description copied from interface: KeyValueScanner
Look at the next Cell in this scanner, but do not iterate the scanner. NOTICE: The returned cell has not been passed into ScanQueryMatcher, so it may not be what the user needs.
- Specified by:
peek
in interface KeyValueScanner
- Returns:
- the next Cell
-
isLatestCellFromMemstore
boolean isLatestCellFromMemstore()
-
recordBlockSize
Description copied from interface: KeyValueScanner
Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
- Specified by:
recordBlockSize
in interface KeyValueScanner
- Overrides:
recordBlockSize
in class NonLazyKeyValueScanner
- Parameters:
blockSizeConsumer
- to be called with block size in bytes, once per block.
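Here is a minimal sketch of the "once per block" requirement. It assumes a hypothetical `BlockCursor` that knows the size of each block it reads. The cursor remembers which blocks it has already reported, so repeated calls on the same block, or a return to an earlier block, do not double count. The class and its `advanceToBlock` method are illustrative only. `IntConsumer` is the standard `java.util.function` interface named in the signature.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical scanner that moves across blocks of known sizes (not an HBase class).
class BlockCursor {
  private final int[] blockSizes;   // size in bytes of each block
  private final boolean[] recorded; // which blocks have already been reported
  private int currentBlock = 0;

  BlockCursor(int... blockSizes) {
    this.blockSizes = blockSizes;
    this.recorded = new boolean[blockSizes.length];
  }

  void advanceToBlock(int block) {
    currentBlock = block;
  }

  // Passes the current block's size to the consumer at most once per block.
  void recordBlockSize(IntConsumer blockSizeConsumer) {
    if (currentBlock < blockSizes.length && !recorded[currentBlock]) {
      recorded[currentBlock] = true;
      blockSizeConsumer.accept(blockSizes[currentBlock]);
    }
  }
}
```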
-
next
Description copied from interface: KeyValueScanner
Return the next Cell in this scanner, iterating the scanner.
- Specified by:
next
in interface KeyValueScanner
- Returns:
- the next Cell
- Throws:
IOException
-
next
public boolean next(List<? super ExtendedCell> result, ScannerContext scannerContext) throws IOException
Gets the next row of keys from the top-most scanner. This method takes care of updating the heap.
This can ONLY be called when you are using Scanners that implement InternalScanner as well as KeyValueScanner (a StoreScanner).
- Specified by:
next
in interface InternalScanner
- Parameters:
result
- return output array. We will only add ExtendedCell to this list, but coprocessor (CP) users should just use RawCell, as ExtendedCell is IA.Private.
- Returns:
- true if more rows exist after this one, false if scanner is done
- Throws:
IOException
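The delegation described above can be modeled in a few lines. The heap asks only its top-most sub-scanner for one row. It then re-inserts that scanner, if it still has data, so the heap stays ordered. This is a simplified sketch: it assumes cells are `"row:qualifier"` Strings sorted by row and ignores ScannerContext limits. `StoreLikeScanner`, `nextRow` and `RowMergingHeap` are hypothetical names. In this sketch each call returns the cells of a single sub-scanner only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical per-store scanner over "row:qualifier" cells, sorted by row.
class StoreLikeScanner {
  private final List<String> cells;
  private int pos = 0;

  StoreLikeScanner(List<String> sortedCells) {
    this.cells = sortedCells;
  }

  static String rowOf(String cell) {
    return cell.substring(0, cell.indexOf(':'));
  }

  String peek() {
    return pos < cells.size() ? cells.get(pos) : null;
  }

  // Appends every cell of the next row to result; returns true if more rows follow.
  boolean nextRow(List<String> result) {
    if (pos >= cells.size()) {
      return false;
    }
    String row = rowOf(cells.get(pos));
    while (pos < cells.size() && rowOf(cells.get(pos)).equals(row)) {
      result.add(cells.get(pos++));
    }
    return pos < cells.size();
  }
}

class RowMergingHeap {
  private final PriorityQueue<StoreLikeScanner> heap =
      new PriorityQueue<>(Comparator.comparing(StoreLikeScanner::peek));

  RowMergingHeap(List<StoreLikeScanner> scanners) {
    for (StoreLikeScanner s : scanners) {
      if (s.peek() != null) {
        heap.add(s);
      }
    }
  }

  // Delegates to the top-most scanner, then updates the heap; true if any data remains.
  boolean next(List<String> result) {
    StoreLikeScanner top = heap.poll();
    if (top == null) {
      return false;
    }
    if (top.nextRow(result)) {
      heap.add(top); // still has rows: re-insert under its new top cell
    }
    return !heap.isEmpty();
  }
}
```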
-
close
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.
- Specified by:
close
in interface AutoCloseable
- Specified by:
close
in interface Closeable
- Specified by:
close
in interface InternalScanner
- Specified by:
close
in interface KeyValueScanner
-
seek
Seeks all scanners at or below the specified seek key. If we exited a row early, we may end up skipping values that were not yet reached. Rather than iterating down, we want to give the opportunity to re-seek. As individual scanners may run past their ends, those scanners are automatically closed and removed from the heap.
This function (and reseek(ExtendedCell)) does not do multi-column Bloom filter and lazy-seek optimizations. To enable those, call requestSeek(ExtendedCell, boolean, boolean).
- Specified by:
seek
in interface KeyValueScanner
- Parameters:
seekKey
- KeyValue to seek at or after
- Returns:
- true if KeyValues exist at or after specified key, false if not
- Throws:
IOException
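The "at or below" wording can be read as an early exit. Scanners come off the heap in key order, so once the top scanner is already at or after the seek key, every scanner still in the heap is too, and none of them needs to move. Below is a simplified sketch of that loop. It assumes String keys and binary-searchable in-memory cursors; `SeekableCursor` and `SeekingHeap` are hypothetical. Exhausted scanners are closed and dropped, as described above. Each scanner is polled before it is repositioned, because a PriorityQueue does not re-sort an element whose key changes while it is inside the queue.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical cursor over sorted keys with random-access seek (not an HBase class).
class SeekableCursor {
  private final List<String> keys;
  private int pos = 0;
  int seeks = 0;
  boolean closed = false;

  SeekableCursor(List<String> sortedKeys) {
    this.keys = sortedKeys;
  }

  String peek() {
    return pos < keys.size() ? keys.get(pos) : null;
  }

  // Positions at the first key >= seekKey; returns false if there is none.
  boolean seek(String seekKey) {
    seeks++;
    int i = Collections.binarySearch(keys, seekKey);
    pos = i >= 0 ? i : -(i + 1); // insertion point when the key is absent
    return pos < keys.size();
  }

  void close() {
    closed = true;
  }
}

class SeekingHeap {
  private final PriorityQueue<SeekableCursor> heap =
      new PriorityQueue<>(Comparator.comparing(SeekableCursor::peek));

  SeekingHeap(List<SeekableCursor> scanners) {
    for (SeekableCursor s : scanners) {
      if (s.peek() != null) {
        heap.add(s);
      }
    }
  }

  String peek() {
    SeekableCursor top = heap.peek();
    return top == null ? null : top.peek();
  }

  boolean seek(String seekKey) {
    SeekableCursor top;
    while ((top = heap.poll()) != null) {
      if (top.peek().compareTo(seekKey) >= 0) {
        heap.add(top); // already at or after seekKey, and so is every scanner left in the heap
        return true;
      }
      if (top.seek(seekKey)) {
        heap.add(top); // re-insert under its new position
      } else {
        top.close(); // ran past its end: close it and leave it out of the heap
      }
    }
    return false;
  }
}
```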
-
reseek
This function is identical to the seek(ExtendedCell) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
- Specified by:
reseek
in interface KeyValueScanner
- Parameters:
seekKey
- seek value (should be non-null)
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
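For a single sub-scanner, the difference can be sketched like this. seek may reposition anywhere, including backwards (here via binary search). reseek only moves forward from the current position, so the key passed to it should not precede where the scanner already is. `ForwardCursor` is hypothetical, and the linear forward walk is an illustrative choice. In this sketch, calling reseek with a key behind the current position simply leaves the cursor where it is.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical cursor contrasting random-access seek with forward-only reseek.
class ForwardCursor {
  private final List<String> keys;
  private int pos = 0;

  ForwardCursor(List<String> sortedKeys) {
    this.keys = sortedKeys;
  }

  String peek() {
    return pos < keys.size() ? keys.get(pos) : null;
  }

  // seek: may move anywhere, including backwards, to the first key >= key.
  boolean seek(String key) {
    int i = Collections.binarySearch(keys, key);
    pos = i >= 0 ? i : -(i + 1);
    return pos < keys.size();
  }

  // reseek: only walks forward from the current position; never moves backwards.
  boolean reseek(String key) {
    while (pos < keys.size() && keys.get(pos).compareTo(key) < 0) {
      pos++;
    }
    return pos < keys.size();
  }
}
```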
-
requestSeek
Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.ExtendedCell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.ExtendedCell) if forward is true), but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.
- Specified by:
requestSeek
in interface KeyValueScanner
- Overrides:
requestSeek
in class NonLazyKeyValueScanner
- Parameters:
forward
- do a forward-only "reseek" instead of a random-access seek
useBloom
- whether to enable multi-column Bloom filter optimization
- Throws:
IOException
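The Bloom filter check can be sketched as a filter applied before each simulated disk seek: a file is skipped when its filter rules out the requested row/column. This is a simplified model, not HBase code. `FileModel` and `filesSeeked` are hypothetical, and a `Set` stands in for the Bloom filter. Unlike a real Bloom filter, the `Set` never returns a false positive. Skipping is safe only because a Bloom filter has no false negatives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of one store file: the row/column pairs it contains.
class FileModel {
  final String name;
  private final Set<String> rowCols;

  FileModel(String name, Set<String> rowCols) {
    this.name = name;
    this.rowCols = rowCols;
  }

  // Stand-in for a row-column Bloom filter lookup; exact here, whereas a real Bloom
  // filter may answer "maybe" for absent keys but never "no" for present ones.
  boolean mightContain(String rowCol) {
    return rowCols.contains(rowCol);
  }
}

class BloomGuardedSeeks {
  // Returns the names of the files that would actually be seeked for rowCol.
  static List<String> filesSeeked(List<FileModel> files, String rowCol, boolean useBloom) {
    List<String> seeked = new ArrayList<>();
    for (FileModel f : files) {
      if (useBloom && !f.mightContain(rowCol)) {
        continue; // the filter rules this file out, so no disk seek is needed
      }
      seeked.add(f.name);
    }
    return seeked;
  }
}
```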
-
generalizedSeek
private boolean generalizedSeek(boolean isLazy, ExtendedCell seekKey, boolean forward, boolean useBloom) throws IOException
- Parameters:
isLazy
- whether we are trying to seek to exactly the given row/col. Enables Bloom filter and most-recent-file-first optimizations for multi-column get/scan queries.
seekKey
- key to seek to
forward
- whether to seek forward (also known as reseek)
useBloom
- whether to optimize seeks using Bloom filters
- Throws:
IOException
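From the parameter descriptions above, seek, reseek and requestSeek can plausibly share this one helper, with the flags selecting the operation each sub-scanner receives. Below is a minimal sketch of that dispatch. It assumes a hypothetical `RecordingScanner` that only logs the call it gets. Heap bookkeeping (polling scanners, closing exhausted ones) is omitted. The flag values each public method passes are an assumption consistent with the descriptions, not a copy of the HBase source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-scanner that only records which operation it was asked to perform.
class RecordingScanner {
  final List<String> calls = new ArrayList<>();

  void seek(String key) {
    calls.add("seek " + key);
  }

  void reseek(String key) {
    calls.add("reseek " + key);
  }

  void requestSeek(String key, boolean forward, boolean useBloom) {
    calls.add("requestSeek " + key + " forward=" + forward + " bloom=" + useBloom);
  }
}

class SeekDispatcher {
  private final RecordingScanner scanner;

  SeekDispatcher(RecordingScanner scanner) {
    this.scanner = scanner;
  }

  void seek(String key) {
    generalizedSeek(false, key, false, false); // not lazy, not forward, no Bloom filter
  }

  void reseek(String key) {
    generalizedSeek(false, key, true, false); // not lazy, forward-only
  }

  void requestSeek(String key, boolean forward, boolean useBloom) {
    generalizedSeek(true, key, forward, useBloom); // lazy: the sub-scanner may defer or skip the seek
  }

  // Single code path; the flags choose the operation sent to the sub-scanner.
  private void generalizedSeek(boolean isLazy, String key, boolean forward, boolean useBloom) {
    if (isLazy) {
      scanner.requestSeek(key, forward, useBloom);
    } else if (forward) {
      scanner.reseek(key);
    } else {
      scanner.seek(key);
    }
  }
}
```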
-
pollRealKV
Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it. Works by fetching the top sub-scanner and, if it has not done a real seek, making it do so (which will modify its top KV), putting it back, and repeating this until success. This relies on the fact that on a lazy seek we set the current key of a StoreFileScanner to a KV that is not greater than the real next KV to be read from that file. Therefore the scanner that bubbles up to the top of the heap holds the global next KV in this scanner heap if (1) it has done a real seek and (2) its KV is the top among all top KVs (some of which are fake) in the scanner heap.
- Throws:
IOException
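The loop described above can be sketched with a hypothetical `LazyCursor`. A lazy seek only records a placeholder key that is never greater than the real next key. `enforceSeek()` then performs the real positioning. The sketch drops exhausted scanners instead of scheduling them for closing. The methods pollRealKV, enforceSeek and realSeekDone mirror names listed in this class. Everything else, including the `realSeeks` counter, is illustrative.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical scanner supporting a lazy seek (not an HBase class).
class LazyCursor {
  private final List<String> keys;
  private int pos = 0;
  private String fakeKey = null; // set by lazySeek; never greater than the real next key

  LazyCursor(List<String> sortedKeys) {
    this.keys = sortedKeys;
  }

  // Defers the real work: just report the seek key as a placeholder top key.
  void lazySeek(String seekKey) {
    fakeKey = seekKey;
  }

  boolean realSeekDone() {
    return fakeKey == null;
  }

  // Performs the deferred seek: position at the first key >= the recorded seek key.
  void enforceSeek() {
    if (fakeKey == null) {
      return;
    }
    int i = Collections.binarySearch(keys, fakeKey);
    pos = i >= 0 ? i : -(i + 1);
    fakeKey = null;
  }

  String peek() {
    if (fakeKey != null) {
      return fakeKey;
    }
    return pos < keys.size() ? keys.get(pos) : null;
  }
}

class LazyHeap {
  final PriorityQueue<LazyCursor> heap =
      new PriorityQueue<>(Comparator.comparing(LazyCursor::peek));
  int realSeeks = 0;

  // Takes the top scanner; if its key is only a placeholder, seeks it for real,
  // puts it back, and tries again until the top scanner holds a real key.
  LazyCursor pollRealKV() {
    LazyCursor top;
    while ((top = heap.poll()) != null) {
      if (top.realSeekDone()) {
        return top; // real key, and no other scanner's real key can be smaller
      }
      top.enforceSeek();
      realSeeks++;
      if (top.peek() != null) {
        heap.add(top); // re-insert under its real key
      } // exhausted scanners are simply dropped in this sketch
    }
    return null;
  }
}
```

Suppose a scanner lazily seeked to `"m"` still sits in the heap behind smaller real keys. It is never forced to do its real seek. This deferral is the saving the lazy-seek optimization aims for.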
-
getHeap
Returns the current Heap.
-
getCurrentForTesting
-
getNextIndexedKey
- Specified by:
getNextIndexedKey
in interface KeyValueScanner
- Overrides:
getNextIndexedKey
in class NonLazyKeyValueScanner
- Returns:
- the next key in the index, usually the first key of the next block OR a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
-
shipped
Description copied from interface: Shipper
Called after a batch of rows is scanned and set to be returned to the client. Any in-between cleanup can be done here.
- Specified by:
shipped
in interface Shipper
- Overrides:
shipped
in class NonLazyKeyValueScanner
- Throws:
IOException
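A plausible shape for this cleanup at the heap level is sketched below, as an assumption for illustration. Scanners that were already set aside are closed once the batch has shipped. The notification is then forwarded to the current sub-scanner and to every scanner still in the heap. `ShippableScanner` and `ShippingHeap` are hypothetical. A plain list replaces the priority queue because ordering does not matter for this step. Deferring the close until shipped() is a design choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sub-scanner that counts notifications (not an HBase class).
class ShippableScanner {
  int shippedCalls = 0;
  boolean closed = false;

  void shipped() {
    shippedCalls++;
  }

  void close() {
    closed = true;
  }
}

class ShippingHeap {
  ShippableScanner current;
  final List<ShippableScanner> heap = new ArrayList<>();         // ordering is irrelevant here
  final List<ShippableScanner> delayedClose = new ArrayList<>(); // scanners already set aside

  void shipped() {
    for (ShippableScanner s : delayedClose) {
      s.close(); // the batch is out, so these set-aside scanners can be released now
    }
    delayedClose.clear();
    if (current != null) {
      current.shipped();
    }
    for (ShippableScanner s : heap) {
      s.shipped();
    }
  }
}
```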
-