Class KeyValueHeap
- All Implemented Interfaces:
Closeable, AutoCloseable, InternalScanner, KeyValueScanner, Shipper
- Direct Known Subclasses:
ReversedKeyValueHeap
Implements KeyValueScanner itself.
This class is used at the Region level to merge across Stores and at the Store level to merge across the memstore and StoreFiles.
In the Region case, we also need InternalScanner.next(List), so this class also implements InternalScanner. WARNING: As is, if you try to use this as an InternalScanner at the Store level, you will get runtime exceptions.
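The merge described above can be illustrated with a minimal sketch of a heap-ordered k-way merge. This is not the HBase implementation: `PeekingSource` and `MergeSketch` are hypothetical stand-ins, keys are plain strings rather than Cells, and closing of sub-scanners is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a KeyValueScanner: a sorted stream with one-element lookahead. */
final class PeekingSource {
  private final Iterator<String> it;
  private String head;

  PeekingSource(List<String> sortedKeys) {
    this.it = sortedKeys.iterator();
    this.head = it.hasNext() ? it.next() : null;
  }

  /** Returns the next key without consuming it, or null when exhausted. */
  String peek() {
    return head;
  }

  /** Returns the next key and advances the source. */
  String next() {
    String result = head;
    head = it.hasNext() ? it.next() : null;
    return result;
  }
}

final class MergeSketch {
  /** Orders sources by the key each one would return next. */
  static final Comparator<PeekingSource> BY_HEAD = Comparator.comparing(PeekingSource::peek);

  /** Merges several sorted sources into one sorted list. */
  static List<String> mergeAll(List<PeekingSource> sources) {
    PriorityQueue<PeekingSource> heap = new PriorityQueue<>(BY_HEAD);
    for (PeekingSource s : sources) {
      if (s.peek() != null) {
        heap.add(s); // exhausted sources never enter the heap
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      PeekingSource top = heap.poll();
      out.add(top.next());
      if (top.peek() != null) {
        heap.add(top); // re-insert so the heap sees the new head
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> merged = mergeAll(Arrays.asList(
        new PeekingSource(Arrays.asList("a", "d")),
        new PeekingSource(Arrays.asList("b", "c", "e"))));
    System.out.println(merged); // prints [a, b, c, d, e]
  }
}
```

The same shape applies at both levels: at the Store level the sources would be the memstore and StoreFile scanners, and at the Region level they would be the per-Store scanners.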
-
Nested Class Summary
Nested Classes -
Field Summary
Fields (Modifier and Type, Field, Description):
protected KeyValueHeap.KVScannerComparator comparator
protected KeyValueScanner current: The current sub-scanner, i.e. the one that contains the next key/value to return to the client.
protected PriorityQueue<KeyValueScanner> heap
private static final org.slf4j.Logger LOG
protected List<KeyValueScanner> scannersForDelayedClose
Fields inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
NO_NEXT_INDEXED_KEY
-
Constructor Summary
Constructors (Constructor, Description):
KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator): Constructor.
KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator): Constructor.
-
Method Summary
Methods (Modifier and Type, Method, Description):
void close(): Close the KeyValue scanner.
private boolean generalizedSeek(boolean isLazy, Cell seekKey, boolean forward, boolean useBloom)
(package private) KeyValueScanner getCurrentForTesting()
getHeap(): Returns the current Heap
getNextIndexedKey()
(package private) boolean isLatestCellFromMemstore()
Cell next(): Return the next Cell in this scanner, iterating the scanner
boolean next(List<Cell> result, ScannerContext scannerContext): Gets the next row of keys from the top-most scanner.
Cell peek(): Look at the next Cell in this scanner, but do not iterate scanner.
protected KeyValueScanner pollRealKV(): Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it.
void recordBlockSize(IntConsumer blockSizeConsumer): Record the size of the current block in bytes, passing as an argument to the blockSizeConsumer.
boolean requestSeek(Cell key, boolean forward, boolean useBloom): Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.Cell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.Cell) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter.
boolean reseek(Cell seekKey): This function is identical to the seek(Cell) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
boolean seek(Cell seekKey): Seeks all scanners at or below the specified seek key.
void shipped(): Called after a batch of rows scanned and set to be returned to client.
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonReversedNonLazyKeyValueScanner
backwardSeek, seekToLastRow, seekToPreviousRow
Methods inherited from class org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner
doRealSeek, enforceSeek, getFilePath, isFileScanner, realSeekDone, shouldUseScanner
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.hbase.regionserver.InternalScanner
next
Methods inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner
backwardSeek, enforceSeek, getFilePath, getScannerOrder, isFileScanner, realSeekDone, seekToLastRow, seekToPreviousRow, shouldUseScanner
-
Field Details
-
LOG
-
heap
-
scannersForDelayedClose
-
current
The current sub-scanner, i.e. the one that contains the next key/value to return to the client. This scanner is NOT included in heap (but we frequently add it back to the heap and pull the new winner out). We maintain an invariant that the current sub-scanner has already done a real seek, and that current.peek() is always a real key/value (or null) except for the fake last-key-on-row-column supplied by the multi-column Bloom filter optimization, which is OK to propagate to StoreScanner. To ensure that, always use pollRealKV() to update current.
-
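The "current outside the heap" arrangement can be sketched as follows. This is a simplified model under stated assumptions, not the actual class: `SubScanner` and `CurrentOutsideHeap` are hypothetical names, keys are integers, and there is no lazy seek, so a plain `heap.poll()` stands in for pollRealKV().

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a sub-scanner: sorted integer keys with one-element lookahead. */
final class SubScanner {
  private final Iterator<Integer> it;
  private Integer head;

  SubScanner(List<Integer> sortedKeys) {
    it = sortedKeys.iterator();
    head = it.hasNext() ? it.next() : null;
  }

  Integer peek() {
    return head;
  }

  Integer next() {
    Integer result = head;
    head = it.hasNext() ? it.next() : null;
    return result;
  }
}

/** Keeps the winning sub-scanner in `current`, outside the heap. */
final class CurrentOutsideHeap {
  private static final Comparator<SubScanner> BY_HEAD = Comparator.comparing(SubScanner::peek);

  private final PriorityQueue<SubScanner> heap = new PriorityQueue<>(BY_HEAD);
  private SubScanner current;

  CurrentOutsideHeap(List<SubScanner> scanners) {
    for (SubScanner s : scanners) {
      if (s.peek() != null) {
        heap.add(s);
      }
    }
    current = heap.poll(); // null if every sub-scanner was empty
  }

  Integer peek() {
    return current == null ? null : current.peek();
  }

  Integer next() {
    if (current == null) {
      return null;
    }
    Integer result = current.next();
    Integer following = current.peek();
    if (following == null) {
      current = heap.poll(); // current is exhausted: the next winner takes over
    } else {
      SubScanner top = heap.peek();
      if (top != null && following.compareTo(top.peek()) >= 0) {
        heap.add(current);     // current may have lost: add it back to the heap...
        current = heap.poll(); // ...and pull the new winner out
      }
    }
    return result;
  }

  public static void main(String[] args) {
    CurrentOutsideHeap merged = new CurrentOutsideHeap(Arrays.asList(
        new SubScanner(Arrays.asList(1, 4, 6)),
        new SubScanner(Arrays.asList(2, 3))));
    for (Integer k = merged.next(); k != null; k = merged.next()) {
      System.out.println(k); // prints 1, 2, 3, 4, 6 on separate lines
    }
  }
}
```

Keeping the winner out of the heap means that a run of consecutive keys from the same sub-scanner costs one comparison against the heap top per key instead of a poll and re-insert.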
comparator
-
-
Constructor Details
-
KeyValueHeap
public KeyValueHeap(List<? extends KeyValueScanner> scanners, CellComparator comparator) throws IOException
Constructor. This KeyValueHeap will handle closing of the passed-in KeyValueScanners.
- Throws:
IOException
-
KeyValueHeap
KeyValueHeap(List<? extends KeyValueScanner> scanners, KeyValueHeap.KVScannerComparator comparator) throws IOException
Constructor.
- Throws:
IOException
-
-
Method Details
-
peek
Description copied from interface: KeyValueScanner
Look at the next Cell in this scanner, but do not iterate the scanner. NOTICE: The returned cell has not been passed into ScanQueryMatcher, so it may not be what the user needs.
- Specified by:
peek in interface KeyValueScanner
- Returns:
- the next Cell
-
isLatestCellFromMemstore
boolean isLatestCellFromMemstore() -
recordBlockSize
Description copied from interface: KeyValueScanner
Record the size of the current block in bytes, passing as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
- Specified by:
recordBlockSize in interface KeyValueScanner
- Overrides:
recordBlockSize in class NonLazyKeyValueScanner
- Parameters:
blockSizeConsumer - to be called with block size in bytes, once per block.
-
next
Description copied from interface: KeyValueScanner
Return the next Cell in this scanner, iterating the scanner.
- Specified by:
next in interface KeyValueScanner
- Returns:
- the next Cell
- Throws:
IOException
-
next
Gets the next row of keys from the top-most scanner.
This method takes care of updating the heap.
This can ONLY be called when you are using Scanners that implement InternalScanner as well as KeyValueScanner (a StoreScanner).
- Specified by:
next in interface InternalScanner
- Parameters:
result - return output array
- Returns:
- true if more rows exist after this one, false if scanner is done
- Throws:
IOException - e
-
close
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface InternalScanner
- Specified by:
close in interface KeyValueScanner
-
seek
Seeks all scanners at or below the specified seek key. If we exited a row early, we may end up skipping values that were never reached. Rather than iterating down, we want to give the opportunity to re-seek.
As individual scanners may run past their ends, those scanners are automatically closed and removed from the heap.
This function (and reseek(Cell)) does not do multi-column Bloom filter and lazy-seek optimizations. To enable those, call requestSeek(Cell, boolean, boolean).
- Specified by:
seek in interface KeyValueScanner
- Parameters:
seekKey - KeyValue to seek at or after
- Returns:
- true if KeyValues exist at or after specified key, false if not
- Throws:
IOException
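The seek behaviour described above (position every sub-scanner at or after the key, and drop those that run past their end) can be sketched in a simplified form. `SeekableSource` and `SeekSketch` are hypothetical names, keys are integers, and "closing" a source is modelled simply by leaving it out of the rebuilt heap.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a seekable sub-scanner over sorted integer keys. */
final class SeekableSource {
  private final List<Integer> keys;
  private int pos = 0;

  SeekableSource(List<Integer> sortedKeys) {
    keys = sortedKeys;
  }

  /** Returns the key at the current position, or null when past the end. */
  Integer peek() {
    return pos < keys.size() ? keys.get(pos) : null;
  }

  /** Positions at the first key at or after seekKey; returns false if there is none. */
  boolean seek(int seekKey) {
    pos = 0;
    while (pos < keys.size() && keys.get(pos) < seekKey) {
      pos++;
    }
    return pos < keys.size();
  }
}

final class SeekSketch {
  static final Comparator<SeekableSource> BY_HEAD = Comparator.comparing(SeekableSource::peek);

  /** Seeks every source and returns a heap of those that still have keys at or after seekKey. */
  static PriorityQueue<SeekableSource> seekAll(List<SeekableSource> sources, int seekKey) {
    PriorityQueue<SeekableSource> heap = new PriorityQueue<>(BY_HEAD);
    for (SeekableSource s : sources) {
      if (s.seek(seekKey)) {
        heap.add(s); // still has data at or after the key
      }
      // otherwise the source ran past its end and is dropped
    }
    return heap;
  }

  public static void main(String[] args) {
    PriorityQueue<SeekableSource> heap = seekAll(Arrays.asList(
        new SeekableSource(Arrays.asList(1, 5, 9)),
        new SeekableSource(Arrays.asList(2, 7)),
        new SeekableSource(Arrays.asList(3))), 4);
    System.out.println(heap.size());        // prints 2
    System.out.println(heap.peek().peek()); // prints 5
  }
}
```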
-
reseek
This function is identical to the seek(Cell) function except that scanner.seek(seekKey) is changed to scanner.reseek(seekKey).
- Specified by:
reseek in interface KeyValueScanner
- Parameters:
seekKey - seek value (should be non-null)
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
-
requestSeek
Similar to KeyValueScanner.seek(org.apache.hadoop.hbase.Cell) (or KeyValueScanner.reseek(org.apache.hadoop.hbase.Cell) if forward is true) but only does a seek operation after checking that it is really necessary for the row/column combination specified by the key parameter. This function was added to avoid unnecessary disk seeks by checking row-column Bloom filters before a seek on multi-column get/scan queries, and to optimize by looking up more recent files first.
- Specified by:
requestSeek in interface KeyValueScanner
- Overrides:
requestSeek in class NonLazyKeyValueScanner
- Parameters:
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
- Throws:
IOException
-
generalizedSeek
private boolean generalizedSeek(boolean isLazy, Cell seekKey, boolean forward, boolean useBloom) throws IOException
- Parameters:
isLazy - whether we are trying to seek to exactly the given row/col. Enables Bloom filter and most-recent-file-first optimizations for multi-column get/scan queries.
seekKey - key to seek to
forward - whether to seek forward (also known as reseek)
useBloom - whether to optimize seeks using Bloom filters
- Throws:
IOException
-
pollRealKV
Fetches the top sub-scanner from the priority queue, ensuring that a real seek has been done on it. Works by fetching the top sub-scanner and, if it has not done a real seek, making it do so (which will modify its top KV), putting it back, and repeating this until success. Relies on the fact that on a lazy seek we set the current key of a StoreFileScanner to a KV that is not greater than the real next KV to be read from that file, so the scanner that bubbles up to the top of the heap will hold the global next KV in this scanner heap if (1) it has done a real seek and (2) its KV is the top among all top KVs (some of which are fake) in the scanner heap.
- Throws:
IOException
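The poll, enforce, put-back loop described for pollRealKV can be sketched as below. This is a simplified model, not the HBase code: `LazySource` and `PollRealSketch` are hypothetical names, keys are integers, the fake key is simply the seek target (which is never greater than the real next key), and exhausted sources are dropped rather than closed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical model of a lazily seeking file scanner over sorted integer keys. */
final class LazySource {
  private final List<Integer> keys;
  private int seekKey;
  private Integer head; // fake after lazySeek, real after enforceSeek
  private boolean realSeekDone = true;

  LazySource(List<Integer> sortedKeys) {
    keys = sortedKeys;
    head = keys.isEmpty() ? null : keys.get(0);
  }

  /** Records the target and exposes it as a fake head; no data is read. */
  void lazySeek(int key) {
    seekKey = key;
    head = key; // a lower bound: the real next key is at or after the target
    realSeekDone = false;
  }

  /** Performs the deferred seek: head becomes the first real key at or after seekKey, or null. */
  void enforceSeek() {
    head = null;
    for (Integer k : keys) {
      if (k >= seekKey) {
        head = k;
        break;
      }
    }
    realSeekDone = true;
  }

  boolean realSeekDone() {
    return realSeekDone;
  }

  Integer peek() {
    return head;
  }
}

final class PollRealSketch {
  static final Comparator<LazySource> BY_HEAD = Comparator.comparing(LazySource::peek);

  /** Polls until the top source has done a real seek; returns null if the heap drains. */
  static LazySource pollReal(PriorityQueue<LazySource> heap) {
    LazySource top = heap.poll();
    while (top != null && !top.realSeekDone()) {
      top.enforceSeek(); // may move its head forward, or exhaust it
      if (top.peek() != null) {
        heap.add(top);   // put it back under its real key
      }
      top = heap.poll(); // and look at the new top
    }
    return top;
  }

  public static void main(String[] args) {
    PriorityQueue<LazySource> heap = new PriorityQueue<>(BY_HEAD);
    for (List<Integer> keys : Arrays.asList(
        Arrays.asList(1, 5, 9), Arrays.asList(2, 7), Arrays.asList(3))) {
      LazySource s = new LazySource(keys);
      s.lazySeek(4);
      heap.add(s);
    }
    System.out.println(pollReal(heap).peek()); // prints 5
  }
}
```

Because a fake key never exceeds the real key behind it, a source that reaches the top with a real key cannot be beaten by any source still holding a fake one, which is the property the loop relies on.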
-
getHeap
Returns the current Heap -
getCurrentForTesting
-
getNextIndexedKey
- Specified by:
getNextIndexedKey in interface KeyValueScanner
- Overrides:
getNextIndexedKey in class NonLazyKeyValueScanner
- Returns:
- the next key in the index, usually the first key of the next block or a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
-
shipped
Description copied from interface: Shipper
Called after a batch of rows has been scanned and is set to be returned to the client. Any in-between cleanup can be done here.
- Specified by:
shipped in interface Shipper
- Overrides:
shipped in class NonLazyKeyValueScanner
- Throws:
IOException
-