Class StoreFileScanner
java.lang.Object
org.apache.hadoop.hbase.regionserver.StoreFileScanner
- All Implemented Interfaces:
Closeable, AutoCloseable, KeyValueScanner, Shipper
@LimitedPrivate("Phoenix")
@Evolving
public class StoreFileScanner
extends Object
implements KeyValueScanner
A KeyValueScanner adaptor over a StoreFileReader. It also provides hooks for Bloom filter checks.
-
Field Summary
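Fields (modifier and type, field):
- private final boolean canOptimizeForNonNullColumn
- private boolean closed
- private Cell cur
- private boolean delayedReseek
- private Cell delayedSeekKV
- private final boolean enforceMVCC
- private final boolean hasMVCCInfo
- private final HFileScanner hfs
- private final boolean isFastSeekingEncoding
- private Cell previousRow
- private final StoreFileReader reader
- private final long readPt
- private boolean realSeekDone
- private final long scannerOrder
- private static LongAdder seekCount
- private boolean stopSkippingKVsIfNextRow
Fields inherited from interface org.apache.hadoop.hbase.regionserver.KeyValueScanner:
- NO_NEXT_INDEXED_KEY
-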
Constructor Summary
Constructors (constructor, description):
- StoreFileScanner(StoreFileReader reader, HFileScanner hfs, boolean useMVCC, boolean hasMVCC, long readPt, long scannerOrder, boolean canOptimizeForNonNullColumn, boolean isFastSeekingEncoding): Implements a KeyValueScanner on top of the specified HFileScanner.
-
Method Summary
Methods (modifier and type, method, description):
- boolean backwardSeek(Cell key): Seek the scanner at or before the row of the specified Cell. It first tries to seek the scanner at or after the specified Cell and returns if the peeked KeyValue has the same row as the specified Cell; otherwise it seeks the scanner to the first Cell of the row preceding the row of the specified KeyValue.
- void close(): Close the KeyValue scanner.
- void enforceSeek(): Does the real seek operation in case it was skipped by requestSeek(Cell, boolean, boolean).
- (package private) CellComparator getComparator()
- org.apache.hadoop.fs.Path getFilePath()
- Cell getNextIndexedKey()
- (package private) StoreFileReader getReader()
- long getScannerOrder(): Get the order of this KeyValueScanner.
- static List<StoreFileScanner> getScannersForCompaction(Collection<HStoreFile> files, boolean canUseDropBehind, long readPt): Get scanners for compaction.
- static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean useDropBehind, long readPt): Return a list of scanners corresponding to the given set of store files.
- static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean canUseDrop, ScanQueryMatcher matcher, long readPt): Return a list of scanners corresponding to the given set of store files, and set the ScanQueryMatcher on each store file scanner for further optimization.
- (package private) static final long getSeekCount()
- (package private) static final void instrument()
- boolean isFileScanner(): Returns true if this is a file scanner.
- private boolean isStillAtSeekTargetAfterSkippingNewerKvs(Cell seekKey)
- Cell next(): Return the next Cell in this scanner, iterating the scanner.
- Cell peek(): Look at the next Cell in this scanner, but do not iterate the scanner.
- boolean realSeekDone(): We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap.
- void recordBlockSize(IntConsumer blockSizeConsumer): Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer.
- boolean requestSeek(Cell kv, boolean forward, boolean useBloom): Pretend we have done a seek but don't do it yet, if possible.
- boolean reseek(Cell key): Reseek the scanner at or after the specified KeyValue.
- private boolean reseekAtOrAfter(Cell seekKey)
- (package private) static boolean reseekAtOrAfter(HFileScanner s, Cell k)
- boolean seek(Cell key): Seek the scanner at or after the specified KeyValue.
- private boolean seekAtOrAfter(Cell seekKey)
- static boolean seekAtOrAfter(HFileScanner s, Cell k): Returns false if not found or if k is after the end.
- private boolean seekBefore(Cell seekKey)
- private void seekBeforeAndSaveKeyToPreviousRow(Cell seekKey): Seeks before the seek target cell and saves the location to previousRow.
- boolean seekToLastRow(): Seek the scanner at the first KeyValue of the last row.
- boolean seekToPreviousRow(Cell originalKey): Seek the scanner at the first Cell of the row which is the previous row of the specified key.
- private boolean seekToPreviousRowStateless(Cell originalKey): This variant of the seekToPreviousRow(Cell) method requires two seeks.
- private boolean seekToPreviousRowWithHint(): This variant of the seekToPreviousRow(Cell) method requires one seek and one reseek.
- private boolean seekToPreviousRowWithoutHint(Cell originalKey): This variant of the seekToPreviousRow(Cell) method requires two seeks and one reseek.
- protected void setCurrentCell(Cell newVal)
- void shipped(): Called after a batch of rows has been scanned and is ready to be returned to the client.
- boolean shouldUseScanner(Scan scan, HStore store, long oldestUnexpiredTS): Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
- protected boolean skipKVsNewerThanReadpoint()
- private boolean skipKvsNewerThanReadpointReversed()
- String toString()
-
Field Details
-
reader
-
hfs
-
cur
-
closed
-
realSeekDone
-
delayedReseek
-
delayedSeekKV
-
enforceMVCC
-
hasMVCCInfo
-
stopSkippingKVsIfNextRow
-
previousRow
-
isFastSeekingEncoding
-
seekCount
-
canOptimizeForNonNullColumn
-
readPt
-
scannerOrder
-
-
Constructor Details
-
StoreFileScanner
public StoreFileScanner(StoreFileReader reader, HFileScanner hfs, boolean useMVCC, boolean hasMVCC, long readPt, long scannerOrder, boolean canOptimizeForNonNullColumn, boolean isFastSeekingEncoding)
Implements a KeyValueScanner on top of the specified HFileScanner.
- Parameters:
useMVCC - if true, the scanner filters out updates with an MVCC value larger than readPt.
readPt - MVCC value used to filter out updates newer than this scanner.
hasMVCC - set to true if the underlying store file reader has MVCC info.
scannerOrder - order of the scanner relative to other scanners. See KeyValueScanner.getScannerOrder().
canOptimizeForNonNullColumn - true if we can make sure there is no null column, otherwise false. This is a hint for optimization.
isFastSeekingEncoding - true if the data block encoding can seek quickly from the beginning of a block (i.e. RIV1, the ROW_INDEX_V1 encoding), otherwise false. This is a hint for optimization.
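A minimal sketch of what useMVCC and readPt mean for visibility, assuming a hypothetical MvccCell that carries only a key and the sequence id of its write (the real Cell type and skipKVsNewerThanReadpoint() handle more cases). With MVCC enforced, any cell whose sequence id is larger than the read point is skipped; with it off, every cell is returned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell model: a key plus the MVCC sequence id of the write.
final class MvccCell {
    final String key;
    final long seqId;

    MvccCell(String key, long seqId) {
        this.key = key;
        this.seqId = seqId;
    }
}

// Sketch of a scanner that hides writes newer than its read point.
final class MvccFilteringScanner {
    private final List<MvccCell> cells; // assumed already sorted by key
    private final boolean useMvcc;
    private final long readPt;
    private int pos = 0;

    MvccFilteringScanner(List<MvccCell> sortedCells, boolean useMvcc, long readPt) {
        this.cells = new ArrayList<>(sortedCells);
        this.useMvcc = useMvcc;
        this.readPt = readPt;
        skipNewerThanReadPoint();
    }

    // Current visible cell without advancing, or null when exhausted.
    MvccCell peek() {
        return pos < cells.size() ? cells.get(pos) : null;
    }

    // Returns the current cell and advances to the next visible one.
    MvccCell next() {
        MvccCell current = peek();
        if (current != null) {
            pos++;
            skipNewerThanReadPoint();
        }
        return current;
    }

    // With MVCC enforced, a cell is invisible if its seqId > readPt.
    private void skipNewerThanReadPoint() {
        if (!useMvcc) {
            return;
        }
        while (pos < cells.size() && cells.get(pos).seqId > readPt) {
            pos++;
        }
    }
}
```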
-
-
Method Details
-
getScannersForStoreFiles
public static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean useDropBehind, long readPt) throws IOException
Returns a list of scanners corresponding to the given set of store files.
- Throws:
IOException
-
getScannersForStoreFiles
public static List<StoreFileScanner> getScannersForStoreFiles(Collection<HStoreFile> files, boolean cacheBlocks, boolean usePread, boolean isCompaction, boolean canUseDrop, ScanQueryMatcher matcher, long readPt) throws IOException
Returns a list of scanners corresponding to the given set of store files, and sets the ScanQueryMatcher on each store file scanner for further optimization.
- Throws:
IOException
-
getScannersForCompaction
public static List<StoreFileScanner> getScannersForCompaction(Collection<HStoreFile> files, boolean canUseDropBehind, long readPt) throws IOException
Get scanners for compaction. A separate reader is created for each store file to avoid contention with normal read requests.
- Throws:
IOException
-
toString
-
peek
Description copied from interface: KeyValueScanner
Look at the next Cell in this scanner, but do not iterate the scanner. NOTICE: the returned cell has not been passed through the ScanQueryMatcher, so it may not be what the user needs.
- Specified by:
peek in interface KeyValueScanner
- Returns:
- the next Cell
-
next
Description copied from interface: KeyValueScanner
Return the next Cell in this scanner, iterating the scanner.
- Specified by:
next in interface KeyValueScanner
- Returns:
- the next Cell
- Throws:
IOException
-
seek
Description copied from interface: KeyValueScanner
Seek the scanner at or after the specified KeyValue.
- Specified by:
seek in interface KeyValueScanner
- Parameters:
key - seek value
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
-
reseek
Description copied from interface: KeyValueScanner
Reseek the scanner at or after the specified KeyValue. This method is guaranteed to seek at or after the required key only if the key comes after the current position of the scanner. It should not be used to seek to a key which may come before the current position.
- Specified by:
reseek in interface KeyValueScanner
- Parameters:
key - seek value (should be non-null)
- Returns:
- true if scanner has values left, false if end of scanner
- Throws:
IOException
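The difference between peek/next and seek/reseek can be illustrated on sorted int keys standing in for cells; the SortedKeyScanner class is hypothetical. seek searches the whole range and may move backward, while reseek only moves forward from the current position, so a target behind the scanner leaves it where it is. That is why reseek should not be used for keys that may precede the current position.

```java
import java.util.Arrays;

// Hypothetical model: sorted int keys stand in for cells.
final class SortedKeyScanner {
    private final int[] keys;
    private int pos = 0;

    SortedKeyScanner(int[] sortedKeys) {
        this.keys = Arrays.copyOf(sortedKeys, sortedKeys.length);
    }

    // Look at the current key without moving the scanner.
    Integer peek() {
        return pos < keys.length ? keys[pos] : null;
    }

    // Return the current key and advance past it.
    Integer next() {
        Integer current = peek();
        if (current != null) {
            pos++;
        }
        return current;
    }

    // Random-access seek: binary search over the whole array, so it may
    // move the scanner backward as well as forward.
    boolean seek(int key) {
        int lo = 0;
        int hi = keys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (keys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
        return pos < keys.length;
    }

    // Forward-only reseek: scans from the current position, so a key that
    // lies before the current position leaves the scanner where it is.
    boolean reseek(int key) {
        while (pos < keys.length && keys[pos] < key) {
            pos++;
        }
        return pos < keys.length;
    }
}
```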
-
setCurrentCell
- Throws:
IOException
-
skipKVsNewerThanReadpoint
- Throws:
IOException
-
close
Description copied from interface: KeyValueScanner
Close the KeyValue scanner.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface KeyValueScanner
-
seekAtOrAfter
public static boolean seekAtOrAfter(HFileScanner s, Cell k)
Returns false if not found or if k is after the end.
- Throws:
IOException
-
reseekAtOrAfter
(package private) static boolean reseekAtOrAfter(HFileScanner s, Cell k)
- Throws:
IOException
-
getScannerOrder
Description copied from interface: KeyValueScanner
Get the order of this KeyValueScanner. This is only relevant for StoreFileScanners. This is required for comparing multiple files to find out which one has the latest data. StoreFileScanners are ordered from 0 (oldest) to newest in increasing order.
- Specified by:
getScannerOrder in interface KeyValueScanner
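A sketch of how a merge over several scanners might use this order to break ties, assuming a hypothetical OrderedScanner holder and a java.util.PriorityQueue standing in for the key-value heap. The smallest current key is polled first; when keys are equal, the scanner with the higher order (the newer file) wins, so its data shadows older files.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-in for a scanner: its current key and its order.
final class OrderedScanner {
    final String name;
    final String peekKey;
    final long scannerOrder; // 0 = oldest file, larger = newer

    OrderedScanner(String name, String peekKey, long scannerOrder) {
        this.name = name;
        this.peekKey = peekKey;
        this.scannerOrder = scannerOrder;
    }
}

final class ScannerHeap {
    // Smallest key first; on a key tie, the higher scanner order
    // (newer file) comes first.
    static final Comparator<OrderedScanner> ORDER =
        Comparator.comparing((OrderedScanner s) -> s.peekKey)
            .thenComparing((a, b) -> Long.compare(b.scannerOrder, a.scannerOrder));

    static PriorityQueue<OrderedScanner> newHeap() {
        return new PriorityQueue<>(ORDER);
    }
}
```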
-
requestSeek
Pretend we have done a seek but don't do it yet, if possible. The hope is that we find the requested columns in more recent files and won't have to seek in older files. Creates a fake key/value with the given row/column and the highest (most recent) possible timestamp we might get from this file. When users of such a "lazy scanner" need to know the next KV precisely (e.g. when this scanner is at the top of the heap), they run enforceSeek().
Note that this function does guarantee that the current KV of this scanner will be advanced to at least the given KV. Because of this, it does have to do a real seek in cases when the seek timestamp is older than the highest timestamp of the file, e.g. when we are trying to seek to the next row/column and use OLDEST_TIMESTAMP in the seek key.
- Specified by:
requestSeek in interface KeyValueScanner
- Parameters:
forward - do a forward-only "reseek" instead of a random-access seek
useBloom - whether to enable multi-column Bloom filter optimization
- Throws:
IOException
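The lazy-seek idea can be sketched as follows, assuming a hypothetical TsCell (a row plus a timestamp, ordered newest-first within a row) and an in-memory list standing in for the store file. requestSeek installs a fake key carrying the file's newest timestamp when that key cannot land before the target, and enforceSeek performs the deferred real seek. Bloom filter checks and the forward/reseek distinction are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell: a row plus a timestamp (columns omitted for brevity).
final class TsCell {
    final String row;
    final long ts;

    TsCell(String row, long ts) {
        this.row = row;
        this.ts = ts;
    }

    // Rows ascend; within a row, newer (larger) timestamps sort first.
    static int compare(TsCell a, TsCell b) {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    }
}

// Sketch of a lazy ("pretend") seek over one sorted file.
final class LazyFileScanner {
    private final List<TsCell> cells; // sorted with TsCell.compare
    private final long maxTimestamp;  // newest timestamp in this file
    private TsCell cur;
    private TsCell delayedSeekKey;
    private boolean realSeekDone = true;

    LazyFileScanner(List<TsCell> sortedCells) {
        this.cells = new ArrayList<>(sortedCells);
        long max = Long.MIN_VALUE;
        for (TsCell c : cells) {
            max = Math.max(max, c.ts);
        }
        this.maxTimestamp = max;
        this.cur = cells.isEmpty() ? null : cells.get(0);
    }

    // Defers the seek when a fake key is safe; otherwise seeks for real.
    void requestSeek(TsCell key) {
        if (!cells.isEmpty() && key.ts >= maxTimestamp) {
            // Fake key: same row, newest timestamp in the file. It sorts at or
            // after key and no later than any real cell at or after key.
            cur = new TsCell(key.row, maxTimestamp);
            delayedSeekKey = key;
            realSeekDone = false;
        } else {
            realSeek(key);
        }
    }

    boolean realSeekDone() {
        return realSeekDone;
    }

    // Performs the seek that requestSeek deferred, if any.
    void enforceSeek() {
        if (!realSeekDone) {
            realSeek(delayedSeekKey);
        }
    }

    TsCell peek() {
        return cur;
    }

    // Linear scan for the first cell >= key (a real scanner uses the index).
    private void realSeek(TsCell key) {
        int pos = 0;
        while (pos < cells.size() && TsCell.compare(cells.get(pos), key) < 0) {
            pos++;
        }
        cur = pos < cells.size() ? cells.get(pos) : null;
        delayedSeekKey = null;
        realSeekDone = true;
    }
}
```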
-
getReader
-
getComparator
-
realSeekDone
Description copied from interface: KeyValueScanner
We optimize our store scanners by checking the most recent store file first, so we sometimes pretend we have done a seek but delay it until the store scanner bubbles up to the top of the key-value heap. This method is then used to ensure the top store file scanner has done a seek operation.
- Specified by:
realSeekDone in interface KeyValueScanner
-
enforceSeek
Description copied from interface: KeyValueScanner
Does the real seek operation in case it was skipped by requestSeek(Cell, boolean, boolean). Note that this function should never be called on scanners that always do real seek operations (i.e. most of the scanners). The easiest way to achieve this is to call KeyValueScanner.realSeekDone() first.
- Specified by:
enforceSeek in interface KeyValueScanner
- Throws:
IOException
-
isFileScanner
Description copied from interface: KeyValueScanner
Returns true if this is a file scanner. Otherwise a memory scanner is assumed.
- Specified by:
isFileScanner in interface KeyValueScanner
-
recordBlockSize
Description copied from interface: KeyValueScanner
Record the size of the current block in bytes, passing it as an argument to the blockSizeConsumer. Implementations should ensure that blockSizeConsumer is only called once per block.
- Specified by:
recordBlockSize in interface KeyValueScanner
- Parameters:
blockSizeConsumer - to be called with the block size in bytes, once per block.
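One way to honor the once-per-block contract is to re-arm reporting only when the scanner crosses into a new block. The sketch below assumes a hypothetical model in which each cell is tagged with a block index and each block has a known size; it is not how HFile blocks are actually tracked.

```java
import java.util.function.IntConsumer;

// Hypothetical model: each cell belongs to a block with a known size.
final class BlockAwareScanner {
    private final int[] blockOfCell; // block index of each cell, in scan order
    private final int[] blockSizes;  // size in bytes of each block
    private int pos = 0;
    private boolean sizeReportedForCurrentBlock = false;

    BlockAwareScanner(int[] blockOfCell, int[] blockSizes) {
        this.blockOfCell = blockOfCell.clone();
        this.blockSizes = blockSizes.clone();
    }

    boolean hasCurrent() {
        return pos < blockOfCell.length;
    }

    // Advance one cell (caller must check hasCurrent() first); entering a
    // new block re-arms size reporting.
    void next() {
        int previousBlock = blockOfCell[pos];
        pos++;
        if (hasCurrent() && blockOfCell[pos] != previousBlock) {
            sizeReportedForCurrentBlock = false;
        }
    }

    // Passes the current block's size to the consumer at most once per block.
    void recordBlockSize(IntConsumer blockSizeConsumer) {
        if (hasCurrent() && !sizeReportedForCurrentBlock) {
            blockSizeConsumer.accept(blockSizes[blockOfCell[pos]]);
            sizeReportedForCurrentBlock = true;
        }
    }
}
```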
-
getFilePath
- Specified by:
getFilePath in interface KeyValueScanner
- Returns:
- the file path if this is a file scanner, otherwise null.
-
getSeekCount
-
instrument
-
shouldUseScanner
Description copied from interface: KeyValueScanner
Allows filtering out scanners (both StoreFile and memstore) that we don't want to use, based on criteria such as Bloom filters and timestamp ranges.
- Specified by:
shouldUseScanner in interface KeyValueScanner
- Parameters:
scan - the scan that we are selecting scanners for
store - the store we are performing the scan on
oldestUnexpiredTS - the oldest timestamp we are interested in for this query, based on TTL
- Returns:
- true if the scanner should be included in the query
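A simplified sketch of the timestamp-range part of this decision, with Bloom filter and key-range checks omitted. FileTimeRangeFilter is hypothetical, and the query range is assumed to be half-open [queryMin, queryMax): a file is worth scanning only if its cell timestamps overlap the query range and its newest cell is not older than oldestUnexpiredTS.

```java
// Hypothetical model of the time-range check; not the HBase implementation.
final class FileTimeRangeFilter {
    private final long minTs; // oldest cell timestamp in the file
    private final long maxTs; // newest cell timestamp in the file

    FileTimeRangeFilter(long minTs, long maxTs) {
        this.minTs = minTs;
        this.maxTs = maxTs;
    }

    // queryMin is inclusive and queryMax exclusive (assumed half-open range).
    boolean shouldUse(long queryMin, long queryMax, long oldestUnexpiredTs) {
        boolean overlapsQuery = minTs < queryMax && maxTs >= queryMin;
        // If even the newest cell is older than the TTL cutoff, every cell in
        // the file has expired and the file can be skipped.
        boolean hasUnexpiredData = maxTs >= oldestUnexpiredTs;
        return overlapsQuery && hasUnexpiredData;
    }
}
```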
-
seekToPreviousRow
Description copied from interface: KeyValueScanner
Seek the scanner at the first Cell of the row which is the previous row of the specified key.
- Specified by:
seekToPreviousRow in interface KeyValueScanner
- Parameters:
originalKey - seek value
- Returns:
- true if the scanner is at the first valid Cell of the previous row, false if no such Cell exists
- Throws:
IOException
-
seekToPreviousRowWithHint
This variant of the seekToPreviousRow(Cell) method requires one seek and one reseek. This method maintains state in previousRow, which only makes sense in the context of a sequential row-by-row reverse scan; previousRow should be reset if that is not the case. The reasoning for why this method is faster than seekToPreviousRowStateless(Cell) is that seeks are slower as they need to start from the beginning of the file, while reseeks go forward from the current position.
- Throws:
IOException
-
seekToPreviousRowWithoutHint
This variant of the seekToPreviousRow(Cell) method requires two seeks and one reseek. The extra seek is intended to speed up subsequent calls: by setting previousRow, this method seeds the state that seekToPreviousRowWithHint() relies on.
- Throws:
IOException
-
seekToPreviousRowStateless
This variant of the seekToPreviousRow(Cell) method requires two seeks. It should be used when seeking is cheap, i.e. when using a fast-seeking data block encoding such as RIV1.
- Throws:
IOException
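The two-seek stateless variant can be sketched over a sorted list of row keys, one entry per cell. PreviousRowSeeker is hypothetical and models seekBefore as "index of the first cell of the row, minus one"; real cells also carry columns and timestamps, and real seeks go through the HFile index.

```java
import java.util.List;

// Hypothetical model: a sorted list of row keys, one entry per cell.
final class PreviousRowSeeker {
    // Index of the first cell whose row is >= row, or rows.size() if none.
    static int seekRowAtOrAfter(List<String> rows, String row) {
        int lo = 0;
        int hi = rows.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rows.get(mid).compareTo(row) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Two seeks: (1) land on the last cell before the first cell of
    // originalRow, (2) seek to the first cell of that earlier row.
    // Returns the index of that cell, or -1 if no earlier row exists.
    static int seekToPreviousRowStateless(List<String> rows, String originalRow) {
        int lastBefore = seekRowAtOrAfter(rows, originalRow) - 1; // seek 1
        if (lastBefore < 0) {
            return -1;
        }
        return seekRowAtOrAfter(rows, rows.get(lastBefore)); // seek 2
    }
}
```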
-
seekBefore
- Throws:
IOException
-
seekBeforeAndSaveKeyToPreviousRow
Seeks before the seek target cell and saves the location to previousRow. If no KV exists in this file before the seek target cell, repositions the scanner at the beginning of the storefile (in preparation for a reseek at or after the seek key) and sets previousRow to null. If previousRow is ever non-null and then transitions back to null via this method, that is because no row exists before the seek target in the storefile (i.e. we are at the beginning of the storefile).
- Throws:
IOException
-
seekAtOrAfter
private boolean seekAtOrAfter(Cell seekKey)
- Throws:
IOException
-
reseekAtOrAfter
private boolean reseekAtOrAfter(Cell seekKey)
- Throws:
IOException
-
isStillAtSeekTargetAfterSkippingNewerKvs
- Throws:
IOException
-
skipKvsNewerThanReadpointReversed
- Throws:
IOException
-
seekToLastRow
Description copied from interface: KeyValueScanner
Seek the scanner at the first KeyValue of the last row.
- Specified by:
seekToLastRow in interface KeyValueScanner
- Returns:
- true if scanner has values left, false if the underlying data is empty
- Throws:
IOException
-
backwardSeek
Description copied from interface: KeyValueScanner
Seek the scanner at or before the row of the specified Cell. It first tries to seek the scanner at or after the specified Cell and returns if the peeked KeyValue has the same row as the specified Cell; otherwise it seeks the scanner to the first Cell of the row preceding the row of the specified KeyValue.
- Specified by:
backwardSeek in interface KeyValueScanner
- Parameters:
key - seek KeyValue
- Returns:
- true if the scanner is at the valid KeyValue, false if such KeyValue does not exist
- Throws:
IOException
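A sketch of the backwardSeek decision, assuming a hypothetical RowColCell ordered by row then column, with linear scans in place of real HFile seeks. After seeking at or after the key, the scanner stays put if it is still on the key's row and otherwise falls back to the first cell of the preceding row. Following the description above, a key that sorts after every cell of its own row also falls back to the previous row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell: row and column only, ordered by row then column.
final class RowColCell {
    final String row;
    final String col;

    RowColCell(String row, String col) {
        this.row = row;
        this.col = col;
    }

    static int compare(RowColCell a, RowColCell b) {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : a.col.compareTo(b.col);
    }
}

final class BackwardSeekSketch {
    private final List<RowColCell> cells; // assumed sorted by compare()
    private int pos = 0;

    BackwardSeekSketch(List<RowColCell> sortedCells) {
        this.cells = new ArrayList<>(sortedCells);
    }

    RowColCell peek() {
        return pos < cells.size() ? cells.get(pos) : null;
    }

    // Seek at or after key; keep the position if it is still on key's row,
    // otherwise fall back to the first cell of the previous row.
    boolean backwardSeek(RowColCell key) {
        pos = firstIndexAtOrAfter(key);
        RowColCell cur = peek();
        if (cur == null || cur.row.compareTo(key.row) > 0) {
            return seekToPreviousRow(key.row);
        }
        return true;
    }

    private int firstIndexAtOrAfter(RowColCell key) {
        int i = 0;
        while (i < cells.size() && RowColCell.compare(cells.get(i), key) < 0) {
            i++;
        }
        return i;
    }

    private boolean seekToPreviousRow(String row) {
        // "" is the smallest column, so this finds the first cell of row
        // (or of the next row); the cell before it ends the previous row.
        int last = firstIndexAtOrAfter(new RowColCell(row, "")) - 1;
        if (last < 0) {
            pos = cells.size(); // no earlier row: scanner is exhausted
            return false;
        }
        pos = firstIndexAtOrAfter(new RowColCell(cells.get(last).row, ""));
        return true;
    }
}
```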
-
getNextIndexedKey
- Specified by:
getNextIndexedKey in interface KeyValueScanner
- Returns:
- the next key in the index, usually the first key of the next block OR a key that falls between the last key of the current block and the first key of the next block (see HFileWriterImpl#getMidpoint), or null if not known.
-
shipped
Description copied from interface: Shipper
Called after a batch of rows has been scanned and is ready to be returned to the client. Any intermediate cleanup can be done here.
- Specified by:
shipped in interface Shipper
- Throws:
IOException
-