@InterfaceAudience.Public public class Scan extends Query
All operations are identical to Get
with the exception of instantiation. Rather than
specifying a single row, an optional startRow and stopRow may be defined. If rows are not
specified, the Scanner will iterate over all rows.
To get all columns from all rows of a Table, create an instance with no constraints; use the Scan() constructor. To constrain the scan to specific column families, call addFamily for each family to retrieve on your Scan instance. To get specific columns, call addColumn for each column to retrieve.
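For example, a minimal sketch (the table name "t1", family "cf", and qualifier "q1" are placeholders, not from this page; assumes an already-open Connection):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanExamples {
  static void scanOneFamily(Connection conn) throws IOException {
    try (Table table = conn.getTable(TableName.valueOf("t1"))) {
      Scan scan = new Scan();                       // no constraints: all rows
      scan.addFamily(Bytes.toBytes("cf"));          // all columns in family "cf"
      // or, to fetch a single column instead (overrides the addFamily above for "cf"):
      // scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"));
      try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()));
        }
      }
    }
  }
}
```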
To only retrieve columns within a specific range of version timestamps, call setTimeRange.
To only retrieve columns with a specific timestamp, call setTimestamp.
To limit the number of versions of each column to be returned, call setMaxVersions.
To limit the maximum number of values returned for each call to next(), call setBatch.
To add a filter, call setFilter.
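Putting several of these narrowing calls together, a hedged sketch (reusing the class and imports from the sketch above; PrefixFilter is just one example filter, and all values are placeholders):

```java
// Needs: import org.apache.hadoop.hbase.filter.PrefixFilter;
static Scan narrowedScan() throws IOException {
  Scan scan = new Scan();
  scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q1"));
  scan.setTimeRange(0L, System.currentTimeMillis());        // [minStamp, maxStamp); may throw IOException
  scan.readVersions(3);                                      // replaces the deprecated setMaxVersions(int)
  scan.setBatch(100);                                        // at most 100 cells per call to next()
  scan.setFilter(new PrefixFilter(Bytes.toBytes("row-")));   // any server-side Filter works here
  return scan;
}
```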
Small scans are deprecated since 2.0.0. Instead, use the setLimit(int) method on the Scan object to tell the RegionServer how many rows you want: once the number of returned rows reaches the limit, the RegionServer closes the RegionScanner automatically. The new implementation also fetches data when opening the scanner, which means a scan operation can finish in a single RPC call. A setReadType(ReadType) method has also been introduced; you can use it to tell the RegionServer to use pread explicitly.
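A sketch of this replacement for small scans (the row keys and the limit value are illustrative):

```java
// Replacement for the deprecated "small scan" flag.
static Scan smallRangeScan() {
  return new Scan()
      .withStartRow(Bytes.toBytes("row-0100"))
      .withStopRow(Bytes.toBytes("row-0200"))
      .setLimit(50)                          // RegionServer closes the RegionScanner once 50 rows are returned
      .setReadType(Scan.ReadType.PREAD);     // explicitly ask the RegionServer to use pread
}
```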
Expert: To explicitly disable server-side block caching for this scan, execute setCacheBlocks(boolean).
Note: usage alters Scan instances. Internally, attributes are updated as the Scan runs and, if enabled, metrics accumulate in the Scan instance. Be aware of this when cloning or reusing a Scan instance; it is safer to create a new Scan instance per usage.
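One way to follow that advice is to copy a template Scan per use, sketched below with the Scan(Scan) copy constructor:

```java
// Copy a template Scan for each use so attributes/metrics from one run never leak into the next.
static Scan freshScan(Scan template) throws IOException {
  return new Scan(template);   // Scan(Scan) creates a new instance copying all values
}
```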
Modifier and Type | Class and Description |
---|---|
static class |
Scan.ReadType |
Modifier and Type | Field and Description |
---|---|
static boolean |
DEFAULT_HBASE_CLIENT_SCANNER_ASYNC_PREFETCH
Default value of
HBASE_CLIENT_SCANNER_ASYNC_PREFETCH . |
static String |
HBASE_CLIENT_SCANNER_ASYNC_PREFETCH
Parameter name for client scanner sync/async prefetch toggle.
|
static String |
SCAN_ATTRIBUTES_METRICS_DATA
Deprecated.
|
static String |
SCAN_ATTRIBUTES_METRICS_ENABLE
Deprecated.
since 1.0.0. Use
setScanMetricsEnabled(boolean) |
static String |
SCAN_ATTRIBUTES_TABLE_NAME |
colFamTimeRangeMap, consistency, filter, loadColumnFamiliesOnDemand, targetReplicaId
ID_ATRIBUTE
Constructor and Description |
---|
Scan()
Create a Scan operation across all rows.
|
Scan(byte[] startRow)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
new Scan().withStartRow(startRow) instead. |
Scan(byte[] startRow,
byte[] stopRow)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
new Scan().withStartRow(startRow).withStopRow(stopRow) instead. |
Scan(byte[] startRow,
Filter filter)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
new Scan().withStartRow(startRow).setFilter(filter) instead. |
Scan(Get get)
Builds a scan object with the same specs as get.
|
Scan(Scan scan)
Creates a new instance of this class while copying all values.
|
Modifier and Type | Method and Description |
---|---|
Scan |
addColumn(byte[] family,
byte[] qualifier)
Get the column from the specified family with the specified qualifier.
|
Scan |
addFamily(byte[] family)
Get all columns from the specified family.
|
static Scan |
createScanFromCursor(Cursor cursor)
Create a new Scan with a cursor.
|
boolean |
getAllowPartialResults() |
int |
getBatch() |
boolean |
getCacheBlocks()
Get whether blocks should be cached for this Scan.
|
int |
getCaching() |
byte[][] |
getFamilies() |
Map<byte[],NavigableSet<byte[]>> |
getFamilyMap()
Get the familyMap.
|
Filter |
getFilter() |
Map<String,Object> |
getFingerprint()
Compile the table and column family (i.e.
|
int |
getLimit() |
long |
getMaxResultSize() |
int |
getMaxResultsPerColumnFamily() |
int |
getMaxVersions() |
Scan.ReadType |
getReadType() |
int |
getRowOffsetPerColumnFamily()
Method for retrieving the scan's offset per row per column
family (#kvs to be skipped)
|
org.apache.hadoop.hbase.client.metrics.ScanMetrics |
getScanMetrics()
Deprecated.
Use ResultScanner.getScanMetrics() instead. Note: do not use this method together with
ResultScanner.getScanMetrics(), or the metrics will be inconsistent. |
byte[] |
getStartRow() |
byte[] |
getStopRow() |
TimeRange |
getTimeRange() |
boolean |
hasFamilies() |
boolean |
hasFilter() |
boolean |
includeStartRow() |
boolean |
includeStopRow() |
Boolean |
isAsyncPrefetch() |
boolean |
isGetScan() |
boolean |
isNeedCursorResult() |
boolean |
isRaw() |
boolean |
isReversed()
Get whether this scan is a reversed one.
|
boolean |
isScanMetricsEnabled() |
boolean |
isSmall()
Deprecated.
since 2.0.0 and will be removed in 3.0.0. See the comment of
setSmall(boolean) |
int |
numFamilies() |
Scan |
readAllVersions()
Get all available versions.
|
Scan |
readVersions(int versions)
Get up to the specified number of versions of each column.
|
Scan |
setACL(Map<String,org.apache.hadoop.hbase.security.access.Permission> perms) |
Scan |
setACL(String user,
org.apache.hadoop.hbase.security.access.Permission perms) |
Scan |
setAllowPartialResults(boolean allowPartialResults)
Set whether the caller wants to see partial results when the server returns fewer
cells than expected.
|
Scan |
setAsyncPrefetch(boolean asyncPrefetch) |
Scan |
setAttribute(String name,
byte[] value)
Sets an attribute.
|
Scan |
setAuthorizations(org.apache.hadoop.hbase.security.visibility.Authorizations authorizations)
Sets the authorizations to be used by this Query
|
Scan |
setBatch(int batch)
Set the maximum number of cells to return for each call to next().
|
Scan |
setCacheBlocks(boolean cacheBlocks)
Set whether blocks should be cached for this Scan.
|
Scan |
setCaching(int caching)
Set the number of rows for caching that will be passed to scanners.
|
Scan |
setColumnFamilyTimeRange(byte[] cf,
long minStamp,
long maxStamp)
Get versions of columns only within the specified timestamp range,
[minStamp, maxStamp), on a per-CF basis.
|
Scan |
setConsistency(Consistency consistency)
Sets the consistency level for this operation
|
Scan |
setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
Set the familyMap.
|
Scan |
setFilter(Filter filter)
Apply the specified server-side filter when performing the Query.
|
Scan |
setId(String id)
This method allows you to set an identifier on an operation.
|
Scan |
setIsolationLevel(IsolationLevel level)
Set the isolation level for this query.
|
Scan |
setLimit(int limit)
Set the limit of rows for this scan.
|
Scan |
setLoadColumnFamiliesOnDemand(boolean value)
Set the value indicating whether loading CFs on demand should be allowed (cluster
default is false).
|
Scan |
setMaxResultSize(long maxResultSize)
Set the maximum result size.
|
Scan |
setMaxResultsPerColumnFamily(int limit)
Set the maximum number of values to return per row per Column Family
|
Scan |
setMaxVersions()
Deprecated.
since 2.0.0 and will be removed in 3.0.0. It is easy to confuse with the column
family's max versions, so use
readAllVersions() instead. |
Scan |
setMaxVersions(int maxVersions)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. It is easy to confuse with the column
family's max versions, so use
readVersions(int) instead. |
Scan |
setNeedCursorResult(boolean needCursorResult)
When the server is slow, or we scan a table with much deleted data, or we use a sparse filter,
the server will respond with heartbeats to prevent a timeout.
|
Scan |
setOneRowLimit()
Call this when you only want to get one row.
|
Scan |
setPriority(int priority) |
Scan |
setRaw(boolean raw)
Enable/disable "raw" mode for this scan.
|
Scan |
setReadType(Scan.ReadType readType)
Set the read type for this scan.
|
Scan |
setReplicaId(int Id)
Specify region replica id where Query will fetch data from.
|
Scan |
setReversed(boolean reversed)
Set whether this scan is a reversed one
|
Scan |
setRowOffsetPerColumnFamily(int offset)
Set offset for the row per Column Family.
|
Scan |
setRowPrefixFilter(byte[] rowPrefix)
Set a filter (using stopRow and startRow) so the result set only contains rows where the
rowKey starts with the specified prefix.
|
Scan |
setScanMetricsEnabled(boolean enabled)
Enable collection of
ScanMetrics . |
Scan |
setSmall(boolean small)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
setLimit(int) and
setReadType(ReadType) instead. For the one-RPC optimization, the new implementation also
fetches data when opening the scanner, and if the number of rows reaches the limit, the scanner is
closed automatically, which means we fall back to a single RPC. |
Scan |
setStartRow(byte[] startRow)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
withStartRow(byte[])
instead. This method may change the inclusiveness of the stop row to stay compatible with the old
behavior. |
Scan |
setStopRow(byte[] stopRow)
Deprecated.
since 2.0.0 and will be removed in 3.0.0. Use
withStopRow(byte[]) instead.
This method may change the inclusiveness of the stop row to stay compatible with the old
behavior. |
Scan |
setTimeRange(long minStamp,
long maxStamp)
Get versions of columns only within the specified timestamp range,
[minStamp, maxStamp).
|
Scan |
setTimestamp(long timestamp)
Get versions of columns with the specified timestamp.
|
Scan |
setTimeStamp(long timestamp)
Deprecated.
As of release 2.0.0, this will be removed in HBase 3.0.0.
Use
setTimestamp(long) instead |
Map<String,Object> |
toMap(int maxCols)
Compile the details beyond the scope of getFingerprint (row, columns,
timestamps, etc.) into a Map along with the fingerprinted information.
|
Scan |
withStartRow(byte[] startRow)
Set the start row of the scan.
|
Scan |
withStartRow(byte[] startRow,
boolean inclusive)
Set the start row of the scan.
|
Scan |
withStopRow(byte[] stopRow)
Set the stop row of the scan.
|
Scan |
withStopRow(byte[] stopRow,
boolean inclusive)
Set the stop row of the scan.
|
doLoadColumnFamiliesOnDemand, getACL, getAuthorizations, getColumnFamilyTimeRange, getConsistency, getIsolationLevel, getLoadColumnFamiliesOnDemandValue, getReplicaId
getAttribute, getAttributeSize, getAttributesMap, getId, getPriority
@Deprecated public static final String SCAN_ATTRIBUTES_METRICS_ENABLE
Deprecated. since 1.0.0. Use setScanMetricsEnabled(boolean) instead.

@Deprecated public static final String SCAN_ATTRIBUTES_METRICS_DATA
Deprecated.
See Also:
getScanMetrics()

public static final String SCAN_ATTRIBUTES_TABLE_NAME

public static final String HBASE_CLIENT_SCANNER_ASYNC_PREFETCH
Parameter name for client scanner sync/async prefetch toggle.

public static final boolean DEFAULT_HBASE_CLIENT_SCANNER_ASYNC_PREFETCH
Default value of HBASE_CLIENT_SCANNER_ASYNC_PREFETCH.

public Scan()
Create a Scan operation across all rows.
@Deprecated public Scan(byte[] startRow, Filter filter)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use new Scan().withStartRow(startRow).setFilter(filter) instead.

@Deprecated public Scan(byte[] startRow)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use new Scan().withStartRow(startRow) instead.
If the specified row does not exist, the Scanner will start from the next closest row after the specified row.
Parameters:
startRow - row to start scanner at or after

@Deprecated public Scan(byte[] startRow, byte[] stopRow)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use new Scan().withStartRow(startRow).withStopRow(stopRow) instead.
Parameters:
startRow - row to start scanner at or after (inclusive)
stopRow - row to stop scanner before (exclusive)

public Scan(Scan scan) throws IOException
Creates a new instance of this class while copying all values.
Parameters:
scan - The scan instance to copy from.
Throws:
IOException - When copying the values fails.

public boolean isGetScan()
public Scan addFamily(byte[] family)
Get all columns from the specified family. Overrides previous calls to addColumn for this family.
Parameters:
family - family name

public Scan addColumn(byte[] family, byte[] qualifier)
Get the column from the specified family with the specified qualifier. Overrides previous calls to addFamily for this family.
Parameters:
family - family name
qualifier - column qualifier

public Scan setTimeRange(long minStamp, long maxStamp) throws IOException
Get versions of columns only within the specified timestamp range, [minStamp, maxStamp).
Parameters:
minStamp - minimum timestamp value, inclusive
maxStamp - maximum timestamp value, exclusive
Throws:
IOException
See Also:
setMaxVersions(), setMaxVersions(int)

@Deprecated public Scan setTimeStamp(long timestamp) throws IOException
Deprecated. As of release 2.0.0, this will be removed in HBase 3.0.0. Use setTimestamp(long) instead.
Parameters:
timestamp - version timestamp
Throws:
IOException
See Also:
setMaxVersions(), setMaxVersions(int)

public Scan setTimestamp(long timestamp)
Get versions of columns with the specified timestamp.
Parameters:
timestamp - version timestamp
See Also:
setMaxVersions(), setMaxVersions(int)
public Scan setColumnFamilyTimeRange(byte[] cf, long minStamp, long maxStamp)
Get versions of columns only within the specified timestamp range, [minStamp, maxStamp), on a per-CF basis.
Overrides:
setColumnFamilyTimeRange in class Query
Parameters:
cf - the column family for which you want to restrict
minStamp - minimum timestamp value, inclusive
maxStamp - maximum timestamp value, exclusive

@Deprecated public Scan setStartRow(byte[] startRow)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use withStartRow(byte[]) instead. This method may change the inclusiveness of the stop row to stay compatible with the old behavior.
If the specified row does not exist, the Scanner will start from the next closest row after the specified row.
Parameters:
startRow - row to start scanner at or after
Throws:
IllegalArgumentException - if startRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)
See Also:
withStartRow(byte[]), HBASE-17320

public Scan withStartRow(byte[] startRow)
Set the start row of the scan.
If the specified row does not exist, the Scanner will start from the next closest row after the specified row.
Parameters:
startRow - row to start scanner at or after
Throws:
IllegalArgumentException - if startRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)

public Scan withStartRow(byte[] startRow, boolean inclusive)
Set the start row of the scan. If the specified row does not exist, or if inclusive is false, the Scanner will start from the next closest row after the specified row.
Parameters:
startRow - row to start scanner at or after
inclusive - whether we should include the start row when scanning
Throws:
IllegalArgumentException - if startRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)

@Deprecated public Scan setStopRow(byte[] stopRow)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use withStopRow(byte[]) instead. This method may change the inclusiveness of the stop row to stay compatible with the old behavior.
The scan will include rows that are lexicographically less than the provided stopRow.
Note: when doing a filter for a rowKey prefix, use setRowPrefixFilter(byte[]); a 'trailing 0' will not yield the desired result.
Parameters:
stopRow - row to end at (exclusive)
Throws:
IllegalArgumentException - if stopRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)
See Also:
withStopRow(byte[]), HBASE-17320

public Scan withStopRow(byte[] stopRow)
Set the stop row of the scan. The scan will include rows that are lexicographically less than the provided stopRow.
Note: when doing a filter for a rowKey prefix, use setRowPrefixFilter(byte[]); a 'trailing 0' will not yield the desired result.
Parameters:
stopRow - row to end at (exclusive)
Throws:
IllegalArgumentException - if stopRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)

public Scan withStopRow(byte[] stopRow, boolean inclusive)
Set the stop row of the scan. The scan will include rows that are lexicographically less than (or equal to, if inclusive is true) the provided stopRow.
Parameters:
stopRow - row to end at
inclusive - whether we should include the stop row when scanning
Throws:
IllegalArgumentException - if stopRow does not meet criteria for a row key (when length exceeds HConstants.MAX_ROW_LENGTH)

public Scan setRowPrefixFilter(byte[] rowPrefix)
Set a filter (using stopRow and startRow) so the result set only contains rows where the rowKey starts with the specified prefix.
This is a utility method that converts the desired rowPrefix into the appropriate values for the startRow and stopRow to achieve the desired result.
This can safely be used in combination with setFilter.
NOTE: Doing a setStartRow(byte[]) and/or setStopRow(byte[]) after this method will yield undefined results.
Parameters:
rowPrefix - the prefix all rows must start with. (Set null to remove the filter.)

@Deprecated public Scan setMaxVersions()
Deprecated. since 2.0.0 and will be removed in 3.0.0. It is easy to confuse with the column family's max versions, so use readAllVersions() instead.
See Also:
readAllVersions(), HBASE-17125

@Deprecated public Scan setMaxVersions(int maxVersions)
Deprecated. since 2.0.0 and will be removed in 3.0.0. It is easy to confuse with the column family's max versions, so use readVersions(int) instead.
Parameters:
maxVersions - maximum versions for each column
See Also:
readVersions(int), HBASE-17125

public Scan readAllVersions()
Get all available versions.
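To illustrate setRowPrefixFilter described above, a small sketch (the prefix value is a placeholder; reuses the imports from the first sketch):

```java
static void scanByPrefix(Table table) throws IOException {
  Scan scan = new Scan();
  scan.setRowPrefixFilter(Bytes.toBytes("user123|"));   // derives startRow/stopRow from the prefix
  try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result r : scanner) {
      // every returned rowKey starts with "user123|"
    }
  }
}
```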
public Scan readVersions(int versions)
Get up to the specified number of versions of each column.
Parameters:
versions - specified number of versions for each column

public Scan setBatch(int batch)
Set the maximum number of cells to return for each call to next(). See setAllowPartialResults(boolean).
If you don't allow partial results, the number of cells in each Result must equal your batch setting unless it is the last Result for the current row. So this method is helpful for paging queries. If you just want to prevent OOM at the client, setAllowPartialResults(true) is the better choice.
Parameters:
batch - the maximum number of values
See Also:
Result.mayHaveMoreCellsInRow()

public Scan setMaxResultsPerColumnFamily(int limit)
Set the maximum number of values to return per row per Column Family.
Parameters:
limit - the maximum number of values returned / row / CF

public Scan setRowOffsetPerColumnFamily(int offset)
Set offset for the row per Column Family.
Parameters:
offset - the number of kvs that will be skipped

public Scan setCaching(int caching)
Set the number of rows for caching that will be passed to scanners. If not set, HConstants.HBASE_CLIENT_SCANNER_CACHING will apply. Higher caching values will enable faster scanners but will use more memory.
Parameters:
caching - the number of rows for caching

public long getMaxResultSize()
See Also:
setMaxResultSize(long)

public Scan setMaxResultSize(long maxResultSize)
Set the maximum result size.
Parameters:
maxResultSize - The maximum result size in bytes.

public Scan setFilter(Filter filter)
Apply the specified server-side filter when performing the Query. Filter.filterCell(org.apache.hadoop.hbase.Cell) is called AFTER all tests for ttl, column match, deletes and column family's max versions have been run.
Overrides:
setFilter in class Query

public Scan setFamilyMap(Map<byte[],NavigableSet<byte[]>> familyMap)
Set the familyMap.
Parameters:
familyMap - map of family to qualifier

public Map<byte[],NavigableSet<byte[]>> getFamilyMap()
Get the familyMap.
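A hedged tuning sketch combining setBatch, setCaching, and setMaxResultSize described above (the numbers are arbitrary examples, not recommendations):

```java
static Scan tunedScan() {
  return new Scan()
      .setCaching(500)                      // rows fetched per RPC: fewer round trips, more client memory
      .setMaxResultSize(2L * 1024 * 1024)   // cap each fetch at roughly 2 MB
      .setBatch(100);                       // at most 100 cells per Result
}
```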
public int numFamilies()
public boolean hasFamilies()
public byte[][] getFamilies()
public byte[] getStartRow()
public boolean includeStartRow()
public byte[] getStopRow()
public boolean includeStopRow()
public int getMaxVersions()
public int getBatch()
public int getMaxResultsPerColumnFamily()
public int getRowOffsetPerColumnFamily()
public int getCaching()
public TimeRange getTimeRange()
public boolean hasFilter()
public Scan setCacheBlocks(boolean cacheBlocks)
Set whether blocks should be cached for this Scan. This is true by default. When true, default settings of the table and family are used (this will never override caching blocks if the block cache is disabled for that family or entirely).
Parameters:
cacheBlocks - if false, default settings are overridden and blocks will not be cached

public boolean getCacheBlocks()
Get whether blocks should be cached for this Scan.
public Scan setReversed(boolean reversed)
Set whether this scan is a reversed one. This is false by default, which means a forward (normal) scan.
Parameters:
reversed - if true, scan will be in backward order

public boolean isReversed()
Get whether this scan is a reversed one.
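For example, a reversed-scan sketch (the start row is a placeholder):

```java
// Reversed scan: iteration proceeds from the start row toward smaller rowKeys.
static Scan reversedScan() {
  return new Scan()
      .withStartRow(Bytes.toBytes("row-0200"))   // the first (largest) row returned by the reversed scan
      .setReversed(true);
}
```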
public Scan setAllowPartialResults(boolean allowPartialResults)
Set whether the caller wants to see partial results when the server returns fewer cells than expected.
Parameters:
allowPartialResults
See Also:
Result.mayHaveMoreCellsInRow(), setBatch(int)

public boolean getAllowPartialResults()
See Also:
ResultScanner.next()

public Scan setLoadColumnFamiliesOnDemand(boolean value)
Set the value indicating whether loading CFs on demand should be allowed (cluster default is false).
Overrides:
setLoadColumnFamiliesOnDemand in class Query
public Map<String,Object> getFingerprint()
Specified by:
getFingerprint in class Operation

public Map<String,Object> toMap(int maxCols)
Compile the details beyond the scope of getFingerprint (row, columns, timestamps, etc.) into a Map along with the fingerprinted information.

public Scan setRaw(boolean raw)
Enable/disable "raw" mode for this scan.
Parameters:
raw - True/False to enable/disable "raw" mode.

public boolean isRaw()
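A brief raw-mode sketch (raw scans also surface delete markers and deleted cells that have not yet been collected):

```java
static Scan rawScan() {
  return new Scan()
      .setRaw(true)           // return delete markers and covered cells as well
      .readAllVersions();     // raw scans are usually paired with all versions
}
```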
@Deprecated public Scan setSmall(boolean small)
Deprecated. since 2.0.0 and will be removed in 3.0.0. Use setLimit(int) and setReadType(ReadType) instead. For the one-RPC optimization, the new implementation also fetches data when opening the scanner, and if the number of rows reaches the limit, the scanner is closed automatically, which means we fall back to a single RPC.
A small scan should use pread, while a big scan can use seek + read. Seek + read is fast but can cause two problems: (1) resource contention and (2) too much network I/O ([89-fb] Using pread for non-compaction read request, https://issues.apache.org/jira/browse/HBASE-7266). On the other hand, if set to true, we do openScanner, next, and closeScanner in one RPC call, which means better performance for a small scan (HBASE-9488). Generally, if the scan range is within one data block (64KB), it can be considered a small scan.
Parameters:
small
See Also:
setLimit(int), setReadType(ReadType), HBASE-17045

@Deprecated public boolean isSmall()
Deprecated. since 2.0.0 and will be removed in 3.0.0. See the comment of setSmall(boolean).
public Scan setAttribute(String name, byte[] value)
Sets an attribute.
Specified by:
setAttribute in interface Attributes
Overrides:
setAttribute in class OperationWithAttributes
Parameters:
name - attribute name
value - attribute value

public Scan setId(String id)
This method allows you to set an identifier on an operation.
Overrides:
setId in class OperationWithAttributes
Parameters:
id - id to set for the scan

public Scan setAuthorizations(org.apache.hadoop.hbase.security.visibility.Authorizations authorizations)
Sets the authorizations to be used by this Query.
Overrides:
setAuthorizations in class Query

public Scan setConsistency(Consistency consistency)
Sets the consistency level for this operation.
Overrides:
setConsistency in class Query
Parameters:
consistency - the consistency level

public Scan setReplicaId(int Id)
Specify region replica id where Query will fetch data from. Use this together with Query.setConsistency(Consistency), passing Consistency.TIMELINE, to read data from a specific replicaId.
Overrides:
setReplicaId in class Query

public Scan setIsolationLevel(IsolationLevel level)
Set the isolation level for this query.
Overrides:
setIsolationLevel in class Query
Parameters:
level - IsolationLevel for this query

public Scan setPriority(int priority)
Overrides:
setPriority in class OperationWithAttributes

public Scan setScanMetricsEnabled(boolean enabled)
Enable collection of ScanMetrics. For advanced users.
Parameters:
enabled - Set to true to enable accumulating scan metrics

public boolean isScanMetricsEnabled()
@Deprecated public org.apache.hadoop.hbase.client.metrics.ScanMetrics getScanMetrics()
Deprecated. Use ResultScanner.getScanMetrics() instead. Note: do not use this method together with ResultScanner.getScanMetrics(), or the metrics will be inconsistent.
See Also:
setScanMetricsEnabled(boolean)
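A sketch of the non-deprecated metrics path, enabling metrics on the Scan and reading them from the ResultScanner (assumes the imports from the first sketch plus org.apache.hadoop.hbase.client.metrics.ScanMetrics):

```java
static void scanWithMetrics(Table table) throws IOException {
  Scan scan = new Scan().setScanMetricsEnabled(true);    // for advanced users
  try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result r : scanner) {
      // consume results
    }
    ScanMetrics metrics = scanner.getScanMetrics();       // preferred over the deprecated Scan.getScanMetrics()
    metrics.getMetricsMap().forEach((name, value) -> System.out.println(name + "=" + value));
  }
}
```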
public Boolean isAsyncPrefetch()
public Scan setAsyncPrefetch(boolean asyncPrefetch)
public int getLimit()
public Scan setLimit(int limit)
Set the limit of rows for this scan. This condition will be tested last, after all other conditions such as stopRow, filter, etc.
Parameters:
limit - the limit of rows for this scan

public Scan setOneRowLimit()
Call this when you only want to get one row. It will set limit to 1 and also set readType to Scan.ReadType.PREAD.

public Scan.ReadType getReadType()
public Scan setReadType(Scan.ReadType readType)
Set the read type for this scan. Notice that we may choose to use pread even if you specify Scan.ReadType.STREAM here. For example, we will always use pread if this is a get scan.
public Scan setNeedCursorResult(boolean needCursorResult)
When the server is slow, or we scan a table with much deleted data, or we use a sparse filter, the server will respond with heartbeats to prevent a timeout.
See Also:
Result.isCursor(), Result.getCursor(), Cursor

public boolean isNeedCursorResult()

public static Scan createScanFromCursor(Cursor cursor)
Create a new Scan with a cursor.
See Also:
Result.isCursor(), Result.getCursor(), Cursor
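A hedged sketch of cursor results (useful when the server sends heartbeats rather than rows; the table handle is assumed to come from the caller):

```java
static void scanWithCursor(Table table) throws IOException {
  Scan scan = new Scan().setNeedCursorResult(true);
  try (ResultScanner scanner = table.getScanner(scan)) {
    for (Result r : scanner) {
      if (r.isCursor()) {
        // No cells yet: the server reports scan progress. Persist r.getCursor() if you may
        // need to resume later via Scan.createScanFromCursor(cursor).
        continue;
      }
      // a normal Result carrying cells
    }
  }
}
```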
Copyright © 2007–2021 The Apache Software Foundation. All rights reserved.