Class OrderedBytes
Bytes, these methods
produce byte arrays which maintain the sort order of the original values.
Encoding Format summary
Each value is encoded as one or more bytes. The first byte of the encoding, its meaning, and a terse description of the bytes that follow is given by the following table:
| Content Type | Encoding |
|---|---|
| NULL | 0x05 |
| negative infinity | 0x07 |
| negative large | 0x08, ~E, ~M |
| negative medium | 0x13-E, ~M |
| negative small | 0x14, -E, ~M |
| zero | 0x15 |
| positive small | 0x16, ~-E, M |
| positive medium | 0x17+E, M |
| positive large | 0x22, E, M |
| positive infinity | 0x23 |
| NaN | 0x25 |
| fixed-length 32-bit integer | 0x27, I |
| fixed-length 64-bit integer | 0x28, I |
| fixed-length 8-bit integer | 0x29 |
| fixed-length 16-bit integer | 0x2a |
| fixed-length 32-bit float | 0x30, F |
| fixed-length 64-bit float | 0x31, F |
| TEXT | 0x33, T |
| variable length BLOB | 0x35, B |
| byte-for-byte BLOB | 0x36, X |
Null Encoding
Each value that is a NULL encodes as a single byte of 0x05. Since every other value encoding begins with a byte greater than 0x05, this forces NULL values to sort first.
Text Encoding
Each text value begins with a single byte of 0x33 and ends with a single byte of 0x00. There are zero or more intervening bytes that encode the text value. The intervening bytes are chosen so that the encoding will sort in the desired collating order. The intervening bytes may not contain a 0x00 character; the only 0x00 byte allowed in a text encoding is the final byte.
The text encoding ends in 0x00 in order to ensure that when there are two strings where one is a prefix of the other that the shorter string will sort first.
Binary Encoding
There are two encoding strategies for binary fields, referred to as "BlobVar" and "BlobCopy". BlobVar is less efficient in both space and encoding time. It has no limitations on the range of encoded values. BlobCopy is a byte-for-byte copy of the input data followed by a termination byte. It is extremely fast to encode and decode. It carries the restriction of not allowing a 0x00 value in the input byte[] as this value is used as the termination byte.
BlobVar
"BlobVar" encodes the input byte[] in a manner similar to a variable length integer encoding. As
with the other OrderedBytes encodings, the first encoded byte is used to indicate what
kind of value follows. This header byte is 0x37 for BlobVar encoded values. As with the
traditional varint encoding, the most significant bit of each subsequent encoded byte is
used as a continuation marker. The 7 remaining bits contain the 7 most significant bits of the
first unencoded byte. The next encoded byte starts with a continuation marker in the MSB. The
least significant bit from the first unencoded byte follows, and the remaining 6 bits contain the
6 MSBs of the second unencoded byte. The encoding continues, encoding 7 bytes on to 8 encoded
bytes. The MSB of the final encoded byte contains a termination marker rather than a continuation
marker, and any remaining bits from the final input byte. Any trailing bits in the final encoded
byte are zeros.
BlobCopy
"BlobCopy" is a simple byte-for-byte copy of the input data. It uses 0x38 as the header byte, and is terminated by 0x00 in the DESCENDING case. This alternative encoding is faster and more space-efficient, but it cannot accept values containing a 0x00 byte in DESCENDING order.
Variable-length Numeric Encoding
Numeric values must be coded so as to sort in numeric order. We assume that numeric values can be
both integer and floating point values. Clients must be careful to use inspection methods for
encoded values (such as isNumericInfinite(PositionedByteRange) and
isNumericNaN(PositionedByteRange) to protect against decoding values into object which
do not support these numeric concepts (such as Long and BigDecimal).
Simplest cases first: If the numeric value is a NaN, then the encoding is a single byte of 0x25. This causes NaN values to sort after every other numeric value.
If the numeric value is a negative infinity then the encoding is a single byte of 0x07. Since every other numeric value except NaN has a larger initial byte, this encoding ensures that negative infinity will sort prior to every other numeric value other than NaN.
If the numeric value is a positive infinity then the encoding is a single byte of 0x23. Every other numeric value encoding begins with a smaller byte, ensuring that positive infinity always sorts last among numeric values. 0x23 is also smaller than 0x33, the initial byte of a text value, ensuring that every numeric value sorts before every text value.
If the numeric value is exactly zero then it is encoded as a single byte of 0x15. Finite negative values will have initial bytes of 0x08 through 0x14 and finite positive values will have initial bytes of 0x16 through 0x22.
For all numeric values, we compute a mantissa M and an exponent E. The mantissa is a base-100 representation of the value. The exponent E determines where to put the decimal point.
Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is X (hence X≥0 and X≤99) then the byte value will be 2*X+1 for every byte of the mantissa, except for the last byte which will be 2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailing X==0 digits are omitted. This means that the mantissa will never contain a byte with the value 0x00.
If we assume all digits of the mantissa occur to the right of the decimal point, then the exponent E is the power of one hundred by which one must multiply the mantissa to recover the original value.
Values are classified as large, medium, or small according to the value of E. If E is 11 or more, the value is large. For E between 0 and 10, the value is medium. For E less than zero, the value is small.
Large positive values are encoded as a single byte 0x22 followed by E as a varint and then M. Medium positive values are a single byte of 0x17+E followed by M. Small positive values are encoded as a single byte 0x16 followed by the ones-complement of the varint for -E followed by M.
Small negative values are encoded as a single byte 0x14 followed by -E as a varint and then the ones-complement of M. Medium negative values are encoded as a byte 0x13-E followed by the ones-complement of M. Large negative values consist of the single byte 0x08 followed by the ones-complement of the varint encoding of E followed by the ones-complement of M.
Fixed-length Integer Encoding
All 4-byte integers are serialized to a 5-byte, fixed-width, sortable byte format. All 8-byte integers are serialized to the equivelant 9-byte format. Serialization is performed by writing a header byte, inverting the integer sign bit and writing the resulting bytes to the byte array in big endian order.
Fixed-length Floating Point Encoding
32-bit and 64-bit floating point numbers are encoded to a 5-byte and 9-byte encoding format, respectively. The format is identical, save for the precision respected in each step of the operation.
This format ensures the following total ordering of floating point values: Float.NEGATIVE_INFINITY < -Float.MAX_VALUE < ... < -Float.MIN_VALUE < -0.0 < +0.0; < Float.MIN_VALUE < ... < Float.MAX_VALUE < Float.POSITIVE_INFINITY < Float.NaN
Floating point numbers are encoded as specified in IEEE 754. A 32-bit single precision float consists of a sign bit, 8-bit unsigned exponent encoded in offset-127 notation, and a 23-bit significand. The format is described further in the Single Precision Floating Point Wikipedia page
The value of a normal float is -1 sign bit × 2exponent - 127 × 1.significand
The IEE754 floating point format already preserves sort ordering for positive floating point numbers when the raw bytes are compared in most significant byte order. This is discussed further at http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
Thus, we need only ensure that negative numbers sort in the the exact opposite order as positive numbers (so that say, negative infinity is less than negative 1), and that all negative numbers compare less than any positive number. To accomplish this, we invert the sign bit of all floating point numbers, and we also invert the exponent and significand bits if the floating point number was negative.
More specifically, we first store the floating point bits into a 32-bit int j using
Float.floatToIntBits(float). This method collapses all NaNs into a single, canonical NaN value
but otherwise leaves the bits unchanged. We then compute
j ˆ= (j >> (Integer.SIZE - 1)) | Integer.MIN_SIZE
which inverts the sign bit and XOR's all other bits with the sign bit itself. Comparing the raw
bytes of j in most significant byte order is equivalent to performing a single precision
floating point comparison on the underlying bits (ignoring NaN comparisons, as NaNs don't compare
equal to anything when performing floating point comparisons).
The resulting integer is then converted into a byte array by serializing the integer one byte at a time in most significant byte order. The serialized integer is prefixed by a single header byte. All serialized values are 5 bytes in length.
OrderedBytes encodings are heavily influenced by the
SQLite4 Key Encoding. Slight
deviations are make in the interest of order correctness and user extensibility. Fixed-width
Long and Double encodings are based on implementations from the now defunct
Orderly library.
-
Field Summary
FieldsModifier and TypeFieldDescriptionprivate static final byteprivate static final bytestatic final MathContextThe context used to normalizeBigDecimalvalues.private static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final bytestatic final intMax precision guaranteed to fit into along.private static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final byteprivate static final bytestatic final Charsetprivate static final byte -
Constructor Summary
Constructors -
Method Summary
Modifier and TypeMethodDescription(package private) static intblobVarDecodedLength(int len) Calculate the expected BlobVar decoded length based on encoded length.static intblobVarEncodedLength(int len) Calculate the expected BlobVar encoded length based on unencoded length.static byte[]Decode a Blob value, byte-for-byte copy.static byte[]Decode a blob value that was encoded using BlobVar encoding.static floatDecode a 32-bit floating point value using the fixed-length encoding.static doubleDecode a 64-bit floating point value using the fixed-length encoding.static shortDecode anint16value.static intDecode anint32value.static longDecode anint64value.static byteDecode anint8value.static BigDecimalDecode aBigDecimalvalue from the variable-length encoding.static doubleDecode a primitivedoublevalue from the Numeric encoding.static longDecode a primitivelongvalue from the Numeric encoding.private static BigDecimalDecode aBigDecimalfromsrc.private static BigDecimaldecodeSignificand(PositionedByteRange src, int e, boolean comp) Read significand digits fromsrcaccording to the magnitude ofe.static StringDecode a String value.static intencodeBlobCopy(PositionedByteRange dst, byte[] val, int voff, int vlen, Order ord) Encode a Blob value as a byte-for-byte copy.static intencodeBlobCopy(PositionedByteRange dst, byte[] val, Order ord) Encode a Blob value as a byte-for-byte copy.static intencodeBlobVar(PositionedByteRange dst, byte[] val, int voff, int vlen, Order ord) Encode a Blob value using a modified varint encoding scheme.static intencodeBlobVar(PositionedByteRange dst, byte[] val, Order ord) Encode a blob value using a modified varint encoding scheme.static intencodeFloat32(PositionedByteRange dst, float val, Order ord) Encode a 32-bit floating point value using the fixed-length encoding.static intencodeFloat64(PositionedByteRange dst, double val, Order ord) Encode a 64-bit floating point value using the fixed-length encoding.static intencodeInt16(PositionedByteRange dst, short val, Order ord) Encode anint16value using the fixed-length encoding.static intencodeInt32(PositionedByteRange dst, int val, Order ord) Encode anint32value using the fixed-length encoding.static intencodeInt64(PositionedByteRange dst, long val, Order ord) Encode anint64value using the fixed-length encoding.static intencodeInt8(PositionedByteRange dst, byte val, Order ord) Encode anint8value using the fixed-length encoding.static intencodeNull(PositionedByteRange dst, Order ord) Encode a null value.static intencodeNumeric(PositionedByteRange dst, double val, Order ord) Encode a numerical value using the variable-length encoding.static intencodeNumeric(PositionedByteRange dst, long val, Order ord) Encode a numerical value using the variable-length encoding.static intencodeNumeric(PositionedByteRange dst, BigDecimal val, Order ord) Encode a numerical value using the variable-length encoding.private static intencodeNumericLarge(PositionedByteRange dst, BigDecimal val) Encode the large magnitude floating point numbervalusing the key encoding.private static intencodeNumericSmall(PositionedByteRange dst, BigDecimal val) Encode the small magnitude floating point numbervalusing the key encoding.static intencodeString(PositionedByteRange dst, String val, Order ord) Encode a String value.private static voidencodeToCentimal(PositionedByteRange dst, BigDecimal val) Encode a value val in [0.01, 1.0) into Centimals.(package private) static longgetVaruint64(PositionedByteRange src, boolean comp) Decode a sequence of bytes insrcas a varuint64.static booleanReturn true when the next encoded value insrcuses BlobCopy encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses BlobVar encoding, false otherwise.static booleanReturns true whensrcappears to be positioned an encoded value, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Float32 encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Float64 encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Int16 encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Int32 encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Int64 encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses fixed-width Int8 encoding, false otherwise.static booleanReturn true when the next encoded value insrcis null, false otherwise.static booleanReturn true when the next encoded value insrcuses Numeric encoding, false otherwise.static booleanReturn true when the next encoded value insrcuses Numeric encoding and isInfinite, false otherwise.static booleanReturn true when the next encoded value insrcuses Numeric encoding and isNaN, false otherwise.static booleanReturn true when the next encoded value insrcuses Numeric encoding and is0, false otherwise.static booleanReturn true when the next encoded value insrcuses Text encoding, false otherwise.static intlength(PositionedByteRange buff) Return the number of encoded entries remaining inbuff.(package private) static intlengthVaruint64(PositionedByteRange src, boolean comp) Inspectsrcfor an encoded varuint64 for its length in bytes.(package private) static BigDecimalnormalize(BigDecimal val) Strip all trailing zeros to ensure that no digit will be zero and round using our default context to ensure precision doesn't exceed max allowed.private static intputUint32(PositionedByteRange dst, int val) Write a 32-bit unsigned integer todstas 4 big-endian bytes.(package private) static intputVaruint64(PositionedByteRange dst, long val, boolean comp) Encode an unsigned 64-bit unsigned integervalintodst.static intskip(PositionedByteRange src) Skipbuff's position forward over one encoded value.private static intskipSignificand(PositionedByteRange src, boolean comp) Skipsrcover the significand bytes.(package private) static intskipVaruint64(PositionedByteRange src, boolean cmp) Skipsrcover the encoded varuint64.private static IllegalArgumentExceptionunexpectedHeader(byte header) Creates the standard exception when the encoded header byte is unexpected for the decoding context.private static intunsignedCmp(long x1, long x2) Perform unsigned comparison between two long values.
-
Field Details
-
NULL
- See Also:
-
NEG_INF
- See Also:
-
NEG_LARGE
- See Also:
-
NEG_MED_MIN
- See Also:
-
NEG_MED_MAX
- See Also:
-
NEG_SMALL
- See Also:
-
ZERO
- See Also:
-
POS_SMALL
- See Also:
-
POS_MED_MIN
- See Also:
-
POS_MED_MAX
- See Also:
-
POS_LARGE
- See Also:
-
POS_INF
- See Also:
-
NAN
- See Also:
-
FIXED_INT8
- See Also:
-
FIXED_INT16
- See Also:
-
FIXED_INT32
- See Also:
-
FIXED_INT64
- See Also:
-
FIXED_FLOAT32
- See Also:
-
FIXED_FLOAT64
- See Also:
-
TEXT
- See Also:
-
BLOB_VAR
- See Also:
-
BLOB_COPY
- See Also:
-
UTF8
-
TERM
- See Also:
-
MAX_PRECISION
Max precision guaranteed to fit into along.- See Also:
-
DEFAULT_MATH_CONTEXT
The context used to normalizeBigDecimalvalues.
-
-
Constructor Details
-
OrderedBytes
public OrderedBytes()
-
-
Method Details
-
unexpectedHeader
Creates the standard exception when the encoded header byte is unexpected for the decoding context.- Parameters:
header- value used in error message.
-
unsignedCmp
Perform unsigned comparison between two long values. Conforms to the same interface asCellComparator. -
putUint32
Write a 32-bit unsigned integer todstas 4 big-endian bytes.- Returns:
- number of bytes written.
-
putVaruint64
Encode an unsigned 64-bit unsigned integervalintodst.- Parameters:
dst- The destination to which encoded bytes are written.val- The value to write.comp- Compliment the encoded value whencompis true.- Returns:
- number of bytes written.
-
lengthVaruint64
Inspectsrcfor an encoded varuint64 for its length in bytes. Preserves the state ofsrc.- Parameters:
src- source buffercomp- if true, parse the compliment of the value.- Returns:
- the number of bytes consumed by this value.
-
skipVaruint64
Skipsrcover the encoded varuint64.- Parameters:
src- source buffercmp- if true, parse the compliment of the value.- Returns:
- the number of bytes skipped.
-
getVaruint64
Decode a sequence of bytes insrcas a varuint64. Compliment the encoded value whencompis true.- Returns:
- the decoded value.
-
normalize
Strip all trailing zeros to ensure that no digit will be zero and round using our default context to ensure precision doesn't exceed max allowed. From Phoenix'sNumberUtil.- Returns:
- new
BigDecimalinstance
-
decodeSignificand
Read significand digits fromsrcaccording to the magnitude ofe.- Parameters:
src- The source from which to read encoded digits.e- The magnitude of the first digit read.comp- Treat encoded bytes as compliments whencompis true.- Returns:
- The decoded value.
- Throws:
IllegalArgumentException- when read exceeds the remaining length ofsrc.
-
skipSignificand
Skipsrcover the significand bytes.- Parameters:
src- The source from which to read encoded digits.comp- Treat encoded bytes as compliments whencompis true.- Returns:
- the number of bytes skipped.
-
encodeNumericSmall
Encode the small magnitude floating point number
valusing the key encoding. The caller guarantees that 1.0 > abs(val) > 0.0.A floating point value is encoded as an integer exponent
Eand a mantissaM. The original value is equal to(M * 100^E).Eis set to the smallest value possible without makingMgreater than or equal to 1.0.For this routine,
Ewill always be zero or negative, since the original value is less than one. The encoding written by this routine is the ones-complement of the varint of the negative ofEfollowed by the mantissa:Encoding: ~-E M
- Parameters:
dst- The destination to which encoded digits are written.val- The value to encode.- Returns:
- the number of bytes written.
-
encodeNumericLarge
Encode the large magnitude floating point numbervalusing the key encoding. The caller guarantees thatvalwill be finite and abs(val) >= 1.0.A floating point value is encoded as an integer exponent
Eand a mantissaM. The original value is equal to(M * 100^E).Eis set to the smallest value possible without makingMgreater than or equal to 1.0.Each centimal digit of the mantissa is stored in a byte. If the value of the centimal digit is
X(henceX>=0andX<=99) then the byte value will be2*X+1for every byte of the mantissa, except for the last byte which will be2*X+0. The mantissa must be the minimum number of bytes necessary to represent the value; trailingX==0digits are omitted. This means that the mantissa will never contain a byte with the value0x00.If
E > 10, then this routine writes ofEas a varint followed by the mantissa as described above. Otherwise, ifE <= 10, this routine only writes the mantissa and leaves theEvalue to be encoded as part of the opening byte of the field by the calling function.Encoding: M (if E<=10) E M (if E>10)- Parameters:
dst- The destination to which encoded digits are written.val- The value to encode.- Returns:
- the number of bytes written.
-
encodeToCentimal
Encode a value val in [0.01, 1.0) into Centimals. Util function forencodeNumericLarge(PositionedByteRange, BigDecimal)andencodeNumericSmall(PositionedByteRange, BigDecimal)- Parameters:
dst- The destination to which encoded digits are written.val- A BigDecimal after the normalization. The value must be in [0.01, 1.0).
-
encodeNumeric
Encode a numerical value using the variable-length encoding.- Parameters:
dst- The destination to which encoded digits are written.val- The value to encode.ord- TheOrderto respect while encodingval.- Returns:
- the number of bytes written.
-
encodeNumeric
Encode a numerical value using the variable-length encoding.- Parameters:
dst- The destination to which encoded digits are written.val- The value to encode.ord- TheOrderto respect while encodingval.- Returns:
- the number of bytes written.
-
encodeNumeric
Encode a numerical value using the variable-length encoding. If the number of significant digits of the value exceeds theMAX_PRECISION, the exceeding part will be lost.- Parameters:
dst- The destination to which encoded digits are written.val- The value to encode.ord- TheOrderto respect while encodingval.- Returns:
- the number of bytes written.
-
decodeNumericValue
Decode aBigDecimalfromsrc. Assumessrcencodes a value in Numeric encoding and is within the valid range ofBigDecimalvalues.BigDecimaldoes not supportNaNorInfintevalues. -
decodeNumericAsDouble
Decode a primitivedoublevalue from the Numeric encoding. Numeric encoding is based onBigDecimal; in the event the encoded value is larger than can be represented in adouble, this method performs an implicit narrowing conversion as described inBigDecimal.doubleValue().- Throws:
NullPointerException- when the encoded value isNULL.IllegalArgumentException- when the encoded value is not a Numeric.- See Also:
-
decodeNumericAsLong
Decode a primitivelongvalue from the Numeric encoding. Numeric encoding is based onBigDecimal; in the event the encoded value is larger than can be represented in along, this method performs an implicit narrowing conversion as described inBigDecimal.doubleValue().- Throws:
NullPointerException- when the encoded value isNULL.IllegalArgumentException- when the encoded value is not a Numeric.- See Also:
-
decodeNumericAsBigDecimal
Decode aBigDecimalvalue from the variable-length encoding.- Throws:
IllegalArgumentException- when the encoded value is not a Numeric.- See Also:
-
encodeString
Encode a String value. String encoding is 0x00-terminated and so it does not support
-