@InterfaceAudience.Private public class ColumnSectionWriter extends Object
Takes the tokenized family or qualifier data and flattens it into a stream of bytes. The family section is written after the row section, and the qualifier section is written after the family section.
The family and qualifier tries, or "column tries", are structured differently from the row trie. A column trie cannot be reassembled without external data about the offsets of its leaf nodes, and these external pointers are stored in the nubs and leaves of the row trie. For each cell in a row, the row trie contains a list of offsets into the column sections (along with pointers to timestamps and other per-cell fields). These offsets point to the last column node/token that makes up the column name. To assemble the column name, the trie is traversed in reverse (right to left), with each rightmost token pointing to the start of its "parent" node, which is the node to its left.
This choice was made to reduce the size of the column trie by storing the minimum amount of offset data. As a result, to find a specific qualifier within a row, you must do a binary search of the column nodes, reassembling each one as you search. Future versions of the PrefixTree might encode the columns in both a forward and a reverse trie, which would convert binary searches into more efficient trie searches, a benefit for wide rows.
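The reverse traversal and the binary search described above can be illustrated with a minimal sketch. It is not the PrefixTree byte layout or the library's code: the `Node` class, the array-index "offsets", and the `assemble`, `find`, and `exampleNodes` methods are all hypothetical names introduced here, and the sketch assumes ASCII column names so that `String` ordering matches byte ordering.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

public class ReverseColumnTrieSketch {

    /** Hypothetical node: a token plus the index of its parent (-1 for a root token). */
    public static final class Node {
        final byte[] token;
        final int parentIndex;

        public Node(String token, int parentIndex) {
            this.token = token.getBytes(StandardCharsets.UTF_8);
            this.parentIndex = parentIndex;
        }
    }

    /** Walks parent pointers from the last node back to the root, then joins tokens left to right. */
    public static String assemble(Node[] nodes, int lastNodeIndex) {
        Deque<byte[]> tokens = new ArrayDeque<>();
        for (int i = lastNodeIndex; i >= 0; i = nodes[i].parentIndex) {
            tokens.push(nodes[i].token); // push puts the parent in front of its child
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] token : tokens) {
            out.write(token, 0, token.length);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /**
     * Binary search over last-node indexes that are sorted by the column name they assemble to.
     * Each probe reassembles a name, which is the cost the class description mentions.
     * Returns the position in sortedLastNodeIndexes, or -1 if the name is absent.
     */
    public static int find(Node[] nodes, int[] sortedLastNodeIndexes, String target) {
        int lo = 0;
        int hi = sortedLastNodeIndexes.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = assemble(nodes, sortedLastNodeIndexes[mid]).compareTo(target);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -1;
    }

    /** Illustrative data: the names "color", "column_a", and "column_b" share the token "col". */
    public static Node[] exampleNodes() {
        return new Node[] {
            new Node("col", -1), // 0: root token
            new Node("umn_", 0), // 1: child of "col"
            new Node("a", 1),    // 2: last node of "column_a"
            new Node("b", 1),    // 3: last node of "column_b"
            new Node("or", 0),   // 4: last node of "color"
        };
    }
}
```

With the example data, `assemble(exampleNodes(), 2)` walks 2, 1, 0 and joins the tokens as "col" + "umn_" + "a". Only the child-to-parent link is stored, which is why a lookup by name has to probe and reassemble candidates rather than descend from the root.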
Modifier and Type | Field and Description |
---|---|
private ArrayList<TokenizerNode> | allNodes |
private PrefixTreeBlockMeta | blockMeta: fields |
private ArrayList<ColumnNodeWriter> | columnNodeWriters |
static int | EXPECTED_NUBS_PLUS_LEAVES |
private ArrayList<TokenizerNode> | leaves |
private ColumnNodeType | nodeType |
private ArrayList<TokenizerNode> | nonLeaves |
private int | numBytes |
private List<Integer> | outputArrayOffsets |
private Tokenizer | tokenizer |
Constructor and Description |
---|
ColumnSectionWriter(): construct |
ColumnSectionWriter(PrefixTreeBlockMeta blockMeta, Tokenizer builder, ColumnNodeType nodeType) |
Modifier and Type | Method and Description |
---|---|
ColumnSectionWriter | compile(): methods |
protected void | compilerInternals() |
ArrayList<ColumnNodeWriter> | getColumnNodeWriters(): get/set |
ArrayList<TokenizerNode> | getLeaves() |
ArrayList<TokenizerNode> | getNonLeaves() |
int | getNumBytes() |
int | getOutputArrayOffset(int sortedIndex) |
void | reconstruct(PrefixTreeBlockMeta blockMeta, Tokenizer builder, ColumnNodeType nodeType) |
void | reset() |
void | writeBytes(OutputStream os) |
public static final int EXPECTED_NUBS_PLUS_LEAVES
private PrefixTreeBlockMeta blockMeta
private ColumnNodeType nodeType
private Tokenizer tokenizer
private int numBytes
private ArrayList<TokenizerNode> nonLeaves
private ArrayList<TokenizerNode> leaves
private ArrayList<TokenizerNode> allNodes
private ArrayList<ColumnNodeWriter> columnNodeWriters
public ColumnSectionWriter()
public ColumnSectionWriter(PrefixTreeBlockMeta blockMeta, Tokenizer builder, ColumnNodeType nodeType)
public void reconstruct(PrefixTreeBlockMeta blockMeta, Tokenizer builder, ColumnNodeType nodeType)
public void reset()
public ColumnSectionWriter compile()
protected void compilerInternals()
public void writeBytes(OutputStream os) throws IOException
Throws: IOException
public ArrayList<ColumnNodeWriter> getColumnNodeWriters()
public int getNumBytes()
public int getOutputArrayOffset(int sortedIndex)
public ArrayList<TokenizerNode> getNonLeaves()
public ArrayList<TokenizerNode> getLeaves()
Copyright © 2007–2019 The Apache Software Foundation. All rights reserved.