public class IntegrationTestBigLinkedList extends IntegrationTestBase
This is an integration test borrowed from goraci, written by Keith Turner, which is in turn inspired by the Accumulo test called continuous ingest (ci). The original source code can be found here:
Apache Accumulo [0] has a simple test suite that verifies that data is not lost at scale. This test suite is called continuous ingest. This test runs many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a map reduce job is run to ensure no linked list has a hole. A hole indicates data was lost.
The nodes in the linked list are random. This causes each linked list to spread across the table. Therefore, if one part of the table loses data, the loss will be detected by references from another part of the table.
The key is that nodes only reference flushed nodes. Therefore a node should never reference a missing node, even if the ingest client is killed at any point in time.
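That invariant can be illustrated with a minimal, in-memory sketch (hypothetical names and types, not the actual Generator implementation): nodes are created in batches, and each new node's prev pointer targets only a node from a batch that has already been flushed, so a crash mid-batch can lose unflushed nodes but never leaves a flushed node pointing at a missing one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/**
 * Sketch of the flushed-node invariant. Each node in batch N references the
 * corresponding node in batch N-1, which was flushed before batch N began.
 */
public class FlushedLinkSketch {
  /** A node: random row key plus a back-reference to a flushed key (or null). */
  static final class Node {
    final long key;
    final Long prev;
    Node(long key, Long prev) { this.key = key; this.prev = prev; }
  }

  /** Build one batch of 'width' nodes, each pointing into the flushed batch. */
  static List<Node> nextBatch(List<Node> flushedBatch, int width, Random rng) {
    List<Node> batch = new ArrayList<>(width);
    for (int i = 0; i < width; i++) {
      Long prev = (flushedBatch == null) ? null : flushedBatch.get(i).key;
      batch.add(new Node(rng.nextLong(), prev));
    }
    return batch;
  }

  public static void main(String[] args) {
    Random rng = new Random(42);
    List<Node> flushed = null;
    for (int b = 0; b < 3; b++) {
      List<Node> batch = nextBatch(flushed, 4, rng);
      // a flush() of 'batch' would happen here, before it becomes referenceable
      flushed = batch;
    }
    // Every reference in the final batch targets a key flushed earlier.
    for (Node n : flushed) {
      System.out.println(n.key + " -> " + n.prev);
    }
  }
}
```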
When running this test suite with Accumulo, a script called the Agitator runs in parallel, randomly and continuously killing server processes. Many data loss bugs were found in Accumulo this way. This test suite can also help find bugs that impact uptime and stability when run for days or weeks.
This test suite consists of the following:
When generating data, it's best to have each map task generate a multiple of 25 million nodes. The reason is that circular linked lists are closed every 25M nodes; generating a total that is not a multiple of 25M leaves some nodes unreferenced, and the loss of an unreferenced node cannot be detected.
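As a back-of-the-envelope check of that arithmetic (plain Java, assuming a width of 1,000,000 and a wrap multiplier of 25, which yield the 25M figure; those default values are an assumption here, not read from the class):

```java
/** Arithmetic check of the 25M rule, under assumed default settings. */
public class WrapMath {
  /** Nodes per closed circular list under the given generator settings. */
  static long loopLength(long width, long wrapMultiplier) {
    return width * wrapMultiplier;
  }

  /** True iff every generated node ends up inside a closed loop. */
  static boolean allNodesWrapped(long nodesPerMapper, long width, long wrap) {
    return nodesPerMapper % loopLength(width, wrap) == 0;
  }

  public static void main(String[] args) {
    System.out.println(loopLength(1_000_000L, 25L));                   // 25000000
    System.out.println(allNodesWrapped(50_000_000L, 1_000_000L, 25L)); // true
    System.out.println(allNodesWrapped(30_000_000L, 1_000_000L, 25L)); // false
  }
}
```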
Generator
- A map only job that generates data. As stated previously, it's best to generate data in multiples of 25M. An option is also available to allow concurrent walkers to select and walk random flushed loops during this phase.

Verify
- A map reduce job that looks for holes. Look at the counts after running. REFERENCED and UNREFERENCED are ok; any UNDEFINED counts are bad. Do not run at the same time as the Generator.

Walker
- A standalone program that starts following a linked list and emits timing info.

Print
- A standalone program that prints nodes in the linked list.

Delete
- A standalone program that deletes a single node.

ex:
./hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList loop 2 1 100000 /temp 1 1000 50 1 0
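The Verify step's counter semantics can be illustrated with a simplified, in-memory model (hypothetical code, not the actual MapReduce job): every row defines its own key and references its prev key; a key that is referenced but never defined is a hole.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory model of the Verify counters over a key -> prev map. */
public class VerifySketch {
  enum Count { REFERENCED, UNREFERENCED, UNDEFINED }

  static Map<Count, Integer> verify(Map<Long, Long> nodes /* key -> prev */) {
    Set<Long> defined = nodes.keySet();
    Set<Long> referenced = new HashSet<>(nodes.values());
    Map<Count, Integer> counts = new EnumMap<>(Count.class);
    for (Count c : Count.values()) counts.put(c, 0);
    for (Long key : defined) {
      counts.merge(referenced.contains(key) ? Count.REFERENCED
                                            : Count.UNREFERENCED, 1, Integer::sum);
    }
    for (Long ref : referenced) {
      if (!defined.contains(ref)) counts.merge(Count.UNDEFINED, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // A closed 3-node loop: REFERENCED=3, UNREFERENCED=0, UNDEFINED=0.
    System.out.println(verify(Map.of(1L, 3L, 2L, 1L, 3L, 2L)));
    // Node 3 lost but node 1 still points at it: UNDEFINED=1, i.e. a hole.
    System.out.println(verify(Map.of(1L, 3L, 2L, 1L)));
  }
}
```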
| Modifier and Type | Class and Description |
| --- | --- |
| (package private) static class | IntegrationTestBigLinkedList.CINode |
| private static class | IntegrationTestBigLinkedList.Clean |
| private static class | IntegrationTestBigLinkedList.Delete: A standalone program that deletes a single node. |
| (package private) static class | IntegrationTestBigLinkedList.Generator: A Map only job that generates random linked lists and stores them. |
| (package private) static class | IntegrationTestBigLinkedList.Loop: Executes Generate and Verify in a loop. |
| private static class | IntegrationTestBigLinkedList.Print: A standalone program that prints out portions of a list created by IntegrationTestBigLinkedList.Generator. |
| (package private) static class | IntegrationTestBigLinkedList.Search: Tool to search for missing rows in WALs and hfiles. |
| (package private) static class | IntegrationTestBigLinkedList.Verify: A Map Reduce job that verifies that the linked lists generated by IntegrationTestBigLinkedList.Generator do not have any holes. |
| private static class | IntegrationTestBigLinkedList.Walker: A standalone program that follows a linked list created by IntegrationTestBigLinkedList.Generator and prints timing info. |
| (package private) static class | IntegrationTestBigLinkedList.WalkerBase |
| Modifier and Type | Field and Description |
| --- | --- |
| private static byte[] | BIG_FAMILY_NAME |
| protected static byte[] | COLUMN_CLIENT |
| protected static byte[] | COLUMN_COUNT |
| protected static byte[] | COLUMN_PREV |
| private static int | CONCURRENT_WALKER_DEFAULT |
| private static String | CONCURRENT_WALKER_KEY |
| protected static String | DEFAULT_TABLE_NAME |
| protected static byte[] | FAMILY_NAME |
| private static String | GENERATOR_NUM_MAPPERS_KEY |
| private static String | GENERATOR_NUM_ROWS_PER_MAP_KEY: How many rows to write per map task. |
| private static String | GENERATOR_WIDTH_KEY |
| private static String | GENERATOR_WRAP_KEY |
| private static int | MISSING_ROWS_TO_LOG |
| protected static byte[] | NO_KEY |
| protected int | NUM_SLAVES_BASE |
| protected String[] | otherArgs |
| private static int | ROWKEY_LENGTH |
| protected static String | TABLE_NAME_KEY |
| private static byte[] | TINY_FAMILY_NAME |
| protected String | toRun |
| private static int | WIDTH_DEFAULT |
| private static int | WRAP_DEFAULT: The 'wrap multiplier' default. |
CHAOS_MONKEY_PROPS, monkey, MONKEY_LONG_OPT, monkeyProps, monkeyToUse, NO_CLUSTER_CLEANUP_LONG_OPT, noClusterCleanUp, util
| Constructor and Description |
| --- |
| IntegrationTestBigLinkedList() |
| Modifier and Type | Method and Description |
| --- | --- |
| void | cleanUpCluster() |
| private static IntegrationTestBigLinkedList.CINode | getCINode(org.apache.hadoop.hbase.client.Result result, IntegrationTestBigLinkedList.CINode node) |
| protected Set<String> | getColumnFamilies(): Provides the names of the CFs that are protected from random Chaos Monkey activity (alter). |
| org.apache.hadoop.hbase.TableName | getTablename(): Provides the name of the table that is protected from random Chaos Monkey activity. |
| (package private) static org.apache.hadoop.hbase.TableName | getTableName(org.apache.hadoop.conf.Configuration conf) |
| private static boolean | isMultiUnevenColumnFamilies(org.apache.hadoop.conf.Configuration conf) |
| static void | main(String[] args) |
| private void | printCommands() |
| protected void | processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine cmd) |
| int | runTestFromCommandLine() |
| private static void | setJobConf(org.apache.hadoop.mapreduce.Job job, int numMappers, long numNodes, Integer width, Integer wrapMultiplier, Integer numWalkers) |
| static void | setJobScannerConf(org.apache.hadoop.mapreduce.Job job) |
| void | setUpCluster() |
| void | testContinuousIngest() |
| private void | usage() |
addOptions, cleanUp, cleanUpMonkey, cleanUpMonkey, doWork, getConf, getDefaultMonkeyFactory, getTestingUtil, loadMonkeyProperties, processBaseOptions, setUp, setUpMonkey, startMonkey
addOption, addOptNoArg, addOptNoArg, addOptWithArg, addOptWithArg, addRequiredOption, addRequiredOptWithArg, addRequiredOptWithArg, doStaticMain, getOptionAsDouble, getOptionAsInt, getOptionAsInt, getOptionAsLong, getOptionAsLong, newParser, parseArgs, parseInt, parseLong, printUsage, printUsage, processOldArgs, run, setConf
protected static final byte[] NO_KEY
protected static String TABLE_NAME_KEY
protected static String DEFAULT_TABLE_NAME
protected static byte[] FAMILY_NAME
private static byte[] BIG_FAMILY_NAME
private static byte[] TINY_FAMILY_NAME
protected static final byte[] COLUMN_PREV
protected static final byte[] COLUMN_CLIENT
protected static final byte[] COLUMN_COUNT
private static final String GENERATOR_NUM_ROWS_PER_MAP_KEY
private static final String GENERATOR_NUM_MAPPERS_KEY
private static final String GENERATOR_WIDTH_KEY
private static final String GENERATOR_WRAP_KEY
private static final String CONCURRENT_WALKER_KEY
protected int NUM_SLAVES_BASE
private static final int MISSING_ROWS_TO_LOG
private static final int WIDTH_DEFAULT
private static final int WRAP_DEFAULT
private static final int ROWKEY_LENGTH
private static final int CONCURRENT_WALKER_DEFAULT
public IntegrationTestBigLinkedList()
static org.apache.hadoop.hbase.TableName getTableName(org.apache.hadoop.conf.Configuration conf)
private static IntegrationTestBigLinkedList.CINode getCINode(org.apache.hadoop.hbase.client.Result result, IntegrationTestBigLinkedList.CINode node)
public void setUpCluster() throws Exception
Specified by: setUpCluster in class IntegrationTestBase
Throws: Exception
public void cleanUpCluster() throws Exception
Specified by: cleanUpCluster in class IntegrationTestBase
Throws: Exception
private static boolean isMultiUnevenColumnFamilies(org.apache.hadoop.conf.Configuration conf)
public void testContinuousIngest() throws IOException, Exception
Throws: IOException, Exception
private void usage()
private void printCommands()
protected void processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine cmd)
Overrides: processOptions in class IntegrationTestBase
public int runTestFromCommandLine() throws Exception
Specified by: runTestFromCommandLine in class IntegrationTestBase
Throws: Exception
public org.apache.hadoop.hbase.TableName getTablename()
Description copied from class: IntegrationTestBase
Specified by: getTablename in class IntegrationTestBase
protected Set<String> getColumnFamilies()
Description copied from class: IntegrationTestBase
Specified by: getColumnFamilies in class IntegrationTestBase
private static void setJobConf(org.apache.hadoop.mapreduce.Job job, int numMappers, long numNodes, Integer width, Integer wrapMultiplier, Integer numWalkers)
public static void setJobScannerConf(org.apache.hadoop.mapreduce.Job job)
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.