public class IntegrationTestBigLinkedList extends IntegrationTestBase
This is an integration test borrowed from goraci, written by Keith Turner, which is in turn inspired by the Accumulo test called continuous ingest (ci). The original source code can be found in Keith Turner's goraci repository.
Apache Accumulo [0] has a simple test suite that verifies that data is not lost at scale. This test suite is called continuous ingest. This test runs many ingest clients that continually create linked lists containing 25 million nodes. At some point the clients are stopped and a map reduce job is run to ensure no linked list has a hole. A hole indicates data was lost.
The nodes in the linked list are random. This causes each linked list to spread across the table. Therefore, if one part of the table loses data, the loss will be detected by references in another part of the table.
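A toy version of that reference check may make the counts clearer. This is not the real MapReduce Verify job (which scans the HBase table); it is a small in-memory sketch, assuming nodes are modeled as key-to-prev pairs: every "prev" reference is looked up, a reference whose target row does not exist is counted as UNDEFINED (a hole), and nodes that nothing points at are UNREFERENCED.

```java
import java.util.*;

// In-memory sketch of the hole check. nodes maps a row key to the key of
// the node it references ("prev"); null marks a node with no reference.
public class HoleCheckSketch {
    static Map<String, Long> classify(Map<Long, Long> nodes) {
        long referenced = 0, unreferenced = 0, undefined = 0;
        Set<Long> targets = new HashSet<>();
        for (Long prev : nodes.values()) {
            if (prev == null) continue;
            targets.add(prev);
            if (nodes.containsKey(prev)) referenced++;
            else undefined++;                // a hole: the pointed-to row is gone
        }
        for (Long key : nodes.keySet()) {
            if (!targets.contains(key)) unreferenced++;
        }
        return Map.of("REFERENCED", referenced,
                      "UNREFERENCED", unreferenced,
                      "UNDEFINED", undefined);
    }

    public static void main(String[] args) {
        Map<Long, Long> nodes = new HashMap<>();
        nodes.put(1L, 3L);                   // 1 -> 3
        nodes.put(2L, 1L);                   // 2 -> 1
        nodes.put(3L, 2L);                   // 3 -> 2: a closed 3-node circle
        System.out.println(classify(nodes)); // no UNDEFINED
        nodes.remove(2L);                    // simulate a lost row
        System.out.println(classify(nodes)); // node 2 is now UNDEFINED
    }
}
```

Losing any single node of a closed circle produces exactly one UNDEFINED count, which is why the verify phase treats any UNDEFINED as data loss.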
The key is that nodes only reference flushed nodes. Therefore a node should never reference a missing node, even if the ingest client is killed at any point in time.
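The flushed-reference rule can be simulated with a minimal sketch. This is not the actual Generator code; it assumes a simplified model in which every node in a new batch references the last key of the previous flushed batch, and a crash discards only the unflushed buffer.

```java
import java.util.*;

// Sketch of the invariant: a node's "prev" only ever names an
// already-flushed key, so killing the client at any moment can lose
// unreferenced tail nodes but never leaves a dangling reference.
public class FlushedRefSketch {
    // Simulates ingesting 'total' nodes flushed in batches of 'batchSize',
    // then a crash that discards the unflushed batch. Returns true if any
    // surviving reference fails to resolve (an UNDEFINED node).
    static boolean hasDanglingRef(int total, int batchSize) {
        Map<Long, Long> store = new HashMap<>();        // flushed, durable nodes
        Map<Long, Long> buffer = new LinkedHashMap<>(); // unflushed batch
        long lastFlushedKey = -1;                       // -1 stands in for "no prev"
        Random rnd = new Random(42);
        for (int i = 0; i < total; i++) {
            long key = rnd.nextLong();
            buffer.put(key, lastFlushedKey);            // reference flushed data only
            if (buffer.size() == batchSize) {
                store.putAll(buffer);                   // flush the batch
                lastFlushedKey = key;                   // now safe to reference
                buffer.clear();
            }
        }
        // Crash here: 'buffer' is lost. Every surviving prev must resolve.
        return store.entrySet().stream().anyMatch(
            e -> e.getValue() != -1 && !store.containsKey(e.getValue()));
    }

    public static void main(String[] args) {
        System.out.println(hasDanglingRef(10, 4)); // false: no dangling refs
    }
}
```

However the loop is cut off, the flushed portion of the table never points at a missing node, which is exactly what lets the verify phase interpret any UNDEFINED count as real data loss rather than an ingest-client crash.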
When running this test suite with Accumulo, a script called the Agitator runs in parallel, randomly and continuously killing server processes. Many data loss bugs were found in Accumulo by doing this. This test suite can also help find bugs that impact uptime and stability when run for days or weeks.
This test suite consists of the following:
When generating data, it's best to have each map task generate a multiple of 25 million nodes. The reason for this is that circular linked lists are generated every 25M nodes. Not generating a multiple of 25M will leave some nodes in the linked list without references, and the loss of an unreferenced node cannot be detected.
Generator - A map only job that generates data. As stated previously, it's best to generate data in multiples of 25M. An option is also available to allow concurrent walkers to select and walk random flushed loops during this phase.

Verify - A map reduce job that looks for holes. Look at the counts after running. REFERENCED and UNREFERENCED are ok; any UNDEFINED counts are bad. Do not run at the same time as the Generator.

Walker - A standalone program that starts following a linked list and emits timing info.

Print - A standalone program that prints nodes in the linked list.

Delete - A standalone program that deletes a single node.

ex:
./hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList
loop 2 1 100000 /temp 1 1000 50 1 0
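The 25M-multiple rule above can be sanity-checked with a scaled-down model (a circle size of 4 instead of 25M). The chain direction and the close-the-circle step here are assumptions drawn from the description above: within a chain each node points at the previous one, and when a chain reaches the circle size its head is re-pointed at the tail, closing the circle.

```java
// Counts nodes that nothing references after generating 'total' nodes in
// circles of 'circleSize'. A total that is a multiple of the circle size
// leaves zero unreferenced nodes; any remainder leaves exactly one
// unreferenced node (the tail of the unclosed chain), whose loss would
// be undetectable.
public class WrapSketch {
    static int unreferenced(int total, int circleSize) {
        boolean[] referenced = new boolean[total];
        for (int i = 1; i < total; i++) {
            if (i % circleSize != 0) referenced[i - 1] = true; // node i -> node i-1
        }
        // Close every complete circle: the head re-points at the tail.
        for (int s = 0; s + circleSize <= total; s += circleSize) {
            referenced[s + circleSize - 1] = true;
        }
        int count = 0;
        for (boolean r : referenced) if (!r) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println(unreferenced(8, 4));  // 0: two closed circles
        System.out.println(unreferenced(10, 4)); // 1: tail of the partial chain
    }
}
```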
| Modifier and Type | Class and Description |
|---|---|
| (package private) static class | IntegrationTestBigLinkedList.CINode |
| private static class | IntegrationTestBigLinkedList.Clean |
| private static class | IntegrationTestBigLinkedList.Delete - A stand alone program that deletes a single node. |
| (package private) static class | IntegrationTestBigLinkedList.Generator - A Map only job that generates random linked lists and stores them. |
| (package private) static class | IntegrationTestBigLinkedList.Loop - Executes Generate and Verify in a loop. |
| private static class | IntegrationTestBigLinkedList.Print - A stand alone program that prints out portions of a list created by IntegrationTestBigLinkedList.Generator. |
| (package private) static class | IntegrationTestBigLinkedList.Search - Tool to search missing rows in WALs and hfiles. |
| (package private) static class | IntegrationTestBigLinkedList.Verify - A Map Reduce job that verifies that the linked lists generated by IntegrationTestBigLinkedList.Generator do not have any holes. |
| private static class | IntegrationTestBigLinkedList.Walker - A stand alone program that follows a linked list created by IntegrationTestBigLinkedList.Generator and prints timing info. |
| (package private) static class | IntegrationTestBigLinkedList.WalkerBase |
| Modifier and Type | Field and Description |
|---|---|
| private static byte[] | BIG_FAMILY_NAME |
| protected static byte[] | COLUMN_CLIENT |
| protected static byte[] | COLUMN_COUNT |
| protected static byte[] | COLUMN_PREV |
| private static int | CONCURRENT_WALKER_DEFAULT |
| private static String | CONCURRENT_WALKER_KEY |
| protected static String | DEFAULT_TABLE_NAME |
| protected static byte[] | FAMILY_NAME |
| private static String | GENERATOR_NUM_MAPPERS_KEY |
| private static String | GENERATOR_NUM_ROWS_PER_MAP_KEY - How many rows to write per map task. |
| private static String | GENERATOR_WIDTH_KEY |
| private static String | GENERATOR_WRAP_KEY |
| private static int | MISSING_ROWS_TO_LOG |
| protected static byte[] | NO_KEY |
| protected int | NUM_SLAVES_BASE |
| protected String[] | otherArgs |
| private static int | ROWKEY_LENGTH |
| protected static String | TABLE_NAME_KEY |
| private static byte[] | TINY_FAMILY_NAME |
| protected String | toRun |
| private static int | WIDTH_DEFAULT |
| private static int | WRAP_DEFAULT - The 'wrap multiplier' default. |

Fields inherited from class IntegrationTestBase:
CHAOS_MONKEY_PROPS, monkey, MONKEY_LONG_OPT, monkeyProps, monkeyToUse, NO_CLUSTER_CLEANUP_LONG_OPT, noClusterCleanUp, util

| Constructor and Description |
|---|
| IntegrationTestBigLinkedList() |
| Modifier and Type | Method and Description |
|---|---|
| void | cleanUpCluster() |
| private static IntegrationTestBigLinkedList.CINode | getCINode(org.apache.hadoop.hbase.client.Result result, IntegrationTestBigLinkedList.CINode node) |
| protected Set&lt;String&gt; | getColumnFamilies() - Provides the name of the CFs that are protected from random Chaos monkey activity (alter) |
| org.apache.hadoop.hbase.TableName | getTablename() - Provides the name of the table that is protected from random Chaos monkey activity |
| (package private) static org.apache.hadoop.hbase.TableName | getTableName(org.apache.hadoop.conf.Configuration conf) |
| private static boolean | isMultiUnevenColumnFamilies(org.apache.hadoop.conf.Configuration conf) |
| static void | main(String[] args) |
| private void | printCommands() |
| protected void | processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine cmd) |
| int | runTestFromCommandLine() |
| private static void | setJobConf(org.apache.hadoop.mapreduce.Job job, int numMappers, long numNodes, Integer width, Integer wrapMultiplier, Integer numWalkers) |
| static void | setJobScannerConf(org.apache.hadoop.mapreduce.Job job) |
| void | setUpCluster() |
| void | testContinuousIngest() |
| private void | usage() |

Methods inherited from class IntegrationTestBase:
addOptions, cleanUp, cleanUpMonkey, cleanUpMonkey, doWork, getConf, getDefaultMonkeyFactory, getTestingUtil, loadMonkeyProperties, processBaseOptions, setUp, setUpMonkey, startMonkey

Methods inherited from class AbstractHBaseTool:
addOption, addOptNoArg, addOptNoArg, addOptWithArg, addOptWithArg, addRequiredOption, addRequiredOptWithArg, addRequiredOptWithArg, doStaticMain, getOptionAsDouble, getOptionAsInt, getOptionAsInt, getOptionAsLong, getOptionAsLong, newParser, parseArgs, parseInt, parseLong, printUsage, printUsage, processOldArgs, run, setConf

protected static final byte[] NO_KEY
protected static String TABLE_NAME_KEY
protected static String DEFAULT_TABLE_NAME
protected static byte[] FAMILY_NAME
private static byte[] BIG_FAMILY_NAME
private static byte[] TINY_FAMILY_NAME
protected static final byte[] COLUMN_PREV
protected static final byte[] COLUMN_CLIENT
protected static final byte[] COLUMN_COUNT
private static final String GENERATOR_NUM_ROWS_PER_MAP_KEY
private static final String GENERATOR_NUM_MAPPERS_KEY
private static final String GENERATOR_WIDTH_KEY
private static final String GENERATOR_WRAP_KEY
private static final String CONCURRENT_WALKER_KEY
protected int NUM_SLAVES_BASE
private static final int MISSING_ROWS_TO_LOG
private static final int WIDTH_DEFAULT
private static final int WRAP_DEFAULT
private static final int ROWKEY_LENGTH
private static final int CONCURRENT_WALKER_DEFAULT
public IntegrationTestBigLinkedList()
static org.apache.hadoop.hbase.TableName getTableName(org.apache.hadoop.conf.Configuration conf)
private static IntegrationTestBigLinkedList.CINode getCINode(org.apache.hadoop.hbase.client.Result result, IntegrationTestBigLinkedList.CINode node)
public void setUpCluster() throws Exception
Overrides:
setUpCluster in class IntegrationTestBase
Throws:
Exception

public void cleanUpCluster() throws Exception
Overrides:
cleanUpCluster in class IntegrationTestBase
Throws:
Exception

private static boolean isMultiUnevenColumnFamilies(org.apache.hadoop.conf.Configuration conf)

public void testContinuousIngest() throws IOException, Exception
Throws:
IOException
Exception

private void usage()

private void printCommands()

protected void processOptions(org.apache.hbase.thirdparty.org.apache.commons.cli.CommandLine cmd)
Overrides:
processOptions in class IntegrationTestBase

public int runTestFromCommandLine() throws Exception
Overrides:
runTestFromCommandLine in class IntegrationTestBase
Throws:
Exception

public org.apache.hadoop.hbase.TableName getTablename()
Provides the name of the table that is protected from random Chaos monkey activity
Specified by:
getTablename in class IntegrationTestBase

protected Set&lt;String&gt; getColumnFamilies()
Provides the name of the CFs that are protected from random Chaos monkey activity (alter)
Specified by:
getColumnFamilies in class IntegrationTestBase

private static void setJobConf(org.apache.hadoop.mapreduce.Job job, int numMappers, long numNodes, Integer width, Integer wrapMultiplier, Integer numWalkers)
public static void setJobScannerConf(org.apache.hadoop.mapreduce.Job job)
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.