public class TestFanOutOneBlockAsyncDFSOutputHang extends AsyncFSTestBase
This test lives in its own class rather than in TestFanOutOneBlockAsyncDFSOutput because we send heartbeats to the DN whenever there is no outgoing packet. In TestFanOutOneBlockAsyncDFSOutput the timeout that controls this is TestFanOutOneBlockAsyncDFSOutput.READ_TIMEOUT_MS, which is 2 seconds. With that setting the output would keep sending packets, the DN would respond immediately, and those responses would interfere with the testing handler we add. This test class therefore uses the default timeout of 60 seconds, which is enough for this test.

| Modifier and Type | Field and Description |
|---|---|
| private static Class<? extends org.apache.hbase.thirdparty.io.netty.channel.Channel> | CHANNEL_CLASS |
| static HBaseClassTestRule | CLASS_RULE |
| private static org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup | EVENT_LOOP_GROUP |
| private static org.apache.hadoop.hdfs.DistributedFileSystem | FS |
| private static org.slf4j.Logger | LOG |
| private static org.apache.hadoop.hbase.io.asyncfs.monitor.StreamSlowMonitor | MONITOR |
| org.junit.rules.TestName | name |
| private static org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput | OUT |
Fields inherited from class AsyncFSTestBase: CLUSTER, CLUSTER_TEST_DIR, UTIL

| Constructor and Description |
|---|
| TestFanOutOneBlockAsyncDFSOutputHang() |
| Modifier and Type | Method and Description |
|---|---|
| private static org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties | findAndKillFirstDataNode(org.apache.hadoop.hdfs.protocol.DatanodeInfo firstDatanodeInfo) |
| static void | setUp() |
| static void | tearDown() |
| void | testFlushHangWhenOneDataNodeFailedBeforeOtherDataNodeAck(): This test is for HBASE-26679. |
Methods inherited from class AsyncFSTestBase: setupClusterTestDir, shutdownMiniDFSCluster, startMiniDFSCluster

public static final HBaseClassTestRule CLASS_RULE
private static final org.slf4j.Logger LOG
private static org.apache.hadoop.hdfs.DistributedFileSystem FS
private static org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup EVENT_LOOP_GROUP
private static Class<? extends org.apache.hbase.thirdparty.io.netty.channel.Channel> CHANNEL_CLASS
private static org.apache.hadoop.hbase.io.asyncfs.monitor.StreamSlowMonitor MONITOR
private static org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput OUT
public org.junit.rules.TestName name
public TestFanOutOneBlockAsyncDFSOutputHang()
public void testFlushHangWhenOneDataNodeFailedBeforeOtherDataNodeAck() throws Exception
This test is for HBASE-26679. Consider two DataNodes, dn1 and dn2, where dn2 is slow. Before HBASE-26679, the thread sequence is:

1. We write some data to FanOutOneBlockAsyncDFSOutput and then flush it, so there is one FanOutOneBlockAsyncDFSOutput.Callback in FanOutOneBlockAsyncDFSOutput.waitingAckQueue.
2. The ack from dn1 arrives first and triggers Netty to invoke FanOutOneBlockAsyncDFSOutput.completed(org.apache.hbase.thirdparty.io.netty.channel.Channel) with dn1's channel. In that method, dn1's channel is removed from FanOutOneBlockAsyncDFSOutput.Callback.unfinishedReplicas.
3. dn2 responds slowly. Before dn2 sends its ack, dn1 is shut down or hits an exception, so Netty triggers FanOutOneBlockAsyncDFSOutput.failed(org.apache.hbase.thirdparty.io.netty.channel.Channel, java.util.function.Supplier<java.lang.Throwable>) with dn1's channel. Because FanOutOneBlockAsyncDFSOutput.Callback.unfinishedReplicas no longer contains dn1's channel, the FanOutOneBlockAsyncDFSOutput.Callback is skipped in failed, FanOutOneBlockAsyncDFSOutput.state is set to FanOutOneBlockAsyncDFSOutput.State.BROKEN, and both dn1 and dn2 are closed at the end of failed.
4. failed is triggered again by dn2 because its channel was closed, but because FanOutOneBlockAsyncDFSOutput.state is already FanOutOneBlockAsyncDFSOutput.State.BROKEN, the whole of failed is skipped.

As a result, waiting on the future returned by FanOutOneBlockAsyncDFSOutput.flush(boolean) hangs forever. After HBASE-26679, in step 4 above, even if FanOutOneBlockAsyncDFSOutput.state is already FanOutOneBlockAsyncDFSOutput.State.BROKEN, we still try to complete FanOutOneBlockAsyncDFSOutput.Callback.future.
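The four steps above can be replayed in a small, single-threaded model. This is a hypothetical sketch, not HBase code: AckRaceModel, its withFix flag, the State.STREAMING value, and the use of plain strings as channel names are all invented for illustration. The real FanOutOneBlockAsyncDFSOutput works on Netty channels with a Supplier<Throwable> and runs on event-loop threads. The model also assumes that failed fails the first callback still waiting on the failing channel together with every callback queued after it.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical single-threaded model of the HBASE-26679 race; not HBase code. */
final class AckRaceModel {
  enum State { STREAMING, BROKEN }

  /** Stand-in for FanOutOneBlockAsyncDFSOutput.Callback: a future plus replicas yet to ack. */
  static final class Callback {
    final CompletableFuture<Void> future = new CompletableFuture<>();
    final Set<String> unfinishedReplicas;

    Callback(List<String> replicas) {
      this.unfinishedReplicas = new HashSet<>(replicas);
    }
  }

  private final List<String> replicas;
  private final boolean withFix; // true models the behavior after HBASE-26679
  private final Deque<Callback> waitingAckQueue = new ArrayDeque<>();
  private final Set<String> closedChannels = new HashSet<>();
  private State state = State.STREAMING;

  AckRaceModel(List<String> replicas, boolean withFix) {
    this.replicas = List.copyOf(replicas);
    this.withFix = withFix;
  }

  /** Step 1: flush; the returned future completes once every replica has acked. */
  CompletableFuture<Void> flush() {
    Callback c = new Callback(replicas);
    waitingAckQueue.addLast(c);
    return c.future;
  }

  /** Step 2: an ack arrives on one channel, so that replica is no longer unfinished. */
  void completed(String channel) {
    Callback c = waitingAckQueue.peekFirst();
    if (c == null) {
      return;
    }
    c.unfinishedReplicas.remove(channel);
    if (c.unfinishedReplicas.isEmpty()) {
      waitingAckQueue.removeFirst();
      c.future.complete(null);
    }
  }

  /** Steps 3 and 4: a channel reports an error or goes inactive after being closed. */
  void failed(String channel, Throwable error) {
    if (state == State.BROKEN) {
      // Without the fix the whole method is skipped here (step 4), so nothing
      // ever completes the pending future. With the fix we still fail it.
      if (withFix) {
        failWaitingAcks(channel, error);
      }
      return;
    }
    state = State.BROKEN;
    failWaitingAcks(channel, error);
    closedChannels.addAll(replicas); // every channel is closed at the end of failed
  }

  /** Fails the first callback still waiting on channel and every callback queued after it. */
  private void failWaitingAcks(String channel, Throwable error) {
    boolean failing = false;
    for (Callback c : waitingAckQueue) {
      if (!failing && !c.unfinishedReplicas.contains(channel)) {
        continue; // step 3: dn1 already acked, so its failure skips this callback
      }
      failing = true;
      c.future.completeExceptionally(error);
    }
  }

  Set<String> closedChannels() {
    return closedChannels;
  }
}
```

In the real class, closing dn2's channel makes Netty call failed again on its own; in the model the caller plays step 4 by invoking failed("dn2", e) by hand. With withFix set to false, the sequence flush(), completed("dn1"), failed("dn1", e), failed("dn2", e) leaves the flush future incomplete forever, which is the hang this test guards against. With withFix set to true, the same sequence completes the future exceptionally.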
Throws:
Exception

private static org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties findAndKillFirstDataNode(org.apache.hadoop.hdfs.protocol.DatanodeInfo firstDatanodeInfo)
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.