public class TestFanOutOneBlockAsyncDFSOutputHang extends AsyncFSTestBase
This test is kept separate from TestFanOutOneBlockAsyncDFSOutput because the output stream sends a heartbeat to the DataNode (DN) whenever there is no outgoing packet. In that class the timeout is controlled by TestFanOutOneBlockAsyncDFSOutput.READ_TIMEOUT_MS, which is 2 seconds, so the client would keep sending packets, the DN would respond immediately, and those responses would interfere with the testing handler that this test adds. This test class therefore uses the default timeout of 60 seconds, which is enough for this test.
Modifier and Type | Field and Description |
---|---|
private static Class<? extends org.apache.hbase.thirdparty.io.netty.channel.Channel> | CHANNEL_CLASS |
static HBaseClassTestRule | CLASS_RULE |
private static org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup | EVENT_LOOP_GROUP |
private static org.apache.hadoop.hdfs.DistributedFileSystem | FS |
private static org.slf4j.Logger | LOG |
private static org.apache.hadoop.hbase.io.asyncfs.monitor.StreamSlowMonitor | MONITOR |
org.junit.rules.TestName | name |
private static org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput | OUT |
Fields inherited from class AsyncFSTestBase: CLUSTER, CLUSTER_TEST_DIR, UTIL
Constructor and Description |
---|
TestFanOutOneBlockAsyncDFSOutputHang() |
Modifier and Type | Method and Description |
---|---|
private static org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties | findAndKillFirstDataNode(org.apache.hadoop.hdfs.protocol.DatanodeInfo firstDatanodeInfo) |
static void | setUp() |
static void | tearDown() |
void | testFlushHangWhenOneDataNodeFailedBeforeOtherDataNodeAck() This test is for HBASE-26679. |
Methods inherited from class AsyncFSTestBase: setupClusterTestDir, shutdownMiniDFSCluster, startMiniDFSCluster
public static final HBaseClassTestRule CLASS_RULE
private static final org.slf4j.Logger LOG
private static org.apache.hadoop.hdfs.DistributedFileSystem FS
private static org.apache.hbase.thirdparty.io.netty.channel.EventLoopGroup EVENT_LOOP_GROUP
private static Class<? extends org.apache.hbase.thirdparty.io.netty.channel.Channel> CHANNEL_CLASS
private static org.apache.hadoop.hbase.io.asyncfs.monitor.StreamSlowMonitor MONITOR
private static org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput OUT
public org.junit.rules.TestName name
public TestFanOutOneBlockAsyncDFSOutputHang()
public void testFlushHangWhenOneDataNodeFailedBeforeOtherDataNodeAck() throws Exception
This test is for HBASE-26679. Consider two DataNodes, dn1 and dn2, where dn2 is a slow DN. Before HBASE-26679, the sequence of events was:
1. We write some data to FanOutOneBlockAsyncDFSOutput and then flush it, so there is one FanOutOneBlockAsyncDFSOutput.Callback in FanOutOneBlockAsyncDFSOutput.waitingAckQueue.
2. The ack from dn1 arrives first and triggers Netty to invoke FanOutOneBlockAsyncDFSOutput.completed(org.apache.hbase.thirdparty.io.netty.channel.Channel) with dn1's channel. In that method, dn1's channel is removed from FanOutOneBlockAsyncDFSOutput.Callback.unfinishedReplicas.
3. dn2 responds slowly, and before dn2 sends its ack, dn1 is shut down or hits an exception, so Netty triggers FanOutOneBlockAsyncDFSOutput.failed(org.apache.hbase.thirdparty.io.netty.channel.Channel, java.util.function.Supplier<java.lang.Throwable>) with dn1's channel. Because FanOutOneBlockAsyncDFSOutput.Callback.unfinishedReplicas no longer contains dn1's channel, the FanOutOneBlockAsyncDFSOutput.Callback is skipped in failed(Channel, Supplier<Throwable>), FanOutOneBlockAsyncDFSOutput.state is set to FanOutOneBlockAsyncDFSOutput.State.BROKEN, and dn1 and dn2 are both closed at the end of failed(Channel, Supplier<Throwable>).
4. failed(Channel, Supplier<Throwable>) is triggered again, this time for dn2 because it was closed, but because FanOutOneBlockAsyncDFSOutput.state is already FanOutOneBlockAsyncDFSOutput.State.BROKEN, the whole method is skipped.
As a result, waiting on the future returned by FanOutOneBlockAsyncDFSOutput.flush(boolean) would hang forever. After HBASE-26679, in step 4, even if FanOutOneBlockAsyncDFSOutput.state is already FanOutOneBlockAsyncDFSOutput.State.BROKEN, we still try to trigger FanOutOneBlockAsyncDFSOutput.Callback.future.
Throws: Exception
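The four-step sequence described for this test can be illustrated with a small standalone model. This is a simplified sketch, not the HBase implementation: the class name FlushHangSketch, the withFix flag, the simulate helper, and the use of plain strings in place of Netty channels are all assumptions made for illustration. It only mirrors the bookkeeping described above (waitingAckQueue, unfinishedReplicas, and the BROKEN state).

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical, simplified model of the ack bookkeeping; not the real HBase class. */
public class FlushHangSketch {
  enum State { STREAMING, BROKEN }

  /** One pending flush; it completes once every replica has acked. */
  static final class Callback {
    final CompletableFuture<Long> future = new CompletableFuture<>();
    final Set<String> unfinishedReplicas = new HashSet<>();

    Callback(String... replicas) {
      unfinishedReplicas.addAll(Arrays.asList(replicas));
    }
  }

  private final Deque<Callback> waitingAckQueue = new ArrayDeque<>();
  private final boolean withFix; // assumption: toggles the behavior described for HBASE-26679
  private State state = State.STREAMING;

  FlushHangSketch(boolean withFix) {
    this.withFix = withFix;
  }

  /** Step 1: a flush enqueues one callback that waits for all replicas. */
  CompletableFuture<Long> flush(String... replicas) {
    Callback c = new Callback(replicas);
    waitingAckQueue.add(c);
    return c.future;
  }

  /** Step 2: an ack removes the channel from the head callback's unfinished set. */
  void completed(String channel) {
    Callback c = waitingAckQueue.peekFirst();
    if (c == null) {
      return;
    }
    c.unfinishedReplicas.remove(channel);
    if (c.unfinishedReplicas.isEmpty()) {
      waitingAckQueue.removeFirst();
      c.future.complete(0L);
    }
  }

  /** Steps 3 and 4: a channel failure. */
  void failed(String channel, Throwable error) {
    if (state == State.BROKEN) {
      // Without the fix the whole method is skipped here, which causes the hang.
      if (withFix) {
        failWaitingAckQueue(channel, error);
      }
      return;
    }
    state = State.BROKEN;
    failWaitingAckQueue(channel, error);
    // The real output stream would close all channels at this point; the
    // resulting second failed() call is issued explicitly in simulate().
  }

  /** Fails only the callbacks that are still waiting on the given channel. */
  private void failWaitingAckQueue(String channel, Throwable error) {
    for (Iterator<Callback> it = waitingAckQueue.iterator(); it.hasNext();) {
      Callback c = it.next();
      if (c.unfinishedReplicas.contains(channel)) {
        it.remove();
        c.future.completeExceptionally(error);
      }
    }
  }

  /** Replays the four steps and returns the flush future. */
  public static CompletableFuture<Long> simulate(boolean withFix) {
    FlushHangSketch out = new FlushHangSketch(withFix);
    CompletableFuture<Long> future = out.flush("dn1", "dn2");
    out.completed("dn1");                               // dn1 acks first
    out.failed("dn1", new IOException("dn1 is down"));  // dn1 fails before dn2 acks
    out.failed("dn2", new IOException("dn2 closed"));   // dn2's channel was closed
    return future;
  }

  public static void main(String[] args) {
    System.out.println("before fix, future done: " + simulate(false).isDone());
    System.out.println("after fix, future done: " + simulate(true).isDone());
  }
}
```

In this model, the run without the fix leaves the flush future incomplete, which corresponds to the hang, while the run with the fix completes the future exceptionally when dn2's failure is processed.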
private static org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties findAndKillFirstDataNode(org.apache.hadoop.hdfs.protocol.DatanodeInfo firstDatanodeInfo)
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.