Class TestFanOutOneBlockAsyncDFSOutputHang


Test case for HBASE-26679. It lives in a separate test class rather than in TestFanOutOneBlockAsyncDFSOutput because the output sends a heartbeat to the DN when there is no outgoing packet, and the timeout is controlled by TestFanOutOneBlockAsyncDFSOutput.READ_TIMEOUT_MS, which is 2 seconds. With that setting the client keeps sending packets, the DN responds immediately, and those responses interfere with the testing handler added by this test. This test class therefore uses the default timeout of 60 seconds, which is long enough for this test.
  • Field Details


    • CLASS_RULE

      public static final HBaseClassTestRule CLASS_RULE
    • LOG

      private static final org.slf4j.Logger LOG
    • FS

      private static org.apache.hadoop.hdfs.DistributedFileSystem FS

    • EVENT_LOOP_GROUP

      private static EVENT_LOOP_GROUP
    • CHANNEL_CLASS

      private static Class<? extends> CHANNEL_CLASS
    • MONITOR

      private static MONITOR
    • OUT

      private static OUT
    • name

      public org.junit.rules.TestName name
  • Constructor Details

  • Method Details

    • setUp

      public static void setUp() throws Exception
    • tearDown

      public static void tearDown() throws Exception
    • testFlushHangWhenOneDataNodeFailedBeforeOtherDataNodeAck

       This test is for HBASE-26679. Consider two DataNodes, dn1 and dn2, where dn2 is a slow DN.
       The thread sequence before HBASE-26679 is:
       1. We write some data to FanOutOneBlockAsyncDFSOutput and then flush it, so there is one
          FanOutOneBlockAsyncDFSOutput.Callback in
       2. The ack from dn1 arrives first and triggers Netty to invoke
          FanOutOneBlockAsyncDFSOutput.completed( with dn1's channel, then in
          FanOutOneBlockAsyncDFSOutput.completed(, dn1's channel is removed from
       3. But dn2 responds slowly. Before dn2 sends its ack, dn1 is shut down or hits an exception,
          so FanOutOneBlockAsyncDFSOutput.failed(, java.util.function.Supplier<java.lang.Throwable>) is triggered by Netty with dn1's channel.
          Because FanOutOneBlockAsyncDFSOutput.Callback#unfinishedReplicas does not
          contain dn1's channel, the FanOutOneBlockAsyncDFSOutput.Callback is skipped in the
          FanOutOneBlockAsyncDFSOutput.failed(, java.util.function.Supplier<java.lang.Throwable>) method,
          FanOutOneBlockAsyncDFSOutput.state is set to
          FanOutOneBlockAsyncDFSOutput.State#BROKEN, and both dn1 and dn2 are closed at the end of
          FanOutOneBlockAsyncDFSOutput.failed(, java.util.function.Supplier<java.lang.Throwable>).
       4. FanOutOneBlockAsyncDFSOutput.failed(, java.util.function.Supplier<java.lang.Throwable>) is triggered again by dn2 because it is closed,
          but because FanOutOneBlockAsyncDFSOutput.state is already
          FanOutOneBlockAsyncDFSOutput.State#BROKEN, the whole
          FanOutOneBlockAsyncDFSOutput.failed(, java.util.function.Supplier<java.lang.Throwable>) is skipped. So waiting on the future
          returned by FanOutOneBlockAsyncDFSOutput.flush(boolean) would be stuck forever.
       After HBASE-26679, for step 4 above, even if FanOutOneBlockAsyncDFSOutput.state
       is already FanOutOneBlockAsyncDFSOutput.State#BROKEN, we would still try to trigger
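
The four steps above can be replayed with a small, dependency-free toy model. This is only a sketch and not the HBase implementation: channels are plain strings, and the class FlushHangSketch, the pending queue and the failPendingWhenBroken flag are hypothetical names introduced for illustration. The "after the fix" branch assumes that a failure arriving in the BROKEN state still fails any callback that is waiting on the failing channel.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Toy model of the ack/failure race. All names are hypothetical, not HBase code. */
final class FlushHangSketch {
  enum State { STREAMING, BROKEN }

  /** One pending flush; it is done once every replica has acked or a failure reaches it. */
  static final class Callback {
    final CompletableFuture<Long> future = new CompletableFuture<>();
    final Set<String> unfinishedReplicas;

    Callback(String... channels) {
      unfinishedReplicas = new HashSet<>(Arrays.asList(channels));
    }
  }

  static final class Output {
    // false models the behaviour before the fix, true the assumed behaviour after it.
    private final boolean failPendingWhenBroken;
    private State state = State.STREAMING;
    private final Deque<Callback> pending = new ArrayDeque<>();

    Output(boolean failPendingWhenBroken) {
      this.failPendingWhenBroken = failPendingWhenBroken;
    }

    CompletableFuture<Long> flush(String... channels) {
      Callback c = new Callback(channels);
      pending.addLast(c);
      return c.future;
    }

    /** An ack arrived on the given channel. */
    void completed(String channel) {
      Callback head = pending.peekFirst();
      if (head == null) {
        return;
      }
      head.unfinishedReplicas.remove(channel);
      if (head.unfinishedReplicas.isEmpty()) {
        pending.pollFirst();
        head.future.complete(0L);
      }
    }

    /** The given channel failed or was closed. */
    void failed(String channel, Throwable error) {
      if (state == State.BROKEN) {
        if (failPendingWhenBroken) {
          failPending(channel, error);
        }
        return; // before the fix, everything is skipped here
      }
      state = State.BROKEN;
      failPending(channel, error);
    }

    private void failPending(String channel, Throwable error) {
      for (Callback c : pending) {
        // Callbacks that no longer wait on this channel are skipped.
        if (c.unfinishedReplicas.contains(channel)) {
          c.future.completeExceptionally(error);
        }
      }
    }
  }

  /** Replays steps 1 to 4 and reports whether the flush future is done afterwards. */
  static boolean flushCompletes(boolean withFix) {
    Output out = new Output(withFix);
    CompletableFuture<Long> future = out.flush("dn1", "dn2"); // step 1
    out.completed("dn1");                                     // step 2: dn1 acks first
    out.failed("dn1", new IOException("dn1 went away"));      // step 3: callback skipped
    out.failed("dn2", new IOException("dn2 was closed"));     // step 4: state already BROKEN
    return future.isDone();
  }

  public static void main(String[] args) {
    System.out.println("before fix, future done: " + flushCompletes(false));
    System.out.println("after fix, future done: " + flushCompletes(true));
  }
}
```

In this model the future is never completed when failPendingWhenBroken is false, which corresponds to the hang described in step 4. With the flag set, the second failure completes it exceptionally, so a caller waiting on the flush would see an error instead of blocking.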
    • findAndKillFirstDataNode

      private static org.apache.hadoop.hdfs.MiniDFSCluster.DataNodeProperties findAndKillFirstDataNode(org.apache.hadoop.hdfs.protocol.DatanodeInfo firstDatanodeInfo)