Class FSDataInputStreamWrapper

java.lang.Object
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper
All Implemented Interfaces:
Closeable, AutoCloseable

@Private public class FSDataInputStreamWrapper extends Object implements Closeable
Wrapper for input stream(s) that takes care of the interaction of FS and HBase checksums, as well as closing streams. Initialization is not thread-safe, but normal operation is; see method comments.
  • Field Details

    • hfs

      private final HFileSystem hfs
    • path

      private final org.apache.hadoop.fs.Path path
    • doCloseStreams

      private final boolean doCloseStreams
    • dropBehind

      private final boolean dropBehind
    • readahead

      private final long readahead
    • stream

      private volatile org.apache.hadoop.fs.FSDataInputStream stream
      Two stream handles, one with and one without FS-level checksum. HDFS checksum setting is on FS level, not single read level, so you have to keep two FS objects and two handles open to interleave different reads freely, which is very sad. This is what we do: 1) First, we need to read the trailer of HFile to determine checksum parameters. We always use FS checksum to do that, so ctor opens stream. 2.1) After that, if HBase checksum is not used, we'd just always use stream; 2.2) If HBase checksum can be used, we'll open streamNoFsChecksum, and close stream. User MUST call prepareForBlockReader for that to happen; if they don't, (2.1) will be the default. 3) The users can call shouldUseHBaseChecksum(), and pass its result to getStream(boolean) to get stream (if Java had out/pointer params we could return both in one call). This stream is guaranteed to be set. 4) The first time HBase checksum fails, one would call fallbackToFsChecksum(int). That will take lock, and open stream. While this is going on, others will continue to use the old stream; if they also want to fall back, they'll also call fallbackToFsChecksum(int), and block until stream is set. 5) After some number of checksumOk() calls, we will go back to using HBase checksum. We will have 2 handles; however we presume checksums fail so rarely that we don't care.
    • streamNoFsChecksum

      private volatile org.apache.hadoop.fs.FSDataInputStream streamNoFsChecksum
    • streamNoFsChecksumFirstCreateLock

    • useHBaseChecksumConfigured

      private boolean useHBaseChecksumConfigured
    • useHBaseChecksum

      private volatile boolean useHBaseChecksum
    • hbaseChecksumOffCount

    • readStatistics

    • readerPath

      protected org.apache.hadoop.fs.Path readerPath
  • Constructor Details

  • Method Details

    • setStreamOptions

      private void setStreamOptions(org.apache.hadoop.fs.FSDataInputStream in)
    • prepareForBlockReader

      public void prepareForBlockReader(boolean forceNoHBaseChecksum) throws IOException
      Prepares the streams for block reader. NOT THREAD SAFE. Must be called once, after any reads finish and before any other reads start (what happens in reality is we read the tail, then call this based on what's in the tail, then read blocks).
      Parameters:
      forceNoHBaseChecksum - Force not using HBase checksum.
      Throws:
      IOException
    • shouldUseHBaseChecksum

      public boolean shouldUseHBaseChecksum()
      Returns Whether we are presently using HBase checksum.
    • getStream

      public org.apache.hadoop.fs.FSDataInputStream getStream(boolean useHBaseChecksum)
      Get the stream to use. Thread-safe.
      Parameters:
      useHBaseChecksum - must be the value that shouldUseHBaseChecksum has returned at some point in the past, otherwise the result is undefined.
    • fallbackToFsChecksum

      public org.apache.hadoop.fs.FSDataInputStream fallbackToFsChecksum(int offCount) throws IOException
      Read from non-checksum stream failed, fall back to FS checksum. Thread-safe.
      Parameters:
      offCount - For how many checksumOk calls to turn off the HBase checksum.
      Throws:
      IOException
    • checksumOk

      public void checksumOk()
      Report that checksum was ok, so we may ponder going back to HBase checksum.
    • updateInputStreamStatistics

      private void updateInputStreamStatistics(org.apache.hadoop.fs.FSDataInputStream stream)
    • getTotalBytesRead

      public static long getTotalBytesRead()
    • getLocalBytesRead

      public static long getLocalBytesRead()
    • getShortCircuitBytesRead

      public static long getShortCircuitBytesRead()
    • getZeroCopyBytesRead

      public static long getZeroCopyBytesRead()
    • close

      public void close()
      CloseClose stream(s) if necessary.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
    • getHfs

      public HFileSystem getHfs()
    • unbuffer

      public void unbuffer()
      This will free sockets and file descriptors held by the stream only when the stream implements org.apache.hadoop.fs.CanUnbuffer. NOT THREAD SAFE. Must be called only when all the clients using this stream to read the blocks have finished reading. If by chance the stream is unbuffered and there are clients still holding this stream for read then on next client read request a new socket will be opened by Datanode without client knowing about it and will serve its read request. Note: If this socket is idle for some time then the DataNode will close the socket and the socket will move into CLOSE_WAIT state and on the next client request on this stream, the current socket will be closed and a new socket will be opened to serve the requests.
    • getReaderPath

      public org.apache.hadoop.fs.Path getReaderPath()
    • setShouldUseHBaseChecksum