public class WARCFileReader extends Object
Reads WARCRecord objects from a WARC file, using Hadoop's filesystem APIs. (This means you can
read from HDFS, S3 or any other filesystem supported by Hadoop.) This implementation is not tied
to the MapReduce APIs; that link is provided by the mapred
com.martinkl.warc.mapred.WARCInputFormat and the mapreduce
com.martinkl.warc.mapreduce.WARCInputFormat.

Modifier and Type | Class and Description |
---|---|
private class | WARCFileReader.CountingInputStream |
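
The name of the nested class suggests a stream wrapper that tracks how many bytes have passed through it, which is what getBytesRead() would need. The following is a minimal sketch of that idea using only java.io; the class name CountingStreamSketch and its members are hypothetical, and this is not the library's actual implementation.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only: NOT the real WARCFileReader.CountingInputStream.
// Wraps any InputStream and counts the bytes consumed through it.
class CountingStreamSketch extends FilterInputStream {
    private long bytesRead = 0;

    CountingStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            bytesRead++; // count only when a byte was actually returned
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            bytesRead += n; // n is -1 at end of stream, so guard against it
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        bytesRead += skipped; // skipped bytes still advance the file position
        return skipped;
    }

    long getBytesRead() {
        return bytesRead;
    }
}
```

Counting at the lowest stream layer means the tally reflects the position in the underlying file, regardless of what buffering or decoding is layered on top.
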
Modifier and Type | Field and Description |
---|---|
private long | bytesRead |
private WARCFileReader.CountingInputStream | byteStream |
private DataInputStream | dataStream |
private long | fileSize |
private static org.slf4j.Logger | logger |
private long | recordsRead |
Constructor and Description |
---|
WARCFileReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath): Opens a file for reading. |
Modifier and Type | Method and Description |
---|---|
void | close(): Closes the file. |
long | getBytesRead(): Returns the number of bytes that have been read from the file since it was opened. |
float | getProgress(): Returns the proportion of the file that has been read, as a number between 0.0 and 1.0. |
long | getRecordsRead(): Returns the number of records that have been read since the file was opened. |
WARCRecord | read(): Reads the next record from the file. |
private static final org.slf4j.Logger logger
private final long fileSize
private WARCFileReader.CountingInputStream byteStream
private DataInputStream dataStream
private long bytesRead
private long recordsRead
public WARCFileReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath) throws IOException
Parameters:
conf - The Hadoop configuration.
filePath - The Hadoop path to the file that should be read.
Throws:
IOException
public WARCRecord read() throws IOException
Throws:
IOException
public void close() throws IOException
Throws:
IOException
public long getRecordsRead()
public long getBytesRead()
public float getProgress()
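
Since getProgress() is described as the proportion of the file that has been read, one plausible way to derive it is to divide the bytes read by the file size. The sketch below illustrates that arithmetic; the ProgressSketch helper is hypothetical, and both the clamping to 1.0 and the handling of an empty file are assumptions, not documented behaviour of WARCFileReader.

```java
// Hypothetical helper: illustrates one way to compute a 0.0-1.0 progress value.
// It is not taken from the WARCFileReader source.
final class ProgressSketch {
    private ProgressSketch() {}

    static float progress(long bytesRead, long fileSize) {
        if (fileSize <= 0) {
            return 1.0f; // assumption: an empty file counts as fully read
        }
        // Clamp so the result never exceeds 1.0, even if the byte count overshoots.
        return Math.min(1.0f, (float) bytesRead / (float) fileSize);
    }
}
```

For example, ProgressSketch.progress(50, 100) evaluates to 0.5f.
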
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.