public class WARCFileReader extends Object
Reads WARCRecords from a WARC file, using Hadoop's filesystem APIs. (This means you can read from HDFS, S3 or any other filesystem supported by Hadoop.) This implementation is not tied to the MapReduce APIs -- that link is provided by the mapred com.martinkl.warc.mapred.WARCInputFormat and the mapreduce com.martinkl.warc.mapreduce.WARCInputFormat.

| Modifier and Type | Class and Description |
|---|---|
| private class | WARCFileReader.CountingInputStream |
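The private nested CountingInputStream is not documented in this summary; presumably it wraps the underlying file stream and counts the bytes consumed, which would back getBytesRead() and getProgress(). A minimal stdlib-only sketch of such a byte-counting wrapper (an illustration of the idea, not the library's actual implementation):

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CountingStreamSketch {
    // Hypothetical byte-counting wrapper, analogous in spirit to the
    // private WARCFileReader.CountingInputStream.
    static class CountingInputStream extends FilterInputStream {
        long bytesRead = 0;

        CountingInputStream(InputStream in) { super(in); }

        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) bytesRead++;          // count single bytes, not EOF
            return b;
        }

        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytesRead += n;        // count bulk reads
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        CountingInputStream in = new CountingInputStream(
                new ByteArrayInputStream(new byte[100]));
        in.read(new byte[40]);                // consume 40 bytes
        in.read();                            // consume 1 more
        System.out.println(in.bytesRead);     // 41
        in.close();
    }
}
```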
| Modifier and Type | Field and Description |
|---|---|
| private long | bytesRead |
| private WARCFileReader.CountingInputStream | byteStream |
| private DataInputStream | dataStream |
| private long | fileSize |
| private static org.slf4j.Logger | logger |
| private long | recordsRead |
| Constructor and Description |
|---|
| WARCFileReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath) Opens a file for reading. |
| Modifier and Type | Method and Description |
|---|---|
| void | close() Closes the file. |
| long | getBytesRead() Returns the number of bytes that have been read from the file since it was opened. |
| float | getProgress() Returns the proportion of the file that has been read, as a number between 0.0 and 1.0. |
| long | getRecordsRead() Returns the number of records that have been read since the file was opened. |
| WARCRecord | read() Reads the next record from the file. |
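getProgress() presumably computes the bytes read so far divided by the total file size, using the bytesRead and fileSize fields listed above. A small sketch of that arithmetic (the progress method and its zero-size guard are assumptions for illustration, not the library's code):

```java
public class ProgressSketch {
    // Hypothetical counters mirroring WARCFileReader's bytesRead and
    // fileSize fields.
    static float progress(long bytesRead, long fileSize) {
        // Guard against division by zero for an empty file (an assumption;
        // the real implementation's behavior for this case is not documented).
        if (fileSize == 0) return 1.0f;
        return (float) bytesRead / fileSize;
    }

    public static void main(String[] args) {
        System.out.println(progress(512, 2048)); // 0.25
    }
}
```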
private static final org.slf4j.Logger logger
private final long fileSize
private WARCFileReader.CountingInputStream byteStream
private DataInputStream dataStream
private long bytesRead
private long recordsRead
public WARCFileReader(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path filePath) throws IOException
Parameters:
conf - The Hadoop configuration.
filePath - The Hadoop path to the file that should be read.
Throws:
IOException

public WARCRecord read() throws IOException
Reads the next record from the file.
Throws:
IOException

public void close() throws IOException
Closes the file.
Throws:
IOException

public long getRecordsRead()
Returns the number of records that have been read since the file was opened.

public long getBytesRead()
Returns the number of bytes that have been read from the file since it was opened.

public float getProgress()
Returns the proportion of the file that has been read, as a number between 0.0 and 1.0.
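The summary above does not state how read() signals the end of input. Assuming it surfaces as an EOFException from the underlying DataInputStream (an assumption, not confirmed by this page), a caller's read loop might follow the stdlib-only pattern below, with readFully standing in for the real WARCRecord parsing:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class ReadLoopSketch {
    // Reads fixed-size stand-in "records" until EOF, mirroring the loop a
    // caller would write around WARCFileReader.read() under the
    // EOFException assumption above.
    static long countRecords(byte[] data, int recordSize) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long recordsRead = 0;
        try {
            while (true) {
                byte[] record = new byte[recordSize];
                in.readFully(record);   // stand-in for reader.read()
                recordsRead++;
            }
        } catch (EOFException eof) {
            // End of input reached; fall through to return the count.
        } finally {
            in.close();                 // mirrors WARCFileReader.close()
        }
        return recordsRead;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a WARC file: twelve bytes read as 4-byte records.
        System.out.println(countRecords(new byte[12], 4)); // 3
    }
}
```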
Copyright © 2007–2020 The Apache Software Foundation. All rights reserved.