Class WARCRecord

java.lang.Object
org.apache.hadoop.hbase.test.util.warc.WARCRecord

public class WARCRecord extends Object
Immutable implementation of a record in a WARC file. You create a WARCRecord by parsing it out of a DataInput stream.

The file format is documented in the ISO standard (ISO 28500). In a nutshell, it's a textual format consisting of lines delimited by `\r\n`. Each record has the following structure:

  1. A line indicating the WARC version number, such as `WARC/1.0`.
  2. Several header lines (in key-value format, similar to HTTP or email headers), giving information about the record. The header is terminated by an empty line.
  3. A body consisting of raw bytes (the number of bytes is indicated in one of the headers).
  4. A final separator of `\r\n\r\n` before the next record starts.

There are several types of records, as documented on WARCRecord.Header.getRecordType().
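
The four-part record structure described above can be illustrated with a minimal standalone sketch. It is not the implementation of WARCRecord: the class `WarcSketch`, the holder `ParsedRecord`, and the helper `readLine` are hypothetical names introduced for illustration only. The sketch assumes the body length is given by the `Content-Length` header, and for brevity it matches header names case-sensitively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the record layout; not the WARCRecord implementation.
public class WarcSketch {

  /** Simple holder for the parsed parts of one record (hypothetical type). */
  static final class ParsedRecord {
    final String version;
    final Map<String, String> headers;
    final byte[] body;

    ParsedRecord(String version, Map<String, String> headers, byte[] body) {
      this.version = version;
      this.headers = headers;
      this.body = body;
    }
  }

  /** Reads bytes up to the next CRLF and returns them as a string without the CRLF. */
  static String readLine(DataInput in) throws IOException {
    ByteArrayOutputStream line = new ByteArrayOutputStream();
    boolean seenCr = false;
    while (true) {
      int b = in.readByte(); // throws EOFException if the stream ends mid-line
      if (seenCr && b == '\n') {
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
      }
      if (seenCr) {
        line.write('\r'); // a lone CR is part of the line content
      }
      seenCr = (b == '\r');
      if (!seenCr) {
        line.write(b);
      }
    }
  }

  /** Parses one record following the four steps listed in the class description. */
  static ParsedRecord parse(DataInput in) throws IOException {
    // 1. Version line, such as "WARC/1.0".
    String version = readLine(in);
    if (!version.startsWith("WARC/")) {
      throw new IOException("Not a WARC version line: " + version);
    }

    // 2. Key-value header lines, terminated by an empty line.
    Map<String, String> headers = new LinkedHashMap<>();
    String line;
    while (!(line = readLine(in)).isEmpty()) {
      int colon = line.indexOf(':');
      if (colon < 0) {
        throw new IOException("Malformed header line: " + line);
      }
      headers.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
    }

    // 3. Raw body bytes; the length is assumed to come from Content-Length.
    String length = headers.get("Content-Length");
    if (length == null) {
      throw new IOException("Missing Content-Length header");
    }
    byte[] body = new byte[Integer.parseInt(length)];
    in.readFully(body);

    // 4. Final separator before the next record starts.
    byte[] separator = new byte[4];
    in.readFully(separator);
    if (!"\r\n\r\n".equals(new String(separator, StandardCharsets.US_ASCII))) {
      throw new IOException("Missing record separator");
    }
    return new ParsedRecord(version, headers, body);
  }

  public static void main(String[] args) throws IOException {
    String raw = "WARC/1.0\r\n"
        + "WARC-Type: resource\r\n"
        + "Content-Length: 5\r\n"
        + "\r\n"
        + "hello"
        + "\r\n\r\n";
    DataInput in = new DataInputStream(
        new ByteArrayInputStream(raw.getBytes(StandardCharsets.UTF_8)));
    ParsedRecord record = parse(in);
    System.out.println(record.version + " " + record.headers + " "
        + new String(record.body, StandardCharsets.UTF_8));
  }
}
```

Because the body is read by byte count rather than by line, it may itself contain `\r\n` sequences without confusing the parser; only the header section is line-oriented.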