public class RegexSequenceRecordReader extends FileRecordReader implements SequenceRecordReader
Pattern and Matcher are used to split each line of the input into groups.

Example: Data in format "2016-01-01 23:59:59.001 1 DEBUG First entry message!"

Multiple error handling modes are supported via RegexSequenceRecordReader.LineErrorHandling. Invalid lines that don't match the provided regex can result in an exception (FailOnInvalid), be skipped silently (SkipInvalid), or be skipped with a logged warning (SkipInvalidWithWarning).

Modifier and Type | Class and Description |
---|---|
static class | RegexSequenceRecordReader.LineErrorHandling: Error handling mode. How should invalid lines (i.e., those that don't match the provided regex) be handled? |
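The regex for this example is not given above. The following minimal plain-Java sketch assumes a hypothetical pattern with one capturing group per field. It shows how Pattern and Matcher can turn each matching line into one time step of a sequence. The Mode enum and the parseLines method are illustrative stand-ins for the three LineErrorHandling modes, not the DataVec implementation. String stands in for the Writable values the reader actually returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexSequenceSketch {

    // Hypothetical mirror of the three documented LineErrorHandling modes.
    enum Mode { FAIL_ON_INVALID, SKIP_INVALID, SKIP_INVALID_WITH_WARNING }

    // Assumed regex (not from the source): timestamp, numeric id, level, message.
    static final Pattern LOG_LINE = Pattern.compile(
            "(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) (\\d+) ([A-Z]+) (.*)");

    // Each matching line becomes one time step whose fields are the capturing groups.
    static List<List<String>> parseLines(List<String> lines, Pattern pattern, Mode mode) {
        List<List<String>> sequence = new ArrayList<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            if (!m.matches()) {
                if (mode == Mode.FAIL_ON_INVALID) {
                    throw new IllegalStateException("Line does not match regex: " + line);
                }
                if (mode == Mode.SKIP_INVALID_WITH_WARNING) {
                    System.err.println("WARN: skipping line that does not match regex: " + line);
                }
                continue; // SKIP_INVALID and SKIP_INVALID_WITH_WARNING both drop the line
            }
            List<String> step = new ArrayList<>();
            for (int g = 1; g <= m.groupCount(); g++) {
                step.add(m.group(g));
            }
            sequence.add(step);
        }
        return sequence;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "2016-01-01 23:59:59.001 1 DEBUG First entry message!",
                "this line does not match");
        // Prints [[2016-01-01 23:59:59.001, 1, DEBUG, First entry message!]]
        System.out.println(parseLines(lines, LOG_LINE, Mode.SKIP_INVALID));
    }
}
```

This sketch uses Matcher.matches() rather than find(), so a line counts as valid only if the whole line fits the pattern. That is a choice made for the illustration.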

Modifier and Type | Field and Description |
---|---|
static Charset | DEFAULT_CHARSET |
static RegexSequenceRecordReader.LineErrorHandling | DEFAULT_ERROR_HANDLING |
static org.slf4j.Logger | LOG |
static String | SKIP_NUM_LINES |
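
Fields inherited from class FileRecordReader: appendLabel, conf, currentUri, labels, locationsIterator
Fields inherited from class BaseRecordReader: inputSplit, listeners, streamCreatorFn
Fields inherited from interface RecordReader: APPEND_LABEL, LABELS, NAME_SPACE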

Constructor and Description |
---|
RegexSequenceRecordReader(String regex, int skipNumLines) |
RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling) |
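The second constructor adds a Charset and an error handling mode. The meaning of skipNumLines is not spelled out here. This sketch assumes it is the number of leading lines (such as a header) to discard, and that encoding is the character set used to decode the input bytes. readSequenceLines is a hypothetical helper written for illustration, not a DataVec method.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SkipLinesSketch {

    // Hypothetical helper: decodes the stream with 'encoding' and drops the first
    // 'skipNumLines' lines (assumed meaning of the constructor argument).
    // It owns the stream it is given and closes it when done.
    static List<String> readSequenceLines(InputStream in, Charset encoding, int skipNumLines) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, encoding))) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                if (lineNumber++ < skipNumLines) {
                    continue; // e.g. a header row
                }
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // The accented character is encoded as two bytes in UTF-8.
        byte[] data = "timestamp id level message\ncaf\u00e9 1 DEBUG hello\n"
                .getBytes(StandardCharsets.UTF_8);
        List<String> lines = readSequenceLines(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8, 1);
        System.out.println(lines.size()); // 1: the header line was skipped
    }
}
```

Decoding the same bytes with a different Charset (for example ISO_8859_1) produces different strings. Those strings could then match the regex differently.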

Modifier and Type | Method and Description |
---|---|
void | initialize(Configuration conf, InputSplit split): Called once at initialization. |
List<SequenceRecord> | loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas): Load multiple sequence records from the given list of RecordMetaData instances. |
SequenceRecord | loadSequenceFromMetaData(RecordMetaData recordMetaData): Load a single sequence record from the given RecordMetaData instance. Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List). |
SequenceRecord | nextSequence(): Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data. |
void | reset(): Reset record reader iterator. |
List<List<Writable>> | sequenceRecord(): Returns a sequence record. |
List<List<Writable>> | sequenceRecord(URI uri, DataInputStream dataInputStream): Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream. |
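The note on loadSequenceFromMetaData(RecordMetaData) says that loading several records at once is more efficient when the data must be scanned to be split. A toy model of why, under stated assumptions: Meta and Store are hypothetical stand-ins, not DataVec's RecordMetaData or its actual loading logic, and every access to a source is modeled as one full scan. Loading one record at a time repeats the scan for each request. A batch load can scan each distinct source once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchLoadSketch {

    // Hypothetical stand-in for record metadata: a source name and a position within it.
    static final class Meta {
        final String source;
        final int index;

        Meta(String source, int index) {
            this.source = source;
            this.index = index;
        }
    }

    // Hypothetical non-splittable store: reaching any record means scanning its whole source.
    static final class Store {
        final Map<String, List<String>> sources;
        int scans = 0;

        Store(Map<String, List<String>> sources) {
            this.sources = sources;
        }

        List<String> scan(String source) {
            scans++; // each call models one full pass over the source
            return new ArrayList<>(sources.get(source));
        }
    }

    // One scan per requested record.
    static List<String> loadOneByOne(Store store, List<Meta> metas) {
        List<String> out = new ArrayList<>();
        for (Meta m : metas) {
            out.add(store.scan(m.source).get(m.index));
        }
        return out;
    }

    // One scan per distinct source; results are returned in request order.
    static List<String> loadBatch(Store store, List<Meta> metas) {
        Map<String, List<String>> scanned = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Meta m : metas) {
            List<String> records = scanned.computeIfAbsent(m.source, store::scan);
            out.add(records.get(m.index));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> data = Map.of("a.log", List.of("r0", "r1", "r2"));
        List<Meta> requests = List.of(new Meta("a.log", 0), new Meta("a.log", 2), new Meta("a.log", 1));

        Store s1 = new Store(data);
        System.out.println(loadOneByOne(s1, requests) + " scans=" + s1.scans); // [r0, r2, r1] scans=3
        Store s2 = new Store(data);
        System.out.println(loadBatch(s2, requests) + " scans=" + s2.scans); // [r0, r2, r1] scans=1
    }
}
```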

Methods inherited from class FileRecordReader: close, doInitialize, getConf, getCurrentLabel, getLabel, getLabels, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setConf, setLabels
Methods inherited from class BaseRecordReader: batchesSupported, getListeners, invokeListeners, setListeners, setListeners
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface RecordReader: batchesSupported, getLabels, getListeners, hasNext, initialize, loadFromMetaData, loadFromMetaData, next, next, nextRecord, record, resetSupported, setListeners, setListeners
Methods inherited from interface Configurable: getConf, setConf
public static final String SKIP_NUM_LINES
public static final Charset DEFAULT_CHARSET
public static final RegexSequenceRecordReader.LineErrorHandling DEFAULT_ERROR_HANDLING
public static final org.slf4j.Logger LOG
public RegexSequenceRecordReader(String regex, int skipNumLines)
public RegexSequenceRecordReader(String regex, int skipNumLines, Charset encoding, RegexSequenceRecordReader.LineErrorHandling errorHandling)

public void initialize(Configuration conf, InputSplit split) throws IOException, InterruptedException
Description copied from interface: RecordReader
Called once at initialization.
Specified by: initialize in interface RecordReader
Overrides: initialize in class FileRecordReader
Parameters:
conf - a configuration for initialization
split - the split that defines the range of records to read
Throws:
IOException
InterruptedException

public List<List<Writable>> sequenceRecord()
Description copied from interface: SequenceRecordReader
Returns a sequence record.
Specified by: sequenceRecord in interface SequenceRecordReader
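According to the method summary, sequenceRecord(URI, DataInputStream) loads a sequence from a caller-supplied stream, does not modify the reader's internal state, and must not close the stream. Here is a minimal plain-Java sketch of that contract, assuming UTF-8 text input. The hypothetical readAll method wraps the stream for line reading but deliberately never closes the wrapper, because closing a BufferedReader would also close the underlying DataInputStream. The URI parameter only mirrors the signature and is used in the error message.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StreamContractSketch {

    // Hypothetical: reads every line from the caller's stream without closing it.
    static List<String> readAll(URI uri, DataInputStream in) {
        // Not in try-with-resources: closing the reader would close 'in' as well,
        // and the documented contract leaves closing to the caller.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        List<String> lines = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Error reading " + uri, e);
        }
        return lines;
    }

    // Test helper that records whether close() was called.
    static class TrackingStream extends DataInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        try (TrackingStream in = new TrackingStream("a\nb\n".getBytes(StandardCharsets.UTF_8))) {
            List<String> lines = readAll(URI.create("file:///tmp/example.txt"), in);
            System.out.println(lines + " closed=" + in.closed); // [a, b] closed=false
        }
    }
}
```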

public List<List<Writable>> sequenceRecord(URI uri, DataInputStream dataInputStream) throws IOException
Description copied from interface: SequenceRecordReader
Load a sequence record from the given DataInputStream. Unlike RecordReader.next(), the internal state of the RecordReader is not modified. Implementations of this method should not close the DataInputStream.
Specified by: sequenceRecord in interface SequenceRecordReader
Throws:
IOException - if an error occurs while reading from the input stream

public void reset()
Description copied from interface: RecordReader
Reset record reader iterator.
Specified by: reset in interface RecordReader
Overrides: reset in class FileRecordReader
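reset() rewinds the reader so that iteration starts again from the first sequence. Below is a minimal in-memory sketch of that iterate-then-reset pattern. InMemorySequenceReader is hypothetical: it only borrows the hasNext()/sequenceRecord()/reset() method names and uses String in place of Writable. In the real reader, the inputs would come from the split passed to initialize(Configuration, InputSplit), not from a constructor argument.

```java
import java.util.List;
import java.util.NoSuchElementException;

public class ResetSketch {

    // Hypothetical reader over pre-split sequences (one inner list per time step).
    static final class InMemorySequenceReader {
        private final List<List<List<String>>> sequences;
        private int position = 0;

        InMemorySequenceReader(List<List<List<String>>> sequences) {
            this.sequences = sequences;
        }

        boolean hasNext() {
            return position < sequences.size();
        }

        List<List<String>> sequenceRecord() {
            if (!hasNext()) {
                throw new NoSuchElementException("No more sequences");
            }
            return sequences.get(position++);
        }

        // Rewind to the first sequence.
        void reset() {
            position = 0;
        }
    }

    // Reads every remaining sequence and returns how many were read.
    static int drain(InMemorySequenceReader reader) {
        int count = 0;
        while (reader.hasNext()) {
            reader.sequenceRecord();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        InMemorySequenceReader reader = new InMemorySequenceReader(List.of(
                List.of(List.of("2016-01-01 23:59:59.001", "1", "DEBUG", "First entry message!")),
                List.of(List.of("a", "b"), List.of("c", "d"))));
        System.out.println(drain(reader)); // 2
        reader.reset();
        System.out.println(drain(reader)); // 2 again after reset
    }
}
```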

public SequenceRecord nextSequence()
Description copied from interface: SequenceRecordReader
Similar to SequenceRecordReader.sequenceRecord(), but returns a SequenceRecord object that may include metadata such as the source of the data.
Specified by: nextSequence in interface SequenceRecordReader
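nextSequence() differs from sequenceRecord() because the returned object can carry metadata, such as the source of the data. The sketch below models that difference with a hypothetical SequenceWithSource holder, which is not DataVec's SequenceRecord. The holder pairs the parsed steps with the URI they came from. The Reader class is also hypothetical: it iterates over already-parsed files and offers both a data-only method and a data-plus-source method.

```java
import java.net.URI;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SequenceWithSourceSketch {

    // Hypothetical holder: parsed time steps plus the URI they were read from.
    static final class SequenceWithSource {
        final List<List<String>> sequence;
        final URI source;

        SequenceWithSource(List<List<String>> sequence, URI source) {
            this.sequence = sequence;
            this.source = source;
        }
    }

    // Hypothetical reader over already-parsed files, in insertion order.
    static final class Reader {
        private final Iterator<Map.Entry<URI, List<List<String>>>> it;

        Reader(Map<URI, List<List<String>>> files) {
            this.it = files.entrySet().iterator();
        }

        boolean hasNext() {
            return it.hasNext();
        }

        // Data only, analogous to sequenceRecord().
        List<List<String>> sequenceRecord() {
            return it.next().getValue();
        }

        // Data plus its source, analogous to nextSequence().
        SequenceWithSource nextSequence() {
            Map.Entry<URI, List<List<String>>> e = it.next();
            return new SequenceWithSource(e.getValue(), e.getKey());
        }
    }

    public static void main(String[] args) {
        Map<URI, List<List<String>>> files = new LinkedHashMap<>();
        files.put(URI.create("file:///logs/a.log"), List.of(List.of("1", "DEBUG")));
        files.put(URI.create("file:///logs/b.log"), List.of(List.of("2", "INFO"), List.of("3", "WARN")));

        Reader reader = new Reader(files);
        while (reader.hasNext()) {
            SequenceWithSource s = reader.nextSequence();
            System.out.println(s.source + " -> " + s.sequence.size() + " steps");
        }
    }
}
```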
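
public SequenceRecord loadSequenceFromMetaData(RecordMetaData recordMetaData) throws IOException
Description copied from interface: SequenceRecordReader
Load a single sequence record from the given RecordMetaData instance.
Note: for data that isn't splittable (i.e., text data that needs to be scanned/split), it is more efficient to load multiple records at once using SequenceRecordReader.loadSequenceFromMetaData(List).
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters:
recordMetaData - Metadata for the sequence record to load
Throws:
IOException - If an I/O error occurs during loading

public List<SequenceRecord> loadSequenceFromMetaData(List<RecordMetaData> recordMetaDatas) throws IOException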
Description copied from interface: SequenceRecordReader
Load multiple sequence records from the given list of RecordMetaData instances.
Specified by: loadSequenceFromMetaData in interface SequenceRecordReader
Parameters:
recordMetaDatas - Metadata for the sequence records to load
Throws:
IOException - If an I/O error occurs during loading

Copyright © 2020. All rights reserved.