public class RecordReaderDataSetIterator extends Object implements DataSetIterator
A DataSetIterator that takes a RecordReader as input, and handles the conversion to ND4J DataSet objects as well as producing minibatches from individual records. Multiple constructors are available; a RecordReaderDataSetIterator.Builder class is also available.
Example (image classification, batch size 32):
RecordReader rr = new ImageRecordReader(28,28,3); //28x28 RGB images
rr.initialize(new FileSplit(new File("/path/to/directory")));
DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 32)
    //Label index (first arg): always 1 when using ImageRecordReader. For CSV etc.: use the index of the column
    // that contains the label (should contain an integer value, 0 to nClasses-1 inclusive). Column indexes start
    // at 0. Number of classes (second arg): number of label classes (e.g., 10 for MNIST, which has 10 digits)
    .classification(1, nClasses)
    .preProcessor(new ImagePreProcessingScaler()) //Normalizes image values from 0-255 to 0-1
    .build();
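To make the label-index convention in the comments above concrete, here is a minimal plain-Java sketch of how one record might be split into a feature vector and a one-hot label vector. It is an illustration only, not the DL4J implementation: the class `OneHotSketch` and its methods are hypothetical names. The only behavior taken from the text is that the label column holds an integer from 0 to nClasses-1 and that column indexes start at 0.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of DL4J.
class OneHotSketch {

    /** Returns every column except the label column, in the original order. */
    static double[] features(double[] record, int labelIndex) {
        double[] out = new double[record.length - 1];
        int j = 0;
        for (int i = 0; i < record.length; i++) {
            if (i != labelIndex) {
                out[j++] = record[i];
            }
        }
        return out;
    }

    /** Converts the integer class in the label column to a one-hot vector of length nClasses. */
    static double[] oneHotLabel(double[] record, int labelIndex, int nClasses) {
        int cls = (int) record[labelIndex];
        if (cls < 0 || cls >= nClasses) {
            throw new IllegalArgumentException(
                    "Label " + cls + " is outside 0.." + (nClasses - 1));
        }
        double[] out = new double[nClasses];
        out[cls] = 1.0;
        return out;
    }

    public static void main(String[] args) {
        double[] record = {5.1, 3.5, 1.4, 0.2, 2}; // class label 2 stored in column 4
        System.out.println(Arrays.toString(features(record, 4)));
        System.out.println(Arrays.toString(oneHotLabel(record, 4, 3)));
    }
}
```

A label outside the 0 to nClasses-1 range is rejected here with an exception; how the real iterator reacts to such a value is not stated on this page.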
Example (multi-output regression from CSV, batch size 128):
RecordReader rr = new CSVRecordReader(0, ','); //Skip 0 header lines, comma separated
rr.initialize(new FileSplit(new File("/path/to/myCsv.txt")));
DataSetIterator iter = new RecordReaderDataSetIterator.Builder(rr, 128)
    //Specify the columns that the regression labels/targets appear in. Note that all other columns will be
    // treated as features. Column indexes start at 0
    .regression(labelColFrom, labelColTo)
    .build();
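The regression example treats a contiguous range of columns as targets and every other column as a feature. The plain-Java sketch below illustrates that split under the assumption that the range is inclusive at both ends, as the constructor documentation for labelIndexTo states. `RegressionSplitSketch` and its methods are hypothetical names and are not part of DL4J.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of DL4J.
class RegressionSplitSketch {

    /** Copies columns labelColFrom..labelColTo (both inclusive) as the regression targets. */
    static double[] labels(double[] record, int labelColFrom, int labelColTo) {
        // copyOfRange takes an exclusive upper bound, hence the + 1
        return Arrays.copyOfRange(record, labelColFrom, labelColTo + 1);
    }

    /** Copies every column outside labelColFrom..labelColTo as a feature. */
    static double[] features(double[] record, int labelColFrom, int labelColTo) {
        int nLabels = labelColTo - labelColFrom + 1;
        double[] out = new double[record.length - nLabels];
        int j = 0;
        for (int i = 0; i < record.length; i++) {
            if (i < labelColFrom || i > labelColTo) {
                out[j++] = record[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] record = {1.0, 2.0, 3.0, 4.0, 5.0};
        // Columns 1 and 2 are targets; columns 0, 3 and 4 are features
        System.out.println(Arrays.toString(labels(record, 1, 2)));
        System.out.println(Arrays.toString(features(record, 1, 2)));
    }
}
```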
Modifier and Type | Class and Description |
---|---|
static class | RecordReaderDataSetIterator.Builder: Builder class for RecordReaderDataSetIterator |
Modifier and Type | Field and Description |
---|---|
protected int | batchNum |
protected int | batchSize |
protected WritableConverter | converter |
protected int | labelIndex |
protected int | labelIndexTo |
protected DataSet | last |
protected int | maxNumBatches |
protected int | numPossibleLabels |
protected DataSetPreProcessor | preProcessor |
protected RecordReader | recordReader |
protected boolean | regression |
protected Iterator<List<Writable>> | sequenceIter |
protected boolean | useCurrent |
Modifier | Constructor and Description |
---|---|
protected | RecordReaderDataSetIterator(RecordReaderDataSetIterator.Builder b) |
public | RecordReaderDataSetIterator(RecordReader recordReader, int batchSize): Constructor for classification, where (a) the label index is assumed to be the very last Writable/column, and (b) the number of classes is inferred from RecordReader.getLabels(). Note that if RecordReader.getLabels() returns null, no output labels will be produced. |
public | RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels): Main constructor for classification. |
public | RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndexFrom, int labelIndexTo, boolean regression): Main constructor for multi-label regression (i.e., regression with multiple outputs). |
public | RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels, int maxNumBatches): Constructor for classification, where the maximum number of returned batches is limited to the specified value. |
public | RecordReaderDataSetIterator(RecordReader recordReader, WritableConverter converter, int batchSize, int labelIndexFrom, int labelIndexTo, int numPossibleLabels, int maxNumBatches, boolean regression): Main constructor. |
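Several of these constructors take a batchSize and a maxNumBatches. The plain-Java sketch below shows one way individual records could be grouped into minibatches with an optional cap on the number of batches. It is not the DL4J implementation, and `MinibatchSketch` is a hypothetical name. It assumes two things: a negative maxNumBatches means "no limit" (this page documents -1 as returning all available data), and the final batch may hold fewer than batchSize records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of DL4J.
class MinibatchSketch {

    /**
     * Groups records into batches of at most batchSize records each, stopping once
     * maxNumBatches batches have been produced. A negative maxNumBatches means no limit.
     */
    static <T> List<List<T>> batches(List<T> records, int batchSize, int maxNumBatches) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            if (maxNumBatches >= 0 && out.size() >= maxNumBatches) {
                break; // batch limit reached
            }
            int end = Math.min(start + batchSize, records.size());
            out.add(new ArrayList<>(records.subList(start, end)));
        }
        return out;
    }
}
```

For example, ten records with a batch size of 4 and no limit give batches of 4, 4 and 2 records in this sketch; with maxNumBatches set to 2, only the first two batches are returned.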
Modifier and Type | Method and Description |
---|---|
boolean | asyncSupported(): Does this DataSetIterator support asynchronous prefetching of multiple DataSet objects? |
int | batch(): Batch size |
List<String> | getLabels(): Get dataset iterator class labels, if any. |
boolean | hasNext() |
int | inputColumns(): Input columns for the dataset |
DataSet | loadFromMetaData(List<RecordMetaData> list): Load multiple examples to a DataSet, using the provided RecordMetaData instances. |
DataSet | loadFromMetaData(RecordMetaData recordMetaData): Load a single example to a DataSet, using the provided RecordMetaData. |
DataSet | next() |
DataSet | next(int num): Like the standard next method, but allows a customizable number of examples to be returned |
void | remove() |
void | reset(): Resets the iterator back to the beginning |
boolean | resetSupported(): Is resetting supported by this DataSetIterator? |
void | setCollectMetaData(boolean collectMetaData): When set to true: metadata for the current examples will be present in the returned DataSet. |
void | setPreProcessor(DataSetPreProcessor preProcessor): Set a pre processor |
int | totalOutcomes(): The number of labels for the dataset |
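The interplay of hasNext(), next(), next(int num), reset() and batch() listed above can be sketched with a small stand-alone iterator over an in-memory list. This is a simplified illustration under stated assumptions, not the DL4J class: `ResettableBatchIterator` is a hypothetical name, batches are plain lists rather than DataSet objects, and the final batch is assumed to be allowed to be smaller than requested.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; not part of DL4J.
class ResettableBatchIterator<T> implements Iterator<List<T>> {
    private final List<T> records;
    private final int batchSize;
    private int cursor = 0; // index of the next record to return

    ResettableBatchIterator(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.records = records;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < records.size();
    }

    /** Returns the next batch using the default batch size. */
    @Override
    public List<T> next() {
        return next(batchSize);
    }

    /** Like next(), but with a caller-chosen number of examples. */
    List<T> next(int num) {
        if (num <= 0) {
            throw new IllegalArgumentException("num must be positive");
        }
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int end = Math.min(cursor + num, records.size());
        List<T> batch = new ArrayList<>(records.subList(cursor, end));
        cursor = end;
        return batch;
    }

    /** Moves the iterator back to the first record. */
    void reset() {
        cursor = 0;
    }

    /** Default batch size. */
    int batch() {
        return batchSize;
    }
}
```

A typical loop over several epochs would call reset() before each pass and then read batches with next() while hasNext() returns true.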
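Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface DataSetIterator: getPreProcessor
Methods inherited from interface Iterator: forEachRemaining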
protected RecordReader recordReader
protected WritableConverter converter
protected int batchSize
protected int maxNumBatches
protected int batchNum
protected int labelIndex
protected int labelIndexTo
protected int numPossibleLabels
protected DataSet last
protected boolean useCurrent
protected boolean regression
protected DataSetPreProcessor preProcessor
public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize)
Parameters:
recordReader - Record reader to use as the source of data
batchSize - Minibatch size, for each call of .next()
public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels)
Parameters:
recordReader - RecordReader: provides the source of the data
batchSize - Batch size (number of examples) for the output DataSet objects
labelIndex - Index of the label Writable (usually an IntWritable), as obtained by recordReader.next()
numPossibleLabels - Number of classes (possible labels) for classification
public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndex, int numPossibleLabels, int maxNumBatches)
Parameters:
recordReader - the RecordReader to use
labelIndex - the index/column of the label (for classification)
numPossibleLabels - the number of possible labels for classification. Not used if regression == true
maxNumBatches - The maximum number of batches to return between resets. Set to -1 to return all available data
public RecordReaderDataSetIterator(RecordReader recordReader, int batchSize, int labelIndexFrom, int labelIndexTo, boolean regression)
Parameters:
recordReader - RecordReader to get data from
labelIndexFrom - Index of the first regression target
labelIndexTo - Index of the last regression target, inclusive
batchSize - Minibatch size
regression - Must be true. Mainly included to avoid clashing with the other, previously defined constructors
public RecordReaderDataSetIterator(RecordReader recordReader, WritableConverter converter, int batchSize, int labelIndexFrom, int labelIndexTo, int numPossibleLabels, int maxNumBatches, boolean regression)
Parameters:
recordReader - the RecordReader to use
converter - Converter. May be null.
batchSize - Minibatch size: the number of examples returned for each call of .next()
labelIndexFrom - the index of the label (for classification), or the first index of the labels for multi-output regression
labelIndexTo - only used if regression == true. The last index (inclusive) of the labels for multi-output regression
numPossibleLabels - the number of possible labels for classification. Not used if regression == true
maxNumBatches - Maximum number of batches to return
regression - if true: regression. If false: classification (labelIndexFrom is assumed to be the index of the class label)
protected RecordReaderDataSetIterator(RecordReaderDataSetIterator.Builder b)
public void setCollectMetaData(boolean collectMetaData)
Parameters:
collectMetaData - Whether to collect metadata or not
public DataSet next(int num)
Specified by: next in interface DataSetIterator
Parameters:
num - the number of examples
public int inputColumns()
Specified by: inputColumns in interface DataSetIterator
public int totalOutcomes()
Specified by: totalOutcomes in interface DataSetIterator
public boolean resetSupported()
Specified by: resetSupported in interface DataSetIterator
public boolean asyncSupported()
Specified by: asyncSupported in interface DataSetIterator
public void reset()
Specified by: reset in interface DataSetIterator
public int batch()
Specified by: batch in interface DataSetIterator
public void setPreProcessor(DataSetPreProcessor preProcessor)
Specified by: setPreProcessor in interface DataSetIterator
Parameters:
preProcessor - a pre processor to set
public List<String> getLabels()
Specified by: getLabels in interface DataSetIterator
public DataSet loadFromMetaData(RecordMetaData recordMetaData) throws IOException
Loads a single example to a DataSet, using the provided RecordMetaData. To load multiple examples at once, see loadFromMetaData(List).
Parameters:
recordMetaData - RecordMetaData to load from. Should have been produced by the given record reader
Throws:
IOException - If an error occurs during loading of the data
public DataSet loadFromMetaData(List<RecordMetaData> list) throws IOException
Parameters:
list - List of RecordMetaData instances to load from. Should have been produced by the record reader provided to the RecordReaderDataSetIterator constructor
Throws:
IOException - If an error occurs during loading of the data
Copyright © 2020. All rights reserved.