DataVec Operations

Usage

Operations, such as a Spark Function, execute transforms and load data into DataVec. Operations are a low-level concept: most of the time you will not need to work with them directly.

Loading data into Spark

If you’re using Apache Spark, functions iterate over the dataset, load it into a Spark RDD, and convert the raw data format into Writable objects.

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.writable.Writable;
import org.datavec.spark.transform.misc.StringToWritablesFunction;
import org.nd4j.linalg.io.ClassPathResource;

SparkConf conf = new SparkConf();
JavaSparkContext sc = new JavaSparkContext(conf);

// A CSVRecordReader parses each line of text into a List<Writable>
RecordReader rr = new CSVRecordReader();

String customerInfoPath = new ClassPathResource("CustomerInfo.csv").getFile().getPath();
JavaRDD<List<Writable>> customerInfo = sc.textFile(customerInfoPath).map(new StringToWritablesFunction(rr));

The above code loads a CSV file into a JavaRDD<List<Writable>>, with one List<Writable> per row. Once your RDD is loaded, you can transform it, perform joins, and use reducers to wrangle the data any way you want.
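
For example, a TransformProcess defined against the data's Schema can be executed over the RDD with SparkTransformExecutor. The sketch below continues from the snippet above and uses hypothetical column names for CustomerInfo.csv; adjust the schema to match the real file.

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.spark.transform.SparkTransformExecutor;

// Hypothetical schema for CustomerInfo.csv -- adjust to the real columns
Schema schema = new Schema.Builder()
        .addColumnLong("customerId")
        .addColumnString("name")
        .addColumnString("country")
        .build();

// Drop the "name" column; all other columns pass through unchanged
TransformProcess tp = new TransformProcess.Builder(schema)
        .removeColumns("name")
        .build();

JavaRDD<List<Writable>> processed = SparkTransformExecutor.execute(customerInfo, tp);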

Available ops


AggregableCheckingOp

[source]



AggregableMultiOp

[source]

Executes several reduction operations in parallel on the same column, accumulating all of their results in a single pass over the data (see datavec#238).
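
As a minimal sketch of the idea (assuming the AggregatorImpls aggregators in org.datavec.api.transform.ops), two reductions can be accumulated over a column in one pass:

import java.util.Arrays;
import java.util.List;
import org.datavec.api.transform.ops.AggregableMultiOp;
import org.datavec.api.transform.ops.AggregatorImpls;
import org.datavec.api.transform.ops.IAggregableReduceOp;
import org.datavec.api.writable.Writable;

// Compute a min and a max over the same column in a single pass
AggregableMultiOp<Integer> multi = new AggregableMultiOp<>(
        Arrays.<IAggregableReduceOp<Integer, Writable>>asList(
                new AggregatorImpls.AggregableMin<Integer>(),
                new AggregatorImpls.AggregableMax<Integer>()));

for (int value : new int[] {3, 1, 4, 1, 5}) {
    multi.accept(value);
}
List<Writable> results = multi.get();   // one Writable per sub-op: [min, max]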


ByteWritableOp

[source]

Converts an IAggregableReduceOp operating on Byte values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to Byte.


DispatchOp

[source]

Dispatches each column of an input element to its own reduction operation.


DispatchWithConditionOp

[source]

A variant of DispatchOp that first checks each element against a list of conditions before dispatching the appropriate column of this element to its operation.


DoubleWritableOp

[source]

Converts an IAggregableReduceOp operating on Double values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to Double.


FloatWritableOp

[source]

Converts an IAggregableReduceOp operating on Float values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to Float.


IntWritableOp

[source]

Converts an IAggregableReduceOp operating on Integer values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to Integer.


LongWritableOp

[source]

Converts an IAggregableReduceOp operating on Long values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to Long.


StringWritableOp

[source]

Converts an IAggregableReduceOp operating on String values into one operating on Writable instances; this works only if the incoming Writable supports a conversion to text (a Text writable).
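
All of the *WritableOp classes above follow the same wrapping pattern. A minimal sketch using IntWritableOp, assuming AggregatorImpls.AggregableSum from org.datavec.api.transform.ops:

import org.datavec.api.transform.ops.AggregatorImpls;
import org.datavec.api.transform.ops.IAggregableReduceOp;
import org.datavec.api.transform.ops.IntWritableOp;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Writable;

// Wrap a sum-over-Integers aggregator so it can consume Writable values directly
IAggregableReduceOp<Writable, Writable> sum =
        new IntWritableOp<>(new AggregatorImpls.AggregableSum<Integer>());

sum.accept(new IntWritable(2));
sum.accept(new IntWritable(3));
Writable total = sum.get();   // the accumulated sum, as a Writable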


CalculateSortedRank

[source]

CalculateSortedRank: calculates the rank of each example, after sorting. For example, we might have a numerical “score” column, and we want to know the rank (sort order) of each example according to that column.
The rank of each example (after sorting) will be added in a new Long column. Indexing is done from 0; examples will have values 0 to dataSetSize - 1.

Currently, CalculateSortedRank can only be applied to standard (i.e., non-sequence) data. Furthermore, the current implementation can only sort on one column.

transform
public Schema transform(Schema inputSchema)

  • param newColumnName Name of the new column (will contain the rank for each example)
  • param sortOnColumn Name of the column to sort on
  • param comparator Comparator used to sort examples

outputColumnName
public String outputColumnName()

The output column name after the operation has been applied.

  • return the output column name

columnName
public String columnName()

The output column names. This will often be the same as the input.

  • return the output column names
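
In practice, CalculateSortedRank is usually applied through a TransformProcess rather than constructed directly. A minimal sketch, assuming a Double “score” column ranked with a DoubleWritableComparator:

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.comparator.DoubleWritableComparator;

Schema schema = new Schema.Builder()
        .addColumnString("name")
        .addColumnDouble("score")
        .build();

// Adds a new "rank" column containing each example's position (0 to n-1)
// after sorting all examples by "score"
TransformProcess tp = new TransformProcess.Builder(schema)
        .calculateSortedRank("rank", "score", new DoubleWritableComparator())
        .build();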
