DataVec Executors

Local or remote execution?

Because datasets are often large, DataVec lets you choose the execution mechanism that best suits your needs. For example, if you are vectorizing a large training dataset, you can process it on a distributed Spark cluster. However, if you need to do real-time inference, DataVec also provides a local executor that doesn’t require any additional setup.

Executing a transform process

Once you’ve created your TransformProcess using your Schema, and you’ve either loaded your dataset into an Apache Spark JavaRDD or have a RecordReader that loads your dataset, you can execute a transform.

Locally this looks like:

import org.datavec.local.transforms.LocalTransformExecutor;

List<List<Writable>> transformed = LocalTransformExecutor.execute(recordReader, transformProcess);

List<List<List<Writable>>> transformedSeq = LocalTransformExecutor.executeToSequence(sequenceReader, transformProcess);

List<List<Writable>> joined = LocalTransformExecutor.executeJoin(join, leftReader, rightReader);
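The local calls above can be tried end to end with a small in-memory dataset. The following is a minimal sketch, not a definitive implementation. It assumes the datavec-api and datavec-local artifacts are on the classpath and that execute accepts the records as a List<List<Writable>> (consistent with the inputWritables parameter documented below). The class name, column names, and values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

// Hypothetical example class; not part of DataVec.
public class LocalExecutionSketch {

    // Builds a tiny in-memory dataset and drops one column using the local executor.
    public static List<List<Writable>> run() {
        // Schema describing the three input columns (names are illustrative).
        Schema schema = new Schema.Builder()
                .addColumnString("name")
                .addColumnInteger("age")
                .addColumnDouble("score")
                .build();

        // A single-step transform: remove the "name" column.
        TransformProcess transformProcess = new TransformProcess.Builder(schema)
                .removeColumns("name")
                .build();

        // Records are lists of Writable values, one list per row.
        List<List<Writable>> input = new ArrayList<>();
        input.add(Arrays.<Writable>asList(new Text("alice"), new IntWritable(30), new DoubleWritable(0.5)));
        input.add(Arrays.<Writable>asList(new Text("bob"), new IntWritable(41), new DoubleWritable(0.9)));

        return LocalTransformExecutor.execute(input, transformProcess);
    }

    public static void main(String[] args) {
        for (List<Writable> row : run()) {
            System.out.println(row);
        }
    }
}
```

If your data comes from a RecordReader, one option is to collect its records into a list first by calling next() while hasNext() returns true, then pass that list to the executor.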

When using Spark this looks like:

import org.datavec.spark.transform.SparkTransformExecutor;

JavaRDD<List<Writable>> transformed = SparkTransformExecutor.execute(inputRdd, transformProcess);

JavaRDD<List<List<Writable>>> transformedSeq = SparkTransformExecutor.executeToSequence(inputSequenceRdd, transformProcess);

JavaRDD<List<Writable>> joined = SparkTransformExecutor.executeJoin(join, leftRdd, rightRdd);
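The Spark calls above can be exercised without a cluster by running Spark in local mode. The following is a rough sketch under stated assumptions: the DataVec Spark module and a matching Spark version are on the classpath, the executor lives in the org.datavec.spark.transform package, and the default Java serializer is acceptable for such a small dataset. The class name, app name, and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.datavec.spark.transform.SparkTransformExecutor;

// Hypothetical example class; not part of DataVec.
public class SparkExecutionSketch {

    // Runs the same column-removal transform as a Spark job in local mode.
    public static List<List<Writable>> run() {
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")              // local mode; a real job would target a cluster
                .setAppName("datavec-spark-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            Schema schema = new Schema.Builder()
                    .addColumnString("name")
                    .addColumnInteger("age")
                    .addColumnDouble("score")
                    .build();

            TransformProcess transformProcess = new TransformProcess.Builder(schema)
                    .removeColumns("name")
                    .build();

            List<List<Writable>> input = new ArrayList<>();
            input.add(Arrays.<Writable>asList(new Text("alice"), new IntWritable(30), new DoubleWritable(0.5)));
            input.add(Arrays.<Writable>asList(new Text("bob"), new IntWritable(41), new DoubleWritable(0.9)));

            // Distribute the records, transform them, and bring the result back to the driver.
            JavaRDD<List<Writable>> inputRdd = sc.parallelize(input);
            JavaRDD<List<Writable>> transformed = SparkTransformExecutor.execute(inputRdd, transformProcess);
            return transformed.collect();
        } finally {
            sc.stop();
        }
    }
}
```

Calling collect() is only sensible for small results. For a large dataset you would more likely write the transformed RDD out to storage or feed it into the next Spark stage.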

Available executors


LocalTransformExecutor

Local transform executor

isTryCatch
public static boolean isTryCatch() 

execute
public static List<List<Writable>> execute(List<List<Writable>> inputWritables, TransformProcess transformProcess)

Execute the specified TransformProcess with the given input data.
Note: this method can only be used if the TransformProcess returns non-sequence data. For TransformProcesses that return a sequence, use executeToSequence(List, TransformProcess).

  • param inputWritables Input data to process
  • param transformProcess TransformProcess to execute
  • return Processed data
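To illustrate the note about sequence output, here is a small sketch of a TransformProcess that returns sequences and therefore goes through executeToSequence. It assumes the Builder's convertToSequence step and NumericalColumnComparator from datavec-api behave as in the DataVec versions I am aware of. The class name, column names, and readings are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.transform.sequence.comparator.NumericalColumnComparator;
import org.datavec.api.writable.DoubleWritable;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

// Hypothetical example class; not part of DataVec.
public class SequenceExecutionSketch {

    // Groups flat records into one sequence per device, ordered by the "step" column.
    public static List<List<List<Writable>>> run() {
        Schema schema = new Schema.Builder()
                .addColumnString("deviceId")
                .addColumnInteger("step")
                .addColumnDouble("reading")
                .build();

        // The final schema is a sequence schema, so execute(...) would reject this process.
        TransformProcess transformProcess = new TransformProcess.Builder(schema)
                .convertToSequence("deviceId", new NumericalColumnComparator("step"))
                .build();

        List<List<Writable>> input = new ArrayList<>();
        input.add(Arrays.<Writable>asList(new Text("a"), new IntWritable(1), new DoubleWritable(0.1)));
        input.add(Arrays.<Writable>asList(new Text("b"), new IntWritable(0), new DoubleWritable(0.7)));
        input.add(Arrays.<Writable>asList(new Text("a"), new IntWritable(0), new DoubleWritable(0.3)));

        return LocalTransformExecutor.executeToSequence(input, transformProcess);
    }
}
```

Each element of the result is one sequence (a list of time steps), and each time step is a list of Writable values. The order in which the sequences themselves are returned should not be relied on.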

SparkTransformExecutor

Executes a DataVec transform process on Spark RDDs.

isTryCatch
public static boolean isTryCatch() 
  • deprecated Use static methods instead of instance methods on SparkTransformExecutor
