Using Deeplearning4j with cuDNN

Deeplearning4j supports CUDA and can be further accelerated with cuDNN. Most 2D CNN layers (such as ConvolutionLayer and SubsamplingLayer), as well as LSTM and BatchNormalization layers, support cuDNN.

To use cuDNN, first switch ND4J to the CUDA backend by replacing nd4j-native with nd4j-cuda-8.0, nd4j-cuda-9.0, or nd4j-cuda-9.2 in your pom.xml files. Ideally, add a dependency on nd4j-cuda-8.0-platform, nd4j-cuda-9.0-platform, or nd4j-cuda-9.2-platform, which automatically includes binaries for all platforms:

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-cuda-8.0-platform</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

or

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-cuda-9.0-platform</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

or

<dependency>
	<groupId>org.nd4j</groupId>
	<artifactId>nd4j-cuda-9.2-platform</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

More information can be found in the installation instructions for ND4J.

The only other step needed for DL4J to load cuDNN is to add a dependency on deeplearning4j-cuda-8.0, deeplearning4j-cuda-9.0, or deeplearning4j-cuda-9.2, for example:

<dependency>
	<groupId>org.deeplearning4j</groupId>
	<artifactId>deeplearning4j-cuda-8.0</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

or

<dependency>
	<groupId>org.deeplearning4j</groupId>
	<artifactId>deeplearning4j-cuda-9.0</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

or

<dependency>
	<groupId>org.deeplearning4j</groupId>
	<artifactId>deeplearning4j-cuda-9.2</artifactId>
	<version>1.0.0-beta3</version>
</dependency>

The actual library for cuDNN is not bundled, so be sure to download and install the appropriate package for your platform from NVIDIA.

Note that only certain combinations of CUDA and cuDNN are supported. At this time, Deeplearning4j supports the following combinations:

CUDA Version    cuDNN Version
8.0             6.0
9.0             7.0
9.2             7.1

To install, simply extract the library to a directory found in the system path used by native libraries. The easiest way is to place it alongside other libraries from CUDA in the default directory (/usr/local/cuda/lib64/ on Linux, /usr/local/cuda/lib/ on Mac OS X, and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\bin\, C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin\, or C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin\ on Windows).
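As a quick sanity check after installation, the following sketch lists files that look like cuDNN libraries in the directories named by a few common path settings. It uses only the Java standard library and is not part of Deeplearning4j. The class name CudnnLocator and the file name patterns (libcudnn* on Linux and Mac OS X, cudnn*.dll on Windows) are assumptions made for illustration, and the check is not exhaustive: the native loader may also search directories that are not listed in these variables.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DL4J: scans path-style directory lists for cuDNN-like files.
final class CudnnLocator {

    // Assumed naming: libcudnn* (Linux, Mac OS X) or cudnn*.dll (Windows).
    static boolean looksLikeCudnn(String fileName) {
        String name = fileName.toLowerCase();
        return name.startsWith("libcudnn")
                || (name.startsWith("cudnn") && name.endsWith(".dll"));
    }

    // Returns cuDNN-like files found directly inside each directory of a
    // separator-delimited path list (":" on Linux/Mac OS X, ";" on Windows).
    static List<File> findIn(String pathList) {
        List<File> found = new ArrayList<>();
        if (pathList == null || pathList.isEmpty()) {
            return found;
        }
        for (String dir : pathList.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            File[] files = new File(dir).listFiles();
            if (files == null) {
                continue; // not a directory, or not readable
            }
            for (File f : files) {
                if (f.isFile() && looksLikeCudnn(f.getName())) {
                    found.add(f);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Path settings commonly consulted when loading native libraries (an assumption).
        String[] sources = {
            System.getProperty("java.library.path"),
            System.getenv("LD_LIBRARY_PATH"),
            System.getenv("DYLD_LIBRARY_PATH"),
            System.getenv("PATH")
        };
        for (String source : sources) {
            for (File f : findIn(source)) {
                System.out.println(f.getAbsolutePath());
            }
        }
    }
}
```

If nothing is printed, cuDNN may still be installed in a directory the system loader knows about by other means, so treat an empty result as a hint rather than proof.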

Alternatively, in the case of CUDA 9.2, cuDNN comes bundled with the “redist” package of the JavaCPP Presets for CUDA. After agreeing to the license, we can add the following dependencies instead of installing CUDA and cuDNN:

 <dependency>
     <groupId>org.bytedeco.javacpp-presets</groupId>
     <artifactId>cuda</artifactId>
     <version>9.2-7.1-1.4.2</version>
     <classifier>linux-x86_64-redist</classifier>
 </dependency>
 <dependency>
     <groupId>org.bytedeco.javacpp-presets</groupId>
     <artifactId>cuda</artifactId>
     <version>9.2-7.1-1.4.2</version>
     <classifier>linux-ppc64le-redist</classifier>
 </dependency>
 <dependency>
     <groupId>org.bytedeco.javacpp-presets</groupId>
     <artifactId>cuda</artifactId>
     <version>9.2-7.1-1.4.2</version>
     <classifier>macosx-x86_64-redist</classifier>
 </dependency>
 <dependency>
     <groupId>org.bytedeco.javacpp-presets</groupId>
     <artifactId>cuda</artifactId>
     <version>9.2-7.1-1.4.2</version>
     <classifier>windows-x86_64-redist</classifier>
 </dependency>
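Each of the four dependencies above targets one platform. To see which classifier corresponds to the machine at hand, a small standard-library sketch can map the JVM's os.name and os.arch properties onto the classifier strings listed above. The class RedistClassifier and the property-to-name mapping are illustrative assumptions, not an API of JavaCPP or Deeplearning4j.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: maps JVM system properties to one of the "redist" classifiers listed above.
final class RedistClassifier {

    // The four classifiers that appear in the dependency list above.
    private static final List<String> LISTED = Arrays.asList(
            "linux-x86_64-redist",
            "linux-ppc64le-redist",
            "macosx-x86_64-redist",
            "windows-x86_64-redist");

    // Returns the matching classifier, or null if the platform is not among the listed ones.
    static String classifierFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String platformOs;
        if (os.contains("linux")) {
            platformOs = "linux";
        } else if (os.contains("mac")) {
            platformOs = "macosx";
        } else if (os.contains("windows")) {
            platformOs = "windows";
        } else {
            return null;
        }

        String platformArch;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            platformArch = "x86_64"; // JVMs report 64-bit x86 under either name
        } else if (arch.equals("ppc64le")) {
            platformArch = "ppc64le";
        } else {
            return null;
        }

        String classifier = platformOs + "-" + platformArch + "-redist";
        return LISTED.contains(classifier) ? classifier : null;
    }

    public static void main(String[] args) {
        String classifier = classifierFor(
                System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println(classifier == null
                ? "No listed redist classifier matches this platform"
                : classifier);
    }
}
```

Listing all four dependencies, as shown above, keeps the build portable; the lookup is only useful when a single platform is targeted and the extra downloads are unwanted.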

Also note that, by default, Deeplearning4j uses the fastest algorithms available according to cuDNN (ConvolutionLayer.AlgoMode.PREFER_FASTEST), but these can use excessive memory and cause strange launch errors. When this happens, try reducing memory usage by setting the NO_WORKSPACE mode in the network configuration, for example:

    // for the whole network
    new NeuralNetConfiguration.Builder()
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...
    // or separately for each layer
    new ConvolutionLayer.Builder(h, w)
            .cudnnAlgoMode(ConvolutionLayer.AlgoMode.NO_WORKSPACE)
            // ...
