Here’s a non-exhaustive list of Deeplearning4j’s features. We’ll be updating it as new nets and tools are added.


Integrations

  • Spark
  • Hadoop/YARN
  • Model Import from Keras


APIs

  • Scala
  • Java



Since Deeplearning4j is a composable framework, users can arrange shallow nets to create various types of deeper nets. Combining convolutional nets with recurrent nets, for example, is how Google accurately generated captions from images in late 2014.


DL4J contains the following built-in vectorization algorithms:

DL4J supports the following types of optimization algorithms:

  • Stochastic gradient descent
  • Stochastic gradient descent with line search
  • Conjugate gradient line search (cf. Hinton 2006)
  • L-BFGS
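
As a rough illustration of what "line search" adds to plain gradient descent, the sketch below picks each step size with a backtracking (Armijo) rule on a toy one-dimensional function. It is a generic textbook version, not DL4J's optimizer: the class `LineSearchSketch`, its methods, and the constants (initial step 1.0, shrink factor 0.5, c = 1e-4) are hypothetical choices for this example, and it uses the exact gradient where SGD would use a minibatch estimate.

```java
import java.util.function.DoubleUnaryOperator;

/**
 * Illustrative sketch only (hypothetical class, not DL4J's implementation):
 * gradient descent on a 1-D function where each step size is chosen by a
 * backtracking (Armijo) line search instead of a fixed learning rate.
 */
public class LineSearchSketch {

    /** Halves (by 'shrink') the step until f(x - step*g) shows sufficient decrease. */
    static double backtrackingStep(DoubleUnaryOperator f, double x, double grad,
                                   double initialStep, double shrink, double c) {
        double step = initialStep;
        double fx = f.applyAsDouble(x);
        // Armijo condition: f(x - step*g) <= f(x) - c * step * g^2
        while (f.applyAsDouble(x - step * grad) > fx - c * step * grad * grad) {
            step *= shrink;
        }
        return step;
    }

    /** Runs up to 'iterations' gradient-descent steps, each sized by line search. */
    static double minimize(DoubleUnaryOperator f, DoubleUnaryOperator df,
                           double x0, int iterations) {
        double x = x0;
        for (int i = 0; i < iterations; i++) {
            double g = df.applyAsDouble(x);
            if (g == 0.0) break; // stationary point reached
            double step = backtrackingStep(f, x, g, 1.0, 0.5, 1e-4);
            x -= step * g;
        }
        return x;
    }

    public static void main(String[] args) {
        // f(x) = (x - 3)^2 has its minimum at x = 3; its derivative is 2(x - 3).
        DoubleUnaryOperator f = x -> (x - 3) * (x - 3);
        DoubleUnaryOperator df = x -> 2 * (x - 3);
        System.out.println(minimize(f, df, 10.0, 50));
    }
}
```

Starting from x = 10, the first full step (size 1.0) overshoots to x = -4, so the rule halves it once and the step of 0.5 lands exactly on the minimum at x = 3.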

Each of these optimization algorithms may be paired with training features (known as ‘updaters’ in DL4J) such as:

  • SGD (learning rate only)
  • Nesterov momentum
  • Adagrad
  • RMSProp
  • Adam
  • AdaDelta
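
To make the differences between updaters concrete, here is a minimal sketch of three of the listed rules (plain SGD, Nesterov momentum, and Adam) applied element-wise to a parameter array. These are common textbook formulations written for illustration; the class `UpdaterSketch` and its method signatures are hypothetical and are not DL4J's updater classes, whose defaults and exact update forms may differ.

```java
/**
 * Illustrative sketch only (hypothetical class, not DL4J's updaters):
 * three update rules applied element-wise to a parameter array.
 */
public class UpdaterSketch {

    /** Plain SGD: x <- x - lr * g. */
    static void sgd(double[] x, double[] g, double lr) {
        for (int i = 0; i < x.length; i++) {
            x[i] -= lr * g[i];
        }
    }

    /** Nesterov momentum in one common formulation; v holds the velocity state. */
    static void nesterov(double[] x, double[] g, double[] v, double lr, double mu) {
        for (int i = 0; i < x.length; i++) {
            double vPrev = v[i];
            v[i] = mu * v[i] - lr * g[i];
            x[i] += -mu * vPrev + (1 + mu) * v[i];
        }
    }

    /** Adam; m and s hold first/second moment estimates, t is the 1-based step count. */
    static void adam(double[] x, double[] g, double[] m, double[] s, int t,
                     double lr, double beta1, double beta2, double eps) {
        for (int i = 0; i < x.length; i++) {
            m[i] = beta1 * m[i] + (1 - beta1) * g[i];
            s[i] = beta2 * s[i] + (1 - beta2) * g[i] * g[i];
            double mHat = m[i] / (1 - Math.pow(beta1, t)); // bias correction
            double sHat = s[i] / (1 - Math.pow(beta2, t));
            x[i] -= lr * mHat / (Math.sqrt(sHat) + eps);
        }
    }

    public static void main(String[] args) {
        // Minimize f(x) = x^2 (gradient 2x) from x = 5 with each rule.
        double[] a = {5.0}, b = {5.0}, c = {5.0};
        double[] vel = {0.0}, m = {0.0}, s = {0.0};
        for (int t = 1; t <= 200; t++) {
            sgd(a, new double[]{2 * a[0]}, 0.1);
            nesterov(b, new double[]{2 * b[0]}, vel, 0.1, 0.9);
            adam(c, new double[]{2 * c[0]}, m, s, t, 0.1, 0.9, 0.999, 1e-8);
        }
        System.out.printf("sgd=%.4f nesterov=%.4f adam=%.4f%n", a[0], b[0], c[0]);
    }
}
```

Note that momentum and Adam carry per-parameter state (`v`, `m`, `s`) from one call to the next, which plain SGD does not need; this extra state is the main memory cost of the more elaborate updaters.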


Hyperparameters

  • Dropout (random omission of feature detectors to prevent overfitting)
  • Sparsity (force activations of sparse/rare inputs)
  • Adagrad (feature-specific learning-rate optimization)
  • L1 and L2 regularization (weight decay)
  • Weight transforms (useful for deep autoencoders)
  • Probability distribution manipulation for initial weight generation
  • Gradient normalization and clipping
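
Several of these options are simple transformations of activations or gradients. The sketch below shows generic versions of inverted dropout, L2 weight decay added to the gradient, and two forms of gradient clipping. The class `RegularizationSketch`, its method names, and the `keepProb` parameterization are assumptions for illustration, not DL4J's configuration API.

```java
import java.util.Random;

/**
 * Illustrative sketch only (hypothetical class, not DL4J's API):
 * generic regularization and gradient-handling transformations.
 */
public class RegularizationSketch {

    /** Inverted dropout: zero each activation with probability 1 - keepProb and
     *  scale survivors by 1 / keepProb so the expected value is unchanged. */
    static double[] dropout(double[] activations, double keepProb, Random rng) {
        double[] out = new double[activations.length];
        for (int i = 0; i < activations.length; i++) {
            out[i] = rng.nextDouble() < keepProb ? activations[i] / keepProb : 0.0;
        }
        return out;
    }

    /** L2 regularization (weight decay) adds lambda * w to each weight's gradient. */
    static double[] addL2(double[] grad, double[] weights, double lambda) {
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            out[i] = grad[i] + lambda * weights[i];
        }
        return out;
    }

    /** Clips each gradient value independently to [-threshold, threshold]. */
    static double[] clipElementWise(double[] grad, double threshold) {
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) {
            out[i] = Math.max(-threshold, Math.min(threshold, grad[i]));
        }
        return out;
    }

    /** Rescales the whole gradient vector if its L2 norm exceeds maxNorm. */
    static double[] clipL2Norm(double[] grad, double maxNorm) {
        double sumSq = 0.0;
        for (double g : grad) sumSq += g * g;
        double norm = Math.sqrt(sumSq);
        double scale = norm > maxNorm ? maxNorm / norm : 1.0;
        double[] out = new double[grad.length];
        for (int i = 0; i < grad.length; i++) out[i] = grad[i] * scale;
        return out;
    }

    public static void main(String[] args) {
        double[] g = {3.0, -4.0};
        System.out.println(java.util.Arrays.toString(clipElementWise(g, 2.0))); // [2.0, -2.0]
        System.out.println(java.util.Arrays.toString(clipL2Norm(g, 1.0)));
        System.out.println(java.util.Arrays.toString(
                dropout(new double[]{1, 1, 1, 1}, 0.5, new Random(42))));
    }
}
```

Element-wise clipping can change the direction of the gradient vector, while L2-norm clipping only shrinks its length: for the gradient (3, -4) with a limit of 1, the norm-based version yields approximately (0.6, -0.8).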

Loss/objective functions

  • MSE: Mean Squared Error (linear regression)
  • EXPLL: Exponential log likelihood (Poisson regression)
  • XENT: Cross Entropy (binary classification)
  • MCXENT: Multiclass Cross Entropy
  • RMSE_XENT: RMSE Cross Entropy
  • SQUARED_LOSS: Squared Loss
  • NEGATIVELOGLIKELIHOOD: Negative Log Likelihood
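
The sketch below spells out per-example versions of four of these losses (MSE, XENT, MCXENT, and EXPLL) in plain Java. It assumes predictions are already probabilities (for XENT and MCXENT) or positive rates (for EXPLL), and it makes its own choices about averaging and clamping; the class `LossSketch` is hypothetical, and DL4J's loss implementations may normalize differently.

```java
/**
 * Illustrative sketch only (hypothetical class, not DL4J's loss classes):
 * textbook per-example loss definitions.
 */
public class LossSketch {

    static final double EPS = 1e-12; // guards against log(0)

    /** MSE: mean of squared differences between predictions and labels. */
    static double mse(double[] labels, double[] preds) {
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double d = preds[i] - labels[i];
            sum += d * d;
        }
        return sum / labels.length;
    }

    /** XENT: binary cross entropy averaged over outputs; preds are probabilities. */
    static double xent(double[] labels, double[] preds) {
        double sum = 0.0;
        for (int i = 0; i < labels.length; i++) {
            double p = Math.min(Math.max(preds[i], EPS), 1 - EPS);
            sum += -(labels[i] * Math.log(p) + (1 - labels[i]) * Math.log(1 - p));
        }
        return sum / labels.length;
    }

    /** MCXENT: multiclass cross entropy, one-hot labels and softmax probabilities. */
    static double mcxent(double[] oneHot, double[] probs) {
        double sum = 0.0;
        for (int i = 0; i < oneHot.length; i++) {
            sum += -oneHot[i] * Math.log(Math.max(probs[i], EPS));
        }
        return sum;
    }

    /** EXPLL: Poisson negative log likelihood, omitting the constant log(y!) term. */
    static double expll(double[] counts, double[] rates) {
        double sum = 0.0;
        for (int i = 0; i < counts.length; i++) {
            sum += rates[i] - counts[i] * Math.log(Math.max(rates[i], EPS));
        }
        return sum / counts.length;
    }

    public static void main(String[] args) {
        System.out.println(mse(new double[]{1, 2}, new double[]{1, 4})); // 2.0
        System.out.println(mcxent(new double[]{0, 1, 0}, new double[]{0.2, 0.5, 0.3}));
    }
}
```

With one-hot labels, MCXENT reduces to the negative log of the probability assigned to the true class, so the second line of `main` prints -log(0.5), about 0.693.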

Activation functions

Activation functions are defined in ND4J.

  • ReLU
  • Leaky ReLU
  • Tanh
  • Sigmoid
  • Hard Tanh
  • Softmax
  • Identity
  • ELU: Exponential Linear Units
  • Softsign
  • Softplus
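
For reference, the listed activations reduce to a few lines each. This sketch uses textbook definitions on `double` values; the class `ActivationSketch` and the choice to pass the leaky ReLU and ELU slope `a` explicitly are assumptions, not ND4J's API, and ND4J's defaults for those parameters may differ. Softplus and softmax are written in numerically stable forms.

```java
/**
 * Illustrative sketch only (hypothetical class, not ND4J's activation API):
 * textbook activation functions.
 */
public class ActivationSketch {

    static double relu(double x)                { return Math.max(0.0, x); }
    static double leakyRelu(double x, double a) { return x >= 0 ? x : a * x; }
    static double tanh(double x)                { return Math.tanh(x); }
    static double sigmoid(double x)             { return 1.0 / (1.0 + Math.exp(-x)); }
    static double hardTanh(double x)            { return Math.max(-1.0, Math.min(1.0, x)); }
    static double identity(double x)            { return x; }
    static double elu(double x, double a)       { return x >= 0 ? x : a * (Math.exp(x) - 1.0); }
    static double softsign(double x)            { return x / (1.0 + Math.abs(x)); }

    /** Softplus log(1 + e^x), rearranged to avoid overflow for large |x|. */
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    /** Softmax over a vector; subtracting the max keeps exp() from overflowing. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);
        double sum = 0.0;
        double[] out = new double[z.length];
        for (int i = 0; i < z.length; i++) {
            out[i] = Math.exp(z[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < z.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(relu(-2.0));            // 0.0
        System.out.println(leakyRelu(-2.0, 0.01)); // -0.02
        System.out.println(softsign(1.0));         // 0.5
        System.out.println(java.util.Arrays.toString(softmax(new double[]{1, 2, 3})));
    }
}
```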