Updaters

What are updaters?

The main difference among the updaters is how they treat the learning rate. Stochastic gradient descent (SGD), the most common learning algorithm in deep learning, relies on theta (the weights in the hidden layers) and alpha (the learning rate). Different updaters adapt the learning rate, often per parameter, to help the neural network converge on its most performant state.
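At its core, plain SGD applies the rule theta = theta - alpha * gradient; every updater below is a variation on how that step is scaled. A minimal sketch of the basic rule in plain Java (the array stands in for a layer's flattened parameters; names are illustrative, not DL4J's implementation):

```java
public class SgdSketch {
    // In-place SGD step: theta_i -= alpha * grad_i
    public static void sgdStep(double[] theta, double[] grad, double alpha) {
        for (int i = 0; i < theta.length; i++) {
            theta[i] -= alpha * grad[i];
        }
    }
}
```

Every weight gets the same step size alpha here; the adaptive updaters below replace that constant with a per-weight, history-dependent scale.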

Usage

To use an updater, pass a new updater instance to the updater() method when configuring either a ComputationGraph or MultiLayerNetwork.

ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    // add your layers and hyperparameters below
    .build();

Available updaters


NoOpUpdater

[source]

NoOp updater: gradient updater that makes no changes to the gradient


RmsPropUpdater

[source]

RMSProp updates:

http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
http://cs231n.github.io/neural-networks-3/#ada
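As the slides above describe, RMSProp divides the learning rate by a moving average of recent squared gradients, so weights with consistently large gradients take smaller steps. A minimal sketch of that rule (illustrative names, not the DL4J implementation):

```java
public class RmsPropSketch {
    private final double lr, decay, eps;
    private final double[] cache; // moving average of squared gradients

    public RmsPropSketch(int n, double lr, double decay, double eps) {
        this.lr = lr; this.decay = decay; this.eps = eps;
        this.cache = new double[n];
    }

    public void step(double[] theta, double[] grad) {
        for (int i = 0; i < theta.length; i++) {
            // Exponential moving average of squared gradients
            cache[i] = decay * cache[i] + (1 - decay) * grad[i] * grad[i];
            // Per-weight step scaled by the RMS of recent gradients
            theta[i] -= lr * grad[i] / (Math.sqrt(cache[i]) + eps);
        }
    }
}
```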


SgdUpdater

[source]

SGD updater applies a learning rate only


AMSGradUpdater

[source]

The AMSGrad updater
Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ


NesterovsUpdater

[source]

Nesterov’s momentum. Keeps track of the previous update (the velocity) and uses it to modify the current gradient update.

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Get the nesterov update

  • param gradient the gradient to get the update for
  • param iteration
  • return
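The velocity idea above can be sketched in plain Java. This uses the common reformulation of Nesterov momentum (as presented in the cs231n notes): v = momentum * v - lr * g, followed by a look-ahead correction on the parameters. Names are illustrative, not DL4J's implementation:

```java
public class NesterovSketch {
    private final double lr, momentum;
    private final double[] v; // velocity, one entry per parameter

    public NesterovSketch(int n, double lr, double momentum) {
        this.lr = lr; this.momentum = momentum;
        this.v = new double[n];
    }

    public void step(double[] theta, double[] grad) {
        for (int i = 0; i < theta.length; i++) {
            double vPrev = v[i];
            v[i] = momentum * v[i] - lr * grad[i];
            // Nesterov look-ahead correction applied to the parameters
            theta[i] += -momentum * vPrev + (1 + momentum) * v[i];
        }
    }
}
```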

AdaMaxUpdater

[source]

The AdaMax updater, a variant of Adam. http://arxiv.org/abs/1412.6980

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for
  • param iteration
  • return the gradient

AdaDeltaUpdater

[source]

http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf
https://arxiv.org/pdf/1212.5701v1.pdf

AdaDelta updater. A more robust AdaGrad that keeps a moving window average of squared gradients rather than AdaGrad's ever-decaying learning rates.

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Get the updated gradient for the given gradient and also update the state of ada delta.

  • param gradient the gradient to get the updated gradient for
  • param iteration
  • return the updated gradient
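Per the papers above, AdaDelta maintains two decaying averages, one of squared gradients and one of squared updates, and needs no explicit learning rate. A minimal sketch of that rule (illustrative names, not DL4J's implementation):

```java
public class AdaDeltaSketch {
    private final double rho, eps;
    private final double[] avgSqGrad;   // decaying average of squared gradients
    private final double[] avgSqUpdate; // decaying average of squared updates

    public AdaDeltaSketch(int n, double rho, double eps) {
        this.rho = rho; this.eps = eps;
        this.avgSqGrad = new double[n];
        this.avgSqUpdate = new double[n];
    }

    public void step(double[] theta, double[] grad) {
        for (int i = 0; i < theta.length; i++) {
            avgSqGrad[i] = rho * avgSqGrad[i] + (1 - rho) * grad[i] * grad[i];
            // The ratio of RMS values acts as an automatic per-weight learning rate
            double update = -Math.sqrt(avgSqUpdate[i] + eps)
                    / Math.sqrt(avgSqGrad[i] + eps) * grad[i];
            avgSqUpdate[i] = rho * avgSqUpdate[i] + (1 - rho) * update * update;
            theta[i] += update;
        }
    }
}
```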

GradientUpdater

[source]

Gradient modifications: Calculates an update and tracks related information for gradient changes over time for handling updates.


AdaGradUpdater

[source]

Vectorized Learning Rate used per Connection Weight

Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent
See also: http://cs231n.github.io/neural-networks-3/#ada

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Gets feature-specific learning rates. AdaGrad keeps a history of the gradients passed in; each weight's effective learning rate adapts over time based on that history, hence the name AdaGrad.

  • param gradient the gradient to get learning rates for
  • param iteration
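The per-weight adaptation described above comes from accumulating squared gradients and dividing the step by their square root. A minimal sketch (illustrative names, not DL4J's implementation):

```java
public class AdaGradSketch {
    private final double lr, eps;
    private final double[] histSqGrad; // accumulated squared gradients per weight

    public AdaGradSketch(int n, double lr, double eps) {
        this.lr = lr; this.eps = eps;
        this.histSqGrad = new double[n];
    }

    public void step(double[] theta, double[] grad) {
        for (int i = 0; i < theta.length; i++) {
            histSqGrad[i] += grad[i] * grad[i];
            // Weights with a large gradient history take progressively smaller steps
            theta[i] -= lr * grad[i] / (Math.sqrt(histSqGrad[i]) + eps);
        }
    }
}
```

Because the history only grows, the effective learning rate shrinks monotonically; AdaDelta and RMSProp address exactly this by using decaying averages instead.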

AdamUpdater

[source]

The Adam updater. http://arxiv.org/abs/1412.6980

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for
  • param iteration
  • return the gradient
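The Adam rule in the paper above combines a momentum-style first-moment estimate with an RMSProp-style second-moment estimate, plus bias correction for the early iterations. A minimal sketch in plain Java (illustrative names, not DL4J's implementation):

```java
public class AdamSketch {
    private final double lr, beta1, beta2, eps;
    private final double[] m, v; // first and second moment estimates
    private int t = 0;           // timestep, for bias correction

    public AdamSketch(int n, double lr, double beta1, double beta2, double eps) {
        this.lr = lr; this.beta1 = beta1; this.beta2 = beta2; this.eps = eps;
        this.m = new double[n];
        this.v = new double[n];
    }

    public void step(double[] theta, double[] grad) {
        t++;
        for (int i = 0; i < theta.length; i++) {
            m[i] = beta1 * m[i] + (1 - beta1) * grad[i];
            v[i] = beta2 * v[i] + (1 - beta2) * grad[i] * grad[i];
            // Bias correction: the moments start at zero, so early
            // estimates are rescaled toward their true magnitude
            double mHat = m[i] / (1 - Math.pow(beta1, t));
            double vHat = v[i] / (1 - Math.pow(beta2, t));
            theta[i] -= lr * mHat / (Math.sqrt(vHat) + eps);
        }
    }
}
```

On the first step the bias-corrected moments exactly recover the raw gradient, so the initial update is close to lr in magnitude regardless of gradient scale.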

NadamUpdater

[source]

The Nadam updater. https://arxiv.org/pdf/1609.04747.pdf

applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch) 

Calculate the update based on the given gradient

  • param gradient the gradient to get the update for
  • param iteration
  • return the gradient
