## What are activations?

At a simple level, an activation function decides whether, and how strongly, a neuron is activated. This determines whether the information the neuron receives is relevant for the given input. The activation function is a non-linear transformation applied to the neuron's input signal, and the transformed output is sent to the next neuron.
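To make this concrete, here is a minimal plain-Java sketch of a single neuron. It does not use DL4J; the class and method names are hypothetical, and the weighted-sum-plus-bias input and the choice of sigmoid as the non-linearity are assumptions made for illustration.

```java
// Illustrative sketch only: NeuronSketch and its methods are hypothetical names, not DL4J APIs.
class NeuronSketch {
    // Input signal of the neuron: weighted sum of the inputs plus a bias (assumed form).
    static double preActivation(double[] inputs, double[] weights, double bias) {
        double sum = bias;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i] * weights[i];
        }
        return sum;
    }

    // Logistic sigmoid, used here as the example non-linear transformation.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Transformed output that would be sent to the next neuron.
    static double forward(double[] inputs, double[] weights, double bias) {
        return sigmoid(preActivation(inputs, weights, bias));
    }
}
```

Swapping `sigmoid` for another function changes how the neuron responds to the same input signal.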

## Usage

The recommended way to use an activation is to add an activation layer to your neural network and configure it with the desired activation:

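``````java
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    // add hyperparameters
    .graphBuilder()
    // add inputs and other layers
    // "softmax" and "previous_input" are placeholder layer names
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and outputs
    .build();
``````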

## Available activations

### ActivationRectifiedTanh

Rectified tanh

Essentially max(0, tanh(x))

Underlying implementation is in native code

### ActivationELU

f(x) = alpha (exp(x) - 1.0) for x < 0, f(x) = x for x >= 0

alpha defaults to 1, if not specified
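The piecewise ELU definition can be sketched in plain Java as follows. This is an illustration of the formula, not the ND4J implementation; `EluSketch` is a hypothetical name.

```java
// Illustrative sketch of the ELU formula; not the ND4J implementation.
class EluSketch {
    // f(x) = alpha * (exp(x) - 1) for x < 0, f(x) = x for x >= 0.
    // Math.expm1(x) computes exp(x) - 1 accurately for x near zero.
    static double elu(double x, double alpha) {
        return x >= 0 ? x : alpha * Math.expm1(x);
    }

    // Convenience overload using the default alpha of 1.
    static double elu(double x) {
        return elu(x, 1.0);
    }
}
```

For large negative inputs the output saturates towards -alpha.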

### ActivationReLU

f(x) = max(0, x)

### ActivationRationalTanh

Rational tanh approximation, from https://arxiv.org/pdf/1508.01292v3

f(x) = 1.7159 tanh(2x/3), where tanh is approximated as tanh(y) ~ sgn(y) (1 - 1/(1 + |y| + y^2 + 1.41645 y^4))

Underlying implementation is in native code
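Since the actual implementation is native, the following plain-Java sketch only mirrors the formula given for this activation so the approximation can be compared against `Math.tanh`. The class and method names are hypothetical.

```java
// Illustrative sketch of the rational tanh formula; the real implementation is native code.
class RationalTanhSketch {
    // tanh(y) ~ sgn(y) * (1 - 1 / (1 + |y| + y^2 + 1.41645 * y^4))
    static double tanhApprox(double y) {
        double absY = Math.abs(y);
        double y2 = y * y;
        return Math.signum(y) * (1.0 - 1.0 / (1.0 + absY + y2 + 1.41645 * y2 * y2));
    }

    // f(x) = 1.7159 * tanh(2x / 3), using the approximation above.
    static double rationalTanh(double x) {
        return 1.7159 * tanhApprox(2.0 * x / 3.0);
    }
}
```

The approximation avoids calls to `exp`, is odd-symmetric like tanh, and saturates at ±1.7159.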

### ActivationThresholdedReLU

Thresholded RELU

f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0

### ActivationReLU6

f(x) = min(max(x, cutoff), 6)

With cutoff = 0 this is the standard ReLU6, f(x) = min(max(x, 0), 6).
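As a rough illustration, the clamp can be written in plain Java as below. The names are hypothetical, and the default cutoff of 0 in the one-argument overload is an assumption based on the usual definition of ReLU6 rather than something stated on this page.

```java
// Illustrative sketch of ReLU6; not the ND4J implementation.
class Relu6Sketch {
    // Clamp from below at cutoff, then from above at 6.
    static double relu6(double x, double cutoff) {
        return Math.min(Math.max(x, cutoff), 6.0);
    }

    // Assumed default: cutoff = 0, i.e. min(max(x, 0), 6).
    static double relu6(double x) {
        return relu6(x, 0.0);
    }
}
```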

### ActivationHardTanH

f(x) = 1 for x > 1, f(x) = -1 for x < -1, f(x) = x otherwise

### ActivationSigmoid

f(x) = 1 / (1 + exp(-x))

### ActivationGELU

GELU activation function - Gaussian Error Linear Units
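No formula is listed for this activation. GELU is defined as x Φ(x), where Φ is the standard normal CDF, and a widely used closed form is the tanh approximation 0.5 x (1 + tanh(sqrt(2/π) (x + 0.044715 x^3))). The plain-Java sketch below implements that approximation; whether `ActivationGELU` uses the exact form or the approximation is not stated here, so treat this as an assumption. `GeluSketch` is a hypothetical name.

```java
// Illustrative sketch of the tanh approximation to GELU; not the ND4J implementation.
class GeluSketch {
    // 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
    static double geluTanh(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }
}
```

For large positive x the result approaches x, and for large negative x it approaches 0.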

### ActivationPReLU

Parametrized Rectified Linear Unit (PReLU)

f(x) = alpha x for x < 0, f(x) = x for x >= 0

alpha has the same shape as x and is a learned parameter.
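The element-wise nature of alpha can be sketched in plain Java as follows. In a real network alpha would be learned during training; here it is simply passed in as a fixed array, and `PReluSketch` is a hypothetical name rather than a DL4J class.

```java
// Illustrative sketch of PReLU with one alpha per element; not the ND4J implementation.
class PReluSketch {
    // f(x_i) = alpha_i * x_i for x_i < 0, f(x_i) = x_i for x_i >= 0.
    static double[] prelu(double[] x, double[] alpha) {
        if (x.length != alpha.length) {
            throw new IllegalArgumentException("alpha must have the same shape as x");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] >= 0 ? x[i] : alpha[i] * x[i];
        }
        return out;
    }
}
```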

### ActivationIdentity

f(x) = x

### ActivationSoftSign

f_i(x) = x_i / (1 + |x_i|)

### ActivationHardSigmoid

f(x) = min(1, max(0, 0.2x + 0.5))

### ActivationSoftmax

f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift) where shift = max_i(x_i)
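Subtracting the maximum before exponentiating keeps every exponent at or below zero, so `exp` cannot overflow, and the result is mathematically unchanged. A minimal plain-Java sketch of the formula (hypothetical class name, not the ND4J implementation):

```java
// Illustrative sketch of the shifted (numerically stable) softmax; not the ND4J implementation.
class SoftmaxSketch {
    static double[] softmax(double[] x) {
        // shift = max_i(x_i)
        double shift = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            shift = Math.max(shift, v);
        }
        // Exponentiate the shifted values and accumulate the normalizer.
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - shift);
            sum += out[i];
        }
        // Normalize so the outputs sum to 1.
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }
}
```

Without the shift, an input such as `{1000, 1000}` would produce `exp(1000)`, which overflows to infinity and yields NaN after division.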

### ActivationCube

f(x) = x^3

### ActivationRReLU

f(x) = max(0,x) + alpha min(0, x)

alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 during testing. l and u default to 1/8 and 1/3 respectively.

Reference: Empirical Evaluation of Rectified Activations in Convolutional Network
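The difference between training and test behaviour can be sketched in plain Java as below. This is an illustration under the defaults quoted for this activation, not the ND4J implementation. The names are hypothetical, and drawing a single alpha per call (rather than per element) is a simplification.

```java
// Illustrative sketch of randomized leaky ReLU; not the ND4J implementation.
class RReluSketch {
    static final double L = 1.0 / 8.0; // default lower bound l
    static final double U = 1.0 / 3.0; // default upper bound u

    // f(x) = max(0, x) + alpha * min(0, x)
    static double rrelu(double x, double alpha) {
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    // Training: alpha is drawn from uniform(l, u).
    static double sampleAlpha(java.util.Random rng) {
        return L + (U - L) * rng.nextDouble();
    }

    // Test: alpha is fixed to the midpoint (l + u) / 2.
    static double testAlpha() {
        return (L + U) / 2.0;
    }
}
```

With the defaults, the test-time slope for negative inputs is 11/48, roughly 0.229.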

### ActivationTanH

f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

### ActivationSELU

Scaled exponential linear unit (SELU), from https://arxiv.org/pdf/1706.02515.pdf
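The linked paper defines SELU as lambda x for x > 0 and lambda alpha (exp(x) - 1) otherwise, with fixed constants alpha ≈ 1.6733 and lambda ≈ 1.0507. A plain-Java sketch of that definition follows; `SeluSketch` is a hypothetical name and this is not the ND4J implementation.

```java
// Illustrative sketch of SELU using the constants from the paper; not the ND4J implementation.
class SeluSketch {
    static final double ALPHA = 1.6732632423543772;
    static final double LAMBDA = 1.0507009873554805;

    // f(x) = lambda * x for x > 0, f(x) = lambda * alpha * (exp(x) - 1) otherwise.
    static double selu(double x) {
        return x > 0 ? LAMBDA * x : LAMBDA * ALPHA * Math.expm1(x);
    }
}
```

For large negative inputs the output saturates at -lambda alpha, roughly -1.758.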

### ActivationLReLU

Leaky ReLU

f(x) = max(0, x) + alpha min(0, x)

alpha defaults to 0.01

### ActivationSwish

f(x) = x sigmoid(x)

### ActivationSoftPlus

f(x) = log(1+e^x)
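Evaluated literally, log(1 + e^x) overflows for large x because e^x becomes infinite. The plain-Java sketch below uses the equivalent form max(x, 0) + log(1 + e^(-|x|)), which stays finite. It illustrates the formula only and makes no claim about how ND4J computes it; `SoftPlusSketch` is a hypothetical name.

```java
// Illustrative sketch of a numerically stable softplus; not the ND4J implementation.
class SoftPlusSketch {
    // log(1 + e^x) rewritten as max(x, 0) + log1p(e^(-|x|)).
    // The exponent is never positive, so Math.exp cannot overflow.
    static double softplus(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }
}
```

For example, `softplus(1000.0)` returns 1000.0, whereas `Math.log(1 + Math.exp(1000.0))` evaluates to infinity.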
