Embedding layer: a feed-forward layer that expects a single integer per example as input (a class number, in the range 0 to numClasses-1).
This input has shape [numExamples, 1] instead of the [numExamples, numClasses] shape of the equivalent one-hot representation.
Mathematically, EmbeddingLayer is equivalent to using a DenseLayer with a one-hot representation for the input; however,
it can be much more efficient with a large number of classes (as a dense layer + one-hot input does a matrix multiply
with all but one input value being zero).
Note: this layer can only be used as the first layer in a network.
Note 2: for an example whose input is class index i, the output is activationFunction(weights.getRow(i) + bias); hence each row of the
weight matrix can be considered the vector/embedding for the corresponding class index.
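The equivalence described above can be sketched in plain Java (this is an illustrative example, not DL4J code; the class and method names are hypothetical): an embedding lookup simply returns row i of the weight matrix, which is exactly what a dense layer computes when fed a one-hot vector, minus the wasted multiplications by zero.

```java
import java.util.Arrays;

// Sketch: why an embedding lookup is mathematically equivalent to a dense
// layer applied to a one-hot input (bias and activation omitted for brevity).
public class EmbeddingEquivalence {

    // Embedding: the output for class index i is simply row i of the weight matrix.
    static double[] embeddingLookup(double[][] weights, int classIdx) {
        return weights[classIdx].clone();
    }

    // Dense layer on a one-hot vector: a full matrix multiply in which
    // all but one term of each sum is zero.
    static double[] denseOneHot(double[][] weights, int classIdx, int numClasses) {
        double[] oneHot = new double[numClasses];
        oneHot[classIdx] = 1.0;
        double[] out = new double[weights[0].length];
        for (int i = 0; i < numClasses; i++)
            for (int j = 0; j < out.length; j++)
                out[j] += oneHot[i] * weights[i][j];
        return out;
    }

    public static void main(String[] args) {
        // 3 classes, embedding size 2: weight row i is the embedding of class i.
        double[][] w = {{0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6}};
        System.out.println(Arrays.equals(
                embeddingLookup(w, 1), denseOneHot(w, 1, 3))); // prints "true"
    }
}
```

The embedding layer exploits this by skipping the matrix multiply entirely and indexing the weight row directly, which is why it is much cheaper for large numbers of classes.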
epsilon - w^(L+1) * delta^(L+1). Or, equivalently: dC/da, i.e., (dC/dz) * (dz/da) = dC/da, where C
is the cost function, z is the pre-activation, and a = sigma(z) is the activation.
workspaceMgr - Workspace manager
Pair where Gradient is the gradient for this layer, and INDArray is the epsilon (activation gradient)
needed by the next layer, before the element-wise multiplication by sigmaPrime(z). So for a standard feed-forward layer, if this layer is
L, then return.getSecond() == dL/dIn = (w^(L) * (delta^(L))^T)^T. Note that the returned array should be placed in the
ArrayType.ACTIVATION_GRAD workspace via the workspace manager.
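The shape of the returned epsilon can be illustrated with a minimal plain-Java sketch (not DL4J code; names are hypothetical). Assuming DL4J's row-major conventions, delta has shape [numExamples, nOut] and the weights have shape [nIn, nOut], so (w * delta^T)^T is the same as delta * w^T, with shape [numExamples, nIn]:

```java
import java.util.Arrays;

// Sketch: the activation gradient passed back from a dense layer,
// dL/dIn = (W * delta^T)^T, i.e. delta * W^T, computed with plain arrays.
public class BackpropSketch {

    // delta: [numExamples, nOut]; weights: [nIn, nOut]; result: [numExamples, nIn]
    static double[][] epsilonNext(double[][] delta, double[][] weights) {
        int numExamples = delta.length, nIn = weights.length, nOut = weights[0].length;
        double[][] out = new double[numExamples][nIn];
        for (int e = 0; e < numExamples; e++)
            for (int i = 0; i < nIn; i++)
                for (int o = 0; o < nOut; o++)
                    out[e][i] += delta[e][o] * weights[i][o]; // delta * W^T
        return out;
    }

    public static void main(String[] args) {
        double[][] delta = {{1.0, 2.0}};          // one example, nOut = 2
        double[][] w = {{0.5, 0.0}, {0.0, 0.5}};  // nIn = 2, nOut = 2
        System.out.println(Arrays.deepToString(epsilonNext(delta, w)));
        // prints "[[0.5, 1.0]]"
    }
}
```

In DL4J itself this result would be an INDArray allocated in the ArrayType.ACTIVATION_GRAD workspace rather than a plain array; the sketch only shows the arithmetic.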