During the backward pass, DL4J layers typically produce the sum of the gradients over the examples in the minibatch; if required
(i.e., when minibatch=true), this sum is then divided by the minibatch size to obtain the average.
However, there are some exceptions, such as the batch norm mean/variance estimate parameters: these "gradients"
are not actually gradients, but updates to be applied directly to the parameter vector. Put another way,
most gradients should be divided by the minibatch size to get the average, whereas some "gradients" are already final
updates and should not be divided by the minibatch size.
paramName - Name of the parameter
Returns: true if the gradient for this parameter should be divided by the minibatch size (most parameters); false otherwise (edge cases such as the batch norm mean/variance estimates)
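To illustrate how such a per-parameter flag might be consumed, here is a minimal sketch in plain Java (no ND4J). The class name, the parameter names ("W", "mean"), and the `normalize` helper are hypothetical, chosen only for illustration; this is not DL4J's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class GradientNormalization {

    // Hypothetical flag lookup: most parameters hold summed gradients that
    // should be averaged over the minibatch; batch norm mean/variance
    // "gradients" are final updates and must be left untouched.
    static boolean updaterDivideByMinibatch(String paramName) {
        return !paramName.equals("mean") && !paramName.equals("var");
    }

    // Apply the rule described above: divide summed gradients by the
    // minibatch size only for parameters where averaging is appropriate.
    static Map<String, double[]> normalize(Map<String, double[]> grads, int minibatchSize) {
        Map<String, double[]> out = new HashMap<>();
        for (Map.Entry<String, double[]> e : grads.entrySet()) {
            double[] g = e.getValue().clone();
            if (updaterDivideByMinibatch(e.getKey())) {
                for (int i = 0; i < g.length; i++) {
                    g[i] /= minibatchSize;
                }
            }
            out.put(e.getKey(), g);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, double[]> grads = new HashMap<>();
        grads.put("W", new double[]{8.0, 4.0}); // summed weight gradient
        grads.put("mean", new double[]{0.5});   // batch norm mean update (already final)
        Map<String, double[]> n = normalize(grads, 4);
        // "W" is averaged (8.0 / 4 = 2.0); "mean" passes through unchanged
        System.out.println(n.get("W")[0] + " " + n.get("mean")[0]);
    }
}
```

Running this prints `2.0 0.5`: the weight gradient is averaged over the minibatch of 4, while the batch norm mean update is applied as-is.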