Interface  Description 

NativeOps 
Native interface for op execution on CPU

Class  Description 

BaseNativeNDArrayFactory 
Base class with NativeOps
LongPointerWrapper 
Wrapper for DoublePointer -> LongPointer

NativeLapack 
Created by agibsonccc on 2/20/16.

NativeOpsGPUInfoProvider  
NativeOpsHolder  
Nd4jBlas 
CBlas bindings
Original credit:
https://github.com/uncomplicate/neanderthal-atlas

Nd4jCpu  
Nd4jCpu._loader 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.absolute_difference_loss 
Implementation of the Absolute Difference loss function: |predictions - labels|
Input arrays:
0: predictions - the predicted values, type float.
1: weights - used for weighting (multiplying) loss values, type float.
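The per-element formula above can be sketched in pure Python (a minimal illustration only; the function name and the element-wise weighting with no reduction are assumptions, the native op also supports reduction modes not shown here):

```python
def absolute_difference_loss(predictions, labels, weights):
    # per-element |predictions - labels|, scaled by the weights array
    return [abs(p - l) * w for p, l, w in zip(predictions, labels, weights)]
```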

Nd4jCpu.absolute_difference_loss_grad  
Nd4jCpu.add 
This is one of the auto-broadcastable operations.

Nd4jCpu.add_bp  
Nd4jCpu.adjust_contrast 
This operation adjusts image contrast by a given factor ( z = (x - mean) * factor + mean )
Input arrays:
0 - input array with rank >= 3; the last dimension must equal 3, i.e. the dimension containing channels.
1 - optional argument, input scalar array containing the contrast factor
T arguments:
0 - optional argument, contrast factor
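The formula z = (x - mean) * factor + mean can be sketched per channel in pure Python (illustrative only; the function name is an assumption and the native op operates on full rank >= 3 image tensors, not flat lists):

```python
def adjust_contrast(channel, factor):
    # z = (x - mean) * factor + mean, with the mean taken over the channel values
    mean = sum(channel) / len(channel)
    return [(x - mean) * factor + mean for x in channel]
```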

Nd4jCpu.adjust_contrast_v2  
Nd4jCpu.adjust_hue 
This operation adjusts image hue by delta
Input arrays:
0 - input array with rank >= 3, must have at least one dimension equal to 3, i.e. the dimension containing channels.
1 - optional argument, input scalar array containing delta
T arguments:
0 - optional argument, delta value
Int arguments:
0 - optional argument, corresponds to the dimension with 3 channels

Nd4jCpu.adjust_saturation 
This operation adjusts image saturation by delta
Input arrays:
0 - input array with rank >= 3, must have at least one dimension equal to 3, i.e. the dimension containing channels.
1 - optional argument, input scalar array containing the saturation factor
T arguments:
0 - optional argument, saturation factor
Int arguments:
0 - optional argument, corresponds to the dimension with 3 channels

Nd4jCpu.alpha_dropout_bp  
Nd4jCpu.apply_sgd 
This operation updates parameters with the provided gradients, wrt the learning rate
Expected arguments:
x: parameters, any shape
y: gradients, same shape as x
lr: optional, learning rate
T args:
0: optional, learning rate

Nd4jCpu.argmax 
This operation returns the index of the max element in a given NDArray (optionally along given dimension(s))
Expected input:
0: N-dimensional array
1: optional axis vector
Int args:
0: optional axis

Nd4jCpu.argmin 
This operation returns the index of the min element in a given NDArray (optionally along given dimension(s))
Expected input:
0: N-dimensional array
1: optional axis vector
Int args:
0: optional axis

Nd4jCpu.ArgumentsList 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.Assert  
Nd4jCpu.assign 
This is one of the auto-broadcastable operations.

Nd4jCpu.assign_bp  
Nd4jCpu.avgpool2d 
This op implements average pooling for convolutional networks.

Nd4jCpu.avgpool2d_bp  
Nd4jCpu.avgpool3dnew  
Nd4jCpu.avgpool3dnew_bp  
Nd4jCpu.axpy 
This op is a simple implementation of the BLAS AXPY method.

Nd4jCpu.batch_to_space  
Nd4jCpu.batch_to_space_nd  
Nd4jCpu.batched_gemm 
This operation implements batched matrix multiplication
Expected arguments:
alpha: vector of T
beta: vector of T
...: A, B matrices sequentially, i.e. AAAAABBBBB
Integer arguments:
transA, transB, M, N, K, ldA, ldB, ldC - usual BLAS gemm arguments
batchCount - number of operations in this batch
PLEASE NOTE: M, N, K, ldA, ldB, ldC should be equal for all matrices within the batch.

Nd4jCpu.batchnorm 
Batch normalization implementation.

Nd4jCpu.batchnorm_bp 
Backprop for batch normalization
Expected arguments:
input: input array (any number of dimensions)
mean:
variance:
gamma: optional
beta: optional
dLdOut: next epsilon
Int args:
0: apply scale
1: apply offset
T args:
0: epsilon
output arrays:
dL/dInput
dL/dMean
dL/dVariance
dL/dGamma, optional
dL/dBeta, optional

Nd4jCpu.betainc 
This op calculates the regularized incomplete beta integral Ix(a, b).

Nd4jCpu.biasadd 
This operation is added for compatibility purposes mostly.

Nd4jCpu.biasadd_bp  
Nd4jCpu.bincount 
The bincount operation returns a vector with element counts.

Nd4jCpu.bitcast 
This operation changes the type of the input and modifies the output shape to conform with the given data type;
otherwise the same as the op above

Nd4jCpu.bits_hamming_distance 
This operation returns the Hamming distance based on bits
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.bitwise_and 
This operation applies bitwise AND
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.bitwise_or 
This operation applies bitwise OR
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.bitwise_xor 
This operation applies bitwise XOR
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.boolean_and  
Nd4jCpu.boolean_not  
Nd4jCpu.boolean_or  
Nd4jCpu.boolean_xor  
Nd4jCpu.BooleanOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.broadcast_dynamic_shape 
broadcast_dynamic_shape op.

Nd4jCpu.broadcast_to 
This op broadcasts the given input up to the given shape
inputs:
input array - the array to be broadcast to the given shape
shape array - array containing the shape to broadcast to

Nd4jCpu.BroadcastableBoolOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.BroadcastableOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.broadcastgradientargs  
Nd4jCpu.cast 
This operation casts elements of the input array to the specified data type
PLEASE NOTE: This op is disabled at the moment, and reserved for future releases.

Nd4jCpu.check_numerics 
This op checks for Inf/NaN values within the input array, and throws an exception if there is at least one

Nd4jCpu.cholesky  
Nd4jCpu.choose 
This op takes either 1 argument and 1 scalar,
or 1 argument and another comparison array,
and runs a predefined conditional op.

Nd4jCpu.clip_by_global_norm 
clips a list of given tensors with the given average norm when needed
Input:
a list of tensors (at least one)
Input floating point argument:
clip_norm - a value used as the threshold and as the norm to clip to
return: a list of clipped tensors,
with global_norm as a scalar tensor at the end

Nd4jCpu.clipbyavgnorm  
Nd4jCpu.clipbyavgnorm_bp  
Nd4jCpu.clipbynorm  
Nd4jCpu.clipbynorm_bp  
Nd4jCpu.clipbyvalue 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.clone_list 
This operation clones the given NDArrayList

Nd4jCpu.col2im 
This op implements the col2im algorithm, widely used in convolutional neural networks
Input: 6D input expected (like the output of the im2col op)
Int args:
0: stride height
1: stride width
2: padding height
3: padding width
4: image height
5: image width
6: dilation height
7: dilation width

Nd4jCpu.compare_and_bitpack 
compare_and_bitpack - compares input against a threshold (greater-than) and packs the result as uint8
input params:
0 - NDArray (input)
1 - 0D tensor - threshold
output:
0 - NDArray with the same shape as input, type uint8

Nd4jCpu.concat  
Nd4jCpu.concat_bp  
Nd4jCpu.Conditional  
Nd4jCpu.confusion_matrix 
This operation calculates the confusion matrix for a
pair of prediction and label 1D arrays.

Nd4jCpu.ConstantDataBuffer 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.ConstantDescriptor 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.ConstNDArrayVector  
Nd4jCpu.ConstNDArrayVector.Iterator  
Nd4jCpu.Context 
This class defines the inputs desired for any given node/operation within the graph

Nd4jCpu.ContextBuffers 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.ContextPrototype 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.conv1d 
1D temporal convolution implementation
Expected input:
x: 3D array
weight: 3D Array
bias: optional vector
Int args:
0: kernel
1: stride
2: padding

Nd4jCpu.conv1d_bp  
Nd4jCpu.conv2d 
2D convolution implementation
Expected input:
x: 4D array
weight: 4D Array
bias: optional vector, length of outputChannels
IntArgs:
0: kernel height
1: kernel width
2: stride height
3: stride width
4: padding height
5: padding width
6: dilation height
7: dilation width
8: same mode: 1 true, 0 false
9: data format: 1 NHWC, 0 NCHW

Nd4jCpu.conv2d_bp  
Nd4jCpu.conv2d_input_bp  
Nd4jCpu.conv3dnew  
Nd4jCpu.conv3dnew_bp  
Nd4jCpu.cosine_distance_loss 
Implementation of the cosine-distance loss function: 1 - (predictions * labels).reduce_sum_along(dimension)
Input arrays:
0: predictions - the predicted values, type float
1: weights - used for weighting (multiplying) loss values, type float.

Nd4jCpu.cosine_distance_loss_grad  
Nd4jCpu.create 
This operation creates a new array
Input:
array with shape values
IArgs:
order value
data type value
BArgs:
initialization option

Nd4jCpu.create_list 
This operation creates a new empty NDArrayList

Nd4jCpu.crelu 
This is the Concatenated ReLU implementation.

Nd4jCpu.crelu_bp  
Nd4jCpu.crop_and_resize 
This op performs a bilinear or nearest-neighbor interpolated resize for the given tensor
input array:
0 - 4D tensor with shape (batch, sizeX, sizeY, channels), numeric type
1 - 2D tensor with shape (num_boxes, 4), float type
2 - 1D tensor with shape (num_boxes), int type
3 - 1D tensor with 2 values (newWidth, newHeight) (optional), int type
float arguments (optional):
0 - extrapolation_value (optional), default 0.f
int arguments (optional):
0 - mode (default 0 - bilinear interpolation)
output array:
4D tensor with the given images resized to crop_size - float type

Nd4jCpu.cross 
This op calculates the cross-product between input arguments
Input arguments:
0 - vector or tensor A
1 - vector or tensor B

Nd4jCpu.cube 
This is the Cube activation function.

Nd4jCpu.cube_bp  
Nd4jCpu.cumprod  
Nd4jCpu.cumprod_bp  
Nd4jCpu.cumsum  
Nd4jCpu.cumsum_bp  
Nd4jCpu.CurrentIndexing 
Indexing information
for bounds checking

Nd4jCpu.cyclic_rshift_bits 
This operation cyclically shifts the individual bits of each element in the array to the right
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.cyclic_shift_bits 
This operation cyclically shifts the individual bits of each element in the array to the left
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.DataBuffer 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.DebugInfo 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.DeclarableCustomOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.DeclarableListOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.DeclarableOp 
This class is the basic building block of Graph Operations.

Nd4jCpu.DeclarableReductionOp 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.deconv2d 
2D deconvolution implementation
IntArgs:
0: kernel height
1: kernel width
2: stride height
3: stride width
4: padding height
5: padding width
6: dilation height
7: dilation width
8: same mode: 0 false, 1 true

Nd4jCpu.deconv2d_bp  
Nd4jCpu.deconv2d_tf  
Nd4jCpu.deconv3d 
3D deconvolution implementation
IntArgs:
0: filter(kernel) depth
1: filter(kernel) height
2: filter(kernel) width
3: strides depth
4: strides height
5: strides width
6: paddings depth
7: paddings height
8: paddings width
9: dilations depth
10: dilations height
11: dilations width
12: same mode: 0 false, 1 true
13: data format (optional): 0 - NDHWC, 1 - NCDHW, default is 1

Nd4jCpu.deconv3d_bp  
Nd4jCpu.depth_to_space 
This operation rearranges data from depth into blocks of spatial data.

Nd4jCpu.depthwise_conv2d  
Nd4jCpu.depthwise_conv2d_bp  
Nd4jCpu.diag 
Returns a diagonal tensor with the given diagonal values.

Nd4jCpu.diag_part 
Returns a diagonal tensor with the given diagonal values.

Nd4jCpu.digamma 
This op calculates the digamma function psi(x) = derivative of log(Gamma(x))
Input arrays:
0: x - abscissa points where to evaluate the digamma function, type float
Output array:
0: values of the digamma function at the corresponding x, type float

Nd4jCpu.dilation2d 
Dilation2D op
Int args:
0: isSameMode

Nd4jCpu.divide 
This is one of the auto-broadcastable operations.

Nd4jCpu.divide_bp  
Nd4jCpu.divide_no_nan 
This is one of the auto-broadcastable operations.

Nd4jCpu.dot_product_attention 
This operation performs dot product attention on the given time-series input with the given queries
out = sum(similarity(k_i, q) * v_i)
similarity(k, q) = softmax(k * q) where k * q is the dot product of k and q
Optionally with a normalization step:
similarity(k, q) = softmax(k * q / sqrt(size(q)))
See also "Attention is all you need" (https://arxiv.org/abs/1706.03762, p. 4, eq. 1)
Note: This supports multiple queries at once; if only one query is available, the queries vector still has to
be 3D but can have queryCount = 1
Note: keys and values are usually the same array.
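The formulas above can be sketched for a single query in pure Python (an illustration under stated assumptions only: flat lists instead of 3D batched tensors, and the scaled-softmax variant; the function name is not the native API):

```python
import math

def dot_product_attention(q, keys, values):
    # similarity(k, q) = softmax(k . q / sqrt(size(q))) over all keys
    scores = [sum(ki * qi for ki, qi in zip(k, q)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # out = sum_i similarity(k_i, q) * v_i
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
```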

Nd4jCpu.dot_product_attention_bp  
Nd4jCpu.draw_bounding_boxes 
draw_bounding_boxes op - modifies the input image by drawing the given boxes with the given colors.

Nd4jCpu.dropout 
This op calculates dropout of the input
Input arguments:
0 - input tensor
1 - noise_shape (vector with shape to reduce) - optional
int parameter - seed for random numbers
T parameter - probability (should be between 0 and 1)
return value - a tensor with the same shape as the input

Nd4jCpu.dropout_bp  
Nd4jCpu.dynamic_bidirectional_rnn 
Implementation of operation "static RNN time sequences" with peephole connections:
Input arrays:
0: input with shape [time x batchSize x inSize] or [batchSize x time x inSize]; time - number of time steps, batchSize - batch size, inSize - number of features
1: input-to-hidden weights for forward RNN, [inSize x numUnitsFW]
2: hidden-to-hidden weights for forward RNN, [numUnitsFW x numUnitsFW]
3: biases for forward RNN, [2*numUnitsFW]
4: input-to-hidden weights for backward RNN, [inSize x numUnitsBW]
5: hidden-to-hidden weights for backward RNN, [numUnitsBW x numUnitsBW]
6: biases for backward RNN, [2*numUnitsBW]
7: (optional) initial cell output for forward RNN [batchSize x numUnitsFW], i.e. at time step = 0
8: (optional) initial cell output for backward RNN [batchSize x numUnitsBW], i.e. at time step = 0
9: (optional) vector with shape [batchSize] containing integer values within [0, time); each element sets the max time step for the corresponding input in the batch, so no calculations are performed for time >= maxTimeStep
Input integer arguments:
0: (optional) timeMajor - if nonzero then input shape is [time, batchSize, ...], else [batchSize, time, ...]

Nd4jCpu.dynamic_partition 
dynamic_partition - partitions an input tensor into num_partitions
according to the given index array.

Nd4jCpu.dynamic_partition_bp  
Nd4jCpu.dynamic_rnn 
Implementation of operation "static RNN time sequences" with peephole connections:
Input arrays:
0: input with shape [time x batchSize x inSize] or [batchSize x time x numUnits]; time - number of time steps, batchSize - batch size, inSize - number of features
1: input-to-hidden weights, [inSize x numUnits]
2: hidden-to-hidden weights, [numUnits x numUnits]
3: biases, [2*numUnits]
4: (optional) initial cell output [batchSize x numUnits], i.e. at time step = 0
5: (optional) vector with shape [batchSize] containing integer values within [0, time); each element sets the max time step for the corresponding input in the batch, so no calculations are performed for time >= maxTimeStep
Input integer arguments:
0: (optional) timeMajor - if nonzero then input shape is [time, batchSize, ...], else [batchSize, time, ...]

Nd4jCpu.dynamic_stitch 
dynamic_stitch - merges partitions (given from the second parameter on) into
a single tensor according to the given index array.

Nd4jCpu.elu 
This op is the ELU activation function.

Nd4jCpu.elu_bp  
Nd4jCpu.embedding_lookup 
embedding_lookup - searches for submatrices in the given matrix and returns them
according to the given index array.

Nd4jCpu.Environment 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.eq_scalar 
This is a scalar boolean op.

Nd4jCpu.equals 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.ErrorReference 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.evaluate_reduction_shape  
Nd4jCpu.expand_dims  
Nd4jCpu.expose 
This operation exposes the given arguments as its own outputs, but does so only once.

Nd4jCpu.ExternalWorkspace 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.extract_image_patches 
extract_image_patches op - extracts patches from images and puts them in the "depth" output dimension.

Nd4jCpu.eye 
creates a 2D identity matrix or a batch of identical 2D identity matrices
Input array:
provide any array - the operation simply ignores it
Input float argument (if passed):
TArgs[0] - type of elements of the output array, default value is 5 (float)
Input integer arguments:
IArgs[0] - order of the output identity matrix, 99 -> 'c' order, 102 -> 'f' order
IArgs[1] - the number of rows in the output innermost 2D identity matrix
IArgs[2] - optional, the number of columns in the output innermost 2D identity matrix; if this argument is not provided it is taken to be equal to the number of rows
IArgs[3,4,...] - optional, shape of the batch; the output matrix will have leading batch dimensions of this shape

Nd4jCpu.fake_quant_with_min_max_vars 
fake_quant_with_min_max_vals - tf.quantization.fake_quant_with_min_max_vars
input params:
0 - NDArray (input)
1 - 0D tensor - min value
2 - 0D tensor - max value
int params (optional):
0 - num_bits (allowed interval [2, 16], default 8)
1 - narrow_range (default False)
output:
0 - NDArray with the same shape as input

Nd4jCpu.fake_quant_with_min_max_vars_per_channel 
fake_quant_with_min_max_vals_per_channel - tf.quantization.fake_quant_with_min_max_vars_per_channel
input params:
0 - NDArray (input) - at least 2D
1 - 1D tensor - min values (length equals the last dim of input)
2 - 1D tensor - max values (length equals min's)
int params (optional):
0 - num_bits (allowed interval [2, 16], default 8)
1 - narrow_range (default False)
output:
0 - NDArray with the same shape as input

Nd4jCpu.fill 
This operation takes a shape as its first argument, and returns a new NDArray filled with a specific scalar value.

Nd4jCpu.fill_as 
This operation takes the input's shape, and returns a new NDArray filled with the specified value
Expected arguments:
input: N-dimensional array
T args:
0: scalar value, used to fill NDArray

Nd4jCpu.firas_sparse 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.flatten  
Nd4jCpu.Floor  
Nd4jCpu.floordiv 
This is one of the auto-broadcastable operations.

Nd4jCpu.floordiv_bp  
Nd4jCpu.floormod 
This is one of the auto-broadcastable operations.

Nd4jCpu.floormod_bp  
Nd4jCpu.FlowPath 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.fused_batch_norm 
This operation performs batch normalization of a layer; it is based on the following article: https://arxiv.org/abs/1502.03167.

Nd4jCpu.gather  
Nd4jCpu.gather_list 
This operation builds an NDArray from an NDArrayList using indices
Expected arguments:
x: nonempty list
indices: vector with indices for gather operation

Nd4jCpu.gather_nd  
Nd4jCpu.get_seed  
Nd4jCpu.GraphProfile 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.GraphState 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.greater 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.greater_equal 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.gru 
Implementation of the Gated Recurrent Unit:
Input arrays:
0: input with shape [time x batchSize x inSize]; time - number of time steps, batchSize - batch size, inSize - number of features
1: initial cell output [batchSize x numUnits], i.e. at time step = 0
2: input-to-hidden weights, [inSize x 3*numUnits]
3: hidden-to-hidden weights, [numUnits x 3*numUnits]
4: biases, [3*numUnits]
Output arrays:
0: cell outputs [time x batchSize x numUnits], i.e. per each time step

Nd4jCpu.gru_bp  
Nd4jCpu.gruCell 
Implementation of the Gated Recurrent Unit cell:
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
"Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation"
Input arrays:
0: input with shape [batchSize x inSize]; batchSize - batch size, inSize - number of features
1: previous cell output [batchSize x numUnits], i.e. at previous time step t-1
2: RU weights - [(inSize+numUnits), 2*numUnits] - reset and update gates (input/recurrent weights)
3: C weights - [(inSize+numUnits), numUnits] - cell gate (input/recurrent weights)
4: reset and update biases, [2*numUnits] - reset and update gates
5: cell biases, [numUnits]
Output arrays:
0: reset gate output [bS, numUnits]
1: update gate output [bS, numUnits]
2: cell gate output [bS, numUnits]
3: current cell output [bS, numUnits]

Nd4jCpu.gruCell_bp  
Nd4jCpu.gt_scalar 
This is a scalar boolean op.

Nd4jCpu.gte_scalar 
This is a scalar boolean op.

Nd4jCpu.hardsigmoid 
This is the HardSigmoid activation function.

Nd4jCpu.hardsigmoid_bp  
Nd4jCpu.hardtanh 
This is the HardTanh activation function.

Nd4jCpu.hardtanh_bp  
Nd4jCpu.hashcode 
This operation calculates a hash code, optionally along a dimension

Nd4jCpu.hinge_loss 
Implementation of the hinge loss function: max(0, 1 - labels*logits)
Input arrays:
0: logits - the predicted logits, type float
1: weights - used for weighting (multiplying) loss values, type float.
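The per-element formula can be sketched in pure Python (illustrative only; the element-wise weighting with no reduction is an assumption, and the function name is not the native API):

```python
def hinge_loss(logits, labels, weights):
    # max(0, 1 - label * logit) per element, scaled by weights
    return [max(0.0, 1.0 - l * x) * w for x, l, w in zip(logits, labels, weights)]
```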

Nd4jCpu.hinge_loss_grad  
Nd4jCpu.histogram 
This operation calculates the number of entries per bin

Nd4jCpu.histogram_fixed_width 
returns a histogram (as a 1D array) with fixed bin width
Input arrays:
- input array with elements to be binned into the output histogram
- range array with the first element being the bottom limit and the second element being the top limit of the histogram;
please note that input_value <= range[0] is mapped to histogram[0], and input_value >= range[1] is mapped to the last bin
Input integer arguments:
nbins (optional) - number of histogram bins, default value is 100
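The binning rule described above can be sketched in pure Python (a minimal illustration; the exact boundary handling of the native op for values equal to the range limits is an assumption):

```python
def histogram_fixed_width(values, value_range, nbins=100):
    lo, hi = value_range
    width = (hi - lo) / nbins
    hist = [0] * nbins
    for v in values:
        if v <= lo:
            hist[0] += 1       # below-range values fall into the first bin
        elif v >= hi:
            hist[-1] += 1      # above-range values fall into the last bin
        else:
            hist[int((v - lo) / width)] += 1
    return hist
```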

Nd4jCpu.huber_loss 
Implementation of the Huber loss function:
0.5 * (labels - predictions)^2 if |labels - predictions| <= delta
0.5 * delta^2 + delta * (|labels - predictions| - delta) if |labels - predictions| > delta
Input arrays:
0: predictions - the predicted values, type float
1: weights - used for weighting (multiplying) loss values, type float.
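The piecewise formula can be sketched per element in pure Python (illustrative only; weighting and reduction are omitted, and the function name is not the native API):

```python
def huber_loss(predictions, labels, delta=1.0):
    out = []
    for p, l in zip(predictions, labels):
        err = abs(l - p)
        if err <= delta:
            out.append(0.5 * err * err)                              # quadratic region
        else:
            out.append(0.5 * delta * delta + delta * (err - delta))  # linear region
    return out
```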

Nd4jCpu.huber_loss_grad  
Nd4jCpu.identity 
This is the Identity operation.

Nd4jCpu.identity_bp  
Nd4jCpu.identity_n 
This is the Identity operation.

Nd4jCpu.igamma 
Broadcastable igamma implementation
igamma(a, x) = gamma(a, x) / Gamma(a) - Gamma distribution function P(a, x)
Gamma(a) = integral from 0 to infinity of t^{a-1} e^{-t} dt
gamma(a, x) = integral from 0 to x of t^{a-1} e^{-t} dt

Nd4jCpu.igammac 
Broadcastable igammac implementation
igammac(a, x) = Gamma(a, x) / Gamma(a) - Gamma distribution function Q(a, x)
Gamma(a) = integral from 0 to infinity of t^{a-1} e^{-t} dt
Gamma(a, x) = integral from x to infinity of t^{a-1} e^{-t} dt

Nd4jCpu.IGenerator  
Nd4jCpu.im2col 
This op implements the im2col algorithm, widely used in convolutional neural networks
Input: 4D input expected
Int args:
0: kernel height
1: kernel width
2: stride height
3: stride width
4: padding height
5: padding width
6: dilation height
7: dilation width
8: isSameMode

Nd4jCpu.im2col_bp  
Nd4jCpu.image_resize 
This op performs an interpolated resize of the given tensor using the given algorithm.

Nd4jCpu.in_top_k 
The in_top_k operation returns a vector of boolean values for the given NDArray,
indicating whether each target is among the k top predicted values.
The first parameter is an NDArray of predicted values (2D array).

Nd4jCpu.IndicesList 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.Intervals 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.IntIntPair  
Nd4jCpu.IntVectorVector  
Nd4jCpu.invert_permutation  
Nd4jCpu.is_non_decreasing 
This op takes one n-dimensional array as input, and returns true if for every adjacent pair we have x[i] <= x[i+1].

Nd4jCpu.is_numeric_tensor 
This op takes one n-dimensional array as input, and returns true if the input is a numeric array.

Nd4jCpu.is_strictly_increasing 
This op takes one n-dimensional array as input, and returns true if for every adjacent pair we have x[i] < x[i+1].

Nd4jCpu.ismax 
This op produces a binary matrix wrt the target dimension.

Nd4jCpu.KeyPair 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.l2_loss 
l2_loss op.

Nd4jCpu.LaunchContext 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.layer_norm 
applies layer normalization to the input
y = g * standardize(x) + b
see sd::ops::standardize
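The formula y = g * standardize(x) + b can be sketched for a 1D input in pure Python (a minimal illustration, assuming standardize = (x - mean) / stdev over the normalized axis, plus a small epsilon for stability; the function name is not the native API):

```python
import math

def layer_norm(x, g, b, eps=1e-5):
    # y = g * standardize(x) + b, where standardize(x) = (x - mean) / stdev
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    std = math.sqrt(var + eps)
    return [gi * (xi - mean) / std + bi for xi, gi, bi in zip(x, g, b)]
```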

Nd4jCpu.layer_norm_bp  
Nd4jCpu.less 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.less_equal 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.lgamma 
This op calculates the lgamma function lgamma(x) = log(Gamma(x))
Input arrays:
0: x - input matrix
Output array:
0: log of Gamma(x)

Nd4jCpu.lin_space 
lin_space - op ported from TF (https://www.tensorflow.org/api_docs/python/tf/lin_space)
optional input params:
0 - startVal - NDArray scalar (floating point)
1 - finishVal - NDArray scalar (floating point)
2 - numOfElements - NDArray scalar (integer)
Optional T args:
0 - startVal
1 - finishVal
2 - numOfElements
output:
0 - 1D NDArray with the same type as input and length given by the numOfElements param.

Nd4jCpu.listdiff 
This operation takes 2 arrays: original values, and values to be excluded.

Nd4jCpu.log_loss 
Implementation of the logarithmic loss function: -(y_i * log(p_i) + (1 - y_i) * log(1 - p_i))
Input arrays:
0: predictions - the predicted values, type float
1: weights - used for weighting (multiplying) loss values, type float.
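The per-element formula can be sketched in pure Python (illustrative only; the clamping epsilon and the absence of weighting/reduction are assumptions, and the function name is not the native API):

```python
import math

def log_loss(predictions, labels, eps=1e-7):
    # -(y * log(p) + (1 - y) * log(1 - p)) per element, with p clamped away from 0 and 1
    out = []
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)
        out.append(-(y * math.log(p) + (1.0 - y) * math.log(1.0 - p)))
    return out
```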

Nd4jCpu.log_loss_grad  
Nd4jCpu.log_matrix_determinant 
log_matrix_determinant op.

Nd4jCpu.log_poisson_loss 
This op calculates the logarithmic loss of Poisson-distributed input.

Nd4jCpu.log_poisson_loss_grad  
Nd4jCpu.log_softmax  
Nd4jCpu.log_softmax_bp  
Nd4jCpu.Log1p  
Nd4jCpu.logdet 
logdet op.

Nd4jCpu.LogicOp 
Logic ops are unique snowflakes in any Graph.

Nd4jCpu.LongVectorVector  
Nd4jCpu.lrelu 
This is the Leaky ReLU activation function.

Nd4jCpu.lrelu_bp  
Nd4jCpu.lrn 
Local response normalization implementation, as in TF.

Nd4jCpu.lrn_bp 
Local response normalization  backprop variant.

Nd4jCpu.lstm 
Implementation of operation "LSTM time sequences" with peephole connections:
Input arrays:
0: input with shape [time x batchSize x inSize]; time - number of time steps, batchSize - batch size, inSize - number of features
1: initial cell output [batchSize x numProj], i.e. at time step = 0; in case of projection = false -> numProj = numUnits

Nd4jCpu.lstmBlock 
Implementation of the operation for an LSTM layer with optional peephole connections.

Nd4jCpu.lstmBlockCell 
Implementation of the operation for an LSTM cell with optional peephole connections:
S.

Nd4jCpu.lstmCell 
Implementation of the operation for an LSTM cell with peephole connections:
S.

Nd4jCpu.lstmLayer  
Nd4jCpu.lstmLayer_bp  
Nd4jCpu.lstmLayerCell  
Nd4jCpu.lstmLayerCellBp  
Nd4jCpu.lstsq 
matrix_solve_ls op (lstsq) - solves one or more linear least-squares problems.

Nd4jCpu.lt_scalar 
This is a scalar boolean op.

Nd4jCpu.lte_scalar 
This is a scalar boolean op.

Nd4jCpu.lu 
lu op - makes the LUP decomposition of a given batch of 2D square matrices
input params:
0 - float tensor with dimension (x * y * z * ::: * M * M)
return value:
0 - float tensor with dimension (x * y * z * ::: * M * M) with the LU M x M matrices in it
1 - int (32 or 64) batched vector of permutations with length M - shape (x * y * z * ::: * M)
int argument:
0 - data type of the output permutation vector (int32 or int64), optional, default INT32

Nd4jCpu.matmul 
This op is a general matmul implementation.

Nd4jCpu.matmul_bp  
Nd4jCpu.matrix_band_part 
Copies a tensor, setting everything outside a central band in each innermost matrix to zero
input array:
x: given tensor with shape {..., M, N} - as a vector (matrix) of MxN matrices
int arguments:
lower band
upper band
output array:
matrix with the given bands between the lower and upper diagonals kept

Nd4jCpu.matrix_determinant 
matrix_determinant op.

Nd4jCpu.matrix_diag 
Inserts elements provided by the diagonal array into the main diagonal of the innermost matrices of the output array;
the rest of the output elements are set to zero
Input array:
diagonal: array containing elements to be inserted into the output array;
the following rank condition holds: diagonal_rank = output_rank - 1
Output array:
0: considered as a batch of matrices; if, for example, the diagonal array has shape [A,B,C] then the output array has shape [A,B,C,C]

Nd4jCpu.matrix_diag_part 
Returns the diagonal vector for each submatrix within a given tensor.

Nd4jCpu.matrix_inverse 
matrix_inverse op - computes the inverse of all 2D square matrices found in the input tensor
input params:
0 - the tensor with dimension (x * y * z * ::: * M * M)
return value:
tensor with dimension (x * y * z * ::: * M * M) with the inverse M x M matrices in it

Nd4jCpu.matrix_set_diag 
Inserts elements provided by the diagonal array into the main diagonal of the innermost matrices of the input array
Input arrays:
0: input array, considered as a batch of matrices
1: diagonal array containing elements to be inserted into the input array;
the following rank condition should be satisfied: diagonal_rank = input_rank - 1;
the shapes of the diagonal and input arrays must be equal except for the last dimension of the input array,
for example if input_shape = [A,B,C,D] then diagonal_shape = [A,B,C];
also the last dimension of the diagonal array should equal the smaller of the last two input dimensions,
that is: diagonal_shape[-1] = min(input_shape[-1], input_shape[-2])
Output array:
0: has the same shape as input, with the corresponding diagonal elements substituted

Nd4jCpu.max_pool_with_argmax 
This op is the same as maxpool2d, with a variant that returns a matrix of indices of the max values
Input - 4D tensor
Output:
0 - 4D tensor, as input
1 - 4D tensor with max value indices
Int params:
9 int with 2x4 vectors and 1 bool value

Nd4jCpu.maximum 
This is one of the auto-broadcastable operations.

Nd4jCpu.maximum_bp  
Nd4jCpu.maxpool2d 
This op implements max pooling for convolutional networks.

Nd4jCpu.maxpool2d_bp  
Nd4jCpu.maxpool3dnew  
Nd4jCpu.maxpool3dnew_bp  
Nd4jCpu.mean_pairwssqerr_loss 
Implementation of the pairwise-errors-squared loss function
Input arrays:
0: predictions - the predicted values, type float.
1: weights - used for weighting (multiplying) loss values, type float.

Nd4jCpu.mean_pairwssqerr_loss_grad  
Nd4jCpu.mean_sqerr_loss 
Implementation of the Sum-of-Squares loss function: 1/N * sum_{i}^{N} (predictions_i - labels_i)^2
Input arrays:
0: predictions - the predicted values, type float
1: weights - used for weighting (multiplying) loss values, type float.

Nd4jCpu.mean_sqerr_loss_grad  
Nd4jCpu.mergeadd  
Nd4jCpu.mergeadd_bp  
Nd4jCpu.mergeavg  
Nd4jCpu.mergeavg_bp  
Nd4jCpu.mergemax  
Nd4jCpu.mergemax_bp  
Nd4jCpu.mergemaxindex  
Nd4jCpu.meshgrid  
Nd4jCpu.minimum 
This is one of the auto-broadcastable operations.

Nd4jCpu.minimum_bp  
Nd4jCpu.mirror_pad  
Nd4jCpu.mod  
Nd4jCpu.mod_bp  
Nd4jCpu.moments 
The moments operation calculates the mean and variance of a given NDArray,
reducing the result according to the given axis array.

Nd4jCpu.multi_head_dot_product_attention 
This performs multi-headed dot product attention on the given time-series input
out = concat(head_1, head_2, ..., head_n) * Wo
head_i = dot_product_attention(Wq_i*q, Wk_i*k, Wv_i*v)
Optionally with normalization when calculating the attention for each head.

Nd4jCpu.multi_head_dot_product_attention_bp  
Nd4jCpu.multiply 
This is one of the auto-broadcastable operations.

Nd4jCpu.multiply_bp  
Nd4jCpu.NDArray  
Nd4jCpu.NDArrayList 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.NDArrayVector  
Nd4jCpu.NDArrayVector.Iterator  
Nd4jCpu.NDIndex 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.NDIndexAll  
Nd4jCpu.NDIndexInterval  
Nd4jCpu.NDIndexPoint  
Nd4jCpu.neq_scalar 
This is a scalar boolean op.

Nd4jCpu.NodeProfile 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.non_max_suppression 
image.non_max_suppression ops.

Nd4jCpu.non_max_suppression_overlaps  
Nd4jCpu.non_max_suppression_v3  
Nd4jCpu.noop  
Nd4jCpu.norm 
This operation provides various normalization modes:
0: frobenius
1: euclidean (norm2)
2: norm1
3: norm2
4: inf-norm
5: p-norm
Expected arguments:
input: N-dimensional array
Int args:
0...: axis
T args:
0: norm mode
1: p for p-norm

Nd4jCpu.normalize_moments 
The normalize_moments operation normalizes an already-calculated mean and variance
according to shift and count.

Nd4jCpu.not_equals 
This op takes 2 equally shaped arrays as input, and produces a binary matrix as output.

Nd4jCpu.nth_element  
Nd4jCpu.onehot 
This operation returns a one-hot encoded n-dimensional array
Expected arguments:
input: N-dimensional array
T args:
0: 'on' value
1: 'off' value
Int args:
0: depth
1: axis

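The arguments above can be illustrated with a minimal pure-Python sketch of one-hot semantics for a 1D index vector (illustrative function name, not the native API; the native op also supports an axis argument not shown here):

```python
def one_hot(indices, depth, on=1.0, off=0.0):
    # Each input index becomes a row of length `depth` holding the 'on'
    # value at that index position and the 'off' value elsewhere.
    return [[on if j == i else off for j in range(depth)] for i in indices]
```

For example, one_hot([0, 2], 3) produces [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]].
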
Nd4jCpu.ones_as 
This operation takes the input's shape, and returns a new NDArray filled with ones
Expected arguments:
input: N-dimensional array

Nd4jCpu.OpArgsHolder 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.OpDescriptor 
This class is a very basic info holder for ops; a bean/POJO, pretty much.

Nd4jCpu.OpRegistrator 
This class provides runtime ops lookup, based on opName or opHash.

Nd4jCpu.order 
This op changes the order of the given array to the specified order.

Nd4jCpu.pad  
Nd4jCpu.Pair 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.parallel_stack  
Nd4jCpu.percentile 
This operation calculates a percentile of the input array along the given axes
Input - tensor with rank N > 0
Output - tensor with rank (N - length(axis)), or scalar if the number of Integer arguments is zero
Float arguments:
0: percentile (scalar) in range [0,100] (inclusive)
1: interpolation (optional), possible values are 0 - "lower", 1 - "higher", 2 - "nearest" (default)
2: keepDims (optional), if non-zero then unities are kept in the reduced resulting shape of the output array, default is 0
Integer arguments - axis - the sequence of axes to calculate the percentile along; if the sequence is empty, the percentile is calculated over the whole input tensor and the result is returned as a scalar

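The three interpolation modes can be sketched in pure Python for a 1D sample. This is a semantic illustration under the common (n-1)-based positioning convention, an assumption rather than a statement about the native kernel:

```python
import math

def percentile(values, p, interpolation="nearest"):
    # p in [0, 100]; position in the sorted sample is p/100 * (n - 1).
    s = sorted(values)
    pos = p / 100.0 * (len(s) - 1)
    if interpolation == "lower":
        idx = math.floor(pos)       # round position down
    elif interpolation == "higher":
        idx = math.ceil(pos)        # round position up
    else:                           # "nearest" (the documented default)
        idx = int(round(pos))
    return s[idx]
```

For example, with [1, 2, 3, 4] and p=50 the position is 1.5, so "lower" picks 2 and "higher" picks 3.
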
Nd4jCpu.permute 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.pick_list 
This operation selects specified indices from an NDArrayList and returns them as an NDArray
Expected arguments:
x: non-empty list
indices: optional, vector with indices
Int args:
optional, indices

Nd4jCpu.PlatformHelper 
This abstract class defines methods used by platform-specific helper implementations

Nd4jCpu.pnormpool2d 
This op implements p-norm pooling for convolution networks.

Nd4jCpu.pnormpool2d_bp  
Nd4jCpu.pointwise_conv2d 
pointwise 2D convolution
Expected input:
x: 4D array
weight: 4D Array [1, 1, iC, oC] (NHWC) or [oC, iC, 1, 1] (NCHW)
bias: optional vector, length of oC
IntArgs:
0: data format: 1 NHWC, 0 NCHW (optional, by default = NHWC)

Nd4jCpu.polygamma 
This op calculates polygamma function psi^(n)(x).

Nd4jCpu.Pow 
Broadcastable pow implementation

Nd4jCpu.Pow_bp  
Nd4jCpu.prelu 
Parametric Rectified Linear Unit
f(x) = alpha * x for x < 0, f(x) = x for x >= 0

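The formula above maps directly to a one-line scalar sketch in pure Python (illustrative only; the native op applies it element-wise with a per-channel alpha array):

```python
def prelu(x, alpha):
    # f(x) = alpha * x for x < 0, f(x) = x for x >= 0
    return alpha * x if x < 0 else x
```
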
Nd4jCpu.prelu_bp  
Nd4jCpu.qr 
QR decomposition: A = QR, where Q is orthogonal (Q * Q^T = I) and R is upper triangular.

Nd4jCpu.random_bernoulli  
Nd4jCpu.random_crop  
Nd4jCpu.random_exponential  
Nd4jCpu.random_gamma 
random_gamma op.

Nd4jCpu.random_multinomial  
Nd4jCpu.random_normal  
Nd4jCpu.random_poisson 
random_poisson op.

Nd4jCpu.random_shuffle  
Nd4jCpu.RandomBuffer 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.RandomGenerator 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.randomuniform  
Nd4jCpu.range 
This operation generates sequences.

Nd4jCpu.rank 
This operation returns rank of input array as scalar value.

Nd4jCpu.rationaltanh 
This is RationalTanh activation function.

Nd4jCpu.rationaltanh_bp  
Nd4jCpu.read_list 
This operation selects the specified index from an NDArrayList and returns it as an NDArray
Expected arguments:
x: non-empty list
indices: optional, scalar with index
Int args:
optional, index

Nd4jCpu.realdiv 
This is one of the auto-broadcastable operations.

Nd4jCpu.realdiv_bp  
Nd4jCpu.rectifiedtanh 
This is RectifiedTanh activation function.

Nd4jCpu.rectifiedtanh_bp  
Nd4jCpu.reduce_dot_bp 
This op calculates the backprop dot for two tensors along given dimensions
input array:
x: tensor to calculate dot for
y: tensor to calculate dot for
z: tensor with gradient output of the FF dot for x and y
int arguments:
list of integers - dimensions to calculate dot along;
the default corresponds to an empty list, in which case the calculation
is performed over all dimensions and a scalar is returned.

Nd4jCpu.reduce_logsumexp 
reduce_logsumexp - tf.reduce_logsumexp operation
input params:
0 - NDArray (input)
1 - 1D NDArray (axis) (optional) - integer array
T_ARG param (optional):
0 - keep_dims

Nd4jCpu.reduce_max 
This op calculates max of elements along given dimensions
input array:
x: tensor to calculate maxes for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate max along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated maxes

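The keepDims and empty-axis-list conventions that recur through the reduce ops can be sketched for a 2D input in pure Python (illustrative function, not the native API):

```python
def reduce_max_2d(x, axis=None, keep_dims=False):
    # axis=None mirrors the "empty dimension list" case: reduce over all
    # dimensions and return a scalar.
    if axis is None:
        return max(v for row in x for v in row)
    if axis == 0:
        result = [max(row[j] for row in x) for j in range(len(x[0]))]
        return [result] if keep_dims else result   # keepDims keeps a length-1 dim
    result = [max(row) for row in x]                # axis == 1
    return [[m] for m in result] if keep_dims else result
```

For example, reducing [[1, 5], [4, 2]] along axis 1 with keep_dims gives [[5], [4]] rather than [5, 4].
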
Nd4jCpu.reduce_max_bp  
Nd4jCpu.reduce_mean 
This op calculates mean of elements along given dimensions
input array:
x: tensor to calculate mean for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate mean along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated means

Nd4jCpu.reduce_mean_bp  
Nd4jCpu.reduce_min 
This op calculates min of elements along given dimensions
input array:
x: tensor to calculate mins for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate min along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated mins

Nd4jCpu.reduce_min_bp  
Nd4jCpu.reduce_norm_max 
This op calculates norm max of elements along given dimensions
input array:
x: tensor to calculate norm max for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate norm max along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated norm

Nd4jCpu.reduce_norm_max_bp  
Nd4jCpu.reduce_norm1 
This op calculates norm1 of elements along given dimensions
input array:
x: tensor to calculate norm1 for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate norm1 along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated norm1

Nd4jCpu.reduce_norm1_bp  
Nd4jCpu.reduce_norm2 
This op calculates norm2 of elements along given dimensions
input array:
x: tensor to calculate norm2 for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate norm2 along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated norm2

Nd4jCpu.reduce_norm2_bp  
Nd4jCpu.reduce_prod 
reduce_prod - tf.reduce_prod operation
input params:
0 - NDArray
T_ARG param (optional):
0 - keep_dims

Nd4jCpu.reduce_prod_bp  
Nd4jCpu.reduce_sqnorm 
This op calculates squared norm of elements along given dimensions
input array:
x: tensor to calculate squared norm for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
int arguments:
list of integers - dimensions to calculate squared norm along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated norm

Nd4jCpu.reduce_sqnorm_bp  
Nd4jCpu.reduce_stdev 
This op calculates the sample standard deviation of elements along given dimensions
input array:
x: tensor to calculate standard deviation for
float arguments:
keepDims: if non-zero, then keep reduced dimensions with length = 1, default value is zero
biasCorrected - if non-zero, then bias correction will be applied, default value is zero
int arguments:
list of integers - dimensions to calculate standard deviation along; the default corresponds to an empty list, in which case the calculation is performed over all dimensions and a scalar is returned
output array:
reduced tensor with calculated standard deviations

Nd4jCpu.reduce_stdev_bp  
Nd4jCpu.reduce_sum 
reduce_sum - tf.reduce_sum operation
input params:
0 - NDArray
T_ARG param (optional):
0 - keep_dims

Nd4jCpu.reduce_sum_bp  
Nd4jCpu.reduce_variance 
This op calculates sample variance of elements along given dimensions
input array:
x: tensor to calculate mean for
float arguments:
keepDims: if non zero, then keep reduced dimensions with length = 1, default value is zero
biasCorrected  if non zero, then bias correction will be applied, default value is zero
int arguments:
list of integers  dimensions to calculate mean along, default corresponds to empty list in which case calculation is performed for all dimensions and scalar is returned
output array:
reduced tensor with calculated means

Nd4jCpu.reduce_variance_bp  
Nd4jCpu.relu 
This is RELU activation function implementation

Nd4jCpu.relu_bp  
Nd4jCpu.relu_layer 
relu_layer = relu(x*w + b)

Nd4jCpu.relu6 
This is RELU6 activation function implementation

Nd4jCpu.relu6_bp  
Nd4jCpu.repeat  
Nd4jCpu.reshape  
Nd4jCpu.reshapeas  
Nd4jCpu.resize_area 
This op makes an area-interpolated resize (as in the OpenCV INTER_AREA algorithm) for the given tensor
input array:
0 - images - 4D tensor with shape (batch, sizeX, sizeY, channels)
1 - size - 1D tensor with 2 values (newWidth, newHeight) (if missing, a pair of integer args should be provided).

Nd4jCpu.resize_bicubic 
This op makes a bicubic interpolated resize for the given tensor
input array:
0 - 4D tensor with shape (batch, sizeX, sizeY, channels)
1 - 1D tensor with 2 values (newWidth, newHeight)
output array:
the 4D tensor with the resized image (shape is {batch, newWidth, newHeight, channels})

Nd4jCpu.resize_bilinear 
This op makes a bilinear interpolated resize for the given tensor
input array:
0 - 4D tensor with shape (batch, sizeX, sizeY, channels)
1 - 1D tensor with 2 values (newWidth, newHeight) (optional)
int arguments: (optional)
0 - new width
1 - new height
output array:
the 4D tensor with the resized image (shape is {batch, newWidth, newHeight, channels})
CAUTION: either the size tensor or a pair of int params should be provided.

Nd4jCpu.resize_nearest_neighbor 
This op makes a nearest-neighbor interpolated resize for the given tensor
input array:
0 - 4D tensor with shape (batch, sizeX, sizeY, channels)
1 - 1D tensor with 2 values (newWidth, newHeight) (optional)
int arguments: (optional)
0 - new width
1 - new height
output array:
the 4D tensor with the resized image (shape is {batch, newWidth, newHeight, channels})
CAUTION: either the size tensor or a pair of int params should be provided.

Nd4jCpu.ResultSet 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.ResultWrapper 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.Return  
Nd4jCpu.reverse  
Nd4jCpu.reverse_bp  
Nd4jCpu.reverse_sequence  
Nd4jCpu.reversedivide 
This is one of the auto-broadcastable operations.

Nd4jCpu.reversedivide_bp  
Nd4jCpu.reversemod 
This is one of the auto-broadcastable operations.

Nd4jCpu.reversemod_bp  
Nd4jCpu.reversesubtract 
This is one of the auto-broadcastable operations.

Nd4jCpu.reversesubtract_bp  
Nd4jCpu.rint 
This operation applies element-wise rint (round to integral value) operation

Nd4jCpu.roll 
roll - op ported from numpy (https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.roll.html)
input params:
0 - NDArray
int params:
0 - shift
1 - axis 1
2 - axis 2
...

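The wrap-around semantics inherited from numpy.roll can be sketched for a flat array in pure Python (illustrative only; the native op also supports per-axis shifts):

```python
def roll(values, shift):
    # Elements shifted off the end wrap around to the front, as in numpy.roll.
    n = len(values)
    shift %= n
    return values[-shift:] + values[:-shift] if shift else list(values)
```

For example, roll([1, 2, 3, 4, 5], 2) gives [4, 5, 1, 2, 3].
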
Nd4jCpu.rshift_bits 
This operation shift individual bits of each element in array to the right: >>
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.scatter_add 
This operation applies Add operation to specific inputs wrt indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

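The scatter family's indexing along the first dimension can be sketched for scatter_add in pure Python (illustrative names; 2D input assumed for brevity):

```python
def scatter_add(input_rows, indices, update_rows):
    # out[indices[i]] += updates[i], applied along the first dimension;
    # rows not referenced by `indices` are left unchanged.
    out = [list(row) for row in input_rows]
    for idx, upd in zip(indices, update_rows):
        out[idx] = [a + b for a, b in zip(out[idx], upd)]
    return out
```

The other scatter ops (scatter_sub, scatter_mul, scatter_div, scatter_max, scatter_min, scatter_upd) follow the same pattern with a different combining operation.
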
Nd4jCpu.scatter_div 
This operation applies Divide operation to specific inputs wrt indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_list 
This operation unpacks given NDArray into specified NDArrayList wrt specified indices

Nd4jCpu.scatter_max 
This operation applies Max operation to specific inputs through given indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_min 
This operation applies Min operation to specific inputs through given indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_mul 
This operation applies Multiply operation to specific inputs wrt indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_nd 
This operation scatters "updates" elements into a new output array according to given "indices"
Expected arguments:
indices: array containing elements/slices indexes of output array to put "updates" elements into, the rest output elements will be zeros
updates: array containing elements to be inserted into output array
shape: contains shape of output array

Nd4jCpu.scatter_nd_add 
This operation adds "updates" elements to input array along given "indices"
Expected arguments:
input: array to be updated
indices: array containing elements/slices indexes of input array to add "updates" elements to
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_nd_sub 
This operation subtracts "updates" elements from the input array along given "indices"
Expected arguments:
input: array to be updated
indices: array containing elements/slices indexes of input array to subtract "updates" elements from
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_nd_update 
This operation scatters "updates" elements into the input array along given "indices"
Expected arguments:
input: array to be updated
indices: array containing elements/slices indexes of input array to put "updates" elements into
updates: array containing elements to be inserted into input array

Nd4jCpu.scatter_sub 
This operation applies Subtract operation to specific inputs wrt indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_upd 
This operation applies Assign operation to specific inputs wrt indices
Expected arguments:
input: array to be updated
indices: array containing indexes for first dimension of input
updates: array containing elements to be combined with the input

Nd4jCpu.scatter_update  
Nd4jCpu.sconv2d 
Depthwise convolution2d op:
Expected inputs:
x: 4D array, NCHW format
weightsDepth: 4D array,
weightsPointwise: optional, 4D array
bias: optional, vector

Nd4jCpu.sconv2d_bp  
Nd4jCpu.Scope  
Nd4jCpu.segment_max 
segment_max op. - makes a tensor filled with max values according to the given index tensor.

Nd4jCpu.segment_max_bp  
Nd4jCpu.segment_mean 
segment_mean op. - makes a tensor filled with the average of values according to the given index tensor.

Nd4jCpu.segment_mean_bp  
Nd4jCpu.segment_min 
segment_min op. - makes a tensor filled with min values according to the given index tensor.

Nd4jCpu.segment_min_bp  
Nd4jCpu.segment_prod 
segment_prod op. - makes a tensor filled with the product of values according to the given index tensor.

Nd4jCpu.segment_prod_bp  
Nd4jCpu.segment_sum 
segment_sum op. - makes a tensor filled with the sum of values according to the given index tensor.

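The segment-op pattern can be sketched for segment_sum in pure Python (illustrative only; for the sorted segment ops, segment ids are assumed sorted and non-negative):

```python
def segment_sum(values, segment_ids):
    # out[i] is the sum of all values whose segment id equals i;
    # segment_ids must be sorted and non-negative.
    out = [0] * (segment_ids[-1] + 1)
    for v, i in zip(values, segment_ids):
        out[i] += v
    return out
```

For example, segment_sum([1, 2, 3, 4], [0, 0, 1, 1]) yields [3, 7]; segment_max, segment_min, segment_mean and segment_prod replace the sum with the corresponding reduction.
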
Nd4jCpu.segment_sum_bp  
Nd4jCpu.select 
This op takes n-dimensional arrays as input, and returns
an array of the same shape, with elements taken either from x or y, depending on the condition.

Nd4jCpu.selu 
This is SELU activation function implementation

Nd4jCpu.selu_bp  
Nd4jCpu.sequence_mask 
sequence_mask op. - makes a mask for the given tensor filled by (j > x[i_1, i_2,...

Nd4jCpu.set_seed 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.shape_of  
Nd4jCpu.ShapeDescriptor 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.ShapeInformation 
Shape information approximating
the information on an ndarray

Nd4jCpu.ShapeList 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.shapes_of  
Nd4jCpu.shift_bits 
This operation shift individual bits of each element in array to the left: <<
PLEASE NOTE: This operation is applicable only to integer data types

Nd4jCpu.sigm_cross_entropy_loss 
Implementation of sigmoid cross-entropy loss function: max(logits, 0.) - logits * labels + log(1. + exp(-abs(logits)))
Input arrays:
0: logits - logits, type float
1: weights - is used for weighting (multiplying) of loss values, type float.

Nd4jCpu.sigm_cross_entropy_loss_grad  
Nd4jCpu.sigmoid 
This is Sigmoid activation function implementation
Math is: 1 / (1 + exp(-x))

Nd4jCpu.sigmoid_bp  
Nd4jCpu.size 
This operation returns the length of the input array
Expected arguments:
input: N-dimensional array
TODO: make this operation a reduction, to allow TAD -> size

Nd4jCpu.size_at  
Nd4jCpu.size_list 
This operation returns a scalar with the number of existing arrays within the given NDArrayList
Expected arguments:
x: list

Nd4jCpu.slice 
This operation extracts a slice from a tensor.

Nd4jCpu.slice_bp  
Nd4jCpu.softmax 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.softmax_bp  
Nd4jCpu.softmax_cross_entropy_loss 
Implementation of softmax cross-entropy loss function: max(logits, 0.) - logits * labels + log(1. + exp(-abs(logits)))
Input arrays:
0: logits - logits, type float
1: weights - is used for weighting (multiplying) of loss values, type float.

Nd4jCpu.softmax_cross_entropy_loss_grad  
Nd4jCpu.softmax_cross_entropy_loss_with_logits 
Implementation of softmax cross-entropy loss function
Input arrays:
0: logits - logits, type float
1: labels - ground truth values, expected to be 0. or 1., type float.

Nd4jCpu.softmax_cross_entropy_loss_with_logits_grad  
Nd4jCpu.softplus 
This is Softplus activation function implementation
Math is: log(1 + exp(x))

Nd4jCpu.softplus_bp  
Nd4jCpu.softsign 
This is Softsign activation function implementation
Math is: x / (1 + abs(x))

Nd4jCpu.softsign_bp  
Nd4jCpu.solve 
solve op. - solves systems of linear equations - general method.

Nd4jCpu.solve_ls  
Nd4jCpu.space_to_batch 
Zeropads and then rearranges (permutes) blocks of spatial data into batch.

Nd4jCpu.space_to_batch_nd  
Nd4jCpu.space_to_depth 
This operation rearranges blocks of spatial data into depth. This op's output is a copy of the input tensor
where values from the height and width dimensions are moved to the depth dimension.

Nd4jCpu.sparse_softmax_cross_entropy_loss_with_logits 
Implementation of sparse softmax cross-entropy loss function
Input arrays:
0: labels - ground truth values, expected to be within range [0, num_classes), type float.

Nd4jCpu.sparse_softmax_cross_entropy_loss_with_logits_grad  
Nd4jCpu.split 
This operation splits the given NDArray into chunks of specific size, along the given dimension
0 - input array
1 - optional axis
Integer arguments:
0 - number of splits
1 - optional axis

Nd4jCpu.split_list 
This operation splits the given NDArray into chunks, and stores them into the given NDArrayList wrt sizes
Expected arguments:
list: optional, NDArrayList; if not available, a new NDArrayList will be created
array: array to be split
sizes: vector with sizes for each chunk

Nd4jCpu.split_v 
This operation splits the given NDArray into chunks of specific size, along the given dimension
Input arrays:
0 - input array
1 - array of sizes
2 - optional axis
Integer arguments:
0 - optional axis

Nd4jCpu.square 
This operation applies element-wise pow(x, 2) to the given input
Expected arguments:
input: N-dimensional array

Nd4jCpu.squaredsubtract 
This is one of the auto-broadcastable operations.

Nd4jCpu.squaredsubtract_bp  
Nd4jCpu.squeeze  
Nd4jCpu.sru 
Implementation of operation for Simple Recurrent Unit: "Training RNNs as Fast as CNNs" Tao Lei, Yu Zhang, Yoav Artzi
Input arrays:
0: input 3d tensor with shape [bS x K x N], N  number of time steps, bS  batch size, K  number of features
1: 2d tensor of weights [3K x K]
2: row of biases with twice length [1 x 2K]
3: 2d tensor of previous cell state [bS x K]
4: optional, 2d tensor of dropout mask [bS x K]
Output arrays:
0: 3d tensor of cell output [bS x K x N]
1: 3d tensor of cell state [bS x K x N]

Nd4jCpu.sru_bi 
Implementation of operation for Simple Recurrent Unit (bidirectional case): "Training RNNs as Fast as CNNs" Tao Lei, Yu Zhang, Yoav Artzi
Input arrays:
0: input 3d tensor with shape [N x bS x 2K], N  number of time steps, bS  batch size, K  number of features
1: 2d tensor of weights [2K x 6K]
2: row of biases with twice length [1 x 4K]
3: 2d tensor of previous cell state [bS x 2K]
4: optional, 2d tensor of dropout mask [bS x 2K]
Output arrays:
0: 3d tensor of cell output [N x bS x 2K]
1: 3d tensor of cell state [N x bS x 2K]

Nd4jCpu.sru_bi_bp 
Implementation of operation for back propagation in Simple Recurrent Unit (bidirectional case): "Training RNNs as Fast as CNNs" Tao Lei, Yu Zhang, Yoav Artzi
Input arrays:
0: input 3d tensor with shape [N x bS x 2K], N  number of time steps, bS  batch size, K  number of features
1: 2d tensor of weights [2K x 6K]
2: row of biases with twice length [1 x 4K]
3: 2d tensor of previous cell state [bS x 2K]
4: 3d tensor of cell state [N x bS x 2K]
5: 2d tensor of cell state gradients [bS x 2K]
6: 3d tensor of state output gradients [N x bS x 2K]
7: optional, 2d tensor of dropout mask [bS x 2K]
Output arrays:
0: 3d tensor of input gradients [N x bS x 2K]
1: 3d tensor of weights gradients [N x 2K x 6K]
2: 2d, row of biases gradients [1 x 4K]
3: 2d, tensor of state gradients [bS x 2K]

Nd4jCpu.sru_bp 
Implementation of operation for back propagation in Simple Recurrent Unit: "Training RNNs as Fast as CNNs" Tao Lei, Yu Zhang, Yoav Artzi
Input arrays:
0: input 3d tensor with shape [bS x K x N], N  number of time steps, bS  batch size, K  number of features
1: 2d tensor of weights [3K x K]
2: row of biases with twice length [1 x 2K]
3: 2d tensor of previous cell state [bS x K]
4: 3d tensor of cell state [bS x K x N]
5: 2d tensor of cell state gradients [bS x K]
6: 3d tensor of state output gradients [bS x K x N]
7: optional, 2d tensor of dropout mask [bS x K]
Output arrays:
0: 3d tensor of input gradients [bS x K x N]
1: 3d tensor of weights gradients [bS x 3K x K]
2: 2d, row of biases gradients [1 x 2K]
3: 2d, tensor of state gradients [bS x K]

Nd4jCpu.sruCell 
Implementation of operations for Simple Recurrent Unit cell: "Training RNNs as Fast as CNNs" Tao Lei, Yu Zhang, Yoav Artzi
Input arrays:
0: input with shape [batchSize x inSize], batchSize  batch size, inSize  number of features
1: previous cell state [batchSize x inSize], that is at previous time step t1
2: weights [inSize x 3*inSize]
3: biases [1 x 2*inSize]
Output arrays:
0: current cell output [batchSize x inSize], that is at current time step t
1: current cell state [batchSize x inSize], that is at current time step t

Nd4jCpu.stack 
This operation stacks a list of rank-R tensors into one rank-(R+1) tensor.

Nd4jCpu.stack_list 
This operation concatenates given NDArrayList, and returns NDArray as result

Nd4jCpu.standardize 
standardizes input array to be zero mean unit variance along the given axis

Nd4jCpu.standardize_bp  
Nd4jCpu.Stash  
Nd4jCpu.static_bidirectional_rnn 
Implementation of operation "static RNN time sequences" with peephole connections:
Input arrays:
0: input with shape [time x batchSize x inSize], time  number of time steps, batchSize  batch size, inSize  number of features
1: input-to-hidden weights for forward RNN, [inSize x numUnitsFW]
2: hidden-to-hidden weights for forward RNN, [numUnitsFW x numUnitsFW]
3: biases for forward RNN, [2*numUnitsFW]
4: input-to-hidden weights for backward RNN, [inSize x numUnitsBW]
5: hidden-to-hidden weights for backward RNN, [numUnitsBW x numUnitsBW]
6: biases for backward RNN, [2*numUnitsBW]
7: (optional) initial cell output for forward RNN [batchSize x numUnitsFW], that is at time step = 0
8: (optional) initial cell output for backward RNN [batchSize x numUnitsBW], that is at time step = 0
9: (optional) vector with shape [batchSize] containing integer values within [0, time); each element of this vector sets the max time step for the corresponding input in the batch, and no calculations are performed for time >= maxTimeStep
Output arrays:
0: cell outputs [time x batchSize x (numUnitsFW + numUnitsBW)]
1: cell final nonzero output for forward RNN [batchSize x numUnitsFW]
2: cell final nonzero output for backward RNN [batchSize x numUnitsBW]

Nd4jCpu.static_rnn 
Implementation of operation "static RNN time sequences" with peephole connections:
Input arrays:
0: input with shape [time x batchSize x inSize], time  number of time steps, batchSize  batch size, inSize  number of features
1: input-to-hidden weights, [inSize x numUnits]
2: hidden-to-hidden weights, [numUnits x numUnits]
3: biases, [2*numUnits]
4: (optional) initial cell output [batchSize x numUnits], that is at time step = 0
5: (optional) vector with shape [batchSize] containing integer values within [0, time); each element of this vector sets the max time step for the corresponding input in the batch, and no calculations are performed for time >= maxTimeStep
Output arrays:
0: cell outputs [time x batchSize x numUnits]
1: cell final nonzero output [batchSize x numUnits]

Nd4jCpu.stop_gradient 
This operation is omitted here due to its simplicity.

Nd4jCpu.strided_slice 
This operation extracts an (optionally strided) slice from a tensor.

Nd4jCpu.strided_slice_bp  
Nd4jCpu.subtract 
This is one of autobroadcastable operations.

Nd4jCpu.subtract_bp  
Nd4jCpu.sufficient_statistics 
The sufficient_statistics operation returns the calculated mean and variance along with the data count.

Nd4jCpu.svd 
performs singular value decomposition (SVD) of one or more matrices, evaluates the SVD of each innermost 2D matrix in input array:
x[..., :, :] = u[..., :, :] * s[...,:] * transpose(v[..., :, :])
Input array:
x[..., Rows, Cols], the necessary condition is: rank of x >= 2
Outputs arrays:
s[..., diagSize] - array with singular values, stored in decreasing order; diagSize is the smaller of Rows and Cols
u[..., Rows, Rows] if IArgs[1] is true, else u[..., Rows, diagSize] - array with left singular vectors
v[..., Cols, Cols] if IArgs[1] is true, else v[..., Cols, diagSize] - array with right singular vectors
Integer arguments:
IArgs[0] - bool, whether to calculate u and v; s is calculated in any case
IArgs[1] - bool, whether to calculate full-sized u and v
IArgs[2] - the number of cols or rows which determines what algorithm to use.

Nd4jCpu.Switch  
Nd4jCpu.TadDescriptor 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.TadPack 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.tanh 
This is Tanh activation function implementation

Nd4jCpu.tanh_bp  
Nd4jCpu.tear 
This operation splits input NDArray into multiple TADs along given dimensions
Expected arguments:
input: Ndimensional array
Int args:
0..: TAD axis

Nd4jCpu.tensormmul 
tensorMmul/tensorDot operation
takes 2 ndarrays, and 2 sets of axes
Integer arguments map:
IArgs[0] - number of axes for the first array
IArgs[1]... - axes values for the first array
next IArg - number of axes for the second array
following IArgs... - axes values for the second array

Nd4jCpu.tensormmul_bp  
Nd4jCpu.test_output_reshape 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.test_scalar  
Nd4jCpu.testcustom  
Nd4jCpu.testop2i2o  
Nd4jCpu.testreduction  
Nd4jCpu.tf_atan2 
Special atan2 op impl for TF's args order

Nd4jCpu.thresholdedrelu 
Thresholded Rectified Linear Unit
f(x) = x for x > theta, f(x) = 0 otherwise
theta must be >= 0

Nd4jCpu.thresholdedrelu_bp  
Nd4jCpu.tile  
Nd4jCpu.tile_bp  
Nd4jCpu.tile_to_shape 
This op tiles the specified input up to the specified shape

Nd4jCpu.tile_to_shape_bp  
Nd4jCpu.to_double 
This operation casts elements of input array to double data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_float16 
This operation casts elements of input array to float16 data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_float32 
This operation casts elements of input array to float data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_int32 
This operation casts elements of input array to int32 data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_int64 
This operation casts elements of input array to int64 (aka long long) data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_uint32 
This operation casts elements of input array to unsigned int32 data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.to_uint64 
This operation casts elements of input array to unsigned int64 (aka unsigned long long) data type
PLEASE NOTE: This op is disabled atm, and reserved for future releases.

Nd4jCpu.toggle_bits 
This operation toggles individual bits of each element in array
PLEASE NOTE: This operation is possible only on integer data types

Nd4jCpu.top_k 
The top_k operation returns the k top values of the given NDArray as a tensor;
with the default boolean argument (true),
the result index array will be sorted by the values in descending order.

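A pure-Python sketch of top_k for a flat input illustrates the value/index pair it returns and the role of the boolean sort flag (illustrative names, not the native API):

```python
def top_k(values, k, sorted_result=True):
    # Returns (values, indices) of the k largest elements; with the default
    # boolean argument the result is ordered by value, descending.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    if not sorted_result:
        order = sorted(order)  # keep original positional order instead
    return [values[i] for i in order], order
```

For example, top_k([1, 9, 3, 7], 2) returns the values [9, 7] with their indices [1, 3].
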
Nd4jCpu.trace  
Nd4jCpu.transpose  
Nd4jCpu.tri  
Nd4jCpu.triangular_solve 
triangular_solve op. - reverse Gaussian method (back substitution) for solving systems of linear equations.

Nd4jCpu.triu  
Nd4jCpu.triu_bp  
Nd4jCpu.truncatediv 

Nd4jCpu.unique 
This operation returns the unique elements of the input array as a vector, along with their original indices in the input array
Expected input:
input: N-dimensional array

Nd4jCpu.unique_with_counts 
This operation returns three 1D arrays for a given 1D array, with unique element counts and indexes
input:
0 - 1D array
output:
0 - 1D array with unique values
1 - 1D array with ids for the values in the array above
2 - 1D array with counts for the values in the array above

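The three-output contract can be sketched in pure Python (illustrative function; uniques are assumed to appear in first-seen order):

```python
def unique_with_counts(values):
    # Returns (uniques, ids, counts): ids maps each input element to its
    # position in uniques; counts[i] is how often uniques[i] occurs.
    uniques, ids, counts = [], [], []
    lookup = {}
    for v in values:
        if v not in lookup:
            lookup[v] = len(uniques)
            uniques.append(v)
            counts.append(0)
        ids.append(lookup[v])
        counts[lookup[v]] += 1
    return uniques, ids, counts
```

For example, [1, 2, 1, 3, 2] yields uniques [1, 2, 3], ids [0, 1, 0, 2, 1], and counts [2, 2, 1].
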
Nd4jCpu.unsorted_segment_max 
unsorted_segment_max op. - makes a tensor filled with max values according to the given index tensor.

Nd4jCpu.unsorted_segment_max_bp  
Nd4jCpu.unsorted_segment_mean 
unsorted_segment_mean op. - makes a tensor filled with the average of values according to the given index tensor.

Nd4jCpu.unsorted_segment_mean_bp  
Nd4jCpu.unsorted_segment_min 
unsorted_segment_min op. - makes a tensor filled with min values according to the given index tensor.

Nd4jCpu.unsorted_segment_min_bp  
Nd4jCpu.unsorted_segment_prod 
unsorted_segment_prod op. - makes a tensor filled with the product of values according to the given index tensor.

Nd4jCpu.unsorted_segment_prod_bp  
Nd4jCpu.unsorted_segment_sqrt_n 
unsorted_segment_sqrt_n op. - computes the sum along segments of a tensor, divided by sqrt(N).

Nd4jCpu.unsorted_segment_sqrt_n_bp  
Nd4jCpu.unsorted_segment_sum 
unsorted_segment_sum op. - makes a tensor filled with the sum of values according to the given index tensor.

Nd4jCpu.unsorted_segment_sum_bp  
Nd4jCpu.unstack 
This op does the same as tear, just uses a different input format.

Nd4jCpu.unstack_list 
This operation unstacks given NDArray into NDArrayList by the first dimension

Nd4jCpu.upsampling2d 
Expected input: 4D array
IntArgs:
0: scale factor for rows (height)
1: scale factor for columns (width)
2: data format: 0 NHWC (default), 1 NCHW

Nd4jCpu.upsampling2d_bp  
Nd4jCpu.upsampling3d 
Expected input: 5D array
IntArgs:
0: scale factor for depth
1: scale factor for rows (height)
2: scale factor for columns (width)
3: data format: 0 NDHWC (default), 1 NCDHW

Nd4jCpu.upsampling3d_bp  
Nd4jCpu.utf8string 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.Variable 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.VariableSpace 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.VariablesSet 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.weighted_cross_entropy_with_logits 
This op calculates the weighted logarithmic loss of the input
Input arguments:
0 - target
1 - input
2 - weights (scalar, or vector with size equal to the last dimension)
return value - a tensor with the same shape as target or input

Nd4jCpu.Where 
This op takes n-dimensional arrays as input, and returns
an array of the same shape, with elements taken either from x or y, depending on the condition.

Nd4jCpu.where_np  
Nd4jCpu.While  
Nd4jCpu.Workspace 
Copyright (c) 2015-2018 Skymind, Inc.

Nd4jCpu.write_list 
This operation puts the given NDArray into the (optionally) given NDArrayList.

Nd4jCpu.Xoroshiro128  
Nd4jCpu.xw_plus_b 
xw_plus_b op.

Nd4jCpu.xw_plus_b_bp  
Nd4jCpu.zero_fraction 
zero_fraction op.

Nd4jCpu.zeros_as 
This operation takes the input's shape, and returns a new NDArray filled with zeros
Expected arguments:
input: N-dimensional array

Nd4jCpu.zeta 
This op calculates the Hurwitz zeta function zeta(x, q) = sum_{n=0}^{inf} (q + n)^{-x}
Implementation is based on the Euler-Maclaurin summation formula
Input arrays:
x: defines the power {x}, must be > 1, type float.

Nd4jCpuHelper  
Nd4jCpuPresets  
OpaqueConstantDataBuffer  
OpaqueContext  
OpaqueDataBuffer 
This class is an opaque pointer to InteropDataBuffer, used for Java/C++ interop related to the INDArray DataBuffer

OpaqueLaunchContext  
OpaqueRandomGenerator  
OpaqueResultWrapper  
OpaqueShapeList  
OpaqueTadPack  
OpaqueVariable  
OpaqueVariablesSet  
PointerPointerWrapper 
Wrapper for DoublePointer -> LongPointer

ResultWrapperAbstraction 
Copyright © 2020. All rights reserved.