org.nd4j.linalg.dimensionalityreduction

## Class PCA

• ```public class PCA
extends Object```
PCA class for dimensionality reduction and general analysis
Author:
Adam Gibson, Luke Czapla - added methods used in non-static usage of PCA
• ### Constructor Summary

Constructors
Constructor and Description
`PCA(INDArray dataset)`
Create a PCA instance with calculated data: covariance, mean, eigenvectors, and eigenvalues.
• ### Method Summary

All Methods
| Modifier and Type | Method | Description |
| --- | --- | --- |
| `INDArray` | `convertBackToFeatures(INDArray data)` | Take data that has been transformed to the principal components about the mean and transform it back into the original feature set. |
| `INDArray` | `convertToComponents(INDArray data)` | Takes a set of data, one record per row, with the same number of features as the constructing data, and returns the data in the coordinates of the basis set about the mean. |
| `static INDArray[]` | `covarianceMatrix(INDArray in)` | Returns the covariance matrix of a data set of many records, each with N features. |
| `double` | `estimateVariance(INDArray data, int ndims)` | Estimate the variance of a single record with a reduced number of dimensions. |
| `INDArray` | `generateGaussianSamples(long count)` | Generates a set of `count` random samples with the same variance, mean, and eigenvectors/eigenvalues as the data set used to initialize the PCA object, with the same number of features N. |
| `INDArray` | `getCovarianceMatrix()` | |
| `INDArray` | `getEigenvalues()` | |
| `INDArray` | `getEigenvectors()` | |
| `INDArray` | `getMean()` | |
| `static INDArray` | `pca_factor(INDArray A, double variance, boolean normalize)` | Calculates PCA vectors of a matrix for a given variance. |
| `static INDArray` | `pca_factor(INDArray A, int nDims, boolean normalize)` | Calculates PCA factors of a matrix for a given number of reduced features. Returns a factor matrix used to reduce (normalized) feature sets. |
| `static INDArray` | `pca(INDArray A, double variance, boolean normalize)` | Calculates the PCA-reduced value of a matrix for a given variance. |
| `static INDArray` | `pca(INDArray A, int nDims, boolean normalize)` | Calculates PCA vectors of a matrix for a given number of reduced features and returns the reduced feature set, a projection of A onto the principal nDims components. To use the PCA, assume A is the original feature set, then project A onto a reduced set of features. |
| `static INDArray` | `pca2(INDArray in, double variance)` | Performs a dimensionality reduction, including principal components that cover a fraction of the total variance of the system. |
| `static INDArray[]` | `principalComponents(INDArray cov)` | Calculates the principal component vectors and their eigenvalues (lambda) for the covariance matrix. |
| `INDArray` | `reducedBasis(double variance)` | Return a reduced basis set that covers a certain fraction of the variance of the data. |
• ### Methods inherited from class java.lang.Object

`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Constructor Detail

• #### PCA

`public PCA(INDArray dataset)`
Create a PCA instance with calculated data: covariance, mean, eigenvectors, and eigenvalues.
Parameters:
`dataset` - The data set of records: each row is a data record and each column is a feature. Every data record has the same number of features.
• ### Method Detail

• #### reducedBasis

`public INDArray reducedBasis(double variance)`
Return a reduced basis set that covers a certain fraction of the variance of the data
Parameters:
`variance` - The desired fractional variance (0 to 1). The variance covered by the returned basis will always be greater than this value.
Returns:
The basis vectors as columns, size N rows by ndims columns, where ndims is less than or equal to N
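The selection rule described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the ND4J implementation: `ReducedBasisSketch` and its methods are hypothetical names, the per-component variances are assumed to be supplied directly and sorted in decreasing order, and the basis is assumed to hold components as columns.

```java
// Illustrative sketch only; ReducedBasisSketch is a hypothetical helper, not part of ND4J.
final class ReducedBasisSketch {

    /**
     * Returns the smallest number of leading components whose cumulative share
     * of the total variance is strictly greater than the requested fraction.
     * Assumes variances are sorted in decreasing order. If the fraction is
     * never exceeded (for example fraction = 1.0), all components are kept.
     */
    static int componentsFor(double[] variances, double fraction) {
        double total = 0.0;
        for (double v : variances) {
            total += v;
        }
        double running = 0.0;
        int ndims = 0;
        for (double v : variances) {
            ndims++;
            running += v;
            if (running / total > fraction) {
                break;
            }
        }
        return ndims;
    }

    /** Keeps the first ndims columns of an N x N basis (components as columns). */
    static double[][] leadingColumns(double[][] basis, int ndims) {
        double[][] reduced = new double[basis.length][ndims];
        for (int row = 0; row < basis.length; row++) {
            System.arraycopy(basis[row], 0, reduced[row], 0, ndims);
        }
        return reduced;
    }
}
```

With variances 6, 3, 1 and a requested fraction of 0.7, the first component covers 0.6 and the first two cover 0.9, so two columns are kept and the result has N rows by 2 columns.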
• #### convertToComponents

`public INDArray convertToComponents(INDArray data)`
Takes a set of data, one record per row, with the same number of features as the constructing data, and returns the data in the coordinates of the basis set about the mean.
Parameters:
`data` - Data of the same features used to construct the PCA object
Returns:
The records expressed in terms of the principal component vectors. Unused components can be set to zero.
• #### convertBackToFeatures

`public INDArray convertBackToFeatures(INDArray data)`
Take the data that has been transformed to the principal components about the mean and transform it back into the original feature set. Make sure to fill in zeroes in columns where components were dropped!
Parameters:
`data` - Data with the same features used to construct the PCA object, but expressed as the components
Returns:
The records in terms of the original features
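The forward and backward transforms described for `convertToComponents` and `convertBackToFeatures` can be sketched for a single record in plain Java. This is a hedged illustration, not the library code: `ComponentTransformSketch` is a hypothetical name, and the basis is assumed to be orthonormal with components stored as columns.

```java
// Illustrative sketch only; ComponentTransformSketch is a hypothetical helper, not part of ND4J.
final class ComponentTransformSketch {

    /** Projects (record - mean) onto each basis column: c_k = sum_j basis[j][k] * (x_j - mean_j). */
    static double[] toComponents(double[] record, double[] mean, double[][] basis) {
        int n = record.length;
        int k = basis[0].length;
        double[] components = new double[k];
        for (int col = 0; col < k; col++) {
            for (int j = 0; j < n; j++) {
                components[col] += basis[j][col] * (record[j] - mean[j]);
            }
        }
        return components;
    }

    /** Rebuilds the features: x_j = mean_j + sum_k basis[j][k] * c_k. */
    static double[] toFeatures(double[] components, double[] mean, double[][] basis) {
        int n = mean.length;
        double[] record = new double[n];
        for (int j = 0; j < n; j++) {
            record[j] = mean[j];
            for (int col = 0; col < components.length; col++) {
                record[j] += basis[j][col] * components[col];
            }
        }
        return record;
    }
}
```

Converting a record to components and back with all components kept returns the original record. Setting a component to zero before converting back yields a record that lies on the mean along the dropped direction.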
• #### estimateVariance

```public double estimateVariance(INDArray data,
int ndims)```
Estimate the variance of a single record with a reduced number of dimensions.
Parameters:
`data` - A single record with the same N features as the constructing data set
`ndims` - The number of dimensions to include in calculation
Returns:
The fraction (0 to 1) of the total variance covered by the ndims basis set.
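One way to read the returned fraction is as the share of a record's squared deviation from the mean that falls on the first ndims basis vectors. The plain-Java sketch below illustrates that reading. It is an assumption-laden example rather than the library's code: `VarianceFractionSketch` is a hypothetical name and the basis is assumed orthonormal with components as columns.

```java
// Illustrative sketch only; VarianceFractionSketch is a hypothetical helper, not part of ND4J.
final class VarianceFractionSketch {

    /**
     * Projects (record - mean) onto every basis column, squares each projection,
     * and returns the sum over the first ndims columns divided by the sum over all columns.
     */
    static double fraction(double[] record, double[] mean, double[][] basis, int ndims) {
        int n = record.length;
        double kept = 0.0;
        double total = 0.0;
        for (int col = 0; col < basis[0].length; col++) {
            double projection = 0.0;
            for (int j = 0; j < n; j++) {
                projection += basis[j][col] * (record[j] - mean[j]);
            }
            double squared = projection * projection;
            total += squared;
            if (col < ndims) {
                kept += squared;
            }
        }
        return kept / total;
    }
}
```

For a record (3, 4) with a zero mean and the identity basis, the first component carries 9 of the 25 units of squared deviation, so the fraction for ndims = 1 is 0.36.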
• #### generateGaussianSamples

`public INDArray generateGaussianSamples(long count)`
Generates a set of `count` random samples with the same variance, mean, and eigenvectors/eigenvalues as the data set used to initialize the PCA object, with the same number of features N.
Parameters:
`count` - The number of samples to generate
Returns:
A matrix of size count rows by N columns
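A common way to draw such samples is to add, for each component, a Gaussian-distributed amplitude times the component vector to the mean. The sketch below shows that idea in plain Java. It is a minimal illustration, not the library's implementation: `GaussianSampleSketch` is a hypothetical name, the per-component variances are assumed to be given directly, and how the PCA object stores its eigenvalues may differ.

```java
import java.util.Random;

// Illustrative sketch only; GaussianSampleSketch is a hypothetical helper, not part of ND4J.
final class GaussianSampleSketch {

    /**
     * Returns count rows of N features. Each row is the mean plus, for every
     * component, sqrt(variance) * z * componentVector with z drawn from N(0, 1).
     * The basis is assumed to store component vectors as columns.
     */
    static double[][] sample(int count, double[] mean, double[][] basis,
                             double[] variances, Random rng) {
        int n = mean.length;
        double[][] samples = new double[count][n];
        for (int s = 0; s < count; s++) {
            System.arraycopy(mean, 0, samples[s], 0, n);
            for (int col = 0; col < variances.length; col++) {
                double amplitude = Math.sqrt(variances[col]) * rng.nextGaussian();
                for (int j = 0; j < n; j++) {
                    samples[s][j] += amplitude * basis[j][col];
                }
            }
        }
        return samples;
    }
}
```

Passing a seeded `Random` makes the output reproducible. With all variances set to zero, every generated row equals the mean.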
• #### pca

```public static INDArray pca(INDArray A,
int nDims,
boolean normalize)```
Calculates PCA vectors of a matrix for a given number of reduced features and returns the reduced feature set. The return value is a projection of A onto the principal nDims components. To use the PCA, assume A is the original feature set, then project A onto a reduced set of features. Using the factor matrix returned by `pca_factor`, it is possible to reconstruct the original data (losing information, but keeping the same dimensionality):
```
INDArray Areduced = A.mmul( factor ) ;
INDArray Aoriginal = Areduced.mmul( factor.transpose() ) ;
```
Parameters:
`A` - the array of features; rows are results, columns are features. This array will be modified.
`nDims` - the number of components on which to project the features
`normalize` - whether to normalize (adjust each feature to have zero mean)
Returns:
the reduced parameters of A
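The reduce-then-reconstruct pattern from the description can be traced with plain arrays. The sketch below mirrors `A.mmul( factor )` and `Areduced.mmul( factor.transpose() )` without ND4J. `FactorProjectionSketch` and its methods are hypothetical names used only for illustration, and the factor matrix in the usage note is a hand-picked example rather than the output of `pca_factor`.

```java
// Illustrative sketch only; FactorProjectionSketch is a hypothetical helper, not part of ND4J.
final class FactorProjectionSketch {

    /** Standard matrix product of a (rows x inner) and b (inner x cols). */
    static double[][] mmul(double[][] a, double[][] b) {
        int rows = a.length;
        int inner = b.length;
        int cols = b[0].length;
        double[][] out = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < inner; k++) {
                for (int j = 0; j < cols; j++) {
                    out[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return out;
    }

    /** Returns the transpose of m. */
    static double[][] transpose(double[][] m) {
        double[][] out = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                out[j][i] = m[i][j];
            }
        }
        return out;
    }

    /** Areduced = A * factor: fewer columns than A. */
    static double[][] reduce(double[][] a, double[][] factor) {
        return mmul(a, factor);
    }

    /** Aoriginal = Areduced * factor^T: same shape as A, with information lost. */
    static double[][] reconstruct(double[][] reduced, double[][] factor) {
        return mmul(reduced, transpose(factor));
    }
}
```

With a single record (1, 2, 3) and a 3 x 2 factor that keeps only the first two axes, the reduced record is (1, 2) and the reconstruction is (1, 2, 0): the dimensionality is restored, but the third feature is lost.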
• #### pca_factor

```public static INDArray pca_factor(INDArray A,
int nDims,
boolean normalize)```
Calculates PCA factors of a matrix for a given number of reduced features. Returns a factor matrix used to reduce (normalized) feature sets.
Parameters:
`A` - the array of features; rows are results, columns are features. This array will be modified.
`nDims` - the number of components on which to project the features
`normalize` - whether to normalize (adjust each feature to have zero mean)
Returns:
the factor matrix used to reduce a feature set
See Also:
`pca(INDArray, int, boolean)`
• #### pca

```public static INDArray pca(INDArray A,
double variance,
boolean normalize)```
Calculates the PCA-reduced value of a matrix for a given variance. A larger variance fraction (for example 99%, passed as 0.99) results in a higher-order feature set. The returned matrix is a projection of A onto the principal components.
Parameters:
`A` - the array of features; rows are results, columns are features. This array will be modified.
`variance` - the fraction of variance to preserve, as a value between 0 and 1
`normalize` - whether to normalize (set features to have zero mean)
Returns:
the matrix representing a reduced feature set
See Also:
`pca(INDArray, int, boolean)`
• #### pca_factor

```public static INDArray pca_factor(INDArray A,
double variance,
boolean normalize)```
Calculates PCA vectors of a matrix for a given variance. A larger variance fraction (for example 99%, passed as 0.99) results in a higher-order feature set. To use the returned factor, multiply the feature(s) by the factor to get a reduced-dimension array: `INDArray Areduced = A.mmul( factor ) ;` The array Areduced is a projection of A onto the principal components.
Parameters:
`A` - the array of features; rows are results, columns are features. This array will be modified.
`variance` - the fraction of variance to preserve, as a value between 0 and 1
`normalize` - whether to normalize (set features to have zero mean)
Returns:
the matrix to multiply a feature by to get a reduced feature set
See Also:
`pca(INDArray, double, boolean)`
• #### pca2

```public static INDArray pca2(INDArray in,
double variance)```
This method performs a dimensionality reduction, including principal components that cover a fraction of the total variance of the system. It does all calculations about the mean.
Parameters:
`in` - A matrix of data points as rows, where the columns are features, with a fixed number N of features
`variance` - The desired fraction of the total variance required
Returns:
The reduced basis set
• #### covarianceMatrix

`public static INDArray[] covarianceMatrix(INDArray in)`
Returns the covariance matrix of a data set of many records, each with N features. It also returns the average values, which are usually important because, in this version, all modes are centered around the mean. Each element of the matrix is the average of dx_i * dx_j (the form used in the procedure), where dx_i is the deviation of feature i from its average, or equivalently average(x_i * x_j) - average(x_i) * average(x_j).
Parameters:
`in` - A matrix of vectors of fixed length N (N features) on each row
Returns:
INDArray[2], an N x N covariance matrix is element 0, and the average values is element 1.
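The two expressions for the covariance elements can be checked against each other with a small plain-Java sketch. This is an illustration only, not the library's code: `CovarianceSketch` is a hypothetical name, and "average" is taken to mean division by the number of records, which is an assumption based on the wording above.

```java
// Illustrative sketch only; CovarianceSketch is a hypothetical helper, not part of ND4J.
final class CovarianceSketch {

    /** Column-wise average of the records (one record per row). */
    static double[] mean(double[][] data) {
        int n = data[0].length;
        double[] avg = new double[n];
        for (double[] row : data) {
            for (int j = 0; j < n; j++) {
                avg[j] += row[j] / data.length;
            }
        }
        return avg;
    }

    /** cov[i][j] = average of dx_i * dx_j, with dx the deviation from the mean. */
    static double[][] covariance(double[][] data) {
        int n = data[0].length;
        double[] avg = mean(data);
        double[][] cov = new double[n][n];
        for (double[] row : data) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    cov[i][j] += (row[i] - avg[i]) * (row[j] - avg[j]) / data.length;
                }
            }
        }
        return cov;
    }

    /** cov[i][j] = average(x_i * x_j) - average(x_i) * average(x_j). */
    static double[][] covarianceFromMoments(double[][] data) {
        int n = data[0].length;
        double[] avg = mean(data);
        double[][] cov = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double product = 0.0;
                for (double[] row : data) {
                    product += row[i] * row[j] / data.length;
                }
                cov[i][j] = product - avg[i] * avg[j];
            }
        }
        return cov;
    }
}
```

For the two records (1, 2) and (3, 6), both methods give the averages (2, 4) and the covariance matrix with rows (1, 2) and (2, 4).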
• #### principalComponents

`public static INDArray[] principalComponents(INDArray cov)`
Calculates the principal component vectors and their eigenvalues (lambda) for the covariance matrix. The result includes two things: the eigenvectors (modes) as result[0] and the eigenvalues (lambda) as result[1].
Parameters:
`cov` - The covariance matrix (calculated with the covarianceMatrix(in) method)
Returns:
Array INDArray[2] "result". The principal component vectors in decreasing flexibility are the columns of element 0 and the eigenvalues are element 1.
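To give a feel for what a principal component vector of a covariance matrix is, the sketch below finds the dominant direction with power iteration in plain Java. Power iteration is a swapped-in textbook technique chosen for brevity, not the method this class uses, and `PowerIterationSketch` is a hypothetical name. The eigenvalue convention of `principalComponents` is not spelled out on this page, so the variance computed here should not be assumed to equal any entry of element 1 of its result.

```java
import java.util.Arrays;

// Illustrative sketch only; PowerIterationSketch is a hypothetical helper, not part of ND4J.
final class PowerIterationSketch {

    /** Matrix-vector product m * v. */
    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < v.length; j++) {
                out[i] += m[i][j] * v[j];
            }
        }
        return out;
    }

    /**
     * Approximates the unit eigenvector of a symmetric matrix with the largest
     * eigenvalue by repeated multiplication and normalization. The uniform
     * start vector must not be orthogonal to that eigenvector.
     */
    static double[] dominantDirection(double[][] cov, int iterations) {
        int n = cov.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        for (int it = 0; it < iterations; it++) {
            double[] next = multiply(cov, v);
            double norm = 0.0;
            for (double x : next) {
                norm += x * x;
            }
            norm = Math.sqrt(norm);
            if (norm == 0.0) {
                break; // cov maps v to zero; keep the last estimate
            }
            for (int i = 0; i < n; i++) {
                v[i] = next[i] / norm;
            }
        }
        return v;
    }

    /** Rayleigh quotient v^T * cov * v for a unit vector v: the variance along v. */
    static double varianceAlong(double[][] cov, double[] v) {
        double[] cv = multiply(cov, v);
        double sum = 0.0;
        for (int i = 0; i < v.length; i++) {
            sum += v[i] * cv[i];
        }
        return sum;
    }
}
```

For the covariance matrix with rows (1, 2) and (2, 4), the dominant direction is proportional to (1, 2) and the variance along it is 5.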
• #### getCovarianceMatrix

`public INDArray getCovarianceMatrix()`
• #### getMean

`public INDArray getMean()`
• #### getEigenvectors

`public INDArray getEigenvectors()`
• #### getEigenvalues

`public INDArray getEigenvalues()`