public class SparkComputationGraph extends SparkListenable
Modifier and Type | Field and Description |
---|---|
static int |
DEFAULT_EVAL_SCORE_BATCH_SIZE |
static int |
DEFAULT_EVAL_WORKERS |
static int |
DEFAULT_ROC_THRESHOLD_STEPS |
trainingMaster
Constructor and Description |
---|
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext,
ComputationGraphConfiguration conf,
TrainingMaster trainingMaster) |
SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext,
ComputationGraph network,
TrainingMaster trainingMaster) |
SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
ComputationGraphConfiguration conf,
TrainingMaster trainingMaster) |
SparkComputationGraph(org.apache.spark.SparkContext sparkContext,
ComputationGraph network,
TrainingMaster trainingMaster)
Instantiate a ComputationGraph instance with the given context, network and training master.
|
Modifier and Type | Method and Description |
---|---|
double |
calculateScore(org.apache.spark.api.java.JavaRDD<DataSet> data,
boolean average)
Calculate the score for all examples in the provided
JavaRDD<DataSet> , either by summing
or averaging over the entire data set. |
double |
calculateScore(org.apache.spark.api.java.JavaRDD<DataSet> data,
boolean average,
int minibatchSize)
Calculate the score for all examples in the provided
JavaRDD<DataSet> , either by summing
or averaging over the entire data set. |
double |
calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
boolean average)
Calculate the score for all examples in the provided
JavaRDD<MultiDataSet> , either by summing
or averaging over the entire data set. |
double |
calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
boolean average,
int minibatchSize)
Calculate the score for all examples in the provided
JavaRDD<MultiDataSet> , either by summing
or averaging over the entire data set. |
<T extends IEvaluation> |
doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data,
int evalNumWorkers,
int evalBatchSize,
T... emptyEvaluations)
Perform distributed evaluation on a single output ComputationGraph from DataSet objects using Spark.
|
<T extends IEvaluation> |
doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data,
int evalBatchSize,
T... emptyEvaluations)
Perform distributed evaluation on a single output ComputationGraph from DataSet objects using Spark.
|
<T extends IEvaluation> |
doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data,
T emptyEvaluation,
int evalBatchSize)
Perform distributed evaluation of any type of
IEvaluation . |
IEvaluation[] |
doEvaluation(org.apache.spark.api.java.JavaRDD<String> data,
DataSetLoader loader,
IEvaluation... emptyEvaluations)
Perform evaluation on serialized DataSet objects on disk (potentially in any format), that are loaded using a
DataSetLoader. |
IEvaluation[] |
doEvaluation(org.apache.spark.api.java.JavaRDD<String> data,
int evalNumWorkers,
int evalBatchSize,
DataSetLoader loader,
IEvaluation... emptyEvaluations)
Perform evaluation on serialized DataSet objects on disk (potentially in any format), that are loaded using a
DataSetLoader. |
protected IEvaluation[] |
doEvaluation(org.apache.spark.api.java.JavaRDD<String> data,
int evalNumWorkers,
int evalBatchSize,
DataSetLoader loader,
MultiDataSetLoader mdsLoader,
IEvaluation... emptyEvaluations) |
IEvaluation[] |
doEvaluation(org.apache.spark.api.java.JavaRDD<String> data,
int evalNumWorkers,
int evalBatchSize,
MultiDataSetLoader loader,
IEvaluation... emptyEvaluations)
Perform evaluation on serialized MultiDataSet objects on disk (potentially in any format), that are loaded using a
MultiDataSetLoader. |
IEvaluation[] |
doEvaluation(org.apache.spark.api.java.JavaRDD<String> data,
MultiDataSetLoader loader,
IEvaluation... emptyEvaluations)
Perform evaluation on serialized MultiDataSet objects on disk (potentially in any format), that are loaded using a
MultiDataSetLoader. |
<T extends IEvaluation> |
doEvaluationMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
int evalNumWorkers,
int evalBatchSize,
T... emptyEvaluations) |
<T extends IEvaluation> |
doEvaluationMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
int evalBatchSize,
T... emptyEvaluations)
Perform distributed evaluation on a single output ComputationGraph from MultiDataSet objects using Spark.
|
<T extends Evaluation> |
evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data)
Evaluate the network (classification performance) in a distributed manner on the provided data
|
<T extends Evaluation> |
evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data,
List<String> labelsList)
Evaluate the network (classification performance) in a distributed manner, using default batch size and a provided
list of labels
|
<T extends Evaluation> |
evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data,
List<String> labelsList,
int evalBatchSize)
Evaluate the network (classification performance) in a distributed manner, using specified batch size and a provided
list of labels
|
<T extends Evaluation> |
evaluate(org.apache.spark.rdd.RDD<DataSet> data)
RDD<DataSet> overload of evaluate(JavaRDD) |
<T extends Evaluation> |
evaluate(org.apache.spark.rdd.RDD<DataSet> data,
List<String> labelsList)
RDD<DataSet> overload of evaluate(JavaRDD, List) |
Evaluation |
evaluate(String path,
DataSetLoader loader)
Evaluate the single-output network on a directory containing a set of DataSet objects to be loaded with a
DataSetLoader . |
Evaluation |
evaluate(String path,
MultiDataSetLoader loader)
Evaluate the single-output network on a directory containing a set of MultiDataSet objects to be loaded with a
MultiDataSetLoader . |
<T extends Evaluation> |
evaluateMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
Evaluate the network (classification performance) in a distributed manner on the provided data
|
<T extends Evaluation> |
evaluateMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
int minibatchSize)
Evaluate the network (classification performance) in a distributed manner on the provided data
|
<T extends RegressionEvaluation> |
evaluateRegression(org.apache.spark.api.java.JavaRDD<DataSet> data)
Evaluate the network (regression performance) in a distributed manner on the provided data
|
<T extends RegressionEvaluation> |
evaluateRegression(org.apache.spark.api.java.JavaRDD<DataSet> data,
int minibatchSize)
Evaluate the network (regression performance) in a distributed manner on the provided data
|
<T extends RegressionEvaluation> |
evaluateRegressionMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
Evaluate the network (regression performance) in a distributed manner on the provided data
|
<T extends RegressionEvaluation> |
evaluateRegressionMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
int minibatchSize)
Evaluate the network (regression performance) in a distributed manner on the provided data
|
<T extends ROC> |
evaluateROC(org.apache.spark.api.java.JavaRDD<DataSet> data)
Perform ROC analysis/evaluation on the given DataSet in a distributed manner, using the default number of
threshold steps (
DEFAULT_ROC_THRESHOLD_STEPS ) and the default minibatch size (DEFAULT_EVAL_SCORE_BATCH_SIZE ) |
<T extends ROC> |
evaluateROC(org.apache.spark.api.java.JavaRDD<DataSet> data,
int thresholdSteps,
int evaluationMinibatchSize)
Perform ROC analysis/evaluation on the given DataSet in a distributed manner
|
ROC |
evaluateROCMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
Perform ROC analysis/evaluation on the given MultiDataSet in a distributed manner, using the default number of
threshold steps (
DEFAULT_ROC_THRESHOLD_STEPS ) and the default minibatch size (DEFAULT_EVAL_SCORE_BATCH_SIZE ) |
<T extends ROC> |
evaluateROCMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
int rocThresholdNumSteps,
int minibatchSize)
Perform ROC analysis/evaluation on the given MultiDataSet in a distributed manner, using the specified number of
steps and minibatch size
|
<T extends ROCMultiClass> |
evaluateROCMultiClass(org.apache.spark.api.java.JavaRDD<DataSet> data)
Perform ROC analysis/evaluation (for the multi-class case, using
ROCMultiClass) on the given DataSet in a distributed manner |
<T extends ROCMultiClass> |
evaluateROCMultiClass(org.apache.spark.api.java.JavaRDD<DataSet> data,
int thresholdSteps,
int evaluationMinibatchSize)
Perform ROC analysis/evaluation (for the multi-class case, using
ROCMultiClass) on the given DataSet in a distributed manner |
<K> org.apache.spark.api.java.JavaPairRDD<K,INDArray[]> |
feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,INDArray[]> featuresData,
int batchSize)
Feed-forward the specified data, with the given keys. i.e., get the network output/predictions for the specified data
|
<K> org.apache.spark.api.java.JavaPairRDD<K,INDArray> |
feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,INDArray> featuresData,
int batchSize)
Feed-forward the specified data, with the given keys. i.e., get the network output/predictions for the specified data
|
ComputationGraph |
fit(org.apache.spark.api.java.JavaRDD<DataSet> rdd)
Fit the ComputationGraph with the given data set
|
ComputationGraph |
fit(org.apache.spark.rdd.RDD<DataSet> rdd)
Fit the ComputationGraph with the given data set
|
ComputationGraph |
fit(String path)
Fit the SparkComputationGraph network using a directory of serialized DataSet objects
The assumption here is that the directory contains a number of
DataSet objects, each serialized using
DataSet.save(OutputStream) |
ComputationGraph |
fit(String path,
int minPartitions)
Deprecated.
Use
fit(String) |
ComputationGraph |
fitMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> rdd)
Fit the ComputationGraph with the given data set
|
ComputationGraph |
fitMultiDataSet(org.apache.spark.rdd.RDD<MultiDataSet> rdd)
Fit the ComputationGraph with the given data set
|
ComputationGraph |
fitMultiDataSet(String path)
Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects
The assumption here is that the directory contains a number of serialized
MultiDataSet objects |
ComputationGraph |
fitMultiDataSet(String path,
int minPartitions)
Deprecated.
|
ComputationGraph |
fitPaths(org.apache.spark.api.java.JavaRDD<String> paths)
Fit the network using a list of paths for serialized DataSet objects.
|
ComputationGraph |
fitPaths(org.apache.spark.api.java.JavaRDD<String> paths,
DataSetLoader loader) |
ComputationGraph |
fitPaths(org.apache.spark.api.java.JavaRDD<String> paths,
MultiDataSetLoader loader) |
ComputationGraph |
fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<String> paths)
Fit the network using a list of paths for serialized MultiDataSet objects.
|
int |
getDefaultEvaluationWorkers()
Returns the currently set default number of evaluation workers/threads.
|
ComputationGraph |
getNetwork() |
double |
getScore()
Gets the last (average) minibatch score from calling fit.
|
org.apache.spark.api.java.JavaSparkContext |
getSparkContext() |
SparkTrainingStats |
getSparkTrainingStats() |
TrainingMaster |
getTrainingMaster() |
<K> org.apache.spark.api.java.JavaPairRDD<K,Double> |
scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,DataSet> data,
boolean includeRegularizationTerms)
DataSet version of
scoreExamples(JavaPairRDD, boolean) |
<K> org.apache.spark.api.java.JavaPairRDD<K,Double> |
scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,DataSet> data,
boolean includeRegularizationTerms,
int batchSize)
DataSet version of
scoreExamples(JavaPairRDD, boolean, int) |
org.apache.spark.api.java.JavaDoubleRDD |
scoreExamples(org.apache.spark.api.java.JavaRDD<DataSet> data,
boolean includeRegularizationTerms)
DataSet version of
scoreExamples(JavaRDD, boolean) |
org.apache.spark.api.java.JavaDoubleRDD |
scoreExamples(org.apache.spark.api.java.JavaRDD<DataSet> data,
boolean includeRegularizationTerms,
int batchSize)
DataSet version of
scoreExamples(JavaPairRDD, boolean, int) |
<K> org.apache.spark.api.java.JavaPairRDD<K,Double> |
scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,MultiDataSet> data,
boolean includeRegularizationTerms)
Score the examples individually, using the default batch size
DEFAULT_EVAL_SCORE_BATCH_SIZE . |
<K> org.apache.spark.api.java.JavaPairRDD<K,Double> |
scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,MultiDataSet> data,
boolean includeRegularizationTerms,
int batchSize)
Score the examples individually, using a specified batch size.
|
org.apache.spark.api.java.JavaDoubleRDD |
scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
boolean includeRegularizationTerms)
Score the examples individually, using the default batch size
DEFAULT_EVAL_SCORE_BATCH_SIZE . |
org.apache.spark.api.java.JavaDoubleRDD |
scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data,
boolean includeRegularizationTerms,
int batchSize)
Score the examples individually, using a specified batch size.
|
void |
setCollectTrainingStats(boolean collectTrainingStats) |
void |
setDefaultEvaluationWorkers(int workers)
Set the default number of evaluation workers/threads.
|
void |
setNetwork(ComputationGraph network) |
void |
setScore(double lastScore) |
Methods inherited from class SparkListenable: setListeners
public static final int DEFAULT_ROC_THRESHOLD_STEPS
public static final int DEFAULT_EVAL_SCORE_BATCH_SIZE
public static final int DEFAULT_EVAL_WORKERS
public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraph network, TrainingMaster trainingMaster)
sparkContext - the spark context to use
network - the network to use
trainingMaster - Required for training. May be null if the SparkComputationGraph is only to be used for evaluation or inference

public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext javaSparkContext, ComputationGraph network, TrainingMaster trainingMaster)
public SparkComputationGraph(org.apache.spark.SparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
public SparkComputationGraph(org.apache.spark.api.java.JavaSparkContext sparkContext, ComputationGraphConfiguration conf, TrainingMaster trainingMaster)
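As a usage sketch for the constructors above: the following wires an initialized ComputationGraph to a Spark context and a TrainingMaster. The ParameterAveragingTrainingMaster values (32 examples per DataSet, averaging frequency 5) and the app name are illustrative assumptions, not recommendations.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;

public class SparkGraphSetup {

    // 'net' is assumed to be an already-built, initialized ComputationGraph
    public static SparkComputationGraph create(ComputationGraph net) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("dl4j-spark-sketch"));

        // Parameter averaging: 32 examples per DataSet object in the RDD,
        // averaging parameters across workers every 5 minibatches (illustrative values)
        TrainingMaster<?, ?> tm = new ParameterAveragingTrainingMaster.Builder(32)
                .batchSizePerWorker(32)
                .averagingFrequency(5)
                .build();

        // trainingMaster may be null if this instance is used only for
        // evaluation or inference, per the constructor docs above
        return new SparkComputationGraph(sc, net, tm);
    }
}
```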
public org.apache.spark.api.java.JavaSparkContext getSparkContext()
public void setCollectTrainingStats(boolean collectTrainingStats)
public SparkTrainingStats getSparkTrainingStats()
public ComputationGraph getNetwork()
public TrainingMaster getTrainingMaster()
public void setNetwork(ComputationGraph network)
network - The network to be used for any subsequent training, inference and evaluation steps

public int getDefaultEvaluationWorkers()
Returns the currently set default number of evaluation workers/threads. If not set explicitly, DEFAULT_EVAL_WORKERS will be used

public void setDefaultEvaluationWorkers(int workers)
Set the default number of evaluation workers/threads. If not set explicitly, DEFAULT_EVAL_WORKERS will be used

public ComputationGraph fit(org.apache.spark.rdd.RDD<DataSet> rdd)
rdd - Data to train on

public ComputationGraph fit(org.apache.spark.api.java.JavaRDD<DataSet> rdd)
rdd - Data to train on

public ComputationGraph fit(String path)
Fit the SparkComputationGraph network using a directory of serialized DataSet objects. The assumption here is that the directory contains a number of DataSet objects, each serialized using DataSet.save(OutputStream)
path - Path to the directory containing the serialized DataSet objects

@Deprecated public ComputationGraph fit(String path, int minPartitions)
Deprecated. Use fit(String)

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<String> paths)
paths - List of paths

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<String> paths, DataSetLoader loader)

public ComputationGraph fitMultiDataSet(org.apache.spark.rdd.RDD<MultiDataSet> rdd)
rdd - Data to train on

public ComputationGraph fitMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> rdd)
rdd - Data to train on

public ComputationGraph fitMultiDataSet(String path)
Fit the SparkComputationGraph network using a directory of serialized MultiDataSet objects
path - Path to the directory containing the serialized MultiDataSet objects

public ComputationGraph fitPathsMultiDataSet(org.apache.spark.api.java.JavaRDD<String> paths)
paths - List of paths

public ComputationGraph fitPaths(org.apache.spark.api.java.JavaRDD<String> paths, MultiDataSetLoader loader)

@Deprecated public ComputationGraph fitMultiDataSet(String path, int minPartitions)
Deprecated. Use fitMultiDataSet(String)
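A sketch of the fit(...) overloads above. The paths are hypothetical placeholders, and 'sparkNet' is assumed to be a SparkComputationGraph configured with a TrainingMaster:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class FitSketch {

    static ComputationGraph train(SparkComputationGraph sparkNet,
                                  JavaSparkContext sc,
                                  JavaRDD<DataSet> trainData) {
        // (1) Fit directly from an in-memory/cached RDD of DataSet objects
        ComputationGraph trained = sparkNet.fit(trainData);

        // (2) Fit from a directory of DataSets, each serialized via
        // DataSet.save(OutputStream) -- hypothetical path
        trained = sparkNet.fit("hdfs:///data/train");

        // (3) Fit from an RDD of file paths; each worker loads the files
        // it is assigned -- hypothetical path
        JavaRDD<String> paths = sc.textFile("hdfs:///data/train-paths.txt");
        trained = sparkNet.fitPaths(paths);

        return trained;
    }
}
```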
public double getScore()
public void setScore(double lastScore)
public double calculateScore(org.apache.spark.api.java.JavaRDD<DataSet> data, boolean average)
Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScore(org.apache.spark.api.java.JavaRDD<DataSet> data, boolean average, int minibatchSize)
Calculate the score for all examples in the provided JavaRDD<DataSet>, either by summing or averaging over the entire data set. To calculate a score for each example individually, use scoreExamples(JavaPairRDD, boolean) or one of the similar methods
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If more examples are in a partition than this, multiple scoring operations will be done (to avoid using too much memory by doing the whole partition in one go)

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, boolean average)
Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set. Uses the default minibatch size in each worker, DEFAULT_EVAL_SCORE_BATCH_SIZE
data - Data to score
average - Whether to sum the scores, or average them

public double calculateScoreMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, boolean average, int minibatchSize)
Calculate the score for all examples in the provided JavaRDD<MultiDataSet>, either by summing or averaging over the entire data set.
data - Data to score
average - Whether to sum the scores, or average them
minibatchSize - The number of examples to use in each minibatch when scoring. If more examples are in a partition than this, multiple scoring operations will be done (to avoid using too much memory by doing the whole partition in one go)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<DataSet> data, boolean includeRegularizationTerms)
DataSet version of scoreExamples(JavaRDD, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamples(org.apache.spark.api.java.JavaRDD<DataSet> data, boolean includeRegularizationTerms, int batchSize)
DataSet version of scoreExamples(JavaPairRDD, boolean, int)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,DataSet> data, boolean includeRegularizationTerms)
DataSet version of scoreExamples(JavaPairRDD, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamples(org.apache.spark.api.java.JavaPairRDD<K,DataSet> data, boolean includeRegularizationTerms, int batchSize)
DataSet version of scoreExamples(JavaPairRDD, boolean, int)
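The distinction between the aggregate calculateScore(...) and the per-example scoreExamples(...) methods above can be sketched as follows ('sparkNet' and 'testData' are assumed to exist):

```java
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class ScoringSketch {

    static void score(SparkComputationGraph sparkNet, JavaRDD<DataSet> testData) {
        // One number for the whole data set: average (true) rather than sum (false)
        double avgScore = sparkNet.calculateScore(testData, true);

        // One Double per example; 'true' includes the l1/l2 regularization
        // terms (if any) in each score
        JavaDoubleRDD perExample = sparkNet.scoreExamples(testData, true);

        System.out.println("average score: " + avgScore
                + ", max per-example score: " + perExample.stats().max());
    }
}
```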
public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, boolean includeRegularizationTerms)
Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public org.apache.spark.api.java.JavaDoubleRDD scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)
Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately. If scoring is needed for specific examples, use either scoreExamples(JavaPairRDD, boolean) or scoreExamples(JavaPairRDD, boolean, int), which can have a key for each example.
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
batchSize - Batch size to use when doing scoring
See also: ComputationGraph.scoreExamples(MultiDataSet, boolean)

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,MultiDataSet> data, boolean includeRegularizationTerms)
Score the examples individually, using the default batch size DEFAULT_EVAL_SCORE_BATCH_SIZE. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
Returns: JavaPairRDD<K,Double> containing the scores of each example
See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)
public <K> org.apache.spark.api.java.JavaPairRDD<K,INDArray> feedForwardWithKeySingle(org.apache.spark.api.java.JavaPairRDD<K,INDArray> featuresData, int batchSize)
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,INDArray[]> feedForwardWithKey(org.apache.spark.api.java.JavaPairRDD<K,INDArray[]> featuresData, int batchSize)
K - Type of data for key - may be anything
featuresData - Features data to feed through the network
batchSize - Batch size to use when doing feed forward operations

public <K> org.apache.spark.api.java.JavaPairRDD<K,Double> scoreExamplesMultiDataSet(org.apache.spark.api.java.JavaPairRDD<K,MultiDataSet> data, boolean includeRegularizationTerms, int batchSize)
Score the examples individually, using a specified batch size. Unlike calculateScore(JavaRDD, boolean), this method returns a score for each example separately
K - Key type
data - Data to score
includeRegularizationTerms - If true: include the l1/l2 regularization terms with the score (if any)
Returns: JavaPairRDD<K,Double> containing the scores of each example
See also: MultiLayerNetwork.scoreExamples(DataSet, boolean)
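The keyed feed-forward methods above preserve a per-record key so that predictions can be joined back to their source records. A sketch with String keys (all names and the batch size are assumptions):

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.api.ndarray.INDArray;

public class InferenceSketch {

    static JavaPairRDD<String, INDArray> predict(SparkComputationGraph sparkNet,
                                                 JavaPairRDD<String, INDArray> features) {
        // Single-input/single-output network: each key (e.g. a record ID) is
        // carried through, so outputs can be joined back to the inputs.
        // Batch size 64 per worker is an illustrative value.
        return sparkNet.feedForwardWithKeySingle(features, 64);
    }
}
```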
public Evaluation evaluate(String path, DataSetLoader loader)
Evaluate the single-output network on a directory containing a set of DataSet objects to be loaded with a DataSetLoader. Uses the default batch size of DEFAULT_EVAL_SCORE_BATCH_SIZE
path - Path/URI to the directory containing the datasets to load

public Evaluation evaluate(String path, MultiDataSetLoader loader)
Evaluate the single-output network on a directory containing a set of MultiDataSet objects to be loaded with a MultiDataSetLoader. Uses the default batch size of DEFAULT_EVAL_SCORE_BATCH_SIZE
path - Path/URI to the directory containing the datasets to load

public <T extends Evaluation> T evaluate(org.apache.spark.rdd.RDD<DataSet> data)
RDD<DataSet> overload of evaluate(JavaRDD)

public <T extends Evaluation> T evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data)
data - Data to evaluate on

public <T extends Evaluation> T evaluate(org.apache.spark.rdd.RDD<DataSet> data, List<String> labelsList)
RDD<DataSet> overload of evaluate(JavaRDD, List)
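A sketch of distributed classification evaluation using the evaluate(...) methods above ('sparkNet' and 'testData' assumed to exist):

```java
import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.eval.Evaluation;
import org.deeplearning4j.spark.impl.graph.SparkComputationGraph;
import org.nd4j.linalg.dataset.DataSet;

public class EvalSketch {

    static void evaluate(SparkComputationGraph sparkNet, JavaRDD<DataSet> testData) {
        // Evaluation is performed on the workers and merged on the driver
        Evaluation eval = sparkNet.evaluate(testData);

        // stats() summarises accuracy, precision, recall, F1 and the confusion matrix
        System.out.println(eval.stats());
    }
}
```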
public <T extends RegressionEvaluation> T evaluateRegression(org.apache.spark.api.java.JavaRDD<DataSet> data)
data - Data to evaluate
Returns: RegressionEvaluation instance with regression performance

public <T extends RegressionEvaluation> T evaluateRegression(org.apache.spark.api.java.JavaRDD<DataSet> data, int minibatchSize)
data - Data to evaluate
minibatchSize - Minibatch size to use when performing evaluation
Returns: RegressionEvaluation instance with regression performance

public <T extends Evaluation> T evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data, List<String> labelsList)
data - Data to evaluate on
labelsList - List of labels used for evaluation

public <T extends ROC> T evaluateROC(org.apache.spark.api.java.JavaRDD<DataSet> data)
Perform ROC analysis/evaluation in a distributed manner, using the default number of threshold steps (DEFAULT_ROC_THRESHOLD_STEPS) and the default minibatch size (DEFAULT_EVAL_SCORE_BATCH_SIZE)
data - Test set data (to evaluate on)

public <T extends ROC> T evaluateROC(org.apache.spark.api.java.JavaRDD<DataSet> data, int thresholdSteps, int evaluationMinibatchSize)
data - Test set data (to evaluate on)
thresholdSteps - Number of threshold steps for ROC - see ROC
evaluationMinibatchSize - Minibatch size to use when performing ROC evaluation

public <T extends ROCMultiClass> T evaluateROCMultiClass(org.apache.spark.api.java.JavaRDD<DataSet> data)
Perform ROC analysis/evaluation (for the multi-class case, using ROCMultiClass) on the given DataSet in a distributed manner
data - Test set data (to evaluate on)

public <T extends ROCMultiClass> T evaluateROCMultiClass(org.apache.spark.api.java.JavaRDD<DataSet> data, int thresholdSteps, int evaluationMinibatchSize)
Perform ROC analysis/evaluation (for the multi-class case, using ROCMultiClass) on the given DataSet in a distributed manner
data - Test set data (to evaluate on)
thresholdSteps - Number of threshold steps for ROC - see ROC
evaluationMinibatchSize - Minibatch size to use when performing ROC evaluation

public <T extends Evaluation> T evaluate(org.apache.spark.api.java.JavaRDD<DataSet> data, List<String> labelsList, int evalBatchSize)
data - Data to evaluate on
labelsList - List of labels used for evaluation
evalBatchSize - Batch size to use when conducting evaluations

public <T extends Evaluation> T evaluateMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
public <T extends Evaluation> T evaluateMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, int minibatchSize)
public <T extends RegressionEvaluation> T evaluateRegressionMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
data - Data to evaluate
Returns: RegressionEvaluation instance with regression performance

public <T extends RegressionEvaluation> T evaluateRegressionMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, int minibatchSize)
data - Data to evaluate
minibatchSize - Minibatch size to use when performing evaluation
Returns: RegressionEvaluation instance with regression performance

public ROC evaluateROCMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data)
Perform ROC analysis/evaluation in a distributed manner, using the default number of threshold steps (DEFAULT_ROC_THRESHOLD_STEPS) and the default minibatch size (DEFAULT_EVAL_SCORE_BATCH_SIZE)
data - Test set data (to evaluate on)

public <T extends ROC> T evaluateROCMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, int rocThresholdNumSteps, int minibatchSize)
data - Test set data (to evaluate on)
rocThresholdNumSteps - See ROC for details
minibatchSize - Minibatch size for evaluation

public <T extends IEvaluation> T doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data, T emptyEvaluation, int evalBatchSize)
Perform distributed evaluation of any type of IEvaluation. For example, Evaluation, RegressionEvaluation, ROC, ROCMultiClass etc.
T - Type of evaluation instance to return
data - Data to evaluate on
emptyEvaluation - Empty evaluation instance. This is the starting point (serialized/duplicated, then merged)
evalBatchSize - Evaluation batch size

public <T extends IEvaluation> T[] doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data, int evalBatchSize, T... emptyEvaluations)
Perform distributed evaluation, allowing multiple evaluation types (for example, Evaluation and ROC) to be performed at the same time. The default number of evaluation workers, getDefaultEvaluationWorkers(), will be used
data - Data to evaluate
evalBatchSize - Minibatch size for evaluation
emptyEvaluations - Evaluations to perform

public <T extends IEvaluation> T[] doEvaluation(org.apache.spark.api.java.JavaRDD<DataSet> data, int evalNumWorkers, int evalBatchSize, T... emptyEvaluations)
Perform distributed evaluation, allowing multiple evaluation types (for example, Evaluation and ROC) to be performed at the same time.
data - Data to evaluate
evalNumWorkers - Number of worker threads (per machine) to use for evaluation. You may want this to be less than the number of Spark threads per machine/JVM, to reduce memory requirements
evalBatchSize - Minibatch size for evaluation
emptyEvaluations - Evaluations to perform

public <T extends IEvaluation> T[] doEvaluationMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, int evalBatchSize, T... emptyEvaluations)
Perform distributed evaluation, allowing multiple evaluation types (for example, Evaluation and ROC) to be performed at the same time.
data - Data to evaluate
evalBatchSize - Minibatch size for evaluation
emptyEvaluations - Evaluations to perform

public <T extends IEvaluation> T[] doEvaluationMDS(org.apache.spark.api.java.JavaRDD<MultiDataSet> data, int evalNumWorkers, int evalBatchSize, T... emptyEvaluations)
public IEvaluation[] doEvaluation(org.apache.spark.api.java.JavaRDD<String> data, DataSetLoader loader, IEvaluation... emptyEvaluations)
Perform evaluation on serialized DataSet objects on disk (potentially in any format), loaded using a DataSetLoader. Uses the default number of evaluation workers, DEFAULT_EVAL_WORKERS, with the default minibatch size of DEFAULT_EVAL_SCORE_BATCH_SIZE
data - List of paths to the data (that can be loaded as / converted to DataSets)
loader - Used to load DataSets from their paths
emptyEvaluations - Evaluations to perform

public IEvaluation[] doEvaluation(org.apache.spark.api.java.JavaRDD<String> data, int evalNumWorkers, int evalBatchSize, DataSetLoader loader, IEvaluation... emptyEvaluations)
Perform evaluation on serialized DataSet objects on disk (potentially in any format), loaded using a DataSetLoader.
data - List of paths to the data (that can be loaded as / converted to DataSets)
evalNumWorkers - Number of workers to perform evaluation with. To reduce memory requirements and cache thrashing, it is common to set this to a lower value than the number of spark threads per JVM/executor
evalBatchSize - Batch size to use when performing evaluation
loader - Used to load DataSets from their paths
emptyEvaluations - Evaluations to perform

public IEvaluation[] doEvaluation(org.apache.spark.api.java.JavaRDD<String> data, MultiDataSetLoader loader, IEvaluation... emptyEvaluations)
Perform evaluation on serialized MultiDataSet objects on disk (potentially in any format), loaded using a MultiDataSetLoader. Uses the default number of evaluation workers, DEFAULT_EVAL_WORKERS, with the default minibatch size of DEFAULT_EVAL_SCORE_BATCH_SIZE
data - List of paths to the data (that can be loaded as / converted to MultiDataSets)
loader - Used to load MultiDataSets from their paths
emptyEvaluations - Evaluations to perform

public IEvaluation[] doEvaluation(org.apache.spark.api.java.JavaRDD<String> data, int evalNumWorkers, int evalBatchSize, MultiDataSetLoader loader, IEvaluation... emptyEvaluations)
Perform evaluation on serialized MultiDataSet objects on disk (potentially in any format), loaded using a MultiDataSetLoader
data - List of paths to the data (that can be loaded as / converted to MultiDataSets)
evalNumWorkers - Number of workers to perform evaluation with. To reduce memory requirements and cache thrashing, it is common to set this to a lower value than the number of spark threads per JVM/executor
evalBatchSize - Batch size to use when performing evaluation
loader - Used to load MultiDataSets from their paths
emptyEvaluations - Evaluations to perform

protected IEvaluation[] doEvaluation(org.apache.spark.api.java.JavaRDD<String> data, int evalNumWorkers, int evalBatchSize, DataSetLoader loader, MultiDataSetLoader mdsLoader, IEvaluation... emptyEvaluations)
Copyright © 2020. All rights reserved.