DataVec Filters

Using filters

Filters are part of transforms and provide a DSL for keeping only the parts of your dataset that you want. A filter can be a one-liner for a single condition or can combine several conditions with boolean logic.

// Remove every record whose MerchantCountryCode is not USA or CAN.
TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    .filter(new ConditionFilter(
        new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet,
            new HashSet<>(Arrays.asList("USA", "CAN")))))
    .build();
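The "remove when the condition is true" rule and the idea of combining conditions with boolean logic can be sketched without the DataVec dependency. The following is a minimal, illustrative model in plain Java: it uses java.util.function.Predicate in place of DataVec conditions, and the class name, the record layout (country code, amount), and the applyFilter helper are all assumptions made for this example, not part of the DataVec API. In DataVec itself, compound conditions are normally built from the library's own condition classes rather than from Predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Dependency-free sketch: models filter semantics with plain Java, not DataVec classes.
class CompoundFilterSketch {

    // Hypothetical record layout: [merchantCountryCode, transactionAmountUsd]
    static final int COUNTRY = 0;
    static final int AMOUNT = 1;

    // Mirrors the filter contract: a record is REMOVED when the condition is true.
    static List<List<String>> applyFilter(List<List<String>> records,
                                          Predicate<List<String>> removeIf) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            if (!removeIf.test(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    static List<List<String>> demo() {
        Set<String> allowed = new HashSet<>(Arrays.asList("USA", "CAN"));

        // Condition 1: country code is not in the allowed set.
        Predicate<List<String>> notInSet = r -> !allowed.contains(r.get(COUNTRY));
        // Condition 2: amount is negative.
        Predicate<List<String>> negativeAmount =
            r -> Double.parseDouble(r.get(AMOUNT)) < 0.0;

        // Boolean logic: remove a record if EITHER condition holds.
        Predicate<List<String>> removeIf = notInSet.or(negativeAmount);

        List<List<String>> records = Arrays.asList(
            Arrays.asList("USA", "12.50"),
            Arrays.asList("FRA", "30.00"),
            Arrays.asList("CAN", "-5.00"),
            Arrays.asList("CAN", "8.25"));

        return applyFilter(records, removeIf);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[USA, 12.50], [CAN, 8.25]]
    }
}
```

Note that the condition describes what to discard, not what to keep: a NotInSet condition on {USA, CAN} therefore leaves only USA and CAN records in the output.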

You can also write your own filters by implementing the Filter interface, though more often you will want to create a custom condition instead.

Available filters


ConditionFilter

If condition is satisfied (returns true): remove the example or sequence
If condition is not satisfied (returns false): keep the example or sequence

removeExample
public boolean removeExample(Object writables) 
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence) 
  • param sequence sequence example
  • return true if example should be removed, false to keep
transform
public Schema transform(Schema inputSchema) 

Get the output schema for this transformation, given an input schema

  • param inputSchema
outputColumnName
public String outputColumnName() 

The output column name after the operation has been applied

  • return the output column name
columnName
public String columnName() 

The output column names. This will often be the same as the input.

  • return the output column names

Filter

Filter: a method of removing examples (or sequences) according to some condition


FilterInvalidValues

FilterInvalidValues: a filter operation that removes any example (or sequence) that contains invalid values in any of a specified set of columns. Invalid values are determined with respect to the schema.
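To make "invalid with respect to the schema" concrete, here is a small dependency-free sketch in plain Java. It is not the DataVec implementation: the ColType enum, the isValid rules (a value is valid if it parses as the column's type), and the method names are simplifying assumptions for illustration, and DataVec's real column metadata can carry stricter rules. The sketch does show the key behavior described above: only the columns you name are checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch of schema-based validity filtering (not DataVec code).
class InvalidValuesSketch {

    // Hypothetical, simplified column types standing in for a schema.
    enum ColType { INTEGER, DOUBLE, STRING }

    // Assumed rule: a value is valid if it is non-null and parses as its column type.
    static boolean isValid(ColType type, String value) {
        if (value == null) {
            return false;
        }
        try {
            switch (type) {
                case INTEGER: Integer.parseInt(value); return true;
                case DOUBLE:  Double.parseDouble(value); return true;
                default:      return true; // any non-null string is accepted
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Returns true (remove) if any of the CHECKED columns holds an invalid value.
    static boolean removeExample(List<ColType> schema, List<String> record,
                                 int... columnsToCheck) {
        for (int col : columnsToCheck) {
            if (!isValid(schema.get(col), record.get(col))) {
                return true;
            }
        }
        return false;
    }

    static List<List<String>> demo() {
        List<ColType> schema =
            Arrays.asList(ColType.STRING, ColType.DOUBLE, ColType.INTEGER);
        List<List<String>> records = Arrays.asList(
            Arrays.asList("a", "1.5", "3"),
            Arrays.asList("b", "oops", "4"),  // invalid DOUBLE in column 1
            Arrays.asList("c", "2.0", "x"));  // invalid INTEGER in column 2

        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            // Only column 1 is checked, so the bad value in column 2 is ignored.
            if (!removeExample(schema, record, 1)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[a, 1.5, 3], [c, 2.0, x]]
    }
}
```

The third record survives even though its last column is malformed, because that column was not in the set being checked.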

transform
public Schema transform(Schema inputSchema) 
  • param columnsToFilterIfInvalid Columns to check for invalid values
removeExample
public boolean removeExample(Object writables) 
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence) 
  • param sequence sequence example
  • return true if example should be removed, false to keep
outputColumnName
public String outputColumnName() 

The output column name after the operation has been applied

  • return the output column name
columnName
public String columnName() 

The output column names. This will often be the same as the input.

  • return the output column names

InvalidNumColumns

Removes invalid records, that is, records whose number of columns does not match the number of columns in the schema.
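As a rough illustration of a size-based check, the sketch below treats a record as invalid when its number of values differs from the expected column count, and treats a sequence as invalid when any of its time steps does. This is an assumption about the intended behavior written in plain Java; the class and method signatures here are hypothetical and only loosely mirror the removeExample and removeSequence methods listed below.

```java
import java.util.Arrays;
import java.util.List;

// Dependency-free sketch of a column-count check (not DataVec code).
class NumColumnsSketch {

    // Remove a record when its size differs from the expected column count.
    static boolean removeExample(int expectedColumns, List<String> record) {
        return record.size() != expectedColumns;
    }

    // Assumed rule: remove a whole sequence if any single time step has the wrong size.
    static boolean removeSequence(int expectedColumns, List<List<String>> sequence) {
        for (List<String> step : sequence) {
            if (removeExample(expectedColumns, step)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> complete = Arrays.asList("a", "b", "c");
        List<String> truncated = Arrays.asList("a", "b");

        System.out.println(removeExample(3, complete));   // prints false
        System.out.println(removeExample(3, truncated));  // prints true
        System.out.println(removeSequence(3, Arrays.asList(complete, truncated))); // prints true
    }
}
```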

removeExample
public boolean removeExample(Object writables) 
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(Object sequence) 
  • param sequence sequence example
  • return true if example should be removed, false to keep
removeExample
public boolean removeExample(List<Writable> writables) 
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
public boolean removeSequence(List<List<Writable>> sequence) 
  • param sequence sequence example
  • return true if example should be removed, false to keep
transform
public Schema transform(Schema inputSchema) 

Get the output schema for this transformation, given an input schema

  • param inputSchema
outputColumnName
public String outputColumnName() 

The output column name after the operation has been applied

  • return the output column name
columnName
public String columnName() 

The output column names. This will often be the same as the input.

  • return the output column names
