Custom Datasets

All input to deep-learning nets, whether it is words, images or other data, must be transformed into numbers known as vectors, in a process called vectorization. A vector is simply a one-column matrix with an arbitrary number of rows.
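To make the idea concrete, here is a minimal sketch in plain Java (chosen because DataVec itself is a Java library) that turns a word and a tiny grayscale image into flat numeric vectors. It is an illustration only, not DataVec code: the class `Vectorize`, its methods, the three-word vocabulary and the one-hot scheme are all hypothetical choices made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: plain Java, NOT the DataVec API. Names are hypothetical. */
public class Vectorize {

    /** One-hot encodes a word: a vector of zeros with a single 1.0 at the word's vocabulary index. */
    static double[] oneHot(String word, List<String> vocab) {
        int idx = vocab.indexOf(word);
        if (idx < 0) {
            throw new IllegalArgumentException("word not in vocabulary: " + word);
        }
        double[] vec = new double[vocab.size()];
        vec[idx] = 1.0;
        return vec;
    }

    /** Flattens a rectangular grayscale image (rows x cols) into one vector, row by row. */
    static double[] flatten(double[][] pixels) {
        int rows = pixels.length;
        int cols = rows == 0 ? 0 : pixels[0].length;
        double[] vec = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                vec[r * cols + c] = pixels[r][c];
            }
        }
        return vec;
    }

    public static void main(String[] args) {
        List<String> vocab = List.of("cat", "dog", "fish"); // hypothetical vocabulary
        System.out.println(Arrays.toString(oneHot("dog", vocab)));
        // prints [0.0, 1.0, 0.0]

        double[][] img = {{0.0, 0.5}, {1.0, 0.25}}; // made-up 2x2 pixel intensities
        System.out.println(Arrays.toString(flatten(img)));
        // prints [0.0, 0.5, 1.0, 0.25]
    }
}
```

One-hot encoding is only one of many ways to vectorize words; it is used here because it is the easiest to read, and either result can be viewed as the one-column matrix described above.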


DataVec is an open-source tool, licensed under Apache 2.0, for machine-learning ETL (Extract, Transform, Load) operations. Its goal is to transform raw data into vector formats that machine-learning tools can use.

DataVec provides tools to transform images into vectors, including labelling images based on directory name and structure. DataVec also provides tools to read CSV data and transform fields to the appropriate numeric format.
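The two transformations described above can be approximated in a few lines of plain Java. The sketch below is an illustration under stated assumptions, not DataVec's implementation or API: `EtlSketch` and its methods are hypothetical names, labels receive alphabetical integer indices, and the CSV parser is a naive comma split that does not handle quoted fields. In real projects, DataVec's record readers (for example `CSVRecordReader` and `ImageRecordReader`) handle this work.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only: mimics two ideas DataVec automates. NOT the DataVec API. */
public class EtlSketch {

    /** Labels an image by its parent directory name, e.g. data/cat/001.jpg -> "cat". */
    static String labelFromParentDir(Path imagePath) {
        Path parent = imagePath.getParent();
        if (parent == null || parent.getFileName() == null) {
            throw new IllegalArgumentException("no parent directory: " + imagePath);
        }
        return parent.getFileName().toString();
    }

    /** Gives each distinct label a stable integer index, in alphabetical order. */
    static Map<String, Integer> indexLabels(List<Path> imagePaths) {
        TreeSet<String> names = new TreeSet<>();
        for (Path p : imagePaths) {
            names.add(labelFromParentDir(p));
        }
        Map<String, Integer> index = new TreeMap<>();
        for (String name : names) {
            index.put(name, index.size()); // size before insertion: 0, 1, 2, ...
        }
        return index;
    }

    /**
     * Converts one CSV line whose last field is a class name into numbers:
     * leading fields are parsed as doubles, the class name becomes its index.
     * Assumption: plain comma separation, no quoted or escaped fields.
     */
    static double[] csvLineToVector(String line, Map<String, Integer> classIndex) {
        String[] fields = line.split(",");
        double[] vec = new double[fields.length];
        for (int i = 0; i < fields.length - 1; i++) {
            vec[i] = Double.parseDouble(fields[i].trim());
        }
        String cls = fields[fields.length - 1].trim();
        Integer idx = classIndex.get(cls);
        if (idx == null) {
            throw new IllegalArgumentException("unknown class: " + cls);
        }
        vec[fields.length - 1] = idx;
        return vec;
    }

    public static void main(String[] args) {
        // Hypothetical directory layout: one sub-directory per class.
        List<Path> paths = List.of(
                Paths.get("data", "dog", "a.jpg"),
                Paths.get("data", "cat", "b.jpg"),
                Paths.get("data", "cat", "c.jpg"));
        System.out.println(indexLabels(paths));
        // prints {cat=0, dog=1}

        // Made-up sample row: four numeric features followed by a class name.
        double[] row = csvLineToVector("5.1,3.5,1.4,0.2,setosa", Map.of("setosa", 0, "versicolor", 1));
        System.out.println(Arrays.toString(row));
        // prints [5.1, 3.5, 1.4, 0.2, 0.0]
    }
}
```

Sorting label names before assigning indices keeps the mapping deterministic regardless of the order in which files are listed, which matters when the same mapping must be reused for training and evaluation.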

DataVec examples are available in our examples repository.

More information on DataVec is available here.
