Open Data for Deep Learning

Recent Additions

Symbolic Music Datasets

Natural-Image Datasets

Artificial Datasets

Facial Datasets

Text Datasets

Speech Datasets

  • TIMIT Speech Corpus: phoneme classification

Recommendation Systems Datasets

  • MovieLens: Two datasets. The first has 100,000 ratings for 1,682 movies by 943 users, subdivided into five disjoint subsets. The second has about 1 million ratings for 3,900 movies by 6,040 users.
  • Jester: 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users.
  • Netflix Prize: An anonymized version of Netflix's movie rating dataset. It contains 100 million ratings from 480,000 users, each of whom rated between 1 and all 17,770 of the movies.
  • Book-Crossing dataset: Collected from the Book-Crossing community. Contains 1,149,780 ratings of 271,379 books from 278,858 users.
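
The rating datasets listed above differ widely in how densely their user-item matrices are filled, which often matters when choosing one for an experiment. The sketch below estimates that density as ratings / (items × users), using only the counts quoted in the list. The dictionary keys, the function names `density` and `density_table`, and the labels "MovieLens (first)" and "MovieLens (second)" are illustrative choices, not part of any dataset's distribution. Several of the counts are rounded in the list, so the results are rough estimates.

```python
# Counts are copied from the list above as (ratings, items, users).
# Several are rounded there ("about 1 million", "4.1 million"),
# so the densities below are approximations.
DATASETS = {
    "MovieLens (first)": (100_000, 1_682, 943),
    "MovieLens (second)": (1_000_000, 3_900, 6_040),
    "Jester": (4_100_000, 100, 73_421),
    "Netflix Prize": (100_000_000, 17_770, 480_000),
    "Book-Crossing": (1_149_780, 271_379, 278_858),
}


def density(ratings, items, users):
    """Fraction of the user-item matrix that holds a rating."""
    return ratings / (items * users)


def density_table(datasets=DATASETS):
    """Map each dataset name to its estimated rating-matrix density."""
    return {name: density(*counts) for name, counts in datasets.items()}


if __name__ == "__main__":
    # Print the datasets from densest to sparsest.
    ranked = sorted(density_table().items(), key=lambda kv: kv[1], reverse=True)
    for name, value in ranked:
        print(f"{name:20s} {value:.4%}")
```

By this estimate Jester is by far the densest of the five, with more than half of all possible user-joke pairs rated, while Book-Crossing is the sparsest by several orders of magnitude.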

Miscellaneous Datasets

