In machine learning and deep learning we can’t do anything without data. So the people that create datasets for us to train our models are the (often under-appreciated) heroes. Some of the most useful and important datasets are those that become important “academic baselines”; that is, datasets that are widely studied by researchers and used to compare algorithmic changes. Some of these become household names (at least, among households that train models!), such as MNIST, CIFAR 10, and Imagenet.
We all owe a debt of gratitude to those kind folks who have made datasets available for the research community. So fast.ai and the AWS Public Dataset Program have teamed up to try to give back a little: we’ve made some of the most important of these datasets available in a single place, using standard formats, on reliable and fast infrastructure. For a full list and links see the fast.ai datasets page.
fast.ai uses these datasets in the Deep Learning for Coders courses, because they provide great examples of the kind of data that students are likely to encounter, and the academic literature has many examples of model results using these datasets which students can compare their work to. If you use any of these datasets in your research, please show your gratitude by citing the original paper (we’ve provided the appropriate citation link below for each), and if you use them as part of a commercial or educational project, consider adding a note of thanks and a link to the dataset.