Top 15 Popular Computer Vision Datasets

CIFAR-10 and CIFAR-100

The Canadian Institute for Advanced Research (CIFAR) provides both CIFAR-10 and CIFAR-100. The datasets were collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. CIFAR-10 has 60,000 images divided evenly into ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR-100 also has 60,000 images in total, but they are spread across 100 classes of 600 images each.


MNIST

The Modified National Institute of Standards and Technology (MNIST) database of handwritten digits, compiled by Professor Yann LeCun and colleagues, is among the most commonly used datasets in computer vision. It comprises 70,000 images of handwritten digits from 0 to 9, each stored as a 28×28 grayscale image.
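As a rough illustration of what one 28×28 grayscale digit looks like to a model, the sketch below flattens an image into a 784-value vector and scales the pixel intensities to the range 0 to 1. The image is synthetic rather than real MNIST data, the helper name `flatten_and_scale` is made up for this example, and 8-bit pixel values (0–255) are assumed.

```python
# Synthetic stand-in for one digit image: 28 rows of 28 pixel values (0-255).
# It has the same shape as an MNIST image but is NOT real data.
SIDE = 28
image = [[(r * SIDE + c) % 256 for c in range(SIDE)] for r in range(SIDE)]


def flatten_and_scale(img):
    """Flatten a 2D grayscale image row by row and scale pixels to [0, 1]."""
    return [pixel / 255.0 for row in img for pixel in row]


vector = flatten_and_scale(image)
print(len(vector))  # 784, since 28 * 28 = 784
```

Flattening like this is what a simple fully connected classifier would expect; convolutional models usually keep the 28×28 layout instead.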


Fashion-MNIST

This computer vision dataset is based on article images from Zalando, a fashion retailer, and includes a training set of 60,000 instances and a test set of 10,000. Each instance is a 28×28 grayscale image labeled with one of ten fashion classes: T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot.
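In practice the ten classes are stored as integer labels from 0 to 9, so a small lookup table is usually the first thing needed to make predictions readable. The sketch below assumes the index order of the dataset's published label table, which matches the order of the classes listed above; the helper `label_name` is hypothetical.

```python
# Class names indexed by integer label (0-9), in the order listed above.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_name(label):
    """Return the human-readable class name for an integer label."""
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError(f"label must be in 0..{len(CLASS_NAMES) - 1}, got {label}")
    return CLASS_NAMES[label]


print(label_name(9))  # Ankle boot
```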

Labeled Faces in the Wild

This open-source computer vision dataset consists of images of people’s faces and was created to study the problem of unconstrained face recognition. More than 13,000 face images were gathered from the internet. Each face is labeled with the name of the person pictured, and 1,680 of the people featured have two or more distinct photographs in the dataset.


ImageNet

This deep learning computer vision dataset was created jointly by researchers at Stanford University and Princeton University and is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It is organized according to the hierarchy of WordNet, a lexical database for English, and only nouns are included. Each node of the hierarchy is represented by more than 500 images on average.


ObjectNet

This image dataset for computer vision was developed by researchers at the MIT-IBM Watson AI Lab with the aim of reducing the biases found in existing image datasets. Instead of curating photographs from existing online sources, the researchers crowdsourced them through Mechanical Turk, Amazon’s micro-task platform.


PatchCamelyon

This medical image classification dataset is available through the TensorFlow website. PatchCamelyon is a relatively new and challenging image classification dataset. It is made up of 327,680 color images (96×96 px) extracted from histopathologic scans of lymph node sections.
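The image count and patch size give a quick estimate of how much memory the raw pixels would need if loaded all at once. The arithmetic below is only a back-of-the-envelope sketch: it assumes three color channels at one byte each and ignores labels, compression, and file-format overhead.

```python
NUM_IMAGES = 327_680
HEIGHT = WIDTH = 96
CHANNELS = 3  # RGB; assumed to be stored as 1 byte per channel

bytes_per_image = HEIGHT * WIDTH * CHANNELS
total_bytes = NUM_IMAGES * bytes_per_image
total_gib = total_bytes / 2**30  # bytes -> GiB

print(bytes_per_image)         # 27648
print(f"{total_gib:.2f} GiB")  # 8.44 GiB
```

At roughly 8.4 GiB uncompressed, the full set may not fit comfortably in memory on a modest machine, which is one reason to stream it in batches.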

IMDB-Wiki Dataset

This is said to be the largest publicly available training dataset of face images with gender and age labels. The collection comprises 460,723 face images of 20,284 celebrities from IMDb and 62,328 face images from Wikipedia, for a total of 523,051. It is typically used for gender and age prediction tasks.


DOTA

DOTA (Dataset of Object deTection in Aerial Images) is a large-scale dataset for aerial object detection that can be used to develop and evaluate object detectors for images taken from high altitude. Experts in aerial image interpretation annotate each object instance in DOTA images with an arbitrary quadrilateral, which has 8 degrees of freedom (four corner points with two coordinates each).
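An 8 d.o.f. quadrilateral is simply four (x, y) corner points, which lets an annotation follow a rotated object more tightly than an upright box. The sketch below shows one way such an annotation might be handled: it computes the quadrilateral's area with the shoelace formula and derives the enclosing axis-aligned box. The function names and the sample coordinates are hypothetical and are not taken from DOTA's own tooling.

```python
def quad_area(quad):
    """Area of a simple quadrilateral (corners in order) via the shoelace formula."""
    total = 0.0
    n = len(quad)
    for i in range(n):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % n]  # wrap around to the first corner
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def quad_to_aabb(quad):
    """Smallest axis-aligned box (xmin, ymin, xmax, ymax) containing the quad."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical annotation: a square rotated by 45 degrees.
quad = [(2.0, 0.0), (4.0, 2.0), (2.0, 4.0), (0.0, 2.0)]
print(quad_area(quad))     # 8.0
print(quad_to_aabb(quad))  # (0.0, 0.0, 4.0, 4.0)
```

Here the enclosing upright box covers 16 square units while the object itself covers only 8, which illustrates why oriented annotations matter for rotated objects in aerial images.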

MPII Human Pose

This dataset is used to evaluate articulated human pose estimation. It contains around 25K images of over 40K people with annotated body joints. Each image was extracted from a YouTube video and is accompanied by a description.


MS COCO

Microsoft’s COCO, which stands for Common Objects in Context, is a large-scale dataset for object detection, segmentation, and captioning. It covers 80 object categories and 91 stuff categories. The dataset has over 120,000 images with over 880,000 annotated object instances, since each image can contain several.
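Because one image can carry several annotations, a common first step is to group the annotations by image. The sketch below does this on a tiny hand-written sample; it assumes the field names of the publicly documented COCO JSON layout (`images`, `annotations`, `categories`, with `bbox` given as x, y, width, height), and the file names, boxes, and the helper `labels_per_image` are made up for illustration.

```python
from collections import defaultdict

# Tiny hand-written sample shaped like a COCO annotation file (not real data).
coco = {
    "images": [
        {"id": 1, "file_name": "street.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "kitchen.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 3, "name": "car"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [50, 60, 40, 120]},
        {"id": 11, "image_id": 1, "category_id": 3, "bbox": [200, 150, 180, 90]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [300, 100, 60, 200]},
    ],
}


def labels_per_image(data):
    """Map each image file name to the category names annotated in it."""
    names = {cat["id"]: cat["name"] for cat in data["categories"]}
    files = {img["id"]: img["file_name"] for img in data["images"]}
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[files[ann["image_id"]]].append(names[ann["category_id"]])
    return dict(grouped)


print(labels_per_image(coco))
# {'street.jpg': ['person', 'car'], 'kitchen.jpg': ['person']}
```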

Embrapa Wine Grape Instance Segmentation Dataset

This agricultural computer vision dataset is aimed at providing images and annotations for research into object detection and instance segmentation in viticulture, for use in image-based monitoring and field robotics. It includes examples of five distinct grape varieties photographed in the field.

Bosch Small Traffic Lights Dataset

When developing an automated vehicle for urban environments, it is crucial that the computer vision model performs well at vision-only traffic light detection and tracking. This dataset comprises 13,427 camera images with a resolution of 1280×720 pixels and around 24,000 annotated traffic lights.

Google Open Images

This open-source computer vision dataset from Google is a collection of about 9 million image URLs annotated with labels spanning over 6,000 categories. It has 16 million bounding boxes for 600 object classes on 1.9 million images, which at the time of its release made it the largest available collection with object location annotations.

Waymo Open Dataset

The Waymo Open Dataset is described as one of the most extensive and diverse multimodal autonomous driving datasets to date. It includes images from several high-resolution cameras and point clouds from several high-quality LiDAR sensors, along with 12 million LiDAR box annotations and around 12 million camera box annotations.




Produced by: Analytics Drift Designed by: Prathamesh