Monday, February 26, 2024

Top Facial Recognition Datasets

This is a list of top facial recognition datasets that can be used and studied for facial recognition projects. 

Facial recognition has become part of daily life through mobile phones, computers, biometric systems, and more, and it offers a sense of personal security. It is powered by computer vision, which handles the core tasks of face detection, analysis, and recognition and sometimes outperforms humans at them. Facial recognition algorithms use computer vision techniques to locate a face in an image or video, map and examine its features, and verify or identify whom it belongs to. As a result, we rely heavily on facial recognition, along with other biometrics, for information security, access control, and surveillance. According to Allied Market Research, the global facial recognition market has grown since the COVID-19 pandemic and is projected to reach $16.74 billion by 2030, at a CAGR of 16.0%. This growth is expected to drive significant advances in computer vision, particularly in facial recognition, which makes computer vision a good career path to pursue, or simply a rewarding subject to learn out of curiosity. You can practice model building with the facial recognition datasets listed below to get started.

1. Flickr Faces HQ (FFHQ) Dataset

The Flickr Faces HQ (FFHQ) dataset is a high-quality image dataset of human faces, created in 2019 as a benchmark for generative adversarial networks (GANs) and introduced in the paper “A Style-Based Generator Architecture for Generative Adversarial Networks” by Tero Karras, Samuli Laine, and Timo Aila. It comprises 70,000 high-quality PNG images at 1024×1024 resolution, with considerable variation in age, ethnicity, and image background. The images were crawled from Flickr, an American image and video hosting service. Note that, under NVIDIA's licensing terms, the dataset is not intended for the development or improvement of facial recognition projects and technologies.

Download the dataset: FFHQ 
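Since every FFHQ image should be a 1024×1024 PNG, a quick sanity check after downloading is to read each file's dimensions straight from its header. The sketch below uses only the Python standard library and relies on the PNG file layout, in which the IHDR chunk (holding width and height) immediately follows the 8-byte signature; the folder name in the usage comment is a hypothetical placeholder for wherever you stored the images.

```python
from __future__ import annotations

import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    # Bytes 0-7: signature, 8-11: IHDR chunk length, 12-15: b"IHDR",
    # 16-19: width, 20-23: height (big-endian unsigned 32-bit integers).
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file with a leading IHDR chunk")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def off_size_images(folder: str, expected: tuple[int, int] = (1024, 1024)) -> list[str]:
    """List PNG files under `folder` (searched recursively) whose size differs from `expected`."""
    mismatched = []
    for path in sorted(Path(folder).rglob("*.png")):
        with path.open("rb") as f:
            if png_size(f.read(24)) != expected:
                mismatched.append(str(path))
    return mismatched


# Usage (hypothetical local path):
# print(off_size_images("ffhq/images1024x1024"))
```

Reading only 24 bytes per file keeps the check fast even across 70,000 images, because no image data is decoded.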

2. Tufts Face Dataset

The Tufts Face dataset is a comprehensive, large-scale facial recognition dataset containing over 10,000 images across seven image modalities: visible, near-infrared, thermal, computerized sketch, LYTRO, recorded video, and 3D images. Its participants come from more than 15 countries; 74 are female and 38 are male, and their ages range from 4 to 70 years. The dataset was created in 2019 and made available to researchers worldwide for non-commercial and educational purposes, in particular for benchmarking sketch, thermal, NIR (near-infrared), 3D, and heterogeneous face recognition algorithms.

Download the dataset: Tufts Face

3. Real and Fake Face Detection Dataset

The Real and Fake Face Detection dataset was created by the Computational Intelligence and Photography Lab at Yonsei University in 2019. It is well known for its high-quality fake face images, which were photoshopped by experts. Real and fake images are stored in separate folders, training_real and training_fake, under the parent directory, with over 1,000 real face images and about 900 fake face images. The images vary in face size, and the fake images differ in which features were edited: the eyes, nose, mouth, or the whole face. Each fake image is also labelled with how difficult it is to recognize as fake: easy, mid, or hard.

Download the dataset: Real and Fake Face Detection
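Because real and fake images sit in separate folders, labels can be derived from the directory name. The sketch below assumes the folder names training_real and training_fake described above and, as an assumption to check against your download, that each fake image's file name starts with its difficulty level (for example, a hypothetical easy_12_1100.jpg). The .jpg extension is also assumed.

```python
from __future__ import annotations

from pathlib import Path

DIFFICULTIES = ("easy", "mid", "hard")


def label_from_path(path: str) -> tuple[int, str | None]:
    """Return (label, difficulty) for one image: label 1 = fake, 0 = real."""
    p = Path(path)
    if p.parent.name == "training_fake":
        # Assumed naming scheme: the difficulty is the prefix before the first "_".
        prefix = p.name.split("_", 1)[0]
        return 1, prefix if prefix in DIFFICULTIES else None
    if p.parent.name == "training_real":
        return 0, None
    raise ValueError(f"image is not in training_real or training_fake: {path}")


def build_index(root: str) -> list[tuple[str, int, str | None]]:
    """Collect (path, label, difficulty) for every .jpg in the two training folders."""
    return [
        (str(path), *label_from_path(str(path)))
        for path in sorted(Path(root).glob("training_*/*.jpg"))
    ]
```

Keeping the difficulty alongside the label makes it easy to report a detector's accuracy separately on easy, mid, and hard fakes.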


4. Multi-Attribute Labelled Faces (MALF) Dataset

The Multi-Attribute Labelled Faces (MALF) dataset is the first face dataset to support fine-grained evaluation of face detection in the wild. It contains 5,250 images collected from the internet, with 11,931 labelled faces, and was introduced in 2015 in the paper “Fine-grained Evaluation on Face Detection in the Wild” by Zhen Lei, Bin Yang, Junjie Yan, and Stan Z. Li of the Chinese Academy of Sciences. The dataset has two main strengths: its annotations of multiple facial attributes make fine-grained performance analysis possible, and it reveals how algorithms actually perform in practice.

Download the dataset: MALF
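Fine-grained evaluation boils down to reporting a detector's performance separately for each attribute value instead of as a single number. The sketch below is a simplified, hypothetical illustration of that idea, not MALF's official evaluation protocol: it greedily matches ground-truth faces to detections at an intersection-over-union (IoU) threshold of 0.5 and reports recall per value of one attribute. The attribute name and values used here (`pose`, `frontal`, `profile`) are illustrative placeholders.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def recall_by_attribute(images, attribute, threshold=0.5):
    """Recall per attribute value, accumulated over a list of (faces, detections) pairs.

    faces: ground-truth dicts such as {"box": (x, y, w, h), "pose": "frontal"}.
    detections: predicted (x, y, w, h) boxes for the same image.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for faces, detections in images:
        used = set()  # each detection may match at most one face
        for face in faces:
            value = face[attribute]
            totals[value] += 1
            # Greedily pick the unused detection with the highest IoU at or above the threshold.
            best, best_iou = None, threshold
            for i, det in enumerate(detections):
                overlap = iou(face["box"], det)
                if i not in used and overlap >= best_iou:
                    best, best_iou = i, overlap
            if best is not None:
                used.add(best)
                hits[value] += 1
    return {value: hits[value] / totals[value] for value in totals}
```

A detector with high overall recall can still score poorly on, say, profile faces, which is exactly the kind of gap that per-attribute reporting exposes.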

5. Wider Face Dataset

The WIDER FACE dataset is one of the largest face detection benchmark datasets, with rich annotations including face bounding boxes, poses, event categories, and more. It was created by the Multimedia Laboratory at the Chinese University of Hong Kong and introduced in 2016. It contains 32,203 images with 393,703 labelled faces that vary widely in scale, pose, and occlusion. The dataset is organized by event class, with 61 event classes; within each class, images are randomly split 40%/10%/50% into training, validation, and test sets.

Download the dataset: Wider Face
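WIDER FACE distributes its bounding boxes as plain-text listings. The parser below assumes the ground-truth layout as we understand it (verify against your copy): an image path line, a face-count line, then one line per face with ten integers, namely x, y, w, h followed by the blur, expression, illumination, invalid, occlusion, and pose flags, plus a single all-zero row for images that contain no faces. The file path in the usage comment is also an assumption.

```python
FIELDS = ("x", "y", "w", "h", "blur", "expression", "illumination",
          "invalid", "occlusion", "pose")


def parse_wider_gt(lines):
    """Yield (image_path, faces) pairs, where each face is a dict keyed by FIELDS."""
    it = (line.strip() for line in lines)
    for path in it:
        if not path:
            continue  # tolerate blank lines, e.g. at the end of the file
        count = int(next(it))
        faces = []
        # Assumption: an image with zero faces is still followed by one all-zero row.
        for _ in range(max(count, 1)):
            values = [int(v) for v in next(it).split()]
            if count > 0:
                faces.append(dict(zip(FIELDS, values)))
        yield path, faces


# Usage (assumed file location):
# with open("wider_face_split/wider_face_train_bbx_gt.txt") as f:
#     annotations = dict(parse_wider_gt(f))
```

Keeping the per-face flags makes it straightforward to filter out boxes marked invalid or to evaluate separately on heavily occluded faces.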

6. Face Detection Dataset and Benchmark (FDDB) Dataset

The Face Detection Dataset and Benchmark (FDDB) is a facial recognition dataset designed to study the problem of unconstrained face detection. It was created by the Department of Computer Science at the University of Massachusetts Amherst and introduced in the paper “FDDB: A Benchmark for Face Detection in Unconstrained Settings.” The dataset contains 2,845 images from the Labelled Faces in the Wild dataset, with annotations for 5,171 faces. FDDB can be challenging to work with: it includes difficult pose angles, out-of-focus faces, both greyscale and color images, and low-resolution images, since image resolution varies greatly.

Download the dataset: FDDB
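FDDB annotates faces as ellipses rather than rectangles, so comparing it with box-based detectors usually involves a conversion. Assuming each annotation line holds the major-axis radius, minor-axis radius, angle (in radians), center x, and center y, followed by a trailing 1, the tightest axis-aligned box around an ellipse with semi-axes a and b rotated by θ has half-width √(a²cos²θ + b²sin²θ) and half-height √(a²sin²θ + b²cos²θ). The sketch below computes that box.

```python
import math


def ellipse_to_box(major_r, minor_r, angle, cx, cy):
    """Tightest axis-aligned (x, y, w, h) box enclosing a rotated ellipse."""
    c, s = math.cos(angle), math.sin(angle)
    half_w = math.hypot(major_r * c, minor_r * s)  # sqrt(a^2 cos^2 + b^2 sin^2)
    half_h = math.hypot(major_r * s, minor_r * c)  # sqrt(a^2 sin^2 + b^2 cos^2)
    return (cx - half_w, cy - half_h, 2 * half_w, 2 * half_h)


def parse_fddb_ellipse(line):
    """Parse an assumed 'major_r minor_r angle cx cy 1' line into an (x, y, w, h) box."""
    major_r, minor_r, angle, cx, cy = (float(v) for v in line.split()[:5])
    return ellipse_to_box(major_r, minor_r, angle, cx, cy)
```

Note that a box built this way is larger than the ellipse itself, so overlap scores computed against boxes will differ somewhat from FDDB's own ellipse-based matching.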

7. Google Facial Expression Comparison Dataset

The Google Facial Expression Comparison dataset was created and introduced by Raviteja Vemulapalli and Aseem Agarwala, research scientists at Google. It is a large-scale facial expression dataset made up of face image triplets, with human annotations specifying which two faces in each triplet show the most similar expression. It contains more than 156K face images in 500K triplets. The dataset, published in 2018, is intended for researchers interested in facial expression analysis, including emotion classification, expression synthesis, and other expression-based analysis. Unlike most existing facial expression datasets, which focus primarily on discrete emotion classification or action unit detection, it captures how similar two expressions are.

Download the dataset: Google Facial Expression Comparison
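The triplet annotations suggest a simple way to score an expression embedding: for each triplet, predict that the two faces whose embeddings are closest form the most similar pair, then count how often that prediction matches the human annotation. The sketch below assumes embeddings are plain tuples of floats produced by a model of your choice and that annotations are given as 0-based index pairs; both are illustrative assumptions, not the dataset's file format.

```python
import math

PAIRS = ((0, 1), (0, 2), (1, 2))


def closest_pair(e0, e1, e2):
    """Indices of the two embeddings (out of three) with the smallest Euclidean distance."""
    emb = (e0, e1, e2)
    return min(PAIRS, key=lambda p: math.dist(emb[p[0]], emb[p[1]]))


def triplet_accuracy(triplets, annotated_pairs):
    """Fraction of triplets where the embedding's closest pair matches the annotated pair."""
    correct = sum(
        closest_pair(*embs) == tuple(sorted(pair))
        for embs, pair in zip(triplets, annotated_pairs)
    )
    return correct / len(triplets)
```

Because the metric only compares relative distances within each triplet, it rewards embeddings that order expressions sensibly without requiring discrete emotion labels.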


8. Face Images with Marked Landmark Points Dataset

The Face Images with Marked Landmark Points dataset is a facial recognition dataset containing 7,049 images, each with up to 15 marked key points. It is a basic face dataset that can serve as a building block in many face recognition projects, such as tracking faces in images and videos, detecting dysmorphic facial signs for medical diagnosis, and biometrics. The dataset was published on Kaggle in 2018, and its data was provided by Dr. Yoshua Bengio of the University of Montreal.

Download the dataset: Face Images with Marked Landmark Points
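Since each image has up to 15 key points, many images are missing some coordinates, and a training loss should ignore those entries rather than treat them as zeros. The sketch below assumes targets are flat lists of 30 values (x and y for each of the 15 key points) with missing entries stored as NaN; that encoding is an assumption about how you load the data.

```python
import math


def masked_keypoint_mse(pred, target):
    """Mean squared error over only the coordinates that are present (not NaN) in `target`."""
    errors = [(p - t) ** 2 for p, t in zip(pred, target) if not math.isnan(t)]
    return sum(errors) / len(errors) if errors else 0.0


def fully_annotated(rows):
    """Keep only rows in which every key-point coordinate is present."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]
```

Masking lets you train on every image, while `fully_annotated` is handy when you want a smaller, complete subset for evaluation.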

9. Labelled Faces in the Wild (LFW) Dataset

The Labelled Faces in the Wild (LFW) dataset is one of the most popular facial recognition datasets. It is a public benchmark for face verification, also called pair matching, containing 13,233 web images of 5,749 people, 1,680 of whom appear in two or more images. LFW was published in 2007 by the University of Massachusetts Amherst and was designed to study the problem of unconstrained face recognition. It supports supervised learning under both image-restricted and unrestricted training protocols. Results on LFW have been promising: at the time of writing, 123 models had been evaluated on the dataset, and their results are publicly available on the LFW website.

Download the dataset: LFW
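Face verification on LFW is scored on labelled image pairs. The sketch below parses pair lines in the layout of LFW's pairs files as we understand it (`name idx1 idx2` for the same person, `name1 idx1 name2 idx2` for different people, with images stored as Name/Name_0001.jpg; treat both as assumptions to verify) and picks the similarity threshold that maximizes accuracy. In a cross-validated evaluation, that threshold would be chosen on the training folds and applied to the held-out fold; the similarity scores come from whatever model you are testing.

```python
from __future__ import annotations


def _image_path(name: str, index: int) -> str:
    # Assumed layout: <Name>/<Name>_<4-digit index>.jpg
    return f"{name}/{name}_{index:04d}.jpg"


def parse_pair(line: str) -> tuple[str, str, bool]:
    """Return (path_a, path_b, is_same_person) for one pair line."""
    parts = line.split()
    if len(parts) == 3:  # same person: name idx1 idx2
        name, i, j = parts
        return _image_path(name, int(i)), _image_path(name, int(j)), True
    if len(parts) == 4:  # different people: name1 idx1 name2 idx2
        name_a, i, name_b, j = parts
        return _image_path(name_a, int(i)), _image_path(name_b, int(j)), False
    raise ValueError(f"unexpected pair line: {line!r}")  # e.g. a header line


def best_threshold(scores: list[float], labels: list[bool]) -> tuple[float, float]:
    """Return (threshold, accuracy) that maximizes accuracy of predicting 'same' when score >= threshold."""
    best_t, best_acc = scores[0], -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Trying only the observed scores as candidate thresholds is enough, because accuracy can change only when the threshold crosses one of them.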

10. YouTube Faces with Facial Key Points Dataset

YouTube Faces with Facial Key Points is a processed version of the YouTube Faces Dataset, a collection of short videos of celebrities downloaded from YouTube. It comprises around 1,293 videos with up to 240 frames each, for a total of 155,560 single image frames. It was created and uploaded to Kaggle by Dr. Guillermo, was inspired by the Face Images with Marked Landmark Points dataset, and is intended for facial recognition across videos. The dataset can be used to test transfer learning between face datasets, as well as for other projects such as animal face detection.

Download the dataset: YouTube Faces with facial key points
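Because the 155,560 frames come from only about 1,293 videos, frames from the same clip are nearly identical, so splitting at the frame level would leak near-duplicates between the training and test sets. A common remedy, sketched below, is to split by video; `video_of`, which maps a frame ID to its video, is a hypothetical helper you would write for your copy of the data. Passing a function that returns the celebrity's name instead gives an identity-disjoint split.

```python
import random


def split_by_video(frame_ids, video_of, test_fraction=0.2, seed=0):
    """Split frames into (train, test) so that all frames of a video land on the same side."""
    videos = sorted({video_of(f) for f in frame_ids})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(videos)
    n_test = max(1, round(len(videos) * test_fraction))
    test_videos = set(videos[:n_test])
    train = [f for f in frame_ids if video_of(f) not in test_videos]
    test = [f for f in frame_ids if video_of(f) in test_videos]
    return train, test
```

The fraction applies to videos, not frames, so the share of test frames can differ from `test_fraction` when videos have different lengths.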

11. CelebFaces Attributes (CelebA) Dataset

The CelebFaces Attributes (CelebA) dataset is a large-scale face attributes dataset with more than 200K celebrity images, each annotated with 40 attributes. The images are highly diverse: the dataset covers 10,177 identities and provides five landmark locations per image along with rich annotations. CelebA can be used in many facial recognition projects, including face classification, face detection, face editing and synthesis, and face localization. It was released in 2015 by the Multimedia Lab at the Chinese University of Hong Kong for non-commercial research purposes. CelebA was used in the paper “Deep Learning Face Attributes in the Wild,” which provides more insight into the dataset.

Download the dataset: CelebA
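CelebA's 40 binary attributes ship as a text table. Assuming the layout of its attribute list file as we understand it (a line with the image count, a line of attribute names, then one row per image with values 1 or -1), the sketch below loads the table into a dictionary and computes how often a given attribute is present.

```python
def parse_celeba_attributes(lines):
    """Return (attribute_names, {filename: {attribute: bool}}) from the attribute table."""
    it = iter(lines)
    count = int(next(it))
    names = next(it).split()
    table = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        filename, values = parts[0], parts[1:]
        # Assumed encoding: "1" means present, "-1" means absent.
        table[filename] = {n: v == "1" for n, v in zip(names, values)}
    if len(table) != count:
        raise ValueError(f"expected {count} rows, found {len(table)}")
    return names, table


def attribute_frequency(table, name):
    """Fraction of images for which attribute `name` is marked present."""
    return sum(row[name] for row in table.values()) / len(table)
```

Checking attribute frequencies this way is a quick first look at class imbalance, since some attributes are present in only a small fraction of images.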

Bonus: iQIYI-VID Dataset

The iQIYI-VID dataset is the largest dataset on this list because it consists of face videos, which also make it more distinctive and more challenging to handle than the other facial recognition datasets. It is a large-scale dataset for multi-modal person identification, comprising 600K videos of 5,000 celebrities collected from iQIYI, a Chinese online video platform. All clips went through a careful human annotation process that keeps the label error rate below 0.2%. The dataset was introduced in the paper “iQIYI-VID: A Large Dataset for Multi-modal Person Identification” by researchers at iQIYI. At the time of writing, the dataset was not yet available but was expected to be made public soon.


Gareema Lakra
I'm an aspiring writer in the field of data science with enthusiasm for tech and content development. I have a master's degree in computer science. Besides work, I'm a bibliophile and dive into the fantasy genre whenever I can.

