The value of machine learning algorithms for computation and forecasting is now widely recognized, which has fueled a boom in machine learning research. According to the data mining blog, roughly 100 machine learning papers are published each day on arXiv, a well-known public repository of research papers. Other open repositories include Papers with Code, Crossminds, and Connected Papers.
Papers with Code is a website that provides free access to published technical papers along with the software used in them. Its objective is to create a free and open resource for machine learning and computer vision researchers, covering machine learning papers, code, datasets, methods, and evaluation tables. These resources are provided with the support of the natural language processing and machine learning community. Papers with Code lists 79,817 papers, 9,327 benchmarks, and 3,681 tasks to date, and we expect to see more state-of-the-art papers in the future. Several significant methods and modules from these papers are popular for research and study. Here is the list of the popular machine learning papers on Papers with Code.
List of top machine learning papers on Papers with Code
This list contains the top 10 machine learning papers available on Papers with Code.
1. TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning
The “TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning” paper introduces TensorFlow Eager, a multi-stage, Python-embedded domain-specific language for hardware-accelerated machine learning, suitable for both interactive research and production. TensorFlow has shown remarkable performance but requires users to represent computations as dataflow graphs, which hinders rapid prototyping and run-time dynamism. TensorFlow Eager eliminates these usability costs without sacrificing the benefits of graphs. It provides an imperative front-end to TensorFlow that executes operations immediately, and a JIT tracer that translates Python functions composed of TensorFlow operations into executable dataflow graphs. The paper concludes that TensorFlow Eager makes it easy to interpolate between imperative and staged execution in a single package.
Link to the paper: TensorFlow Eager: A Multi-Stage, Python-Embedded DSL for Machine Learning
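The difference between imperative and staged execution can be illustrated with a toy tracer in plain Python. This is only a sketch of the general idea and not TensorFlow code: the names Node, lift, evaluate, and staged are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One vertex of a toy dataflow graph (hypothetical, not a TensorFlow type)."""
    op: str                 # "input", "const", "add", or "mul"
    args: tuple = ()        # child nodes for "add" and "mul"
    payload: object = None  # input position or constant value

    def __add__(self, other):
        return Node("add", (self, lift(other)))

    def __mul__(self, other):
        return Node("mul", (self, lift(other)))

    # Addition and multiplication commute, so the reflected forms can reuse them.
    __radd__ = __add__
    __rmul__ = __mul__


def lift(value):
    """Wrap plain Python numbers as constant nodes."""
    return value if isinstance(value, Node) else Node("const", payload=float(value))


def evaluate(node, inputs):
    """Run a recorded graph on concrete input values."""
    if node.op == "input":
        return inputs[node.payload]
    if node.op == "const":
        return node.payload
    left, right = (evaluate(arg, inputs) for arg in node.args)
    return left + right if node.op == "add" else left * right


def staged(fn):
    """Trace fn once per arity, then reuse the recorded graph on later calls."""
    cache = {}

    def wrapper(*xs):
        arity = len(xs)
        if arity not in cache:
            # Calling fn on symbolic inputs records a graph instead of computing.
            cache[arity] = fn(*(Node("input", payload=i) for i in range(arity)))
        return evaluate(cache[arity], xs)

    wrapper.cache = cache
    return wrapper


def f(x, y):
    return x * y + 2 * x


eager_result = f(3.0, 4.0)          # imperative: runs immediately on floats
staged_f = staged(f)
staged_result = staged_f(3.0, 4.0)  # staged: traced to a graph, then executed
print(eager_result, staged_result)
```

The same Python function f serves both modes: called directly, it computes on ordinary values, and wrapped in staged, it is traced once and the recorded graph is reused for later calls.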
2. A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
The “A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation” paper puts forth a simple vision transformer design for object localization and instance segmentation tasks. When adapting the vision transformer (ViT) to object detection and dense prediction tasks, many models inherited a multistage design. The multistage design aims for a better trade-off between computational cost and effective aggregation of multiscale global contexts. This paper studies three architectural choices in ViT (spatial reduction, doubled channels, and multiscale features) and demonstrates that a vanilla ViT architecture can achieve this trade-off without multiscale features. Additionally, the paper proposes a simple and compact ViT architecture called Universal Vision Transformer (UViT), which achieves high performance on Common Objects in Context (COCO) object detection and instance segmentation tasks.
3. Visual Attention Network
The “Visual Attention Network” paper proposes a novel linear attention named large kernel attention (LKA). LKA enables the self-adaptive and long-range correlations of self-attention while avoiding three of its shortcomings: treating images as 1D sequences, which neglects their 2D structure; quadratic complexity, which is too expensive for high-resolution images; and capturing only spatial adaptability while ignoring channel adaptability. This research paper is authored by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, and Shi-Min Hu. Based on LKA, the authors introduce a neural network called the Visual Attention Network (VAN), a vision backbone in the same vein as ViTs and convolutional neural networks (CNNs). The paper shows that VANs outperform ViTs and CNNs in extensive experiments on tasks like image classification, semantic segmentation, pose estimation, and more.
Link to the paper: Visual Attention Network
4. A ConvNet for the 2020s
The “A ConvNet for the 2020s” paper revolves around visual recognition tasks and starts from the rise of vision transformers (ViTs). At the start of the 2020s, ViTs quickly superseded ConvNets (CNNs) as the state-of-the-art image classification models. Although ViTs showed great performance and gained popularity, a vanilla ViT runs into difficulties when applied to general computer vision tasks like object detection and semantic segmentation. In this paper, Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie reexamine the design space and test the limits of a pure ConvNet. The paper gradually modernizes a standard residual network (ResNet), moving it toward the design of a vision transformer. The outcome is a family of pure ConvNet models called ConvNeXt. ConvNeXt models are constructed entirely from ConvNet modules and use depthwise convolution, and they achieve accuracy and scalability similar to, or even better than, vision transformers.
Link to the paper: A ConvNet for the 2020s
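Depthwise convolution, mentioned above, applies one kernel to each channel independently instead of mixing channels. The following is a minimal pure-Python sketch of that operation, assuming stride 1 and no padding. The function name and the tiny inputs are made up for illustration and are not taken from the ConvNeXt code.

```python
def depthwise_conv2d(channels, kernels):
    """Convolve each channel with its own k x k kernel (stride 1, no padding).

    channels: list of C feature maps, each an H x W list of lists.
    kernels:  list of C kernels, each a k x k list of lists.
    Like most deep learning libraries, this computes cross-correlation.
    """
    outputs = []
    for channel, kernel in zip(channels, kernels):
        k = len(kernel)
        height, width = len(channel), len(channel[0])
        out = [
            [
                sum(
                    channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k)
                    for dj in range(k)
                )
                for j in range(width - k + 1)
            ]
            for i in range(height - k + 1)
        ]
        outputs.append(out)  # channel c only ever sees kernel c
    return outputs


# Two 3 x 3 channels, each with its own 2 x 2 kernel (illustrative values).
x = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
]
w = [
    [[1, 0], [0, 1]],
    [[1, 1], [1, 1]],
]
y = depthwise_conv2d(x, w)
print(y)

# Weight counts for C channels and k x k kernels:
C, k = 96, 7
depthwise_weights = C * k * k     # one kernel per channel
standard_weights = C * C * k * k  # every output channel mixes all inputs
print(depthwise_weights, standard_weights)
```

Because each channel has its own single kernel, the weight count grows linearly with the number of channels rather than quadratically, which is what makes large kernels affordable.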
5. Scikit-learn: Machine Learning in Python
“Scikit-learn: Machine Learning in Python” is an older research paper, released in 2012, that presents the famous Python module Scikit-learn, or Sklearn. The module integrates a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised learning tasks. Sklearn was created by David Cournapeau, who is a co-author of this paper, together with a group of researchers known as the Scikit-learn community. The Scikit-learn website provides source code, binaries, and documentation. The package aims to help non-specialists implement machine learning using a high-level language. The paper emphasizes ease of use, performance, documentation, and API consistency. The outcome is the popular Scikit-learn module, which has become an essential Python library for building machine learning models today.
Link to the paper: Scikit-learn: Machine Learning in Python
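API consistency is easiest to see in code. The sketch below does not use Scikit-learn itself: it defines two toy classifiers (hypothetical classes written only for this example) that follow the same fit and predict convention, so that calling code can swap one model for another without changes.

```python
from collections import Counter


class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self  # returning self allows fit(...).predict(...) chaining

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Toy model: predicts the label whose class mean is closest."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for d, value in enumerate(row):
                acc[d] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(row, self.centroids_[label]))
            for row in X
        ]


# Made-up training data: two points labelled "a", three labelled "b".
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y_train = ["a", "a", "b", "b", "b"]
X_new = [[0.2, 0.1], [5.5, 5.5]]

# The same two calls work for every model because the interface is shared.
predictions = {
    type(model).__name__: model.fit(X_train, y_train).predict(X_new)
    for model in (MajorityClassifier(), NearestCentroidClassifier())
}
print(predictions)
```

Scikit-learn estimators follow this same pattern of fit returning the estimator and learned attributes ending in an underscore, which is why models can be exchanged with minimal code changes.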
6. Adapting the Tesseract Open Source OCR Engine for Multilingual OCR
“Adapting the Tesseract Open Source OCR Engine for Multilingual OCR” is the machine learning paper in which the authors Ray Smith, Daria Antonova, and Dar-Shyang Lee describe efforts to adapt the Tesseract open source optical character recognition (OCR) engine for multiple scripts and languages. The focus was on enabling generic multilingual operation, so that a new language requires negligible customization beyond providing a corpus of text. The paper concludes that the Tesseract classifier adapts easily to Simplified Chinese, and tests on English, European languages, and Russian showed a consistent word error rate between 3.72% and 5.78%.
Link to the paper: Adapting the Tesseract Open Source OCR Engine for Multilingual OCR
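For readers unfamiliar with the metric, word error rate is commonly computed as the word-level edit distance between the OCR output and the reference text, divided by the number of reference words. The sketch below follows that common definition. The paper's exact evaluation procedure may differ, and the sample sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of ref
    # and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer_substitution = word_error_rate(
    "the quick brown fox jumps", "the quick brown box jumps"
)
# One dropped word out of four reference words.
wer_deletion = word_error_rate("a b c d", "a c d")
print(wer_substitution, wer_deletion)
```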
7. COLD: A Benchmark for Chinese Offensive Language Detection
To maintain a healthy and safe social platform, offensive language detection and prevention models are important. Although much research has been conducted on offensive language detection and prevention, most studies focus only on English. In the paper “COLD: A Benchmark for Chinese Offensive Language Detection,” Jiawen Deng, Jingyan Zhou, Hao Sun, Fei Mi, and Minlie Huang present a benchmark that facilitates creating and evaluating Chinese offensive language detection models. The benchmark includes COLDataset, a Chinese offensive language dataset consisting of 37k annotated sentences, and COLDetector, a baseline classifier that detects offensive language with 81% accuracy. The paper concludes with two main findings about the popular Chinese language models CPM and CDialGPT: CPM tends to produce more offensive output than CDialGPT, and certain prompts, such as anti-bias sentences, trigger offensive outputs.
Link to the paper: COLD: A Benchmark for Chinese Offensive Language Detection
8. Gender Classification and Bias Mitigation in Facial Images
Gender classification algorithms are useful in domains like demographic research, law enforcement, and human-computer interaction. Recent research has revealed two issues in gender classification: biased benchmark datasets result in algorithmic bias, and gender minorities, such as LGBTQ and non-binary people, have been left out. The paper “Gender Classification and Bias Mitigation in Facial Images” sheds light on these two issues. Surveys conducted for this paper found that current benchmark datasets lack representation of gender minority subgroups, which leads to bias. To address this, two new facial image databases were created: a racially balanced inclusive database with an added LGBTQ subset, and an inclusive-gender database that includes non-binary people. The paper also builds and evaluates an ensemble model, which achieves an accuracy of 90.39%.
Link to the paper: Gender Classification and Bias Mitigation in Facial Images
9. DeepFaceLab: Integrated, flexible and extensible face-swapping framework
Deepfake defense requires research not only on detection but also on generation methods, yet current deepfake methods suffer from obscure workflows and poor performance. The “DeepFaceLab: Integrated, flexible and extensible face-swapping framework” paper introduces DeepFaceLab as a solution to these issues. DeepFaceLab is the current dominant deepfake framework and provides the necessary tools to conduct high-quality face-swapping. The paper details the principles behind the implementation of DeepFaceLab and introduces its pipeline, every part of which users can modify to build customized pipelines. According to the paper, DeepFaceLab can achieve cinema-quality results with high fidelity.
Link to the paper: DeepFaceLab: Integrated, flexible and extensible face-swapping framework
10. Generalized End-to-End Loss for Speaker Verification
The “Generalized End-to-End Loss for Speaker Verification” paper proposes a new loss function called the generalized end-to-end (GE2E) loss. The paper finds that the GE2E loss function trains speaker verification models more efficiently than the previous tuple-based end-to-end (TE2E) loss function. The GE2E loss updates the network in a way that emphasizes examples that are difficult to verify at each step of training, and so it does not require an initial stage of example selection. With the GE2E loss function, the model decreases the speaker verification equal error rate (EER) by more than 10% and simultaneously reduces the training time by 60%. The paper also introduces the MultiReader technique, which enables domain adaptation so that a more accurate model can be trained to support multiple keywords and multiple dialects.
Link to the paper: Generalized End-to-End Loss for Speaker Verification
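To give a feel for how such a loss works, here is a minimal pure-Python sketch of the softmax form of the GE2E loss as we understand it from the paper. Each utterance embedding is compared with every speaker centroid through a scaled cosine similarity, and the loss pulls it toward its own speaker's centroid and away from the others. The default values of w and b and the tiny 2D embeddings are illustrative assumptions. A real model would learn w and b and use a neural network to produce the embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Sum of softmax GE2E losses over all utterances.

    embeddings[j][i] is the embedding of utterance i from speaker j.
    Every speaker needs at least two utterances. w and b are the scale
    and offset of the similarity (illustrative defaults, assumed here).
    """
    centroids = [mean(utterances) for utterances in embeddings]
    total = 0.0
    for j, utterances in enumerate(embeddings):
        for i, embedding in enumerate(utterances):
            similarities = []
            for k, centroid in enumerate(centroids):
                if k == j:
                    # Leave the utterance itself out of its own centroid.
                    centroid = mean(utterances[:i] + utterances[i + 1:])
                similarities.append(w * cosine(embedding, centroid) + b)
            # Numerically stable log-sum-exp over all speakers.
            top = max(similarities)
            log_sum_exp = top + math.log(
                sum(math.exp(s - top) for s in similarities)
            )
            total += log_sum_exp - similarities[j]
    return total


# Made-up embeddings: two speakers, two utterances each.
well_separated = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
mixed_up = [[[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]]

loss_separated = ge2e_softmax_loss(well_separated)
loss_mixed = ge2e_softmax_loss(mixed_up)
print(loss_separated, loss_mixed)
```

Embeddings that cluster by speaker give a loss close to zero, while embeddings that are mixed across speakers give a much larger loss, which is the signal the network is trained on.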