Machine learning is a booming industry: Fortune Business Insights reported that the global machine learning market grew 36.1% in 2020 over 2019. This growth has driven high demand for automated machine learning (AutoML) tools, which make ML algorithms and models easier to use. The goal of AutoML tools is to automate the routine, manual, and tedious parts of ML implementation: they provide techniques for automatically finding the best-performing ML model for a given dataset. Today, several AutoML software packages and platforms are available online for aspiring AI and ML developers and professionals.
AutoML tools enable users to train and test their models with minimal domain knowledge of either machine learning or their data.
This article lists the best AutoML tools; note that the tools are not ranked.
- Auto-Sklearn
Auto-Sklearn is an open-source automated machine learning tool built around Scikit-learn (Sklearn). Scikit-learn is an open-source machine learning library built upon Scientific Python (SciPy) that provides a range of supervised and unsupervised learning algorithms. Conventionally, extensions or modules of SciPy are named SciKits, hence the name Scikit-learn. Auto-sklearn was first introduced in the 2015 paper ‘Efficient and Robust Automated Machine Learning’. Later, in 2020, the second version was released on GitHub and presented in the paper ‘Auto-Sklearn 2.0: The Next Generation’.
Auto-sklearn handles tasks such as feature selection, data preprocessing, hyperparameter optimization, model selection, and evaluation. The toolkit is a drop-in replacement for Scikit-learn estimators and classifiers. It frames ML problems as CASH (Combined Algorithm Selection and Hyperparameter optimization) problems: it automatically searches for the right algorithm for a given dataset and applies optimization techniques to that algorithm's hyperparameters. The automated process is based on Bayesian optimization with meta-learning, and the goal is to build the optimal ensemble from the individual model pipelines. Auto-sklearn includes 15 classification algorithms and 14 feature preprocessors, and performs data scaling, categorical encoding, and missing-value handling.
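Conceptually, CASH treats the choice of algorithm as just another hyperparameter in one joint search space. The following is a minimal pure-Python sketch of that idea using a toy grid search over two made-up classifiers (a threshold stump and a nearest-neighbour voter); it is not auto-sklearn's actual API or its Bayesian optimizer, only an illustration of the combined search:

```python
# Toy stand-ins for two learning algorithms on 1-D data: each takes a
# hyperparameter and returns an accuracy score on the given data.
def fit_stump(X, y, threshold):
    preds = [1 if x >= threshold else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

def fit_knn(X, y, k):
    correct = 0
    for i, x in enumerate(X):
        # Vote among the k nearest neighbours (excluding the point itself).
        neighbours = sorted(range(len(X)), key=lambda j: abs(X[j] - x))
        votes = [y[j] for j in neighbours if j != i][:k]
        pred = 1 if 2 * sum(votes) >= len(votes) else 0
        correct += pred == y[i]
    return correct / len(y)

# CASH: a single search space over (algorithm, hyperparameter) pairs.
space = [("stump", fit_stump, t) for t in (0.25, 0.5, 0.75)] + \
        [("knn", fit_knn, k) for k in (1, 3)]

X = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]

# Pick the best-scoring (algorithm, hyperparameter) combination.
best = max(space, key=lambda cand: cand[1](X, y, cand[2]))
print("selected:", best[0], "hyperparameter:", best[2])
```

Auto-sklearn replaces this exhaustive toy loop with Bayesian optimization warm-started by meta-learning, and ensembles the evaluated pipelines instead of keeping only the single winner.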
- Auto-PyTorch
Auto-PyTorch is one of the best AutoML tools: a machine learning automation toolkit built on top of PyTorch. PyTorch is an open-source ML framework based on the Torch library, with applications in computer vision and natural language processing (NLP). Auto-PyTorch extends the usual AutoML focus by jointly optimizing the traditional ML pipeline and the neural architecture. It has an API similar to Auto-sklearn and thus requires only a few inputs to fit a DL pipeline; it was designed to support tabular data and time series data. The Auto-PyTorch workflow combines multi-fidelity optimization with portfolio construction for meta-learning and ensembling of deep neural networks. Auto-PyTorch's features are explained in the papers ‘Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL’ and ‘Efficient Automated Deep Learning for Time Series Forecasting’.
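A tabular classification run with Auto-PyTorch might look like the sketch below, based on its documented `TabularClassificationTask` API. The code is wrapped in a function (with the import inside) so nothing executes without the `autoPyTorch` package installed; the data arguments are assumed to be NumPy arrays supplied by the caller:

```python
def autopytorch_sketch(X_train, y_train, X_test, y_test):
    # Assumes the autoPyTorch package is installed; API per its docs.
    from autoPyTorch.api.tabular_classification import TabularClassificationTask

    task = TabularClassificationTask()
    # Jointly searches preprocessing choices, network architecture, and
    # training hyperparameters under a wall-clock budget (multi-fidelity).
    task.search(
        X_train=X_train, y_train=y_train,
        X_test=X_test, y_test=y_test,
        optimize_metric="accuracy",
        total_walltime_limit=300,
    )
    return task.predict(X_test)
```

As with Auto-sklearn, the only decisions left to the user are the data, the metric, and the time budget.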
- Auto-Keras
Auto-Keras is an open-source software library that implements AutoML for deep learning models using the Keras API. Keras is a high-level API, written in Python, that runs on top of TensorFlow. Auto-Keras automatically searches for architectures and hyperparameters of DL models by building Keras models via TensorFlow's tf.keras API. This search process is known as neural architecture search (NAS). DATA Lab developed Auto-Keras with the goal of making machine learning accessible to everyone.
Auto-Keras offers a simple and effective way of finding top-performing models across various predictive modeling tasks, making it one of the best AutoML tools. It supports several tasks, including classification and regression of images, text, and structured data. The current release of Auto-Keras is a pre-release version, as the package is still evolving; tasks like time series forecasting, object detection, and image segmentation are under development and will be available in a future version. It provides an easy-to-use interface where the user only specifies the location of the data and the number of models to try, and is returned the model that achieves the best results. Note that Auto-Keras is provided ‘as is’ and ‘as available’, without warranties, so any loss of data is at the user's own risk.
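The "specify the data and the number of models to try" interface can be sketched with Auto-Keras's documented `StructuredDataClassifier`. The sketch is wrapped in a function (import inside) so it only runs if `autokeras` is installed; the training arrays are assumed inputs from the caller:

```python
def autokeras_sketch(x_train, y_train, x_test):
    # Assumes the autokeras package is installed; API per its docs.
    import autokeras as ak

    # max_trials is the number of candidate models NAS will evaluate --
    # essentially the only knob the user must set besides the data.
    clf = ak.StructuredDataClassifier(max_trials=3, overwrite=True)
    clf.fit(x_train, y_train, epochs=10)  # searches architectures + weights
    # export_model() returns the best model found as a plain Keras model.
    return clf.predict(x_test), clf.export_model()
```

Equivalent classes exist for the other supported tasks, e.g. `ak.ImageClassifier` and `ak.TextRegressor`.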
- Google Cloud AutoML
Google Cloud AutoML is automated machine learning software that lets you train custom ML models without coding. Google announced Cloud AutoML as a suite of machine learning products in 2018. It provides simple, secure, and flexible products with an easy graphical user interface (GUI). One of the latest products in this line is Vertex AI, which helps you build, deploy, and scale ML models faster, with pretrained and custom tooling in a unified AI platform. The advantages of Google Cloud AutoML include building with groundbreaking ML tools powered by Google, deploying models faster with roughly 80% fewer lines of code, and managing data and models easily with MLOps tools.
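Beyond the GUI, the same workflow is scriptable through the Vertex AI Python SDK (`google-cloud-aiplatform`). The sketch below follows the SDK's documented dataset-train-deploy flow; it is wrapped in a function so nothing runs without the SDK, a Google Cloud project, and credentials, and the project ID, GCS path, and display names are placeholder assumptions:

```python
def vertex_automl_sketch(project_id, gcs_csv_uri, target_column):
    # Assumes google-cloud-aiplatform is installed and credentials are set.
    from google.cloud import aiplatform

    aiplatform.init(project=project_id, location="us-central1")

    # Register the training data as a managed tabular dataset.
    dataset = aiplatform.TabularDataset.create(
        display_name="automl-demo", gcs_source=gcs_csv_uri
    )
    # Let AutoML search models for the given prediction type.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="automl-demo-train",
        optimization_prediction_type="classification",
    )
    model = job.run(dataset=dataset, target_column=target_column)
    # Deploy the trained model to an endpoint for online predictions.
    return model.deploy(machine_type="n1-standard-4")
```

Training budget, split columns, and model monitoring are further options on `job.run` and the deployed endpoint.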
- DataRobot
DataRobot stands out among the best AutoML tools as an AutoML platform that manages and simplifies complex enterprise workflows. The platform can be used for any function to make predictions, perform what-if analysis, and automate and optimize model creation. It also helps executives track value by giving real-time insight into how many models are running in production. DataRobot was founded by Jeremy Achin and Tom de Godoy in 2012 to automate the tasks needed to develop artificial intelligence and machine learning applications.
DataRobot's features include data formatting, feature engineering, model selection, hyperparameter tuning, and monitoring. It also helps users understand which variables matter for prediction and evaluate the significance of different variables in determining the target. Deploying models in DataRobot is simple: once instructed, it exposes the model through REST APIs, and its ML operations tools then let you check the model's accuracy and behavior against production data. DataRobot also offers pretrained models, a data catalog, and a user-friendly GUI to visualize the entire training and deployment process. Hence, DataRobot provides complete transparency into the workflow and monitors it from a single place.
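The platform is also driven programmatically via the `datarobot` Python client. The sketch below follows the client's project/Autopilot pattern, wrapped in a function so it does not execute without the package and an account; the endpoint, token, and project name are placeholder assumptions:

```python
def datarobot_sketch(df, target):
    # Assumes the datarobot client is installed and an API token is available.
    import datarobot as dr

    dr.Client(endpoint="https://app.datarobot.com/api/v2", token="YOUR_TOKEN")
    # Project.start uploads the data, sets the target, and kicks off Autopilot
    # (DataRobot's automated model search).
    project = dr.Project.start(df, project_name="automl-demo", target=target)
    project.wait_for_autopilot()
    # Models come back ranked by the project's validation metric.
    return project.get_models()[0]
```

The returned model object can then be deployed, after which predictions are served over the REST API mentioned above.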
- BigML AutoML
BigML is an automated machine learning platform for building and sharing datasets and models. It can be considered software as a service (SaaS) that affordably turns predictive patterns in data into usable, real-life intelligent applications. BigML's offerings include private deployments and a rich toolkit that helps customers create, experiment with, automate, and manage machine learning workflows. The company, headquartered in Corvallis, Oregon, and in Valencia, Spain, was founded in 2011 to make machine learning easy and approachable for everyone. It serves around 169,000 users across the world and promotes machine learning in academia through an education program in over 700 universities.
BigML is one of the best AutoML tools: a consumable, programmable, and scalable ML platform that automates classification, regression, time series forecasting, cluster analysis, anomaly detection, association discovery, and topic modeling tasks. BigML offers many ways to load raw data, including cloud storage systems, along with clustering algorithms, visualization, and flexible pricing. Its main modes of service are a web interface, a command line interface, and an API.
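Using the API mode, BigML's Python bindings chain resources as source → dataset → model → prediction. The sketch below follows that documented workflow, wrapped in a function so it only runs if the `bigml` package and account credentials are present; the CSV path and input dictionary are assumptions supplied by the caller:

```python
def bigml_sketch(csv_path, new_input):
    # Assumes the bigml package is installed; credentials are read from the
    # BIGML_USERNAME and BIGML_API_KEY environment variables.
    from bigml.api import BigML

    api = BigML()
    # Each call creates a remote resource built from the previous one.
    source = api.create_source(csv_path)       # raw data upload
    dataset = api.create_dataset(source)       # structured, typed dataset
    model = api.create_model(dataset)          # trained model
    # Score a new example against the trained model.
    return api.create_prediction(model, new_input)
```

The same chain can be scripted from the command line interface or automated with BigML's WhizzML workflows.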
- H2O AutoML
H2O AutoML, also known simply as H2O, is an open-source, distributed, in-memory machine learning platform with linear scalability that supports statistical and machine learning algorithms. It is designed around minimal parameters so that automated training and tuning of models is more accessible: all you need to do in H2O is point to the dataset, identify the response column, and specify a time constraint. H2O.ai developed the H2O platform with cutting-edge, distributed implementations of many ML algorithms. H2O supports these algorithms from Java, Python, Spark, Scala, and R; it also has a web GUI that uses JSON, and trained models can be deployed on a Spark server, AWS, etc.
Among the best AutoML tools, H2O's advantage is that it automates basic data preprocessing, model training and tuning, and ensembling and stacking of models to deliver the best-performing model, freeing users to focus on other tasks like data collection, feature engineering, and deployment. H2O has several impressive capabilities: providing the necessary data processing, training a random grid of algorithms, tuning individual models with cross-validation, training two stacked ensembles, returning a sorted leaderboard of all models applied, and easily exporting any model to production.
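The "point to the dataset, identify the response column, specify a time constraint" workflow maps directly onto H2O's documented Python API. The sketch is wrapped in a function (imports inside) so it does not start a cluster unless the `h2o` package and a Java runtime are available; the CSV path and target name are assumed inputs:

```python
def h2o_automl_sketch(train_csv, target):
    # Assumes the h2o package is installed and Java is available.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()                        # start or connect to a local cluster
    train = h2o.import_file(train_csv)
    features = [c for c in train.columns if c != target]

    # The time budget is the main knob; H2O handles preprocessing, tuning,
    # and the two stacked ensembles, then ranks everything on a leaderboard.
    aml = H2OAutoML(max_runtime_secs=300)
    aml.train(x=features, y=target, training_frame=train)
    return aml.leaderboard, aml.leader    # sorted models + best model
```

`aml.leader` is the best-performing model and can be exported (e.g. as a MOJO artifact) for production use.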
- MLBox
MLBox is a dynamic, automated machine learning Python library focusing on drift identification, entity embedding, and hyperparameter optimization. It is developed and used by an active community and is compatible with Linux, macOS, and Windows, on 64-bit Python versions 3.5 to 3.7 only. MLBox's main features are fast reading and distributed data processing, cleaning, and formatting; robust feature selection and leak detection; accurate hyperparameter optimization; and state-of-the-art predictive models.
A fully automated MLBox pipeline has three components: initialization, validation, and application. In initialization, the raw data goes through preprocessing, cleaning, and encoding. In validation, MLBox performs feature engineering and selection as well as model tuning. Lastly, in application, the whole pipeline is fitted to predict the target and perform model interpretation.
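The three stages correspond to MLBox's `preprocessing`, `optimisation`, and `prediction` subpackages. The sketch below follows that documented pipeline, wrapped in a function so it only runs where `mlbox` is installed; the file paths, target name, and the small search space are illustrative assumptions:

```python
def mlbox_sketch(train_csv, test_csv, target):
    # Assumes mlbox is installed (64-bit Python 3.5-3.7).
    from mlbox.preprocessing import Reader, Drift_thresholder
    from mlbox.optimisation import Optimiser
    from mlbox.prediction import Predictor

    # Initialization: read, clean, and encode the raw CSV files.
    data = Reader(sep=",").train_test_split([train_csv, test_csv], target)
    data = Drift_thresholder().fit_transform(data)  # drop drifting features

    # Validation: tune the pipeline over a small hypothetical search space
    # (here, only the choice of final estimator).
    space = {"est__strategy": {"search": "choice",
                               "space": ["LightGBM", "RandomForest"]}}
    best = Optimiser().optimise(space, data, max_evals=5)

    # Application: fit the whole pipeline and predict the target.
    Predictor().fit_predict(best, data)
```

Each stage consumes the dictionary produced by the previous one, which is what makes the pipeline "fully automated" end to end.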