Think for a minute about the most beautiful moments of your life. How do you visualize those memories? Now look at your surroundings. Do you see the constant changes taking place over time? The things you see, the sounds you hear, the activities you do, the decisions you make, and the thoughts in your mind always come in some sequence. Changes of this kind are sequential. Of all the tools we currently have for modeling sequential data, which one is the most accurate? The answer is the brain.
Thankfully, we have a mathematical structure, ordinary differential equations (ODEs), to capture changes taking place over time. But do we have a computational analogue of such a magnificent learning model? Yes, the famous recurrent neural networks (RNNs). An obvious question then arises: is there a common link between the two? Yes, absolutely. There is a different breed of neural models that hybridizes both structures: continuous-time (CT) models.
CT models come in three main types: continuous-time RNNs (CT-RNNs), continuous-time Gated Recurrent Units (CT-GRUs), and Neural ODEs. These models have been very successful at modeling time-series data in domains ranging from finance to medicine. Open questions remained, however, about how far their expressive power could be improved and whether their current forms limit the learning of richer feature representations. To answer these questions, MIT researchers unveiled a new class of neural network, Liquid Time-constant (LTC) networks, in a paper accepted at the prestigious AAAI 2021 conference. These networks are flexible enough to adapt to shifts in the distribution of the incoming data stream, even after training is complete. That flexibility earned the networks the term ‘liquid’, and they are now popularly known as liquid machine learning models.
These models do not stack many successive hidden layers; instead, they consist of a set of differential equations. The hidden states are computed from the states of those equations by an ODE solver. The authors let the parameters, which are shared across all layers, be updated based on the state of the ‘liquid’ machine learning model’s underlying differential equations.
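The article does not spell out the equations, so the following is only a minimal sketch of how a hidden state can be advanced by an ODE solver instead of by a stack of layers. It assumes the liquid time-constant form dx/dt = -(1/tau + f(x, I)) * x + f(x, I) * A, with a sigmoid chosen for f and a semi-implicit Euler update standing in for the solver. The function names, parameter names, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ltc_fused_step(x, inp, params, dt):
    """One semi-implicit Euler update of dx/dt = -(1/tau + f) * x + f * A.

    x is the list of hidden states, inp a scalar input. The gate f depends
    on both, so the effective time constant 1 / (1/tau + f) changes with
    the data (assumed form; f is taken to be a sigmoid here).
    """
    n = len(x)
    new_x = []
    for i in range(n):
        pre = params["b"][i] + params["w_in"][i] * inp
        pre += sum(params["w_rec"][i][j] * x[j] for j in range(n))
        f = sigmoid(pre)
        numerator = x[i] + dt * f * params["A"][i]
        denominator = 1.0 + dt * (1.0 / params["tau"][i] + f)
        new_x.append(numerator / denominator)
    return new_x

def unroll(x0, inputs, params, dt=0.1, solver_steps=6):
    """Run several solver sub-steps per input sample; the same parameters
    are reused at every step rather than having separate weights per layer."""
    x = list(x0)
    states = []
    for inp in inputs:
        for _ in range(solver_steps):
            x = ltc_fused_step(x, inp, params, dt)
        states.append(x)
    return states

# Hypothetical two-neuron configuration, chosen only for illustration.
params = {
    "tau": [1.0, 0.5],
    "A": [1.0, -1.0],
    "b": [0.0, 0.1],
    "w_in": [0.8, -0.4],
    "w_rec": [[0.2, -0.3], [0.5, 0.1]],
}
inputs = [math.sin(0.3 * k) for k in range(20)]
states = unroll([0.0, 0.0], inputs, params)
```

With this update rule, a state that starts between 0 and its A value stays in that range, because each step is a damped move toward A whose strength is set by the gate f. A general-purpose ODE solver could be substituted for `ltc_fused_step` without changing the overall structure.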
The design draws inspiration from biological neural systems. The lead author, Hasani, worked on designing worm-inspired neural networks in his doctoral thesis. In this paper, the researchers chose for the ‘liquid’ machine learning model a set of differential equations similar to those that govern the neural dynamics of the nematode Caenorhabditis elegans, which has 302 neurons and about 8,000 synapses. In a related paper, they showed that bio-inspired CT models can form a Neural Circuit Policy in which neurons have greater expressivity and interpretability.
The researchers open-sourced their code so that others can pit the model against other CT models. In the time-series modeling tasks, the model outperformed all other CT models on four out of eight benchmarks. On the other four, it trailed by a small margin. The liquid time-constant (LTC) model set the highest accuracy, 88.2%, on the Half-Cheetah kinematic modeling dataset. In terms of trajectory length, it travels the least distance of all the CT models, reflecting the efficiency gains in parameter count. The smaller network also means lower data requirements and energy costs for training, along with faster inference.
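For readers unfamiliar with the metric, trajectory length can be read as the total distance covered by a sequence of points, such as a model's hidden states over time. The snippet below is a small sketch of that arithmetic on a made-up 2-D path. The function name and the sample points are assumptions for illustration, and the paper's exact measurement protocol may differ.

```python
import math

def trajectory_length(points):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

# Made-up path: (0, 0) -> (3, 4) -> (3, 0), i.e. segments of length 5 and 4.
path = [(0.0, 0.0), (3.0, 4.0), (3.0, 0.0)]
length = trajectory_length(path)
```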
In assessing the impact of their work, the researchers claim that liquid machine learning models can help with decision making where uncertainty is greatest, as in medical diagnosis and autonomous driving. Because the models learn richer representations with fewer nodes, the researchers are hopeful about computationally efficient applications in signal processing domains such as robot control, natural language processing, and video processing.
The researchers did, however, point out the increased memory footprint. They also noted the model's difficulty in capturing long-term dependencies, which remain at the core of all state-of-the-art sequential models. The most significant drawback is the dependence on the numerical implementation of specific ODE solvers. For these reasons, the implementations may not be ready for use in an industrial setting as of now.