Last year, researchers at MIT created a unique kind of neural network that learns while performing tasks. Dubbed the liquid neural network, this deep learning model can adjust its behavior after the initial training phase and is believed to hold the key to major advances in dynamic scenarios where conditions change quickly, such as autonomous driving, robot control, and medical diagnosis. In other words, a liquid neural network actively adapts to new data inputs in real time to anticipate future behavior, allowing algorithms to make decisions based on frequently changing data streams.
The research team eventually discovered that as the number of neurons and synapses in these models grows, they become computationally costly and require cumbersome computer programs to solve the complex underlying math. Because of the magnitude of the equations, the problems become increasingly difficult to solve, often requiring many computing steps to arrive at a solution.
On Tuesday, MIT researchers reported that they had developed a solution to that constraint, not by expanding the data pipeline, but by solving a differential equation that has puzzled mathematicians since 1907. The equation describes how two neurons communicate through synapses and could be the key to a new class of fast artificial intelligence systems. These models are orders of magnitude faster and more scalable than liquid neural networks, yet they share the same flexible, causal, robust, and explainable properties. Because they remain small and flexible even after training, unlike many traditional models, these neural networks could be applied to any task that requires gaining insight into data over time.
The team calls the new network the “closed-form continuous-time” neural network (CfC). In their paper published in Nature Machine Intelligence, the researchers describe a type of machine learning system called continuous-time neural networks that can handle representation learning on spatiotemporal decision-making tasks. These models are generally defined by continuous differential equations, which describe how the state of a system evolves over time rather than at discrete steps. For instance, a differential equation can describe how a body X moves from point A to point B through space over time.
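As a toy illustration of that analogy (the values here are hypothetical, not from the paper), the motion of a body at constant velocity is described by the differential equation dx/dt = v, which happens to have a simple closed-form solution:

```python
# Toy example: a body X moving from point A to point B.
# Its state (position x) is governed by the differential equation
# dx/dt = v, whose closed-form solution is x(t) = x0 + v * t.
def position(x0, v, t):
    """Position at time t for constant velocity v (closed-form, no solver)."""
    return x0 + v * t

# Body starts at A (x = 0) and moves at 2 units/s toward B (x = 10).
print(position(0.0, 2.0, 5.0))  # prints 10.0: it reaches B after 5 seconds
```

Having a closed-form solution means the state at any time can be read off directly; the harder equations discussed below do not admit such a shortcut, which is where numerical solvers come in.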
Continuous neural network designs based on ordinary differential equations (ODEs) are expressive models well suited to data with complicated dynamics. These models enable parameter sharing, adaptive computation, and function approximation for non-uniformly sampled data by transforming the depth dimension of static neural networks (SNNs) and the temporal dimension of recurrent neural networks (RNNs) into a continuous vector field.
On comparatively small benchmarks, ODE-based neural networks with careful memory and gradient propagation design outperform advanced discretized recurrent models. However, because they rely on complex numerical differential equation solvers, their training and inference are slow. Suppose the same body X must now travel from point A to point B via point C, then point D, return to point A, and pause at point E; each leg demands costly and complicated calculations. This becomes increasingly evident as the complexity of the data, the task, and the state space rises, as in open-world settings such as processing medical data, operating self-driving vehicles, analyzing financial time series, and simulating physics. In simple terms, numerical differential equation solvers impose a limit on the models' expressive power in advanced computing applications. This restriction has significantly slowed the scaling and interpretation of many physical processes that occur in nature, such as understanding the dynamics of the nervous system.
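A minimal sketch of why solver-based models are costly: integrating even a single leaky neuron's ODE, here dx/dt = -x/τ + u (an illustrative dynamic, not the paper's model), with an explicit Euler solver takes thousands of small steps per trajectory, and every training iteration repeats that work:

```python
import math

# Illustrative single-neuron ODE: dx/dt = -x/tau + u.
# A numerical solver must advance in many small steps to stay accurate.
def euler_solve(x0, tau, u, t_end, dt):
    """Integrate dx/dt = -x/tau + u with explicit Euler; return state and step count."""
    steps = round(t_end / dt)
    x = x0
    for _ in range(steps):
        x = x + dt * (-x / tau + u)
    return x, steps

x, steps = euler_solve(x0=0.0, tau=1.0, u=1.0, t_end=5.0, dt=0.001)
# Exact solution is tau * u * (1 - exp(-t/tau)) ~= 0.9933 at t = 5.
print(f"x(5) ≈ {x:.4f} after {steps} solver steps")  # prints x(5) ≈ 0.9933 after 5000 solver steps
```

Five thousand arithmetic updates for one neuron and one five-second trajectory; scaling this to thousands of neurons over long sequences is exactly the bottleneck the article describes.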
The closed-form continuous-time neural network models preserve the impressive characteristics of liquid networks without the need for numerical integration by replacing the differential equation governing the computation of the neuron with a closed-form approximation. These networks can scale exceptionally well compared to other deep learning instances, which is a significant improvement over conventional differential equation-based continuous networks. Moreover, since these models are developed from liquid networks, they outperform advanced, recurrent neural network models in time-series modeling.
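To make the contrast concrete, here is a simplified sketch of the kind of gated closed-form update the CfC paper describes: the hidden state at an arbitrary elapsed time t is produced by one pass of arithmetic rather than an integration loop. The class name, layer shapes, and random weights below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class CfCCellSketch:
    """Sketch of a closed-form continuous-time cell: roughly
    x(t) = sigma(-f * t) * g + (1 - sigma(-f * t)) * h,
    where f, g, h are small learned functions of the state and input.
    Random weights stand in for trained parameters (illustrative only)."""

    def __init__(self, in_dim, hidden_dim):
        self.Wf = rng.normal(size=(hidden_dim, in_dim + hidden_dim))
        self.Wg = rng.normal(size=(hidden_dim, in_dim + hidden_dim))
        self.Wh = rng.normal(size=(hidden_dim, in_dim + hidden_dim))

    def step(self, x, u, t):
        """One closed-form update: no ODE solver, no inner loop."""
        z = np.concatenate([x, u])
        f = self.Wf @ z            # time-constant head
        g = np.tanh(self.Wg @ z)   # target state when the gate is open
        h = np.tanh(self.Wh @ z)   # target state when the gate is closed
        gate = sigmoid(-f * t)     # time-dependent gate, evaluated directly
        return gate * g + (1.0 - gate) * h

cell = CfCCellSketch(in_dim=3, hidden_dim=4)
x = np.zeros(4)
# Irregularly sampled inputs: the elapsed time t can differ at every step.
for u, t in [(rng.normal(size=3), 0.1), (rng.normal(size=3), 1.7)]:
    x = cell.step(x, u, t)
print(x.shape)  # prints (4,)
```

Because the elapsed time t enters the update directly, the same cell handles irregularly sampled sequences at constant cost per observation, which is the scaling advantage the article attributes to CfCs.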
According to MIT Professor Daniela Rus, director of the Computer Science and Artificial Intelligence Laboratory (CSAIL) and senior author of the paper, closed-form continuous-time neural network models are causal, compact, explainable, and economical to train and use for prediction. They also pave the way for trustworthy machine learning in safety-critical applications, and they can solve tasks with even fewer neurons, making the process faster and less computationally expensive.
In evaluations of prediction and task performance, the CfC has already outperformed a number of other artificial neural networks. It also runs faster and more accurately when identifying human activity from motion sensors, modeling the physical dynamics of a simulated walker robot, and performing event-based sequential image processing. On a sample of 8,000 patients, the CfC's medical predictions were 220 times faster than those of comparable models.
MIT researchers are optimistic that closed-form continuous-time neural networks will one day enable models of the human brain that capture its millions of synaptic connections, something that is not currently feasible. The team also speculates that the model may be able to generalize outside of its training distribution (out-of-distribution generalization), applying the visual training it acquired in one environment to solve problems in a completely different one.