DeepMind is a British artificial intelligence research subsidiary of Alphabet Inc. Founded in September 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman, and acquired by Google in 2014, the lab started out using games to develop state-of-the-art AI models and has been at the forefront of many technological innovations that push the boundaries of Artificial General Intelligence. DeepMind was central to boosting energy efficiency in Google’s already-optimized data centers, and it developed an AI system that used a dataset of anonymized retinal scans from Moorfields Eye Hospital patients to forecast the development of exudative age-related macular degeneration (exAMD).
Below is a list of the top 10 most exciting innovations from DeepMind, each a major milestone for the scientific community.
1. AlphaGo
AlphaGo utilizes deep learning and neural networks to train itself to play Go, starting from millions of positions and moves drawn from human-played games. DeepMind then kept reinforcing and improving the system’s abilities by having it play millions of games against modified versions of itself. Through this process AlphaGo trains a “policy” network that proposes promising next moves and a “value” network that evaluates the resulting positions. Before committing to a move, AlphaGo considers many potential moves and permutations, running through numerous scenarios and selecting the one most likely to succeed.
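To make that division of labour concrete, here is a minimal, purely illustrative Python sketch. The “networks” are random linear maps standing in for AlphaGo’s actual deep networks, and the one-step lookahead is a drastic simplification of its search, but the control flow follows the description above: the policy narrows the candidates, the value scores the outcomes.

```python
# Illustrative sketch only: random linear maps stand in for AlphaGo's
# real policy and value networks; the board update is a toy placeholder.
import numpy as np

rng = np.random.default_rng(0)
BOARD_CELLS = 19 * 19

W_policy = rng.normal(size=(BOARD_CELLS, BOARD_CELLS))  # stand-in weights
W_value = rng.normal(size=BOARD_CELLS)

def policy_net(board: np.ndarray) -> np.ndarray:
    """Return a probability distribution over moves (softmax of a linear map)."""
    logits = W_policy @ board
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def value_net(board: np.ndarray) -> float:
    """Score a position in [-1, 1] (tanh of a linear map)."""
    return float(np.tanh(W_value @ board))

def choose_move(board: np.ndarray, top_k: int = 5) -> int:
    """Policy proposes candidate moves; value evaluates the resulting positions."""
    probs = policy_net(board)
    candidates = np.argsort(probs)[-top_k:]          # most promising moves
    def after(move):                                  # simulate playing `move`
        nxt = board.copy(); nxt[move] = 1.0
        return nxt
    return max(candidates, key=lambda m: value_net(after(m)))

board = np.zeros(BOARD_CELLS)
print("chosen move:", choose_move(board))
```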
2. AlphaGo Zero
AlphaGo Zero is an updated version of AlphaGo, but unlike its predecessor, it is entirely self-taught. Zero honed its Go abilities by competing against itself: it began by making random moves on the board, updated its own system after every game, and repeated the process millions of times. After three days of self-play, Zero was strong enough to defeat the version of AlphaGo that beat 18-time world champion Lee Se-dol, winning by a score of 100 games to nil. It also uses less processing power than its predecessor: only four of Google’s advanced TPU AI processors, compared to 48 in previous iterations.
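The skeleton of that self-play loop can be sketched in a few lines. Everything below is a toy stand-in (the one-number “strategy”, the win model, the promotion threshold are all invented for illustration); only the loop structure mirrors the idea: train a challenger purely against the current best, and promote it only when it wins convincingly.

```python
# Toy self-play skeleton; the "game" and update rule are stand-ins.
import random

def play_game(a: float, b: float) -> int:
    """Return 1 if strategy a beats strategy b in a trivial stand-in game."""
    return 1 if random.random() < a / (a + b) else 0

def self_play_generation(best: float, games: int = 200) -> float:
    """Train a challenger purely from games against the current best."""
    challenger = best
    for _ in range(games):
        challenger += 0.01 * play_game(challenger, best)  # crude "update"
    return challenger

random.seed(0)
best = 1.0
for generation in range(5):
    challenger = self_play_generation(best)
    wins = sum(play_game(challenger, best) for _ in range(100))
    if wins > 55:                       # promote only when clearly stronger
        best = challenger
    print(f"gen {generation}: challenger won {wins}/100, best={best:.2f}")
```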
3. AlphaZero
Like AlphaGo Zero, AlphaZero was self-taught. Often called DeepMind’s best-known innovation, AlphaZero has two components: a neural network, which takes a board configuration as input and outputs the board’s value plus a probability distribution over all possible moves, and Monte Carlo Tree Search (MCTS), an algorithm that helps AlphaZero analyze board configurations, traversing the nodes of a search tree to determine the best next move.
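The glue between the two components is the published PUCT selection rule: the search prefers moves where the network’s value estimate is high, the network’s prior is high, and the visit count is still low. The snippet below sketches that rule with made-up statistics for a single node; it is not a full search, just the formula in working form.

```python
# PUCT selection rule as used in AlphaZero-style search (numbers invented).
import math

def puct_score(prior, q_value, parent_visits, child_visits, c_puct=1.5):
    """Exploit the learned value; explore where the prior is high
    and the move is still under-visited."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

# Hypothetical statistics for three candidate moves at one node.
moves = {
    "a": dict(prior=0.6, q=0.10, visits=30),
    "b": dict(prior=0.3, q=0.25, visits=5),
    "c": dict(prior=0.1, q=0.40, visits=1),
}
parent_visits = sum(m["visits"] for m in moves.values())

best = max(moves, key=lambda k: puct_score(
    moves[k]["prior"], moves[k]["q"], parent_visits, moves[k]["visits"]))
print("move selected for expansion:", best)
```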
After eight hours of self-play, it bested the version of AlphaGo Zero that first beat the human world Go champion. After four hours of training, it beat Stockfish, the 2016 TCEC (Season 9) world champion chess engine. After just two hours of training, it defeated Elmo, the 2017 CSA world champion shogi engine.
4. AlphaStar
AlphaStar, DeepMind’s agent for the Blizzard Entertainment real-time strategy game StarCraft II, is likewise based on a reinforcement learning algorithm, in which agents play the game by trial and error while pursuing goals such as winning or simply staying alive. The agents first learn by imitating human players and then compete against one another in a league to improve their abilities: the strongest agents are kept, while the weakest are discarded, as the sketch after this paragraph illustrates. By the time of its presentation, AlphaStar had accumulated experience equivalent to up to 200 years of real-time play.
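Here is a toy version of that league idea. The numeric “skill” value and the probabilistic win model are invented stand-ins for real agents and real matches; only the keep-the-strong, discard-the-weak dynamic follows the description above.

```python
# Toy league training: agents play round-robin matches; the weaker half
# is replaced with perturbed copies of the stronger half each round.
import random

def match(skill_a: float, skill_b: float) -> bool:
    """Probabilistic win model: higher skill wins more often."""
    return random.random() < skill_a / (skill_a + skill_b)

random.seed(0)
league = [random.uniform(0.5, 1.5) for _ in range(8)]  # initial agents

for _ in range(20):
    wins = [0] * len(league)
    for i in range(len(league)):
        for j in range(len(league)):
            if i != j and match(league[i], league[j]):
                wins[i] += 1
    ranked = sorted(range(len(league)), key=lambda i: wins[i], reverse=True)
    survivors = [league[i] for i in ranked[: len(league) // 2]]
    # Replace the discarded half with perturbed copies of the survivors.
    league = survivors + [max(0.01, s + random.gauss(0, 0.05)) for s in survivors]

print(f"mean league skill after training: {sum(league) / len(league):.2f}")
```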
During a pre-recorded session in January 2019, AlphaStar defeated top pro players 10 times in a row but ultimately lost to pro player Grzegorz “MaNa” Komincz in the final match, which was live-streamed online. By October of the same year, DeepMind had improved AlphaStar further.
Despite seeing only the area of the map that a human player would see, AlphaStar was able to reach Grandmaster level in the game. To match regular human dexterity, its interactions were also rate-limited, allowing it to register only 22 non-duplicated actions every five seconds of play.
5. AlphaFold
Understanding protein structures is critical for detecting and treating disorders caused by misfolded proteins, such as Alzheimer’s disease, and it opens up new possibilities for drug development. Because experimental methods for determining protein structures are time-consuming and costly, there is a dire need for computer algorithms that can calculate protein structures directly from the gene sequences that encode them. AlphaFold can predict the 3D shape that a protein will fold into.
AlphaFold won the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) in December 2018, beating out 98 other competitors. It correctly predicted the structure of 25 out of 43 proteins, leapfrogging the second-place team, which correctly predicted only three.
In November 2020, DeepMind unveiled AlphaFold 2, a computational model that predicts how undeciphered proteins fold, trained on 170,000 known protein structures. While the original AlphaFold was built from convolutional neural networks, AlphaFold 2 leverages graph neural networks. In July 2021, a team from DeepMind, the European Bioinformatics Institute, and others released a database containing the 3D structures of nearly every protein in the human body.
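That database is publicly queryable. The short sketch below fetches one predicted structure; the URL pattern, file version, and example UniProt accession are assumptions based on the database’s published conventions, so check https://alphafold.ebi.ac.uk for the current scheme before relying on them.

```python
# Usage sketch: download one predicted structure from the AlphaFold
# Protein Structure Database. URL pattern and model version are assumed
# from the database's documented conventions and may change.
import urllib.request

UNIPROT_ID = "P69905"  # human hemoglobin subunit alpha, as an example
url = f"https://alphafold.ebi.ac.uk/files/AF-{UNIPROT_ID}-F1-model_v4.pdb"

with urllib.request.urlopen(url) as response:
    pdb_text = response.read().decode("utf-8")

# PDB "ATOM" records hold the predicted 3-D coordinates.
atom_lines = [l for l in pdb_text.splitlines() if l.startswith("ATOM")]
print(f"downloaded {len(atom_lines)} atom records for {UNIPROT_ID}")
```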
6. WaveNet
WaveNet is text-to-speech software that generates voices by modeling audio waveforms directly, sampled from real human speech as well as from previously generated sounds. It analyses waveforms from a large database of human voices and recreates them at 24,000 samples per second. The final output includes details like lip-smacking and dialects to make the voice sound more ‘human.’
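The core idea is autoregression: each new audio sample is predicted from the samples generated so far. The sketch below shows that loop; the “model” is a trivial weighted sum over a short causal window, standing in for WaveNet’s actual stack of dilated causal convolutions.

```python
# Autoregressive generation loop; the predictor is a toy stand-in for
# WaveNet's dilated causal convolutions.
import numpy as np

rng = np.random.default_rng(0)
SAMPLE_RATE = 24_000          # samples per second, as in production WaveNet
RECEPTIVE_FIELD = 64          # how far back the toy model "listens"
weights = rng.normal(scale=0.1, size=RECEPTIVE_FIELD)

def next_sample(history: np.ndarray) -> float:
    """Predict one sample from the causal window of past samples."""
    window = history[-RECEPTIVE_FIELD:]
    return float(np.tanh(weights[-len(window):] @ window))

audio = np.zeros(1)
for _ in range(SAMPLE_RATE // 100):        # generate 10 ms of "audio"
    audio = np.append(audio, next_sample(audio))

print(f"generated {audio.size} samples ({audio.size / SAMPLE_RATE * 1000:.1f} ms)")
```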
It was initially too computationally intensive for consumer products, but in late 2017 it was ready for use in consumer apps like Google Assistant. Using distillation, the team re-engineered WaveNet to run 1,000 times faster than the research prototype, creating one second of speech in just 50 milliseconds. In 2018, Google released Cloud Text-to-Speech, a commercial text-to-speech application based on WaveNet.
Former NFL player Tim Shaw, who suffers from Amyotrophic Lateral Sclerosis (ALS), has worked with Google AI, the ALS Therapy Development Institute, and Project Euphonia to improve his speech. WaveRNN was combined with other speech technologies and a collection of previously recorded media interviews to generate a natural-sounding rendition of Shaw’s voice, which was used to help him read out a letter written to his younger self.
7. MuZero
This reinforcement learning algorithm is considered a successor to AlphaZero, but unlike AlphaZero, MuZero was not given the rules of the games it plays. Instead, it learns to predict the quantities most significant to game planning, achieving industry-leading performance on 57 distinct Atari games and nearly equaling AlphaZero’s performance in Go, chess, and shogi on its first trial. It integrates a learned model with Monte Carlo tree search (a tree being a data structure used for locating information inside a set).
The algorithm begins by receiving an input, such as an Atari screen, which is translated into a hidden state. The hidden state is then updated iteratively, based on the previous hidden state and a proposed next course of action. At every update, the model predicts three variables: the policy (the move to play), the value function (the predicted winner), and the immediate reward (the points scored by playing a move). The model is then trained to predict these three quantities accurately; a sketch of this data flow follows.
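The published MuZero design factors this into three learned functions: a representation function that encodes an observation into a hidden state, a dynamics function that advances the hidden state given an action, and a prediction function that outputs policy and value. In the sketch below the linear maps are random stand-ins and the reward head is a toy, but the data flow mirrors that structure.

```python
# MuZero-style data flow: representation (h), dynamics (g), prediction (f).
# Random linear maps stand in for the real learned networks.
import numpy as np

rng = np.random.default_rng(0)
OBS, HIDDEN, ACTIONS = 16, 8, 4

W_h = rng.normal(size=(HIDDEN, OBS))
W_g = rng.normal(size=(HIDDEN, HIDDEN + ACTIONS))
W_f = rng.normal(size=(ACTIONS + 1, HIDDEN))

def representation(obs):                    # h: observation -> hidden state
    return np.tanh(W_h @ obs)

def dynamics(state, action):                # g: (state, action) -> next state, reward
    one_hot = np.eye(ACTIONS)[action]
    nxt = np.tanh(W_g @ np.concatenate([state, one_hot]))
    reward = float(nxt.sum())               # toy immediate-reward head
    return nxt, reward

def prediction(state):                      # f: state -> policy, value
    out = W_f @ state
    logits, value = out[:ACTIONS], float(out[-1])
    policy = np.exp(logits - logits.max())
    return policy / policy.sum(), value

state = representation(rng.normal(size=OBS))  # e.g. an encoded Atari frame
for step in range(3):                         # unroll the learned model
    policy, value = prediction(state)
    action = int(policy.argmax())
    state, reward = dynamics(state, action)
    print(f"step {step}: action={action}, value={value:.2f}, reward={reward:.2f}")
```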
8. TF-Replicator
This software library from DeepMind helps researchers deploy their TensorFlow models on GPUs and Cloud TPUs with minimal effort and no previous experience with distributed systems. In other words, TF-Replicator is a framework that makes it easier to write distributed TensorFlow code for training machine learning models, so that they can be deployed across a variety of cluster topologies with ease.
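TF-Replicator was a research library, and its data-parallel pattern now lives on in TensorFlow’s own `tf.distribute` API. The sketch below is not TF-Replicator’s API; it uses `tf.distribute.MirroredStrategy` to show the same write-once, replicate-everywhere idea, with dummy data in place of a real training set.

```python
# Data-parallel training in modern TensorFlow; analogous to (but not)
# TF-Replicator's API. Variables created in scope are mirrored per device.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()    # replicates across local GPUs
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Dummy data; each replica receives a shard of every batch.
x = tf.random.normal([256, 10])
y = tf.random.normal([256, 1])
model.fit(x, y, epochs=1, batch_size=32)
```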
9. PonderNet
Descriptions of neural network models often borrow everyday terms in ways that can be dangerously deceptive, implying that the computer performs human-like functions such as thinking, reasoning, knowing, perceiving, and wondering. Meanwhile, the amount of computation run by a typical neural network is proportional to the size of its inputs rather than to the complexity of the problem being learned.
This motivated DeepMind researchers to create PonderNet, a novel algorithm that teaches artificial neural networks to “ponder” for a variable amount of time before responding. This innovation increases neural networks’ capacity to generalize outside their training distribution and to answer difficult problems with greater confidence than before.
Pondering, in this context, means varying the number of network steps, and therefore the network’s compute, to decide whether the computer should stop or continue. This is accomplished through a Markov process: at each step of the network’s processing, the software calculates the probability that it is time to cease computing.
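The halting mechanism can be sketched in a few lines. The per-step halt probability below is a made-up function (in the real PonderNet it is an output of the network itself, learned from data), but sampling from it shows the intended behaviour: harder inputs get more computation.

```python
# Toy halting loop in the spirit of PonderNet; the halt probability is
# a hand-written stand-in for a learned network output.
import random

def halting_prob(step: int, difficulty: float) -> float:
    """Hypothetical per-step halt probability: easy inputs halt sooner."""
    return min(1.0, 0.2 * step / difficulty)

def ponder(difficulty: float, max_steps: int = 20) -> int:
    for step in range(1, max_steps + 1):
        # ... one recurrent computation step would run here ...
        if random.random() < halting_prob(step, difficulty):
            return step                    # the network chose to stop
    return max_steps                       # forced stop at the budget

random.seed(0)
for difficulty in (1.0, 3.0, 9.0):
    steps = sum(ponder(difficulty) for _ in range(1000)) / 1000
    print(f"difficulty {difficulty}: pondered {steps:.1f} steps on average")
```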
10. Perceiver
Perceiver is a cutting-edge deep learning model that accepts and analyses a wide range of input data, from audio to pictures, in a manner comparable to how the human brain perceives multimodal data. It is based on transformers, which make no assumptions about the type of input data. This allows it to ingest all of those sorts of input and perform tasks, such as image recognition, that would otherwise need different types of neural networks.
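The architectural trick that makes this possible is a small, fixed-size latent array that cross-attends to the flattened input, whatever its length or modality. The sketch below uses random weights and simplified single-head attention with no training; the point is the shapes: wildly different inputs are all reduced to the same latent summary.

```python
# Latent-bottleneck cross-attention in the spirit of Perceiver.
# Random stand-in weights; simplified single-head attention.
import numpy as np

rng = np.random.default_rng(0)
D = 32                                  # shared channel width
N_LATENTS = 8                           # fixed latent array size

def cross_attend(latents, inputs):
    """latents (N_LATENTS, D) attend over inputs (M, D) for any M."""
    scores = latents @ inputs.T / np.sqrt(D)       # (N_LATENTS, M)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ inputs                        # (N_LATENTS, D)

latents = rng.normal(size=(N_LATENTS, D))

# The same latent array ingests very different modalities, once each is
# flattened to (num_elements, D): image pixels, audio frames, lidar points.
for name, m in [("image pixels", 50_176), ("audio frames", 480), ("lidar points", 2_000)]:
    out = cross_attend(latents, rng.normal(size=(m, D)))
    print(f"{name:13s}: input ({m}, {D}) -> latent summary {out.shape}")
```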
Perceiver was trained on three types of data: pictures, videos, and point clouds, which are collections of dots that depict what a LiDAR sensor mounted on the roof of a car “sees” of the road. Once trained, the system performs well on benchmark tests such as the famous ImageNet image recognition test; AudioSet, a Google-developed test that requires a neural net to recognize different types of audio samples from a video; and ModelNet, a Princeton-developed test that requires a neural net to properly identify an object using 2,000 points in space.
In terms of accuracy, Perceiver outperforms the industry-standard ResNet-50 neural network on ImageNet, as well as the Vision Transformer introduced in 2021 by Alexey Dosovitskiy and colleagues at Google.
Perceiver can process pictures, point clouds, audio, video, and combinations of them; however, its output is restricted to a single classification label. As a result, DeepMind created Perceiver IO, a more general version of the Perceiver model that produces structured outputs and can be applied to complicated multimodal tasks such as computer games.