Researchers at Meta AI have developed a new model, ESMFold, and used it to predict more than 600 million protein structures from lesser-studied sources. The ESMFold framework is described as the first of its kind and is reported to make AI protein-structure prediction about 60x faster.
Additionally, Meta has released the ESM Metagenomic Atlas, a platform where users can quickly look up the predicted protein structures.
Numerous databases published by NCBI, the Joint Genome Institute, and a few others have already aided in cataloging newly uncovered protein structures. While breakthroughs in genomics have made it possible to identify the sequences of many unique proteins, sequence data alone cannot explain how each protein folds into a functional molecule.
Meta AI’s new protein-folding approach uses a large language model to provide a first comprehensive view of the protein structures in a metagenomics database containing millions of proteins. This view allows scientists to analyze structural relationships and discover new combinations that could benefit medicine and other fields.
ESMFold works in two steps. First, the network is trained to develop an intuitive understanding of protein structures and sequences. Second, this learned information is combined with information about possible protein combinations and relationships.
The model closely resembles DeepMind’s AlphaFold, which earlier this year predicted the structures of more than 200 million cataloged proteins from each protein’s 1D amino acid sequence. While ESMFold is not as accurate, it is about 60x faster at making predictions, allowing researchers to scale protein structure cataloging to much larger databases.
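To get a feel for why a 60x speedup matters at this scale, here is a rough back-of-the-envelope sketch in Python. The 600 million structures and the 60x factor come from the figures above. The 60-second baseline cost per structure is purely hypothetical and chosen only for illustration, as is the helper name `total_gpu_days`; neither reflects a measured number for AlphaFold or ESMFold.

```python
SECONDS_PER_DAY = 86_400


def total_gpu_days(n_structures, seconds_per_structure, speedup=1.0):
    """Return the GPU-days needed to predict n_structures.

    seconds_per_structure is the baseline cost of one prediction;
    speedup divides that cost (1.0 means no speedup).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    total_seconds = n_structures * seconds_per_structure / speedup
    return total_seconds / SECONDS_PER_DAY


N_STRUCTURES = 600_000_000   # from the article: more than 600 million structures
SPEEDUP = 60                 # from the article: about 60x faster
BASELINE_SECONDS = 60.0      # HYPOTHETICAL per-structure cost, illustration only

baseline = total_gpu_days(N_STRUCTURES, BASELINE_SECONDS)
faster = total_gpu_days(N_STRUCTURES, BASELINE_SECONDS, SPEEDUP)

print(f"baseline:   {baseline:,.0f} GPU-days")
print(f"60x faster: {faster:,.0f} GPU-days")
```

Under that assumed baseline, the same workload shrinks from hundreds of thousands of GPU-days to a few thousand. Whatever the true per-structure cost is, the ratio between the two totals stays at 60, which is what makes cataloging databases of this size practical.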