In a new study, researchers from the University of Washington School of Medicine report that deep learning can be used to design protein molecules much more accurately and quickly than previously possible. The scientists hope this advance will lead to many new vaccines, treatments, tools for carbon capture, and sustainable biomaterials.
Proteins are composed of lengthy chains of amino acids connected by peptide bonds. They are essential for the chemistry of the body as well as the structure of cells and communication between them. A protein’s function is determined by its shape. When protein creation goes wrong, the resulting malformed proteins may fail to perform their crucial tasks or may contribute to neurological diseases such as Alzheimer’s, Parkinson’s, Huntington’s, and Lou Gehrig’s disease (ALS). A protein must “fold” in order to perform its function in the cell. Protein folding is the process by which a chain of amino acids takes on a complicated 3D structure that can interact with its target in the cell.
If folding is interrupted, the protein won’t take on the correct shape or carry out its function inside the body. To better understand how cells work and how misfolded proteins contribute to disease, researchers have been focusing on understanding protein folding. Better structure prediction methods will also aid in developing drugs that can target a specific topological region of a protein where chemical reactions occur.
According to the study led by biochemist David Baker at the University of Washington in Seattle, the protein shapes found in nature are only a tiny portion of those thought to be possible.
In December 2020, DeepMind’s protein structure prediction tool AlphaFold made headlines when it won the Critical Assessment of Protein Structure Prediction, or CASP, competition. The competition, which is held every two years, assesses advancement in one of biology’s most challenging problems: figuring out proteins’ three-dimensional (3D) structures strictly from their amino-acid sequence. Entries made using computer software are compared to protein structures identified using experimental methods like X-ray crystallography or cryo-electron microscopy (cryo-EM), which shoot X-ray or electron beams at proteins to produce an image of their shape.
In the 1990s, Baker’s team began creating software called Rosetta to help with protein design. Researchers first envisioned a structure for a novel protein, typically by fusing pieces of other proteins together, and the software then determined an amino acid sequence corresponding to that structure.
But when created in the lab, these “first draft” proteins rarely folded into the required form and instead got stuck in other conformations. Therefore, another step was required to modify the protein sequence so that it would fold only into the desired structure. This stage was computationally intensive because it involved modeling the many ways in which the various sequences could fold.
That time-consuming procedure has become nearly instantaneous with AlphaFold and similar AI tools. In a method known as “hallucination,” which Baker’s team created, scientists input random amino-acid sequences into a structure-prediction network and then alter the sequences step by step so that the predicted structure becomes ever more protein-like, as evaluated by the network’s predictions. In simple words, the method starts from noise and keeps the changes that the AI judges to look more like a real, well-folded protein. In a 2021 study, Baker’s group reported that around one-fifth of the small “hallucinated” proteins they produced in the lab folded into something resembling the predicted form.
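The hallucination loop described above can be illustrated with a minimal sketch. It is not the Baker lab’s actual procedure: the scoring function `toy_confidence` is a hypothetical stand-in for a structure-prediction network, and the search is a simple greedy hill climb chosen only to show the idea of starting from a random sequence and keeping the mutations that the scorer prefers.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids
HYDROPHOBIC = set("AFILMVWY")


def toy_confidence(seq: str) -> float:
    """Hypothetical stand-in for a structure-prediction network's confidence.

    A real pipeline would query a neural network here. This toy score only
    rewards alternation between hydrophobic and polar residues, so that the
    loop below has something to optimize.
    """
    alternations = sum(
        (seq[i] in HYDROPHOBIC) != (seq[i + 1] in HYDROPHOBIC)
        for i in range(len(seq) - 1)
    )
    return alternations / (len(seq) - 1)


def hallucinate(length: int = 30, steps: int = 2000, seed: int = 0):
    """Start from a random sequence and keep mutations that do not lower the score."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    start_score = score = toy_confidence("".join(seq))
    for _ in range(steps):
        pos = rng.randrange(length)
        old_residue = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)      # propose a single-residue mutation
        new_score = toy_confidence("".join(seq))
        if new_score >= score:
            score = new_score                   # accept the mutation
        else:
            seq[pos] = old_residue              # reject and restore the old residue
    return "".join(seq), start_score, score


if __name__ == "__main__":
    designed, before, after = hallucinate()
    print(designed, f"score {before:.2f} -> {after:.2f}")
```

Swapping `toy_confidence` for a real network’s confidence measure is where the actual method gets its power; the accept-or-reject loop around it stays conceptually similar.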
In the past year, a team under the direction of Minkyung Baek, a postdoctoral scholar in the Baker lab, created software that employs deep learning to swiftly and accurately predict protein structures from limited information. The team developed this AI software, dubbed RoseTTAFold, because it wasn’t clear when DeepMind would make the AlphaFold software or its predictions publicly available.
Researchers describe RoseTTAFold as a “three-track” neural network, which means it simultaneously considers patterns in protein sequences, how amino acids interact with one another, and potential three-dimensional structures of proteins. This architecture enables the network to reason collectively about the link between a protein’s chemical components and its folded structure by exchanging one-, two-, and three-dimensional information.
Rosetta is a revolutionary toolset for protein design, but its success rates are relatively low: only a small percentage of its designs fold and function as intended. Deep learning models like AlphaFold and RoseTTAFold are stepping up the game.
Baker’s team divided the problem of protein design into three pieces and employed novel software solutions for each to create proteins that go beyond the proteins found in nature.
To begin, a new protein form must be created. In an article published on July 21 in the journal Science, the scientists demonstrated two ways in which artificial intelligence could create novel protein forms. The first was “hallucination,” and the second was “inpainting,” which is similar to the autocomplete function seen in current search bars.
Baker and his colleagues believed they could use hallucination to create self-assembling proteins that would form nanoparticles of various sizes and shapes. However, none of the 150 designs worked when the scientists instructed microbes to produce them in the laboratory.
To speed up the process, Justas Dauparas, a machine-learning expert, developed a deep-learning tool at the same time. Its main objective was to handle the so-called inverse folding problem: identifying a protein sequence that matches a given overall protein structure. According to a paper in the September 15 edition of Science, the ProteinMPNN approach for designing protein sequences is based on deep learning and performs exceptionally well in both in silico and experimental tests. By changing sequences while keeping the overall shape of the molecules, it can serve as a “spellcheck” for designer proteins developed with AlphaFold and other tools.
The researchers evaluated and improved the predicted sequences using protein structure prediction algorithms and laboratory protein synthesis. Next, the scientists used X-ray crystallography and cryo-electron microscopy to determine the structures of the proteins they had made. The researchers identified the structures of 30 of their novel proteins, and 27 of them matched the AI-led designs.
ProteinMPNN has a sequence recovery rate of 52.4% on native protein backbones, compared to 32.9% for Rosetta. In creating the molecules experimentally, Baker and his team had significantly more success when they used this second network on their hallucinated protein nanoparticles.
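Sequence recovery is commonly understood as the fraction of positions at which a designed sequence has the same amino acid as the native sequence for that backbone. The sketch below shows that calculation under this assumption; the function name and the two ten-residue sequences are made up for illustration and are not data from the study.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where the designed residue matches the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


if __name__ == "__main__":
    # Hypothetical sequences, invented for this example only.
    native = "MKTAYIAKQR"
    designed = "MKSAYLAKQK"
    print(f"recovery: {sequence_recovery(native, designed):.1%}")
```

Here seven of the ten positions agree, so the recovery is 70%. A reported figure such as 52.4% would be this quantity averaged over many native backbones.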
Finally, using AlphaFold, the team independently checked whether the proposed amino acid sequences were likely to fold into the desired shapes. According to Baker, ProteinMPNN is to protein design what AlphaFold was to the prediction of protein structure. The team explains that while protein structure prediction software is part of the solution, it cannot provide any original ideas by itself. Even with ideal technology for predicting how protein sequences fold, you would need to comb through billions of sequences to discover any new functional proteins.
In a separate study published on September 15 in Science, a group from the Baker lab demonstrated that combining ProteinMPNN with the other new machine learning methods could consistently produce proteins that worked in the lab. According to the authors, these molecules have to be tested in the real world, because the computer alone cannot confirm that the proteins were built correctly. According to project scientist Basile Wicky, the team discovered that proteins generated using ProteinMPNN were significantly more likely to fold up as planned, and that highly complicated protein assemblies could be constructed using these approaches.
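To give a sense of why brute-force searching is impractical, the short calculation below counts how many distinct sequences exist for a chain of a given length, assuming each position can hold any of the 20 standard amino acids. The function name and the chosen lengths are illustrative only.

```python
NUM_AMINO_ACIDS = 20  # standard amino acids available at each position


def sequence_space_size(length: int) -> int:
    """Number of distinct amino-acid sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    for length in (10, 50, 100):
        digits = len(str(sequence_space_size(length)))
        print(f"length {length}: a {digits}-digit number of possible sequences")
```

Even a short chain of ten residues already allows more than ten trillion sequences, which is why design tools propose candidates directly rather than enumerating and testing them all.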
According to Baker and his colleagues, synthesizing a brand-new protein in the laboratory is the true test of deep learning-based protein structure prediction approaches. This is evident from their early failure to create hallucinated protein assemblies. The scientists created a variety of novel proteins, including nanoscale rings that they anticipate could be repurposed as components for innovative nanomachines. The rings, which had widths about a billion times smaller than a poppy seed, were examined using electron microscopes.