Structural biology is concerned with the molecular structure of biological macromolecules such as proteins, RNA, and DNA. It draws on molecular biology, biochemistry, and biophysics. Macromolecules carry out most cell functions, but they can do so only by folding into specific three-dimensional shapes. Biomolecules are too small to resolve in detail even with the most advanced light microscopes, so scientists cannot simply look at their structure.
However, advances in technology and AI have made it somewhat easier to determine the 3D structures of biological molecules. Recent structural biology research at Stanford was able to predict the structures of proteins and RNAs accurately. The work is published in two papers: “Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes,” which appeared in Proteins in December 2020, and “Geometric deep learning of RNA structure,” which appeared in Science on August 27, 2021.
Ron O. Dror, Ph.D., associate professor of computer science, led the first study, published in Proteins. The second study was co-led by Dror and Rhiju Das, Ph.D., associate professor of biochemistry. Stanford University Ph.D. students Stephan Eismann and Raphael Townshend assisted in both studies. Both studies used a machine learning (ML) algorithm to predict the 3D structures of biological molecules accurately.
“Structural biology, which is the study of the shapes of molecules, has this mantra that structure determines function,” said Townshend. Accurate prediction of molecular structure has implications for informed drug design and for fundamental biological research. It also helps researchers explain how different molecules work.
Rather than supplying hand-picked features as input, the researchers let the algorithm discover for itself which features make a structural prediction more or less accurate, so that it is not biased toward any particular features. “The problem with these hand-crafted features in an algorithm is that the algorithm becomes biased towards what the person who picks these features thinks is important, and you might miss some information that you would need to do better,” said Eismann.
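Eismann's point can be illustrated with a toy sketch. The example below is not taken from either paper: the two four-atom arrangements and the choice of radius of gyration as the hand-picked feature are assumptions made purely for illustration. It shows two clearly different shapes that a single hand-crafted feature cannot tell apart, while the raw coordinates (summarized here as pairwise distances) still can.

```python
import math
from itertools import combinations

def radius_of_gyration(coords):
    """Hand-picked feature: root-mean-square distance of atoms from their centroid."""
    n = len(coords)
    centroid = [sum(atom[i] for atom in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((atom[i] - centroid[i]) ** 2 for i in range(3)) for atom in coords
    ) / n
    return math.sqrt(mean_sq)

def pairwise_distances(coords):
    """Information still available from raw coordinates: all inter-atom distances, sorted."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))

# Two hypothetical 4-atom arrangements (arbitrary units), invented for this example.
flat_square = [(1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0)]
twisted = [(1, 0, 1), (-1, 0, 1), (0, 1, -1), (0, -1, -1)]

# The hand-picked feature has the same value for both shapes...
print(radius_of_gyration(flat_square), radius_of_gyration(twisted))

# ...but the pairwise distances show that the shapes are different.
print(pairwise_distances(flat_square))
print(pairwise_distances(twisted))
```

A model that only ever sees the hand-picked feature would treat the two shapes as identical, which is the kind of lost information the quote describes.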
In the process, the algorithm recovered features that researchers already knew to be important and also discovered new ones. After applying the ML algorithm to proteins, the researchers tested it on ‘RNA puzzles.’ The tool outperformed all the other puzzle participants.
But why does a protein’s shape matter? The structure or shape of a protein determines how it interacts with other molecules and, as a result, what it does. For instance, we now know that antibodies are shaped like a Y and that DNA polymerase III is donut-shaped. The Y shape helps these immune-system proteins bind to foreign molecules such as bacteria or viruses with one end while recruiting other immune-system proteins with the other. Misfolded or misshapen proteins, by contrast, stop functioning correctly and can lead to disease. Parkinson’s disease, Alzheimer’s disease, and cystic fibrosis are examples of diseases associated with misfolded proteins.
A structure-based understanding of proteins is essential for developing certain drugs, because many drugs work by either supporting or blocking the activity of specific proteins. For instance, researchers use structures to understand how two proteins work together so that they can turn off or alter one of them. This approach was used to develop protease inhibitors, a class of anti-HIV drugs. Because the virus depends on HIV protease to replicate, researchers used structure-based design to identify molecules that block the enzyme.
In the Stanford study, the resulting scoring function substantially outperformed previous methods and consistently produced the best results in community-wide blind RNA structure prediction challenges. The ML algorithm uses only atomic coordinates as inputs and was trained on only 18 known RNA structures. By learning effectively from so little data, it overcomes a major limitation of standard deep neural networks, which typically need large training sets. The algorithm was initially developed for protein structures, and it doesn’t use any RNA-specific information. The approach may apply to diverse problems in biochemistry, structural biology, materials science, and beyond.
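To make the idea of a scoring function over atomic coordinates concrete, here is a minimal sketch. It is a toy stand-in, not the learned network from the papers: the function `toy_score`, its `ideal` distance parameter, and the two candidate models are all hypothetical. It only illustrates two properties that such a scorer needs: it takes nothing but coordinates as input, and its output does not change when the whole molecule is rotated, so candidate models can be ranked regardless of how they are oriented.

```python
import math
from itertools import combinations

def rotate_z(coords, angle):
    """Rotate every atom about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def toy_score(coords, ideal=1.5):
    """Hypothetical score (lower is better): mean squared deviation of all
    pairwise distances from an assumed 'ideal' spacing. A real scorer would
    be learned from data; this hand-written rule is only a placeholder."""
    pairs = list(combinations(coords, 2))
    return sum((math.dist(a, b) - ideal) ** 2 for a, b in pairs) / len(pairs)

def rank_models(models):
    """Return candidate names sorted from best (lowest score) to worst."""
    return sorted(models, key=lambda name: toy_score(models[name]))

# Two made-up candidate models of the same 3-atom fragment (arbitrary units).
candidates = {
    "stretched": [(0, 0, 0), (4, 0, 0), (8, 0, 0)],
    "compact": [(0, 0, 0), (1.5, 0, 0), (0, 1.5, 0)],
}

print(rank_models(candidates))

# Rotating a model leaves its score unchanged (up to floating-point error),
# because the score depends only on distances between atoms.
rotated = rotate_z(candidates["compact"], 0.7)
print(math.isclose(toy_score(rotated), toy_score(candidates["compact"])))
```

Building the score from inter-atom distances is the simplest way to get this rotation independence by hand. The first paper's title indicates that the Stanford method relies on rotation-equivariant network layers instead, which learn what matters from the coordinates rather than from a fixed rule like the one assumed here.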