In recent years, artificial intelligence (AI) has transformed structural biology by predicting the shapes of many proteins more accurately than previous computational methods. For decades, scientists have relied on protein structure models to understand how proteins function in the body.
Yet AI and other computational approaches fall short when proteins aren’t rigid but shift and bend as they work. That was the main finding of a new study by researchers from Duke and Lawrence Berkeley National Laboratory (LBNL), led by Bruce Donald, PhD, professor of biochemistry and James B. Duke professor of computer science; Terrance Oas, PhD, professor of chemistry and biochemistry at Duke; and Susan Tsutakawa, PhD, biochemist and Department Head, Structural Biology at LBNL.
The Critical Assessment of Techniques for Protein Structure Prediction, or CASP, is the world championship of protein structure prediction and has been held every two years since 1994. At CASP16, in 2024, the results were not what participants expected.
“We ran the first-ever blind competition to predict ensembles of flexible protein structures,” Donald said. “Previously, models for flexible proteins existed, but they weren't blind. This time, nobody knew the answers ahead of time, and no one performed well.”
Movement can be part of how proteins function, and it affects how drugs interact with them. Some proteins have regions that constrain them in terms of distance but otherwise allow full flexibility, like a charging cord between an outlet and a phone. Other proteins have engineered constrained flexibility, like wings on a bird. The wings can move, but only in a limited direction, to enable flight. Knowing which flexibility constraints a protein has is essential for figuring out how it works. If models can’t capture those constraints, scientists risk missing key details.
Donald’s team found that every computational technique tested – including AI, machine learning, and molecular dynamics – failed to accurately predict the structures of flexible proteins. “This is a huge surprise,” Donald said.
To tackle this problem, Donald and colleagues developed new techniques that use statistics to measure how far off a prediction is and to identify the individual factors in a model that could be modified to improve its predictions.
“The integration of two experimental techniques, one angular and one that combined distance and rotation, provided different perspectives,” Tsutakawa said. “While both techniques captured the constrained flexibility, the two perspectives enabled more rigorous assessment.”
Most medicines target proteins whose disease-driving behavior depends on flexible motion, so understanding that motion is key to better therapies. That makes the CASP16 results a turning point rather than a setback. As the first blind benchmark of its kind, the study offers a clear standard that the team hopes will accelerate innovation in drug design.
"I was glad for this chance to show the community why it can help to describe flexibility using just a few numbers to tune a probability distribution, rather than thousands or millions of numbers to try to capture all the places each atom might be. We hope researchers can use this way of looking at flexibility to build better prediction models," said Allen McBride, PhD, Postdoctoral Associate in the Donald Lab at Duke and first author on the study.
Next steps include more research focused on improving the models, along with continued collaboration with other research teams that participate in CASP. “Having informed the community of our assessment, we hope not only to develop better prediction algorithms, but also to run future competitions to see if the community has improved in their computational techniques,” Donald said.
This study was supported by the National Institutes of Health.