AlphaSync Brings Continuous Updates to Global Protein Structure Predictions

The shape, or structure, of proteins determines how they work, making it essential for researchers to have access to high-quality structural models. St. Jude Children's Research Hospital investigators today announce AlphaSync, a free database that improves upon existing protein structure prediction resources through continuous updating. In the fast-moving fields of biomedical research, structural biology and protein science, new protein sequence information is constantly generated, allowing better prediction of protein structures. AlphaSync maintains a database of 2.6 million predicted protein structures across hundreds of species, updating as soon as new or modified sequences are available. The database's capabilities, including additional generated data, were published in Nature Structural & Molecular Biology. 

It takes considerable effort to determine a protein's structure, and prediction tools have historically been limited in their accuracy. However, in 2021, an approach called AlphaFold2 applied machine learning to enable high-accuracy structure predictions of proteins based on the sequence of their building blocks, amino acids. This resource super-charged structural biology, giving new insights into how proteins function and how mutations contribute to disease. In 2022, the AlphaFold Protein Structure Database was launched, providing predictions for nearly all catalogued protein sequences known to science at the time. However, because it does not automatically update when new protein sequences are discovered, nor when an existing sequence is corrected based on new data, the quality of the predicted models can decrease over time, leading to out-of-date structures and potentially cascading errors. 

"In a rapidly evolving scientific landscape, having access to the most current and detailed information on protein structural models is essential for breakthroughs in medicine and biology. With AlphaSync, we ensure predicted protein structures stay continuously updated and enriched with key information such as amino acid interaction networks, surface accessibility and disorder status so that researchers can move from sequence to insight faster than ever before."

M. Madan Babu, PhD, FRS, senior co-corresponding author, St. Jude Senior Vice President of Data Science, Chief Data Scientist, Center of Excellence for Data-Driven Discovery director and Department of Structural Biology member

"AlphaSync performs an important job in keeping all of these predicted structures updated," said first and co-corresponding author Benjamin Lang, PhD, formerly of the St. Jude Department of Structural Biology. "The AlphaSync database ensures that the structure you are looking at matches the sequence of the protein you are working with." 

AlphaSync Empowers Researchers With Up-To-Date Protein Predictions and Additional Data

AlphaSync updates its information using the latest data from UniProt, the largest database of protein sequences. It checks UniProt for new or modified sequences, then runs structure predictions for every protein whose sequence is new or has changed. When the researchers first performed this task, they found a backlog of 60,000 outdated structures, including 3% of human proteins.
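The update check described above can be illustrated with a minimal sketch. It is not AlphaSync's actual code: the function names, the accessions and the idea of storing a hash of the sequence each structure was predicted from are all assumptions made for illustration. The sketch flags an entry for re-prediction when its current sequence is new or no longer matches the stored fingerprint.

```python
import hashlib


def sequence_digest(sequence: str) -> str:
    """Return a stable fingerprint for an amino acid sequence."""
    return hashlib.sha256(sequence.strip().upper().encode("ascii")).hexdigest()


def find_stale_entries(current_sequences: dict, predicted_digests: dict) -> list:
    """List accessions whose structure is missing or was predicted
    from a sequence that differs from the current one."""
    stale = []
    for accession, sequence in current_sequences.items():
        if predicted_digests.get(accession) != sequence_digest(sequence):
            stale.append(accession)
    return sorted(stale)


# Hypothetical toy data: the accessions and sequences are made up.
current = {
    "ACC1": "MKTAYIAKQR",  # unchanged since the last prediction
    "ACC2": "MSLLTEVETP",  # last residue corrected (was A)
    "ACC3": "MADEEKLPPG",  # newly added sequence, no structure yet
}
predicted = {
    "ACC1": sequence_digest("MKTAYIAKQR"),
    "ACC2": sequence_digest("MSLLTEVETA"),
}

print(find_stale_entries(current, predicted))  # ['ACC2', 'ACC3']
```

Only the entries returned by such a check would need a new structure prediction, which is what keeps a continuous update affordable after the initial backlog is cleared.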

"To establish AlphaSync, we ran a massive set of structure predictions that required enormous computational power," Lang said. "Now, all of the data that we've collected in the database, and our ongoing efforts, enable scientists to look at important sites within proteins from over 200 species and be confident they reflect the latest experimental evidence and sequence information." 

In addition to updating structures, the database provides pre-computed data and other ease-of-use features. This pre-computed data includes residue interaction networks (i.e., which amino acids contact each other), surface area (i.e., whether an amino acid is accessible or not) and conformational state (i.e., whether the amino acid is in a structured or unstructured region). The scientists also simplified the data's format to make it easier for researchers to use.

"3D structural information is quite a complex format, so we broke it down further into a simpler 2D tabular format, which we hope will enable more insight into individual proteins," Lang said. "In addition, this tabular format is easier for downstream machine learning applications, which will help future biomedical research projects find and understand disease mechanisms." 
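To give a sense of why a per-residue table is convenient to work with, here is a small sketch. The column names, the values and the accessibility cutoff are hypothetical and do not reflect AlphaSync's real schema; the example only shows how one row per amino acid lets a simple filter pick out, say, buried residues in structured regions.

```python
import csv
import io

# Hypothetical per-residue table: one row per amino acid position.
# Column names and values are illustrative only.
TABLE = """\
position,residue,relative_accessibility,disordered,contacts
1,M,0.82,yes,2
2,K,0.64,yes,1;3
3,L,0.05,no,2;4;5
4,V,0.02,no,3;5
5,F,0.10,no,3;4
"""


def load_residues(text: str) -> list:
    """Parse the table into a list of typed per-residue records."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "position": int(row["position"]),
            "residue": row["residue"],
            "accessibility": float(row["relative_accessibility"]),
            "disordered": row["disordered"] == "yes",
            "contacts": [int(p) for p in row["contacts"].split(";") if p],
        })
    return rows


def buried_structured(rows: list, max_accessibility: float = 0.25) -> list:
    """Return positions that are structured and have low surface accessibility.
    The 0.25 cutoff is an arbitrary value chosen for this example."""
    return [
        r["position"]
        for r in rows
        if not r["disordered"] and r["accessibility"] <= max_accessibility
    ]


residues = load_residues(TABLE)
print(buried_structured(residues))  # [3, 4, 5]
```

Because every row has the same fields, the same records could be passed directly to a statistics or machine learning library without parsing 3D coordinates first.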

"AlphaSync provides high-quality predicted protein structures along with detailed, amino acid–level information in a user-friendly format, making it easy for researchers to explore and analyze," Babu said. "We hope it not only minimizes structural and sequence inaccuracies from propagating but also enhances our understanding of proteins relevant to human disease, ultimately accelerating the development of better treatments and cures." 

Journal reference:

Lang, B., et al. (2025). AlphaSync is an enhanced AlphaFold structure database synchronized with UniProt. Nature Structural & Molecular Biology. doi: 10.1038/s41594-025-01719-x. https://www.nature.com/articles/s41594-025-01719-x
