New computational approach enhances the performance of bioinformatics tools

UVA Health researchers have created an innovative tool to help scientists distinguish signal from noise as they investigate the genetic underpinnings of cancer and other disorders. In addition to advancing research and potentially speeding up the development of new treatments, the technology could help improve cancer detection by making it easier for doctors to identify malignant cells.

UVA researchers Chongzhi Zang, PhD (left), and Shengen Shawn Hu, PhD, developed an important new research tool and have made it freely available. Image Credit: University of Virginia Health System.

The new tool, created by UVA’s Chongzhi Zang, PhD, and his team and partners, is a mathematical model that will help preserve the integrity of “big data” on the genetic material that makes up the chromosomes, known as chromatin. Chromatin, a mix of DNA and protein, is crucial in guiding the activation of human genes. When chromatin goes wrong, it can render a healthy cell cancerous or lead to other diseases.

Scientists can now investigate chromatin inside individual cells using a cutting-edge technology known as “single-cell ATAC-seq,” but this produces a massive amount of data that contains a lot of noise and bias. Zang’s new tool cuts through all of it, rescuing researchers from false leads and wasted time.

Even in the best of circumstances, large-scale, single-cell genomics research is like “hunting a needle in a haystack,” according to Zang. However, his new tool will make things much easier by removing a lot of bad hay.

“Using the traditional way of analyzing the data, you might see some patterns that look like real signals of a particular chromatin state, but they are actually fake due to the bias of the experimental technology itself. Such fake signals can confuse scientists. We developed a model to better capture and filter out such fake signals, so that the real needle we are looking for can more easily stand out of the hay.”

Chongzhi Zang, Computational Biologist, Center for Public Health Genomics, University of Virginia Health System

Chongzhi Zang is also associated with the UVA Health Cancer Center.

About the genomics tool

Zang’s new tool is based on a number theory and cryptography concept known as “simplex encoding.” Using that, he and his coworkers were able to mathematically represent DNA sequences and, ultimately, transform the complex genome sequence into a far more straightforward mathematical form. They can then compare alternative forms to find bias and noise in the sequence data that conventional techniques cannot easily detect.
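To make the idea of simplex encoding concrete, the sketch below maps each DNA base to one vertex of a regular tetrahedron in three dimensions, so that every pair of distinct bases sits the same distance apart and the four vectors average to zero. The specific vertex coordinates and the helper names `encode` and `distance` are assumptions chosen for illustration; they are not taken from the researchers' software.

```python
import itertools
import math

# Hypothetical vertex choice: four corners of a cube that form a regular
# tetrahedron. Any regular simplex in 3-D would serve the same purpose.
SIMPLEX = {
    "A": (1, 1, 1),
    "C": (1, -1, -1),
    "G": (-1, 1, -1),
    "T": (-1, -1, 1),
}


def encode(sequence: str) -> list[int]:
    """Concatenate the 3-D simplex vector of each base into one flat feature vector."""
    features = []
    for base in sequence.upper():
        features.extend(SIMPLEX[base])
    return features


def distance(u, v) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    # Every pair of distinct bases is the same distance apart (sqrt(8) ~ 2.828),
    # so no nucleotide is treated as "closer" to another.
    for x, y in itertools.combinations("ACGT", 2):
        print(x, y, round(distance(SIMPLEX[x], SIMPLEX[y]), 3))
    # A 4-base sequence becomes a 12-number vector (3 coordinates per base).
    print(encode("GATC"))
```

Compared with one-hot encoding, which spends four coordinates per base and whose per-position columns always sum to 1 (making them collinear with a model's intercept), this form uses three coordinates per base that are centered on zero.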

“The DNA sequences’ complexity increases exponentially when they get longer. They are difficult to model because a typical dataset has millions of sequences from thousands of cells. But the simplex encoding model can give an accurate estimation of sequence biases because of its beautiful mathematical property.”

Shengen Shawn Hu PhD, Study Lead Author and Research Scientist, University of Virginia Health System
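The exponential growth Hu describes is simple arithmetic: there are 4^k distinct DNA sequences of length k, so a lookup table with one bias value per 10-base sequence would need 1,048,576 entries. The sketch below contrasts that with a linear model over simplex-encoded positions, which needs only 3k + 1 coefficients, and fits such a model by least squares on synthetic data. The additive, position-specific model form, the vertex coordinates, and the names `design_matrix` and `fit_bias_model` are illustrative assumptions; the published model may include additional terms.

```python
import numpy as np

# Hypothetical simplex vertices (a regular tetrahedron inscribed in a cube);
# an assumption, not necessarily the encoding used by the published tool.
SIMPLEX = {
    "A": [1, 1, 1],
    "C": [1, -1, -1],
    "G": [-1, 1, -1],
    "T": [-1, -1, 1],
}


def design_matrix(kmers):
    """One row per k-mer: an intercept followed by 3 simplex coordinates per position."""
    rows = []
    for kmer in kmers:
        row = [1.0]
        for base in kmer:
            row.extend(SIMPLEX[base])
        rows.append(row)
    return np.array(rows, dtype=float)


def fit_bias_model(kmers, log_bias):
    """Least-squares fit of an additive, position-specific linear model of log bias."""
    X = design_matrix(kmers)
    coef, *_ = np.linalg.lstsq(X, np.asarray(log_bias, dtype=float), rcond=None)
    return coef


if __name__ == "__main__":
    k = 10
    print("lookup-table parameters:", 4 ** k)    # 1048576 possible 10-mers
    print("linear-model parameters:", 3 * k + 1)  # 31 coefficients

    # Synthetic, noise-free check: the fit should recover the generating coefficients.
    rng = np.random.default_rng(0)
    true_coef = rng.normal(size=3 * k + 1)
    kmers = ["".join(rng.choice(list("ACGT"), size=k)) for _ in range(2000)]
    log_bias = design_matrix(kmers) @ true_coef
    estimate = fit_bias_model(kmers, log_bias)
    print("max coefficient error:", float(np.max(np.abs(estimate - true_coef))))
```

Because each position contributes a fixed three coordinates, the number of parameters grows linearly with sequence length rather than exponentially, which is what makes estimation from millions of sequences tractable in this simplified setting.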

In tests, the tool performed substantially better at classifying different cell types in complex single-cell data. This is critical for basic biology research as well as for disease diagnosis, since doctors must identify very small numbers of diseased cells within much larger specimens containing tens of thousands to millions of cells.

“The biases were not easy to find because they were tangled with real signals and hidden in the big data. It might not be a big deal if people are only going to pick the strongest signals from a large number of cells. But when you look at single-cell data, there are no low-hanging fruits anymore. The signals are always weak on the individual cell level, and the effect of noise and biases can be catastrophic. Bias correction is often ignored but can be vital in single-cell data analysis.”

Chongzhi Zang, Computational Biologist, Center for Public Health Genomics, University of Virginia Health System
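Why bias correction matters for weak single-cell signals can be illustrated with a simple observed-over-expected calculation: each region's expected count is the cell's total reads distributed in proportion to that region's predicted sequence bias. This normalization and the function name `bias_corrected_enrichment` are a hypothetical stand-in chosen for illustration, not the researchers' published procedure, and the counts and bias values are invented.

```python
def bias_corrected_enrichment(observed, bias):
    """Return observed / expected counts per region, where expected counts are the
    cell's total reads split in proportion to each region's predicted bias."""
    if len(observed) != len(bias):
        raise ValueError("observed and bias must have the same length")
    if any(b <= 0 for b in bias):
        raise ValueError("bias values must be positive")
    total = sum(observed)
    bias_sum = sum(bias)
    enrichment = []
    for count, b in zip(observed, bias):
        expected = total * b / bias_sum
        enrichment.append(count / expected)
    return enrichment


if __name__ == "__main__":
    # Hypothetical counts from one cell in three regions, with predicted biases.
    observed = [30, 10, 10]
    bias = [3.0, 1.0, 0.5]
    print([round(x, 2) for x in bias_corrected_enrichment(observed, bias)])
    # → [0.9, 0.9, 1.8]
```

In this toy case the region with the most raw reads is simply the most bias-prone one, and after correction the third region, which looked unremarkable in the raw counts, stands out instead.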

Zang has co-led numerous single-cell genomics studies investigating coronary artery disease and gut development.

The researchers have developed free, open-source software and released it online to make their new tool broadly accessible. The software can be found at

Zang concludes, “We hope this tool can benefit the biomedical research community in studying chromatin biology and genomics, and eventually help disease research. It is always exciting to see our peers use the tools we developed to make important scientific discoveries in their own research.”

Journal reference:

Hu, S. S., et al. (2022) Intrinsic bias estimation for improved analysis of bulk and single-cell chromatin accessibility profiles using SELMA. Nature Communications.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoLifeSciences.