Statistical methodology uncovers changes in genomic data

A novel statistical technique makes it easier to find biologically significant changes in genomic data that span multiple conditions, like cell types or tissues.

A new statistical method called CLIMB provides a more efficient way to uncover biologically meaningful changes in genomic data that span multiple conditions, such as gene regulation across tissue types. CLIMB first conducts a series of pairwise analyses. Next, it constructs “association vectors” that cover all possible combinations of a subject’s status across the conditions. It then eliminates any combination that the pairwise analyses in the first step do not support. Finally, CLIMB performs a joint analysis, clustering together subjects that follow the same pattern across conditions. Image Credit: Qunhua Li research group/Penn State.

Whole genome studies produce massive volumes of data, ranging from millions of individual DNA sequences to details about where and how strongly each of thousands of genes is expressed, as well as the locations of functional elements throughout the genome. Because of the volume and complexity of the data, comparing different biological conditions, or results from different labs, can be statistically challenging.

“The difficulty when you have multiple conditions is how to analyze the data together in a way that can be both statistically powerful and computationally efficient. Existing methods are computationally expensive or produce results that are difficult to interpret biologically.”

Qunhua Li, Associate Professor, Statistics, The Pennsylvania State University

Qunhua Li adds, “We developed a method called CLIMB that improves on existing methods, is computationally efficient, and produces biologically interpretable results. We test the method on three types of genomic data collected from hematopoietic cells—related to blood stem cells—but the method could also be used in analyses of other ‘omic’ data.”

The CLIMB (Composite LIkelihood eMpirical Bayes) technique is described in a paper published in the journal Nature Communications.

“In experiments where there is so much information but from relatively few individuals, it helps to be able to use information as efficiently as possible. There are statistical advantages to be able to look at everything together and even to use information from related experiments. CLIMB allows us to do just that.”

Hillary Koch, Senior Statistician, Moderna

Hillary Koch was a graduate student at Penn State at the time of the research.

To analyze data across multiple conditions, the CLIMB method combines ideas from two conventional approaches. One approach runs a series of pairwise comparisons between conditions, but its results become extremely difficult to interpret as more conditions are added.

The other approach summarizes each subject’s activity pattern across conditions in an “association vector,” for example by recording whether a gene is up-regulated, down-regulated, or unchanged in each of numerous cell types. The association vector represents the pattern of condition specificity directly and is simple to interpret.

However, because many distinct combinations are possible even with just a few conditions, the calculations are extremely computationally intensive. To cope with this, the second approach makes simplifying assumptions about the data that are not necessarily correct.
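To make the scale of the problem concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each subject takes one of three states per condition (down-regulated, unchanged, up-regulated); the state coding and the function name are hypothetical and are not taken from the CLIMB software.

```python
from itertools import product
from math import comb

# Illustrative coding (an assumption): -1 = down-regulated, 0 = unchanged, 1 = up-regulated.
STATES = (-1, 0, 1)

def all_association_vectors(n_conditions):
    """Enumerate every possible pattern across n_conditions (3**n_conditions of them)."""
    return list(product(STATES, repeat=n_conditions))

# Pairwise comparisons grow quadratically; candidate vectors grow exponentially.
for d in (2, 3, 5, 10):
    n_pairs = comb(d, 2)
    n_vectors = len(STATES) ** d
    print(f"{d} conditions: {n_pairs} pairwise comparisons, {n_vectors} candidate vectors")
```

Under this coding, ten conditions already give 59,049 candidate patterns but only 45 pairwise comparisons, which is why the pairwise step is cheap and the full joint analysis is not.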

“CLIMB uses aspects of both of these approaches. We ultimately analyze association vectors, but first we use pairwise analyses to identify the patterns that are likely to exist up front. Rather than making assumptions about the data, we use the pairwise information to eliminate combinations that the data don't strongly support. This dramatically reduces the space of possible patterns across conditions that would otherwise make the computations so intensive.”

Hillary Koch, Senior Statistician, Moderna
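The pruning idea described in the quote can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the published implementation: the `supported_pairs` input, the three-state coding, and the function name are all hypothetical, and the sketch enumerates every vector by brute force, which would not scale to many conditions.

```python
from itertools import combinations, product

# Illustrative coding (an assumption): -1 = down-regulated, 0 = unchanged, 1 = up-regulated.
STATES = (-1, 0, 1)

def prune_candidates(n_conditions, supported_pairs):
    """Keep only vectors whose projection onto every pair of conditions is supported.

    supported_pairs maps (i, j) with i < j to the set of (state_i, state_j)
    tuples that a pairwise analysis found support for (hypothetical input).
    """
    kept = []
    for vec in product(STATES, repeat=n_conditions):
        if all((vec[i], vec[j]) in supported_pairs[(i, j)]
               for i, j in combinations(range(n_conditions), 2)):
            kept.append(vec)
    return kept

# Toy pairwise results for three conditions (made-up values).
supported = {
    (0, 1): {(0, 0), (1, 1)},
    (0, 2): {(0, 0), (1, 1), (1, -1)},
    (1, 2): {(0, 0), (1, 1), (1, -1)},
}
candidates = prune_candidates(3, supported)
print(len(candidates), "of", len(STATES) ** 3, "patterns remain:", candidates)
```

In this toy case only 3 of the 27 possible patterns survive, so the joint analysis would need to consider far fewer combinations.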

After the compilation of the reduced set of possible association vectors, the approach groups together subjects who exhibit the same pattern across conditions. For instance, the findings could reveal sets of genes that are up-regulated in some cell types but down-regulated in others.
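As a rough illustration of the grouping step, the Python sketch below assumes each gene has already been assigned its most likely association vector and simply collects genes that share a pattern. The gene names, the assignments, and the function name are hypothetical; the actual method fits a statistical model rather than performing a dictionary lookup.

```python
from collections import defaultdict

def group_by_pattern(assignments):
    """Group subjects (e.g. genes) that share the same association vector."""
    groups = defaultdict(list)
    for subject, pattern in assignments.items():
        groups[pattern].append(subject)
    return dict(groups)

# Hypothetical assignments across three cell types:
# 1 = up-regulated, 0 = unchanged, -1 = down-regulated.
assignments = {
    "gene_A": (1, 1, -1),
    "gene_B": (0, 0, 0),
    "gene_C": (1, 1, -1),
    "gene_D": (1, 1, 1),
}
groups = group_by_pattern(assignments)
for pattern, genes in groups.items():
    print(pattern, genes)
```

Here `gene_A` and `gene_C` fall into the same group: up-regulated in the first two cell types and down-regulated in the third.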

The researchers evaluated their method on data from RNA-seq experiments, a technique that quantifies the amount of RNA produced by all the genes expressed in a cell, to see whether particular genes help define which types of cells a hematopoietic stem cell eventually becomes.

Li says, “Compared to the popular pair-wise method, our results are more specific. Our gene list is more succinct and biologically more relevant.”

While the classic pair-wise method yielded a list of six to seven thousand genes of interest, CLIMB yielded a significantly smaller list of two to three thousand genes, with at least a thousand of those genes being detected in both analyses.

“The different blood cell types have a variety of functions (some become red blood cells and others become immune cells), and we wanted to know which genes are more likely to be involved in determining each distinct cell type,” stated Ross Hardison, T. Ming Chu Professor of Biochemistry and Molecular Biology at Penn State.

Ross Hardison adds, “The CLIMB approach pulled out some important genes; some of them we already knew about and others add to what we know. But the difference is these results were a lot more specific and a lot more interpretable than those from previous analyses.”

CLIMB was also applied to data generated by a different experimental approach, ChIP-seq, which can determine where particular proteins attach to DNA along the genome. The researchers investigated how the binding of CTCF, a transcription factor that helps establish the interactions required for gene regulation in the cell nucleus, varies or stays constant among 17 cell populations derived from the same hematopoietic stem cell. The CLIMB analysis revealed several categories of CTCF-bound sites, some of which imply a role for this transcription factor in all blood cells and others a role in particular types of cells.

Finally, the researchers compared the accessibility of chromatin, a complex of DNA and proteins, in 38 human cell types using data from yet another experimental technology called DNase-seq, which can detect regulatory regions.

Koch notes, “For all three tests, we wanted to see if our results had biological relevance, so we compared our results against independent data, such as studies of high-throughput sequencing of histone modifications and transcription factor footprinting.”

“In each case, our results correspond with these other methods. Next, we would like to improve the computational speed of our method and increase the number of conditions it can handle. For example, chromatin-accessibility data are available for many more cell types, so we’d love to increase the scale of CLIMB,” concluded Koch.

Journal reference:

Koch, H., et al. (2022) CLIMB: High-dimensional association detection in large scale genomic data. Nature Communications.


Article source: Penn State University. (2022, December 20). Statistical methodology uncovers changes in genomic data. AZoLifeSciences.
