According to two recent studies published in Nature Genetics, newly developed artificial intelligence (AI) tools accurately predicted the function of DNA’s regulatory regions and its three-dimensional (3D) structure based only on the raw sequence.
Study author Jian Zhou, PhD, Assistant Professor in the Lyda Hill Department of Bioinformatics at UT Southwestern (UTSW), said these tools may one day help researchers better understand how genetic mutations cause disease and how genetic sequence influences the spatial organization and function of chromosomal DNA in the nucleus.
“Taken together, these two programs provide a more complete picture of how changes in DNA sequence, even in noncoding regions, can have dramatic effects on its spatial organization and function.”
Dr Zhou, Member, Harold C. Simmons Comprehensive Cancer Center
Dr Zhou is also a Lupe Murchison Foundation Scholar in Medical Research and a Cancer Prevention and Research Institute of Texas (CPRIT) Scholar.
Only about 1% of human DNA encodes instructions for building proteins. Recent studies have shown that much of the remaining noncoding genetic material contains regulatory elements, such as promoters, enhancers, silencers, and insulators, that control how the coding DNA is expressed. According to Dr Zhou, how sequence determines the function of most of these regulatory elements is not well understood.
To better understand these regulatory elements, he and colleagues at Princeton University and the Flatiron Institute developed a deep learning model called Sei. Sei accurately sorts these snippets of noncoding DNA into 40 “sequence classes,” or functions, such as an enhancer for stem cell or brain cell gene activity.
These 40 sequence classes, which were developed using approximately 22,000 data sets from earlier studies of genome regulation, cover more than 97% of the human genome. Sei can also score any sequence by its predicted activity in each of the 40 sequence classes and predict how mutations would affect those activities.
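The idea of predicting how a mutation affects per-class activity can be illustrated with a minimal sketch: score the reference sequence, score the mutated sequence, and take the difference for each class. The code below is not Sei and does not use its API. The scorer `toy_class_scores`, the two class names, and the motifs are hypothetical stand-ins chosen only to make the example runnable; a real model would replace the scorer with a trained network.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a trained model (NOT Sei): two made-up classes,
# each scored by counting an illustrative motif in the sequence.
TOY_MOTIFS: Dict[str, str] = {"toy_promoter": "TATA", "toy_enhancer": "GATA"}


def toy_class_scores(seq: str) -> Dict[str, float]:
    """Return one score per toy class: the number of motif occurrences."""
    seq = seq.upper()
    return {
        name: float(sum(seq.startswith(motif, i) for i in range(len(seq))))
        for name, motif in TOY_MOTIFS.items()
    }


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with the base at 0-based index pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    if len(alt) != 1:
        raise ValueError("alt must be a single base")
    return seq[:pos] + alt + seq[pos + 1:]


def variant_effect(
    seq: str,
    pos: int,
    alt: str,
    scorer: Callable[[str], Dict[str, float]] = toy_class_scores,
) -> Dict[str, float]:
    """Per-class effect of a single-base change: alt score minus ref score."""
    ref_scores = scorer(seq)
    alt_scores = scorer(apply_snv(seq, pos, alt))
    return {name: alt_scores[name] - ref_scores[name] for name in ref_scores}


if __name__ == "__main__":
    # Changing T to G at index 2 turns the TATA motif into GATA.
    print(variant_effect("CCTATACC", 2, "G"))
    # {'toy_promoter': -1.0, 'toy_enhancer': 1.0}
```

In this toy case the single-base change removes the one "promoter" motif and creates one "enhancer" motif, so the two class scores shift in opposite directions. Passing the scorer as a parameter keeps the reference-versus-alternate comparison separate from whatever model produces the scores.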
By applying Sei to human genetics data, the scientists were able to characterize the regulatory architecture of 47 traits and diseases recorded in the UK Biobank database and explain how mutations in regulatory elements lead to specific pathologies. Such capabilities can help build a more systematic understanding of how changes in genomic sequence are linked to diseases and other traits. The findings were published this month.
In May, Dr Zhou reported the development of a separate tool, called Orca, that predicts the 3D architecture of chromosomes from DNA sequence. Using existing datasets of DNA sequences and structural data from prior studies that revealed the molecule’s folds, twists, and turns, Dr Zhou trained the model to learn the connections between sequence and structure and evaluated its ability to predict structures at various length scales.
The results showed that Orca accurately predicted both small and large DNA structures from their sequences, including for sequences carrying mutations linked to a variety of medical conditions, such as a form of leukemia and limb abnormalities. Orca also enabled the researchers to generate new hypotheses about how DNA sequence shapes both local and global 3D structure.
Dr Zhou and his team plan to use Sei and Orca, both of which are publicly available on web servers and as open-source code, to further explore how genetic mutations cause the molecular and physical manifestations of disease. This research could one day lead to new treatments for these conditions.
The National Institutes of Health (DP2GM146336), CPRIT (RR190071), and the UT Southwestern Endowed Scholars Program in Medical Science all provided funding for the Orca study.
Zhou, J. (2022) Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nature Genetics. doi.org/10.1038/s41588-022-01065-4.