The human genome is divided into a "coding" genome containing approximately 23,000 protein-coding genes and a "dark" genome that does not encode proteins. Although non-coding genome regions do not provide instructions for making proteins, they carry hereditary information that is key to gene regulation.
The identification and interpretation of the genetic variation located within 'dark' genome regions is a fundamental issue for understanding developmental biology and its implications for human health.
What does 'the Dark Genome' mean?
The dark genome is a broad concept. It covers non-coding genomic regions capable of regulating gene expression, and it may also apply to protein-coding genes that have been identified but whose biological importance is not yet known. The regulatory regions include nucleotide sequences transcribed into non-coding RNAs (ncRNAs) and transposable elements (TEs).
The Encyclopedia of DNA Elements (ENCODE) Project is a major public research project aimed at identifying functional elements present in the human genome. Over the last decade, it has been discovered that over 98 percent of the human genome does not code for protein. Moreover, more than 23,000 human genes have already been identified, most of which have unknown functions.
Navigating the Dark Genome: New Horizons in Functional Genomics
Recent advancements in genome sequencing and assembly have enabled the identification and characterization of regulatory regions in mammalian genomes. Although non-coding genomic regions were initially considered 'junk DNA' without function, research over the last decade has shown that they play important regulatory roles in gene expression.
Moreover, it is increasingly clear that the functional importance of the dark genome lies in its regulatory non-coding RNAs (ncRNAs) and transposable elements (TEs), which were largely overlooked in genetics studies.
Non-coding RNAs (ncRNAs) are a heterogeneous group of RNA transcripts that are not translated into proteins. ncRNAs have been shown to play key regulatory roles in the expression of protein-coding genes at both the post-transcriptional and translational levels. These RNA molecules can bind to complementary target messenger RNA (mRNA) transcripts and/or DNA nucleotide sequences to control gene expression.
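The idea of an ncRNA binding a complementary mRNA can be illustrated with a toy search for perfectly complementary sites. This is only a sketch under simplifying assumptions: real ncRNA targeting tolerates mismatches and non-canonical pairs, and the function names and sequences here are hypothetical, chosen for illustration.

```python
# Toy model: an ncRNA "binds" wherever the mRNA contains the exact
# reverse complement of the ncRNA sequence (perfect Watson-Crick pairing).
# Assumption: no mismatches or wobble pairs are allowed.

RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A<->U, G<->C)."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


def find_complementary_sites(ncrna: str, mrna: str) -> list[int]:
    """Return 0-based start positions in `mrna` that pair perfectly with `ncrna`."""
    target = reverse_complement(ncrna)
    mrna = mrna.upper()
    positions = []
    start = mrna.find(target)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping sites are also reported.
        start = mrna.find(target, start + 1)
    return positions


# Hypothetical sequences for illustration only.
print(find_complementary_sites("UGAGGUAG", "AAACUACCUCAGGG"))  # [3]
```

Exact matching keeps the sketch short; dedicated target-prediction tools score partial complementarity instead.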
ncRNAs are arbitrarily classified according to their size into small non-coding RNAs (sncRNAs, <200 nucleotides long) and long non-coding RNAs (lncRNAs, >200 nucleotides long). Depending on their mechanisms of action and biogenesis, sncRNAs can be further classified into different types, including endogenous small interfering RNAs (endo-siRNAs), microRNAs (miRNAs), and piwi-interacting RNAs (piRNAs).
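The size-based split can be written as a one-line rule. The sketch below is illustrative only: the function name is hypothetical, and because the text leaves a transcript of exactly 200 nucleotides unassigned, placing it in the lncRNA class is an assumption made here.

```python
# Size cutoff taken from the text: sncRNAs are shorter than 200 nt,
# lncRNAs are longer than 200 nt.
SIZE_CUTOFF_NT = 200


def classify_ncrna_by_length(length_nt: int) -> str:
    """Classify a non-coding RNA as 'sncRNA' or 'lncRNA' by its length.

    Assumption: a transcript of exactly 200 nt is treated as an lncRNA,
    since the text does not specify that boundary case.
    """
    if length_nt <= 0:
        raise ValueError("length_nt must be a positive integer")
    return "sncRNA" if length_nt < SIZE_CUTOFF_NT else "lncRNA"


# Example lengths: a miRNA-sized transcript and a much longer one.
for length in (22, 1500):
    print(length, classify_ncrna_by_length(length))
```

Length alone does not distinguish sncRNA subtypes such as miRNAs and piRNAs; as noted above, those depend on biogenesis and mechanism of action.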
Transposable elements, also known as "jumping genes," are mobile DNA sequences capable of moving from one place to another within the genome. TEs were originally discovered in the 1940s by the Nobel laureate Barbara McClintock based on her work on maize (Zea mays). Still, the importance of TEs for genome organization and function was dismissed for several decades.
We now know that TEs play critical roles in the evolution of new genes and in gene regulation. TEs can downregulate or upregulate the expression of specific linkage groups by carrying regulatory sequences (e.g., promoters) to new genomic locations, and they provide a source of chromosome rearrangements and gene inactivation.
TEs make up a large part of mammalian genomes, representing over 45 percent of the human genome. Another important issue to consider is the repetitive nature of these mobile elements. For instance, the human genome contains roughly 500,000 copies of long interspersed nuclear element 1 (LINE-1 or L1), a TE family that is still active, and this family alone makes up about 17 percent of our genome.
Targeting "Dark Genes"
The term 'dark genome' also applies to protein-coding genes whose functions have not yet been explored. This category includes some recently discovered genes such as SLX4IP, involved in glucose metabolism; HSF2BP, associated with coronary artery disease; and ELFN, involved in attention-deficit/hyperactivity disorder.
Although there are publicly available databases that contain information about such genes, including tissue-specific gene expression data, orthology and paralogy assignments, disease association studies, etc., genome-wide datasets have not yet been combined and processed. A holistic analysis of the protein-coding portion of our dark genome may offer interesting therapeutic opportunities for drug development.
The massive volumes of genomic data now available can aid the search for new sources of genetic variation within the dark genome. Such variation is relevant to human health and therefore deserves further investigation. Non-coding genome regions are also the main target of epigenetic modifications that largely determine gene expression patterns in different cell types and disease states.
Moreover, the identification and functional characterization of orthologous and paralogous groups of genes, as well as the analysis of publicly available genome-wide datasets, will also help discover novelty within our dark genome.