Non-coding DNA is the genetic material that does not encode proteins, and it makes up most of the human genome. Parts of it are transcribed into non-protein-coding RNAs (e.g., transfer RNAs, ribosomal RNAs, and regulatory RNAs), and the regulatory RNAs in particular help determine when and where genes are 'switched on' or 'switched off'. Recent studies based on high-throughput transcriptomics data have shed new light on the roles of non-protein-coding DNA sequences. However, how many of these sequences regulate gene expression and ultimately contribute to genome organization and evolution remains unclear.
Image Credit: Peshkova/Shutterstock.com
What is Non-coding DNA?
Non-coding DNA can be defined as the portion of the genome that does not code for proteins. It has been estimated that around 98.5% of the human genome does not code for proteins and, therefore, can be classified as non-coding DNA.
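To put the ~98.5% estimate in perspective, the short Python calculation below converts it into base pairs. It is a back-of-the-envelope sketch that rests on one assumption not stated in the text: a haploid human genome of roughly 3.1 billion base pairs (a rounded figure).

```python
# Rough scale of human non-coding DNA, using the ~98.5% estimate from the text.
# Assumption (not from the text): a haploid genome of ~3.1 billion base pairs.
GENOME_BP = 3.1e9
NON_CODING_FRACTION = 0.985

non_coding_bp = GENOME_BP * NON_CODING_FRACTION
coding_bp = GENOME_BP - non_coding_bp

print(f"non-coding: ~{non_coding_bp / 1e9:.2f} Gb")  # ~3.05 Gb
print(f"coding:     ~{coding_bp / 1e6:.1f} Mb")      # ~46.5 Mb
```

In other words, under this assumption only a few tens of millions of base pairs code for protein, while roughly three billion do not.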
The distinction between the transcriptome and the proteome began to emerge in the 1970s, with the discovery of post-transcriptional processing mechanisms such as alternative RNA splicing and the first DNA sequencing efforts. Many non-coding DNA sequences are transcribed, and thus form part of the transcriptome, but are never translated into protein.
How is Non-coding DNA Classified?
From a purely structural point of view, non-coding DNA can be classified into intergenic regions (genomic sequences located between genes) and intragenic regions (introns, located within genes). Both intragenic and intergenic regions can be transcribed into non-coding RNAs; indeed, recent transcriptomic analyses have shown that a major portion of the human genome is transcribed into non-coding RNAs.
Moreover, intergenic regions may contain regulatory sequences (e.g., promoters, enhancers, and insulators) that bind transcription factors to activate and/or repress transcription.
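The structural distinction between intergenic and intragenic (intronic) DNA can be made concrete with a small classifier that labels a genomic position against a gene annotation. This is a minimal sketch: the `Gene` class, the `classify_position` function, and all coordinates are hypothetical, and real annotations would come from GTF/GFF files. Note that "exon" here does not mean protein-coding, since untranslated regions (UTRs) are exonic but non-coding.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene model: exons as sorted, non-empty half-open [start, end) intervals."""
    name: str
    exons: list[tuple[int, int]]

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]


def classify_position(pos: int, genes: list[Gene]) -> str:
    """Label a position as 'exon', 'intron' (intragenic) or 'intergenic'.
    If genes overlap, the first matching gene in the list wins."""
    for gene in genes:
        if gene.start <= pos < gene.end:
            if any(s <= pos < e for s, e in gene.exons):
                return "exon"
            return "intron"  # inside the gene, but between exons
    return "intergenic"      # outside every annotated gene


# Hypothetical annotation with made-up coordinates, for illustration only.
genes = [Gene("geneA", [(100, 200), (300, 400)]), Gene("geneB", [(1000, 1100)])]
for p in (150, 250, 600):
    print(p, classify_position(p, genes))
# 150 exon
# 250 intron
# 600 intergenic
```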
Non-coding RNA transcripts are classified into different functional categories, including:
- Transfer RNAs (tRNAs), which act as adaptors linking each codon in a messenger RNA (mRNA) to the corresponding amino acid in the polypeptide (protein)
- Ribosomal RNAs (rRNAs), core components of the ribosome that are also involved in protein translation
- Small nucleolar RNAs (snoRNAs), which guide chemical modifications of other RNAs, mainly rRNAs
- Regulatory RNAs of different types (e.g., miRNAs, piRNAs, lncRNAs, and circRNAs), which play major roles in the regulation of gene expression
Most human non-coding DNA is repetitive. Repetitive DNA has been estimated to account for about two-thirds (66%–69%) of the human genome. According to their structural organization, repetitive non-coding DNA sequences can be classified as tandem repetitive DNA (also called satellite DNA) or interspersed repetitive DNA.
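Why "most" non-coding DNA is repetitive follows from dividing the two estimates given in the text. The sketch below does that arithmetic under a simplifying assumption that is not in the text: that all repetitive DNA lies outside protein-coding sequence (in reality, a small fraction of repeats overlaps coding exons).

```python
# Share of non-coding DNA that is repetitive, from the estimates in the text.
# Simplifying assumption: all repetitive DNA lies outside protein-coding sequence.
NON_CODING_FRACTION = 0.985
REPEAT_LOW, REPEAT_HIGH = 0.66, 0.69

low = REPEAT_LOW / NON_CODING_FRACTION
high = REPEAT_HIGH / NON_CODING_FRACTION
print(f"repeats make up ~{low:.0%} to {high:.0%} of non-coding DNA")  # ~67% to 70%
```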
Image Credit: Design_Cells/Shutterstock.com
Tandem Repetitive DNA vs. Interspersed Repetitive DNA
Satellite DNA consists of long arrays of tandem repeats whose size varies widely between eukaryotic genomes. In human chromosomes, for instance, the telomeric and centromeric regions are mainly composed of satellite DNA.
Telomeres are long tandem arrays of a short nucleotide sequence (the TTAGGG repeat) that protect the ends of chromosomes from degradation during DNA replication, while centromeres are constricted regions where sister chromatids are held together and where the kinetochore assembles to attach chromosomes to the spindle, ensuring their correct segregation during cell division.
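A tandem array such as a telomere is simply one motif repeated head-to-tail, so it can be located by scanning for the longest run of consecutive copies. The function below is a minimal, hypothetical sketch that only counts perfect copies; real repeat-annotation tools also tolerate mismatches and variant repeat units.

```python
def longest_tandem_run(seq: str, motif: str = "TTAGGG") -> tuple[int, int]:
    """Return (start index, number of copies) of the longest perfect tandem
    run of `motif` in `seq`, or (-1, 0) if the motif never occurs."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best_start, best_copies = -1, 0
    for i in range(len(seq) - k + 1):
        copies = 0
        # extend while the next k bases are another exact copy of the motif
        while seq[i + copies * k : i + (copies + 1) * k] == motif:
            copies += 1
        if copies > best_copies:
            best_start, best_copies = i, copies
    return best_start, best_copies


# Toy sequence: a 5-copy telomere-like array, a spacer, then a 2-copy array.
example = "acgt" + "TTAGGG" * 5 + "CC" + "TTAGGG" * 2
print(longest_tandem_run(example))  # (4, 5)
```

On the complementary strand the same telomeric repeat reads CCCTAA, so a scan of that strand would pass `motif="CCCTAA"`.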
The functional importance of satellite DNA in gene regulatory mechanisms (e.g., dosage compensation) is increasingly recognized, helped by the completion of high-resolution human genome sequencing projects.
Interspersed repetitive DNA consists of identical or nearly identical sequences dispersed throughout the genome rather than clustered in specific genomic regions. Interspersed repeats arise mainly from transposable element insertion events.
Transposable elements, often described as 'selfish DNA', are mobile genetic elements that can move to, or copy themselves into, new locations in the genome through 'cut-and-paste' (DNA transposons) or 'copy-and-paste' (retrotransposons) mechanisms.
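The practical difference between the two mechanisms is what happens to copy number. The toy model below treats a genome as a list of named blocks rather than real sequence, and the function names are hypothetical. Cut-and-paste moves the single element around, whereas copy-and-paste leaves the original in place and adds a new copy each time (in real retrotransposition the copy is made via an RNA intermediate and reverse transcription, which this sketch does not model).

```python
import random


def cut_and_paste(genome: list[str], te: str, rng: random.Random) -> list[str]:
    """Excise one copy of `te` and reinsert it at a random position.
    The number of TE copies is unchanged."""
    g = list(genome)
    g.remove(te)  # excise the first copy found (raises ValueError if absent)
    g.insert(rng.randrange(len(g) + 1), te)  # reintegrate elsewhere
    return g


def copy_and_paste(genome: list[str], te: str, rng: random.Random) -> list[str]:
    """Keep the source copy and insert a new copy at a random position.
    The number of TE copies grows by one each time."""
    if te not in genome:
        raise ValueError("no source copy to copy from")
    g = list(genome)
    g.insert(rng.randrange(len(g) + 1), te)
    return g


rng = random.Random(0)  # fixed seed so runs are reproducible
start = ["geneA", "TE", "geneB", "geneC"]
cut_genome, copied_genome = start, start
for _ in range(5):
    cut_genome = cut_and_paste(cut_genome, "TE", rng)
    copied_genome = copy_and_paste(copied_genome, "TE", rng)

print(cut_genome.count("TE"), copied_genome.count("TE"))  # 1 6
```

The copy-and-paste route is the one that lets an element accumulate hundreds of thousands of copies over evolutionary time.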
Transposable elements (TEs) and TE-derived repeat sequences occupy over 40% of the human genome. Alu elements and LINE-1 (L1) elements are the most abundant types. For instance, over one million Alu sequences are scattered throughout the genome, and this type of mobile element alone has been estimated to account for approximately 10 percent of the human genome.
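The "approximately 10 percent" figure for Alu elements can be checked with simple multiplication. The copy length and genome size below are rounded assumptions, not values from the text, and because individual Alu copies vary in length this is only an order-of-magnitude sketch.

```python
# Back-of-the-envelope check of the ~10% figure for Alu elements.
# Assumptions (rounded, not from the text): a full-length Alu is ~300 bp and
# the haploid human genome is ~3.1 billion bp; real copy lengths vary.
ALU_COPIES = 1_000_000  # "over one million" per the text, so a lower bound
ALU_LENGTH_BP = 300
GENOME_BP = 3.1e9

alu_share = ALU_COPIES * ALU_LENGTH_BP / GENOME_BP
print(f"Alu elements: ~{alu_share:.1%} of the genome")  # ~9.7%
```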
Both Alu and L1 elements are still active in the human genome; new insertions can therefore cause genetic disease and are a source of functional genomic variation between human populations.
Epigenetic Modifications: A Multilayer Regulatory System on Non-coding DNA
Epigenetics refers to heritable changes in gene activity that cannot be explained by changes in the DNA sequence. Epigenetic marks include DNA methylation and histone modifications (e.g., histone methylation and histone acetylation).
The spreading of epigenetic marks along non-coding DNA regions can establish either a condensed, transcriptionally inactive state (heterochromatin) or a decondensed, active state (euchromatin), altering the expression of specific genes and linkage groups.
This is especially important because the epigenetic activation of transposable elements may lead to new insertions, i.e., somatic mutations associated with disease. For instance, interspersed non-coding repeats are often hypomethylated in cancer cells; the methylation or activation status of Alu/L1 elements may therefore serve as a surrogate for the global hypomethylation status of tumor cells and as a prognostic marker.
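One way to express the methylation status of Alu/L1 repeats as a single number is the pooled fraction of methylated reads across CpG sites within those repeats (for example, from bisulfite sequencing). The sketch below assumes such per-site read counts are available; the function names, the read counts, and the 0.10 margin are all hypothetical illustrations, not a validated clinical cutoff.

```python
def methylation_level(cpg_counts: list[tuple[int, int]]) -> float:
    """Pooled methylation level across CpG sites, in [0, 1].

    cpg_counts holds (methylated_reads, unmethylated_reads) for each CpG site,
    e.g. sites inside L1 or Alu copies.
    """
    methylated = sum(m for m, _ in cpg_counts)
    total = sum(m + u for m, u in cpg_counts)
    if total == 0:
        raise ValueError("no reads cover these CpG sites")
    return methylated / total


def is_hypomethylated(sample: float, reference: float, margin: float = 0.10) -> bool:
    """Flag a sample whose level falls more than `margin` below a reference.
    The default margin is an arbitrary illustration, not a validated threshold."""
    return sample < reference - margin


# Invented read counts at three CpG sites, for illustration only.
normal = methylation_level([(90, 10), (85, 15), (80, 20)])  # 0.85
tumor = methylation_level([(50, 50), (60, 40), (40, 60)])   # 0.5
print(is_hypomethylated(tumor, reference=normal))  # True
```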
Future Perspectives of Non-coding DNA in Research
In evolution, nothing is 'free'. Although non-coding DNA was once dismissed as 'junk DNA', it is now clear that non-coding sequences can play important roles in regulating gene expression. Nonetheless, given its size, complexity, and potential, non-coding DNA still represents a relatively unexplored dimension of genome function, organization, and evolution.
The recent completion of the first gapless human genome sequence by the Telomere-to-Telomere (T2T) Consortium is expected to shed more light on the importance of human non-coding DNA, especially in telomeric regions and evolutionarily recent duplications.
References
- Barragán, M. J. L., et al. Highly repeated DNA sequences in three species of the genus Pteropus (Megachiroptera, Mammalia). Heredity 88.5 (2002): 366-370. https://doi.org/10.1038/sj.hdy.6800064
- Bennett, E. Andrew, et al. Natural genetic variation caused by transposable elements in humans. Genetics 168.2 (2004).
- de Koning, A. P. Jason, et al. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genetics 7.12 (2011): e1002384. https://doi.org/10.1371/journal.pgen.1002384
- Ehrlich, M. Cancer-linked DNA hypomethylation and its relationship to hypermethylation. DNA Methylation: Development, Genetic Disease and Cancer (2006): 251-274. https://doi.org/10.1038/sj.onc.1205651
- Nurk, S., et al. The complete sequence of a human genome. Science 376.6588 (2022): 44-53. https://doi.org/10.1126/science.abj6987
- Shabalina, Svetlana A., and Nikolay A. Spiridonov. The mammalian transcriptome and the function of non-coding DNA sequences. Genome Biology 5.4 (2004): 1-8. https://doi.org/10.1186/gb-2004-5-4-105