By definition, the genome is ‘the complete set of genes or genetic material present in a cell or organism’. Current evidence suggests that all living organisms today arose from one single origin. From the early RNA protogenome arose the first DNA genome, which went through many expansions and alterations to give rise to what we know now as the human genome (and the genome of every other current-day organism).
Historically, it has been easier to observe how the phenotype of an organism has changed over time. However, with the development of DNA sequencing technologies, it has become possible to uncover how the genome has evolved too.
There is huge variation in the size and constitution of the genomes of present-day organisms. The genome consists of deoxyribonucleic acid (DNA) arranged into base pairs, with both coding regions (genes) and non-coding regions, as well as mitochondrial DNA and in some cases, chloroplast DNA. Surprisingly, the size of the genome is not always associated with the number of genes, or the complexity of the organism, as most variation in genome size is a result of varying amounts of non-coding DNA.
The genome of the bacterium Escherichia coli contains 4.6 million base pairs, the human genome contains 3 billion base pairs, and the genome of the Japanese plant Paris japonica contains 149 billion base pairs. Genome evolution occurs primarily through a handful of mechanisms, several of which take place during meiosis: gene duplication, mutation, deletion, recombination, and the action of transposable elements.
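To put the genome sizes quoted above into proportion, the short Python sketch below computes their ratios. The dictionary and function names are illustrative only, and the figures are simply the rounded values given in the text.

```python
# Genome sizes in base pairs, using the rounded figures quoted in the text.
GENOME_SIZES_BP = {
    "Escherichia coli": 4_600_000,
    "Homo sapiens": 3_000_000_000,
    "Paris japonica": 149_000_000_000,
}


def size_ratio(larger: str, smaller: str) -> float:
    """Return how many times bigger one genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


for larger, smaller in [
    ("Homo sapiens", "Escherichia coli"),
    ("Paris japonica", "Homo sapiens"),
]:
    print(f"{larger} / {smaller}: {size_ratio(larger, smaller):.1f}x")
```

On these figures the human genome is roughly 650 times the size of the E. coli genome, while the Paris japonica genome is roughly 50 times the size of the human genome, which illustrates that genome size does not track organismal complexity.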
Early DNA genomes were likely very small, primarily consisting of functional genes. Since then, there have been several occurrences of huge genome expansion, which is achieved most efficiently through whole-genome duplication. Whole-genome duplication occurs when meiotic errors produce diploid gametes rather than haploid ones. When two diploid gametes fuse, they produce a polyploid (in this case, a tetraploid) containing four copies of each gene.
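The arithmetic behind whole-genome duplication can be sketched in a few lines of Python. This is a deliberately simplified model that tracks only the number of chromosome sets; the function names `gamete_ploidy` and `fuse` are hypothetical and chosen for illustration.

```python
def gamete_ploidy(parent_ploidy: int, reduction_failed: bool = False) -> int:
    """Number of chromosome sets carried by a gamete.

    Normal meiosis halves the parent's ploidy. If the reductional
    division fails (a meiotic error), the gamete stays unreduced.
    """
    if reduction_failed:
        return parent_ploidy
    return parent_ploidy // 2


def fuse(gamete_a: int, gamete_b: int) -> int:
    """Fertilization: the offspring carries the sets from both gametes."""
    return gamete_a + gamete_b


# Two diploid parents, normal meiosis: haploid gametes, diploid offspring.
normal_offspring = fuse(gamete_ploidy(2), gamete_ploidy(2))

# Two diploid parents, meiotic error in both: diploid gametes,
# tetraploid offspring with four copies of each gene.
duplicated_offspring = fuse(gamete_ploidy(2, True), gamete_ploidy(2, True))

print(normal_offspring, duplicated_offspring)
```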
Gene expression typically requires two functional copies of a gene, so whilst genome duplication does not in and of itself give rise to new genes, it provides ‘spare’ copies of a gene that can undergo random mutation without compromising the function of the original gene. If a new functional gene is created it may be conserved, though it is more common for duplicated copies to decay into non-coding pseudogenes.
Individual gene duplication
It is difficult to observe evidence of whole-genome duplications in modern genomes, as over time non-coding sequences can undergo numerous changes or be lost entirely. Where duplicated genes can be observed, it is not always clear whether this is due to whole-genome duplication, or duplication of the individual gene.
Single gene duplication occurs as a result of recombination, and as with whole-genome duplication, provides an original functional gene and a ‘spare’ copy that can be mutated and potentially give rise to a new gene. Whilst most duplicated genes are lost as pseudogenes, it is believed that the majority of novel genes originate from gene duplications.
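The idea of a ‘spare’ copy that is free to mutate can be illustrated with a toy Python model. It is a sketch under strong assumptions: genes are plain strings, the only mutation is a single-base substitution, and the names `duplicate_gene` and `point_mutate` as well as the example sequence are made up for illustration.

```python
import random

BASES = "ACGT"


def duplicate_gene(genome: dict, name: str) -> dict:
    """Return a new genome that also holds a spare copy of the named gene."""
    result = dict(genome)
    result[name + "_copy"] = genome[name]
    return result


def point_mutate(sequence: str, rng: random.Random) -> str:
    """Replace one randomly chosen base with a different base."""
    position = rng.randrange(len(sequence))
    alternatives = [base for base in BASES if base != sequence[position]]
    return sequence[:position] + rng.choice(alternatives) + sequence[position + 1:]


rng = random.Random(0)  # fixed seed so the run is repeatable
genome = duplicate_gene({"geneA": "ATGGCCTTAGGA"}, "geneA")

# Only the spare copy accumulates mutations; the original is untouched.
for _ in range(3):
    genome["geneA_copy"] = point_mutate(genome["geneA_copy"], rng)

print(genome["geneA"])
print(genome["geneA_copy"])
```

Because the original `geneA` entry is never modified, its function is preserved no matter what happens to the copy, which may end up as a pseudogene or, more rarely, as a gene with a new function.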
Although the development of novel genes plays a large part in genome evolution, many genomes contain a significant proportion of non-coding DNA. Though this DNA does not directly code for proteins, it can still be functional. Enhancer sequences, for example, provide binding sites for transcription factors that can alter the folded conformation of the DNA, making it more accessible and therefore increasing transcription of the associated gene.
For many regulatory sequences, it is unclear how they evolved. It has been suggested that when a duplicated gene decays into a pseudogene, the now non-functional DNA sequence can still undergo random mutation that confers a new regulatory function, which is then conserved. It has also been shown that protein-coding genes can lose the capacity to be translated into proteins, leaving sequences that are still transcribed into RNA (most notably Xist, the dosage-regulating RNA associated with the X chromosome).
Interestingly, humans and other primates share a large proportion of their DNA, with the majority of functional genes being very similar, if not the same. It is therefore suggested that a significant proportion of the genomic evolution separating humans from other primates involved the non-coding, likely regulatory regions of DNA; that is, regions of DNA that alter the expression of other genes.
Transposable elements are DNA sequences that can be integrated into new sites within the genome. DNA transposons can be excised and re-integrated into other sites, whilst retrotransposons are transcribed into RNA sequences which are then reverse transcribed back into DNA at the new site.
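The difference between the two classes of transposable element can be sketched in Python by treating a genome as an ordered list of labelled segments. This is a toy model only: the function names and the segment labels are hypothetical, and the RNA intermediate is represented by a plain variable rather than any real transcription step.

```python
from typing import List


def cut_and_paste(genome: List[str], element: str, target: int) -> List[str]:
    """DNA transposon: excise the element, then re-integrate it elsewhere.

    The target index refers to positions in the genome after excision.
    """
    result = list(genome)
    result.remove(element)          # excision from the original site
    result.insert(target, element)  # re-integration at the new site
    return result


def copy_and_paste(genome: List[str], element: str, target: int) -> List[str]:
    """Retrotransposon: the original stays put and a new copy is inserted."""
    if element not in genome:
        raise ValueError(f"{element!r} not found in genome")
    rna_intermediate = element      # stands in for transcription into RNA
    dna_copy = rna_intermediate     # stands in for reverse transcription
    result = list(genome)
    result.insert(target, dna_copy)
    return result


genome = ["geneA", "TE", "geneB", "geneC"]
moved = cut_and_paste(genome, "TE", 2)
copied = copy_and_paste(genome, "TE", 3)
print(moved)
print(copied)
```

In this model the cut-and-paste element moves without changing the genome's length, whereas each copy-and-paste event adds one more copy, which is one way to picture how retrotransposons can come to make up a large share of a genome.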
Transposable elements make up a large proportion of the human genome, with DNA transposons being active in early primate evolution whilst retrotransposons are still active in the human genome today. It is suggested that transposable elements have been a major factor in the rapid evolution of the human genome.
Horizontal gene transfer
Horizontal gene transfer is the mechanism through which the genome of one organism can acquire a new gene from another organism. This is common in bacteria through the process of conjugation, with the most notable example being antibiotic resistance genes.
Horizontal gene transfer is not typical in eukaryotes. The germline cells that undergo meiosis tend to be better protected than somatic cells, so a gene acquired horizontally by a somatic cell would not be heritable.
When studying genome evolution, it is important to consider that DNA sequencing is a relatively new technology and as such, much of the genome is not fully understood. Elements such as long non-coding RNAs are still in the early stages of characterization, and without fully understanding these sequences as they are today, it can be difficult to infer the ways in which they evolved.
Additionally, as genomes continue to mutate and change after the point of divergence, it can be challenging to trace two DNA sequences back to a common ancestral sequence if a long period has passed since they diverged. Still, the rapid expansion of the so-called ‘Genomic Era’ means that new advancements are continuously made, and as such our understanding of genome evolution continues to expand.