The complete analysis of the human genome has revolutionized how diseases are studied and treated. Technologies like next-generation sequencing allow scientists to study the genetic sequences that make up the human genome, and bioinformatics is crucial to ensuring the reliability of the resulting scientific findings.
An introduction to genome analysis
Between 1990 and 2003, the international and collaborative research program known as the Human Genome Project (HGP) successfully identified over 20,000 genes that are shared by human beings. Since 2003, an additional 20,000 genes have been identified, resulting in a human genome that is made up of a total of up to 40,000 genes. The complete sequencing of the human genome also led to the conclusion that the genomes of any two people are 99.9% identical, with the remaining 0.1% accounting for the differences between individuals.
Using technologies that range from next-generation sequencing (NGS) to CRISPR-Cas9 systems, scientists have been able to identify and study variants in both the coding and non-coding sequences of the human genome.
In addition to exploring the DNA sequences that make up the genome, scientists of the post-genome era are also interested in studying the transcriptome, which is the full set of RNA transcripts, including messenger RNA (mRNA), that are produced from the genome, as well as the proteome, which comprises both translated and post-translationally modified protein sequences.
Genome analysis, therefore, aims to describe the functions of genes and proteins, as well as the relationship that exists between a given genotype and phenotype.
Genome analysis in cancer
The progression of cancer cells is largely the result of a series of genetic changes that significantly alter their metabolism. Genes can be lost or gained, and altered genes can acquire new roles in promoting the growth of primary cancer cells, as well as their invasion and metastasis into other areas of the body.
The genetic makeup of cancer cells not only differs between cancer types but can also vary between tumors of the same type in different individuals. These genetic variations can therefore affect both disease progression and an individual’s response to therapy.
The genome-wide analysis of cancer cells and tissues has led to the identification of new drug targets in the form of single genes or proteins, as well as entire sets of genes and proteins. The design of these novel therapeutics begins with the analysis of both the genome and transcriptome of cancer tissues to identify genetic variations that are involved in cancer risk and/or disease outcome.
Whereas the genome analysis of cancer tissues includes comparative genomic hybridization and single nucleotide polymorphism (SNP) analyses, the transcriptome analysis involves genome-wide gene expression profiling methods including microarrays, RNAi knockdown of gene expression, and alternative splicing analysis.
Bioinformatics, the human genome, and cancer
Bioinformatics is involved throughout the entire process of developing genome-based therapies for cancer. For example, during the preliminary steps of gene variation identification, bioinformatics is used to analyze the sequence and any related molecular data to determine the precise genetic differences present within the sample genome.
Comparative genome studies on these variations will identify the types of genes, gene families, and their location, as well as provide information on the history of evolutionary rearrangements of the gene and any duplications that might be responsible for the identified genetic variation.
Recent human genome analyses have, for example, led to the discovery of tandemly duplicated regions that are associated with certain human diseases because these regions of the genome are unstable.
Bioinformatics and SNPs
The identification of inherited genetic variations like SNPs, together with the discovery of how these variations can alter protein function, gene regulation, and gene expression, can also increase the understanding of an individual’s risk of developing cancer and/or their likely response to potential therapies.
To date, the SNP Consortium has identified over 2 million SNPs, contributing to a total of over 10 million documented SNPs. SNPs are often densely distributed across the genome, which makes this type of genetic variation an ideal marker for large-scale genome-wide association studies of various diseases and cancers.
When SNPs are used to detect cancer, a candidate gene is typically chosen and screened for SNPs. This information is then used to determine the haplotypes present, their frequencies, and the disease and/or drug response risk associated with each haplotype.
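As a rough illustration of the haplotype frequency step described above, the following Python sketch counts how often each haplotype occurs in a set of phased chromosomes. The function name `haplotype_frequencies` and the three-SNP haplotype strings are made up for this example and do not come from any real dataset; real studies typically have to infer phase statistically before counting.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Return the relative frequency of each haplotype string."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()}


# Hypothetical phased haplotypes at three SNP sites in a candidate gene.
# Each string is one chromosome; the alleles are invented for illustration.
samples = ["ACG", "ACG", "ATG", "ACG", "GTA", "ATG", "ACG", "GTA"]

freqs = haplotype_frequencies(samples)
for hap, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(hap, freq)
```

In an association study, frequencies like these would be computed separately for cases and controls and then compared statistically.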
Bioinformatics plays an active role in the statistical analysis of SNP data, as well as in identifying signature SNPs for a given haplotype block. One bioinformatic technique that is used to determine the optimal alignment of genetic sequences is dynamic programming.
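To make the idea of dynamic programming more concrete, the sketch below implements a basic global alignment in the style of the Needleman-Wunsch algorithm. The scoring values (match = 1, mismatch = -1, gap = -2) and the short example sequences are assumptions chosen for illustration only; production tools use more elaborate scoring schemes and far more efficient implementations.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell depends only on its three neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


# Two made-up sequences that differ at a single position, as a SNP would.
best_score, top, bottom = needleman_wunsch("GATTACA", "GACTACA")
print(best_score)
print(top)
print(bottom)
```

Once two sequences are aligned in this way, positions where the aligned bases differ become candidate variant sites.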
Additional bioinformatic tools that are used for SNP analysis include linkage analysis, haplotyping, linkage disequilibrium analysis, and public data repository tools.
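Linkage disequilibrium between two SNPs is commonly summarized with the coefficient D and the squared correlation r². The sketch below computes both from phased two-SNP haplotypes, assuming each SNP has exactly two alleles. The function name `linkage_disequilibrium` and the haplotype data are invented for this example.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 for two biallelic SNPs.

    haplotypes: list of two-character strings, where the first character is
    the allele at SNP 1 and the second is the allele at SNP 2.
    """
    n = len(haplotypes)
    # Use the alleles of the first haplotype as the reference alleles A and B.
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]

    p_a = sum(h[0] == ref_a for h in haplotypes) / n            # freq of A
    p_b = sum(h[1] == ref_b for h in haplotypes) / n            # freq of B
    p_ab = sum(h == ref_a + ref_b for h in haplotypes) / n      # freq of AB

    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either SNP shows no variation in the sample.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical data: the two SNPs are always inherited together.
linked = ["AG"] * 4 + ["CT"] * 4
# Hypothetical data: every allele combination is equally common.
unlinked = ["AG", "AT", "CG", "CT"]

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(unlinked))
```

An r² close to 1 means that one SNP predicts the other almost perfectly, which is what allows a signature SNP to stand in for a whole haplotype block.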
Genomics, bioinformatics, and infectious disease
Combining genomics with bioinformatics has significantly improved the current understanding of the pathogenesis and mechanisms of many infectious diseases. Tuberculosis (TB), for example, which arises following infection by Mycobacterium tuberculosis (M. tuberculosis), affects 9 million people and kills approximately 2 million people each year.
The genome of M. tuberculosis was first sequenced in 1998 and has since allowed scientists to develop improved diagnostic and drug susceptibility tools, as well as expand the understanding of human-mycobacterium interactions. For example, whole-genome analyses of M. tuberculosis have demonstrated that SNPs are largely responsible for its antimycobacterial drug resistance.
The collection and storage of genetic mutations that have been identified and found to be associated with such resistance have led to the creation of many genomics-based tools like PhyResSE, TB-Profiler, and Mykrobe Predictor. These tools allow researchers without bioinformatics expertise to predict drug resistance immediately after obtaining a sample’s genetic sequencing data, which can be useful in field settings where the disease burden of TB is typically high.
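The core idea behind this kind of prediction, matching the variants found in a sample against a stored catalog of resistance-associated mutations, can be sketched in a few lines of Python. This is not how PhyResSE, TB-Profiler, or Mykrobe Predictor are implemented; the catalog below is a tiny illustrative subset with two widely reported mutations, the `predict_resistance` function is hypothetical, and the `geneX` variant is invented to stand in for a mutation that is not in the catalog.

```python
# Tiny illustrative catalog mapping (gene, amino acid change) to a drug.
# It is an assumption for this sketch, not a validated clinical resource.
ILLUSTRATIVE_CATALOG = {
    ("katG", "S315T"): "isoniazid",
    ("rpoB", "S450L"): "rifampicin",
}


def predict_resistance(sample_variants, catalog=ILLUSTRATIVE_CATALOG):
    """Return the sorted drugs for which the sample carries a catalogued variant."""
    return sorted({catalog[v] for v in sample_variants if v in catalog})


# Hypothetical variant calls from one sequenced isolate.
# ("geneX", "A100V") is a made-up variant that the catalog does not contain.
sample = [("katG", "S315T"), ("geneX", "A100V")]

print(predict_resistance(sample))
```

Real tools also have to call the variants from raw sequencing reads and weigh the strength of the evidence for each mutation, which this sketch leaves out.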