At the end of 2019, an outbreak of COVID-19, a disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China.
The virus has since spread across the globe, and the World Health Organization declared COVID-19 a pandemic in March 2020. Bioinformatics analysis is being used to better understand the virus and how it spreads, both to limit transmission and to develop therapeutic treatments.
What is bioinformatics?
Bioinformatics is a way of collecting and analyzing complex biological data. This includes analyzing genome and protein sequences to determine the similarity and relatedness of species and molecules, analyzing epidemiological data to understand how a disease spreads, and running structural analyses to elucidate interactions between molecules.
To test the relatedness of species or proteins and infer their hypothetical common ancestors, a phylogenetic tree is constructed. Nucleotide or protein sequences are first obtained and then aligned with one another.
The more differences there are between sequences (insertions or deletions of nucleotides, or substitutions of bases or amino acids), the more distantly related the analyzed entries are.
Different methods can be used to calculate branch lengths and construct the tree, including maximum likelihood and maximum parsimony.
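A minimal Python sketch of the alignment and difference-counting steps. It uses Needleman-Wunsch global alignment with assumed toy scores (match +1, mismatch -1, gap -1). Published analyses typically rely on dedicated alignment software and substitution models, and the function names and example sequences here are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b; return the two gapped strings."""

    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def count_differences(x, y):
    """Count (substitutions, indel columns) in two aligned strings."""
    subs = sum(1 for p, q in zip(x, y) if p != q and "-" not in (p, q))
    indels = sum(1 for p, q in zip(x, y) if "-" in (p, q))
    return subs, indels


if __name__ == "__main__":
    x, y = needleman_wunsch("ACGTTGA", "ACGTGA")
    print(x)
    print(y)
    print(count_differences(x, y))  # (0, 1): no substitutions, one indel
```

When several alignments tie for the best score, the traceback above simply returns one of them. The difference counts are what a distance-based tree method would then turn into branch lengths.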
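The sketch below illustrates what the maximum parsimony criterion measures. It implements Fitch's algorithm, which counts the minimum number of character changes that a fixed binary tree needs to explain one alignment column. Summing over all columns gives the tree's parsimony score. A real maximum parsimony search compares many topologies, whereas this sketch compares only two hand-written ones. The taxa A to D and their sequences are made up.

```python
def fitch(tree, states):
    """Return (candidate state set, minimum changes) for one alignment column.

    `tree` is a leaf name (str) or a (left, right) tuple; `states` maps each
    leaf name to the character observed at this column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one change at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum Fitch costs over every column of an alignment (dict name -> seq)."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    # Toy alignment of four hypothetical taxa (three columns)
    aln = {"A": "AAC", "B": "AAC", "C": "GGC", "D": "GGT"}
    for tree in ((("A", "B"), ("C", "D")), (("A", "C"), ("B", "D"))):
        print(tree, parsimony_score(tree, aln))
```

Here the ((A, B), (C, D)) tree needs 3 changes, compared with 5 for ((A, C), (B, D)), so maximum parsimony would prefer the first topology.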
SARS-CoV-2 and other related viruses
Both SARS-CoV and MERS-CoV have close relatives in bats, and many other coronaviruses circulate in bat populations, so it has been speculated that bats are also the natural host of SARS-CoV-2.
An article published in Nature (Zhou et al., 2020) summarized the bioinformatics analysis of the whole SARS-CoV-2 genome.
The analysis revealed that the SARS-CoV-2 genome is 96% identical to that of a bat coronavirus, RaTG13, suggesting that the two viruses share the same natural host: bats (Figure 1).
Moreover, the spike (S) protein binds to a receptor on human cells and mediates viral entry.
This makes it one of the most important functional proteins of SARS-CoV-2. The nucleotide sequences encoding the S proteins of SARS-CoV-2 and RaTG13 share 93.1% similarity, further supporting the close relationship between the two viruses.
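Percent-identity figures such as 96% or 93.1% depend on how the aligned columns are counted, especially columns that contain gaps. The sketch below assumes a simple convention: identical columns are divided by compared columns, and gapped columns are optionally excluded. It is not necessarily the exact procedure used by Zhou et al., and the sequences are toy examples.

```python
def percent_identity(x, y, count_gaps=False):
    """Percent identity of two aligned sequences of equal length.

    Columns where both sequences have a gap are always skipped. By default,
    columns where either sequence has a gap are also excluded; with
    count_gaps=True they are kept and counted as mismatches.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(p, q) for p, q in zip(x, y) if not (p == "-" and q == "-")]
    if not count_gaps:
        pairs = [(p, q) for p, q in pairs if "-" not in (p, q)]
    if not pairs:
        return 0.0
    matches = sum(1 for p, q in pairs if p == q)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned sequences: one substitution and one gap
    x, y = "ACGTACGTAC", "ACGTTCGT-C"
    print(round(percent_identity(x, y), 2))   # gaps excluded
    print(percent_identity(x, y, count_gaps=True))  # gaps counted
```

The same pair of sequences gives about 88.9% identity when the gapped column is ignored and 80% when it is counted. Reported similarity values should therefore always be read alongside the method that produced them.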
Figure 1. Phylogenetic tree of SARS-CoV-2 and other related coronaviruses. Red: SARS-CoV-2 isolated from patients in Wuhan and its related coronavirus Bat CoV RaTG13. Figure adapted from Zhou et al. (2020).
After bats were identified as the likely natural host of SARS-CoV-2, another question arose: are there intermediate hosts of the virus between bats and human beings?
By reanalyzing viral sequences recovered from the lungs of two dead Malayan pangolins, researchers found ~90% whole-genome sequence similarity between pangolin coronavirus (Pangolin-CoV) and SARS-CoV-2, and 97.5% similarity between their S proteins.
This S protein similarity is higher than that between Bat CoV RaTG13 and SARS-CoV-2. This suggests that pangolins may have acted as an intermediate host in the transmission of the virus from bats to humans.
In addition, the receptor-binding domain of the S protein is highly conserved between Pangolin-CoV and SARS-CoV-2, differing by only one amino acid. This suggests that the Pangolin-CoV S protein is more likely than the Bat CoV RaTG13 S protein to bind human cell receptors.
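A "one amino acid difference" comparison amounts to listing mismatched positions between aligned protein fragments. The following is a minimal sketch. It assumes the fragments are already aligned without gaps. The sequences below are made-up placeholders, not real receptor-binding domain sequences.

```python
def substitutions(ref, query, start=1):
    """List (position, ref_residue, query_residue) for each mismatched column.

    Assumes `ref` and `query` are ungapped, equal-length aligned fragments;
    `start` is the residue number of the first column (for offsetting into
    a longer protein).
    """
    if len(ref) != len(query):
        raise ValueError("fragments must be aligned to equal length")
    return [(start + i, r, q) for i, (r, q) in enumerate(zip(ref, query)) if r != q]


if __name__ == "__main__":
    # Hypothetical fragments, for illustration only
    target = "MKTAYIAKQR"
    candidates = {"candidate_a": "MKTAYIAKQK", "candidate_b": "MKSAYLAKHR"}
    for name, seq in candidates.items():
        diffs = substitutions(target, seq)
        print(name, len(diffs), diffs)
```

In this toy case, candidate_a differs from the target at a single position and candidate_b at three. That is the kind of count behind statements like the one above.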
Figure 2. Phylogenetic tree of SARS-CoV-2 and other related coronaviruses. SARS-CoV-2 isolated from patients in Wuhan (pink) and its related coronaviruses Pangolin-CoV (red) and Bat CoV RaTG13 (green). Figure adapted from Zhang et al. (2020).
Implications for therapeutic development
With the whole genome sequence of SARS-CoV-2 available, protein structures can be predicted using bioinformatics methods. Studies by Chang et al. (2020) and Ahmed et al. (2020) used SARS-CoV-2 protein sequences to identify potential vaccine targets and drug treatments.
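Structure prediction starts from a protein sequence, which is obtained by translating the coding regions of the genome. The sketch below translates a coding sequence using the standard genetic code (NCBI translation table 1). It ignores complications such as the programmed ribosomal frameshift in the SARS-CoV-2 ORF1ab region, and it does not perform structure prediction itself. The input is a toy sequence.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order at each position,
# so "TTT" maps to the first letter and "GGG" to the last; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds, to_stop=True):
    """Translate a DNA or RNA coding sequence into one-letter amino acids.

    Reading starts at the first base (frame 1); a trailing partial codon is
    ignored, and codons with ambiguous bases become 'X'.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA (the viral genome is RNA)
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAA"))                 # MFG
    print(translate("AUGUUUGGCUAA", to_stop=False))  # MFG*
```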
By searching databases of experimentally identified immune responses to SARS-CoV, the closely related virus behind the 2002–2003 SARS outbreak, scientists can identify the viral regions (epitopes) recognized by antibodies and other components of the immune system.
This sheds light on the viral molecules responsible for triggering an immune response. Eliciting such a response against the virus is the goal of vaccine development.
Comparing these known epitopes with the SARS-CoV-2 proteins predicted from its genome sequence identified epitopes in the S and nucleocapsid (N) proteins as likely triggers of the immune response. This provides crucial information for developing recombinant or similarly structured molecules for vaccines against SARS-CoV-2.
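One simple way to carry out this comparison is to check which previously identified epitope peptides occur unchanged in the SARS-CoV-2 protein sequences. The sketch below uses exact substring matching as a simplifying assumption. The protein fragments and epitopes are hypothetical, and real analyses typically add further filtering and tolerate near-matches.

```python
def conserved_epitopes(epitopes, proteins):
    """Return {epitope: [(protein_name, 1-based start), ...]} for exact matches.

    `epitopes` is an iterable of peptide strings (e.g. epitopes reported for
    a related virus); `proteins` maps protein names to one-letter sequences.
    Only the first occurrence within each protein is reported.
    """
    hits = {}
    for peptide in epitopes:
        matches = [
            (name, seq.find(peptide) + 1)  # str.find is 0-based; report 1-based
            for name, seq in proteins.items()
            if peptide in seq
        ]
        if matches:
            hits[peptide] = matches
    return hits


if __name__ == "__main__":
    # Hypothetical protein fragments and candidate epitopes
    proteins = {
        "S_fragment": "GAVLIPFMWGSTCYNQDEKRH",
        "N_fragment": "KRHDEQNYCTSGWMFPILVAG",
    }
    print(conserved_epitopes(["LIPFM", "NYCTS", "AAAAA"], proteins))
```

Epitopes that are absent from every target protein (here "AAAAA") are dropped. The remaining ones indicate which proteins might be worth considering as vaccine targets.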
In parallel, antiviral drugs with known structures, such as Oseltamivir and Indinavir, can be analyzed computationally to assess how effectively they might inhibit important SARS-CoV-2 proteins.
By predicting the 3D structures of three main proteases of SARS-CoV-2, researchers can simulate molecular docking of these drugs to each target.
Remdesivir, an investigational antiviral originally developed against Ebola virus, was found to have the highest predicted binding affinity to the SARS-CoV-2 proteases, and it has been proposed as a treatment to be tested in clinical trials.
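Docking programs usually report a score in kcal/mol that approximates the binding free energy ΔG, where more negative values indicate tighter predicted binding. The sketch below uses the relation ΔG = RT ln K_d to rank ligands by score and convert the scores into approximate dissociation constants. The ligand names and scores are hypothetical and are not results from Chang et al. Docking scores are rough estimates, not measured affinities.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal, temp_k=298.15):
    """Convert a binding free energy (kcal/mol) to a dissociation constant (M)
    using dG = RT ln(Kd)."""
    return math.exp(dg_kcal / (R_KCAL * temp_k))


def rank_by_affinity(scores):
    """Sort (ligand, dG) pairs from strongest (most negative) to weakest."""
    return sorted(scores.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Hypothetical docking scores (kcal/mol), for illustration only
    scores = {"ligand_A": -7.5, "ligand_B": -9.0, "ligand_C": -6.2}
    for name, dg in rank_by_affinity(scores):
        print(f"{name}: dG = {dg} kcal/mol, Kd ~ {kd_from_dg(dg):.2e} M")
```

At room temperature, each additional 1.36 kcal/mol of predicted binding energy corresponds to roughly a tenfold lower K_d. This is why small differences in docking scores can translate into large differences in predicted affinity.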
References
- Ahmed, S.F.; Quadeer, A.A.; McKay, M.R. Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies. Viruses 2020, 12, 254.
- Hall, B.G. Building Phylogenetic Trees from Molecular Data with MEGA. Mol. Biol. Evol. 2013, 30, 1229–1235. https://doi.org/10.1093/molbev/mst012
- Chang, Y.; Tung, Y.; Lee, K.; Chen, T.; Hsiao, Y.; Chang, H.; Hsieh, T.; Su, C.; Wang, S.; Yu, J.; Shih, S.; Lin, Y.; Lin, Y.; Tu, Y.E.; Hsu, C.; Juan, H.; Tung, C.; Chen, C. Potential Therapeutic Agents for COVID-19 Based on the Analysis of Protease and RNA Polymerase Docking. Preprints 2020, 2020020242. https://doi.org/10.20944/preprints202002.0242.v2
- Zhou, P.; Yang, X.; Wang, X.; et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020, 579, 270–273. https://doi.org/10.1038/s41586-020-2012-7