Bioinformatics Analysis During Wuhan Coronavirus Outbreak


At the end of 2019, there was an outbreak of COVID-19, a disease caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2).

It has since spread throughout the globe and was declared a pandemic in March 2020. Bioinformatics analysis is being used to better understand the virus and its patterns of spread, with the aim of limiting transmission and developing therapeutic treatments.

SARS-CoV-2 Virus

Image Credit: Kateryna Kon/

What is bioinformatics?

Bioinformatics is a way of collecting and analyzing complex biological data. This includes analyzing genome and protein sequences to deduce the similarity and relatedness of species and molecules, analyzing epidemiological data to understand how a disease spreads, and running structural analyses to elucidate the interactions between molecules.

To test the relatedness of species or proteins and to infer their hypothetical common ancestors, a phylogenetic tree is computed. Nucleotide or protein sequences are obtained and aligned with one another.

The greater the number of differences in sequence (deletion or insertion of nucleotides, or mutation of bases or amino acids), the more distantly related the analyzed entries are.
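The idea of counting differences between aligned sequences can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the cited studies: it computes the proportion of differing sites (often called the p-distance) between pairs of sequences that are assumed to be already aligned to the same length. The sequence names and toy sequences below are hypothetical and are not real viral data.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    A gap ('-') opposite a residue counts as a difference (an insertion
    or deletion); columns that are gaps in both sequences are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical toy alignment (illustrative only, not real viral data).
alignment = {
    "seq_A": "ATGCTAGCTA",
    "seq_B": "ATGCTAGCTT",
    "seq_C": "ATG-TTGCAT",
}

if __name__ == "__main__":
    for name_1, name_2 in combinations(alignment, 2):
        dist = p_distance(alignment[name_1], alignment[name_2])
        print(f"{name_1} vs {name_2}: {dist:.2f}")
```

In this toy example, seq_A and seq_B differ at one site out of ten, so they would be treated as more closely related to each other than either is to seq_C. Real analyses usually apply a substitution model to correct for multiple changes at the same site.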

Different methods can be used for estimating the branch lengths and constructing the tree, including the maximum likelihood and maximum parsimony methods.
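As an illustration of the maximum parsimony idea, the sketch below scores candidate trees with Fitch's algorithm, which counts the minimum number of character changes a given tree requires to explain an alignment. The tree with the lowest score is preferred. This is a simplified sketch under stated assumptions: trees are written as nested tuples of leaf names, gaps would be treated as an ordinary character state, and the four taxa and their sequences are hypothetical toy data. Dedicated packages such as MEGA search over many more trees and models.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    tree   -- a leaf name (str) or a tuple (left_subtree, right_subtree)
    states -- dict mapping leaf name to its character at this column
    Returns (possible_states_at_this_node, minimum_changes_below_it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("sequences must be aligned to the same length")
    total = 0
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy alignment of four taxa (not real viral data).
alignment = {"A": "AAGT", "B": "AAGC", "C": "ACTC", "D": "ACTT"}

# The three possible groupings of four taxa.
candidate_trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}

if __name__ == "__main__":
    scores = {label: parsimony_score(tree, alignment)
              for label, tree in candidate_trees.items()}
    for label, score in scores.items():
        print(label, score)
    print("most parsimonious:", min(scores, key=scores.get))
```

For this toy data, the grouping of A with B and C with D needs the fewest changes, so it is the most parsimonious of the three candidates.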

SARS-CoV-2 and other related viruses

SARS-CoV and MERS-CoV are both thought to have originated in bats. Many coronaviruses are found in bats, and it is speculated that SARS-CoV-2 has bats as its natural host as well.

An article published in Nature (Zhou et al., 2020) summarized the bioinformatics analysis of the whole genome of SARS-CoV-2.

The analysis revealed that the genome of SARS-CoV-2 is 96% identical to that of a bat coronavirus (RaTG13), suggesting that the two viruses share the same natural host, bats (Figure 1).

Moreover, the spike (S) protein allows the virus to bind to a human cell receptor and mediates cell entry.

It is one of the most important functional proteins of SARS-CoV-2. The S gene of SARS-CoV-2 shares 93.1% nucleotide similarity with that of RaTG13, further showing their relatedness.

Figure 1. Phylogenetic tree of SARS-CoV-2 and other related coronaviruses. Red: SARS-CoV-2 isolated from patients in Wuhan and its related coronavirus Bat CoV RaTG13. Figure adapted from Zhou et al. (2020).

With bats identified as the likely natural host of SARS-CoV-2, another question arises: are there intermediate hosts of the virus between bats and human beings?

By reanalyzing viral sequences found in the lungs of two dead Malayan pangolins, it was found that, at the genome level, there is ~90% sequence similarity between pangolin coronavirus (Pangolin-CoV) and SARS-CoV-2, with 97.5% similarity between their S proteins.

The S protein of Pangolin-CoV is therefore more similar to that of SARS-CoV-2 than the S protein of Bat CoV RaTG13 is, suggesting that pangolins may be an intermediate host in the transmission of a virus related to Bat CoV RaTG13 from bats to humans, where it emerged as SARS-CoV-2.

In addition, the S protein receptor-binding domain is highly conserved between Pangolin-CoV and SARS-CoV-2, with only one amino acid difference. This suggests that the Pangolin-CoV S protein is more likely to bind to human cell receptors than the Bat CoV RaTG13 S protein.

Figure 2. Phylogenetic tree of SARS-CoV-2 and other related coronaviruses. SARS-CoV-2 isolated from patients in Wuhan (pink) and its related coronaviruses Pangolin-CoV (red) and Bat CoV RaTG13 (green). Figure adapted from Zhang et al. (2020).

Implications on therapeutic development

With the whole genome sequence of SARS-CoV-2 available, protein structures can be predicted based on bioinformatics methods. Studies by Chang et al. (2020) and Ahmed et al. (2020) utilized the protein sequence of SARS-CoV-2 to identify potential vaccine targets and drug treatments.

By searching databases of patients infected with SARS-CoV-2, scientists can identify the antibodies developed against SARS-CoV-2 by the immune system and the structure of the epitopes that those antibodies recognize.

This sheds light on the viral molecules responsible for triggering an immune response. Generating an immune response towards the virus is the goal of developing a vaccine.

With the structure of these viral molecules known, they can be compared with the viral protein structures predicted from the genome sequence of SARS-CoV-2. This comparison identified the S proteins and nucleocapsid (N) proteins as responsible for triggering the immune response, which provides crucial information for developing recombinant or similarly structured molecules for vaccines against SARS-CoV-2.

Separately, antiviral drugs with known structures, such as oseltamivir and indinavir, can be analyzed to predict how effectively they inhibit important protein functions of SARS-CoV-2.

By predicting the 3D structures of 3 main proteases of SARS-CoV-2, molecular docking of the drugs can be simulated.

Remdesivir, a broad-spectrum nucleotide analogue antiviral, was found to have the highest predicted binding affinity to SARS-CoV-2 proteases, and it has been proposed as a treatment for clinical trials.
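To show how predicted binding affinities can be compared, the sketch below ranks candidate ligands by docking score and converts a binding free energy into an approximate dissociation constant using the standard relation ΔG = RT ln(Kd). It assumes the scores are reported in kcal/mol, where more negative means tighter predicted binding. The ligand names and score values are hypothetical placeholders, not results from the cited studies, and docking scores are only rough estimates of true binding free energies.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant R in kcal/(mol*K)


def dissociation_constant(delta_g_kcal_per_mol: float,
                          temperature_k: float = 298.15) -> float:
    """Approximate Kd (in mol/L) from a binding free energy.

    Uses delta_G = R * T * ln(Kd), so Kd = exp(delta_G / (R * T)).
    """
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


def rank_by_affinity(scores: dict) -> list:
    """Return ligand names ordered from strongest to weakest predicted binding."""
    return sorted(scores, key=scores.get)


# Hypothetical docking scores in kcal/mol (placeholders, not real results).
docking_scores = {"ligand_1": -7.2, "ligand_2": -9.1, "ligand_3": -6.5}

if __name__ == "__main__":
    for name in rank_by_affinity(docking_scores):
        kd = dissociation_constant(docking_scores[name])
        print(f"{name}: {docking_scores[name]:.1f} kcal/mol, Kd ~ {kd:.2e} M")
```

Because Kd depends exponentially on the free energy, a difference of about 1.4 kcal/mol at room temperature corresponds to roughly a tenfold change in predicted affinity.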


  • Ahmed, S.F.; Quadeer, A.A.; McKay, M.R. Preliminary Identification of Potential Vaccine Targets for the COVID-19 Coronavirus (SARS-CoV-2) Based on SARS-CoV Immunological Studies. Viruses 2020, 12, 254.
  • Hall, B.G. Building Phylogenetic Trees from Molecular Data with MEGA. Molecular Biology and Evolution 2013, 30(5), 1229–1235.
  • Chang, Y.; Tung, Y.; Lee, K.; Chen, T.; Hsiao, Y.; Chang, H.; Hsieh, T.; Su, C.; Wang, S.; Yu, J.; Shih, S.; Lin, Y.; Lin, Y.; Tu, Y.E.; Hsu, C.; Juan, H.; Tung, C.; Chen, C. Potential Therapeutic Agents for COVID-19 Based on the Analysis of Protease and RNA Polymerase Docking. Preprints 2020, 2020020242 (doi: 10.20944/preprints202002.0242.v2).
  • Zhou, P., Yang, X., Wang, X. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).


Last Updated: Apr 23, 2020

Written by

Christy Cheung

Christy is passionate about communicating science to a wide range of audiences, from the general public to researchers in various fields. She has a BSc in Biological Sciences and is now an MRes student in the Bacterial Pathogenesis and Infection stream of Biomedical Research at Imperial College London. She has a great interest in tackling the problem of antimicrobial resistance and in translating pre-clinical research into therapeutic solutions.



The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoLifeSciences.