Novel computational tool facilitates accurate analysis of complex genomes

Over nearly eight years of working in peanut breeding and genetics, HudsonAlpha Institute for Biotechnology faculty investigator Josh Clevenger, Ph.D., has become passionate about improving crops for more sustainable and robust agriculture.

Josh Clevenger, Ph.D., Faculty Investigator with the HudsonAlpha Institute for Biotechnology. Image Credit: HudsonAlpha Institute for Biotechnology.

This passion led to a collaboration with HudsonAlpha computational biologist Walid Korani, Ph.D., and to the creation of a computational tool named Khufu that rapidly and accurately identifies and analyzes variants in complex genomes.

“I wanted to bridge the gap between science and nature by more rapidly introducing beneficial traits into cultivated crops that farmers can plant on their land.”

Josh Clevenger, Faculty Investigator, HudsonAlpha Institute for Biotechnology

To achieve this, HudsonAlpha researchers created a better computational tool for identifying the genetic factors behind beneficial traits, along with novel, rapid breeding practices for introducing those traits into existing crop lines.

To map traits to genes, the DNA sequences of the plants being studied must be aligned against a reference genome. In complex plant genomes, it is difficult for software to map short DNA reads to the reference and to correctly identify molecular markers, such as single-nucleotide polymorphisms (SNPs), that are associated with an observed trait.
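One reason short reads are hard to place can be illustrated with a toy example. The sketch below is not Khufu's method or a real aligner; it uses exact string matching on a made-up reference (the sequences and the `find_all` helper are hypothetical) to show that a read falling entirely inside a duplicated region matches more than one position, so its true origin is ambiguous.

```python
def find_all(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `genome` exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Hypothetical toy reference: the same 12-base block occurs twice,
# standing in for a duplicated or repetitive region.
repeat = "ACGTTGCAAGCT"
genome = "GGATCC" + repeat + "TTAGGC" + repeat + "CCATGA"

unique_read = "GGATCCACGT"  # spans the left flank and the first copy
repeat_read = "GTTGCAAGC"   # lies entirely inside the repeated block

unique_hits = find_all(genome, unique_read)
repeat_hits = find_all(genome, repeat_read)

print("unique read maps to:", unique_hits)
print("repeat read maps to:", repeat_hits)
```

A variant observed in the second read cannot be assigned to one copy of the repeat from this read alone, which is the kind of ambiguity that leads to falsely identified SNPs in polyploid and repeat-rich genomes.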

Struggling for a long time with recognizing SNPs in peanuts, Clevenger collaborated with Korani to create a solution, the new computational tool Khufu (

Khufu employs low-coverage, short-read sequencing data to offer genotyping results at a fraction of the normal cost. With the help of a new technique, Khufu offers extremely accurate SNP recognition using very low coverage sequence data.

Because each individual requires less sequence, whole-genome sequencing becomes affordable even for small breeding programs. The availability of genome-wide markers greatly increases the power and precision of trait mapping and integration and brings these tools to a broader array of breeders and geneticists.

“Collaborators can expect fast results, with analyses being done within days of the sequence being generated. Khufu is not a data generation service, but a data analysis service that can provide sequencing for a low cost or can analyze generated sequences for different applications.”

Josh Clevenger, Faculty Investigator, HudsonAlpha Institute for Biotechnology

“For a similar cost to the raw output of SNP arrays, Khufu provides full analysis of genotypes, provides marker targets for traits of interest, and saves valuable time. Khufu is a highly accurate informatics platform that outperforms published methods,” added Clevenger.

Identifying SNPs from low-depth Illumina™ short-read data in plant samples is difficult because a majority of plant genomes are polyploid and contain large repeated regions. Even hard filtering approaches cause many informative SNPs to be lost.
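Why a hard filter discards so much at low depth can be seen with a back-of-the-envelope calculation. The sketch below is only an illustration under stated assumptions: read depth at a site is modeled as Poisson-distributed around the mean coverage, and the minimum-depth threshold of 5 is a hypothetical filter setting, not one taken from Khufu or from the study.

```python
import math


def fraction_passing(mean_coverage: float, min_depth: int) -> float:
    """Probability that a site has depth >= min_depth.

    Assumption: depth at a site follows a Poisson distribution whose
    mean is the average sequencing coverage.
    """
    p_below = sum(
        math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - p_below


# Hypothetical hard filter: keep only sites covered by at least 5 reads.
MIN_DEPTH = 5

for coverage in (1.0, 2.0, 10.0):
    kept = fraction_passing(coverage, MIN_DEPTH)
    print(f"{coverage:g}x mean coverage: {kept:.1%} of sites pass depth >= {MIN_DEPTH}")
```

Under this simple model, almost no sites survive the filter at 1x coverage while nearly all survive at 10x, so a fixed depth cutoff removes true variants along with false ones when coverage is low.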

“Khufu created a series of algorithms that efficiently extract falsely-identified SNPs, making it 99.9% accurate at identifying SNPs correlated to a given trait in both plant and animal populations. In addition, Khufu utilizes computational resources very efficiently to significantly speed up the calling process.”

Walid Korani, Computational Biologist, HudsonAlpha Institute for Biotechnology

Khufu was created primarily for Clevenger and Korani’s work with peanut populations; however, re-analysis of other large datasets from earlier studies shows how well the tool works across various species and populations. Khufu is efficient at genotyping small or large numbers of plants, irrespective of population structure or genome complexity.

“We hope that by offering other researchers the ability to use this low-cost, highly accurate computational software, we can help to advance genomic research in many different fields,” concluded Clevenger.

Journal reference:

Korani, W., et al. (2021) Accurate analysis of short read sequencing in complex genomes: A case study using QTL-seq to target blanchability in peanut (Arachis hypogaea). bioRxiv.

