Haoyu Cheng, PhD, assistant professor of biomedical informatics and data science at Yale School of Medicine, has developed a new algorithm capable of building complete human genomes using standard laboratory technology. His tool, called hifiasm (ONT), eliminates the need for costly DNA sequencing that requires 40 times more genetic material and often cannot be performed on patient samples.
Cheng and colleagues recently published their findings in Nature, detailing how the algorithm runs 10 times faster than existing methods while producing more complete results. The approach could make this type of genome assembly accessible for clinical diagnostics and research previously constrained by cost and technical limitations.
You recently published findings in Nature showing the first algorithm capable of telomere-to-telomere assembly using standard DNA sequencing. What does “telomere-to-telomere” mean, and why has this been so difficult to achieve?
In human genomes, each chromosome has two telomeres - one at the left end and one at the right end. “Telomere-to-telomere” (T2T) means assembling a chromosome continuously from one end to the other, without any gaps. In other words, it is a complete, end-to-end DNA sequence for each chromosome, rather than a draft that still contains missing regions.
This has been very challenging to achieve because the remaining unresolved parts of the genome are dominated by highly repetitive DNA. Complete T2T assembly was only achieved a few years ago, roughly 20 years after the release of the first human genome from the Human Genome Project. However, producing T2T assemblies still requires specialized “ultra-long” DNA sequencing, which is expensive, technically demanding, and often not feasible for clinical samples or samples with limited DNA.
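The definition of T2T given above can be illustrated with a small Python sketch. It is only a toy check, not how hifiasm (ONT) or any production tool evaluates assemblies: it assumes the canonical human telomere repeat TTAGGG (seen as CCCTAA at the left end of the forward strand), and the function names, the 60-base window, and the repeat threshold are hypothetical choices made for illustration.

```python
# Canonical human telomere repeat and its reverse complement.
TELOMERE_FWD = "TTAGGG"  # expected at the right end of the forward strand
TELOMERE_REV = "CCCTAA"  # expected at the left end of the forward strand


def count_repeats(window: str, motif: str) -> int:
    """Count non-overlapping occurrences of motif in window."""
    return window.upper().count(motif)


def looks_t2t(contig: str, window: int = 60, min_repeats: int = 5) -> bool:
    """Toy check (assumed thresholds): telomere repeats at both ends, no gaps.

    A contig passes only if it contains no 'N' gap characters and both
    ends carry at least min_repeats copies of the telomere motif.
    """
    if "N" in contig.upper():
        return False  # a gap means the sequence is not continuous
    left = count_repeats(contig[:window], TELOMERE_REV)
    right = count_repeats(contig[-window:], TELOMERE_FWD)
    return left >= min_repeats and right >= min_repeats


# Synthetic sequences, made up purely for demonstration.
complete = "CCCTAA" * 10 + "ACGTACGTAC" * 5 + "TTAGGG" * 10
gapped = "CCCTAA" * 10 + "ACGT" + "N" * 20 + "ACGT" + "TTAGGG" * 10
one_end = "CCCTAA" * 10 + "ACGTACGTAC" * 10
```

Here `complete` passes the check, while `gapped` fails because of its run of `N` characters and `one_end` fails because it lacks a telomere at the right end, which mirrors the difference between an end-to-end sequence and a draft with missing regions.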
Your algorithm addresses a major practical barrier - the need for ultra-long DNA sequencing. What made ultra-long sequencing so challenging for researchers and clinicians, and how does your approach change that?
Ultra-long sequencing requires extracting and preserving extremely long DNA molecules - think of needing to pull out a single thread from fabric without breaking it. In practice, generating ultra-long reads often requires labs to grow cells for weeks or months to get enough high-quality DNA, which isn’t possible with most patient tissue samples or blood draws. This makes ultra-long sequencing difficult to scale, expensive, and inaccessible for many real-world applications.
Our approach removes this barrier by enabling near-T2T assembly using standard sequencing data that any hospital lab can generate with routine DNA preparation protocols. This could even be used in intensive care units, making complete genome assembly much more feasible in practice.
One of the most compelling findings in your paper is the successful resolution of the SMN1 and SMN2 genes, which are linked to spinal muscular atrophy. What does this breakthrough mean for patients with genetic diseases?
SMN1 and SMN2 are a classic example of a medically important region that has been difficult to resolve because the two genes are nearly identical and lie within highly repetitive regions of the human genome. With our approach, clinicians can more routinely obtain a complete view of disease-causing variation in SMN1 and SMN2.
For patients, this could enable more reliable genetic diagnoses and better interpretation of disease risk and severity. For doctors, it reduces uncertainty and reliance on multiple specialized assays, moving closer to a comprehensive, genome-wide diagnostic approach.
Your results show that this approach not only makes T2T assembly more accessible but also produces higher-quality assemblies - reconstructing more complete chromosomes in a fraction of the time. What made this leap in both speed and accuracy possible?
This leap was made possible by combining two advances. First, sequencing chemistry has improved, so common techniques are now accurate enough to support high-quality assembly. Second, we redesigned the assembly algorithm to better model the remaining errors in genetic data while also improving efficiency. Think of it like reading slightly blurred text - our algorithm can figure out what unclear letters should be by recognizing patterns and using context from surrounding sequences.
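The “blurred text” analogy can be made concrete with a deliberately simplified Python sketch. It is not the error model used in hifiasm (ONT), which the interview does not describe in detail. It assumes the reads have already been aligned to the same coordinates and have equal length, and it uses a plain column-wise majority vote as a stand-in for “using context” to recover an unclear base; the function name and example reads are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads: list[str]) -> str:
    """Column-wise majority vote over equal-length, pre-aligned reads.

    Simplifying assumption: every read covers the same region and has
    already been aligned, so position i means the same base in all reads.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be pre-aligned to the same length")
    # For each column, keep the base that most reads agree on.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_reads)
    )


# Made-up reads: two of them each contain one isolated error.
reads = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACCTTAGC",  # error at position 2
    "ACGTTAGA",  # error at position 7
]
corrected = consensus(reads)
```

With these reads, each isolated error is outvoted by the other three reads, so `corrected` equals `ACGTTAGC`. A real assembler faces a harder problem than this sketch suggests, because reads are not pre-aligned and errors are not the only source of disagreement between them.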
Looking ahead, where do you see this technology having the biggest impact? Are there specific medical conditions, populations, or research questions that could benefit most from making complete genome sequencing this accessible?
In basic science, this approach can enable building complete, high-quality genomes from diverse populations. Rather than relying on a single reference genome, scientists can now study how DNA differs across thousands of people from different backgrounds, and the same approach applies to non-human genomes. High-quality genome assemblies are foundational and represent the first step for nearly all applications in genomics research, making it possible to better understand health and disease patterns across different populations.
In clinical settings, this approach can also make it easier to diagnose conditions that were previously difficult to detect - like the spinal muscular atrophy genes we discussed earlier. It also helps researchers study genetic changes that occur within a person over time. For example, scientists can now compare the same person’s DNA at different ages - say, at 20 versus 60 - to study aging. Or they can identify genetic differences between different tissues in the same person, such as heart versus brain, which helps explain why certain diseases like cancer or neurodegenerative conditions affect specific organs. Ultimately, this brings powerful genetic testing capabilities to hospitals and clinics everywhere.
Journal reference:
Cheng, H., et al. (2026). Efficient near-telomere-to-telomere assembly of nanopore simplex reads. Nature. DOI: 10.1038/s41586-026-10105-6. https://www.nature.com/articles/s41586-026-10105-6