To celebrate National DNA Day 2022, we spoke to Jonas Korlach, Chief Scientific Officer of PacBio, about how PacBio HiFi Sequencing contributed to the first complete human reference genome.
Please could you introduce yourself and tell us about your background in molecular biology and what inspired T2T’s latest research?
From the time I was a graduate student in biochemistry, molecular and cell biology at Cornell University in the late 1990s (around the time the first draft of the human genome was released), I, alongside thousands of life scientists, eagerly awaited the day when the full human reference sequence could be created. Although the $3B+ project was declared a “complete” draft in 2003, the reality is that the technology available at that time prevented full completion of the sequence, something everyone understood. The sequence celebrated in 2003 was really only about 92% of the human genome. The Telomere-to-Telomere (T2T) Consortium recognized that the invention of PacBio’s HiFi sequencing chemistry finally made it possible to sequence long molecules of DNA at the high accuracy needed for creating a complete reference genome.
What are human reference genomes? How are they constructed, and what is their use within scientific/medical research?
The human reference genome is the complete set of biological instructions for a human being. It comprises roughly 3 billion DNA building blocks (known as bases). These bases are organized into 23 biological units known as chromosomes. Reference genomes are created by sequencing long pieces of DNA many times over to identify the order of the bases, then using computers to re-assemble the pieces of DNA by looking at overlapping sections to put the whole genome in order.
You can think of the reference genome as a book. The Human Genome Project was able to put most of the book together, but it was missing a few pages, and sometimes the equivalent of half a chapter, which meant that scientists had an incomplete understanding of how the information on these pages related to the rest of the book or to other books. Having a complete reference genome opens new possibilities for understanding how human variation relates to genetic diseases. It also enables new research into evolutionary biology, such as how humans are like and unlike their evolutionary neighbors (such as other primates).
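The idea of re-assembling pieces of DNA by looking at overlapping sections can be illustrated with a toy example. The sketch below is only an illustration, not the T2T Consortium's assembly pipeline: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the short made-up reads are all assumptions, and the reads are taken to be error-free.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two pieces that share the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Three made-up overlapping reads, given out of order.
reads = ["GCGTGCAA", "TGCAATCC", "ATGGCGTG"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATCC']
```

Real assemblers have to cope with sequencing errors, two copies of each chromosome, and repeated sequence, which is where a simple greedy merge like this one breaks down.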
PacBio HiFi Sequencing significantly contributed to filling in the gaps in the human genome. Can you tell us how this particular sequencing technology works?
HiFi reads are generated by preparing long strands of DNA with an approach that turns each piece of DNA into a circle. The circle can then be sequenced over and over again using PacBio’s single-molecule, real-time (SMRT) sequencing technology on one of the company’s Sequel II or Sequel IIe instruments. Because the DNA forms a circle, the same molecule can be read multiple times, and combining those repeated reads improves the accuracy of the data.
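Why reading the same circle several times helps can be shown with a small sketch. This is a deliberately simplified illustration and not PacBio's consensus algorithm: it assumes the passes are already aligned, have equal length, and contain only substitution errors, and the `consensus` function and the example passes are made up for the purpose.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """Majority vote at each position across pre-aligned, equal-length passes."""
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical passes over the same molecule; four contain one error each.
passes = [
    "ACTTACGTAC",  # error at position 2
    "ACGTAGGTAC",  # error at position 5
    "ACGTACGTTC",  # error at position 8
    "TCGTACGTAC",  # error at position 0
    "ACGTACGTAC",  # no error
]
print(consensus(passes))  # ACGTACGTAC
```

Because the errors fall at different positions in different passes, each one is outvoted by the other passes, so the combined read is more accurate than any single pass.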
What benefits does HiFi Sequencing have over other sequencing techniques such as short-reads and long-reads?
HiFi sequencing is the only sequencing technology that offers high-accuracy long reads. Long reads are a superior approach for understanding the DNA reference sequence because a researcher can more easily put the pieces of DNA together to produce the genome. To extend the book metaphor, short-read technologies would require the researcher to put together a book by looking at individual sentences. Other long-read technologies might have complete pages, but each page would have many typos. HiFi sequencing is the only technology that can offer the researcher full pages of the book at a time with very few errors.
What were the novel aspects, as well as the improvements, of the complete genome that PacBio HiFi Sequencing allowed for?
The human genome has sections in which the same order of DNA bases repeats over and over again. These sections were impossible to sequence using the technologies available to researchers working on the Human Genome Project in the 1990s and early 2000s. The T2T Consortium used PacBio HiFi sequencing technology to finally decode every single section of the genome, including the most difficult sections with numerous DNA repeats.
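A toy example can show why repeated sections are hard to resolve with short pieces of DNA. The sketch below uses a made-up sequence and a hypothetical helper, `placements`, that lists every position where a read matches exactly; it is an illustration of the ambiguity, not a description of any real alignment tool.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly."""
    hits, start = [], genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Made-up region: unique flanks around six copies of the same 3-base unit.
genome = "GATTC" + "ACG" * 6 + "TTGCA"

short_read = "ACGACG"                  # lies entirely inside the repeat
long_read = "TTC" + "ACG" * 6 + "TTG"  # spans the repeat and both flanks

print(placements(genome, short_read))  # [5, 8, 11, 14, 17]
print(placements(genome, long_read))   # [2]
```

The short read fits in five places, so it cannot say how many copies of the repeat exist or where it belongs. The long read reaches the unique sequence on both sides and fits in exactly one place.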
How was the T2T-CHM13 genome assembly used to study genetic variation, particularly in reference to genetic disease?
Having a more complete picture of the genome provides researchers with more opportunities to uncover differences between individuals and the reference sequence. These differences may provide clues about the genetic nature of diseases and may help researchers identify new ways of treating these conditions one day.
For example, scientists in the T2T Consortium applied the new reference to look at genetic variation in 3,202 globally diverse individuals, identifying hundreds of thousands of new variants per genome. The improvements are particularly large for some critical duplicated genes such as FRG1, which is linked to facioscapulohumeral muscular dystrophy (FSHD). The T2T reference has 23 copies of FRG1, up from only nine copies in the prior reference. The result is a T2T reference that, in this regard, provides a better foundation for the study of this disease.
The success of the T2T-CHM13 genome assembly has inspired researchers in the Human Pangenome Reference Consortium (HPRC) to sequence more human genomes from diverse backgrounds using PacBio technology. How important are inclusivity and diversity within genetic sequencing data?
We know that genetics plays an important role in human health and that certain genes are more prevalent in specific population groups. The work of the Human Pangenome Reference Consortium will help scientists better understand genetic differences across human populations. Just as the human reference genome can help uncover new insights into human disease, understanding what differentiates the genomes of individuals from one population may help researchers establish better, more personalized approaches to treating individuals from that population who are facing disease.
In an ever more health-conscious world, how do you hope this reference genome may impact human health?
This new, complete reference genome will power new scientific discovery. We certainly hope that the insights this research provides will enable the development of new disease prevention and treatment strategies that can ultimately improve the human condition.
Given that HiFi Sequencing had not yet been developed at the time of the Human Genome Project, do you foresee any future advancements in sequencing that could further improve our understanding of the human genome?
While the T2T-developed human reference genome is complete, our understanding of the impact of each of the genes within the genome is still in its infancy. We don’t anticipate technology driving future changes to this reference, but we do anticipate that the sequencing of more individuals’ genomes, and the development of reference genomes for new human populations, will better our collective understanding of human health.
What are the next steps for you and your work?
Everyone inherits their DNA from their parents. The next step for T2T is to better our understanding of heredity by creating a diploid reference genome (one that includes the full reference for the set of genes inherited from each parent).
For PacBio, we are proud to be part of the T2T project and the diploid reference genome effort, and we continue to work to support the research community and our customers as they apply PacBio HiFi sequencing to their research.
Where can readers find more information?
About Jonas Korlach, PhD
Jonas Korlach was appointed Chief Scientific Officer of PacBio in July 2012. He was previously a Scientific Fellow, supporting commercial development of the PacBio RS II system and performing research aimed at developing new applications for SMRT technologies. He co-invented the SMRT technology with Stephen Turner, Ph.D., PacBio Founder and Chief Technology Officer, when the two were graduate students at Cornell University. Dr. Korlach joined PacBio as the company’s eighth employee in 2004. Previously, he was a Postdoctoral Researcher at Cornell University.
Dr. Korlach is the recipient of multiple grants, an inventor on 70 issued U.S. patents and 61 international patents, and an author of over 100 scientific studies on the principles and applications of SMRT technology, including publications in Nature, Science, and PNAS. In 2013, Dr. Korlach was honored by the Obama White House as an Immigrant Innovator “Champion of Change.” He received both his Ph.D. and his M.S. degrees in Biochemistry, Molecular and Cell Biology from Cornell, and received M.S. and B.A. degrees in Biological Sciences from Humboldt University in Berlin, Germany.