Long-read sequencing (LRS) technologies can read long stretches of DNA, typically between 5,000 and 30,000 base pairs in length. LRS has opened up new avenues for future technologies.
The genomes of most organisms are too long to be sequenced as one continuous strand. Next-generation sequencing techniques typically produce short-read sequences (SRS): DNA is broken into small fragments that are amplified and sequenced to produce “reads”. Bioinformatic tools then reconstruct a continuous genomic sequence by assembling these fragmented pieces.
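The minimal Python sketch below makes the assembly step concrete. It repeatedly finds the two reads with the longest suffix-prefix overlap and merges them. The read sequences and the `min_len` threshold are illustrative assumptions. Greedy merging is a deliberately simplified stand-in: production assemblers use graph-based methods and must also tolerate sequencing errors.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored (0 is returned).
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the resulting contigs. More than one contig remains if
    some reads share no overlap of at least `min_len` bases.
    """
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left to exploit
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three short, overlapping "reads" drawn from the hypothetical sequence ATGGCGTACGTTAG
print(greedy_assemble(["ATGGCGT", "GCGTACG", "TACGTTAG"]))  # -> ['ATGGCGTACGTTAG']
```

The overlapping ends act as the "jigsaw edges" that let the fragments be placed back in order.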
Long-Read Sequencing Technologies
LRS technologies, also known as third-generation sequencers, directly sequence single DNA molecules, and in some cases they do not require amplification. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore) are two prominent developers of long-read sequencing platforms. Both generate nucleic acid (DNA and RNA) reads in real time, much faster than short-read technologies.
Pacific Biosciences (PacBio) has developed the Single Molecule Real-Time (SMRT) sequencer, which can generate reads of more than 10,000 base pairs in less than two hours. In October 2022, PacBio announced its Revio long-read sequencing platform. According to PacBio, Revio delivers up to 15 times more high-fidelity (HiFi) data than its predecessor, at a cost of less than $1,000 per human genome. In SMRT sequencing, the reaction takes place inside zero-mode waveguides (ZMWs).
ZMWs are cylindrical nanoapertures with subwavelength diameters. They confine light to an extremely small observation volume, which makes it possible to analyze single biomolecules even at micromolar concentrations. In SMRT sequencers, a single DNA polymerase is immobilized at the base of each ZMW.
Before sequencing, hairpin adapters are attached to both ends of the DNA molecule, forming a closed, circular single-stranded template. The DNA polymerase synthesizes a complementary strand from this template. Each incorporated nucleotide carries a fluorescent label, and the emitted fluorescence signal identifies the corresponding base.
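Because the template is circular, the polymerase can read the same insert several times, and PacBio's HiFi reads are built from a consensus of these passes. The Python sketch below shows only the core idea, a per-position majority vote. It assumes the passes have already been aligned to equal length. Real circular consensus calling must also handle insertions and deletions and makes use of base-quality information. The example passes are invented.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Per-position majority vote across pre-aligned, equal-length passes."""
    if not passes:
        raise ValueError("need at least one pass")
    if len({len(p) for p in passes}) != 1:
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    # zip(*passes) yields one column (tuple of bases) per position
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


# Three passes over the same hypothetical insert; two of them carry a single-base error
passes = ["ACGTTGCA", "ACGATGCA", "ACGTTGGA"]
print(majority_consensus(passes))  # -> ACGTTGCA
```

Errors in individual passes fall at different positions, so they are outvoted. This illustrates how repeated passes can raise accuracy.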
Oxford Nanopore Technologies’ platform can produce reads of up to 1 million base pairs. The technology is based on changes in ion flow as nucleotides pass through a nanopore. A DNA strand is threaded through a bioengineered channel embedded in a membrane. The electrical current across the channel changes depending on the nucleotides occupying it, and these changes are used to determine the base sequence.
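The following Python sketch is a toy model of how current changes could be turned into bases. It splits a raw current trace into "events" wherever the level jumps, then assigns each event to the base with the nearest reference level. The current levels in `LEVELS_PA` and the `jump` threshold are invented for illustration. In reality the current depends on several nucleotides in the pore at once, and production basecallers use trained neural networks rather than a per-base lookup.

```python
# Hypothetical mean current level (picoamperes) for each base; not real pore values
LEVELS_PA = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 65.0}


def segment(signal: list[float], jump: float = 8.0) -> list[float]:
    """Split a current trace into events at jumps larger than `jump`; return event means."""
    if not signal:
        return []
    means, current = [], [signal[0]]
    for x in signal[1:]:
        if abs(x - current[-1]) > jump:
            means.append(sum(current) / len(current))
            current = [x]
        else:
            current.append(x)
    means.append(sum(current) / len(current))
    return means


def call_bases(event_means: list[float]) -> str:
    """Assign each event to the base whose reference level is closest."""
    return "".join(
        min(LEVELS_PA, key=lambda base: abs(LEVELS_PA[base] - level))
        for level in event_means
    )


trace = [80, 81, 79, 65, 64, 66, 110, 111, 95, 96]
print(call_bases(segment(trace)))  # -> ATGC
```

This naive segmentation would merge a run of identical bases into a single event. That is one reason real basecallers do not work this way.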
Illumina has also entered the long-read market. It launched a high-performance long-read assay in January 2022 and followed it with the Illumina Complete Long-Reads system in September 2022. Element Biosciences and MGI are other companies seeking to enter the LRS device market.
Advantages of Long-Read Sequencing
There are several inherent benefits of using longer reads to analyze genomic data. Some of the benefits of using this technique in clinical genome analysis are discussed below:
It is difficult to detect and quantify certain features of individual genomes using SRS technologies. For instance, SRS struggles to resolve repetitive regions, large insertions or deletions of DNA, highly polymorphic regions, large and complex rearrangements, and low-complexity regions with little nucleotide diversity. Because long reads span much larger genomic regions, LRS can detect these variations far more easily. Such variants may well be clinically significant.
The human genome is the complete set of nucleic acid sequences of a human and comprises more than three billion DNA base pairs. Assembling a genome from short reads is like solving a complex jigsaw puzzle, and it can be extremely challenging because many fragments share similar sequences. Long-read data overcome this challenge: each read is more likely to be unique, so reads can be assembled with less ambiguity and fewer errors. This improvement in genome assembly helps researchers identify the genetic causes of diseases more effectively.
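The repeat problem described above can be illustrated with exact string matching. In the hypothetical genome below, a 9-bp segment occurs twice. A short read that falls entirely inside the repeat matches two locations, whereas a longer read spanning the repeat and its unique flanks matches only one. The sequences are invented, and real read mappers use approximate rather than exact matching.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every 0-based position where `read` occurs exactly in `genome`."""
    return [i for i in range(len(genome) - len(read) + 1) if genome.startswith(read, i)]


REPEAT = "GATTACAGC"  # hypothetical repeated segment
genome = "TTGCA" + REPEAT + "CCTAG" + REPEAT + "AATCG"

short_read = "TTACAG"     # lies entirely inside the repeat
long_read = genome[2:16]  # spans the first repeat copy plus unique flanking bases

print(placements(genome, short_read))  # -> [7, 21]: ambiguous
print(placements(genome, long_read))   # -> [2]: unique
```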
In reproductive medicine, haplotype phasing is used to determine which genetic variants lie on the same copy of a chromosome. SRS can only approximate phasing. LRS, by contrast, provides the information needed to determine haplotypes without additional statistical inference, special sample preparation, or maternal/paternal sequencing.
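Below is a minimal Python sketch of read-based phasing for two heterozygous sites. Each read is reduced to the alleles it shows at site 1 and site 2, with `None` where it does not cover a site. Only reads covering both sites can link alleles. Reads too short to reach both sites therefore yield no phase information, while spanning long reads vote for one of the two possible arrangements. The alleles, the read data, and the simple vote are illustrative assumptions, not a description of any particular phasing tool.

```python
from collections import Counter
from typing import Optional

Read = tuple[Optional[str], Optional[str]]  # (allele at site 1, allele at site 2)


def phase_two_sites(
    reads: list[Read],
    site1: tuple[str, str] = ("A", "G"),
    site2: tuple[str, str] = ("C", "T"),
) -> Optional[list[tuple[str, str]]]:
    """Return the two haplotypes supported by most spanning reads, or None if undetermined."""
    counts = Counter((a, b) for a, b in reads if a is not None and b is not None)
    # Arrangement 1 ("cis"): site1[0] with site2[0] on one copy, site1[1] with site2[1] on the other
    cis = counts[(site1[0], site2[0])] + counts[(site1[1], site2[1])]
    # Arrangement 2 ("trans"): the alleles paired the other way round
    trans = counts[(site1[0], site2[1])] + counts[(site1[1], site2[0])]
    if cis == trans:
        return None  # no spanning reads, or a tie: reads alone cannot decide
    if cis > trans:
        return [(site1[0], site2[0]), (site1[1], site2[1])]
    return [(site1[0], site2[1]), (site1[1], site2[0])]


short_reads = [("A", None), (None, "T"), ("G", None), (None, "C")]
long_reads = [("A", "T"), ("G", "C"), ("A", "T"), ("G", "C"), ("A", "C")]

print(phase_two_sites(short_reads))  # -> None
print(phase_two_sites(long_reads))   # -> [('A', 'T'), ('G', 'C')]
```

The single discordant long read, `("A", "C")`, is outvoted. A simple majority tolerates occasional read errors.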
New Opportunities Presented by LRS Technologies
Unlike other sequencing platforms, Oxford Nanopore devices detect electrical rather than optical signals. This allows much smaller devices, some about the size of a USB memory stick, which makes them highly portable. Most other sequencers, including SRS systems, are relatively large free-standing machines or desktop instruments.
Compared with SRS systems, both Oxford Nanopore and PacBio offer faster sequencing runs. For instance, PacBio can complete an entire sequencing workflow, from sample preparation to analysis, in less than 24 hours. Nanopore devices offer real-time analysis and let users set the desired run time of an experiment. LRS therefore offers users flexibility as well as speed.
LRS technologies can detect epigenetic modifications simultaneously with the sequence, and Nanopore devices can also sequence RNA directly. SRS systems, in contrast, require additional sequencing runs to retrieve such information.
Applications of Long-Read Sequencing
Although longer marker genes contain more phylogenetic and taxonomic information, they are difficult to sequence in full using SRS. LRS can not only identify target organisms but also help determine biotic interactions. Long PacBio reads of the cyanobacterial rbcL gene improved phylogenetic and taxonomic resolution and led to the discovery of two novel cyanobacterial clades.
PacBio sequencing enabled accurate phylogenetic placement of fungal taxa that could not be identified to the order or phylum level by similarity comparison. LRS is also used to assess therapeutic success; for example, it has enabled real-time identification of antibiotic resistance in Neisseria gonorrhoeae.
In plant pathology, LRS has been used to identify insect vectors and to detect plum pox virus in infected plants. PacBio has been used to sequence long, microsatellite-rich markers and to detect multiple antiretroviral drug-resistant genotypes in an HIV-positive patient. LRS has also been widely applied in metagenomics and gene expression studies; for instance, the PacBio Iso-Seq method uncovered viral mutations that caused differential immune responses among cells.