Base-Pair Resolution Benchmark Uncovers Structural Variation Complexity in Tomato Genomes

Structural variations (SVs), including insertions, deletions, inversions, and substitutions, can profoundly influence gene regulation and phenotype. In plants, these variants are particularly important because plant genomes often contain large proportions of repetitive sequences, which complicate accurate genome alignment and variant detection. Although long-read sequencing has improved genome assembly, current SV detection algorithms still struggle to resolve complex variants, especially in repetitive regions. As a result, many SVs are inaccurately located, incorrectly classified, or entirely missed, limiting their usefulness in genetic studies and breeding programs. Given these challenges, there is a clear need to systematically characterize complex SVs and establish reliable standards for their detection and interpretation.

Researchers from Northeast Agricultural University and collaborating institutions reported (DOI: 10.1093/hr/uhaf107) on April 16, 2025, in Horticulture Research, that they have generated the first base-pair–resolution benchmark of complex SVs in tomato genomes. By integrating 14 variant-detection pipelines with extensive manual inspection, the team precisely resolved thousands of ambiguous genomic regions. Their findings show that most existing detection methods perform poorly in repetitive plant genomes, highlighting the need for new standards and improved algorithms to accurately capture genome diversity relevant to crop breeding and functional genomics.

The study began with the construction of a high-quality tomato genome assembly using long-read sequencing, providing a reliable foundation for variant analysis. Researchers then compared this genome with the reference tomato genome using 14 widely used SV detection pipelines, initially identifying more than 30,000 candidate variants. Through careful visualization and manual consolidation, these were refined into 4,532 structurally complex regions.

A major finding was that repetitive DNA caused widespread errors in variant detection. Misaligned copies often led to false deletions, insertions, or inversions, while breakpoint positions varied substantially among algorithms. To overcome this, the team anchored variant boundaries using uniquely aligned sequences flanking repetitive regions, enabling precise breakpoint identification.
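The anchoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Block` record, its `unique` flag, and the `anchor_interval` function are hypothetical names introduced here, and the sketch assumes each alignment block has already been labeled as uniquely or ambiguously aligned. Given a candidate variant interval that falls in repetitive sequence, it widens the interval outward to the nearest uniquely aligned flank on each side.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical alignment block on the reference (0-based, half-open)."""
    ref_start: int
    ref_end: int
    unique: bool  # assumed to be precomputed: True if the block aligns to one place only


def anchor_interval(
    blocks: List[Block], start: int, end: int
) -> Optional[Tuple[int, int]]:
    """Widen [start, end) to the nearest uniquely aligned flanks.

    Returns (left_anchor_end, right_anchor_start), or None when no unique
    block exists on one of the two sides.
    """
    # Ends of unique blocks that finish at or before the candidate start.
    left = [b.ref_end for b in blocks if b.unique and b.ref_end <= start]
    # Starts of unique blocks that begin at or after the candidate end.
    right = [b.ref_start for b in blocks if b.unique and b.ref_start >= end]
    if not left or not right:
        return None
    # Closest unique flank on each side bounds the ambiguous region.
    return max(left), min(right)


# Toy example: a repeat-derived call at 150..230 sits between unique flanks.
example_blocks = [
    Block(0, 100, True),
    Block(100, 180, False),
    Block(220, 300, False),
    Block(300, 400, True),
]
anchored = anchor_interval(example_blocks, 150, 230)
```

In this toy case the candidate interval is widened to the region bounded by the two unique flanks, so callers that place the breakpoint at different positions inside the repeat would all be compared over the same anchored region.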

Ultimately, 1,635 bona fide structural variants were resolved at base-pair resolution. These included insertions, deletions, inversions, and, importantly, substitutions, which the authors propose as a fundamental SV type often overlooked in plant genomics. The study also revealed that SVs preferentially occur in AT-rich regulatory regions rather than coding sequences and frequently overlap genes involved in defense responses. When evaluated against this benchmark, existing detection tools achieved surprisingly low accuracy, underscoring a critical gap between current methods and the true complexity of plant genomes.
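To make the idea of scoring a tool against a base-pair-resolution benchmark concrete, here is a small sketch. The article does not state which metrics or matching rules the study used, so this assumes the strictest possible rule: a call counts as correct only if its chromosome, start, end, and type all match a benchmark entry exactly. The function name `score_calls` and the tuple layout are illustrative assumptions, and the coordinates are made up.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (chromosome, start, end, SV type)
SV = Tuple[str, int, int, str]


def score_calls(benchmark: Iterable[SV], calls: Iterable[SV]) -> Dict[str, float]:
    """Exact-match precision, recall, and F1 of calls against a benchmark."""
    truth = set(benchmark)
    predicted = set(calls)
    true_positives = len(truth & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up toy data: one call matches exactly, one is off by two bases.
toy_benchmark = [
    ("chr1", 100, 300, "DEL"),
    ("chr1", 500, 500, "INS"),
    ("chr2", 10, 90, "INV"),
]
toy_calls = [
    ("chr1", 100, 300, "DEL"),
    ("chr1", 498, 500, "INS"),
]
toy_scores = score_calls(toy_benchmark, toy_calls)
```

Under exact matching, a breakpoint that is off by even a few bases counts as both a false positive and a missed variant, which is one way shifted breakpoints in repeats can depress a tool's measured accuracy.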

"Structural variation has long been recognized as important, but its real complexity has been underestimated in plant genomes," said one of the study's senior authors. "By resolving these variants at base-pair resolution, we show that many apparent genome changes reported by algorithms are artifacts of repetitive sequence misalignment. Our benchmark provides a clear standard for evaluating detection methods and highlights the urgent need for algorithms specifically designed for complex plant genomes. This work moves us closer to accurately linking genome variation with agronomic traits."

Accurately resolving structural variation is essential for modern crop genetics, from identifying trait-associated loci to building reliable pangenomes. The benchmark developed in this study offers a critical reference for improving SV detection algorithms and training artificial intelligence–based tools tailored to plant genomes. By clarifying how and where structural variants arise, the findings also enhance our understanding of genome evolution and adaptation. In practical terms, this work supports more precise genome-wide association studies and breeding strategies, enabling researchers to better exploit hidden genetic diversity for crop improvement, resilience, and quality enhancement in tomatoes and other plant species.

Journal reference:

Cui, X., et al. (2025). The nature of complex structural variations in tomatoes. Horticulture Research. doi: 10.1093/hr/uhaf107. https://academic.oup.com/hr/article/12/7/uhaf107/8114335
