1,116 Genomes Mapped to Reveal Hidden Human DNA Variation

A massive new pangenome maps hidden DNA variation across more than a thousand genomes, revealing how complex genetic changes shape human biology and disease risk.

Study: The 1000 Chinese Pangenome empowers medical and population genetics. Image credit: vectorfusionart/Shutterstock.com

A recent study published in Nature reports the creation of one of the most comprehensive human pangenomes to date, based on diploid genome assemblies from 1,116 individuals recruited from a health examination cohort in Wenzhou, China.

Developed under the 1000 Chinese Pangenome (1KCP) project, this extensive resource captures a broader spectrum of genetic diversity, including rare and complex variants that are difficult to represent or detect with linear reference genomes and conventional short-read approaches. By expanding non-reference sequences and improving variant representation, the 1KCP pangenome provides a resource to support studies of human genetics, disease-relevant genetic variation, and gene regulatory effects.

Limitations Of Conventional Sequencing Approaches

Researchers are continuously working to capture the full extent of human genetic diversity and its relevance to health. Short-read sequencing reliably detects small variants but often overlooks larger, complex genomic alterations such as structural variants (SVs) and tandem repeats (TRs).

In contrast, long-read sequencing combined with advanced assembly methods enables more complete diploid genome reconstruction, improving resolution of these regions. However, most current pangenomes are based on limited sample sizes and underrepresent rare, complex, and population-specific variants.

Building A Large-Scale Chinese Pangenome Resource

In this study, researchers built a large, population-scale pangenome and systematically characterized genetic variation.

The team integrated short-read and long-read whole-genome sequencing (WGS) and RNA sequencing to characterize genetic variation across the cohort, adding Hi-C data for a subset of high-coverage samples to support diploid assembly and phasing. They then performed principal component analysis to assess population structure. Using these data, they generated high-quality diploid genome assemblies by combining de novo assembly of deeply sequenced samples with a pangenome-informed assembly workflow (PIGA) that scaled the approach across the cohort.

The investigators used a population-guided assembly strategy that combined variant calling, haplotype phasing, and personalized reference construction to improve the detection of complex variants. They then built 1KCP, capturing sequences absent from the reference genomes GRCh38 and CHM13. Comprehensive quality control, including k-mer-based filtering and benchmarking, ensured assembly accuracy and variant reliability.

Comprehensive Mapping Of Genomic Variation

Using path-guided pangenome annotation, the authors mapped genes, repeats, and regulatory elements across sequences and characterized diverse variant classes, including single-nucleotide variants, SVs, TRs, and embedded variants. They also visualized complex genomic loci, compared findings with genome-wide association study data, and conducted detailed analyses of highly polymorphic immune-related regions.

They further examined medically relevant genes and complex genomic regions, and performed expression quantitative trait locus (eQTL) analyses using matched RNA sequencing data to assess functional effects. Lastly, the researchers developed a pan-variant imputation reference panel to improve resolution in genetic association studies.

Large-Scale Genome Assemblies And Novel Sequences

The team produced 1,116 high-quality genome assemblies: 55 assembled de novo and 1,061 produced using the pangenome-informed strategy. Collectively, these represented 2,232 haplotypes, with genomes averaging approximately 3.0 Gb. The 1KCP pangenome spanned 3.74 Gb and revealed approximately 405 million base pairs absent from both the complete hydatidiform mole 13 (CHM13) and Genome Reference Consortium Human Build 38 (GRCh38) references. Of these base pairs, nearly 26 million were predicted to have functional roles in genes or regulatory regions.
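The counts in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded figures quoted in this article. The variable names are illustrative, and the "functional share" is simply the ratio of two rounded numbers, not a value reported by the study.

```python
# Figures as quoted in the article (rounded, not exact study output).
de_novo_assemblies = 55
pangenome_informed_assemblies = 1_061
non_reference_bp = 405_000_000        # "approximately 405 million base pairs"
predicted_functional_bp = 26_000_000  # "nearly 26 million"

total_assemblies = de_novo_assemblies + pangenome_informed_assemblies
haplotypes = 2 * total_assemblies     # diploid: two haplotypes per individual
functional_share = predicted_functional_bp / non_reference_bp

print(total_assemblies)               # 1116
print(haplotypes)                     # 2232
print(f"{functional_share:.1%}")      # 6.4%
```

By this rough estimate, about 6% of the newly captured sequence was predicted to fall in genes or regulatory regions.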

Discovery Of Rare And Complex Genetic Variants

Across this resource, the authors identified extensive and previously undercharacterized patterns of human genomic variation, comprising 110,530 SVs, 35 million small genomic variants, 485,575 TRs, and nearly 0.9 million variants nested in complex genomic regions. Approximately one-third (33.3%) of SVs were previously unreported, with most showing low allele frequencies (≤0.01), underscoring improved sensitivity for population-specific variation.
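To put the frequency threshold in concrete terms, the sketch below converts it into haplotype counts for a cohort of this size. It assumes allele frequency is computed over all 2,232 assembled haplotypes with no missing calls, which the article does not state, and the helper name is hypothetical.

```python
import math

HAPLOTYPES = 2_232        # 1,116 diploid individuals
TOTAL_SVS = 110_530
NOVEL_FRACTION = 0.333    # "approximately one-third (33.3%)"

def max_carrier_haplotypes(af_threshold: float, n_haplotypes: int) -> int:
    """Largest allele count whose frequency does not exceed the threshold."""
    # The small epsilon guards against floating-point error at exact boundaries.
    return math.floor(af_threshold * n_haplotypes + 1e-9)

max_copies = max_carrier_haplotypes(0.01, HAPLOTYPES)
approx_novel_svs = round(TOTAL_SVS * NOVEL_FRACTION)

print(max_copies)         # 22
print(approx_novel_svs)   # 36806
```

Under these assumptions, a variant at or below the 0.01 threshold appears on at most 22 of the 2,232 haplotypes, and the unreported SVs number roughly 36,800.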

Beyond variant discovery, the study assessed functional impact across multiple genomic layers. These analyses identified 5,239 exonic structural variants affecting 3,326 protein-coding genes, with enrichment of rare alleles consistent with selective constraint. Analyses of TRs and gene clusters further highlighted widespread variability, particularly in immune-related genomic regions.

Functional Insights From Gene Regulation Analyses

To evaluate regulatory consequences, the authors performed eQTL mapping, which revealed 3,256 lead associations involving complex variants, including TRs, SVs, and embedded variants. This underscores the role of structural and repeat variation in gene regulation and expression variability. The authors also observed clinically relevant rare gene-altering SVs in disease-associated genes such as partner and localizer of BRCA2 (PALB2, related to breast cancer) and solute carrier family 34 member 3 (SLC34A3, related to bone disease). In addition, they detected multiple TR expansions linked to genomic instability and fragile site formation.
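The core idea behind an eQTL test can be sketched as a regression of gene expression on the number of variant copies each person carries. The example below is a minimal illustration with invented toy data, not the study's method: the article does not describe the statistical model, and real analyses typically add covariates, expression normalization, and multiple-testing correction. The function name is hypothetical.

```python
# Toy data, invented for illustration only (not from the study).
# Dosage = copies of the alternate allele (for example, an SV) per person.
dosages = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
expression = [5.1, 4.9, 6.0, 6.2, 5.8, 7.1, 6.9, 5.0, 6.1, 7.0]

def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = ols_slope_intercept(dosages, expression)
print(f"expression change per allele copy: {slope:.2f}")
```

A slope that differs reliably from zero suggests the variant is associated with expression of the gene. The "lead" association for a gene is conventionally the variant with the strongest such signal.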

Immune Region Diversity And Improved Genetic Tools

At the population level, detailed human leukocyte antigen (HLA) and gene cluster analyses revealed high-resolution haplotype diversity, particularly in immune-related regions, capturing complex linkage patterns that had not been previously resolved.

Lastly, benchmarking confirmed high accuracy across variant classes and improved detection of complex variation compared with conventional approaches. The resulting 1KCP pan-variant imputation reference panel enhanced resolution for downstream genetic studies, enabling more comprehensive capture of structural, repeat, and embedded variation across the genome.

Implications For Genomic Research And Precision Medicine

Based on the findings, the 1KCP pangenome represents a major advance in mapping human genetic diversity, capturing rare and complex variants with important disease implications. By improving variant resolution, it could sharpen variant interpretation and support more comprehensive genetic association and diagnostic analyses.

Overall, it provides a scalable foundation for future genomic research and advances population-informed precision medicine, although some limitations remain in resolving highly repetitive regions and certain variant classes.

Journal Reference

Wang, Y., Duan, Z., Chen, D. et al. (2026). The 1000 Chinese Pangenome empowers medical and population genetics. Nature. DOI: 10.1038/s41586-026-10315-y. https://www.nature.com/articles/s41586-026-10315-y

Citations

Please use one of the following formats to cite this article in your essay, paper or report:

  • APA

    Toshniwal Paharia, Pooja Toshniwal Paharia. (2026, April 16). 1,116 Genomes Mapped to Reveal Hidden Human DNA Variation. AZoLifeSciences. Retrieved on April 16, 2026 from https://www.azolifesciences.com/news/20260416/1116-Genomes-Mapped-to-Reveal-Hidden-Human-DNA-Variation.aspx.

