By Pooja Toshniwal Paharia. Reviewed by Lauren Hardaker. Apr 16 2026
A massive new pangenome maps hidden DNA variation across thousands of genomes, revealing how complex genetic changes shape human biology and disease risk.
Study: The 1000 Chinese Pangenome empowers medical and population genetics.
A recent study published in Nature reports the creation of one of the most comprehensive human pangenomes to date, based on diploid genome assemblies from 1,116 individuals recruited from a health examination cohort in Wenzhou, China.
Developed under the 1000 Chinese Pangenome (1KCP) project, this extensive resource captures a broader spectrum of genetic diversity, including rare and complex variants that are difficult to represent or detect with linear reference genomes and conventional short-read approaches. By expanding non-reference sequences and improving variant representation, the 1KCP pangenome provides a resource to support studies of human genetics, disease-relevant genetic variation, and gene regulatory effects.
Limitations Of Conventional Sequencing Approaches
Researchers are continuously working to capture the full extent of human genetic diversity and its relevance to health. Short-read sequencing reliably detects small variants but often overlooks larger, complex genomic alterations such as structural variants (SVs) and tandem repeats (TRs).
In contrast, long-read sequencing combined with advanced assembly methods enables more complete diploid genome reconstruction, improving resolution of these regions. However, most current pangenomes are based on limited sample sizes and underrepresent rare, complex, and population-specific variants.
Building A Large-Scale Chinese Pangenome Resource
In this study, researchers built a large, population-scale pangenome and systematically characterized genetic variation.
The team combined short-read and long-read whole-genome sequencing (WGS) with RNA sequencing to characterize genetic variation across the cohort, and added Hi-C data for a subset of high-coverage samples to support diploid assembly and phasing. They then performed principal component analysis to assess population structure. Using these data, they generated high-quality diploid genome assemblies by pairing de novo assembly of deeply sequenced samples with a pangenome-informed assembly workflow (PIGA) that scaled the approach across the cohort.
The investigators used a population-guided assembly strategy that combined variant calling, haplotype phasing, and personalized reference construction to improve the detection of complex variants. They then built 1KCP, capturing sequences absent from the reference genomes GRCh38 and CHM13. Comprehensive quality control, including k-mer-based filtering and benchmarking, ensured assembly accuracy and variant reliability.
Comprehensive Mapping Of Genomic Variation
Using path-guided pangenome annotation, the authors mapped genomic elements across sequences and characterized diverse variant types. These included single-nucleotide variants, SVs, TRs, and embedded variants. They also visualized complex genomic loci, compared findings with genome-wide association study data, and conducted detailed analyses of highly polymorphic immune-related regions.
The team also annotated genes, repeats, and regulatory elements. They further examined medically relevant genes and complex genomic regions, and performed expression quantitative trait locus (eQTL) analyses using matched RNA sequencing data to assess functional effects. Lastly, the researchers developed a pan-variant imputation reference panel to improve resolution in genetic association studies.
Large-Scale Genome Assemblies And Novel Sequences
The team produced 1,116 high-quality genome assemblies: 55 assembled de novo and 1,061 generated with the pangenome-informed strategy. Collectively, these represented 2,232 haplotypes, with genomes averaging approximately 3.0 Gb. The 1KCP pangenome spanned 3.74 Gb and revealed approximately 405 million base pairs absent from both the complete hydatidiform mole 13 (CHM13) assembly and the Genome Reference Consortium Human Build 38 (GRCh38). Of these base pairs, nearly 26 million were predicted to have functional roles in genes or regulatory regions.
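The figures in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only the rounded numbers quoted in the article; the variable names are illustrative, and the two derived percentages are back-of-the-envelope estimates rather than values reported by the study.

```python
# Rounded figures as quoted in the article (not exact study values).
de_novo_assemblies = 55
pangenome_informed_assemblies = 1_061

# Each diploid assembly contributes two haplotypes.
total_assemblies = de_novo_assemblies + pangenome_informed_assemblies
haplotypes = 2 * total_assemblies

pangenome_bp = 3.74e9          # total span of the 1KCP pangenome
non_reference_bp = 405e6       # sequence absent from CHM13 and GRCh38
functional_non_ref_bp = 26e6   # non-reference sequence predicted functional

# Derived shares (illustrative estimates based on rounded inputs).
non_reference_share = non_reference_bp / pangenome_bp
functional_share = functional_non_ref_bp / non_reference_bp

print(f"assemblies: {total_assemblies}, haplotypes: {haplotypes}")
print(f"non-reference share of pangenome: {non_reference_share:.1%}")
print(f"predicted-functional share of non-reference sequence: {functional_share:.1%}")
```

On these rounded inputs, the non-reference sequence comes to roughly a tenth of the pangenome's span, and a little over 6% of that sequence is predicted to be functional.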
Discovery Of Rare And Complex Genetic Variants
Across this resource, the authors identified extensive and previously undercharacterized patterns of human genomic variation, comprising 110,530 SVs, 35 million small genomic variants, 485,575 TRs, and nearly 0.9 million variants nested in complex genomic regions. Approximately one-third (33.3%) of SVs were previously unreported, and most of these had low allele frequencies (≤0.01), underscoring improved sensitivity for population-specific variation.
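To put the allele-frequency threshold in context, the sketch below converts it into haplotype counts. It assumes, purely for illustration, that allele frequency is computed over all 2,232 haplotypes in the cohort with no missing genotypes; the article does not describe how the study calculated frequencies, and the function name is hypothetical.

```python
import math

# Figures quoted in the article.
haplotypes = 2_232
total_svs = 110_530
unreported_fraction = 0.333
max_allele_frequency = 0.01

# Approximate number of previously unreported SVs.
unreported_svs = round(total_svs * unreported_fraction)

# Largest allele count that still satisfies AF <= 0.01 under the assumption
# that every haplotype in the cohort is genotyped (an assumption, see above).
max_allele_count = math.floor(max_allele_frequency * haplotypes)


def allele_frequency(allele_count: int, n_haplotypes: int = haplotypes) -> float:
    """Return the fraction of haplotypes carrying the allele."""
    return allele_count / n_haplotypes


print(f"approx. unreported SVs: {unreported_svs}")
print(f"max carrier haplotypes at AF <= 0.01: {max_allele_count}")
print(f"AF at that count: {allele_frequency(max_allele_count):.4f}")
```

Under that assumption, a variant at or below the 0.01 threshold would appear on no more than 22 of the 2,232 haplotypes, which illustrates why a cohort of this size helps in detecting rare variants.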
Beyond variant discovery, the study assessed functional impact across multiple genomic layers. Importantly, functional analyses identified 5,239 exonic structural variants affecting 3,326 protein-coding genes, with enrichment of rare alleles consistent with selective constraint. Analyses of TRs and gene clusters further highlighted widespread variability, particularly in immune-related genomic regions.
Functional Insights From Gene Regulation Analyses
To evaluate regulatory consequences, the authors performed eQTL mapping, which revealed 3,256 lead associations involving complex variants, including TRs, SVs, and embedded variants. This result underscores the role of structural and repeat variation in gene regulation and expression variability. The authors also observed clinically relevant rare gene-altering SVs in disease-associated genes such as partner and localizer of BRCA2 (PALB2, linked to breast cancer) and solute carrier family 34 member 3 (SLC34A3, linked to bone disease). In addition, they detected multiple TR expansions linked to genomic instability and fragile site formation.
Immune Region Diversity And Improved Genetic Tools
At the population level, detailed human leukocyte antigen (HLA) and gene cluster analyses revealed high-resolution haplotype diversity, particularly in immune-related regions, capturing complex linkage patterns that had not been previously resolved.
Lastly, benchmarking confirmed high accuracy across variant classes and improved detection of complex variation compared with conventional approaches. The resulting 1KCP pan-variant imputation reference panel enhanced resolution for downstream genetic studies, enabling more comprehensive capture of structural, repeat, and embedded variation across the genome.
Implications For Genomic Research And Precision Medicine
Based on the findings, the 1KCP pangenome represents a major advance in mapping human genetic diversity, capturing rare and complex variants with important disease implications. By improving variant resolution, it could aid variant interpretation and support more comprehensive genetic association and diagnostic analyses.
Overall, it provides a scalable foundation for future genomic research and advances population-informed precision medicine, although some limitations remain in resolving highly repetitive regions and certain variant classes.
Journal Reference
Wang, Y., Duan, Z., Chen, D. et al. (2026). The 1000 Chinese Pangenome empowers medical and population genetics. Nature. DOI: 10.1038/s41586-026-10315-y. https://www.nature.com/articles/s41586-026-10315-y