CNSA Advances Open Biological Data Infrastructure at Petabyte Scale

Advances in sequencing technologies have transformed life science research, enabling multi-layered exploration of biological systems across species, tissues, and developmental stages. However, this data explosion presents major challenges, including the need for standardized storage formats, quality control, cross-database interoperability, and scalable data delivery. Large-scale global genome sequencing projects, such as the Earth BioGenome Project and numerous organism-wide genomic initiatives, depend on reliable systems capable of handling diverse data types ranging from whole genomes to spatial transcriptomes. Traditional repositories alone are insufficient to support these broad and evolving needs. These challenges call for further research into efficient multi-omics data archiving and open-sharing frameworks.

Researchers from the China National GeneBank (CNGB) published the 2024 update of the China National GeneBank Sequence Archive (CNSA) in Horticulture Research on May 1, 2025 (DOI: 10.1093/hr/uhaf036). The report details major advances in CNSA's data scale, data types, visualization tools, international certification, and role in supporting global multi-omics research. CNSA now archives more than 16.3 petabytes of biological data from over 560 institutions worldwide, making it one of the largest open-access repositories for life science data.

CNSA provides public archiving and open-sharing services for a broad spectrum of biological data, including genome assemblies, raw sequencing reads, gene expression matrices, variation data, metabolomics profiles, viral sequences, and single-cell and spatial transcriptomic datasets. As of August 2024, the repository includes 1,122,067 samples, 1,766,269 sequencing datasets, and 125,855 genome assemblies, representing 7,521 species, supported by 47 sequencing platforms. A key update is the addition of a spatial transcriptomics archiving system, which captures tissue section metadata, image files, barcoding information, and spatial gene expression matrices, integrated with an online viewer that enables cell-type annotation, spatial region segmentation, and cell–cell interaction analysis. CNSA now supports high-speed data access through FTP, HTTPS, and Aspera transfer protocols, and has received formal certifications including CoreTrustSeal, FAIRsharing, and re3data, demonstrating global compliance with data management and preservation standards. CNSA also contributes to major international projects such as the 10KP Plant Genome Project, the Earth BioGenome Project, and the SpatioTemporal Omics Consortium, accelerating discovery across evolution, agriculture, ecology, and human health.
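To put the August 2024 figures in proportion, the short sketch below derives a few simple averages from the counts quoted above. The dictionary keys and the `summarize` helper are illustrative names, not part of any CNSA tool or API, and the results are plain averages that say nothing about how samples or assemblies are actually distributed across species.

```python
# Headline counts quoted in the article (as of August 2024).
# Key names are illustrative only; they are not CNSA field names.
CNSA_COUNTS = {
    "samples": 1_122_067,
    "sequencing_datasets": 1_766_269,
    "genome_assemblies": 125_855,
    "species": 7_521,
}


def summarize(counts):
    """Return simple averages derived from the headline counts."""
    return {
        "datasets_per_sample": counts["sequencing_datasets"] / counts["samples"],
        "assemblies_per_species": counts["genome_assemblies"] / counts["species"],
        "samples_per_species": counts["samples"] / counts["species"],
    }


if __name__ == "__main__":
    for name, value in summarize(CNSA_COUNTS).items():
        print(f"{name}: {value:.2f}")
```

On these numbers, the archive holds roughly one and a half sequencing datasets per sample and on the order of 17 genome assemblies per species, although real repositories are typically dominated by a small number of heavily sequenced organisms.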

"Open and well-curated biological data resources are essential to advancing global scientific collaboration," the authors noted. "The continued development of CNSA reflects the growing need to archive, preserve, and share complex multi-omics datasets at scale. By integrating quality control systems, standardized metadata formats, visualization platforms, and international interoperability frameworks, CNSA provides researchers worldwide with the tools required to accelerate genome science and biodiversity conservation."

The updated CNSA platform supports broad research applications in plant and animal genomics, crop breeding, evolutionary biology, microbial ecology, medical research, environmental monitoring, and biodiversity protection. Its open-access structure encourages data reuse, reduces duplication, and supports integrative analyses that combine genomics, transcriptomics, phenotyping, and spatial mapping. Future developments will integrate artificial intelligence-assisted data curation, application programming interfaces (APIs), and cloud computing platforms to enable large-scale data analysis without requiring local storage. These advancements will further enhance CNSA's role as a critical global infrastructure for accelerating biological discovery and supporting sustainable management of genetic resources.

Journal reference:

Wang, W., et al. (2025). The China national GeneBank sequence archive (CNSA) 2024 update. Horticulture Research. DOI: https://doi.org/10.1093/hr/uhaf036. https://academic.oup.com/hr/article/doi/10.1093/hr/uhaf036/8003504
