The “Unknome” Database Catalogs Proteins Whose Functions Are Still Unknown

UK researchers hope that a new, freely accessible database they have developed will shrink rather than grow over time. That is because it catalogs the many understudied proteins encoded by genes in the human genome: proteins whose existence is recognized but whose functions are mostly unknown.

Image Credit: vchal/

The “unknome” database, created by Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge, England, Matthew Freeman of the Dunn School of Pathology at the University of Oxford, and colleagues, is described in the open access journal PLOS Biology. Their own investigation of a selection of proteins from the database found that a significant portion of those tested support critical biological processes, such as growth and resistance to stress.

The sequence of the human genome has shown that it encodes thousands of likely protein sequences whose identities and functions are still unknown. Several factors explain this, including the tendency to concentrate limited research funding on targets that are already well known and the lack of tools, such as antibodies, for probing what these proteins do in cells.

The risks of ignoring these proteins are substantial, however, according to the scientists, since it is likely that some, if not many, play important roles in critical cell processes and could provide both insight and potential targets for therapeutic intervention.

To encourage more rapid exploration of such proteins, the authors developed the unknome database, which assigns each protein a “knownness” score based on information from the scientific literature about function, conservation across species, subcellular compartmentalization, and other factors.

By this measure, almost nothing is known about thousands of proteins. The database covers proteins from model organisms as well as those from the human genome. It is freely accessible and customizable: users can assign their own weights to the various factors and generate their own set of knownness scores to prioritize their research.
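The idea of user-adjustable weights can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the category names, the evidence values, the weights, and the weighted-sum formula are hypothetical and are not taken from the Unknome database or the paper.

```python
# Hypothetical illustration only: the categories, evidence values, weights,
# and the weighted-sum formula are invented for this sketch and are NOT the
# Unknome database's actual scoring scheme.
from typing import Dict


def knownness(evidence: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return a weighted sum of per-category evidence.

    Categories that have no entry in `weights` contribute nothing.
    """
    return sum(weights.get(category, 0.0) * value
               for category, value in evidence.items())


# Made-up evidence for one protein: nothing known about function,
# some information on conservation and subcellular localization.
protein = {"function": 0.0, "conservation": 2.0, "localization": 1.0}

# Two alternative weightings a user might choose.
default_weights = {"function": 1.0, "conservation": 0.5, "localization": 0.5}
function_only = {"function": 1.0, "conservation": 0.0, "localization": 0.0}

default_score = knownness(protein, default_weights)
function_score = knownness(protein, function_only)
print(default_score, function_score)
```

Under the function-only weighting, the same protein scores lower, which shows how a user's choice of weights could change which proteins rank as least known.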

To evaluate the database's usefulness, the scientists selected 260 human genes that had equivalent genes in flies and knownness scores of 1 or less in both species, indicating that little or nothing was known about them.
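The selection rule described here (a fly equivalent exists, and the score is 1 or less in both species) can be sketched as a simple filter. The gene names, scores, and the `Gene` record are hypothetical placeholders, not data from the study or the database's real schema.

```python
# Hypothetical sketch of the selection rule; gene names and scores are
# invented placeholders, not data from the study.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    name: str
    human_score: float
    fly_score: Optional[float]  # None when no equivalent fly gene exists


def select_candidates(genes: List[Gene], threshold: float = 1.0) -> List[str]:
    """Keep genes with a fly equivalent and a score <= threshold in both species."""
    return [
        g.name
        for g in genes
        if g.fly_score is not None
        and g.human_score <= threshold
        and g.fly_score <= threshold
    ]


genes = [
    Gene("GENE_A", 0.5, 1.0),   # poorly known in both species: kept
    Gene("GENE_B", 0.0, None),  # no fly equivalent: dropped
    Gene("GENE_C", 0.5, 3.0),   # well characterized in flies: dropped
    Gene("GENE_D", 4.0, 0.0),   # well characterized in humans: dropped
]

candidates = select_candidates(genes)
print(candidates)
```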

For many of these genes, a complete knockout was incompatible with life in the fly. Partial or tissue-specific knockdowns revealed that a significant portion of them contribute to crucial processes influencing fertility, development, tissue growth, protein quality control, or stress resistance.

The findings imply that, even after decades of in-depth research, thousands of fly genes still lack even the most basic functional characterization, and the same is undoubtedly true for the human genome.

These uncharacterized genes have not deserved their neglect. Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents.

Sean Munro, Group Leader, Medical Research Council Laboratory of Molecular Biology

Munro added, “The role of thousands of human proteins remains unclear and yet research tends to focus on those that are already well understood. To help address this we created an Unknome database that ranks proteins based on how little is known about them, and then performed functional screens on a selection of these mystery proteins to demonstrate how ignorance can drive biological discovery.”

Journal reference:

Rocha, J. J., et al. (2023). Functional unknomics: Systematic screening of conserved genes of unknown function. PLOS Biology.
