A key conundrum in biology is why many very simple creatures, such as the widely studied worm C. elegans, have almost as many genes as humans, whose genome contains about 20,000 genes.
What causes the dramatic increase in complexity between the two species, if not genes alone?
Proteomics, an area of study that focuses on identifying and classifying the protein building blocks that make up each individual cell, may hold part of the answer.
Human genes behave like highly compressed files: rather than one gene coding for one protein with one purpose, a single gene can code for hundreds of different proteins, each of which performs a specific role in the body.
This ability, called alternative splicing, is present in up to 95% of human genes.
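As a toy illustration of how one gene can yield many products, the sketch below treats a gene as an ordered list of exons, keeps the first and last exon fixed, and lets each internal exon be either included or skipped. The exon names and the assumption that every combination is produced are hypothetical simplifications; real splicing is tightly regulated and yields far fewer combinations.

```python
from itertools import product

def enumerate_isoforms(exons):
    """Return every exon combination that keeps the first and last exon.

    Toy model: each internal exon is independently included or skipped.
    """
    if len(exons) < 2:
        return [tuple(exons)]
    first, *internal, last = exons
    isoforms = []
    for mask in product([True, False], repeat=len(internal)):
        kept = [exon for exon, keep in zip(internal, mask) if keep]
        isoforms.append((first, *kept, last))
    return isoforms

# Hypothetical gene with five exons (three internal exons that can be skipped).
gene = ["E1", "E2", "E3", "E4", "E5"]
isoforms = enumerate_isoforms(gene)
print(len(isoforms))  # 2**3 = 8 combinations
```

Even in this simplified model the count doubles with every optional exon, which is why a modest number of genes can in principle encode a much larger set of proteins.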
A new study, published on March 23rd, 2023 in the journal Nature Biotechnology, quantifies the human proteome and the vast number of protein variations created by the human body. Proteomics is an essential component of biology because it provides insight into how defective proteins affect health.
The research team created a technique called “deep proteome sequencing” that characterizes, in unprecedented depth, the proteins that appear in regular proteomics experiments. Joshua Coon, a professor of biomolecular chemistry at the University of Wisconsin-Madison and an investigator at the Morgridge Institute for Research, led the project.
The investigation used six distinct kinds of human cells along with six proteases, which are enzymes that cut proteins into smaller pieces (peptides) that can subsequently be detected in the experiment.
The scientists then examined the peptides using several mass spectrometry techniques, the gold standard for identifying proteins.
The researchers identified almost 1 million peptides from 17,717 distinct protein groups. Using these data, they detected almost 80% of the individual protein sequences in those samples, a significant improvement over current techniques, which sequence only about 20%.
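To see why using several proteases can raise the fraction of a protein that is detected, consider the minimal sketch below. It digests a made-up protein with simplified rules for two well-known proteases (trypsin cutting after K or R, Glu-C cutting after E) and counts only peptides inside an assumed detectable length window of 6 to 30 residues. The sequence, the cleavage rules, and the length window are illustrative assumptions, not the enzymes or settings reported by the study.

```python
def digest(sequence, cut_after):
    """Split a protein after every residue in cut_after (simplified rule).

    Returns a list of (start, end) spans, with end exclusive.
    """
    spans, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cut_after:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(sequence):
        spans.append((start, len(sequence)))
    return spans

def coverage(sequence, spans, min_len=6, max_len=30):
    """Fraction of residues covered by peptides in the assumed length window."""
    covered = [False] * len(sequence)
    for start, end in spans:
        if min_len <= end - start <= max_len:
            for i in range(start, end):
                covered[i] = True
    return sum(covered) / len(sequence)

# Hypothetical 36-residue protein, invented for this example.
protein = "MTEYKLVVGAGGVGKSALREQHIVDEYDPTIEDSYR"

trypsin = digest(protein, cut_after="KR")  # simplified trypsin rule
glu_c = digest(protein, cut_after="E")     # simplified Glu-C rule

cov_trypsin = coverage(protein, trypsin)
cov_glu_c = coverage(protein, glu_c)
cov_combined = coverage(protein, trypsin + glu_c)

print(f"{cov_trypsin:.2f}")   # 0.75
print(f"{cov_glu_c:.2f}")     # 0.81
print(f"{cov_combined:.2f}")  # 0.92
```

Each enzyme leaves some peptides too short to count, but because the two enzymes cut at different positions, their gaps fall in different places and the combined coverage is higher than either alone.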
The ultimate goal of proteomics is to obtain this more comprehensive picture.
“In the field of mass spectrometry and proteomics, there has always been a goal of detecting all proteins that are present in a sample, then fully sequencing all the individual proteins present. But we really haven’t been detecting the whole protein, just small pieces of it.”
Joshua Coon, Investigator, Thomas and Margaret Pyle Chair in Metabolism, Morgridge Institute for Research
Coon is also a professor of biomolecular chemistry at the University of Wisconsin-Madison.
“Data generated from this study represent the deepest proteomics map collected to date,” Coon added. “These methods and resources lay the foundation for comprehensive mapping of protein diversity and are expected to catalyze future research efforts.”
Deep-sequencing.app, an online tool created by the study team and made available to the general public, allows researchers to query any gene and look up the matching peptides and protein modifications that are connected to it.
The Max Planck Institute of Biochemistry in Germany, the University of Toronto in Canada, and the Garvan Institute in Australia all contributed significantly to the project, which was principally funded by the National Institutes of Health.
Pavel Sinitcyn, a scientist from the Max Planck Institute who is currently a postdoc in the Coon Lab and a Morgridge Interdisciplinary Postdoctoral Fellow, was in charge of the extensive data analysis for the study, which produced more than five terabytes of data over ten years. Toronto investigator Benjamin Blencowe contributed his expertise in alternative splicing.
Because alternative splicing can be extremely difficult to see at the protein level, scientists are split on how much it contributes to protein variety. The Coon Lab project is the first to look specifically at the proteins themselves for evidence of splicing events.
The team found that the majority of the alternative splicing observed at the RNA stage of gene expression is also present in the proteins.
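One way to picture protein-level evidence for splicing is a peptide that can only come from one splice variant, for example because it spans the junction created when an exon is included or skipped. The sketch below is a toy version of that idea using plain substring matching. The segment sequences, isoform names, and peptides are hypothetical, and this is not the software used in the study; real analyses must also control false positives and handle ambiguous matches.

```python
def isoform_evidence(peptides, isoforms):
    """Map each peptide to the names of the isoforms whose sequence contains it."""
    evidence = {}
    for peptide in peptides:
        evidence[peptide] = sorted(
            name for name, seq in isoforms.items() if peptide in seq
        )
    return evidence

# Hypothetical protein segments encoded by three exons.
seg1, seg2, seg3 = "MTEYKLVV", "GAGGVGKS", "ALREQHIV"

isoforms = {
    "exon2_included": seg1 + seg2 + seg3,
    "exon2_skipped": seg1 + seg3,
}

# Hypothetical detected peptides.
peptides = ["LVVGAGGVGK", "LVVALR", "EQHIV"]

evidence = isoform_evidence(peptides, isoforms)
for peptide, names in evidence.items():
    tag = "isoform-specific" if len(names) == 1 else "shared"
    print(peptide, names, tag)
```

Here "LVVGAGGVGK" crosses the boundary between the first and second segments and "LVVALR" crosses the boundary formed when the second segment is skipped, so each points to a single isoform, while "EQHIV" occurs in both and cannot distinguish them.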
Coon added, “I think this knowledge tells us that, yes, these ideas about splicing—allowing the cell to have this repertoire of proteins for distinct purposes—are now validated. This is the first time we have been able to measure it and prove it.”
Sinitcyn worked at the Max Planck Institute in the laboratory of Jürgen Cox, a renowned expert in computational mass spectrometry and bioinformatics. There, Sinitcyn created software to identify evidence of single amino acid variations and alternative splicing in the mass spectrometry data.
“We are dealing with more than five terabytes of data from heterogeneous sources, so our first problem was to find a way to account for the high probability of generating false positives. But the second problem, the exciting one, was actually to demonstrate how relevant this dataset could be for important biological questions.”
Pavel Sinitcyn, Postdoctoral Researcher, Morgridge Institute for Research
Sinitcyn, P., et al. (2023). Global detection of human variants and isoforms by deep proteome sequencing. Nature Biotechnology. doi.org/10.1038/s41587-023-01714-x