Soybeans outmatch all other legumes as the protein powerhouses of the plant kingdom, providing a key protein source for humans and livestock around the world. And now, after 30 years, University of Illinois scientists have identified the gene with the largest impact on seed protein in soybean.
"Soybeans are around 40% protein, and this gene increases that about 2%. It doesn't sound like a lot, but compared to any other seed-protein gene that's been mapped for soybean, it's at least double," says Brian Diers, the Charles Adlai Ewing Chair of Soybean Genetics and Breeding in the Department of Crop Sciences and co-author of the study in The Plant Journal.
"If we could put the high-protein form of the gene into commercially grown varieties, we would be looking at a significant increase in protein for livestock and humans worldwide, as even a single percentage point increase in protein concentration would represent millions of tons of protein. That's quite significant."
Matt Hudson, Co-Author, Professor of Bioinformatics in Crop Sciences
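The scale behind "millions of tons" can be checked with rough arithmetic. The sketch below is only illustrative: the harvest figure is an assumed round number that does not come from the article or the study, the function name is hypothetical, and the calculation ignores moisture basis and processing losses.

```python
# ASSUMPTION (not from the article): a round figure for the annual
# world soybean harvest, in million metric tons.
ASSUMED_WORLD_HARVEST_MT = 350.0


def extra_protein_mt(harvest_mt: float, point_increase: float) -> float:
    """Extra protein, in million metric tons, gained by raising seed
    protein concentration by `point_increase` percentage points."""
    return harvest_mt * point_increase / 100.0


# One percentage point, and the roughly two points attributed to the gene.
one_point = extra_protein_mt(ASSUMED_WORLD_HARVEST_MT, 1.0)
two_points = extra_protein_mt(ASSUMED_WORLD_HARVEST_MT, 2.0)
print(f"{one_point:.1f} and {two_points:.1f} million metric tons")
```

Under that assumed harvest, each percentage point corresponds to a few million metric tons of additional protein, which is consistent with the order of magnitude Hudson describes.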
In 1992, then-graduate-student Diers published the first seed protein map for soybean. Although he identified the region of the genome where the gene might be located, it took three decades, many technological advances, and the publication of two soybean genomes to nail down the specific gene: Glyma.20G85100, a gene without a known function but closely related to "clock and circadian timing" genes.
"It's satisfying to make the journey from being an eager young grad student, all excited about this finding, to finally determining what the gene is," Diers says. "But if I go back to myself 30 years ago, I could never have imagined it would have taken this long. But better late than never."
Pinpointing a gene like this is complicated because it's one of many quantitative trait loci: locations within the genome contributing to continuous traits like plant height, yield, or in this case, protein content.
Researchers have to grow the plants, measure protein content, and then drill down into the genome to find correlated genetic differences among plants with different amounts of protein. Those genetic differences might not be detectable, or they might only be traceable to large sections of the genome.
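The logic of such a scan can be sketched on simulated data. This is a toy illustration, not the method used in the study. The population size, marker count, position of the causal marker, and noise level are all invented, and only the roughly 40% baseline and 2-point effect echo figures quoted in the article. Markers are simulated as independent, whereas neighboring markers on a real chromosome are inherited together.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate(n_plants=200, n_markers=20, causal=7, effect=2.0, seed=1):
    """Simulated plants: 0/1 allele calls at each marker, plus a protein
    percentage that rises by `effect` points when the causal marker is 1."""
    rng = random.Random(seed)
    genotypes = [[rng.randint(0, 1) for _ in range(n_markers)]
                 for _ in range(n_plants)]
    protein = [rng.gauss(40.0, 1.0) + effect * g[causal] for g in genotypes]
    return genotypes, protein


def scan(genotypes, protein):
    """Compare mean protein between the two allele groups at every marker."""
    scores = []
    for m in range(len(genotypes[0])):
        high = [p for g, p in zip(genotypes, protein) if g[m] == 1]
        low = [p for g, p in zip(genotypes, protein) if g[m] == 0]
        scores.append(welch_t(high, low))
    return scores


genotypes, protein = simulate()
scores = scan(genotypes, protein)
best = max(range(len(scores)), key=lambda m: abs(scores[m]))
print("strongest association at marker", best)
```

Because real neighboring markers travel together, many of them would show a signal at once, which is one reason an association may only be traceable to a large section of the genome.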
Diers says he originally mapped the gene to a section of a chromosome several million base pairs of DNA long. But by testing generation after generation of plants carrying the gene within smaller genetic regions, he slowly narrowed it down.
"We had to screen thousands and thousands of plants and then evaluate them with markers to see if we found an association. It was very laborious, and we had many students and postdocs working on this over the years," Diers says.
Like most genes, Glyma.20G85100 comes in multiple forms, or alleles. Depending on the allele found in a particular soybean line, seed protein content can be high or low. And, as it turns out, most commercial soybean lines contain the low-protein allele.
"Unfortunately, we found the high-protein allele has a deleterious effect on yield. So elite varieties, which are bred for high yield, generally have the low-protein form," Diers says.
The discovery is complicated by the murky link between the gene's function and its effect on protein content.
"We were hoping that when we finally found the gene, it was going to be involved in something obvious, for example, nitrogen fixation or nitrogen metabolism," Diers says. "But it turns out it really isn't what you would expect for a gene controlling a protein."
Instead, the gene appears to be part of the soybean plant's circadian machinery: the way the plant keeps track of time to maximize photosynthesis during the day, figure out when to flower and set seed, and regulate many other processes.
"It's absolutely a standard part of the circadian clock that's conserved between nearly all plants. It looks like a transposon, or a jumping gene, landed in that circadian clock gene and inserted a whole bunch of new amino acids in the middle of the conserved domain," Hudson says. "It could be that the gene is involved in moving photosynthesis products into the seed or it could be some completely unrelated pathway. It's weird, and we really don't know."
Regardless of how it works, identifying the gene with the biggest single contribution to soybean protein content could have major consequences for global food security.
"If we can understand the mechanism, that should give us some clues as to how we can increase protein without decreasing yield," Diers says.
Hudson adds, "There are significant issues with protein deficiency in many parts of the world. Even a modest increase in protein could go a long way."
Fliege, C.E., et al. (2022) Fine mapping and cloning of the major seed protein quantitative trait loci on soybean chromosome 20. The Plant Journal. doi.org/10.1111/tpj.15658.