AI Models Predict Plant Gene Activity with Unprecedented Precision

Genome sequencing now yields thousands of new plant genomes every year. In agriculture, researchers combine this genomic information with observational data (measurements of various plant traits) to identify correlations between genetic variants and crop traits such as seed count, resistance to fungal infections, fruit color, or flavor. However, how genetic variation influences gene activity at the molecular level is still poorly understood. This knowledge gap hinders the breeding of "smart crops", which would achieve enhanced quality and a reduced environmental impact by combining specific gene variants of known function.

Researchers from the IPK Leibniz Institute and Forschungszentrum Jülich (FZ) have made a significant breakthrough in tackling this challenge. Led by Dr. Jedrzej Jakub Szymanski, the international research team trained interpretable deep learning models, a subset of AI algorithms, on a vast dataset of genomic information from various plant species. "These models not only were able to accurately predict gene activity from sequences but also pinpoint which sequence parts contribute to these predictions", explains the head of IPK's research group "Network Analysis and Modelling". The AI technology the researchers applied is akin to that used in computer vision, for example to recognize facial features in images and infer emotions from them.
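To make the computer-vision analogy concrete, here is a minimal sketch of how a DNA sequence can be turned into an image-like numeric grid and how one generic technique, occlusion, can indicate which positions drive a prediction. This is not the team's model or their interpretation method, neither of which is described here. The scoring function `toy_expression_score`, the motif it looks for, and the example sequence are hypothetical stand-ins chosen purely for illustration.

```python
# Illustrative sketch only: a toy stand-in, not the published model.

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as rows of four numbers (A, C, G, T).

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def toy_expression_score(encoded, motif="TATA"):
    """Hypothetical 'model': counts exact matches of one motif.

    A real deep learning model would learn its features from data.
    """
    motif_rows = one_hot(motif)
    width = len(motif_rows)
    hits = 0
    for start in range(len(encoded) - width + 1):
        if encoded[start:start + width] == motif_rows:
            hits += 1
    return float(hits)


def occlusion_importance(seq, score_fn, window=1):
    """Return the drop in score when each window of the input is zeroed out."""
    encoded = one_hot(seq)
    baseline = score_fn(encoded)
    importance = []
    for start in range(len(encoded) - window + 1):
        masked = [row[:] for row in encoded]  # copy so the original stays intact
        for i in range(start, start + window):
            masked[i] = [0.0, 0.0, 0.0, 0.0]
        importance.append(baseline - score_fn(masked))
    return importance


promoter = "GGCTATAAGGC"  # made-up sequence containing one TATA motif
scores = occlusion_importance(promoter, toy_expression_score)
print(scores)
```

Positions whose masking lowers the score are the ones the toy model relies on; in this example those are exactly the four bases of the motif.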

In contrast to previous approaches based on statistical enrichment, the researchers combined the identification of sequence features with the determination of mRNA copy number within a single mathematical model. The model was trained taking into account biological information on gene model structure and on sequence homology, and thus on gene evolution.

"We were truly amazed by the effectiveness. Within a few days of training, we rediscovered many known regulatory sequences and found that about 50% of the features identified were entirely new. These models excellently generalized across plant species they were not trained on, making them valuable for analyzing newly sequenced genomes. And we specifically demonstrated their application in diverse tomato cultivars with long-read sequencing data. We pinpointed specific regulatory sequence variations that explained observed differences in gene activity and, consequently, variations in shape, color, and robustness. This is a remarkable improvement over classically used statistical associations of single nucleotide polymorphisms."

Dr. Jedrzej Jakub Szymanski

The team has openly shared their models and provided a web interface for their use. "Interestingly, much effort went into degrading our model's performance. To avoid overly optimistic results due to AI finding shortcuts required from me a deep dive into gene regulation biology to eliminate any potential bias, reduce data leakage and overfitting", says Fritz Forbang Peleke, the lead machine learning researcher and first author of the study, which was published in the journal "Nature Communications".
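One common way to reduce the kind of data leakage mentioned above is to keep evolutionarily related genes together when splitting data for training and evaluation, so that a model is never tested on a close relative of a gene it has already seen. The sketch below illustrates that general idea only; the procedure the team actually used is not described here. The function `family_aware_split`, the gene identifiers, and the family labels are all hypothetical.

```python
# Illustrative sketch only: group-wise splitting, not the authors' pipeline.
import random
from collections import defaultdict


def family_aware_split(gene_to_family, test_fraction=0.2, seed=0):
    """Split genes into train and test lists without dividing any family."""
    families = defaultdict(list)
    for gene, family in gene_to_family.items():
        families[family].append(gene)

    # Shuffle families, not genes, so relatives always stay together.
    family_ids = sorted(families)
    random.Random(seed).shuffle(family_ids)

    target_test_size = test_fraction * len(gene_to_family)
    train, test = [], []
    for family in family_ids:
        # Add whole families to the test set until the target size is reached.
        if len(test) < target_test_size:
            test.extend(families[family])
        else:
            train.extend(families[family])
    return train, test


# Hypothetical gene-to-family assignments.
gene_to_family = {
    "geneA1": "famA", "geneA2": "famA", "geneA3": "famA",
    "geneB1": "famB", "geneB2": "famB",
    "geneC1": "famC",
    "geneD1": "famD", "geneD2": "famD",
    "geneE1": "famE", "geneE2": "famE",
}

train_genes, test_genes = family_aware_split(gene_to_family)
train_families = {gene_to_family[g] for g in train_genes}
test_families = {gene_to_family[g] for g in test_genes}
print("families shared between train and test:", train_families & test_families)
```

Because whole families are assigned at once, the test set can end up slightly larger than the requested fraction; that is the price of keeping related genes on the same side of the split.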

Dr. Simon Zumkeller, a co-author and evolutionary biologist from FZ Jülich, remarked, "With the presented analyses we can investigate and compare gene regulation in plants and infer its evolution. For practical applications, the method provides a new foundation, too. We are approaching the routine identification of gene regulatory elements in known and newly sequenced plant genomes, in various tissues, and under different environmental conditions."

Journal reference:

Peleke, F. F., et al. (2024). Deep learning the cis-regulatory code for gene expression in selected model plants. Nature Communications.

