Informatics and Data-Driven Approaches in Plant Biology

Learn about the different informatics and data-driven approaches that help us gain deeper insight into plant biology.

Over the years, scientists have steadily broadened the scope of plant research. The field now generates many types of scientific data, including genetic, genomic, molecular, phenotypic, environmental, and systems biology data, all of which require high-throughput (HT) analysis.

Plant informatics is a branch of informatics that allows the collection, integration, analysis, and visualization of plant-related data. This branch of science is focused on improving agricultural decision-making and supports sustainable agriculture with enhanced crop production.

Importance and Complexities of Big Data in Plant Science Research

Plant breeders must make many different decisions while developing a new variety. These decisions draw on varied biological data as well as economic considerations. For instance, Syngenta, a leading science-based agricultural company, estimated that its soybean breeders make around 200 binary decisions over a period of six years to develop a single cultivar.

These decisions are interconnected through various stochastic factors, such as genetic, phenotypic, and environmental conditions. How well all of these factors are taken into account affects the probability that the new variety will succeed.

The production and accumulation of massive amounts of scientific data, referred to as big data, is a significant help to scientific research. Plant science informatics deals with the generation, storage, and analysis of big data. Processing big data presents many challenges, which are commonly described as the 4Vs of big data: velocity, variability, veracity, and volume.

In plant science research, volume and velocity pose less of a problem than variability. The genomic complexity of plants and their heterogeneous nature result in highly variable data.

Large datasets and computational methods have transformed crop breeding. The falling cost of generating genomic and HT phenotyping data, together with the availability of computationally intensive data analysis tools, has reshaped plant research. Models play a key role in knowledge integration and help predict phenotypic outcomes with higher accuracy. Big data assessment and predictive models have helped plant breeders accelerate the development of new varieties.
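To make the idea of predicting a phenotype from genomic data more concrete, here is a minimal sketch of ridge regression on marker data, a common starting point for genomic prediction. It is only an illustration under stated assumptions: the marker matrix, the marker effects, the noise level, and all function names are invented for this example and do not come from any real breeding program.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ridge(markers, phenotypes, penalty):
    """Estimate an intercept and one shrunken effect per marker."""
    n, p = len(markers), len(markers[0])
    col_means = [sum(row[j] for row in markers) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in markers]
    yc = [y - y_mean for y in phenotypes]
    # Normal equations with the ridge penalty added to the diagonal.
    xtx = [[sum(xc[i][r] * xc[i][c] for i in range(n)) + (penalty if r == c else 0.0)
            for c in range(p)] for r in range(p)]
    xty = [sum(xc[i][r] * yc[i] for i in range(n)) for r in range(p)]
    effects = solve_linear_system(xtx, xty)
    intercept = y_mean - sum(e * mu for e, mu in zip(effects, col_means))
    return intercept, effects


def predict(intercept, effects, markers):
    return [intercept + sum(e * g for e, g in zip(effects, row)) for row in markers]


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def simulate_lines(rng, n_lines, true_effects, noise_sd):
    """Synthetic data: markers coded 0/1/2, phenotype = marker effects + noise."""
    markers = [[rng.choice([0, 1, 2]) for _ in true_effects] for _ in range(n_lines)]
    phenotypes = [sum(e * g for e, g in zip(true_effects, row)) + rng.gauss(0, noise_sd)
                  for row in markers]
    return markers, phenotypes


rng = random.Random(42)
# Hypothetical marker effects, chosen only for this illustration.
markers, phenotypes = simulate_lines(rng, 80, [1.5, -1.0, 0.5, 0.0, 2.0], 0.5)
intercept, effects = fit_ridge(markers[:60], phenotypes[:60], penalty=1.0)
predicted = predict(intercept, effects, markers[60:])
print("prediction accuracy (correlation):", round(pearson(predicted, phenotypes[60:]), 3))
```

The model is trained on 60 simulated lines and evaluated on 20 held-out lines, which mirrors the way a breeder would use tested material to rank untested material. Real genomic prediction involves thousands of markers and dedicated statistical software, so this sketch is meant only to show the shape of the calculation.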

Plant Science Data-Generating Techniques

To develop effective predictive models that can improve crop sustainability and yield, it is important to produce data that can be used to train the models. Three broad classes of data are important for this purpose: genetic, phenotypic, and environmental. Furthermore, advances in remote-sensing, molecular, computational, and other technologies have significantly increased the volume and variety of data available to plant researchers, particularly breeders.
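As a rough illustration of how the three classes of data might be brought together before model training, the sketch below joins plot-level phenotype records to genotype and environment records. Every identifier, field name, and value here is hypothetical and invented for the example; real breeding databases are organized quite differently and are far larger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlotObservation:
    """One phenotypic measurement: which line, grown where, with what yield."""
    line_id: str
    site_id: str
    yield_t_ha: float


def build_training_rows(observations, genotypes, environments):
    """Combine genetic, environmental, and phenotypic data into (features, target) rows.

    Observations whose line or site has no matching record are skipped, because
    a model cannot be trained on incomplete rows.
    """
    rows = []
    for obs in observations:
        marker_calls = genotypes.get(obs.line_id)
        site = environments.get(obs.site_id)
        if marker_calls is None or site is None:
            continue
        features = list(marker_calls) + [site["rainfall_mm"], site["mean_temp_c"]]
        rows.append((features, obs.yield_t_ha))
    return rows


# Hypothetical example data (not real measurements).
genotypes = {"line_A": (0, 2, 1), "line_B": (1, 1, 0)}
environments = {
    "site_1": {"rainfall_mm": 420.0, "mean_temp_c": 18.5},
    "site_2": {"rainfall_mm": 610.0, "mean_temp_c": 21.0},
}
observations = [
    PlotObservation("line_A", "site_1", 3.1),
    PlotObservation("line_A", "site_2", 3.8),
    PlotObservation("line_B", "site_1", 2.9),
    PlotObservation("line_C", "site_2", 3.3),  # no genotype on file, so it is skipped
]

for features, target in build_training_rows(observations, genotypes, environments):
    print(features, target)
```

Each output row pairs the marker calls and site covariates with the measured yield, which is the kind of table a predictive model would be trained on.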

These different classes of data are integrated through various modeling paradigms to improve plant breeding efforts. The common data-generating techniques and recent advances in plant informatics are discussed below:


Genotyping

Genotyping is the process of determining the DNA sequence, or genotype, of an organism. It helps reveal differences in genetic makeup across organisms or species. Genotype is the critical link between field-tested materials and untested materials held by breeding companies and international gene banks.

Recent advancements in next-generation sequencing (NGS) technologies, such as mRNA sequencing, have enabled HT discovery of molecular markers. They have also made it possible to characterize the molecular intermediates between DNA and phenotype.

At present, NGS techniques are being used to better understand the importance of quantitative variation at the genomic level (e.g., epigenetic markers, non-coding sequences, and microRNAs) that contributes to phenotypic variation. Simultaneously, large investments are being made to improve computational infrastructure, particularly for storage and data analysis.

The development of cryptographic protocols for genomic data analysis and sharing has helped establish collaborations between the private and public sectors that leverage phenotypic and genotypic databases. HT genotyping will allow breeders to consider larger pools of genetic diversity in their breeding decisions.


Phenotyping

The combination of HT phenotyping with machine learning, artificial intelligence, and computer imaging has resulted in novel plant phenotypes. At present, applying HT phenotyping to plant breeding is challenging because phenotypes can differ from one field to another. Further research is therefore required to mitigate this issue.

The accuracy of HT phenotyping depends on data augmentation, multisensory modalities, algorithmic developments, and objective redesign. HT phenotyping data are being used to train and improve predictive models. Applying these models will help plant breeders select crops that can withstand biotic and abiotic stress and that offer improved yield stability.


Environmental Data

Plant growth depends significantly on environmental factors, including weather and climate change. Many studies have highlighted the interaction between genotype and environment, which significantly influences the phenotype of the plant. While developing new cultivars, plant breeders consider both short-term and long-term environmental predictions.
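The interaction between genotype and environment can be illustrated with a small worked example. In a simple additive model, the interaction term for a genotype in a given environment is the cell mean minus the genotype mean, minus the environment mean, plus the grand mean. The sketch below computes these terms for a made-up table of mean yields; the variety names, site names, and values are assumptions chosen purely for illustration.

```python
def interaction_effects(cell_means):
    """Return the genotype-by-environment interaction term for every cell.

    cell_means maps genotype -> {environment -> mean phenotype}. Every genotype
    is assumed to have been measured in every environment (a balanced table).
    """
    genotypes = list(cell_means)
    environments = list(next(iter(cell_means.values())))
    n_g, n_e = len(genotypes), len(environments)

    grand = sum(cell_means[g][e] for g in genotypes for e in environments) / (n_g * n_e)
    g_mean = {g: sum(cell_means[g][e] for e in environments) / n_e for g in genotypes}
    e_mean = {e: sum(cell_means[g][e] for g in genotypes) / n_g for e in environments}

    return {
        (g, e): cell_means[g][e] - g_mean[g] - e_mean[e] + grand
        for g in genotypes
        for e in environments
    }


# Hypothetical mean yields (t/ha) for two varieties at two sites.
yields = {
    "variety_A": {"dry_site": 4.0, "wet_site": 6.0},
    "variety_B": {"dry_site": 5.0, "wet_site": 5.0},
}

for cell, effect in interaction_effects(yields).items():
    print(cell, effect)
```

In this invented table both varieties have the same average yield, yet variety A does worse than variety B at the dry site and better at the wet site. The non-zero interaction terms capture this crossover, which is why a variety cannot be judged on its overall average alone.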

At present, environment-based models for agriculture use multidimensional indices and simulation platforms for better predictions. Advancements in environmental-sensing technologies complement HT phenotyping methods. Some recent technologies make it possible to link plant-level phenotypes to microenvironments. They also offer field-level monitoring that assists in agricultural decision-making.

Real-World Outcomes and Recent Data-Driven Approaches in Agriculture

Data-driven approaches have enabled breeders to understand farmers’ and consumers’ perceptions of quality and desirability while developing a new variety. For instance, sensory testing is carried out during the development of a new tomato variety. Here, larger sets of crosses are considered in order to link flavor to the underlying chemical interactions.

Data-driven decentralized breeding, or three-dimensional (3D) breeding, can significantly scale up varietal testing across larger and more varied environmental conditions. This technique has closed the gap between expected and realized gains, particularly in smallholder farming. The 3D breeding technique was recently used in Ethiopia to develop a wheat variety for small-scale farmers.

The scientific community expects the recent collaboration between the Norwich Bioscience Institutes and The Alan Turing Institute to improve the application of machine learning and artificial intelligence in plant research. Machine learning promises to uncover hidden genetic patterns in plants, which could help alleviate the current problem of reduced crop yield due to poor soil quality and climatic conditions.


Last Updated: Dec 28, 2023

Written by Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research articles that have been published in reputed peer-reviewed journals. She is also an avid reader and an amateur photographer.


Please use the following format to cite this article in your essay, paper or report:

    Bose, Priyom. (2023, December 28). Informatics and Data-Driven Approaches in Plant Biology. AZoLifeSciences. Retrieved on April 19, 2024.


The opinions expressed here are the views of the writer and do not necessarily reflect the views and opinions of AZoLifeSciences.