Deep machine learning completes bioactivity information for one million molecules

The Structural Bioinformatics and Network Biology team at IRB Barcelona has developed a tool that predicts the biological activity of chemical compounds, which is crucial information for assessing their therapeutic potential.

Detail of the compound collections. The first column shows the chemical diversity of the projections. Blue denotes high diversity and red high structural similarity between neighboring compounds. Image Credit: Nature Communications.

Scientists have used artificial neural networks to infer experimental data for a million compounds and have developed a set of tools to make predictions for any kind of molecule. The findings were reported in the journal Nature Communications.

Using deep machine-learning computational models, the Structural Bioinformatics and Network Biology team, led by ICREA Researcher Dr. Patrick Aloy, has completed the bioactivity information for a million molecules. The team has also presented a tool that can predict the biological activity of any molecule, even when no experimental data are available.

This new approach is based on the Chemical Checker, which was developed by the same lab and published in 2020, and which is the largest database of bioactivity profiles for drug-like small molecules to date. For each molecule, the Chemical Checker collects data from 25 bioactivity areas, also referred to as spaces of bioactivity.

These areas are linked to the molecule’s chemical structure, the targets with which it interacts, and the clinical or cellular alterations it causes.

However, for most compounds, this highly detailed information about the mechanism of action is incomplete: a given molecule may have data for one or two spaces of bioactivity but not for all 25.

With this new development, the researchers combine all available experimental data with deep machine learning methods to complete the activity profiles of all the compounds, from the chemical to the clinical level.
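The idea of filling in a missing bioactivity space from data that does exist can be illustrated with a small sketch. It is not the team's method, which relies on deep neural networks: as a simpler stand-in, it estimates a molecule's missing signature as a similarity-weighted average over the most structurally similar molecules that do have experimental data in that space. The function names, the feature sets, and the two-number signatures are all hypothetical and chosen only for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def infer_signature(query_features, known, k=2):
    """Estimate a missing signature from the k most similar molecules.

    `known` is a list of (features, signature) pairs for molecules that have
    experimental data in the bioactivity space of interest.
    Returns None when no neighbor shares any feature with the query.
    """
    # Rank reference molecules by structural similarity to the query.
    ranked = sorted(
        known,
        key=lambda mol: tanimoto(query_features, mol[0]),
        reverse=True,
    )[:k]
    weights = [tanimoto(query_features, feats) for feats, _ in ranked]
    total = sum(weights)
    if total == 0:
        return None
    dim = len(ranked[0][1])
    # Similarity-weighted average of the neighbors' signatures.
    return [
        sum(w * sig[i] for w, (_, sig) in zip(weights, ranked)) / total
        for i in range(dim)
    ]


# Hypothetical toy data: structural features and a 2-number signature
# in a single bioactivity space.
known = [
    ({"ring", "amine", "halogen"}, [1.0, 0.0]),
    ({"ring", "amine", "ester"}, [0.0, 1.0]),
    ({"sulfone", "nitrile"}, [5.0, 5.0]),
]
estimate = infer_signature({"ring", "amine", "halogen"}, known, k=2)
```

In this toy case the query is identical to the first reference molecule and half-similar to the second, so the estimate leans towards the first molecule's signature. A real system would work with learned representations across all 25 spaces rather than hand-written feature sets.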

“The new tool also allows us to forecast the bioactivity spaces of new molecules, and this is crucial in the drug discovery process as we can select the most suitable candidates and discard those that, for one reason or another, would not work.”

Dr Patrick Aloy, Researcher, ICREA

The software library is freely available to the scientific community, and the researchers will continue to improve it as more biological activity data become available. With each update of the experimental data in the Chemical Checker, the artificial neural networks will also be updated to refine the estimates.

Predictions and reliability

The reliability of the model’s bioactivity predictions varies with several factors, including the amount of experimental data available and the molecule’s properties.

The method built by Dr. Aloy’s team not only predicts aspects of a molecule’s biological activity but also provides a measure of how reliable the prediction is for each molecule.
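One common way to attach a reliability score to a prediction is to ask how close the query molecule is to molecules that already have experimental data. The sketch below illustrates that general idea only; it is an assumption for illustration, not the confidence measure described in the paper. The function names and feature sets are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of structural features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def applicability_score(query_features, training_features, k=3):
    """Mean similarity of the query to its k nearest reference molecules.

    Values near 1 mean the query resembles molecules with experimental data,
    so a prediction is more likely to be trustworthy; values near 0 mean the
    model is extrapolating.
    """
    sims = sorted(
        (tanimoto(query_features, feats) for feats in training_features),
        reverse=True,
    )[:k]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical reference molecules with experimental data.
training = [{"ring", "amine"}, {"ring", "ester"}, {"sulfone"}]

familiar = applicability_score({"ring", "amine"}, training, k=2)
unfamiliar = applicability_score({"phosphate"}, training, k=2)
```

Here `familiar` is higher than `unfamiliar` because the first query overlaps with the reference set and the second does not. Such a score could be reported alongside each predicted bioactivity space, so that readers of the results know which predictions to treat with caution.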

“All models are wrong, but some are useful! A measure of confidence allows us to better interpret the results and highlight which spaces of bioactivity of a molecule are accurate and in which ones an error rate can be contemplated.”

Dr Martino Bertoni, Study First Author, IRB Barcelona

Testing the system with the IRB Barcelona compound library

To test the tool, the researchers searched the IRB Barcelona compound library for potential drug candidates to modulate the activity of a cancer-related transcription factor (SNAIL1), whose activity is nearly impossible to modulate through direct drug binding (it is considered an “undruggable” target).

From an initial set of 17,000 compounds, the deep machine learning models predicted characteristics (in their dynamics, interaction with target cells and proteins, and so on) that fit the target for 131 compounds.

The ability of these compounds to degrade SNAIL1 was then confirmed experimentally, and the observed degradation was consistent with the models’ predictions in a high percentage of cases, thus validating the system.

The Government of Catalonia, the Spanish Ministry of Science and Innovation, the European Research Council, the European Commission, the State Research Agency, and the ERDF all contributed to this project.

Journal reference:

Bertoni, M., et al. (2021) Bioactivity descriptors for uncharacterized chemical compounds. Nature Communications.

