Web-Based Platform Facilitates Chemical Structure Queries in Metabolomics

An international team led by researchers at the University of California San Diego and the University of California, Riverside has developed a free, web‑based platform designed to make public metabolomics data more accessible.

By allowing users to search for chemical structures across billions of chemical spectra (the unique signatures of molecules) spanning thousands of studies, the tool has the potential to make "big-data" metabolomics as straightforward as a standard internet search. It can be used to discover new metabolites, track drug exposures and connect specific molecules to diseases or environmental sources. The study was published in Nature Biotechnology.

Metabolomics is the large-scale study of small molecules (metabolites like amino acids and lipids) that are the end products of cellular processes. It provides a holistic snapshot of what is happening inside a cell, tissue, organ or entire organism, including biochemical changes driven by genetics, diet, environmental factors or disease.

Until now, searching for specific molecules in public repositories required expert knowledge and was limited to isolated datasets. The new tool, called StructureMASST, enables researchers, clinicians and even the public to type in a chemical name, a SMILES string (a line of text that encodes a molecule's structure) or a sub-structure pattern to instantly locate where those molecules have been documented across human, animal, plant and environmental samples, from recently extinct animals and long-dead dinosaurs to microbial communities on the International Space Station.

"It will tell you what organs it's found in, which organisms can produce it, what health conditions it's associated with, and what molecules are connected to it," said senior author Pieter C. Dorrestein, PhD, professor at UC San Diego Skaggs School of Pharmacy and Pharmaceutical Sciences and the Departments of Pharmacology and Pediatrics at UC San Diego School of Medicine. 

StructureMASST leverages a massive knowledge base that integrates data from all of the major public metabolomics repositories. To make the data easily searchable, the researchers used indexing technology to tag each chemical spectrum from the repositories with its known associations when available, similar to how web search engines work. Tags include the organism (e.g., human, mouse, bacterium), health condition or disease (e.g., inflammatory bowel disease, diabetes, Alzheimer's), sample type (e.g., blood, saliva, soil), geography and environment (e.g., urban vs. rural, marine, soil), gender/sex, and experimental design (e.g., control vs. treatment, dose, time point, disease stage).

"Search engines allow you to input text and quickly retrieve all the information associated with it because the entire worldwide web has been indexed," said Dorrestein, who also directs the UC San Diego Collaborative Mass Spectrometry Innovation Center. "We do essentially the same thing that these web search engines have done, but for molecules."

As with a search engine, indexing enables queries that return results in seconds or a few minutes, a small fraction of the time other methods take. Indexing also makes it possible to search by disease. For example, a search for Alzheimer's disease would retrieve every spectrum linked to the condition across all repositories.
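The tagging and lookup described above can be pictured as an inverted index, in which each tag points to the spectra that carry it. The following is a minimal sketch of that general idea in Python, not StructureMASST's actual implementation: the spectrum IDs, tag fields, records and function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records: spectrum IDs and tag values are invented for illustration.
SPECTRA = {
    "spec-001": {"organism": "human", "condition": "Alzheimer's disease", "sample_type": "blood"},
    "spec-002": {"organism": "mouse", "condition": "diabetes", "sample_type": "blood"},
    "spec-003": {"organism": "human", "condition": "Alzheimer's disease", "sample_type": "saliva"},
    "spec-004": {"organism": "bacterium", "sample_type": "soil"},
}


def build_index(spectra):
    """Map each (field, value) tag to the set of spectrum IDs that carry it."""
    index = defaultdict(set)
    for spectrum_id, tags in spectra.items():
        for field, value in tags.items():
            index[(field, value.lower())].add(spectrum_id)
    return index


def search(index, **criteria):
    """Return the sorted IDs of spectra that match every field=value criterion."""
    if not criteria:
        return []
    matches = [index.get((field, value.lower()), set()) for field, value in criteria.items()]
    return sorted(set.intersection(*matches))


INDEX = build_index(SPECTRA)
alzheimers_hits = search(INDEX, condition="Alzheimer's disease")
human_blood_hits = search(INDEX, organism="human", sample_type="blood")
```

Because the index is built once in advance, a query such as the Alzheimer's example becomes a dictionary lookup plus a set intersection rather than a scan of every record, which is the general reason indexed searches can return so quickly.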

After building StructureMASST, the researchers put the molecular search engine through its paces with real-world examples, including well-known compounds, natural products and pharmaceuticals:

  • Caffeine: A single query using the molecular structure of caffeine returned more than 6,000 spectral files, detecting the stimulant not just in samples from coffee plants but also in human blood, milk and even microbial cultures.
  • Environmental exposure: The tool revealed that the environmental metabolite surfactin, produced by Bacillus subtilis bacteria, is more common in people living in remote, traditional villages than in urban populations, highlighting how lifestyle and environment shape the human metabolome.
  • Bacterial siderophores: Sub‑structure searches revealed that iron-scavenging compounds produced by certain bacteria are present in human patients with chronic conditions like cystic fibrosis and rheumatoid arthritis, suggesting that these molecules may play a role in immune regulation or trigger opportunistic infections within the human body.
  • Drug distribution: Using the tool to track the cardiac drug amiodarone and its metabolites across dozens of human tissues provided a detailed view of drug exposure and metabolism that could inform safety monitoring. 

In addition to its search capabilities, StructureMASST includes built-in quality control features that flag erroneous data in public libraries that could otherwise lead to false conclusions. It is also being continuously updated as the scientific community contributes new information.

By transforming massive, publicly deposited molecular data into practical insights, StructureMASST could become an essential tool for advancing medicine, basic biology and environmental science. It will help generate hypotheses, uncover new information about metabolism, and speed up the discovery of molecular biomarkers of disease and therapeutic targets.

Journal reference:

El Abiead, Y., et al. (2026). Structure-centric searching enables global mapping of the public metabolome. Nature Biotechnology. DOI: 10.1038/s41587-026-03082-8. https://www.nature.com/articles/s41587-026-03082-8.
