A New Statistical Method to Mitigate AI Bias in Genetic Research

Researchers at the University of Wisconsin–Madison warn that the increasing use of artificial intelligence in genetics and medicine could lead to incorrect conclusions about the relationship between genes and physical traits, including risk factors for diseases such as diabetes.

Genome-wide association studies search for connections between genetic variants and physical traits, including specific diseases. They often draw on large datasets such as the UK Biobank and the NIH’s All of Us project, but these databases frequently lack detailed data on some of the medical conditions researchers wish to study. Using AI to fill those gaps has been linked to inaccurate findings.

Complexities of Genetic Disease Links

While genetics plays a role in many medical conditions, the connection between genetic variants and physical traits is often complex. Genome-wide association studies have identified some genetic links to disease by drawing on extensive databases. However, data for certain health conditions remain sparse, which limits the statistical power of some findings.

“Certain traits are costly or difficult to measure, so we often don’t have sufficient samples to draw reliable statistical conclusions about their genetic associations,” says Qiongshi Lu, Associate Professor in Biostatistics at UW–Madison.

Risks of Relying on AI to Fill Data Gaps

To address these gaps, researchers increasingly rely on advanced machine learning models to predict complex traits and disease risks when direct measurements are limited. However, Lu and his team have shown that this approach carries risks if biases in the AI models are not addressed. In a recent study published in Nature Genetics, they demonstrate how a widely used machine learning technique can spuriously link numerous genetic variants to type 2 diabetes risk.

“If you trust the AI-predicted diabetes risk as the actual risk, you may believe all these genetic variations are correlated with diabetes, even when they are not,” Lu explains.
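The mechanism Lu describes can be illustrated with a small simulation. The sketch below is hypothetical and does not use the model or data from the study: it assumes one simulated genetic variant `g` that has no effect on the true trait `y` but does influence an input feature `x` that the predictor relies on. A simple linear fit stands in for the machine learning model, and `assoc_z` is an illustrative helper, not a library function.

```python
import numpy as np

def assoc_z(trait, g):
    """z-statistic for the simple-regression slope of trait on genotype g."""
    gc = g - g.mean()
    tc = trait - trait.mean()
    beta = gc @ tc / (gc @ gc)
    resid = tc - beta * gc
    se = np.sqrt(resid @ resid / (len(g) - 2) / (gc @ gc))
    return beta / se

rng = np.random.default_rng(0)
n = 20_000
g = rng.binomial(2, 0.3, size=n).astype(float)  # hypothetical SNP, allele frequency 0.3
u = rng.normal(size=n)                          # non-genetic factor shared by x and y
y = u + rng.normal(size=n)                      # true trait: NOT affected by g
x = u + 0.3 * g + rng.normal(size=n)            # input feature: partly affected by g

# Stand-in for an ML model: linear prediction of y from x, fit on the first 2,000 people
slope, intercept = np.polyfit(x[:2000], y[:2000], 1)
y_hat = slope * x + intercept

z_true = assoc_z(y, g)      # association with the measured trait
z_pred = assoc_z(y_hat, g)  # association with the predicted trait
print(f"z (measured trait):  {z_true:.1f}")
print(f"z (predicted trait): {z_pred:.1f}")
```

Because the prediction is built from `x`, any genetic signal in `x` flows into the predicted trait, so the variant looks strongly associated even though the measured trait shows no real association.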

New Statistical Method to Reduce AI-Generated False Positives

Lu and his colleagues not only identify the risks of over-reliance on AI tools but also propose a new statistical method to reduce false positives in AI-assisted genome-wide association studies. The approach counteracts biases in machine learning predictions and yields more reliable results in studies where directly measured data are limited.

“This new strategy is statistically optimal,” Lu notes, adding that they used it to more accurately identify genetic connections with bone mineral density.
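The published method is described in Miao et al. (2024), and the sketch below does not reproduce it. It illustrates the general idea behind prediction-powered inference (Angelopoulos et al., 2023), a related technique for working with predicted outcomes: estimate the association using predictions in a large sample without measurements, then subtract the bias of those predictions, estimated in a smaller sample where the trait was actually measured. The simulated data, sample sizes, linear predictor, and helper `ols_slope` are all hypothetical assumptions, and the authors' method may combine these terms differently.

```python
import numpy as np

def ols_slope(trait, g):
    """Slope and standard error from a simple regression of trait on genotype g."""
    gc = g - g.mean()
    tc = trait - trait.mean()
    beta = gc @ tc / (gc @ gc)
    resid = tc - beta * gc
    se = np.sqrt(resid @ resid / (len(g) - 2) / (gc @ gc))
    return beta, se

rng = np.random.default_rng(1)
n_train, n_lab, n_unlab = 5_000, 5_000, 50_000
n = n_train + n_lab + n_unlab
g = rng.binomial(2, 0.3, size=n).astype(float)  # hypothetical SNP
u = rng.normal(size=n)                          # shared non-genetic factor
y = u + rng.normal(size=n)                      # true trait: no genetic effect
x = u + 0.3 * g + rng.normal(size=n)            # predictor input: partly genetic

train = slice(0, n_train)                       # used only to fit the predictor
lab = slice(n_train, n_train + n_lab)           # trait measured
unlab = slice(n_train + n_lab, n)               # trait only predicted

# Stand-in ML model: linear prediction of y from x, fit on separate training data
slope, intercept = np.polyfit(x[train], y[train], 1)
y_hat = slope * x + intercept

# Naive estimate: treat predictions in the unmeasured sample as the real trait
b_naive, se_naive = ols_slope(y_hat[unlab], g[unlab])

# Rectifier: bias of the prediction's association, estimated where y is known
b_rect, se_rect = ols_slope(y_hat[lab] - y[lab], g[lab])

# Corrected estimate; the two samples are independent, so variances add
b_corr = b_naive - b_rect
se_corr = np.sqrt(se_naive**2 + se_rect**2)

z_naive = b_naive / se_naive
z_corr = b_corr / se_corr
print(f"naive:     beta={b_naive:.3f}, z={z_naive:.1f}")
print(f"corrected: beta={b_corr:.3f}, z={z_corr:.1f}")
```

In this toy setting the correction removes the spurious signal, but its standard error depends largely on the smaller measured sample, so validity is gained at some cost in statistical power.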

Beyond AI: Issues in Proxy-Based Genome-Wide Association Studies

In addition to AI-related challenges, Lu’s team found issues with studies that fill data gaps using proxy data rather than direct measurements. For instance, some researchers attempt to link genetics to Alzheimer’s disease risk by using family health history as a substitute for actual diagnostic data. This approach can lead to misleading correlations, such as an erroneous link between higher cognitive abilities and Alzheimer’s risk.

“Today’s genomic researchers often work with biobank datasets containing hundreds of thousands of individuals,” Lu explains. “While this increases statistical power, it also raises the potential for bias and error in large datasets. Our recent studies emphasize the need for rigorous statistical approaches in biobank-scale research.”

Journal references:

Miao, J., et al. (2024) Valid inference for machine learning-assisted genome-wide association studies. Nature Genetics. doi.org/10.1038/s41588-024-01934-0.

Wu, Y., et al. (2024) Pervasive biases in proxy genome-wide association studies based on parental history of Alzheimer’s disease. Nature Genetics. doi.org/10.1038/s41588-024-01963-9.
