As humans, we each have trillions of cells, and most of those cells have a nucleus containing genetic information, DNA, that can mutate to create an abnormality. If a person is born with many cellular abnormalities, or if mutations develop over time, disease can ensue.
To make this even more complicated, the body is often a mixture of cells carrying normal and abnormal DNA: a mosaic, so to speak. Like the art form, this complex montage is difficult to understand. However, a research team led by Joseph Gleeson, MD, Rady Professor of Neuroscience at UC San Diego School of Medicine and director of neuroscience research at the Rady Children's Institute for Genomic Medicine, has been using the Triton Shared Computing Cluster (TSCC) at the San Diego Supercomputer Center (SDSC) at UC San Diego for data processing and model training to develop new methods for recognizing DNA mosaicism.
Gleeson and his team recently discovered new genes and pathways involved in malformations of cortical development, a spectrum of disorders that cause up to 40 percent of drug-resistant focal epilepsy. Their research, published this week in Nature Genetics, shows how computer-generated models can mimic human recognition work far more efficiently. A related study was published earlier this month in Nature Biotechnology.
"We started with a trial allocation on SDSC's Comet supercomputer many years ago and have been part of the TSCC community for almost a decade. TSCC allows us to plot models generated by a computer recognition program called DeepMosaic. These simulations showed us that once we trained the program to identify abnormal areas of cells, we could quickly examine thousands of mosaic variants from each human genome, which would not be possible with the human eye."
Xiaoxu Yang, postdoctoral researcher at Dr. Gleeson's Laboratory of Pediatric Brain Disease
This approach is known as convolutional neural network-based deep learning, and its roots go back to the 1970s, when researchers were already building neural networks to mimic human visual processing. It has taken a few decades since then for researchers to develop accurate, efficient systems for this type of modeling.
"The goal of machine learning and deep learning is often to train computers for prediction or classification tasks on labeled data. When the trained models are proven to be accurate and efficient, researchers can use the learned information, rather than manual annotation, to process large amounts of data," explained Xin Xu, a former undergraduate research assistant in Gleeson's lab and now a data scientist at Novartis. "We have come a long way over the past 40 years in developing machine learning and deep learning algorithms, but we are still using the same concept of replicating humans' ability to process data."
Xu is referring to the knowledge needed to better understand diseases that arise when abnormal mosaics overtake normal cells. Yang works, and Xu formerly worked, in a laboratory that aims to do just that: better understand the mosaics that lead to diseases such as epilepsy and congenital brain disorders, among others.
"Deep learning approaches are a lot more efficient, and their ability to detect hidden structures and connections within the data sometimes even surpasses human ability," Xu said. "We can process data so much faster this way, which leads us more quickly to the knowledge we need."
Yang, X., et al. (2023) Control-independent mosaic single nucleotide variant detection with DeepMosaic. Nature Biotechnology. doi.org/10.1038/s41587-022-01559-w.