Thanks to innovative algorithms and advances in computer technology, machines can now learn complex structures and even produce high-quality synthetic data, such as photorealistic images and résumés of fictional people.
A chromosome emerges from random digital noise. Image Credit: Burak Yelmen.
A new study, recently published in the international journal PLOS Genetics, uses machine learning to mine existing biobanks and create human genomes that do not belong to any real person but still have the features of real genomes.
“Existing genomic databases are an invaluable resource for biomedical research, but they are either not publicly accessible or shielded behind long and exhausting application procedures due to valid ethical concerns. This creates a major scientific barrier for researchers. Machine-generated genomes, or artificial genomes as we call them, can help us overcome the issue within a safe ethical framework.”
Burak Yelmen, Study First Author and Junior Research Fellow, Modern Population Genetics, University of Tartu
The multidisciplinary research team carried out numerous analyses to evaluate the quality of the generated genomes compared with real ones.
“Surprisingly, these genomes emerging from random noise mimic the complexities that we can observe within real human populations and, for most properties, they are not distinguishable from other genomes from the biobank we used to train our algorithm, except for one detail: they do not belong to any gene donor.”
Dr Luca Pagani, Study Senior Author and Mobilitas Pluss Fellow, Estonian Research Council
The study also evaluated how close the artificial genomes are to the real genomes, to check whether the privacy of the original donors is preserved.
“Although detecting privacy leaks among thousands of genomes could appear as looking for a needle in a haystack, combining multiple statistical measures allowed us to check all models carefully. Excitingly, the detailed exploration of complex leakage patterns can lead to improvements in generative model evaluation and design, and will fuel back the machine learning field.”
Dr Flora Jay, Study Coordinator and CNRS researcher at the Interdisciplinary Computer Science Laboratory (LRI/LISN, Université Paris-Saclay, French National Centre for Scientific Research)
In short, machine learning techniques have already given fictional humans faces, biographies, and many other features, and now they can give them realistic genomes as well.
Such imaginary humans with realistic genomes could serve as proxies for real genomes that are not freely available or that require lengthy application processes or partnerships, removing a significant barrier to access in genomic research, particularly for under-represented populations.
Yelmen, B., et al. (2021) Creating artificial human genomes using generative neural networks. PLOS Genetics. doi.org/10.1371/journal.pgen.1009303.