Study analyzes the risk of re-identification of shared genomic data

Reviewed by Emily Henderson, B.Sc. Nov 19 2021

Direct-to-consumer genetic testing has allowed millions of people to learn about their ancestry and their genetic predisposition to inherited diseases. Even though individual genotyping information is stored securely, some individuals consent to share their genomic data for further research.

It’s sometimes possible to link public face images with public genomic data, but the success rates are well below what prior research papers suggest in idealized settings, new research from Yevgeniy Vorobeychik’s lab shows. Image Credit: iStock photo.

This data sharing has raised valid concerns about genomic privacy. For instance, can hackers reidentify a person, and possibly even construct a picture of their face, based on genotype data downloaded legally from open-source web platforms?

In 2017, the genomics-based health intelligence company Human Longevity and other research groups reported that it was possible to predict an individual’s facial appearance from their DNA.

Intrigued by the privacy implications of this research, Washington University in St. Louis faculty member Yevgeniy “Eugene” Vorobeychik, a specialist in applying game theory to evaluate privacy risks in data-sharing settings, set out to investigate.

“We wanted to see to what extent these results can generalize to the real world. We explored whether it was possible to demonstrate in a more practical situation that these concerns were real.”

Yevgeniy Vorobeychik, Associate Professor, Computer Science & Engineering, McKelvey School of Engineering

Vorobeychik and his collaborators, WashU graduate student Rajagopal Venkatesaramani and Vanderbilt University Biomedical Informatics Professor Bradley Malin, found that the task of linking faces and genomes is, on average, much harder than previously reported. The findings were published on November 17, 2021, in the journal Science Advances.

The researchers devised a technique to evaluate the risk of reidentifying people from a carefully curated dataset of 126 genomes. The genomes were obtained from the OpenSNP genome-sharing platform and linked to publicly posted face images.

The researchers used neural network models to predict visible physical traits, such as eye color, skin color, hair color, and sex, from the images. They then combined those predictions with known genotype-trait correlations to score candidate genome-face matches.

Previous research on phenotype association used high-quality photos captured in a lab setting with professional-quality lighting. In contrast, Vorobeychik’s group carried out their research using real-world photographs found on social media sites.

“What we did was construct probabilistic models for these different kinds of visual characteristics and essentially connected the dots by scoring the matching quality between particular genomes and particular faces. We then used that scoring system to predict which matches are most likely.”

Yevgeniy Vorobeychik, Associate Professor, Computer Science & Engineering, McKelvey School of Engineering
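The scoring idea described in the quote can be illustrated with a minimal sketch. This is not the authors' code or their exact formula. It assumes a hypothetical image classifier that outputs a probability for each value of a trait, and a hypothetical lookup of how likely each trait value is given a genome. All names and numbers below are made up for illustration.

```python
import math


def match_score(face_probs, genome_probs, eps=1e-9):
    """Log-likelihood-style score that one face and one genome belong together.

    face_probs:   {trait: {value: P(value | image)}}   (hypothetical classifier output)
    genome_probs: {trait: {value: P(value | genome)}}  (hypothetical genotype-trait table)
    """
    score = 0.0
    for trait, image_dist in face_probs.items():
        if trait not in genome_probs:
            continue  # assumption: skip traits the genome says nothing about
        genome_dist = genome_probs[trait]
        # Probability that the image and the genome agree on this trait's value.
        agreement = sum(p * genome_dist.get(value, 0.0)
                        for value, p in image_dist.items())
        score += math.log(agreement + eps)  # eps avoids log(0)
    return score


def rank_genomes(face_probs, genomes):
    """Return genome ids ordered from most to least likely match for one face."""
    return sorted(genomes,
                  key=lambda gid: match_score(face_probs, genomes[gid]),
                  reverse=True)


# Toy, invented numbers for one face and two candidate genomes.
face = {
    "eye_color": {"blue": 0.8, "brown": 0.2},
    "sex": {"female": 0.9, "male": 0.1},
}
genomes = {
    "genome_a": {"eye_color": {"blue": 0.7, "brown": 0.3},
                 "sex": {"female": 1.0, "male": 0.0}},
    "genome_b": {"eye_color": {"blue": 0.1, "brown": 0.9},
                 "sex": {"female": 0.0, "male": 1.0}},
}
ranking = rank_genomes(face, genomes)
```

In this toy case `genome_a` ranks first because it agrees with the face on both traits. With only a handful of coarse traits and noisy predictions from everyday photos, many genomes can score similarly, which is consistent with the modest average success rates the study reports.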

The results indicated that it is sometimes possible to link public face images with public genomic data, but the success rates are considerably lower than earlier studies in idealized settings suggested.

Vorobeychik stated, “However, our observations are about average privacy risk for a collection of individuals; it is possible that for some people the privacy risk is indeed high.”

To safeguard individuals’ privacy, Vorobeychik and his group developed a method that alters a social media photo just enough to stop the neural network from identifying visible traits. This reduces the risk for people who have publicly released their genomic data and whose images appear online.

“Our method adds enough imperceptible noise to the image so it’s difficult for a deep neural network to link the phenotype of the face to a particular genome. This carefully crafted noise doesn’t change one’s perception of [the face] to the naked eye.”

Yevgeniy Vorobeychik, Associate Professor, Computer Science & Engineering, McKelvey School of Engineering
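The general idea of small, targeted noise can be sketched on a toy model. The example below is not the researchers' method. It swaps in a single fast-gradient-sign style step against a tiny logistic "trait classifier" with invented weights, standing in for a deep neural network. Each pixel moves by at most `epsilon`, so the image barely changes, yet the classifier becomes less confident.

```python
import math


def predict_prob(pixels, weights, bias):
    """Toy logistic 'trait classifier': P(trait present | image)."""
    z = sum(w * x for w, x in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def perturb(pixels, weights, bias, epsilon=0.02):
    """One gradient-sign step that lowers confidence in the predicted class.

    For this linear model the gradient of the logit with respect to each
    pixel is simply its weight, so no autodiff library is needed.
    """
    p = predict_prob(pixels, weights, bias)
    direction = -1.0 if p >= 0.5 else 1.0  # push the logit toward the other class
    return [min(1.0, max(0.0, x + direction * epsilon * sign(w)))  # keep pixels in [0, 1]
            for x, w in zip(pixels, weights)]


# Invented 4-pixel "image" and classifier parameters.
pixels = [0.6, 0.4, 0.7, 0.5]
weights = [2.0, -1.5, 1.0, 0.5]
bias = -1.0

before = predict_prob(pixels, weights, bias)
noisy = perturb(pixels, weights, bias)
after = predict_prob(noisy, weights, bias)
largest_change = max(abs(a - b) for a, b in zip(noisy, pixels))
```

Here `after` is lower than `before` while `largest_change` stays within `epsilon`. A practical filter would need to work against a real deep network and survive image compression, which this sketch does not attempt.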

This tool could be further developed into image filters that people can use to protect their social media photos from hackers who might link those images to genetic data they have shared publicly on OpenSNP or other online sites.

Source:

Washington University in St. Louis

Journal reference:

Venkatesaramani, R., et al. (2021) Re-identification of individuals in genomic datasets using public face images. Science Advances. doi.org/10.1126/sciadv.abg3296.
