Machine Learning Reveals Cancer-Causing DNA Repeats

Repeats of DNA sequences found in chromosomes, sometimes called “junk DNA” or “dark matter,” may be involved in cancer or other diseases, but they have been difficult to identify and characterize.

A new method called ARTEMIS uses machine learning to shed light on the human genome “dark matter” involved in cancer and other diseases. Image Credit: Carolyn Hruban

Researchers at the Johns Hopkins Kimmel Cancer Center have now created a method that uses machine learning to identify these elements in both cancerous tissue and cell-free DNA (cfDNA), fragments of DNA that are shed from tumors and circulate in the bloodstream.

The new technique may offer a noninvasive way to detect cancers and track how well a treatment is working. Machine learning is a form of artificial intelligence that uses data and computer algorithms to carry out complex tasks and speed up research.

The technique known as ARTEMIS (Analysis of RepeaT EleMents in dISease) was used in lab experiments to study over 1,200 different types of repeat elements, which make up almost half of the human genome. The results showed that many repeats that were not previously thought to be linked to cancer were changed during the formation of tumors.

Changes in these elements in cfDNA were also detected by the researchers, opening up new avenues for the detection and localization of cancer in the body. Science Translational Medicine published the study.

“When you think about existing cancer genes and the DNA sequences around them, they are just chock full of these repeats.”

Victor E. Velculescu, Professor and Co-Director, Department of Oncology, Johns Hopkins University

Velculescu led the study with Akshaya Annapragada, an M.D./Ph.D. student at the Johns Hopkins University School of Medicine, and Robert Scharpf, Ph.D., an Associate Professor of oncology at Johns Hopkins.

“Until ARTEMIS, this dark matter of the genome was essentially ignored, but now we are seeing that these repeats are not occurring randomly, and they end up being clustered around genes that are altered in cancer in a variety of different ways, providing the first glimpse that these sequences may be key to tumor development.”

Victor E. Velculescu, Professor and Co-Director, Department of Oncology, Johns Hopkins University

In a series of lab tests, the researchers first looked at the distribution of 1.2 billion k-mers, short DNA sequences that define unique repeats. They discovered that these k-mers were enriched in genes that are frequently altered in human cancers.
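To make the idea of counting short, fixed-length DNA sequences (k-mers) concrete, here is a minimal Python sketch. It is only an illustration: the function name `count_kmers`, the toy sequence, and the choice of k = 3 are assumptions made for this example and are not taken from the ARTEMIS study.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in a DNA sequence."""
    sequence = sequence.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence (hypothetical), chosen so that one 3-mer clearly repeats.
counts = count_kmers("ACGACGACGT", 3)
for kmer, n in counts.most_common():
    print(kmer, n)
```

In this toy sequence the 3-mer ACG appears three times, which is the kind of over-representation that, at genome scale, marks a repeat.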

For instance, 487 of the 736 genes known to cause cancer had an average number of repeat sequences that was 15 times higher than predicted. Additionally, there was a significant increase in these repeat sequences in genes related to cell signaling pathways that are frequently dysregulated in cancers.

The researchers also looked for direct alterations in repeat sequences in cancers using next-generation sequencing technology, which enables researchers to quickly analyze the sequences of entire genomes.

Using ARTEMIS, they analyzed more than 1,200 types of repeat elements in tumor and normal tissues from 525 patients with various cancers who took part in the Pan-Cancer Analysis of Whole Genomes (PCAWG), and found a median of 807 altered elements in each tumor.

Among these, 820 out of 1,280, or nearly two-thirds, had never before been found to be changed in human cancers. Next, they summarized genome-wide repeat element alterations that were predictive of cancer for each sample by using a machine-learning model to create an ARTEMIS score.

Across all cancer types analyzed, ARTEMIS scores distinguished tumors from normal tissues in the 525 PCAWG participants with high performance, reaching an area under the curve (AUC) of 0.96, where 1 denotes a perfect score. Regardless of tumor type, higher ARTEMIS scores were linked to shorter overall and progression-free survival.
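An AUC can be read as the probability that a randomly chosen tumor sample receives a higher score than a randomly chosen normal sample. The short Python sketch below computes it that way on made-up numbers; the function name `auc` and the toy scores are assumptions for illustration and are not data from the study.

```python
def auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores, not results from the paper.
tumor_scores = [0.9, 0.8, 0.4]
normal_scores = [0.7, 0.3, 0.2]
result = auc(tumor_scores, normal_scores)
print(round(result, 3))
```

Here eight of the nine tumor-normal pairs are ordered correctly, so the AUC is 8/9, or about 0.889. A value of 0.5 would mean the score does no better than chance.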

The researchers then assessed ARTEMIS's potential for noninvasive cancer detection. They applied the tool to blood samples from 287 participants in the Danish Lung Cancer Screening Study (LUCAS), some of whom had lung cancer and some of whom did not. ARTEMIS classified the patients with lung cancer with an AUC of 0.82.

When ARTEMIS was combined with another technique known as DELFI (DNA evaluation of fragments for early interception), the combined model classified patients with lung cancer with an AUC of 0.91. DELFI is an assay previously developed by Velculescu, Scharpf, and other members of their group that detects changes in the size and distribution of cfDNA fragments across the genome.

Similar results were found in a group of 208 people who were at risk for liver cancer. With an AUC of 0.87, ARTEMIS was able to identify those who had liver cancer among others who had cirrhosis or viral hepatitis. The AUC rose to 0.90 when DELFI was added.

Lastly, they assessed whether the ARTEMIS blood test could identify where in the body a patient's tumor originated. After being trained on data from PCAWG participants, the tool identified the tissue of origin among 12 tumor types with an average accuracy of 78%.

The researchers then used ARTEMIS and DELFI together to evaluate blood samples from 226 patients who had tumors of the breast, ovary, lung, colon or rectum, bile duct, stomach, or pancreas. Here, the model classified patients among the various cancer types with an average accuracy of 68%, which rose to 83% when it was allowed to suggest two possible tumor types rather than just one.
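Allowing a model to suggest two tumor types instead of one corresponds to what is commonly called top-2 accuracy: a prediction counts as correct if the true type is among the model's two highest-ranked guesses. The Python sketch below shows the calculation on invented data; the function name `top_k_accuracy` and the example rankings are assumptions and do not come from the study.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Share of samples whose true label is among the first k ranked guesses."""
    hits = sum(
        truth in ranked[:k]
        for ranked, truth in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Hypothetical rankings (most likely tumor type first) for four samples.
ranked = [
    ["lung", "breast", "ovary"],
    ["breast", "lung", "ovary"],
    ["ovary", "lung", "breast"],
    ["lung", "ovary", "breast"],
]
truth = ["lung", "lung", "breast", "ovary"]

top1 = top_k_accuracy(ranked, truth, 1)
top2 = top_k_accuracy(ranked, truth, 2)
print(top1, top2)
```

In this toy case the accuracy rises from 0.25 with one guess to 0.75 with two, which illustrates why a two-guess figure is always at least as high as the single-guess one.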

“Our study shows that ARTEMIS can reveal genome-wide repeat landscapes that reflect dramatic underlying changes in human cancers, and by illuminating the so-called ‘dark genome,’ the work offers unique insights into the cancer genome and provides a proof-of-concept for the utility of genome-wide repeat landscapes as tissue and blood-based biomarkers for cancer detection, characterization, and monitoring.”

Akshaya Annapragada, M.D./Ph.D. Student, School of Medicine, Johns Hopkins University

The approach will next be assessed in larger clinical trials, Velculescu said. “You can imagine this could be used for early detection for a variety of cancer types, but also could have uses in other applications such as monitoring response to treatment or detecting recurrence. This is a totally new frontier.”

Journal reference:

Annapragada, A. V., et al. (2024) Genome-wide repeat landscapes in cancer and cell-free DNA. Science Translational Medicine.

